How to Capture Audio from Your Computer and Transcribe It
You want to record what you hear through your computer — a Zoom call, a YouTube tutorial, a podcast, system sounds — and get a searchable transcript.
This should be simple. It isn't. Most operating systems make it surprisingly hard to capture system audio (the audio coming out of your speakers/headphones), especially on macOS.
Here's how to do it properly in 2026.
The Problem with System Audio Capture
macOS
Historically, macOS gave apps no direct way to record system audio. When you screen record with QuickTime, you get video and your microphone, but no system audio.
Workarounds exist (BlackHole, Soundflower, Loopback), but they require virtual audio devices and manual routing, and they tend to break with OS updates. Apple added system audio capture to its screen recording API (ScreenCaptureKit) in macOS 13, but few apps take advantage of it properly.
Windows
Windows is better here. WASAPI loopback capture lets apps record system audio without extra drivers. Most screen recorders support it. But combining system audio + mic audio + transcription in one tool is still uncommon.
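To make the loopback idea concrete, here is a minimal sketch that records a few seconds of whatever the default speaker is playing. It assumes the third-party `soundcard` package (with NumPy), which is a choice made for this example and not something Screenpipe is known to use. The helper names `float_to_pcm16` and `record_loopback` are hypothetical.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)


def record_loopback(path, seconds=5, samplerate=48000):
    """Record the default speaker's output to a mono WAV file.

    Assumes `pip install soundcard`; on Windows it goes through WASAPI.
    """
    import soundcard as sc  # third-party, imported lazily

    speaker = sc.default_speaker()
    # A loopback "microphone" mirrors what the speaker is playing.
    loopback = sc.get_microphone(id=str(speaker.name), include_loopback=True)
    with loopback.recorder(samplerate=samplerate) as recorder:
        frames = recorder.record(numframes=samplerate * seconds)

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(samplerate)
        # Keep only the first channel to produce a mono file.
        wav.writeframes(float_to_pcm16(frames[:, 0]))
```

Calling `record_loopback("system.wav")` while something is playing should produce a short WAV file. Mixing in the microphone and transcribing the result are separate steps, which is the gap described above.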
Linux
PulseAudio and PipeWire expose a monitor source for each output device, which can be recorded like a microphone, but the setup varies by distribution and desktop environment.
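Finding the right monitor source is usually the first hurdle. The sketch below parses the tab-separated output of `pactl list short sources` and keeps the entries whose names end in `.monitor`, which is the usual naming convention. It assumes `pactl` is installed (PipeWire systems typically provide it through the PulseAudio compatibility layer). The function names are hypothetical.

```python
import subprocess


def find_monitor_sources(pactl_output):
    """Return monitor source names from `pactl list short sources` output."""
    monitors = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        # Column 0 is the index, column 1 is the source name.
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            monitors.append(fields[1])
    return monitors


def list_monitor_sources():
    """Run pactl and return the monitor sources on this machine."""
    result = subprocess.run(
        ["pactl", "list", "short", "sources"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_monitor_sources(result.stdout)
```

A name returned by `list_monitor_sources()` can then be handed to a recorder, for example `ffmpeg -f pulse -i <name> out.wav`.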
The Simple Solution: Screenpipe
Screenpipe captures both system audio and microphone input simultaneously, transcribes everything locally using Whisper, and makes it all searchable. No virtual audio devices, no manual routing, no cloud processing.
What it captures:
- System audio — everything you hear through speakers or headphones (meetings, videos, music, notifications)
- Microphone — everything you say
- Both simultaneously — hear the meeting and your own comments, attributed separately
What it does with it:
- Real-time local transcription using Whisper
- Speaker identification — who said what
- Full-text search across all transcripts
- AI-powered queries: "What was the action item from the standup?"
- Timestamps linking audio to screen content
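Because everything is stored locally, transcripts can also be queried from a script. The sketch below is written under assumptions: that Screenpipe serves a local HTTP API at `http://localhost:3030` and that its `/search` endpoint accepts `q`, `content_type`, and `limit` parameters. Check the Screenpipe API documentation for your version before relying on any of these. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:3030"  # assumed default address of the local server


def build_search_url(query, content_type="audio", limit=10, base_url=BASE_URL):
    """Build a search URL; the parameter names are assumptions, not confirmed."""
    params = {"q": query, "content_type": content_type, "limit": limit}
    return f"{base_url}/search?{urlencode(params)}"


def search_transcripts(query, limit=10):
    """Fetch matching results as parsed JSON (requires Screenpipe to be running)."""
    with urlopen(build_search_url(query, limit=limit), timeout=10) as response:
        return json.load(response)
```

`search_transcripts("action item")` returns whatever JSON the server sends back. The response shape is deliberately left unparsed here, since it may differ between versions.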
Setup
- Download Screenpipe for your platform
- Grant microphone permission (and screen recording on macOS for system audio)
- That's it — audio capture starts automatically
No BlackHole. No Soundflower. No virtual audio routing. It just works.
Use Cases
Meeting Recording Without a Bot
Most meeting transcription tools (Otter, Granola, tl;dv) add a bot to your call. Screenpipe captures your computer's audio output — no bot, no "Otter.ai wants to join" notification, no asking permission.
Every Zoom, Meet, Teams, Slack huddle, or Discord call is captured automatically because you're recording system audio. See the AI meeting notes use case for details.
Tutorial and Lecture Capture
Watching a coding tutorial? A university lecture? A conference talk? Screenpipe transcribes the audio so you can search it later. "What did they say about the useEffect cleanup function?" — instant answer from the transcript.
Podcast Research
Listening to podcasts for research? Screenpipe transcribes as you listen. Later, search for specific topics across hours of audio: "when did they discuss Series A fundraising?"
Call Transcription
Phone calls on speaker, VoIP calls through your computer, customer support calls — anything that plays through your audio output gets captured and transcribed.
Comparison with Other Approaches
| | Screenpipe | BlackHole + Whisper | Otter.ai | OBS + Manual |
|---|---|---|---|---|
| System audio | ✅ Native | ✅ Virtual device | ❌ Meeting only | ✅ With config |
| Mic audio | ✅ | ⚠️ Complex routing | ❌ | ✅ |
| Auto transcription | ✅ Local Whisper | Manual | ✅ Cloud | ❌ |
| Always-on | ✅ | Manual start | Per-meeting | Manual start |
| Searchable | ✅ AI-powered | ❌ | ✅ | ❌ |
| Screen capture too | ✅ OCR | ❌ | ❌ | ✅ Video only |
| Setup effort | 5 minutes | 30+ minutes | 5 minutes | 15 minutes |
| Privacy | 100% local | Local | Cloud | Local |
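For a sense of what "Manual" means in the BlackHole + Whisper column: once the audio is recorded to a file, you still have to run transcription yourself. A minimal sketch, assuming the third-party `openai-whisper` package and `ffmpeg` are installed, could look like the following. The helper names are hypothetical, and this is not how Screenpipe runs Whisper internally.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Turn Whisper-style segments into timestamped transcript lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_file(path, model_name="base"):
    """Transcribe one recorded audio file locally."""
    import whisper  # third-party: pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_segments(result["segments"])
```

Every recording needs its own `transcribe_file("meeting.wav")` call, and the output is plain lines of text with nothing indexing them for search.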
Audio + Screen Is Better Than Audio Alone
Here's why Screenpipe captures your screen and audio simultaneously:
When someone says "look at the third column" while sharing a spreadsheet, the audio transcript alone gives you "look at the third column" — useless without the visual. With screen capture, you get the actual spreadsheet content.
When someone drops a URL in the Zoom chat, audio tools miss it entirely. Screen capture catches it.
When someone demos a UI and says "click here, then here," the audio is meaningless without the visual. Screen + audio together give you the full picture.
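The pairing behind this is timestamp alignment: for each spoken segment, look up what was on screen at that moment. The sketch below illustrates the idea with hypothetical `Frame` and `Segment` types. It is not Screenpipe's actual data model.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    timestamp: float  # seconds since the start of the recording
    text: str         # text recognized on screen (e.g. via OCR)


@dataclass
class Segment:
    start: float      # when the speaker began this sentence
    text: str


def attach_screen_context(
    segments: List[Segment], frames: List[Frame]
) -> List[Tuple[Segment, Optional[Frame]]]:
    """Pair each segment with the latest frame captured at or before it."""
    ordered = sorted(frames, key=lambda f: f.timestamp)
    times = [f.timestamp for f in ordered]
    pairs = []
    for segment in segments:
        # Count of frames with timestamp <= segment.start.
        index = bisect_right(times, segment.start)
        pairs.append((segment, ordered[index - 1] if index else None))
    return pairs
```

With a pairing like this, "look at the third column" can be shown next to the spreadsheet text that was visible when it was said.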
Getting Started
- Download Screenpipe
- Grant permissions (takes 30 seconds)
- Your audio is now being captured and transcribed — automatically, locally, continuously
Every meeting, every call, every video, every podcast. Searchable forever, on your device.
For more details, see the audio capture and transcription use case.
