Upload any audio file — MP3, WAV, M4A, FLAC, OGG — and get an accurate AI transcript in minutes. Free, private, 98+ languages, SRT and VTT export included.
Not everything you need transcribed has video attached to it. Voice memos, podcast episodes, journalist interviews, phone calls you recorded with permission, lecture recordings without music: all start life as pure audio. This audio-to-text tool skips the video extraction step entirely and feeds your MP3 or WAV directly into the Whisper pipeline. With pure audio the model's full accuracy is available, and results are often slightly better than video-based transcription because phone and recorder files tend to have fewer compression artifacts than video from streaming platforms.
Supported formats are MP3, WAV, M4A (AAC), FLAC, OGG, WebM audio, and even the less common Opus. If your phone or recorder saves it, it transcribes. The uploader inspects the file's Content-Type and decides whether to re-compress: MP3 files already at 128 kbps or lower are sent as-is, with no re-encoding. Large uncompressed WAV files are transcoded locally to a compact MP3 first to keep the upload small.
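The upload decision described above can be sketched roughly as follows. This is an illustration, not the tool's actual code: the `AudioFile` type, the `upload_plan` function, and the 25 MB cutoff for a "large" WAV are all hypothetical, and the handling of MP3s above 128 kbps and of other formats is inferred rather than stated on this page.

```python
from dataclasses import dataclass
from typing import Optional

# From the text: MP3 at 128 kbps or lower is sent as-is.
MP3_PASSTHROUGH_KBPS = 128
# Hypothetical cutoff: the page does not say what counts as a "large" WAV.
LARGE_WAV_BYTES = 25 * 1024 * 1024

WAV_TYPES = ("audio/wav", "audio/x-wav", "audio/wave")


@dataclass
class AudioFile:
    content_type: str                   # e.g. "audio/mpeg", "audio/wav"
    size_bytes: int
    bitrate_kbps: Optional[int] = None  # None when the bitrate is unknown


def upload_plan(f: AudioFile) -> str:
    """Return "as-is" or "transcode-to-mp3" for a selected file."""
    if f.content_type == "audio/mpeg":
        # Low-bitrate MP3s are already compact, so they are not re-encoded.
        if f.bitrate_kbps is not None and f.bitrate_kbps <= MP3_PASSTHROUGH_KBPS:
            return "as-is"
        # Assumption: higher or unknown bitrates get re-compressed.
        return "transcode-to-mp3"
    if f.content_type in WAV_TYPES:
        # Large uncompressed WAVs are shrunk locally before upload.
        if f.size_bytes >= LARGE_WAV_BYTES:
            return "transcode-to-mp3"
        return "as-is"
    # Assumption: other formats (M4A, FLAC, OGG, ...) pass through unchanged.
    return "as-is"
```

For example, a 96 kbps MP3 would come back as "as-is", while a 300 MB WAV would come back as "transcode-to-mp3" under these assumed thresholds.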
iPhone voice memos save as .m4a and Android recorders save as .m4a or .amr; both work. AirDrop or email the file to your laptop, then drop it into this page. If you're on a phone browser you can upload directly: Safari and Chrome on iOS/Android both work end-to-end, including any local transcoding before upload.
If you're a journalist or podcast producer, the typical loop is: record (Zoom H5, phone, or SquadCast), upload the WAV or MP3 here, download the SRT for rough subtitling, and paste the TXT into your show-notes tool. For long interviews you'll want Pro, which allows 4 hours per file; the free tier is capped at 10 minutes per file, which fits most voice memos but not full episodes.
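If you're curious how the SRT and TXT exports in that loop relate to each other, here is a minimal sketch. It assumes the transcript is available as a list of `(start_seconds, end_seconds, text)` tuples; that shape and the function names are illustrative assumptions, not this tool's export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build numbered SRT cues from (start, end, text) tuples."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


def to_txt(segments) -> str:
    """Join segment text into one plain paragraph for show notes."""
    return " ".join(text.strip() for _, _, text in segments)


segments = [(0.0, 2.5, "Welcome to the show."), (2.5, 6.0, "Today: audio tools.")]
print(to_srt(segments))
print(to_txt(segments))
```

The SRT keeps timing for subtitling, while the TXT drops it so the text pastes cleanly into notes.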
Slightly. Audio-only files skip the ffmpeg extraction step and are usually cleaner recordings (phone/recorder instead of streaming compression), which tends to give 1 to 2 percentage points higher accuracy.
Free tier: 10 minutes per file, 3 files per day. Pro tier: 4 hours per file, 200 files per day.
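As a rough illustration of how these limits combine, the check below uses the numbers quoted above. The `TierLimits` type and `can_upload` function are hypothetical names for this sketch, and treating a file of exactly the limit length as allowed is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_minutes_per_file: int
    max_files_per_day: int


# Numbers taken from the limits stated above.
LIMITS = {
    "free": TierLimits(max_minutes_per_file=10, max_files_per_day=3),
    "pro": TierLimits(max_minutes_per_file=4 * 60, max_files_per_day=200),
}


def can_upload(tier: str, duration_minutes: float, files_uploaded_today: int):
    """Return (allowed, reason) for one more upload on the given tier."""
    limits = LIMITS[tier]
    if duration_minutes > limits.max_minutes_per_file:
        return False, "file exceeds the per-file length limit"
    if files_uploaded_today >= limits.max_files_per_day:
        return False, "daily file limit reached"
    return True, "ok"
```

Under these assumptions, a 45-minute interview is rejected on the free tier but accepted on Pro, and a fourth free upload in one day is rejected regardless of length.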
Whisper can transcribe sung lyrics but it's designed for speech — accuracy is lower for singing, and purely instrumental tracks will return an empty transcript.
Not on the free tier — one at a time. Pro users can run several jobs in parallel.
The browser won't boost quiet audio — upload it as-is. Whisper handles low volume surprisingly well; for truly distant recordings, use Audacity to normalize first.