
Video to Text Converter

Convert any video into a clean, searchable text transcript with timestamps. Free, browser-based, and works on MP4, MOV, WebM, MKV, MP3 and WAV. Exports to SRT, VTT, or plain text.

🎙️ Upload a file to transcribe MP4 · MOV · MP3 · WAV — free, 98+ languages →

Video is great for watching, terrible for searching. A 40-minute lecture you recorded on your phone is effectively invisible to you a week later — you can't grep it, can't quote it, can't feed it to another tool. Converting video to text flips that: once there's a transcript, the content becomes a first-class searchable document. This converter takes the video you already have and gives you that document in minutes, with timestamps on every segment so you can still jump back to the video frame that matters.

What the converter actually does

Three stages. First, your browser pulls the audio track out of the video file using ffmpeg.wasm — this happens entirely on your machine, and the original video never leaves your device. Second, the compressed audio (16 kHz mono MP3, roughly 500 KB per minute) is uploaded over HTTPS to our Cloudflare Worker, which forwards it to OpenAI's Whisper API. Third, the returned segments are written to a database and streamed back to your browser in real time via Server-Sent Events so you watch the transcript appear as it's produced.
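
The first stage can be sketched as an ffmpeg argument list plus a quick size estimate. This is a minimal illustration, not the converter's actual code: the function names, the file names, and the 64 kbps bitrate are assumptions (64 kbps works out to about 480 KB per minute, close to the 500 KB figure above). The flags themselves (-vn, -ac, -ar, -b:a) are standard ffmpeg options.

```typescript
// Hypothetical helper: builds ffmpeg arguments that drop the video stream
// and encode the audio as 16 kHz mono MP3.
function buildAudioExtractArgs(
  inputName: string,
  outputName: string,
  bitrateKbps: number = 64 // assumed bitrate, not confirmed by the source
): string[] {
  return [
    "-i", inputName,
    "-vn",                      // no video in the output
    "-ac", "1",                 // one audio channel (mono)
    "-ar", "16000",             // 16 kHz sample rate
    "-b:a", `${bitrateKbps}k`,  // constant audio bitrate
    outputName,
  ];
}

// Hypothetical helper: estimated upload size in kilobytes (1 KB = 1000 bytes).
function estimateUploadKB(durationSeconds: number, bitrateKbps: number = 64): number {
  // kilobits per second / 8 = kilobytes per second
  return (bitrateKbps / 8) * durationSeconds;
}
```

Under these assumptions a 40-minute lecture comes to roughly 19 MB of audio to upload, regardless of how large the original video file is. In recent ffmpeg.wasm releases, an argument array like this is typically passed to the library's exec method once the input has been written to its virtual file system.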

Supported input formats

Anything ffmpeg.wasm can read, which is essentially everything — MP4, MOV, MKV, WebM, AVI, FLV for video; MP3, WAV, M4A, FLAC, OGG for audio. There's no format conversion step you need to do beforehand. If the file plays in your browser, it will almost certainly transcribe.

Timestamps you can actually use

Every segment is delimited down to the millisecond in the underlying Whisper output. When you download SRT or VTT, those timestamps go with it, ready to drop into a YouTube upload or a Premiere Pro subtitle track. When you download plain TXT, segments are merged into natural paragraphs so it reads like a document — no timestamp clutter unless you want it.
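
To show how segment timestamps map onto the two subtitle formats, here is a minimal sketch. It assumes each segment carries start and end times in seconds plus its text; the Segment type and the function names are illustrative and are not the converter's real code. The only difference between the two timestamp styles is the millisecond separator: SRT uses a comma, VTT uses a period.

```typescript
// Assumed segment shape: start/end in seconds, plus the spoken text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS<sep>mmm, rounding to the nearest millisecond.
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)}${msSeparator}${zeroPad(ms, 3)}`;
}

// SRT: numbered cues, comma before the milliseconds.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

// VTT: "WEBVTT" header, period before the milliseconds, cue numbers optional.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text.trim()}\n`
  );
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids a cue ending at something like 00:00:59,1000 when the fractional part rounds up.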

How to convert video to text

1. Upload your video via the button above
2. Choose optional extras: AI summary or translation
3. Wait 1–3 minutes while Whisper transcribes (watch live progress)
4. Download TXT, SRT, or VTT, or copy the text straight from the page

Frequently Asked Questions

What's the longest video I can convert?

Free tier: 10 minutes per file, 3 files per day. Pro tier: 4 hours per file, 200 files per day. File size caps are 200 MB free / 5 GB Pro.
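
If you want to check a file against these limits before uploading, the numbers above can be written down as a small lookup. This is only a sketch: the type names and the checkUpload function are hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes), which the limits above do not specify.

```typescript
type Tier = "free" | "pro";
type Violation = "duration" | "size" | "daily_quota";

interface TierLimits {
  maxMinutesPerFile: number;
  maxFilesPerDay: number;
  maxFileBytes: number;
}

// Limits as stated in the FAQ; byte values assume decimal MB/GB.
const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { maxMinutesPerFile: 10, maxFilesPerDay: 3, maxFileBytes: 200 * 1000 * 1000 },
  pro: { maxMinutesPerFile: 4 * 60, maxFilesPerDay: 200, maxFileBytes: 5 * 1000 * 1000 * 1000 },
};

// Hypothetical pre-upload check: returns every limit the file would exceed.
function checkUpload(
  tier: Tier,
  durationMinutes: number,
  fileBytes: number,
  filesAlreadyToday: number
): Violation[] {
  const limits = TIER_LIMITS[tier];
  const violations: Violation[] = [];
  if (durationMinutes > limits.maxMinutesPerFile) violations.push("duration");
  if (fileBytes > limits.maxFileBytes) violations.push("size");
  if (filesAlreadyToday >= limits.maxFilesPerDay) violations.push("daily_quota");
  return violations;
}
```

For example, a 40-minute, 150 MB lecture recording fails only the duration check on the free tier and passes every check on Pro.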

Does it keep punctuation and capitalization?

Yes. Whisper outputs fully punctuated, case-correct text — you won't get a wall of lowercase.

Does it handle multiple speakers?

The transcript is returned as one contiguous stream right now. Speaker labels are on the Pro roadmap (powered by pyannote-audio). Today you can still identify speakers manually during the edit step.

What if the audio has noise or music?

Whisper is surprisingly robust to background noise and light music. It struggles most with heavily overlapping speech or when the speaker is far from the mic. For those files, expect more misheard words and lower-confidence segments in the output.

Can I use the transcript commercially?

Yes. You own the transcript — we claim no rights to it. The audio file itself must of course respect the copyright of its original creator.