Drop in any audio or video file, pick a language, and get a full transcript with timestamps, all in your browser. The model is downloaded once (~75 MB) and cached locally, so subsequent runs skip the download and work offline.
Recognition runs locally. Your audio never leaves your device.
The model is cached in your browser. Subsequent runs work offline.
No account, no credits, no usage cap. Open and use.
The free in-browser tool runs a small open-source model. It handles clean audio well but can struggle with heavy accents, noise, or overlapping speakers. For broadcast-grade accuracy and SRT/VTT files you can drop straight into a video editor, run your file through Subformer's cloud transcription in Subtitles-only mode.
Open Subtitles-only mode