Chinese
Drop in a Mandarin audio or video file and get a full transcript with timestamps - all in your browser, with no signup. The model (~75 MB) is downloaded once and cached locally, so later runs skip the download and work offline.
Drop in any Mandarin audio or video - MP3, WAV, M4A, MP4, and WEBM are all supported.
The first click downloads the ~75 MB model to your browser. After that, recognition starts without a download and works offline.
Save the transcript as TXT, SRT, or VTT. Your audio and the result never leave your device.
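For readers curious about what the SRT and VTT exports contain, here is a minimal TypeScript sketch of turning timestamped segments into both formats. The `Segment` shape and the function names are assumptions made for illustration; they are not the tool's actual internals.

```typescript
// Hypothetical segment shape (an assumption, not the tool's real data model).
interface Segment {
  start: number; // start time in seconds
  end: number;   // end time in seconds
  text: string;  // recognized text for this span
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as HH:MM:SS,mmm (SRT uses ",") or HH:MM:SS.mmm (VTT uses ".").
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SRT: numbered cues, each followed by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}\n${seg.text}\n`)
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(seg =>
    `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}\n${seg.text}\n`);
  return "WEBVTT\n\n" + cues.join("\n");
}
```

The two formats differ mainly in the header line, the cue numbering, and the millisecond separator, which is why one timestamp helper can serve both.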
Recognition runs entirely in your browser using WebAssembly. The only network requests are to fetch the model and the WebAssembly runtime - both are public, static assets. We don't see your audio and we don't see the transcript.
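Because recognition depends on WebAssembly, a page like this would typically check for it before offering to download the model. The following is a rough sketch of such a check, not the tool's actual code; the function name `supportsWebAssembly` is hypothetical.

```typescript
// Returns true when the current runtime exposes a usable WebAssembly API.
// Hypothetical helper for illustration only.
function supportsWebAssembly(): boolean {
  const wasm = (globalThis as any).WebAssembly;
  if (typeof wasm !== "object" || wasm === null || typeof wasm.validate !== "function") {
    return false;
  }
  // Smallest valid module: the "\0asm" magic number followed by version 1.
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  try {
    return wasm.validate(emptyModule) === true;
  } catch {
    return false;
  }
}
```

Validating a tiny module rather than only checking that the global exists guards against environments where WebAssembly is present but disabled by policy.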
The free in-browser tool runs a small open-source model. It works well on clean audio but can struggle with heavily accented Mandarin, background noise, or overlapping speakers. For broadcast-grade accuracy and SRT/VTT files you can drop straight into a video editor, run your file through Subformer's cloud transcription in Subtitles-only mode.
Open Subtitles-only mode