Is this Mandarin speech-to-text really free?

Yes. The page is public, no signup is required, and there are no length limits. The recognition model runs entirely in your browser. For higher accuracy on noisy or accented audio, run your file through Subformer's cloud transcription in Subtitles-only mode.

How does the Mandarin transcription work in the browser?

It uses an open-source neural speech recognition model that runs on ONNX Runtime Web, an inference engine compiled to WebAssembly. The model (~75 MB for the default Base model) is downloaded on first use and cached in your browser. Subsequent visits skip the download entirely and can transcribe offline.
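The download-once behavior can be pictured as a cache-first lookup. The sketch below is an illustration under assumptions, not this page's actual code: `ModelStore`, `Downloader`, and `loadModel` are hypothetical names, and in a real browser the store could be backed by the Cache API or IndexedDB.

```typescript
// Hypothetical storage interface. In a browser this role could be played
// by the Cache API or IndexedDB; here it is kept abstract.
interface ModelStore {
  get(url: string): Promise<Uint8Array | undefined>;
  set(url: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical downloader: fetches the model bytes over the network.
type Downloader = (url: string) => Promise<Uint8Array>;

interface LoadedModel {
  bytes: Uint8Array;
  fromCache: boolean; // true when no network request was needed
}

// Cache-first load: reuse stored bytes when present, otherwise download
// once and store the result for later (including offline) use.
async function loadModel(
  url: string,
  store: ModelStore,
  download: Downloader
): Promise<LoadedModel> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  await store.set(url, bytes);
  return { bytes, fromCache: false };
}
```

With a store that persists across visits, only the first call touches the network, which is why later runs can work offline.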

Is my audio uploaded to a server?

No. Recognition runs locally in your browser. The only network calls are to download the model and the WebAssembly runtime - both are static assets. Your audio file and the generated transcript never leave your device.

What audio formats are supported?

Anything your browser can decode: MP3, WAV, M4A, AAC, FLAC, OGG/Opus, plus video containers like MP4, MOV, and WebM (the audio track is extracted automatically).

Can I download SRT or VTT subtitles?

Yes. After transcription you can download the result as plain text (.txt), SubRip subtitles (.srt), or WebVTT (.vtt). Both subtitle formats include segment-level timestamps, so you can drop them straight into any video editor or player.
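To show how segment-level timestamps map onto the two subtitle formats, here is a minimal sketch. The `Segment` shape and the function names are assumptions made for illustration, not this tool's internals. The main difference between the formats is the millisecond separator (a comma in SRT, a period in WebVTT), plus the numbered cues in SRT and the `WEBVTT` header line in WebVTT.

```typescript
// Hypothetical segment shape: start/end in seconds plus the recognized text.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to at least `width` digits.
const pad = (n: number, width: number): string => {
  let out = String(n);
  while (out.length < width) out = "0" + out;
  return out;
};

// Format seconds as HH:MM:SS followed by milliseconds.
// SRT separates milliseconds with "," and WebVTT with ".".
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

// SubRip: numbered cues separated by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const range = `${formatTimestamp(seg.start, ",")} --> ${formatTimestamp(seg.end, ",")}`;
      return `${i + 1}\n${range}\n${seg.text}\n`;
    })
    .join("\n");
}

// WebVTT: a "WEBVTT" header, then cues separated by a blank line.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) => {
    const range = `${formatTimestamp(seg.start, ".")} --> ${formatTimestamp(seg.end, ".")}`;
    return `${range}\n${seg.text}\n`;
  });
  return `WEBVTT\n\n${cues.join("\n")}`;
}
```

For example, a segment running from 0 to 1.5 seconds gets the range `00:00:00,000 --> 00:00:01,500` in SRT and `00:00:00.000 --> 00:00:01.500` in WebVTT.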

Tiny vs Base - which model should I pick?

Tiny (~40 MB) is fastest and great for clean studio audio. Base (~75 MB) is roughly twice as accurate on noisy or accented audio and only slightly slower. If you're not sure, start with Base - it's the default. For the highest accuracy on hard audio (accents, noise, overlap), use Subformer's cloud transcription in Subtitles-only mode.

Does it work offline?

Once the model is cached, yes. The first visit needs network access to download it; after that the page can transcribe audio offline.

Which browsers are supported?

Any modern browser with WebAssembly: Chrome, Firefox, Safari 15+, Edge, and Chromium-based mobile browsers.
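A page can check for WebAssembly before offering to download the model. The following is a minimal sketch of such a check, not necessarily the one this page runs: `supportsWebAssembly` is a hypothetical helper, and its `scope` parameter exists only so the check can be tried against stand-in objects.

```typescript
// Hypothetical helper: reports whether a usable WebAssembly API is present.
// `scope` defaults to the global object.
function supportsWebAssembly(scope: unknown = globalThis): boolean {
  try {
    const wasm = (scope as { WebAssembly?: { validate?: unknown } }).WebAssembly;
    if (!wasm || typeof wasm.validate !== "function") {
      return false;
    }
    // The smallest valid module: the "\0asm" magic number plus version 1.
    const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
    return wasm.validate(emptyModule) === true;
  } catch {
    // Any failure (missing global, restricted environment) counts as unsupported.
    return false;
  }
}
```

Validating a tiny module, rather than only checking that the global exists, also catches environments where the API is present but disabled.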

Subformer

Free · No signup · Runs in your browser

Free Mandarin Speech to Text

Drop a Mandarin audio or video file and get a full transcript with timestamps - all in your browser, with no signup. The model is downloaded once (~75 MB) and cached locally, so subsequent runs start without a download and work offline.

How it works

1. Drop a file
Drop in any Mandarin audio or video - MP3, WAV, M4A, MP4, and WebM are all supported.
2. Transcribe
The first click downloads the ~75 MB model to your browser. After that, recognition starts without a download and works offline.
3. Download
Save the transcript as TXT, SRT, or VTT. Your audio and the result never leave your device.

Your audio stays on your device

Recognition runs entirely in your browser using WebAssembly. The only network requests are to fetch the model and the WebAssembly runtime - both are public, static assets. We don't see your audio and we don't see the transcript.

Need higher accuracy?

Cloud-quality Mandarin transcription - Subtitles-only mode

The free in-browser tool runs a small open-source model. It is fine for clean audio, but it can struggle with heavy Mandarin accents, noise, or overlapping speakers. For broadcast-grade accuracy, and SRT/VTT files you can drop straight into a video editor, run your file through Subformer's cloud transcription in Subtitles-only mode.

Open Subtitles-only mode

Premium - also from Subformer

Translate & dub

Translate and dub videos into 30+ languages in one click.

Translate subtitles

Drop in an SRT or VTT and get back a polished translation in any language.

Subtitle editor

Polish, retime, and restyle your subtitles in a free in-browser editor.

Frequently asked questions
