AI/ML | Remote / Toronto / San Francisco | Full-time | Mid to Senior

Machine Learning Engineer (Speech & Voice)

About This Role

Join our AI team to design, train, and optimize speech recognition and text-to-speech models for multilingual dubbing. You'll work with models such as Whisper, VITS, and FastSpeech to enable voice conversion across languages and accents.

What You'll Do

Research, design, and implement ML models for speech recognition and TTS
Optimize model performance for real-time inference and production deployment
Collaborate with product and engineering teams to integrate ML models into our platform
Experiment with multi-language and accent adaptation techniques
Monitor and improve model quality metrics across different languages
Stay current with the latest research in speech synthesis and voice conversion

Requirements

3+ years of experience with PyTorch or TensorFlow
Experience in ASR, TTS, or voice conversion
Familiarity with multi-language and accent adaptation
Strong understanding of deep learning architectures (Transformers, CNNs, RNNs)
Experience with model optimization and deployment

Nice to Have

Published research in speech/audio processing
Experience with Whisper, VITS, or FastSpeech
Knowledge of audio signal processing
Experience with distributed training

What We Offer

Competitive salary and equity package
Flexible work arrangements (remote, hybrid, or office)
Health, dental, and vision insurance
Unlimited PTO and paid holidays
Learning and development budget
Top-tier equipment and tools
Collaborative, innovative team environment
