AI/ML | Remote / Toronto / San Francisco | Full-time | Mid to Senior

Machine Learning Engineer (Speech & Voice)

About This Role

Join our AI team to design, train, and optimize speech recognition and text-to-speech models for multilingual dubbing. You'll work with models such as Whisper for speech recognition and VITS and FastSpeech for text-to-speech, enabling voice conversion across languages and accents.

What You'll Do
  • Research, design, and implement ML models for speech recognition and TTS
  • Optimize model performance for real-time inference and production deployment
  • Collaborate with product and engineering teams to integrate ML models into our platform
  • Experiment with multilingual and accent adaptation techniques
  • Monitor and improve model quality metrics across different languages
  • Stay current with the latest research in speech synthesis and voice conversion

Requirements
  • 3+ years of experience with PyTorch or TensorFlow
  • Experience in ASR, TTS, or voice conversion
  • Familiarity with multilingual and accent adaptation
  • Strong understanding of deep learning architectures (Transformers, CNNs, RNNs)
  • Experience with model optimization and deployment

Nice to Have
  • Published research in speech/audio processing
  • Experience with Whisper, VITS, or FastSpeech
  • Knowledge of audio signal processing
  • Experience with distributed training

What We Offer
  • Competitive salary and equity package
  • Flexible work arrangements (remote, hybrid, or office)
  • Health, dental, and vision insurance
  • Unlimited PTO and paid holidays
  • Learning and development budget
  • Top-tier equipment and tools
  • Collaborative, innovative team environment