About Zoom Media Speech Recognition
Automatic Speech Recognition (ASR), also known as Speech to Text, turns audio into text automatically. The service can be used for automated (live) subtitles, transcription of recordings, voice bots, and indexing of large audio archives to make them easier to search.
We update all our language models monthly to maintain high accuracy. All our models reach an average accuracy of over 90%. The better the quality of the audio, the better the output of our language models will be.
We currently support 10 languages and are able to develop new models or customize existing ones within 3 months upon request. Get in touch to learn more.
All our models can process audio in real time and in batch mode. In both cases our API returns JSON containing timestamps in milliseconds and a confidence score for each word.
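As a rough illustration of how per-word timestamps and confidence scores can be used, the sketch below parses a made-up response and flags words that may need manual review. The field names (words, text, start_ms, end_ms, confidence) and the sample payload are assumptions for illustration only; consult the API documentation for the actual response schema.

```python
import json

# Hypothetical response payload. The field names are assumptions,
# not the documented Zoom Media schema.
SAMPLE_RESPONSE = """
{"words": [
  {"text": "hello", "start_ms": 0,   "end_ms": 420,  "confidence": 0.97},
  {"text": "wurld", "start_ms": 480, "end_ms": 900,  "confidence": 0.41},
  {"text": "today", "start_ms": 950, "end_ms": 1400, "confidence": 0.88}
]}
"""


def transcript(payload: str) -> str:
    """Join the recognized words into a plain-text transcript."""
    data = json.loads(payload)
    return " ".join(word["text"] for word in data["words"])


def low_confidence_words(payload: str, threshold: float = 0.6):
    """Return (text, start time in seconds) for words scored below the threshold."""
    data = json.loads(payload)
    return [
        (word["text"], word["start_ms"] / 1000)  # milliseconds to seconds
        for word in data["words"]
        if word["confidence"] < threshold
    ]


if __name__ == "__main__":
    print(transcript(SAMPLE_RESPONSE))
    for text, start in low_confidence_words(SAMPLE_RESPONSE):
        print(f"review '{text}' at {start:.2f}s")
```

A threshold like 0.6 is arbitrary here; in practice it would be tuned against the audio quality and the cost of a missed error.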
Our speech-to-text models can be applied to a range of use cases across different verticals.
We strive to make integrating our solution as easy as possible. Either follow our API documentation or use the Python SDK, which we released as open source under the MIT license. Find links to both below.