Speech to Text

Zoom Media
Turn speech into text to automate subtitling, transcription and indexing of audio archives

4.3 out of 5 stars (3)

About Zoom Media Speech Recognition

Automatic Speech Recognition, also called Speech to Text, turns audio into text automatically. The service can be used for automated (live) subtitles, transcription of recordings, voice bots, and indexing of large audio archives to make them easier to search.


We update all our language models monthly to maintain high accuracy. All our models reach an average accuracy of over 90%. The better the quality of the audio, the better the output of our language models will be.

Language models

We currently support 10 languages and can develop new models or customize existing ones within 3 months upon request. Get in touch to learn more.

  • Arabic (Modern Standard)
  • Danish
  • Dutch
  • English US
  • Filipino
  • Finnish
  • Flemish
  • Italian
  • Norwegian
  • Swedish


All our models can process audio in real time and in batch mode. In both cases our API returns a JSON file containing timestamps in milliseconds and a confidence score per word.

  • Real-time processing
    Stream live audio to our API and receive results almost instantly. This is particularly useful for generating subtitles for live webinars, conferences, breaking news stories or parliamentary debates, making this content more accessible to people watching with the sound turned off and to people with hearing loss.
  • Batch processing
    Process large amounts of content, such as archives, through our API. Because the service runs on the Azure cloud, it scales up easily, so users can process large quantities of audio and/or video.
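
As an illustration of how per-word timestamps and confidence scores could be turned into subtitles, here is a minimal Python sketch. The payload layout and the field names (words, word, start_ms, end_ms, confidence) are assumptions made for this example, not the documented response schema, so check the API documentation for the actual format. The sketch starts a new subtitle cue after a long pause and puts low-confidence words in brackets so an editor can review them.

```python
import json

# Hypothetical response payload: the field names below are assumptions
# for illustration only, not the documented schema of the API.
SAMPLE = json.loads("""
{"words": [
  {"word": "Good",    "start_ms": 0,    "end_ms": 300,  "confidence": 0.98},
  {"word": "evening", "start_ms": 320,  "end_ms": 800,  "confidence": 0.95},
  {"word": "and",     "start_ms": 2000, "end_ms": 2150, "confidence": 0.62},
  {"word": "welcome", "start_ms": 2160, "end_ms": 2700, "confidence": 0.91}
]}
""")


def format_timestamp(ms):
    """Convert milliseconds to the SRT timestamp format HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def group_into_cues(words, max_gap_ms=1000, max_words=8):
    """Start a new cue after a long pause or when a cue gets too long."""
    cues, current = [], []
    for w in words:
        long_pause = current and w["start_ms"] - current[-1]["end_ms"] > max_gap_ms
        if current and (long_pause or len(current) >= max_words):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return cues


def to_srt(words, min_confidence=0.7):
    """Render words as SRT text; low-confidence words appear in [brackets]."""
    blocks = []
    for number, cue in enumerate(group_into_cues(words), start=1):
        text = " ".join(
            w["word"] if w["confidence"] >= min_confidence else f"[{w['word']}]"
            for w in cue
        )
        start = format_timestamp(cue[0]["start_ms"])
        end = format_timestamp(cue[-1]["end_ms"])
        blocks.append(f"{number}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


srt_text = to_srt(SAMPLE["words"])
print(srt_text)
```

The pause threshold, cue length and confidence cut-off are arbitrary starting values and would need tuning per language and content type.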


Our Speech to Text models can be applied to a range of use cases across different verticals.

  • Media & Entertainment
    Broadcasters use Speech to Text to subtitle breaking news stories (live) or to index content archives and make them easier to search.
  • Government
    Make parliamentary debates and city council meetings more accessible to the deaf and hard of hearing by subtitling or transcribing them in real time.
  • Meetings
    Stop taking notes and transcribe all meetings with your colleagues in real time, so your team can focus on the meeting itself.
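
To show what indexing an archive can look like once transcripts with word timestamps are available, here is a small Python sketch of an inverted index that maps each spoken word to the recordings and positions where it occurs. The recording names and the (word, start_ms) tuples are made-up sample data, and the function names are hypothetical; a production setup would more likely feed the transcripts into a dedicated search engine.

```python
from collections import defaultdict

# Made-up sample data: recording name -> list of (word, start time in ms).
TRANSCRIPTS = {
    "council_meeting.wav": [("budget", 1200), ("vote", 5400), ("Budget", 9100)],
    "evening_news.wav": [("election", 300), ("budget", 7800)],
}


def build_index(transcripts):
    """Map each lowercased word to every (recording, start_ms) where it is spoken."""
    index = defaultdict(list)
    for recording, words in transcripts.items():
        for text, start_ms in words:
            index[text.lower()].append((recording, start_ms))
    return index


def search(index, term):
    """Return all occurrences of a term, ignoring case."""
    return index.get(term.lower(), [])


index = build_index(TRANSCRIPTS)
for recording, start_ms in search(index, "Budget"):
    print(f"{recording} at {start_ms} ms")
```

Because each hit carries a timestamp, a player could jump straight to the moment a term is spoken rather than only listing the matching recording.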


We strive to make integrating our solution as easy as possible. Either follow our API documentation or use the Python SDK, which we released as open source under the MIT license. Find links to both below.