ElevenLabs launches Scribe, a powerful AI model for transcription in 99 languages

ElevenLabs

ElevenLabs has introduced "Scribe," its first Speech-to-Text (STT) model, designed to deliver highly accurate transcription capabilities across 99 languages. Scribe is built to handle complex, real-world audio scenarios and offers features such as word-level timestamps, speaker diarization, and audio-event tagging. The model is accessible to developers through a Speech-to-Text API and to creators and businesses via the ElevenLabs dashboard, where users can upload audio or video files to generate structured transcripts. A low-latency version for real-time applications is also planned for release soon.
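
To illustrate how word-level timestamps and speaker diarization might combine into a structured transcript, here is a minimal sketch that merges consecutive words from the same speaker into turns. The `Word` record and its fields (`text`, `start`, `end`, `speaker`) are hypothetical and chosen for illustration only; they are not taken from the Scribe API, whose actual response format the article does not describe.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record; field names are assumptions, not Scribe's schema."""
    text: str
    start: float   # seconds from the beginning of the audio
    end: float     # seconds from the beginning of the audio
    speaker: str   # diarization label, e.g. "speaker_0"


def group_into_turns(words):
    """Merge consecutive words spoken by the same speaker into one turn."""
    turns = []
    for word in words:
        if turns and turns[-1]["speaker"] == word.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word.text
            turns[-1]["end"] = word.end
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word.speaker,
                "start": word.start,
                "end": word.end,
                "text": word.text,
            })
    return turns


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4, "speaker_0"),
        Word("there", 0.5, 0.8, "speaker_0"),
        Word("Hi", 1.0, 1.2, "speaker_1"),
    ]
    for turn in group_into_turns(sample):
        print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] {turn["speaker"]}: {turn["text"]}')
```

Each turn keeps the start time of its first word and the end time of its last, which is the kind of structure subtitles and meeting summaries typically need.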

Scribe has been benchmarked against leading models like Whisper Large V3, Deepgram Nova-3, and Gemini 2.0 Flash on datasets such as FLEURS and Common Voice. It consistently outperforms competitors with the lowest word error rates, achieving 98.7% accuracy in Italian, 96.7% in English, and similarly high results in 97 other languages. This performance extends to traditionally underserved languages like Serbian, Cantonese, and Malayalam, where it significantly reduces transcription errors compared to existing solutions.
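
For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. It assumes simple whitespace tokenization and no text normalization, and it is not the evaluation code used in the benchmarks. The article also does not say how its accuracy percentages were derived; reading them as 100% minus WER is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}")
```

Under that assumption, the reported 96.7% accuracy in English would correspond to roughly 3.3 word errors per 100 reference words.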

The development team behind Scribe includes experts such as Flavio Schneider (research lead) and Tim von Känel (project lead), among others specializing in architecture, data acquisition, and optimization. This release positions ElevenLabs as a key player in the automatic speech recognition (ASR) field, offering a robust tool for applications like meeting summaries, subtitles, and more.
