Open-Weight Text-to-Speech with Voxtral TTS

https://www.facebook.com/kdnuggets · May 1, 2026 · 12:00 PM UTC · 8 min read

#text-to-speech #voice cloning #open weight model #ai assistant #low latency #Mistral AI #Voxtral TTS #ElevenLabs #Ministral 3B #Hugging Face


⚡ TL;DR · AI summary

Mistral AI released Voxtral TTS on March 26, 2026, an open-weight text-to-speech model with 4 billion parameters capable of generating natural-sounding speech in nine languages. The model supports voice cloning from just three seconds of audio and is optimized for low-latency performance, making it suitable for real-time applications. While the model weights are available for non-commercial use under a CC BY-NC 4.0 license, commercial use requires a separate agreement or access via Mistral's API.
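Because cloning is said to work from as little as three seconds of reference audio, an application might check a clip's length before submitting it. The sketch below is a minimal example of such a pre-flight check using only Python's standard `wave` module. The function names, the 24 kHz demo sample rate, and the idea of enforcing a hard 3-second minimum are all assumptions for illustration; none of them come from Voxtral TTS or Mistral's API.

```python
import os
import tempfile
import wave

# Assumption: the summary only says cloning works from "as little as three
# seconds" of audio; this threshold is illustrative, not a documented API limit.
MIN_REFERENCE_SECONDS = 3.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(path: str, min_seconds: float = MIN_REFERENCE_SECONDS) -> bool:
    """Hypothetical pre-flight check: is the reference clip long enough?"""
    return wav_duration_seconds(path) >= min_seconds


def write_silent_wav(path: str, seconds: float, rate: int = 24_000) -> None:
    """Write a mono, 16-bit silent WAV; used here only to demo the check."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        short_clip = os.path.join(tmp, "short.wav")
        long_clip = os.path.join(tmp, "long.wav")
        write_silent_wav(short_clip, 2.0)
        write_silent_wav(long_clip, 3.5)
        print(is_usable_reference(short_clip))  # False
        print(is_usable_reference(long_clip))   # True
```

In a real pipeline you would also want to check audio quality (noise, clipping, a single speaker), which duration alone does not capture.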

Key facts

▪ Voxtral TTS is a 4-billion-parameter text-to-speech model developed by Mistral AI and released on March 26, 2026.
▪ It enables voice cloning from as little as three seconds of reference audio and supports nine languages, including English, Spanish, and Arabic.
▪ The model achieves a real-time factor of 9.7x with approximately 100 ms time-to-first-audio, making it suitable for real-time conversational applications.
▪ Voxtral TTS uses open weights under the CC BY-NC 4.0 license for non-commercial use, while commercial usage requires a licensing agreement or use of Mistral's API.
▪ In human evaluations, Voxtral TTS outperformed ElevenLabs Flash v2.5 in most supported languages, with a 68.4% win rate overall.
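To put the latency figures in perspective, here is a small back-of-the-envelope calculator. It assumes that "real-time factor of 9.7x" means the model produces about 9.7 seconds of audio per second of compute, which is one common reading but is not spelled out in the summary. The class and method names are hypothetical, and the estimate ignores startup overhead, network jitter, and hardware differences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsLatencyProfile:
    # Assumed meaning: seconds of audio generated per second of wall-clock time.
    speedup: float
    # Time until the first audio chunk is available to the listener.
    ttfa_seconds: float

    def synthesis_seconds(self, audio_seconds: float) -> float:
        """Rough wall-clock time to synthesize the whole clip (steady state only)."""
        return audio_seconds / self.speedup

    def can_stream_without_stalls(self) -> bool:
        """If audio is produced faster than it plays back, a streaming player
        should not run out of audio once playback has started."""
        return self.speedup > 1.0


if __name__ == "__main__":
    profile = TtsLatencyProfile(speedup=9.7, ttfa_seconds=0.100)
    print(f"{profile.synthesis_seconds(30.0):.2f}")  # 3.09
    print(profile.can_stream_without_stalls())       # True
```

Under that reading, a 30-second spoken reply would finish synthesizing in roughly 3.1 seconds, while a streaming client could start playback after about 100 ms, which is what makes the model plausible for conversational use.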

Original article

KDnuggets · https://www.facebook.com/kdnuggets


Opening excerpt (first ~120 words)

Introduction

Voice-enabled applications are everywhere, from virtual assistants to customer service chatbots. But for developers, building natural-sounding speech into apps has often meant relying on expensive cloud APIs or dealing with robotic, unnatural voices. Mistral AI aims to change that with Voxtral TTS. It is a powerful, open-weight text-to-speech (TTS) model that you can run on your own hardware. Released on March 26, 2026, this 4-billion-parameter model generates human-like speech in nine languages and adapts to a new voice from as little as three seconds of reference audio.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at KDnuggets.
