
Microsoft VibeVoice: Open-Source Frontier Voice AI

⚡ TL;DR · AI summary

Microsoft has introduced VibeVoice, an open-source voice AI framework that includes both speech recognition (ASR) and text-to-speech (TTS) models. VibeVoice-ASR handles long-form audio of up to 60 minutes in a single pass and produces structured transcriptions that record who spoke (speaker), when (timestamps), and what was said (content), with support for over 50 languages. VibeVoice-TTS supports multi-speaker dialogues. Both models are intended to support collaboration in the speech synthesis community. According to the project's news log, VibeVoice-ASR is the model now available through the Hugging Face Transformers library.

Original article
GitHub
Opening excerpt (first ~120 words)

🎙️ VibeVoice: Open-Source Frontier Voice AI

📰 News

2026-03-06: 🚀 VibeVoice ASR is now part of a Transformers release! You can now use our speech recognition model directly through the Hugging Face Transformers library for seamless integration into your projects.

2026-01-21: 📣 We open-sourced VibeVoice-ASR, a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass, generating structured transcriptions containing Who (Speaker), When (Timestamps), and What (Content), with support for User-Customized Context. Try it in Playground.

⭐️ VibeVoice-ASR is natively multilingual, supporting over 50 languages; check the supported languages for details.

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
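
The excerpt describes transcriptions structured as Who (Speaker), When (Timestamps), and What (Content). As a rough illustration of what working with such output could look like, here is a minimal Python sketch. The `Segment` class, its field names, and the helper functions are hypothetical and chosen only for this example; they are not VibeVoice-ASR's actual output schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One transcription unit: who spoke, when, and what was said.

    Hypothetical schema for illustration only; the real model's
    output format may differ.
    """
    speaker: str   # Who
    start: float   # When: start time in seconds
    end: float     # When: end time in seconds
    text: str      # What


def format_timestamp(seconds: float) -> str:
    """Render a time offset as MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments):
    """Return one readable line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [
        f"[{format_timestamp(s.start)}-{format_timestamp(s.end)}] "
        f"{s.speaker}: {s.text}"
        for s in ordered
    ]


def speaking_time(segments):
    """Sum the duration in seconds of all segments for each speaker."""
    totals = defaultdict(float)
    for s in segments:
        totals[s.speaker] += s.end - s.start
    return dict(totals)
```

With segments in this shape, `render(...)` gives a time-ordered transcript and `speaking_time(...)` gives per-speaker totals, which is the kind of downstream use a speaker-and-timestamp-aware transcription makes possible.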
