AVTR-1: A free, open-source, open-weights real-time avatar model
AVTR-1 is a new open-source model for real-time avatar dialogue that renders lip-synced speech at 25 frames per second on a single GPU. The release includes model weights, inference code, and an interactive streaming demo; a technical report and a production-ready back-end are listed as coming soon. The model is meant to be used either through an API or fully self-hosted.
- AVTR-1 is a flow-matching-based autoregressive model for live dialogue.
- Given a portrait image and dual-stream audio, it renders lip-synced speech and active listening.
- The model is built for production deployment and is available as an API or fully self-hosted.
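To make the real-time claim concrete, the 25 fps figure can be turned into a per-frame time budget. The sketch below is illustrative arithmetic only and is not taken from the AVTR-1 code: the function names are hypothetical, and the 16 kHz audio sample rate is an assumption used for the example, since the source does not state one.

```python
from fractions import Fraction

FPS = 25  # frame rate stated in the source


def frame_budget_ms(fps: int) -> float:
    """Wall-clock time available to produce one frame at the given rate."""
    return 1000.0 / fps


def audio_samples_per_frame(sample_rate_hz: int, fps: int) -> Fraction:
    """Number of audio samples spanned by one video frame (exact rational)."""
    return Fraction(sample_rate_hz, fps)


def keeps_up(measured_ms_per_frame: float, fps: int = FPS) -> bool:
    """True if a measured per-frame latency fits inside the real-time budget."""
    return measured_ms_per_frame <= frame_budget_ms(fps)


if __name__ == "__main__":
    print(frame_budget_ms(FPS))                  # 40.0
    # 16 kHz is an assumed sample rate, not a value from the source.
    print(audio_samples_per_frame(16_000, FPS))  # 640
    print(keeps_up(35.0), keeps_up(45.0))        # True False
```

Under these assumptions, the full pipeline (audio feature extraction, generation, decoding) would need to average no more than 40 ms per frame to sustain 25 fps without falling behind.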
Opening excerpt (first ~120 words)
AVTR-1

AVTR-1 is a flow-matching-based autoregressive model for live dialogue. Given a portrait image and dual-stream audio, it renders lip-synced speech and active listening at 25 fps on a single GPU. Built for production deployment: model weights, TensorRT-accelerated inference, and the live-session backend, available as an API or fully self-hosted.

[Video: trailer_720p_small.mp4]

📑 What's included
- Model weights
- Inference code
- Interactive streaming demo
- Technical report (Coming soon)
- Production-ready back-end (Coming soon)

Table of Contents
- Quick Start
- Performance
- Troubleshooting

1.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.