What is Qwen3-TTS?
Qwen3-TTS is an open-source text-to-speech model that turns written text into natural, human-like audio. It can clone a speaker from just a few seconds of reference audio, control emotion and speaking style via prompts, and synthesize long-form or streaming audio with a first-token latency of around 97 ms. The project supports more than ten languages, handles code-switching, and is released under the Apache 2.0 license, so it can be deployed on edge devices or cloud servers.
Key features
- Clone a voice from a three-second reference clip instantly.
- Compress speech with a high-efficiency 12 Hz tokenizer to speed up generation.
- Adjust prosody and emotion based on text context automatically.
- Support over ten languages and handle code-switching smoothly.
- Stream generated audio with ultra-low first-token latency of around 97 ms.
- Deploy on edge devices or cloud servers under the open-source Apache 2.0 license.
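To put the 12 Hz tokenizer figure in concrete terms, here is a minimal arithmetic sketch in Python. It assumes that "12 Hz" means 12 token frames per second of audio; the listing does not say how many tokens each frame contains, and the helper names `frames_for` and `frame_duration_ms` are hypothetical, not part of any Qwen3-TTS API.

```python
import math

# Assumption: "12Hz tokenizer" means 12 token frames per second of audio.
FRAME_RATE_HZ = 12


def frames_for(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of token frames needed to cover `seconds` of audio."""
    return math.ceil(seconds * frame_rate_hz)


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Length of audio represented by a single frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


print(frames_for(3))                  # three-second reference clip -> 36 frames
print(frames_for(60))                 # one minute of speech -> 720 frames
print(round(frame_duration_ms(), 1))  # 83.3 ms of audio per frame
```

Under that assumption, a three-second reference clip is only 36 frames long, which suggests why a low frame rate keeps sequences short for both cloning and long-form synthesis.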




