Qwen3-TTS

Voice Design, Clone, and Generation

What is Qwen3-TTS?

Qwen3-TTS is an open-source text-to-speech model that turns written text into natural, human-like audio. It can clone a speaker's voice from just a few seconds of reference audio, control emotion and speaking style through prompts, and synthesize long-form or streaming audio with very low latency. The project supports more than ten languages, handles code-switching, and is released under the Apache 2.0 license, so it can be deployed on edge devices or cloud servers.

Key features

Clone a voice from a three-second reference clip instantly.
Compress speech with a high-efficiency 12Hz tokenizer for speed.
Automatically adjust prosody and emotion based on the context of the text.
Support over ten languages and handle code-switching smoothly.
Stream generated audio with an ultra-low first-token latency of around 97 ms.
Deploy on edge devices or cloud servers under the open-source Apache 2.0 license.
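
To give a rough sense of what a 12Hz tokenizer implies, the sketch below works through the arithmetic for the figures quoted in the list above. It assumes the tokenizer emits exactly 12 frames per second of audio; the listing does not say how many codes each frame carries, so the counts are frames, not total tokens. The helper names `frames_for` and `frame_duration_ms` are hypothetical and are not part of any Qwen3-TTS API.

```python
import math

# Nominal figure taken from the feature list; treated here as exact (an assumption).
FRAME_RATE_HZ = 12  # tokenizer frame rate ("12Hz")


def frames_for(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * frame_rate_hz)


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Length of audio, in milliseconds, represented by a single frame."""
    return 1000.0 / frame_rate_hz


if __name__ == "__main__":
    print(frames_for(3))                    # three-second reference clip -> 36
    print(frames_for(60))                   # one minute of speech -> 720
    print(round(frame_duration_ms(), 1))    # audio covered per frame -> 83.3
```

Under these assumptions, the three-second reference clip used for cloning spans only 36 frames, and each frame covers about 83 ms of audio, which is the same order of magnitude as the quoted 97 ms first-token latency.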