ChatTTS Me Frequently Asked Questions

FAQ from ChatTTS Me

What is ChatTTS Me?

ChatTTS Me is a text-to-speech (TTS) platform that brings text to life and puts you in control of the voice. It turns text into dynamic, natural-sounding speech, which makes it well suited to chatbots and virtual assistants. Its conversational TTS model is optimized for expressive dialogue and offers fine-grained prosodic control.

How do I use ChatTTS Me?

Using ChatTTS Me is easy: enter your text, optionally refine it for better results, adjust the audio temperature, top_P, and top_K settings if needed, and click Generate to get your natural-sounding speech audio.
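Temperature, top_P, and top_K are common sampling controls for models that generate output one token at a time. The sketch below shows, in plain Python, how such settings typically shape the choice of the next token. It is a generic illustration under that assumption, not ChatTTS Me's actual code; the function name `sample_token` and its default values are hypothetical.

```python
import math
import random


def sample_token(logits, temperature=0.3, top_k=20, top_p=0.7, rng=None):
    """Pick one token index from raw logits (hypothetical illustration).

    Applies temperature scaling, then top-k filtering, then top-p
    (nucleus) filtering, and samples from the surviving tokens.
    """
    rng = rng or random.Random()
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature: dividing logits by a small value sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the max for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix of the top-k tokens whose
    # renormalized cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperature or top_p values push the choice toward the single most likely token, making output more repeatable; higher values keep more candidates in play and typically give more varied results.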

How does ChatTTS Me excel in prosody?

ChatTTS Me is optimized for dialogue scenarios, enabling natural, expressive speech with support for multiple speakers. It allows for fine-grained control over prosodic features like laughter, pauses, and interjections, delivering a lifelike experience.

What are the GPU memory requirements for ChatTTS Me?

Generating a 30-second audio clip with ChatTTS Me requires at least 4 GB of GPU memory. On an NVIDIA RTX 4090, ChatTTS Me generates audio corresponding to about 7 semantic tokens per second, with a Real-Time Factor (RTF) of around 0.3.
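Assuming the usual definition, RTF is processing time divided by the duration of the audio produced, so values below 1 mean faster than real time. The small helpers below (hypothetical names) make the arithmetic concrete: at an RTF of about 0.3, a 30-second clip takes roughly 9 seconds to generate.

```python
def generation_time_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate wall-clock synthesis time: time = audio duration * RTF."""
    return audio_seconds * rtf


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Compute RTF from a measured run: RTF = processing time / audio duration."""
    return processing_seconds / audio_seconds


# A 30-second clip at the quoted RTF of about 0.3 (roughly 9 seconds).
estimate = generation_time_seconds(30.0, 0.3)
print(f"Estimated generation time: {estimate:.1f} s")
```

Actual timings will vary with hardware, text length, and settings; the RTF only gives a rough estimate.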

Can we control elements other than laughter in ChatTTS Me?

Currently, the only token-level control units in ChatTTS Me are [laugh], [uv_break], and [lbreak]. However, future versions may include additional emotional control capabilities.
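Because only these three control units are supported, it can help to check input text for other bracketed tokens before generating. The sketch below is a hypothetical helper based solely on the three tokens named in this answer; `unsupported_tokens` is an illustrative name, not part of any ChatTTS Me API.

```python
import re

# The token-level control units listed in this FAQ. Any other
# bracketed token is reported as unsupported by this hypothetical checker.
SUPPORTED_CONTROL_TOKENS = {"[laugh]", "[uv_break]", "[lbreak]"}

# Matches a single bracketed token such as "[laugh]" (no nested brackets).
_BRACKET_TOKEN = re.compile(r"\[[^\[\]]+\]")


def unsupported_tokens(text: str) -> list[str]:
    """Return bracketed tokens in `text` that are not supported control units."""
    return [tok for tok in _BRACKET_TOKEN.findall(text) if tok not in SUPPORTED_CONTROL_TOKENS]


script = "That's hilarious [laugh] okay [uv_break] let me think [sigh] about it [lbreak]"
print(unsupported_tokens(script))  # ['[sigh]']
```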
