TTS Streaming Test
Model
kokoro
pocket
chatterbox
qwen3
elevenlabs
Voice
(engine default)
Seed
Text
Hello! This is a streaming test. Let's see how fast the first audio chunk arrives.
Supported Chatterbox tags (click to insert at cursor):
[clear throat]
[sigh]
[shush]
[cough]
[groan]
[sniff]
[gasp]
[chuckle]
[laugh]
Instructions (Qwen3 - free-text style hint)
Format
PCM (streaming)
WAV (buffered)
Stream Mode
v1 (crossfade)
v2 (fade-in)
Speak
Stop
Share
Auto-run Whisper check
Time to First Audio (ms): -
Total Time (ms): -
Chunks Received: -
Total Bytes: -
Audio Duration (s): -
RTF (lower=faster): -
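The readouts above are related by simple arithmetic. The sketch below shows one plausible way to derive Audio Duration from Total Bytes and RTF from Total Time. It is not taken from the page's own code: the function names are hypothetical, and the PCM layout (24 kHz, mono, 16-bit) is an assumption that should be replaced with whatever the selected engine actually emits. RTF is taken here in its usual sense of synthesis time divided by audio duration, which matches the "lower=faster" hint.

```python
# Hypothetical helpers; the sample rate, channel count, and sample width
# are assumptions, not values confirmed by the test page.

def pcm_duration_s(total_bytes: int, sample_rate: int = 24000,
                   channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Duration of raw PCM audio, in seconds, from its byte count."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return total_bytes / bytes_per_second


def real_time_factor(total_time_ms: float, audio_duration_s: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return (total_time_ms / 1000.0) / audio_duration_s


duration = pcm_duration_s(96000)          # 96000 bytes of 24 kHz mono 16-bit PCM
rtf = real_time_factor(500.0, duration)   # 500 ms spent producing that audio
```

Under these assumptions, 500 ms of synthesis for 2 s of audio gives an RTF of 0.25, meaning audio is produced four times faster than it plays back.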
History
# | Model | Mode | Text | TTFA | Total | Chunks | Duration | RTF
Log
Quality Checks
-
Mid-speech Gaps: -
Longest Gap (ms): -
Avg Level (dBFS): -
Peak (dBFS): -
HF Energy %: -
Clipped Samples: -
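As an illustration of how the level readouts could be computed, here is a minimal sketch for signed 16-bit PCM samples. It is an assumption about the method, not the page's actual implementation: the name `level_stats` is hypothetical, the average level is taken as RMS relative to full scale, and a sample counts as clipped when it sits at either end of the 16-bit range.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def to_dbfs(ratio: float) -> float:
    """Convert a linear amplitude ratio (1.0 = full scale) to dBFS."""
    return float("-inf") if ratio <= 0 else 20.0 * math.log10(ratio)


def level_stats(samples: list[int]) -> dict:
    """Average (RMS) level, peak level, and clipped-sample count for int16 audio."""
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples) / FULL_SCALE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    # Assumption: a sample pinned at the int16 limits is treated as clipped.
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    return {"avg_dbfs": to_dbfs(rms), "peak_dbfs": to_dbfs(peak), "clipped": clipped}


stats = level_stats([16384, -16384, 16384, -16384])
```

A half-scale square wave like the one above comes out at roughly -6 dBFS for both average and peak, with no clipped samples.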
Spectrogram (0-12 kHz, dark=quiet, bright=loud)
Whisper STT Round-trip
-
Run Whisper Check
WER %: -
Word Errors: -
Ref Words: -
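The three round-trip readouts fit the standard word error rate definition: word errors (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is one way to compute them, assuming the reference is the input text and the hypothesis is the Whisper transcript. The function names and the normalization (lowercasing, dropping punctuation) are assumptions; the page may normalize differently.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, keeping apostrophes (assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer_report(reference: str, hypothesis: str) -> dict:
    """Return WER %, word error count, and reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text has no words")
    errors = word_errors(ref, hyp)
    return {"wer_pct": 100.0 * errors / len(ref), "errors": errors, "ref_words": len(ref)}


report = wer_report("Hello! This is a streaming test.",
                    "hello this is a screaming test")
```

Here one of six reference words is substituted, so the report shows 1 word error against 6 reference words.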