GPT-4o mini TTS is OpenAI's answer to the expressiveness gap in TTS-1. It produces more natural, emotive speech and supports prompt-driven voice direction. For single generations, the quality can be impressive.
The problem shows up in production batches. The voice character is not stable between generations: file 1 can sound like a completely different person from file 50. Pacing varies unpredictably, tone shifts mid-batch, and quality glitches appear more often than with TTS-1. Longer generations are particularly unstable, with random pauses, volume changes, and repeated sentences.
TTSAudit's speaker consistency check is designed for exactly this. It compares voice identity across every file in your batch and flags the ones where the voice character drifted. Together with our quality checks for glitches and silence gaps, it gives you a complete picture of which files to regenerate.
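To make the idea of "comparing voice identity across a batch" concrete, here is a minimal sketch of one common approach. It is not TTSAudit's implementation. It assumes each file has already been turned into a fixed-length speaker embedding by some speaker-verification model (the embeddings below are hypothetical toy vectors). It builds a batch baseline as the normalized mean embedding and flags files whose cosine similarity to that baseline falls below an illustrative, untuned threshold.

```python
import math
from typing import Dict, List


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in vec]


def flag_voice_drift(
    embeddings: Dict[str, List[float]], threshold: float = 0.75
) -> Dict[str, float]:
    """Return {filename: similarity} for files whose voice drifted from the batch.

    `embeddings` maps each filename to a speaker embedding from some
    speaker-verification model (hypothetical input, not a TTSAudit API).
    `threshold` is an illustrative cutoff, not a tuned value.
    """
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    dim = len(next(iter(unit.values())))
    # Batch baseline: mean of all unit embeddings, renormalized to unit length.
    centroid = normalize(
        [sum(v[i] for v in unit.values()) / len(unit) for i in range(dim)]
    )
    flagged = {}
    for name, vec in unit.items():
        # Cosine similarity to the baseline (both vectors are unit length).
        similarity = sum(a * b for a, b in zip(vec, centroid))
        if similarity < threshold:
            flagged[name] = round(similarity, 3)
    return flagged


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real speaker embeddings have far more dimensions.
    batch = {
        "file_01.wav": [0.9, 0.1, 0.0],
        "file_02.wav": [0.88, 0.12, 0.02],
        "file_50.wav": [0.1, 0.2, 0.95],
    }
    print(flag_voice_drift(batch))
```

In this toy batch only file_50.wav falls below the cutoff. One design caveat: a drifted file also pulls the mean baseline toward itself, so with small batches a median or leave-one-out baseline is usually more robust.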
What developers are saying
"For audios longer than 1.5-2mins, gpt-4o-mini-tts works really unstable: random pauses from few seconds to more than a minute, random volume and tone changes, repeating last few sentences in random order."
u/tcherkashin94 on OpenAI Forum
"Total requested audio was 4:31, but from 1:21-2:26 and 3:02-3:36 there was only silence. Also huge volume level changes and style shifts. In short: unusable crap."
u/janne.kauttonen on OpenAI Forum
"The voice sounds completely different between two generations with the same prompt and settings. It's like talking to a different person each time."
OpenAI Developer Forum
"I switched from tts-1 to gpt-4o-mini-tts for better expressiveness but now I can't get consistent output across a batch. Some files are great, others are unusable."
OpenAI Developer Forum
How TTSAudit solves this
Voice Identity Check
Detect when GPT-4o mini TTS produces a different-sounding voice between generations. Flag files where the speaker identity shifted.
Consistency Scoring
Every file is scored against the batch baseline for voice character, pacing, and tone, so you can see exactly where consistency breaks.
Silence & Glitch Detection
Catch random pauses, silence gaps, volume spikes, and repeated sentences that GPT-4o mini TTS produces on longer content.
Targeted Regeneration
Know exactly which files sound like a different person. Regenerate only those and keep the rest of your batch.
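The silence-gap part of the Silence & Glitch Detection check above can be approximated with a simple frame-energy scan. The sketch below is an illustration under stated assumptions, not TTSAudit's actual detector. It assumes mono float samples in [-1.0, 1.0] and uses hypothetical defaults (20 ms frames, a -50 dBFS silence floor, gaps of at least 2 seconds). It covers only silence gaps; volume spikes and repeated sentences need separate checks.

```python
import math
from typing import List, Optional, Tuple


def find_silence_gaps(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    silence_dbfs: float = -50.0,
    min_gap_s: float = 2.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of silent stretches >= min_gap_s.

    Assumes mono float samples in [-1.0, 1.0]. All thresholds are
    illustrative defaults, not TTSAudit's actual settings.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    gaps: List[Tuple[float, float]] = []
    gap_start: Optional[float] = None
    n_frames = math.ceil(len(samples) / frame_len)
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Convert frame RMS to dBFS; pure digital silence becomes -inf.
        level = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i * frame_len / sample_rate
        if level < silence_dbfs:
            if gap_start is None:
                gap_start = t  # a quiet stretch begins
        elif gap_start is not None:
            # Sound resumed: keep the stretch only if it was long enough.
            if t - gap_start >= min_gap_s:
                gaps.append((gap_start, t))
            gap_start = None
    # Close a quiet stretch that runs to the end of the file.
    if gap_start is not None:
        end = len(samples) / sample_rate
        if end - gap_start >= min_gap_s:
            gaps.append((gap_start, end))
    return gaps


if __name__ == "__main__":
    sr = 1000  # low toy sample rate to keep the example small
    tone = [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(sr)]
    audio = tone + [0.0] * (3 * sr) + tone  # 1 s tone, 3 s silence, 1 s tone
    print(find_silence_gaps(audio, sr))
```

On the synthetic clip this reports one gap from 1.0 s to 4.0 s. A file with any reported gap would be a candidate for targeted regeneration.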