Blog
Insights on TTS quality assurance, voice AI, and audio production best practices.
Play.ht Quality Issues and Migration Guide After Shutdown
Play.ht shut down in December 2025. What the documented quality issues taught us, where users are migrating, and how to audit your new Text-to-Speech provider.
Why GPT-4o Mini TTS Sounds Different Every Time
GPT-4o mini Text-to-Speech is more expressive than TTS-1, but produces a different-sounding voice on every generation. Catch the drift automatically.
Best Text-to-Speech Tools to Use in 2026
Picking a Text-to-Speech model is the easy part. The QA, cleanup, editing, captioning, and lipsync tools around it make your pipeline production-ready.
Common Text-to-Speech Audio Artifacts: A Reference Guide
Every type of Text-to-Speech audio failure - clicks, metallic tones, hallucinated words, truncation, drift - with descriptions, causes, and detection guidance.
Gemini 3.1 Flash TTS Sounds Amazing - For About 60 Seconds
Google's Gemini 3.1 Flash TTS is expressive and impressive on short clips, but quality collapses on anything over one minute. We tested it - 90% of long-form generations degraded.
ElevenLabs v3 vs Turbo v2.5: Quality, Pricing, and Known Issues
An honest comparison of ElevenLabs v3 and Turbo v2.5 - quality differences, credit costs, known failure modes, and which model fits your use case.
How to Add QA to a Text-to-Speech Pipeline (For Vibe Coders)
A practical guide to adding automated quality assurance to a Text-to-Speech pipeline - REST, x402 micropayments, code examples, and thresholds that actually work.
Best Text-to-Speech QA Tools in 2026
Every option for checking Text-to-Speech output quality in 2026 - manual, academic toolkits, audio enhancement suites, and purpose-built batch QA compared.
Text-to-Speech Quality Metrics: MOS, WER, MCD, PESQ Explained
What MOS, WER, MCD, PESQ, STOI, and UTMOS actually measure, where each one fails, and which to use in production Text-to-Speech pipelines.
Why Your Text-to-Speech Voice Changes Between Files
Generate twenty Text-to-Speech files for one project and one of them sounds different. This is voice drift. Here's why it happens and how to catch it.
Why Text-to-Speech Voices Switch Accent
Some Text-to-Speech models shift accent based on what the text is about. An American voice narrating a Scottish landmark comes back Scottish. Here's why.
AI Audiobook QA Checklist: Ship Without ACX Rejections
A complete quality assurance checklist for AI-narrated audiobooks - ACX technical requirements, TTS-specific failure modes, and automated detection.
Gemini 2.5 Pro TTS Inconsistent Accent and Pacing
Gemini 2.5 Pro Text-to-Speech is expressive and great value, but roughly 1 in 10 generations shifts accent or pacing. Catch them automatically.
Azure Speech Text-to-Speech Quality: Batch Synthesis Pitfalls
Azure Speech is built for enterprise but has documented voice consistency, batch synthesis, and update-regression issues. How to catch them automatically.
How to Fix ElevenLabs v3 Quality Issues
ElevenLabs v3 produces expressive audio, but frequent platform updates cause inconsistent output across batches. Catch drift, glitches, and regressions.
ElevenLabs Turbo v2.5 Slurring Words: How to Fix
ElevenLabs Turbo v2.5 is cheap and more stable than v3, but slurs words and produces audio glitches at scale. Catch them automatically.
Why Amazon Polly Sounds Different Between Files
Amazon Polly sometimes silently drops from Neural to Standard voice mid-synthesis. Here's how to spot the drop and fix it automatically.
Is OpenAI TTS-1 Reliable for Production?
OpenAI TTS-1 is one of the most stable Text-to-Speech models, but it still drops silence gaps and skips content at scale. Catch them automatically.