Speaker Consistency
We build a voice fingerprint for every file in your batch, compare them all against each other, and flag any file whose voice drifts away from the rest.
A similarity matrix for every batch
Every file is compared against every other file in the batch. The result is an n×n heatmap where the outliers stand out visually - and numerically.
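To make the n×n comparison concrete, here is a minimal sketch. It assumes each fingerprint is a plain list of floats and that similarity is measured with cosine similarity; the page does not name the metric, so treat both as illustrative assumptions. The names `cosine_similarity` and `similarity_matrix` are hypothetical helpers, not part of any published API.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(fingerprints):
    # Compare every fingerprint against every other one (and itself).
    n = len(fingerprints)
    return [
        [cosine_similarity(fingerprints[i], fingerprints[j]) for j in range(n)]
        for i in range(n)
    ]

# Toy 2-dimensional fingerprints (assumed data): files 0 and 1 are close, file 2 is not.
fingerprints = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
matrix = similarity_matrix(fingerprints)
```

In this toy batch, the row and column for the third file hold much lower values than the rest, which is the pattern that shows up as a red stripe in the heatmap.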
Similarity heatmap (batch of 6)
What you see
Blue cells mean two files sound alike. A bad file shows up as a red row and column cutting across the matrix - instantly visible even in a 500-file batch.
How outliers are picked
Each file gets a similarity score against the rest of the batch. Files that drift too far from the group are flagged for regeneration. You choose how strict the threshold is per request - tight for audiobooks, looser for varied content.
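One way to picture the thresholding step is the sketch below. It assumes the per-file score is the median similarity to every other file in the batch and that a file is flagged when that score falls below the chosen threshold. The scoring rule, the threshold values, and the function names are all assumptions for illustration; the page does not specify them.

```python
from statistics import median

def score_against_rest(matrix):
    # For each file, take the median similarity to all other files.
    # The median keeps one bad file from dragging down everyone's score.
    n = len(matrix)
    return [median(matrix[i][j] for j in range(n) if j != i) for i in range(n)]

def flag_outliers(matrix, threshold):
    # A file is flagged when its score falls below the threshold.
    return [score < threshold for score in score_against_rest(matrix)]

# Assumed similarity values for a batch of four files; file 3 has drifted.
matrix = [
    [1.00, 0.95, 0.94, 0.40],
    [0.95, 1.00, 0.96, 0.42],
    [0.94, 0.96, 1.00, 0.38],
    [0.40, 0.42, 0.38, 1.00],
]

strict = flag_outliers(matrix, threshold=0.90)  # [False, False, False, True]
loose = flag_outliers(matrix, threshold=0.30)   # [False, False, False, False]
```

The same batch gives different flags under the two settings, which is the trade-off behind choosing a tight threshold for audiobooks and a looser one for varied content.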
Built to catch what humans miss
Speaker drift is gradual and subtle. A voice fingerprint is tuned to identity, not content - so it picks up shifts that slip past anyone doing a quick spot-check.
Identity, not content
The fingerprint captures how a speaker sounds, not what they say. Two different scripts still get compared fairly.
Every file against every other
We compare every file in your batch against every other file. Gradual drift across 500 chapters is unmissable when you see the whole picture at once.
Three accuracy tiers
Standard analyses the first 30 seconds of each file - fastest and cheapest. High analyses the first 60 seconds. Highest analyses the full file - pick it when voices are very similar or drift only appears late in long files.
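The three tiers can be read as three analysis windows. The sketch below encodes them as a lookup table; the tier identifiers (`"standard"`, `"high"`, `"highest"`) and the helper `analysed_seconds` are hypothetical names chosen for illustration, and the assumption that a short file is analysed in full is ours.

```python
# Analysis window per tier, in seconds. None means "the whole file".
TIER_WINDOW_SECONDS = {
    "standard": 30.0,
    "high": 60.0,
    "highest": None,
}

def analysed_seconds(tier, file_duration):
    # How much audio is analysed for a file of the given duration.
    window = TIER_WINDOW_SECONDS[tier]
    if window is None:
        return file_duration
    # Assumption: a file shorter than the window is analysed in full.
    return min(window, file_duration)

# A 10-minute chapter under each tier.
for tier in ("standard", "high", "highest"):
    print(tier, analysed_seconds(tier, 600.0))
```

For a 10-minute chapter, only the highest tier would cover drift that begins after the first minute.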
What you get back per file
Full similarity matrix
The pairwise comparison grid for every file in your batch - inspect it, plot it, or hand it to your team.
Deviation score
How far each file drifts from the batch norm, so you can rank the worst offenders.
Outlier flag
A boolean per file: does this voice match the rest of the batch, or does it need regenerating?
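As a rough illustration of how these outputs might be consumed, the sketch below models one record per file and ranks the worst offenders. The field names, the filenames, and the score values are hypothetical; the actual response format is not described on this page.

```python
from dataclasses import dataclass

@dataclass
class FileResult:
    filename: str           # hypothetical field names, for illustration only
    deviation_score: float  # higher means further from the batch norm
    is_outlier: bool        # True means the file should be regenerated

def worst_offenders(results):
    # Rank files from most to least deviant.
    return sorted(results, key=lambda r: r.deviation_score, reverse=True)

# Assumed example data for a three-file batch.
results = [
    FileResult("chapter_01.wav", 0.04, False),
    FileResult("chapter_02.wav", 0.61, True),
    FileResult("chapter_03.wav", 0.07, False),
]

ranked = worst_offenders(results)
to_regenerate = [r.filename for r in ranked if r.is_outlier]
```

Here `to_regenerate` holds only `chapter_02.wav`, the one file whose flag is set.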
Real reports from real batches
"When using Amazon Polly with Neural voices, Amazon Polly will switch to the standard voice. The 1st and 3rd sentences are read using the Neural voice, while the second sentence is read using the Standard voice."
"I was using Azure Speech Studio and now her voice is completely different. The speech style is very different and the audio is less clear."
"In both US West and US East regions I get the wrong voice, but in the West Europe region I get the correct voice."
"I tried all of the other voices just in case it got switched due to some kind of glitch but none of them are the voice I was working with before."
Stop shipping inconsistent audio
100 free credits on signup. No credit card required.