Speaker Consistency

We build a voice fingerprint for every file in your batch, compare them all against each other, and flag any file whose voice drifts away from the rest.

A similarity matrix for every batch

Every file is compared against every other file in the batch. The result is an n×n heatmap where the outliers stand out visually - and numerically.
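
A minimal sketch of how an n×n similarity matrix can be built, where n is the number of files in the batch. It assumes that each file's voice fingerprint is a fixed-length, non-zero embedding vector and that cosine similarity is the comparison measure. Both are assumptions, since this page does not name the embedding or the metric, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_matrix(fingerprints):
    """Build the symmetric n x n matrix of pairwise similarities.

    `fingerprints` is a hypothetical list of n embedding vectors, one per file.
    """
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]  # each file matches itself perfectly
    for i in range(n):
        for j in range(i + 1, n):  # compute each unordered pair once
            s = cosine_similarity(fingerprints[i], fingerprints[j])
            # Mirror the value: a bad file therefore lowers a whole row AND a whole column.
            matrix[i][j] = matrix[j][i] = s
    return matrix

# Toy batch: the first two fingerprints point the same way, the third does not.
fingerprints = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in similarity_matrix(fingerprints):
    print([round(v, 2) for v in row])
```

Because the matrix is symmetric, the odd file out appears twice: once as a low-similarity row and once as a low-similarity column. On a heatmap, that shows up as a cross cutting through the grid.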

Similarity heatmap (example batch of 6 files; color scale runs from "different voice" to "same voice")

What you see

Blue cells mean two files sound alike. A bad file shows up as a red row and column cutting across the matrix - instantly visible even in a 500-file batch.

How outliers are picked

Each file gets a similarity score against the rest of the batch. Files that drift too far from the group are flagged for regeneration. You choose how strict the threshold is per request - tight for audiobooks, looser for varied content.
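
The following is one way to turn a similarity matrix like the one described above into per-file scores and flags. It is a sketch under stated assumptions. The deviation score here is 1 minus a file's median similarity to every other file, and `threshold` is a hypothetical per-request parameter. The page does not specify the actual scoring formula or threshold values.

```python
from statistics import median

def deviation_scores(matrix):
    """Deviation per file: 1 minus its median similarity to every other file.

    The median (rather than the mean) is an illustrative choice: one bad file
    then cannot drag every other file's score toward the threshold.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        others = [matrix[i][j] for j in range(n) if j != i]  # skip the diagonal
        scores.append(1.0 - median(others))
    return scores

def flag_outliers(matrix, threshold):
    """True for each file whose deviation exceeds `threshold`.

    A lower threshold is stricter (more files flagged); a higher one is looser.
    """
    return [score > threshold for score in deviation_scores(matrix)]

# Four files; the last one has drifted away from the other three.
matrix = [
    [1.00, 0.95, 0.93, 0.40],
    [0.95, 1.00, 0.94, 0.42],
    [0.93, 0.94, 1.00, 0.38],
    [0.40, 0.42, 0.38, 1.00],
]
print(flag_outliers(matrix, threshold=0.2))  # [False, False, False, True]
```

Tightening the threshold, as you might for an audiobook, flags files with smaller drifts. Loosening it tolerates more natural variation between files.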

Built to catch what humans miss

Speaker drift is gradual and subtle. A voice fingerprint is tuned to identity, not content - so it picks up shifts that slip past anyone doing a quick spot-check.

Identity, not content

The fingerprint captures how a speaker sounds, not what they say. Two different scripts still get compared fairly.

Every file against every file

We compare every file in your batch against every other file. Gradual drift across 500 chapters is unmissable when you see the whole picture at once.
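
For a sense of scale: comparing every file against every other file means scoring each unordered pair once. A batch of $n$ files therefore involves

$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{500}{2} = \frac{500 \times 499}{2} = 124{,}750$$

pairs, far more than a manual spot-check could ever cover.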

Three accuracy tiers

Standard analyses the first 30 seconds of each file and is the fastest and cheapest tier. High analyses the first 60 seconds. Highest analyses the full file. Pick it when voices are very similar, or when drift only appears late in long files.
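
To illustrate what each tier "sees", here is a minimal sketch that maps the three tiers to analysis windows. The 30 s, 60 s and full-file windows come from the description above. The tier keys, the function name and the raw-sample representation are hypothetical, not a documented API.

```python
# Analysis window per accuracy tier, in seconds; None means the full file.
# The tier keys are illustrative labels, not a documented request parameter.
TIER_WINDOW_SECONDS = {"standard": 30, "high": 60, "highest": None}

def analysis_window(samples, sample_rate, tier):
    """Return the slice of `samples` that a given tier would analyse."""
    seconds = TIER_WINDOW_SECONDS[tier]
    if seconds is None:
        return samples
    # Files shorter than the window are simply used whole.
    return samples[: seconds * sample_rate]

# A 90-second clip at a toy rate of 10 samples per second.
clip = list(range(90 * 10))
for tier in ("standard", "high", "highest"):
    print(tier, len(analysis_window(clip, 10, tier)) / 10, "seconds")
```

With this windowing, drift that begins ten minutes into a chapter never reaches the Standard or High window. That is why the full-file tier is the one to pick for late drift.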

What you get back

Full similarity matrix

The pairwise comparison grid for every file in your batch - inspect it, plot it, or hand it to your team.

Deviation score

How far each file drifts from the batch norm, so you can rank the worst offenders.

Outlier flag

A boolean per file: does this voice match the rest of the batch, or does it need regenerating?
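
As a rough illustration of how the per-file outputs might be consumed, the sketch below ranks files by deviation and collects the ones to regenerate. The `FileResult` record and its field names are hypothetical, since the actual response schema is not shown here.

```python
from dataclasses import dataclass

@dataclass
class FileResult:
    # Hypothetical field names; the real response schema is not documented on this page.
    file_id: str
    deviation: float   # how far this file drifts from the batch norm
    is_outlier: bool   # True means the file should be regenerated

def worst_offenders(results, top=3):
    """Rank files by deviation score, worst first."""
    return sorted(results, key=lambda r: r.deviation, reverse=True)[:top]

def to_regenerate(results):
    """IDs of every file flagged as an outlier."""
    return [r.file_id for r in results if r.is_outlier]

results = [
    FileResult("ch01", 0.05, False),
    FileResult("ch02", 0.41, True),
    FileResult("ch03", 0.12, False),
]
print([r.file_id for r in worst_offenders(results, top=2)])  # ['ch02', 'ch03']
print(to_regenerate(results))  # ['ch02']
```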

1 credit per file

Real reports from real batches

"When using Amazon Polly with Neural voices, Amazon Polly will switch to the standard voice. The 1st and 3rd sentences are read using the Neural voice, while the second sentence is read using the Standard voice."

- AWS re:Post

"I was using Azure Speech Studio and now her voice is completely different. The speech style is very different and the audio is less clear."

- Microsoft Q&A

"In both US West and US East regions I get the wrong voice, but in the West Europe region I get the correct voice."

- Microsoft Q&A

"I tried all of the other voices just in case it got switched due to some kind of glitch but none of them are the voice I was working with before."

- Microsoft Q&A

Stop shipping inconsistent audio

100 free credits on signup. No credit card required.