Script Accuracy

Detect repetitions, spoken tags, and script deviations in your TTS output. Scripts are optional, but providing them makes detection more accurate.

The problem

TTS engines don't always reproduce your script accurately. Tags get spoken aloud, words get skipped or added, and catching these issues manually is impractical at scale.

Spoken tags

Your script says [laughs] but the TTS literally says the word 'laughs' instead of producing a laugh sound. Listeners hear robotic stage directions instead of natural expression.

Missed lines

The TTS engine skips words, sentences, or entire paragraphs from your script. Without comparing the audio back to the original text, these omissions go unnoticed until a user complains.

Added words

Sometimes the engine hallucinates extra words or repeats phrases that weren't in the script. These insertions break the flow and can change the meaning of your content.

Hard to catch at scale

Manually comparing each audio file against its source script is tedious and error-prone. With hundreds of files in a batch, it's effectively impossible.

What people are saying

"We kept getting reports from users that the narrator was saying 'open bracket laughs close bracket' in the middle of tour stops. Took us weeks to find them all."

- Audio content producer

"I assumed the TTS would handle SSML-style tags gracefully, but it just read them out loud. Had to re-render 200+ files."

- Developer on Reddit

How we detect it

We transcribe each audio file and automatically detect repetitions and spoken tags. When you provide a script, we also compare the transcript against it for higher accuracy.

1

We transcribe the audio

Each audio file is transcribed using Whisper to produce an accurate text version of what was actually spoken.

2

We detect repetitions

We scan the transcript for repeated phrases — a common TTS failure where the engine loops over the same words. This works with or without a script.
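As a rough illustration of this kind of scan, here is a minimal sketch that looks for a phrase repeated back to back in a transcript. The function name `find_repetitions` and its parameters are hypothetical, and this is one possible approach, not a description of the product's actual algorithm.

```python
import re


def find_repetitions(transcript, min_words=2, max_words=8, min_repeats=2):
    """Return (phrase, count) pairs for phrases repeated back to back.

    Hypothetical helper: window sizes and thresholds are illustrative.
    """
    # Lowercase and drop punctuation so "tour." and "tour" compare equal.
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    found = []
    i = 0
    while i < len(words):
        match = None
        # Try the shortest phrase first so the basic repeating unit is reported.
        for n in range(min_words, max_words + 1):
            phrase = words[i:i + n]
            if len(phrase) < n:
                break  # not enough words left for this or any longer phrase
            count = 1
            while words[i + count * n:i + (count + 1) * n] == phrase:
                count += 1
            if count >= min_repeats:
                match = (" ".join(phrase), count)
                i += count * n  # skip past the whole repeated run
                break
        if match:
            found.append(match)
        else:
            i += 1
    return found


looped = "Welcome to the tour. Welcome to the tour. Welcome to the tour. Please follow me."
print(find_repetitions(looped))
```

Because it only needs the transcript, a check like this can run whether or not a script was supplied.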

3

We scan for spoken tags

Without a script, we detect likely spoken tags using heuristics and flag them as warnings. With a script, we extract tags such as [laughs], (sighs), and *music*, and definitively flag any that were spoken aloud.
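The script-based check can be sketched as follows. This is a simplified illustration under stated assumptions: tags are anything inside square brackets, parentheses, or asterisks, and a tag counts as spoken when its words appear in the transcript more often than in the script with tags removed. The names `TAG_PATTERN`, `extract_tags`, and `spoken_tags` are hypothetical.

```python
import re

# Assumption: tags look like [laughs], (sighs), or *music*.
# Ordinary parenthetical prose in a script would also match this pattern.
TAG_PATTERN = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)|\*([^*]+)\*")


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_tags(script):
    """Return (tags, script_without_tags)."""
    tags = []
    for m in TAG_PATTERN.finditer(script):
        inner = m.group(1) or m.group(2) or m.group(3)
        tags.append(inner.strip().lower())
    return tags, TAG_PATTERN.sub(" ", script)


def count_phrase(words, phrase):
    """Count how many times the word list `phrase` occurs in `words`."""
    n = len(phrase)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)


def spoken_tags(script, transcript):
    """Return the tags whose words were heard more often than the script allows."""
    tags, clean_script = extract_tags(script)
    script_words = tokenize(clean_script)
    heard_words = tokenize(transcript)
    flagged = []
    for tag in dict.fromkeys(tags):  # de-duplicate, keep first-seen order
        phrase = tokenize(tag)
        if phrase and count_phrase(heard_words, phrase) > count_phrase(script_words, phrase):
            flagged.append(tag)
    return flagged


script = "Well, that was unexpected. [laughs] Let's move on. (sighs)"
transcript = "Well that was unexpected laughs let's move on"
print(spoken_tags(script, transcript))
```

Comparing counts against the tag-free script avoids flagging a word like "laughs" when it legitimately appears in the narration itself.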

4

We measure script accuracy

When a script is provided, the clean text (the script with tags removed) is compared against the transcription using word error rate (WER). Files that deviate significantly from the rest of the batch are flagged as outliers.
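For reference, word error rate is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference text. The sketch below shows the standard calculation; the tokenization rule and the function names are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference text contains no words")
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the audio
            insertion = curr[j - 1] + 1  # extra word spoken in the audio
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


script_text = "the quick brown fox jumps"
transcribed = "the quick fox jumps over"
# One skipped word ("brown") and one added word ("over"): 2 edits over 5 words.
print(word_error_rate(script_text, transcribed))
```

Note that WER can exceed 1.0 when the audio contains many more words than the script.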

What you get

Repetition detection, spoken tag warnings, accuracy scores, and full transcripts for every file

Spoken tag detection

With a script, spoken tags trigger an instant fail. Without a script, likely spoken tags are flagged as warnings for your review.

Accuracy score

When scripts are provided, word-level accuracy is calculated as a percentage. Files that deviate significantly from the batch are flagged as outliers.
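One way to turn these ideas into numbers is sketched below. It assumes that accuracy is `1 - WER` expressed as a percentage, and that "deviates significantly" means falling well below the batch median, measured with the median absolute deviation. Both are illustrative assumptions, as are the function names and the threshold values.

```python
from statistics import median


def accuracy_percent(wer):
    """Assumed mapping from word error rate to a 0-100 accuracy score."""
    return max(0.0, 1.0 - wer) * 100.0


def flag_outliers(accuracy_by_file, threshold=3.5, min_scale=1.0):
    """Return file names whose accuracy sits far below the batch median.

    Uses the median absolute deviation (MAD), which is less sensitive to
    the outliers themselves than a mean and standard deviation would be.
    `min_scale` (in percentage points) prevents tiny differences from being
    flagged when almost every file has the same score.
    """
    scores = list(accuracy_by_file.values())
    if len(scores) < 3:
        return []  # too few files to say what "typical" looks like
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    scale = max(1.4826 * mad, min_scale)  # 1.4826 rescales MAD to roughly one std dev
    return [
        name
        for name, score in accuracy_by_file.items()
        if (med - score) / scale > threshold
    ]


batch = {"stop_01.mp3": 98.0, "stop_02.mp3": 97.5, "stop_03.mp3": 99.0,
         "stop_04.mp3": 98.5, "stop_05.mp3": 71.0}
print(flag_outliers(batch))
```

Judging each file relative to the batch, instead of against a fixed cutoff, accounts for scripts or voices that consistently transcribe a little less cleanly.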

Repetition detection

Catches when the TTS engine loops and repeats the same phrase multiple times. Works with or without a script.

Full transcript

The complete transcription of each file, so you can see exactly what was spoken.

Regeneration list

Files with repetitions, confirmed spoken tags, or accuracy outliers are flagged for regeneration alongside any other failing checks.

1 credit per file

Catch repetitions and spoken tags before your users do

100 free credits on signup. No credit card required.