Script Accuracy
Detect repetitions, spoken tags, and script deviations in your TTS output. Scripts are optional; providing them makes detection more accurate.
The problem
TTS engines don't always reproduce your script accurately. Tags get spoken aloud, words get skipped or added, and catching these issues manually is impractical at scale.
Your script says [laughs], but the TTS engine speaks the word 'laughs' instead of producing a laugh sound. Listeners hear robotic stage directions instead of natural expression.
The TTS engine skips words, sentences, or entire paragraphs from your script. Without comparing the audio back to the original text, these omissions go unnoticed until a user complains.
Sometimes the engine hallucinates extra words or repeats phrases that weren't in the script. These insertions break the flow and can change the meaning of your content.
Manually comparing each audio file against its source script is tedious and error-prone. With hundreds of files in a batch, it's effectively impossible.
What people are saying
"We kept getting reports from users that the narrator was saying 'open bracket laughs close bracket' in the middle of tour stops. Took us weeks to find them all."
"I assumed the TTS would handle SSML-style tags gracefully, but it just read them out loud. Had to re-render 200+ files."
How we detect it
We transcribe each audio file, detect repetitions and spoken tags automatically, and compare against your script for even higher accuracy.
We transcribe the audio
Each audio file is transcribed using Whisper to produce an accurate text version of what was actually spoken.
We detect repetitions
We scan the transcript for repeated phrases — a common TTS failure where the engine loops over the same words. This works with or without a script.
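A minimal sketch of how a repetition scan could work, assuming a "loop" means the same run of words spoken back to back. The function name, the phrase length `n`, and the `min_repeats` threshold are illustrative assumptions, not the checker's actual parameters.

```python
import re

def find_repeated_phrases(transcript, n=3, min_repeats=2):
    """Return (phrase, count) for every n-word phrase spoken back to back.

    n and min_repeats are hypothetical tuning knobs for this sketch.
    """
    # Lowercase and drop punctuation so "Follow the," matches "follow the".
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    hits = []
    i = 0
    while i + n <= len(words):
        phrase = words[i:i + n]
        count = 1
        # Count how many times the same n words follow immediately.
        while words[i + count * n:i + (count + 1) * n] == phrase:
            count += 1
        if count >= min_repeats:
            hits.append((" ".join(phrase), count))
            i += count * n  # skip past the whole repeated run
        else:
            i += 1
    return hits

looped = "Welcome to the museum. Please follow the please follow the please follow the guide."
print(find_repeated_phrases(looped))
```

Because the scan only looks at the transcript, it needs no script, which matches the "with or without a script" behavior described above.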
We scan for spoken tags
Without a script, we detect likely spoken tags using heuristics and flag them as warnings. With a script, we extract tags such as [laughs], (sighs), and *music* from it and flag with certainty any that were spoken aloud.
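The script-based case can be sketched as follows. This is an illustration under stated assumptions: tags use only the three syntaxes shown above, and a tag counts as spoken when its words appear in the transcript more often than in the script with tags removed. All function names here are hypothetical.

```python
import re

# Assumed tag syntaxes, taken from the examples above: [laughs], (sighs), *music*.
TAG_PATTERN = re.compile(r"\[([^\]]+)\]|\(([^)]+)\)|\*([^*]+)\*")

def to_words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def extract_tags(script):
    """Return the inner text of every tag found in the script."""
    return [next(g for g in m.groups() if g) for m in TAG_PATTERN.finditer(script)]

def strip_tags(script):
    """Return the script with all tags removed (the text meant to be spoken)."""
    return TAG_PATTERN.sub(" ", script)

def count_phrase(words, phrase):
    """Count occurrences of the word sequence `phrase` inside `words`."""
    n = len(phrase)
    return sum(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def spoken_tags(script, transcript):
    """Return tags whose words were heard more often than the clean script allows."""
    clean = to_words(strip_tags(script))
    heard = to_words(transcript)
    flagged = []
    for tag in extract_tags(script):
        phrase = to_words(tag)
        if phrase and count_phrase(heard, phrase) > count_phrase(clean, phrase):
            flagged.append(tag)
    return flagged

script = "Well [laughs] that was unexpected. (sighs) Let's move on."
transcript = "Well laughs that was unexpected. Let's move on."
print(spoken_tags(script, transcript))
```

Comparing counts rather than simple presence avoids a false alarm when the tag word also occurs legitimately in the script. Note that this sketch would treat ordinary parenthetical asides as tags, so a real check would need a stricter tag definition.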
We measure script accuracy
When a script is provided, the script text with tags removed is compared against the transcription using word error rate (WER). Files whose error rate deviates significantly from the rest of the batch are flagged as outliers.
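For reference, word error rate is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference text. The sketch below uses a standard dynamic-programming edit distance; the normalization step (lowercasing and dropping punctuation) is an assumption, since the exact preprocessing is not described here.

```python
import re

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    # Assumed normalization: lowercase, ignore punctuation.
    ref = re.findall(r"[a-z0-9']+", reference.lower())
    hyp = re.findall(r"[a-z0-9']+", hypothesis.lower())
    if not ref:
        raise ValueError("reference text contains no words")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: script word missing from audio
                curr[j - 1] + 1,     # insertion: extra word in audio
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# One skipped word ("brown") and one added word ("high"): 2 errors over 5 words.
print(word_error_rate("The quick brown fox jumps.", "The quick fox jumps high."))
```

WER can exceed 1.0 when the audio contains many more words than the script, which is worth keeping in mind when converting it to a percentage.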
What you get
Repetition detection, spoken tag warnings, accuracy scores, and full transcripts for every file.
Spoken tag detection
With a script, spoken tags trigger an instant fail. Without a script, likely spoken tags are flagged as warnings for your review.
Accuracy score
When scripts are provided, word-level accuracy is calculated as a percentage. Files that deviate significantly from the batch are flagged as outliers.
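One plausible way to turn this into numbers, sketched under explicit assumptions: accuracy is taken as 1 minus the word error rate, clamped at zero, and an outlier is a file whose accuracy falls more than `k` median absolute deviations below the batch median. The actual scoring and outlier rule are not specified here; `k` and `floor` are hypothetical parameters.

```python
from statistics import median

def accuracy_percent(wer):
    """Convert a word error rate to a percentage, clamped so it never goes below 0."""
    return max(0.0, 1.0 - wer) * 100.0

def flag_outliers(scores, k=3.0, floor=1.0):
    """Return file names whose accuracy sits far below the rest of the batch.

    scores maps file name -> accuracy percentage.
    k and floor are assumed thresholds for this sketch.
    """
    med = median(scores.values())
    mad = median(abs(s - med) for s in scores.values())
    # floor keeps a very uniform batch from flagging tiny, harmless differences.
    spread = max(mad, floor)
    return sorted(name for name, s in scores.items() if med - s > k * spread)

batch = {"a.wav": 98.0, "b.wav": 97.0, "c.wav": 99.0, "d.wav": 71.0, "e.wav": 98.5}
print(flag_outliers(batch))
```

Using the median rather than the mean keeps a single badly rendered file from shifting the baseline it is being compared against.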
Repetition detection
Catches when the TTS engine loops and repeats the same phrase multiple times. Works with or without a script.
Full transcript
The complete transcription of each file, so you can see exactly what was spoken.
Regeneration list
Files with repetitions, confirmed spoken tags, or accuracy outliers are flagged for regeneration alongside any other failing checks.
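The rule above can be expressed as a small data model. This is only a sketch: the `FileReport` fields and function names are hypothetical, chosen to mirror the checks described on this page. Likely spoken tags (the script-free warnings) are recorded but do not, on their own, put a file on the list.

```python
from dataclasses import dataclass, field

@dataclass
class FileReport:
    """Hypothetical per-file result; field names are assumptions for this sketch."""
    name: str
    repetitions: list = field(default_factory=list)
    confirmed_spoken_tags: list = field(default_factory=list)
    likely_spoken_tags: list = field(default_factory=list)  # warnings only
    accuracy_outlier: bool = False
    other_failed_checks: list = field(default_factory=list)

def regeneration_reasons(report):
    """Collect the reasons a file should be regenerated, if any."""
    reasons = []
    if report.repetitions:
        reasons.append("repetition")
    if report.confirmed_spoken_tags:
        reasons.append("spoken tag")
    if report.accuracy_outlier:
        reasons.append("accuracy outlier")
    reasons.extend(report.other_failed_checks)
    return reasons

def regeneration_list(reports):
    """Map each failing file name to its reasons; passing files are left out."""
    result = {}
    for report in reports:
        reasons = regeneration_reasons(report)
        if reasons:
            result[report.name] = reasons
    return result

reports = [
    FileReport("stop_01.wav"),
    FileReport("stop_02.wav", confirmed_spoken_tags=["laughs"]),
    FileReport("stop_03.wav", likely_spoken_tags=["sighs"]),
    FileReport("stop_04.wav", repetitions=["please follow the"], accuracy_outlier=True),
]
print(regeneration_list(reports))
```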
Catch repetitions and spoken tags before your users do
100 free credits on signup. No credit card required.