Script Accuracy
We transcribe every file and diff it against your source script. Spoken stage directions, repeated phrases, and word-level deviations all get caught automatically. Scripts are optional - we still find most of the damage without one.
Script in, transcript out, diff highlighted
Here's what a single failing file looks like. Left is what you sent us, right is what the text-to-speech engine actually said.
The bracketed cue [scoff] was read aloud instead of performed.
"were rebuilt" was spoken twice in a row.
One word was dropped, pushing the word error rate (WER) above the batch median.
Three independent detectors
Every file runs through all three. A single flagged issue is enough to send the file to your regeneration queue.
Repetition scan
We look for phrases that get spoken back-to-back when they shouldn't - single stuttered words, repeated sentences, and paragraph-long loops. The scan is strict enough to ignore natural doubles such as "had had", so it flags real repetitions rather than coincidences.
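To make the idea concrete, here is a minimal sketch of a back-to-back repetition scan in Python. It is an illustration under stated assumptions, not the detector described above: the function name `find_repetitions`, the `max_len` cap, and the small `allow` list of natural doubles are all hypothetical.

```python
import re

def find_repetitions(transcript, max_len=50, allow=frozenset({"had", "that", "very"})):
    """Return (word_index, phrase) for each phrase immediately followed by itself.

    `allow` is a hypothetical list of single words that may legitimately double.
    """
    words = re.findall(r"[\w']+", transcript.lower())
    hits = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest candidate first so a long loop is reported once,
        # not once per word inside it.
        longest = min(max_len, (len(words) - i) // 2)
        for n in range(longest, 0, -1):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                if n == 1 and words[i] in allow:
                    break  # natural double, not a stutter
                hits.append((i, " ".join(words[i:i + n])))
                i += 2 * n
                matched = True
                break
        if not matched:
            i += 1
    return hits

print(find_repetitions("The walls were rebuilt were rebuilt in 1920."))
```

A production scan would likely also tolerate near-duplicates and use word timestamps; this version only catches exact repeats.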
Spoken-tag detection
If your script contains stage directions like [scoff], [laughs], or *music*, we auto-fail any file where they end up spoken aloud. Without a script, we still flag likely spoken tags as warnings so you know where to look.
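One way to picture the script-backed check is sketched below in Python. It assumes tags are written as `[tag]` or `*tag*`, and it fails a file when a tag's text shows up in the transcript more often than the script's spoken text accounts for. The names `TAG_PATTERN` and `spoken_tags` are hypothetical, and the real detector may work differently.

```python
import re

# Assumed tag syntax: [scoff], [laughs], *music*
TAG_PATTERN = re.compile(r"\[([^\]]+)\]|\*([^*]+)\*")

def _normalize(text):
    """Lowercase and strip punctuation, returning a space-joined word string."""
    return " ".join(re.findall(r"[\w']+", text.lower()))

def spoken_tags(script, transcript):
    """Return the script tags that appear to have been read aloud."""
    tags = [(m.group(1) or m.group(2)).strip() for m in TAG_PATTERN.finditer(script)]
    # What the voice was supposed to say: the script with the tags removed.
    expected_text = _normalize(TAG_PATTERN.sub(" ", script))
    heard_text = _normalize(transcript)
    failures = []
    for tag in dict.fromkeys(tags):  # de-duplicate, keep order
        phrase = _normalize(tag)
        if not phrase:
            continue
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        # Flag only when the word is heard more often than the script calls for,
        # so "She laughs a lot" next to [laughs] does not trigger a failure.
        if len(pattern.findall(heard_text)) > len(pattern.findall(expected_text)):
            failures.append(tag)
    return failures

script = "Oh, really? [scoff] As if the walls were rebuilt."
print(spoken_tags(script, "Oh really scoff as if the walls were rebuilt"))
```

Counting occurrences rather than checking positions keeps the sketch short; an alignment-based check would be more precise about where the tag was spoken.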
Word-level accuracy
With a script provided, we compare it word-by-word against the transcript and produce an accuracy percentage. Files that drift too far from the batch norm get flagged as outliers, even if their individual score looks fine.
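The word-by-word comparison can be sketched as a standard word error rate (WER) calculation, followed by a simple check against the batch median. This is a rough Python illustration: the 5-point `tolerance` and the function names are assumptions made for the example, not the thresholds or API the service uses.

```python
import re
from statistics import median

def _words(text):
    return re.findall(r"[\w']+", text.lower())

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # word missing from transcript
                            curr[j - 1] + 1,          # extra word in transcript
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def accuracy_percent(reference, hypothesis):
    """Accuracy on a 0-100 scale, clamped at 0 when WER exceeds 1."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis)) * 100

def outliers(scores, tolerance=5.0):
    """Files scoring more than `tolerance` points below the batch median."""
    baseline = median(scores.values())
    return [name for name, score in scores.items() if baseline - score > tolerance]

script = "the old walls were rebuilt in stone"
heard = "the walls were rebuilt in stone"  # one word dropped
print(round(accuracy_percent(script, heard), 1))
print(outliers({"a.wav": 98, "b.wav": 97, "c.wav": 99, "d.wav": 85}))
```

Comparing against the batch median rather than a fixed cutoff is what lets a file be flagged even when its own score looks acceptable in isolation.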
Scripts are optional
You get the best results when you send us the source script, but we still deliver useful signal without one.
With a script
- Word-level accuracy score against your source text
- Any stage direction from your script that gets spoken aloud auto-fails the file
- Repetitions validated against what you actually wrote
Without a script
- Repetition scan runs on the transcript alone
- Likely spoken stage directions are flagged as warnings for your review
- Works across a wide range of languages, auto-detected per file
What you get back per file
Full transcript
The complete transcription with word-level timestamps. You see exactly what the voice said, down to the word.
Accuracy score
A 0-100 accuracy percentage when you provide a script, plus a batch baseline so you can spot drifters.
Tag and repetition log
Every spoken tag and repeated phrase with timestamps and the offending text - ready to copy into a bug ticket.
Per-file regeneration flag
A single verdict: does this file pass, or does it need regenerating? The script-accuracy result is combined with the other checks into one answer.
Real reports from real batches
"We kept getting reports from users that the narrator was saying 'open bracket laughs close bracket' in the middle of tour stops. Took us weeks to find them all."
"I assumed the Text-to-Speech engine would handle SSML-style tags gracefully, but it just read them out loud. Had to re-render 200+ files."
Catch repetitions and spoken tags before your users do
100 free credits on signup. No credit card required.