Script Accuracy

We transcribe every file and diff it against your source script. Spoken stage directions, repeated phrases, and word-level deviations all get caught automatically. Scripts are optional - we still find most of the damage without one.

Script in, transcript out, diff highlighted

Here's what a single failing file looks like. The source script is what you sent us; the audio transcript is what the text-to-speech engine actually said.

Source script
chapter_02.txt
People usually think this city's history is carved in stone. [scoff] Not here. The walls were rebuilt four times, each one a little taller than the last.
Audio transcript
chapter_02.mp3
People usually think this city's history is carved in stone. scoff Not here. The walls were rebuilt were rebuilt four times, each one a little taller than the last.
Spoken tag

The bracketed cue [scoff] was read aloud instead of performed.

Repetition

"were rebuilt" was spoken twice in a row.

Missing word

One word dropped, pushing the word error rate (WER) above the batch median.

Three independent detectors

Every file runs through all three. A single flagged issue is enough to send the file to your regeneration queue.

Repetition scan

We look for phrases that get spoken back-to-back when they shouldn't - single stuttered words, repeated sentences, and paragraph-long loops. Real repetitions, not coincidence: our scan is strict enough not to fire on natural doubles such as "had had".
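
As a rough illustration only (a sketch under stated assumptions, not the production detector), a back-to-back repetition scan can be written as a search for a phrase that is immediately followed by an identical copy of itself. The function names, the `max_len` limit, and the `NATURAL_DOUBLES` allowlist are all hypothetical choices made for this example.

```python
import re

# Hypothetical allowlist of single words that legitimately double in English.
NATURAL_DOUBLES = {"had", "that", "very", "so", "no"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[\w']+", text.lower())

def find_repetitions(text, max_len=50):
    """Return (token_index, phrase) for each phrase spoken twice in a row.

    max_len caps the phrase length checked, in words; it is an assumed value.
    """
    tokens = tokenize(text)
    hits = []
    i = 0
    while i < len(tokens):
        longest = min(max_len, (len(tokens) - i) // 2)
        # Try the longest candidate phrase first so loops are reported whole.
        for n in range(longest, 0, -1):
            phrase = tokens[i:i + n]
            if phrase != tokens[i + n:i + 2 * n]:
                continue
            if n == 1 and phrase[0] in NATURAL_DOUBLES:
                continue  # natural double, not a stutter
            hits.append((i, " ".join(phrase)))
            i += 2 * n  # skip past both copies
            break
        else:
            i += 1
    return hits
```

Run on the sample transcript above, this sketch reports "were rebuilt" and nothing else, while a sentence like "She had had enough" passes cleanly.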

Spoken-tag detection

If your script contains stage directions like [scoff], [laughs], or *music*, we auto-fail any file where they end up spoken aloud. Without a script, we still flag likely spoken tags as warnings so you know where to look.
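
One simple way to picture the with-script case: pull the stage directions out of the script, then check whether their words turn up in the transcript more often than the tag-free script accounts for. This is a hedged sketch, not the actual detector; `TAG_PATTERN`, `spoken_tags`, and the counting heuristic are assumptions made for illustration, and the `*...*` pattern would also match emphasis markup in scripts that use it.

```python
import re

# Assumed tag syntax: [bracketed] or *starred* cues, as in the examples above.
TAG_PATTERN = re.compile(r"\[([^\]]+)\]|\*([^*]+)\*")

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[\w']+", text.lower())

def count_phrase(tokens, phrase):
    """Count how many times the token list `phrase` occurs inside `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)

def spoken_tags(script, transcript):
    """Return the stage directions from `script` that appear to be read aloud."""
    tags = list(dict.fromkeys(a or b for a, b in TAG_PATTERN.findall(script)))
    clean_tokens = words(TAG_PATTERN.sub(" ", script))  # script without cues
    heard_tokens = words(transcript)
    spoken = []
    for tag in tags:
        phrase = words(tag)
        if not phrase:
            continue
        # The tag counts as spoken if the transcript contains its words more
        # often than the cue-free script text can explain.
        if count_phrase(heard_tokens, phrase) > count_phrase(clean_tokens, phrase):
            spoken.append(tag)
    return spoken
```

For the chapter_02 example, `spoken_tags(script, transcript)` returns `["scoff"]` under this heuristic; a transcript where the cue was performed rather than read returns an empty list.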

Word-level accuracy

With a script provided, we compare it word-by-word against the transcript and produce an accuracy percentage. Files that drift too far from the batch norm get flagged as outliers, even if their individual score looks fine.
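
To make the scoring concrete, here is a minimal sketch that assumes accuracy is derived from word error rate (word-level edit distance divided by script length) and that outliers are files scoring well below the batch median. The exact formula and the `max_drop` threshold are assumptions for illustration, not documented values, and the script is assumed to have its stage directions stripped before scoring.

```python
import re
from statistics import median

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[\w']+", text.lower())

def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from transcript
                            curr[j - 1] + 1,      # extra word in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def accuracy_percent(script, transcript):
    """Return 0-100 accuracy, computed as (1 - WER) * 100 and floored at 0."""
    ref, hyp = words(script), words(transcript)
    if not ref:
        return 100.0 if not hyp else 0.0
    wer = word_errors(ref, hyp) / len(ref)
    return max(0.0, 1.0 - wer) * 100.0

def outliers(scores, max_drop=5.0):
    """Return file names scoring more than `max_drop` points below the median.

    `scores` maps file name to accuracy; `max_drop` is an assumed threshold.
    """
    baseline = median(scores.values())
    return [name for name, score in scores.items() if baseline - score > max_drop]
```

Comparing against the batch median rather than a fixed pass mark is what lets a file be flagged relative to its neighbors: the same score can be normal in one batch and a drifter in another.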

Scripts are optional

You get the best results when you send us the source script, but we still deliver useful signal without one.

With script
  • Word-level accuracy score against your source text
  • Any stage direction from your script that gets spoken aloud auto-fails the file
  • Repetitions validated against what you actually wrote
Without script
  • Repetition scan runs on the transcript alone
  • Likely spoken stage directions are flagged as warnings for your review
  • Works across a wide range of languages, auto-detected per file

What you get back per file

Full transcript

The complete transcription with word-level timestamps. You see exactly what the voice said, down to the word.

Accuracy score

A 0-100 accuracy percentage when you provide a script, plus a batch baseline so you can spot drifters.

Tag and repetition log

Every spoken tag and repeated phrase with timestamps and the offending text - ready to copy into a bug ticket.

Per-file regeneration flag

A single verdict: does this file pass, or does it need regenerating? The result is combined with the other checks into one answer.
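
The rules described on this page (any flagged issue triggers regeneration; spoken tags found without a script are warnings only) can be pictured as a small decision function. The `FileChecks` structure and the field names below are hypothetical, chosen for this sketch rather than taken from any real report format.

```python
from dataclasses import dataclass, field

@dataclass
class FileChecks:
    """Hypothetical per-file detector results."""
    has_script: bool
    repetitions: list = field(default_factory=list)
    spoken_tags: list = field(default_factory=list)
    accuracy_outlier: bool = False

def verdict(checks):
    """Fold the three detector results into one pass/regenerate answer."""
    failures, warnings = [], []
    if checks.repetitions:
        failures.append("repetition")
    if checks.spoken_tags:
        # With a script a spoken tag auto-fails; without one it is a warning.
        (failures if checks.has_script else warnings).append("spoken_tag")
    if checks.has_script and checks.accuracy_outlier:
        failures.append("accuracy_outlier")
    return {"regenerate": bool(failures), "failures": failures, "warnings": warnings}
```

In this sketch, a file with one repetition and nothing else still comes back with `regenerate` set to `True`, matching the "single flagged issue is enough" rule.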

1 credit per file

Real reports from real batches

"We kept getting reports from users that the narrator was saying 'open bracket laughs close bracket' in the middle of tour stops. Took us weeks to find them all."

- Audio content producer

"I assumed the Text-to-Speech engine would handle SSML-style tags gracefully, but it just read them out loud. Had to re-render 200+ files."

- Developer on Reddit

Catch repetitions and spoken tags before your users do

100 free credits on signup. No credit card required.