Speaking Speed

We measure how fast each file is actually read, with silences accounted for, then flag any chapter that drifts too far from the batch median.

Spot the outliers in one look

Here's a mock batch of eight files. The median sits at 148 WPM. Two files drift more than 15% from the median and get flagged for regeneration.

Words per minute by file

File             WPM   Deviation   Status
chapter_01.mp3   146   -1%         within range
chapter_02.mp3   151   +2%         within range
chapter_03.mp3   185   +25%        flagged
chapter_04.mp3   148   0%          within range
chapter_05.mp3   150   +1%         within range
chapter_06.mp3   119   -20%        flagged
chapter_07.mp3   145   -2%         within range
chapter_08.mp3   153   +3%         within range

Median: 148 WPM
Threshold: 15% deviation
Flagged: 2 / 8 tracks
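
For readers who want to reproduce the numbers in the mock batch, here is a minimal sketch of the flagging rule in Python. It is an illustration, not our production code: the function name `flag_outliers` is hypothetical, and it assumes that for an even number of files the median is the lower of the two middle values, which matches the 148 WPM shown here.

```python
from statistics import median_low

# Mock batch from this page: file name -> WPM.
batch = {
    "chapter_01.mp3": 146,
    "chapter_02.mp3": 151,
    "chapter_03.mp3": 185,
    "chapter_04.mp3": 148,
    "chapter_05.mp3": 150,
    "chapter_06.mp3": 119,
    "chapter_07.mp3": 145,
    "chapter_08.mp3": 153,
}


def flag_outliers(wpm_by_file, threshold=0.15):
    """Return (median, {file: (signed_deviation, flagged)}).

    `threshold` is a fraction, so 0.15 means 15%.
    """
    # Assumption: take the lower middle value for even-sized batches.
    med = median_low(wpm_by_file.values())
    report = {}
    for name, wpm in wpm_by_file.items():
        deviation = (wpm - med) / med  # signed: positive means faster
        report[name] = (deviation, abs(deviation) > threshold)
    return med, report


med, report = flag_outliers(batch)
flagged = sorted(name for name, (_, is_flagged) in report.items() if is_flagged)

for name, (deviation, is_flagged) in report.items():
    status = "flagged" if is_flagged else "within range"
    print(f"{name}  {batch[name]} WPM  {deviation:+.0%}  {status}")
```

Passing a different `threshold` (say `0.10`) tightens the rule without changing anything else.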

What makes this measurement honest

A raw "word count / duration" calculation is misleading because real text-to-speech output is full of pauses. Here's what we actually measure.

Word-level timing

We know exactly when each word in the file starts and ends - not just where sentences break. That's the foundation for a speaking rate you can actually trust.

Silence is not slow

A file that sits in dead air for thirty seconds isn't "slow" - it's paused. We exclude long silences so the WPM you see reflects the actual speaking, not the dead time in between.
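
To make the idea concrete, here is a rough sketch of how a silence-adjusted rate could be computed from word-level timestamps. Everything in it is an assumption for illustration: the `Word` type, the `silence_adjusted_wpm` function, and the one-second `max_gap` cutoff are hypothetical and not a description of our actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file


def silence_adjusted_wpm(words, max_gap=1.0):
    """Words per minute over speaking time only.

    Gaps between words longer than `max_gap` seconds (a hypothetical cutoff)
    are treated as dead air and excluded; shorter gaps count as natural pauses.
    Silence before the first word and after the last word is ignored.
    """
    if not words:
        return 0.0
    ordered = sorted(words, key=lambda w: w.start)
    speaking = 0.0
    prev_end = ordered[0].start
    for w in ordered:
        gap = w.start - prev_end
        if 0 < gap <= max_gap:
            speaking += gap  # keep short, natural pauses
        # Count the word's own duration, guarding against overlapping stamps.
        speaking += max(0.0, w.end - max(w.start, prev_end))
        prev_end = max(prev_end, w.end)
    if speaking <= 0:
        return 0.0
    return len(ordered) * 60.0 / speaking


# Eight half-second words with roughly thirty seconds of dead air in the middle.
starts = [0.0, 0.5, 1.0, 1.5, 31.5, 32.0, 32.5, 33.0]
demo = [Word(f"w{i}", s, s + 0.5) for i, s in enumerate(starts)]

adjusted = silence_adjusted_wpm(demo)
raw = len(demo) * 60.0 / (demo[-1].end - demo[0].start)
print(f"raw: {raw:.1f} WPM, silence-adjusted: {adjusted:.1f} WPM")
```

The raw figure divides by the full file length, so the long pause drags it far below the pace the words were actually spoken at.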

Language-robust

Word length varies wildly across languages. Under the hood we measure speed in a language-neutral way and report a WPM number that stays meaningful whether you're generating English or German.
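
One possible way to get a language-neutral number, shown purely as a sketch: count characters instead of whitespace-separated words, then convert to WPM using a fixed nominal word length. The five-characters-per-word constant and the `normalized_wpm` function are assumptions made for this example; the page does not specify the method we use.

```python
NOMINAL_CHARS_PER_WORD = 5.0  # assumption: a fixed nominal word length


def normalized_wpm(transcript, speaking_seconds,
                   chars_per_word=NOMINAL_CHARS_PER_WORD):
    """Convert character throughput into a WPM-style figure.

    Only letters and digits are counted, so spacing and punctuation
    do not affect the result.
    """
    if speaking_seconds <= 0:
        raise ValueError("speaking_seconds must be positive")
    chars = sum(1 for ch in transcript if ch.isalnum())
    nominal_words = chars / chars_per_word
    return nominal_words * 60.0 / speaking_seconds


# A compound written as one word or as three yields the same rate,
# whereas a whitespace word count would differ by a factor of three.
compound = normalized_wpm("Donaudampfschiff", 2.0)
split = normalized_wpm("Donau Dampf Schiff", 2.0)
print(compound, split)
```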

Batch consistency score

The whole batch gets a 0-100 consistency score. Tight batches stay near 100; the more files that drift, the harder the score drops.
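
The exact scoring formula is not given here, so the following is only a hypothetical illustration of how a 0-100 score with this behavior could work: take the mean absolute deviation from the batch median, scale it by the threshold, and clamp. The function name `consistency_score` and the formula itself are assumptions.

```python
from statistics import median


def consistency_score(wpm_values, threshold=0.15):
    """Hypothetical 0-100 score: 100 for a perfectly uniform batch,
    0 once the mean absolute deviation reaches the threshold."""
    if not wpm_values:
        raise ValueError("batch is empty")
    med = median(wpm_values)
    if med <= 0:
        raise ValueError("median WPM must be positive")
    mean_abs_dev = sum(abs(w - med) / med for w in wpm_values) / len(wpm_values)
    return 100.0 * max(0.0, 1.0 - mean_abs_dev / threshold)


uniform = consistency_score([150, 150, 150])
tight = consistency_score([148, 150, 152])
one_drifter = consistency_score([148, 150, 152, 190])
scattered = consistency_score([100, 150, 200])
print(uniform, round(tight, 1), round(one_drifter, 1), scattered)
```

Under this formula every drifting file pulls the mean deviation up, so the score falls further as more files move away from the median.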

What you get back per file

Words per minute

Silence-adjusted WPM for every file, measured from word-level timestamps.

Deviation from median

Signed percentage showing whether each file is faster or slower than the batch.

Outlier flag

True when a file's pace deviates from the batch median by more than your configured threshold (15% by default).

1 credit per file

Real reports from real batches

"For audios longer than 1.5-2mins, gpt-4o-mini-tts works really unstable: random pauses from few seconds to more than a minute, random volume and tone changes, repeating last few sentences in random order."
- u/tcherkashin94 on OpenAI Forum

"The OpenAI voices are excellent in terms of realism but they sometimes skip phrases and on occasion, entire paragraphs. Sometimes when submitting a single word, the API will return silence."
- OpenAI Developer Community

"I'm having the exact same issue with having very long silences in the audio. It's quite weird."
- u/lightnesscaster on OpenAI Forum

"The only issue I have had is that I have to create pause blocks to stretch the time frame out instead of moving a block over in the timeline."
- Trustpilot review

Keep your audio pacing consistent

100 free credits on signup. No credit card required.
