API Documentation
RESTful API with simple authentication
POST /audit
Run one or more audio checks on a batch of files. Enable each check via query parameters and upload files as multipart form data.
Each enabled check costs 1 credit per file.
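The pricing rule above can be sketched as a small calculation. This is an illustrative helper only: the function name `estimate_credits` is hypothetical and not part of the API, and the estimate ignores any cached-file discount, so the amount actually charged may be lower.

```python
def estimate_credits(file_count: int, checks: dict) -> int:
    """Estimate credits as 1 credit per enabled check per file.

    Assumption: no cached-file discount is applied, so this is an upper bound.
    """
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    enabled = sum(1 for is_on in checks.values() if is_on)
    return file_count * enabled


# Three files with three of the four checks enabled.
credits = estimate_credits(
    3,
    {"comparison": True, "quality": True, "pace": False, "scriptAccuracy": True},
)
print(credits)  # 9
```

With three files and three enabled checks the estimate is 9 credits, which matches the creditsUsed value in the sample response.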
Known Limitations
- Primarily designed and tested with English audio. Other languages are untested and may produce unexpected results.
- Best results come from one speaker per file. Multiple speakers per file are not supported.
- Audio containing intentional sound effects (jingles, ambient music, crowd noise, intro/outro effects) can trigger false positives on the
garble and staticNoise quality sub-checks. If your audio includes sound effects, disable these checks via the qualityChecks parameter.
Headers
Authenticate with an API key or pay per-request via x402 (USDC on Base). See Authentication for details.
| Header | Type | Required | Description |
|---|---|---|---|
| X-API-Key | string | Required* | Your API key (or use Authorization: Bearer). Not needed when paying via x402. |
| Payment-Signature | string | Optional | Base64-encoded x402 payment receipt. Send with the retry after receiving a 402 response. |
| X-Wallet | string | Optional | Your wallet address, sent on the initial request to receive cached-file discounts in the 402 price. |
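As a rough sketch of how the header table maps to a request, the helper below assembles a header dictionary for either authentication path. The function name `build_headers` and its arguments are hypothetical. Producing the x402 payment receipt itself is outside this sketch, so the signature is passed in as an opaque string.

```python
def build_headers(api_key=None, use_bearer=False, payment_signature=None, wallet=None):
    """Assemble request headers from the header table (illustrative only)."""
    headers = {}
    if api_key:
        if use_bearer:
            # Alternative form mentioned in the X-API-Key row.
            headers["Authorization"] = f"Bearer {api_key}"
        else:
            headers["X-API-Key"] = api_key
    if wallet:
        # Sent on the initial x402 request to receive cached-file discounts.
        headers["X-Wallet"] = wallet
    if payment_signature:
        # Sent on the retry after a 402 response.
        headers["Payment-Signature"] = payment_signature
    return headers


print(build_headers(api_key="YOUR_API_KEY"))
print(build_headers(wallet="0xYourWallet"))
```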
Query Parameters
The four check parameters (comparison, quality, pace, scriptAccuracy) are required. Values are "true" or "false"; at least one check must be "true".
| Parameter | Type | Required | Description |
|---|---|---|---|
| comparison | string | Required | Enable speaker consistency analysis |
| quality | string | Required | Enable audio quality analysis (SNR, artifacts, clipping) |
| pace | string | Required | Enable speaking speed consistency analysis |
| scriptAccuracy | string | Required | Enable script accuracy and spoken tag detection |
| forceFresh | string | Optional | Skip cached-file discount and charge full credits for every file |
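The query-parameter rules above (all four check parameters present, string values, at least one "true") can be sketched as a URL builder. The helper name `build_audit_url` is hypothetical. The base URL is taken from the curl example in this document, and treating forceFresh as a "true"/"false" string is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.ttsaudit.com/audit"  # as used in the curl example
CHECKS = ("comparison", "quality", "pace", "scriptAccuracy")


def build_audit_url(enabled, force_fresh=None):
    """Build the /audit URL with every check parameter set to "true" or "false"."""
    enabled = set(enabled)
    unknown = enabled - set(CHECKS)
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")
    if not enabled:
        raise ValueError("at least one check must be enabled")
    params = {name: "true" if name in enabled else "false" for name in CHECKS}
    if force_fresh is not None:
        # Assumption: forceFresh takes the same "true"/"false" strings.
        params["forceFresh"] = "true" if force_fresh else "false"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_audit_url({"comparison", "quality"}))
```

Sending all four check parameters explicitly, including the disabled ones, keeps the request valid under the "all four are required" rule.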
Request Body
Encoded as multipart/form-data.
| Field | Type | Required | Description |
|---|---|---|---|
| files | File | Required | Audio files to analyze (min 1 for quality-only, min 2 for comparison or pace). Use field name "files" for each part |
| accuracy | string | Optional | Comparison accuracy: "standard" (default), "high", or "highest" |
| deviationThreshold | number | Optional | Flagging sensitivity across all checks, 0–1. Default 0.15 |
| qualityChecks | JSON string | Optional | Toggle individual quality sub-checks. All default to true — only include keys to change |
| scripts | JSON string | Optional | JSON object mapping each filename to its script text. Optional — improves spoken tag detection when scriptAccuracy is enabled. Without scripts, repetitions and likely spoken tags are still detected. Example: {"file1.mp3": "Hello [laughs] world!"} |
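Because qualityChecks and scripts are sent as JSON strings inside form fields, it can help to serialize them explicitly before building the multipart request. The sketch below prepares only the non-file fields. The function name `build_form_fields` is hypothetical, and the client-side validation mirrors the table above rather than any documented server behavior.

```python
import json

ACCURACY_LEVELS = ("standard", "high", "highest")


def build_form_fields(accuracy="standard", deviation_threshold=0.15,
                      quality_checks=None, scripts=None):
    """Return the non-file form fields as a dict of strings."""
    if accuracy not in ACCURACY_LEVELS:
        raise ValueError(f"accuracy must be one of {ACCURACY_LEVELS}")
    if not 0 <= deviation_threshold <= 1:
        raise ValueError("deviationThreshold must be between 0 and 1")
    fields = {
        "accuracy": accuracy,
        "deviationThreshold": str(deviation_threshold),
    }
    if quality_checks:
        # Only include the sub-checks being changed; omitted keys default to true.
        fields["qualityChecks"] = json.dumps(quality_checks)
    if scripts:
        # Maps each uploaded filename to its script text.
        fields["scripts"] = json.dumps(scripts)
    return fields


fields = build_form_fields(
    quality_checks={"garble": False, "staticNoise": False},
    scripts={"file1.mp3": "Hello [laughs] world!"},
)
print(fields)
```

The resulting dictionary can be passed alongside the files parts to whichever HTTP client builds the multipart body.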
Code Examples
curl -X POST "https://api.ttsaudit.com/audit?comparison=true&quality=true&pace=false&scriptAccuracy=false" \
-H "X-API-Key: YOUR_API_KEY" \
-F "files=@chapter1.mp3" \
-F "files=@chapter2.mp3" \
-F "files=@chapter3.mp3" \
-F "accuracy=standard" \
-F "deviationThreshold=0.15"
# To disable specific quality sub-checks, add qualityChecks as a JSON string:
# -F 'qualityChecks={"garble": false, "silenceGaps": false}'
# Omitted keys default to true.
Response
Returns an overall score, a list of files to regenerate, and per-check results. Each check contains a score, a summary, and a tracks[] array.
| Property | Type | Description |
|---|---|---|
| score | number | Overall score (0–100), average of all enabled check scores |
| summary | string | One-sentence overview of the audit result |
| fileCount | number | Number of audio files in the batch |
| auditId | string | Unique identifier for this audit session |
| reportUrl | string | Direct link to view this audit on ttsaudit.com |
| tracksToRegenerate | array | Files flagged by any check. Each entry has a file name (file) and a reasons array |
| checks.comparison | object | Speaker consistency results including voice similarity and volume consistency (if enabled) |
| checks.quality | object | Audio quality analysis results (if enabled) |
| checks.pace | object | Speaking speed consistency results (if enabled) |
| checks.scriptAccuracy | object | Script accuracy and spoken tag detection results (if enabled) |
| creditsUsed | number | Total credits consumed by this audit |
| timing | object | Performance breakdown in seconds |
{
  "score": 87.2,
  "summary": "2 of 3 checks passed. 1 file flagged for regeneration.",
  "fileCount": 3,
  "auditId": "a1b2c3d4e5f6",
  "reportUrl": "https://ttsaudit.com/dashboard?tab=audit&session=a1b2c3d4e5f6",
  "tracksToRegenerate": [
    {
      "file": "chapter3.mp3",
      "reasons": [
        { "check": "comparison", "message": "deviation 18.00%", "deviation": 0.18 }
      ]
    }
  ],
  "checks": {
    "comparison": {
      "score": 91.2,
      "summary": "1 of 3 files flagged - 91% average consistency.",
      "similarityMatrix": [[1.0, 0.92, 0.88], [0.92, 1.0, 0.91], [0.88, 0.91, 1.0]],
      "volumeConsistency": { "score": 96.2, "medianDb": -18.3, "spreadDb": 1.8, "outliers": [] },
      "tracks": [
        { "file": "chapter1.mp3", "similarity": 0.93, "deviation": -0.02, "flagged": false },
        { "file": "chapter2.mp3", "similarity": 0.91, "deviation": 0.00, "flagged": false },
        { "file": "chapter3.mp3", "similarity": 0.81, "deviation": 0.18, "flagged": true }
      ]
    },
    "quality": {
      "score": 94.5,
      "summary": "Good audio quality - average score 94.5 across 3 files.",
      "tracks": [
        {
          "score": 95.2,
          "flagged": false,
          "snrDb": 45.2,
          "issueCount": 2,
          "issueSummary": { "total": 2, "severe": 0, "noticeable": 2, "garbleCount": 1, "staticCount": 0, "silenceCount": 0, "worstLabel": "noticeable" },
          "issues": [
            { "timeSec": 3.21, "endSec": 3.242, "durationMs": 32.0, "type": "click", "severity": 0.4, "audibility": 0.45, "audibilityLabel": "noticeable" },
            { "timeSec": 6.66, "endSec": 6.692, "durationMs": 32.0, "type": "garble", "severity": 0.62, "audibility": 0.58, "audibilityLabel": "noticeable" }
          ],
          "clipping": { "clipCount": 0, "clipPercentage": 0 },
          "bandwidth": { "cutoffHz": null, "spectralCentroidHz": 1824.3, "bandwidthRatio": 1.0 }
        }
      ]
    },
    "scriptAccuracy": {
      "score": 0,
      "summary": "1 tag spoken aloud across 3 files.",
      "spokenTagCount": 1,
      "tracks": [
        {
          "transcript": "Hello scoff well that was something.",
          "accuracy": 85.0,
          "wordErrorRate": 0.15,
          "flagged": true,
          "tags": {
            "found": 1,
            "spoken": [{ "tag": "[scoff]", "content": "scoff", "spokenWord": "scoff", "timeSec": 3.24, "endSec": 3.71 }]
          }
        },
        {
          "transcript": "This is the second chapter of our story.",
          "accuracy": 100.0,
          "wordErrorRate": 0.0,
          "flagged": false,
          "tags": { "found": 0, "spoken": [] }
        }
      ]
    }
  },
  "creditsUsed": 9,
  "timing": { "decode": 3.1, "checks": 4.8, "total": 8.2 }
}
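A response shaped like the sample above could be consumed along these lines. This is a minimal sketch: the helper names `regeneration_report` and `flagged_tracks` are hypothetical, and the embedded payload is a trimmed copy of the sample rather than real API output. Only fields shown in the sample are read, and missing keys are tolerated because each check appears only when enabled.

```python
import json


def regeneration_report(response):
    """Map each file in tracksToRegenerate to its "check: message" reasons."""
    report = {}
    for entry in response.get("tracksToRegenerate", []):
        report[entry["file"]] = [
            f'{reason["check"]}: {reason["message"]}'
            for reason in entry.get("reasons", [])
        ]
    return report


def flagged_tracks(response):
    """Collect the flagged track objects for each check in the response."""
    return {
        name: [t for t in result.get("tracks", []) if t.get("flagged")]
        for name, result in response.get("checks", {}).items()
    }


# Trimmed copy of the sample response, for illustration only.
sample = json.loads("""
{
  "tracksToRegenerate": [
    {"file": "chapter3.mp3",
     "reasons": [{"check": "comparison", "message": "deviation 18.00%", "deviation": 0.18}]}
  ],
  "checks": {
    "comparison": {
      "tracks": [
        {"file": "chapter1.mp3", "deviation": -0.02, "flagged": false},
        {"file": "chapter3.mp3", "deviation": 0.18, "flagged": true}
      ]
    }
  }
}
""")

print(regeneration_report(sample))
print([t["file"] for t in flagged_tracks(sample)["comparison"]])
```

Reading tracksToRegenerate first is usually enough to decide which files to regenerate. The per-check tracks arrays are useful when the reason for a flag needs more detail.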