API Documentation

RESTful API with simple authentication

POST /audit

Run one or more audio checks on a batch of files. Enable each check via query parameters and upload files as multipart form data.

Each enabled check costs 1 credit per file.
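As a rough illustration of the pricing rule above, the sketch below multiplies the number of enabled checks by the number of files. The function name estimate_credits is hypothetical, and the figure is an upper bound because it ignores any cached-file discount.

```python
def estimate_credits(file_count: int, comparison: bool = False, quality: bool = False,
                     pace: bool = False, script_accuracy: bool = False) -> int:
    """Upper-bound estimate: 1 credit per enabled check per file."""
    enabled_checks = sum([comparison, quality, pace, script_accuracy])
    if enabled_checks == 0:
        raise ValueError("at least one check must be enabled")
    return enabled_checks * file_count


# Three files with two checks enabled: 2 checks x 3 files = 6 credits.
print(estimate_credits(3, comparison=True, quality=True))  # 6
```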

Known Limitations

  • Primarily designed and tested with English audio. Other languages are untested and may produce unexpected results.
  • Best results come from one speaker per file. Files with multiple speakers are not supported.
  • Audio containing intentional sound effects (jingles, ambient music, crowd noise, intro/outro effects) can trigger false positives on the garble and staticNoise quality sub-checks. If your audio includes sound effects, disable these checks via the qualityChecks parameter.
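For audio with sound effects, the qualityChecks value could be built along these lines. This is a minimal sketch: the helper name quality_checks_field is hypothetical, and the key names garble and staticNoise are taken from the limitation described above.

```python
import json


def quality_checks_field(disabled: list[str]) -> str:
    """Serialize a qualityChecks value that turns off the named sub-checks.

    Keys that are left out keep their default of true.
    """
    return json.dumps({name: False for name in disabled})


print(quality_checks_field(["garble", "staticNoise"]))
# {"garble": false, "staticNoise": false}
```

The resulting string is sent as the qualityChecks form field alongside the uploaded files.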

Headers

Authenticate with an API key or pay per-request via x402 (USDC on Base). See Authentication for details.

Header | Type | Required | Description
X-API-Key | string | Required* | Your API key (or use Authorization: Bearer). Not needed when paying via x402.
Payment-Signature | string | Optional | Base64-encoded x402 payment receipt. Send it with the retry after receiving a 402 response. (x402 only)
X-Wallet | string | Optional | Your wallet address, sent on the initial request to receive cached-file discounts in the 402 price. (x402 only)
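The two authentication paths might be modeled as below. This is a sketch under assumptions: the function name build_auth_headers and the placeholder values are hypothetical, and whether X-Wallet should also accompany the retry is not stated here, so the helper simply sends whatever it is given.

```python
from typing import Optional


def build_auth_headers(api_key: Optional[str] = None,
                       wallet: Optional[str] = None,
                       payment_signature: Optional[str] = None) -> dict:
    """Return headers for API-key auth, or for one step of the x402 flow."""
    if api_key:
        # API-key auth: the x402 headers are not needed.
        return {"X-API-Key": api_key}
    headers = {}
    if wallet:
        # Initial x402 request: lets the 402 price reflect cached-file discounts.
        headers["X-Wallet"] = wallet
    if payment_signature:
        # Retry after a 402 response: attach the Base64-encoded payment receipt.
        headers["Payment-Signature"] = payment_signature
    return headers


print(build_auth_headers(api_key="YOUR_API_KEY"))
print(build_auth_headers(wallet="YOUR_WALLET_ADDRESS"))
print(build_auth_headers(payment_signature="BASE64_PAYMENT_RECEIPT"))
```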

Query Parameters

The four check parameters (comparison, quality, pace, scriptAccuracy) are required. Values are "true" or "false"; at least one check must be "true". forceFresh is optional.

Parameter | Type | Required | Description
comparison | string | Required | Enable speaker consistency analysis
quality | string | Required | Enable audio quality analysis (SNR, artifacts, clipping)
pace | string | Required | Enable speaking speed consistency analysis
scriptAccuracy | string | Required | Enable script accuracy and spoken tag detection
forceFresh | string | Optional | Skip cached-file discount and charge full credits for every file
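One way to assemble the query string while enforcing the rule that at least one check is enabled is sketched below. The helper name build_audit_url is hypothetical; the base URL is the one used in the curl example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.ttsaudit.com/audit"  # taken from the curl example


def build_audit_url(comparison: bool, quality: bool, pace: bool,
                    script_accuracy: bool, force_fresh: bool = False) -> str:
    """Build the POST /audit URL with all four check flags set explicitly."""
    checks = {
        "comparison": comparison,
        "quality": quality,
        "pace": pace,
        "scriptAccuracy": script_accuracy,
    }
    if not any(checks.values()):
        raise ValueError("at least one check must be enabled")
    # The API expects the literal strings "true" and "false".
    params = {name: "true" if enabled else "false" for name, enabled in checks.items()}
    if force_fresh:
        params["forceFresh"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_audit_url(comparison=True, quality=True, pace=False, script_accuracy=False))
# https://api.ttsaudit.com/audit?comparison=true&quality=true&pace=false&scriptAccuracy=false
```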

Request Body

Encoded as multipart/form-data.

Field | Type | Required | Description
files | File | Required | Audio files to analyze (min 1 for quality-only, min 2 for comparison or pace). Use the field name "files" for each part.
accuracy | string | Optional | Comparison accuracy: "standard" (default), "high", or "highest"
deviationThreshold | number | Optional | Flagging sensitivity across all checks, 0–1. Default 0.15
qualityChecks | JSON string | Optional | Toggle individual quality sub-checks. All default to true; include only the keys you want to change.
scripts | JSON string | Optional | JSON object mapping each filename to its script text. Improves spoken tag detection when scriptAccuracy is enabled. Without scripts, repetitions and likely spoken tags are still detected. Example: {"file1.mp3": "Hello [laughs] world!"}
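The constraints in this table can be checked client-side before uploading. The sketch below is illustrative only: build_form_fields is a hypothetical helper, and the check that every script key matches an uploaded filename is an added sanity check, not documented server behavior.

```python
import json


def build_form_fields(filenames, comparison=False, pace=False, scripts=None,
                      accuracy="standard", deviation_threshold=0.15):
    """Validate the batch and return the non-file form fields as strings."""
    # Comparison and pace need at least two files; otherwise one is enough.
    min_files = 2 if (comparison or pace) else 1
    if len(filenames) < min_files:
        raise ValueError(f"need at least {min_files} file(s), got {len(filenames)}")
    if accuracy not in ("standard", "high", "highest"):
        raise ValueError(f"unknown accuracy: {accuracy!r}")
    if not 0 <= deviation_threshold <= 1:
        raise ValueError("deviationThreshold must be between 0 and 1")

    fields = {"accuracy": accuracy, "deviationThreshold": str(deviation_threshold)}
    if scripts:
        unknown = set(scripts) - set(filenames)
        if unknown:
            raise ValueError(f"scripts given for files not in the batch: {sorted(unknown)}")
        # The scripts field is a JSON string, not a nested form structure.
        fields["scripts"] = json.dumps(scripts)
    return fields


print(build_form_fields(["file1.mp3"], scripts={"file1.mp3": "Hello [laughs] world!"}))
```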

Code Examples

curl -X POST "https://api.ttsaudit.com/audit?comparison=true&quality=true&pace=false&scriptAccuracy=false" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "files=@chapter1.mp3" \
  -F "files=@chapter2.mp3" \
  -F "files=@chapter3.mp3" \
  -F "accuracy=standard" \
  -F "deviationThreshold=0.15"

# To disable specific quality sub-checks, add qualityChecks as a JSON string:
#   -F 'qualityChecks={"garble": false, "silenceGaps": false}'
# Omitted keys default to true.
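A roughly equivalent request can be prepared in Python using only the standard library. This is a sketch, not an official client: encode_multipart and build_audit_request are hypothetical helpers, the placeholder bytes stand in for real audio data, and the application/octet-stream part type is an assumption about what the server accepts.

```python
import uuid
from urllib.request import Request


def encode_multipart(fields: dict, files: dict):
    """Encode text fields and {filename: bytes} files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for filename, data in files.items():
        # Every file part uses the same field name, "files".
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_audit_request(url: str, api_key: str, fields: dict, files: dict) -> Request:
    """Prepare (but do not send) the POST /audit request."""
    body, content_type = encode_multipart(fields, files)
    return Request(url, data=body, method="POST",
                   headers={"X-API-Key": api_key, "Content-Type": content_type})


request = build_audit_request(
    "https://api.ttsaudit.com/audit?comparison=true&quality=true&pace=false&scriptAccuracy=false",
    "YOUR_API_KEY",
    {"accuracy": "standard", "deviationThreshold": "0.15"},
    {"chapter1.mp3": b"<audio bytes>", "chapter2.mp3": b"<audio bytes>"},
)
print(request.get_method(), request.full_url)
```

In real use, each file's bytes would come from open(path, "rb").read(), and the prepared request would be sent with urllib.request.urlopen(request).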

Response

Returns an overall score, a list of files to regenerate, and per-check results. Each check contains a score, summary, and tracks[] array.

Property | Type | Description
score | number | Overall score (0–100), average of all enabled check scores
summary | string | One-sentence overview of the audit result
fileCount | number | Number of audio files in the batch
auditId | string | Unique identifier for this audit session
reportUrl | string | Direct link to view this audit on ttsaudit.com
tracksToRegenerate | array | Files flagged by any check. Each entry has a file name and a list of reasons
checks.comparison | object | Speaker consistency results including voice similarity and volume consistency (if enabled)
checks.quality | object | Audio quality analysis results (if enabled)
checks.pace | object | Speaking speed consistency results (if enabled)
checks.scriptAccuracy | object | Script accuracy and spoken tag detection results (if enabled)
creditsUsed | number | Total credits consumed by this audit
timing | object | Performance breakdown in seconds

Example response:

{
  "score": 87.2,
  "summary": "2 of 3 checks passed. 1 file flagged for regeneration.",
  "fileCount": 3,
  "auditId": "a1b2c3d4e5f6",
  "reportUrl": "https://ttsaudit.com/dashboard?tab=audit&session=a1b2c3d4e5f6",
  "tracksToRegenerate": [
    {
      "file": "chapter3.mp3",
      "reasons": [
        { "check": "comparison", "message": "deviation 18.00%", "deviation": 0.18 }
      ]
    }
  ],
  "checks": {
    "comparison": {
      "score": 91.2,
      "summary": "1 of 3 files flagged - 91% average consistency.",
      "similarityMatrix": [[1.0, 0.92, 0.88], [0.92, 1.0, 0.91], [0.88, 0.91, 1.0]],
      "volumeConsistency": { "score": 96.2, "medianDb": -18.3, "spreadDb": 1.8, "outliers": [] },
      "tracks": [
        { "file": "chapter1.mp3", "similarity": 0.93, "deviation": -0.02, "flagged": false },
        { "file": "chapter2.mp3", "similarity": 0.91, "deviation": 0.00, "flagged": false },
        { "file": "chapter3.mp3", "similarity": 0.81, "deviation": 0.18, "flagged": true }
      ]
    },
    "quality": {
      "score": 94.5,
      "summary": "Good audio quality - average score 94.5 across 3 files.",
      "tracks": [
        {
          "score": 95.2,
          "flagged": false,
          "snrDb": 45.2,
          "issueCount": 2,
          "issueSummary": { "total": 2, "severe": 0, "noticeable": 2, "garbleCount": 1, "staticCount": 0, "silenceCount": 0, "worstLabel": "noticeable" },
          "issues": [
            { "timeSec": 3.21, "endSec": 3.242, "durationMs": 32.0, "type": "click", "severity": 0.4, "audibility": 0.45, "audibilityLabel": "noticeable" },
            { "timeSec": 6.66, "endSec": 6.692, "durationMs": 32.0, "type": "garble", "severity": 0.62, "audibility": 0.58, "audibilityLabel": "noticeable" }
          ],
          "clipping": { "clipCount": 0, "clipPercentage": 0 },
          "bandwidth": { "cutoffHz": null, "spectralCentroidHz": 1824.3, "bandwidthRatio": 1.0 }
        }
      ]
    },
    "scriptAccuracy": {
      "score": 0,
      "summary": "1 tag spoken aloud across 3 files.",
      "spokenTagCount": 1,
      "tracks": [
        {
          "transcript": "Hello scoff well that was something.",
          "accuracy": 85.0,
          "wordErrorRate": 0.15,
          "flagged": true,
          "tags": {
            "found": 1,
            "spoken": [{ "tag": "[scoff]", "content": "scoff", "spokenWord": "scoff", "timeSec": 3.24, "endSec": 3.71 }]
          }
        },
        {
          "transcript": "This is the second chapter of our story.",
          "accuracy": 100.0,
          "wordErrorRate": 0.0,
          "flagged": false,
          "tags": { "found": 0, "spoken": [] }
        }
      ]
    }
  },
  "creditsUsed": 9,
  "timing": { "decode": 3.1, "checks": 4.8, "total": 8.2 }
}
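A client might reduce this response to the files that need attention. The sketch below uses only field names that appear in the example response; the helper names files_to_regenerate and check_scores are hypothetical, and the sample dict is a trimmed copy of that example.

```python
def files_to_regenerate(response: dict) -> list[str]:
    """Return one human-readable line per flagged file."""
    lines = []
    for entry in response.get("tracksToRegenerate", []):
        reasons = "; ".join(
            f"{reason['check']}: {reason['message']}" for reason in entry.get("reasons", [])
        )
        lines.append(f"{entry['file']} ({reasons})")
    return lines


def check_scores(response: dict) -> dict:
    """Map each check present in the response to its score."""
    return {name: result["score"] for name, result in response.get("checks", {}).items()}


sample = {
    "tracksToRegenerate": [
        {"file": "chapter3.mp3",
         "reasons": [{"check": "comparison", "message": "deviation 18.00%", "deviation": 0.18}]}
    ],
    "checks": {"comparison": {"score": 91.2}, "quality": {"score": 94.5}},
}
print(files_to_regenerate(sample))  # ['chapter3.mp3 (comparison: deviation 18.00%)']
print(check_scores(sample))         # {'comparison': 91.2, 'quality': 94.5}
```

Because disabled checks are absent from checks, iterating over whatever keys are present avoids hard-coding the list of check names.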
