Custom vocabulary helps the transcription engine recognize words it would otherwise get wrong: unusual brand names, internal project codenames, medical or legal jargon, or acronyms that sound like common words. It works by comparing the sounds (phonemes) of what was actually spoken against the sounds of the words you provide. When there’s a close enough match, the transcribed word gets swapped out for your term. This is probabilistic: it increases the odds of a correct transcription, but it does not guarantee it.
If you already know exactly which text variants the model produces and you just want to normalize the spelling, use Custom spelling instead. Custom spelling is a deterministic find-and-replace on the transcript text, with no phoneme matching and no false positives.

How it works

Custom vocabulary is a post-processing step based on phoneme similarity. Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The intensity controls how aggressively the model applies replacements: a higher intensity means the model replaces words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made. The pronunciations field lets you provide plain-text alternative spellings that reflect how the word actually sounds in speech. These are not phonetic notation: just write the word the way someone might naively spell it based on how it sounds, and Gladia converts these strings to phonemes internally. For example, if your term is “Nietzsche”, you might add ["Niche", "Neechee"] as pronunciations. This widens the phoneme net without raising the intensity (which would increase false positives across the board).
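To make the entry shape concrete, here is a small sketch. The helper function is illustrative only (it is not part of the API); the field names (value, pronunciations, intensity) mirror the example configuration later on this page:

```python
# Illustrative helper for building one entry of
# custom_vocabulary_config.vocabulary. The field names mirror the
# example configuration on this page; the helper itself is not
# part of the Gladia API.
def vocabulary_entry(value, pronunciations=None, intensity=None):
    entry = {"value": value}
    if pronunciations:
        # Plain-text sound-alike spellings, not phonetic notation.
        entry["pronunciations"] = list(pronunciations)
    if intensity is not None:
        entry["intensity"] = intensity
    return entry

# The Nietzsche example from above: two naive sound-alike spellings.
nietzsche = vocabulary_entry("Nietzsche", pronunciations=["Niche", "Neechee"])
```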

When to use custom vocabulary vs. custom spelling

Use custom spelling when the transcription already recognizes the word but writes it differently than you want. Common cases:
  • A person’s name comes through as “Gaurish” or “Gaureish” but you need “Gorish”.
  • The model writes “data-science” and you want “Data Science”.
  • You want to replace filler words or normalize punctuation (e.g. “period” → “.”).
Use custom vocabulary when the word comes out completely garbled or replaced by something phonetically similar. The transcription engine has never seen it and can’t get close on its own. Custom vocabulary uses phoneme matching to catch these cases, but it’s probabilistic and can produce false positives.
Custom vocabulary vs. custom spelling at a glance:
  • What it does: custom vocabulary listens to how a word sounds and replaces phonetically similar words in the transcript; custom spelling finds exact text strings in the transcript and replaces them with your preferred spelling.
  • Mechanism: phoneme-based similarity matching (probabilistic) vs. text-based find-and-replace (deterministic).
  • Best for: custom vocabulary suits words that are consistently mis-transcribed (unusual proper nouns, new product names, niche jargon); custom spelling suits words that are recognizable but misspelled, e.g. “Gaurish” → “Gorish”, “data-science” → “Data Science”.
  • Risk: custom vocabulary can produce false positives, since unrelated words that happen to sound similar may get replaced; custom spelling has no false positives, but it won’t help if the word isn’t recognized at all.
  • Tuning: custom vocabulary offers intensity and default_intensity to control aggressiveness; custom spelling needs none, as it either matches the text or it doesn’t.
Rule of thumb: start with a transcription run without any custom vocabulary. Look at what the output actually says. If the word appears but is just misspelled, custom spelling is the simpler and safer fix. If the word is completely garbled, that’s when custom vocabulary is the right tool.
If you’ve been using custom vocabulary and keep running into false positives for certain terms, try moving those terms to custom spelling instead. As long as the transcription produces something close enough for you to list as a variant, custom spelling will handle the rest, deterministically and without side effects. This is a common and recommended migration path.
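As a sketch of that migration, the request below keeps a genuinely unrecognized term in custom vocabulary and moves a normalization-only term to custom spelling. The custom vocabulary fields mirror this page's example configuration; the spelling_dictionary shape is an assumption here, so confirm it against the custom spelling documentation:

```python
# Hypothetical split of a term list between the two features.
# "custom_spelling_config" / "spelling_dictionary" are assumptions --
# confirm the exact shape in the custom spelling documentation.
request = {
    "audio_url": "YOUR_AUDIO_URL",
    # Keep truly unrecognized terms in custom vocabulary...
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        "vocabulary": [{"value": "Solaria"}],
        "default_intensity": 0.4,
    },
    # ...and move terms the engine already gets close to (just with
    # the wrong spelling) to deterministic custom spelling.
    "custom_spelling": True,
    "custom_spelling_config": {
        "spelling_dictionary": {"Gorish": ["Gaurish", "Gaureish"]},
    },
}
```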

Example configuration

{
  "audio_url": "YOUR_AUDIO_URL",
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Gladia",
      {"value": "Solaria"},
      {
        "value": "Salesforce",
        "pronunciations": ["sell force", "sale forces"],
        "intensity": 0.5,
        "language": "en"
      }
    ],
    "default_intensity": 0.4
  }
}
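A minimal Python sketch of submitting this configuration with the standard library. The endpoint URL and header name are assumptions for illustration, so check the API reference for the current values before relying on them:

```python
import json
import urllib.request

# Assumed endpoint and auth header -- verify against the API reference.
API_URL = "https://api.gladia.io/v2/pre-recorded"

# Same payload as the example configuration above, as a Python dict.
payload = {
    "audio_url": "YOUR_AUDIO_URL",
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        "vocabulary": [
            "Gladia",
            {"value": "Solaria"},
            {
                "value": "Salesforce",
                "pronunciations": ["sell force", "sale forces"],
                "intensity": 0.5,
                "language": "en",
            },
        ],
        "default_intensity": 0.4,
    },
}

def submit(api_key):
    # Performs the actual network call; not invoked in this sketch.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```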

Parameter reference

default_intensity
number
The global intensity applied to every vocabulary entry that doesn’t have its own intensity override (minimum 0, maximum 1, default 0.5). A higher value means the model will apply replacements more aggressively: more replacements, but more risk of unwanted swaps. A lower value requires a closer phoneme match before replacing: fewer replacements, fewer false positives.
vocabulary
object | string[]
The list of terms to boost. Each entry is either a plain string or an object with a required value plus optional pronunciations, intensity, and language fields, as shown in the example configuration above.

Tuning intensity

The default intensity is 0.5, which works well for short lists of very distinctive words. In practice, especially with longer lists or shorter words, 0.5 is often too aggressive and produces false positives. We recommend starting at 0.4, raising it only if your terms are still not being picked up, or lowering it if you see too many false positives.

default_intensity vs. per-entry intensity

  • default_intensity sets the baseline for every entry in your vocabulary list.
  • The per-entry intensity field overrides the global default for that specific word.
You can mix both. A common pattern: set default_intensity to 0.4, then lower the intensity of individual short or common-sounding entries (like the brand names “Target” or “Zoom”) to 0.2–0.3 so they don’t match too many unrelated words.
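The override rule can be sketched as a small resolver. This is illustrative only (the API applies the rule server-side); it shows which intensity ends up governing each entry:

```python
# Resolve the intensity that applies to a vocabulary entry:
# a per-entry "intensity" field wins, otherwise default_intensity.
def effective_intensity(entry, default_intensity=0.5):
    if isinstance(entry, dict) and "intensity" in entry:
        return entry["intensity"]
    return default_intensity

config = {
    "default_intensity": 0.4,
    "vocabulary": [
        "Gladia",                             # plain string -> uses 0.4
        {"value": "Zoom", "intensity": 0.2},  # short brand name -> 0.2
    ],
}

resolved = [
    effective_intensity(e, config["default_intensity"])
    for e in config["vocabulary"]
]
# resolved == [0.4, 0.2]
```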

Watch your list size

As your vocabulary list grows, the chance of false positives increases. Every transcribed word is compared against every entry, so a list of 50+ terms will naturally produce more unintended replacements than a list of 5. If you find yourself fighting false positives on a large list, consider:
  1. Lowering the intensity for the entries that cause problems.
  2. Adding specific pronunciations to narrow the phoneme matching instead of lowering intensity.
  3. Moving entries that the model already recognizes (just with wrong spelling) to custom spelling instead. This eliminates false positives entirely for those terms.
If you’re setting up custom vocabulary for the first time, here’s a step-by-step approach that will save you time:
  1. Run a transcription without any custom vocabulary. Look at the raw output and identify which words are being mis-transcribed.
  2. Separate the problems into two buckets:
    • Words that are completely wrong or garbled → these are candidates for custom vocabulary.
    • Words that are recognizable but misspelled (e.g. “Gaurish” instead of “Gorish”) → use custom spelling for these.
  3. Add your custom vocabulary entries with default_intensity set to 0.4.
  4. Run the transcription again and compare. Check that your terms are now appearing correctly.
  5. Look for false positives: words that were correct before but are now being wrongly replaced. If you spot any:
    • Lower the intensity on the entry causing the problem.
    • Add pronunciations to make the match more precise.
    • If false positives persist, move that entry to custom spelling instead.
  6. Iterate. Tuning is normal. Don’t expect to get it perfect on the first pass.
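Step 5 amounts to diffing the two runs: every position where the tuned transcript now contains one of your terms is either a fix or a false positive, and a human has to judge which. A naive word-level sketch of collecting those positions for review (real transcripts would need proper alignment; this assumes both runs split into the same number of words):

```python
# Collect every position where the tuned run (with custom vocabulary)
# replaced a word with one of your terms, compared to the baseline run.
# Each site is then reviewed by hand: some are fixes, some are false
# positives. Naive word-by-word comparison -- assumes equal word counts.
def replacement_sites(baseline, tuned, terms):
    sites = []
    for before, after in zip(baseline.split(), tuned.split()):
        if after in terms and before != after:
            sites.append((before, after))
    return sites

baseline = "please zoom in on the nitsha quote"
tuned = "please Zoom in on the Nietzsche quote"
sites = replacement_sites(baseline, tuned, {"Zoom", "Nietzsche"})
# sites == [("zoom", "Zoom"), ("nitsha", "Nietzsche")]
# Here "nitsha" -> "Nietzsche" is a fix, while "zoom" -> "Zoom"
# (the verb) is a likely false positive worth a lower intensity.
```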