Custom vocabulary helps the transcription engine recognize words it would otherwise get wrong: unusual brand names, internal project codenames, medical or legal jargon, or acronyms that sound like common words. It works by comparing the sounds (phonemes) of what was actually spoken against the sounds of the words you provide. When there’s a close enough match, the transcribed word gets swapped out for your term. This is probabilistic: it increases the odds of a correct transcription, but it does not guarantee it.
If you already know exactly which text variants the model produces and you just want to normalize the spelling, use Custom spelling instead. Custom spelling is a deterministic find-and-replace on the transcript text, with no phoneme matching and no false positives.

How it works

Custom vocabulary is a post-processing step based on phoneme similarity. Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The intensity controls how aggressively the model applies replacements: a higher intensity means the model replaces words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made. The pronunciations field lets you provide plain-text alternative spellings that reflect how the word actually sounds in speech. These are not phonetic notation: just write the word the way someone might naively spell it based on how it sounds, and Gladia converts these strings to phonemes internally. For example, if your term is “Nietzsche”, you might add ["Niche", "Neechee"] as pronunciations. This widens the phoneme net without raising the intensity (which would increase false positives across the board).
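To make the entry shape concrete, here is a small sketch. The helper function is illustrative only (it is not part of the API); the field names (value, pronunciations, intensity) mirror the example configuration later on this page:

```python
# Illustrative helper for building one entry of
# custom_vocabulary_config.vocabulary. The field names mirror the
# example configuration on this page; the helper itself is not
# part of the Gladia API.
def vocabulary_entry(value, pronunciations=None, intensity=None):
    entry = {"value": value}
    if pronunciations:
        # Plain-text sound-alike spellings, not phonetic notation.
        entry["pronunciations"] = list(pronunciations)
    if intensity is not None:
        entry["intensity"] = intensity
    return entry

# The Nietzsche example from above: two naive sound-alike spellings.
nietzsche = vocabulary_entry("Nietzsche", pronunciations=["Niche", "Neechee"])
```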

When to use custom vocabulary vs. custom spelling

Use custom spelling when the transcription already recognizes the word but writes it differently than you want. Common cases:
  • A person’s name comes through as “Gaurish” or “Gaureish” but you need “Gorish”.
  • The model writes “data-science” and you want “Data Science”.
  • You want to replace filler words or normalize punctuation (e.g. “period” → “.”).
Use custom vocabulary when the word comes out completely garbled or replaced by something phonetically similar. The transcription engine has never seen it and can’t get close on its own. Custom vocabulary uses phoneme matching to catch these cases, but it’s probabilistic and can produce false positives.
Custom vocabulary vs. custom spelling at a glance:
  • What it does: custom vocabulary listens to how a word sounds and replaces phonetically similar words in the transcript; custom spelling finds exact text strings in the transcript and replaces them with your preferred spelling.
  • Mechanism: phoneme-based similarity matching (probabilistic) vs. text-based find-and-replace (deterministic).
  • Best for: custom vocabulary suits words that are consistently mis-transcribed (unusual proper nouns, new product names, niche jargon); custom spelling suits words that are recognizable but misspelled, e.g. “Gaurish” → “Gorish”, “data-science” → “Data Science”.
  • Risk: custom vocabulary can produce false positives, since unrelated words that happen to sound similar may get replaced; custom spelling has no false positives, but it won’t help if the word isn’t recognized at all.
  • Tuning: custom vocabulary offers intensity and default_intensity to control aggressiveness; custom spelling needs none, as it either matches the text or it doesn’t.
Rule of thumb: start with a transcription run without any custom vocabulary. Look at what the output actually says. If the word appears but is just misspelled, custom spelling is the simpler and safer fix. If the word is completely garbled, that’s when custom vocabulary is the right tool.
If you’ve been using custom vocabulary and keep running into false positives for certain terms, try moving those terms to custom spelling instead. As long as the transcription produces something close enough for you to list as a variant, custom spelling will handle the rest, deterministically and without side effects. This is a common and recommended migration path.
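As a sketch of that migration, the request below keeps a genuinely unrecognized term in custom vocabulary and moves a normalization-only term to custom spelling. The custom vocabulary fields mirror this page's example configuration; the spelling_dictionary shape is an assumption here, so confirm it against the custom spelling documentation:

```python
# Hypothetical split of a term list between the two features.
# "custom_spelling_config" / "spelling_dictionary" are assumptions --
# confirm the exact shape in the custom spelling documentation.
request = {
    "audio_url": "YOUR_AUDIO_URL",
    # Keep truly unrecognized terms in custom vocabulary...
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        "vocabulary": [{"value": "Solaria"}],
        "default_intensity": 0.4,
    },
    # ...and move terms the engine already gets close to (just with
    # the wrong spelling) to deterministic custom spelling.
    "custom_spelling": True,
    "custom_spelling_config": {
        "spelling_dictionary": {"Gorish": ["Gaurish", "Gaureish"]},
    },
}
```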

Example configuration

{
  "audio_url": "YOUR_AUDIO_URL",
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Gladia",
      {"value": "Solaria"},
      {
        "value": "Salesforce",
        "pronunciations": ["sell force", "sale forces"],
        "intensity": 0.5,
        "language": "en"
      }
    ],
    "default_intensity": 0.4
  }
}
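A minimal Python sketch of submitting this configuration with the standard library. The endpoint URL and header name are assumptions for illustration, so check the API reference for the current values before relying on them:

```python
import json
import urllib.request

# Assumed endpoint and auth header -- verify against the API reference.
API_URL = "https://api.gladia.io/v2/pre-recorded"

# Same payload as the example configuration above, as a Python dict.
payload = {
    "audio_url": "YOUR_AUDIO_URL",
    "custom_vocabulary": True,
    "custom_vocabulary_config": {
        "vocabulary": [
            "Gladia",
            {"value": "Solaria"},
            {
                "value": "Salesforce",
                "pronunciations": ["sell force", "sale forces"],
                "intensity": 0.5,
                "language": "en",
            },
        ],
        "default_intensity": 0.4,
    },
}

def submit(api_key):
    # Performs the actual network call; not invoked in this sketch.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```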

Parameter reference

default_intensity
number
The global intensity applied to every vocabulary entry that doesn’t have its own intensity override (minimum 0, maximum 1, default 0.5). A higher value means the model will apply replacements more aggressively: more replacements, but more risk of unwanted swaps. A lower value requires a closer phoneme match before replacing: fewer replacements, fewer false positives.
vocabulary
object | string[]
The list of terms to boost. Each entry is either a plain string or an object with a required value plus optional pronunciations, intensity, and language fields, as shown in the example configuration above.

Tuning intensity

The default intensity is 0.5, which works well for short lists of very distinctive words. In practice, especially with longer lists or shorter words, 0.5 is often too aggressive and produces false positives. We recommend starting at 0.4, raising it only if your terms are still not being picked up, or lowering it if you see too many false positives.

default_intensity vs. per-entry intensity

  • default_intensity sets the baseline for every entry in your vocabulary list.
  • The per-entry intensity field overrides the global default for that specific word.
You can mix both. A common pattern: set default_intensity to 0.4, then lower the intensity of individual short or common-sounding entries (like the brand names “Target” or “Zoom”) to 0.2–0.3 so they don’t match too many unrelated words.
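The override rule can be sketched as a small resolver. This is illustrative only (the API applies the rule server-side); it shows which intensity ends up governing each entry:

```python
# Resolve the intensity that applies to a vocabulary entry:
# a per-entry "intensity" field wins, otherwise default_intensity.
def effective_intensity(entry, default_intensity=0.5):
    if isinstance(entry, dict) and "intensity" in entry:
        return entry["intensity"]
    return default_intensity

config = {
    "default_intensity": 0.4,
    "vocabulary": [
        "Gladia",                             # plain string -> uses 0.4
        {"value": "Zoom", "intensity": 0.2},  # short brand name -> 0.2
    ],
}

resolved = [
    effective_intensity(e, config["default_intensity"])
    for e in config["vocabulary"]
]
# resolved == [0.4, 0.2]
```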

Watch your list size

As your vocabulary list grows, the chance of false positives increases. Every transcribed word is compared against every entry, so a list of 50+ terms will naturally produce more unintended replacements than a list of 5. If you find yourself fighting false positives on a large list, consider:
  1. Lowering the intensity for the entries that cause problems.
  2. Adding specific pronunciations to narrow the phoneme matching instead of lowering intensity.
  3. Moving entries that the model already recognizes (just with wrong spelling) to custom spelling instead. This eliminates false positives entirely for those terms.
If you’re setting up custom vocabulary for the first time, here’s a step-by-step approach that will save you time:
  1. Run a transcription without any custom vocabulary. Look at the raw output and identify which words are being mis-transcribed.
  2. Separate the problems into two buckets:
    • Words that are completely wrong or garbled → these are candidates for custom vocabulary.
    • Words that are recognizable but misspelled (e.g. “Gaurish” instead of “Gorish”) → use custom spelling for these.
  3. Add your custom vocabulary entries with default_intensity set to 0.4.
  4. Run the transcription again and compare. Check that your terms are now appearing correctly.
  5. Look for false positives: words that were correct before but are now being wrongly replaced. If you spot any:
    • Lower the intensity on the entry causing the problem.
    • Add pronunciations to make the match more precise.
    • If false positives persist, move that entry to custom spelling instead.
  6. Iterate. Tuning is normal. Don’t expect to get it perfect on the first pass.
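Step 5 amounts to diffing the two runs: every position where the tuned transcript now contains one of your terms is either a fix or a false positive, and a human has to judge which. A naive word-level sketch of collecting those positions for review (real transcripts would need proper alignment; this assumes both runs split into the same number of words):

```python
# Collect every position where the tuned run (with custom vocabulary)
# replaced a word with one of your terms, compared to the baseline run.
# Each site is then reviewed by hand: some are fixes, some are false
# positives. Naive word-by-word comparison -- assumes equal word counts.
def replacement_sites(baseline, tuned, terms):
    sites = []
    for before, after in zip(baseline.split(), tuned.split()):
        if after in terms and before != after:
            sites.append((before, after))
    return sites

baseline = "please zoom in on the nitsha quote"
tuned = "please Zoom in on the Nietzsche quote"
sites = replacement_sites(baseline, tuned, {"Zoom", "Nietzsche"})
# sites == [("zoom", "Zoom"), ("nitsha", "Nietzsche")]
# Here "nitsha" -> "Nietzsche" is a fix, while "zoom" -> "Zoom"
# (the verb) is a likely false positive worth a lower intensity.
```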