If you already know exactly which text variants the model produces and you
just want to normalize the spelling, use Custom
spelling instead. Custom
spelling is a deterministic find-and-replace on the transcript text, with no
phoneme matching and no false positives.
How it works
Custom vocabulary is based on phoneme similarity rather than exact text matching. Once the transcription is generated, Gladia converts both the transcribed words and your vocabulary entries into phonemes, then compares them. The `intensity` parameter controls how aggressively the model applies replacements: a higher intensity means the model will replace words more readily (wider phoneme matching), while a lower intensity requires a closer phoneme match before a replacement is made.
The `pronunciations` field lets you provide plain-text alternative spellings that reflect how the word actually sounds in speech. These are not phonetic notation: just write the word the way someone might naively spell it based on how it sounds. Gladia converts these strings to phonemes internally. For example, if your term is “Nietzsche”, you might add ["Niche", "Neechee"] as pronunciations. This widens the phoneme net without having to raise the intensity (which would increase false positives across the board).
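To make the mechanism concrete, here is a toy sketch of phoneme-style matching. This is not Gladia's actual algorithm: `crude_phonemes` stands in for a real phonemizer, and the mapping from intensity to a similarity threshold is invented purely for illustration.

```python
from difflib import SequenceMatcher

def crude_phonemes(word: str) -> str:
    # Toy stand-in for a real phonemizer: lowercase and collapse a few
    # sound-alike spellings. Gladia's internal phonemizer is far more
    # sophisticated; this only illustrates the mechanism.
    w = word.lower()
    for src, dst in [("ph", "f"), ("ck", "k"), ("ee", "i"), ("sch", "sh")]:
        w = w.replace(src, dst)
    return w

def maybe_replace(transcribed, term, pronunciations=(), intensity=0.5):
    # Compare the transcribed word against the term and each alternative
    # pronunciation. A higher intensity lowers the similarity required,
    # so replacements happen more readily (and false positives rise).
    threshold = 1.0 - 0.5 * intensity  # invented intensity-to-threshold mapping
    for candidate in (term, *pronunciations):
        score = SequenceMatcher(
            None, crude_phonemes(transcribed), crude_phonemes(candidate)
        ).ratio()
        if score >= threshold:
            return term
    return transcribed
```

Note how adding pronunciations rescues a match at a moderate intensity that the spelling of the term alone would miss, which is exactly the trade-off described above.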
When to use custom vocabulary vs. custom spelling
Use custom spelling when the transcription already recognizes the word but writes it differently than you want. Common cases:
- A person’s name comes through as “Gaurish” or “Gaureish” but you need “Gorish”.
- The model writes “data-science” and you want “Data Science”.
- You want to replace filler words or normalize punctuation (e.g. “period” → ”.”).
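The cases above could be expressed as a custom spelling configuration along these lines. The field names (`custom_spelling`, `custom_spelling_config`, `spelling_dictionary`) are assumptions about Gladia's transcription API; verify them against the API reference:

```json
{
  "custom_spelling": true,
  "custom_spelling_config": {
    "spelling_dictionary": {
      "Gorish": ["Gaurish", "Gaureish"],
      "Data Science": ["data-science"],
      ".": ["period"]
    }
  }
}
```

Each key is the spelling you want in the output; each value lists the exact strings to find and replace.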
| | Custom vocabulary | Custom spelling |
|---|---|---|
| What it does | Listens to how a word sounds and replaces phonetically similar words in the transcript | Finds exact text strings in the transcript and replaces them with your preferred spelling |
| Mechanism | Phoneme-based similarity matching (probabilistic) | Text-based find-and-replace (deterministic) |
| Best for | Words that are consistently mis-transcribed: unusual proper nouns, new product names, niche jargon | Words that are recognizable but misspelled, e.g. “Gaurish” → “Gorish”, “data-science” → “Data Science” |
| Risk | Can produce false positives. Unrelated words that happen to sound similar may get replaced | No false positives, but it won’t help if the word isn’t recognized at all |
| Tuning | Adjust `intensity` and `default_intensity` to control aggressiveness | None needed. It either matches the text or it doesn’t |
Example configuration
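A minimal request fragment might look like the following. The field names (`custom_vocabulary`, `custom_vocabulary_config`, `vocabulary`, `value`) reflect our understanding of Gladia's transcription API; verify against the API reference:

```json
{
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Gladia",
      {
        "value": "Nietzsche",
        "pronunciations": ["Niche", "Neechee"]
      },
      {
        "value": "Gorish",
        "intensity": 0.6
      }
    ],
    "default_intensity": 0.4
  }
}
```

Plain strings use the `default_intensity`; object entries can carry their own `intensity` or `pronunciations`.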
Parameter reference
`default_intensity`
The global intensity applied to every vocabulary entry that doesn’t have its own `intensity` override (minimum 0, maximum 1, default 0.5). A higher value means the model will apply replacements more aggressively: more replacements, but more risk of unwanted swaps. A lower value requires a closer phoneme match before replacing: fewer replacements, fewer false positives.
Tuning intensity
The default intensity is 0.5, which works well for short lists of very distinctive words. But in practice, especially with longer lists or shorter words, 0.5 is often too aggressive and produces false positives. We recommend starting at 0.4 and raising it only if you notice that your terms are still not being picked up, or lowering it if you see too many false positives.
`default_intensity` vs. per-entry `intensity`
- `default_intensity` sets the baseline for every entry in your vocabulary list.
- The per-entry `intensity` field overrides the global default for that specific word.
For example, set `default_intensity` to 0.4, then lower individual short or common-sounding words (like the brand names “Target” or “Zoom”) down to 0.2-0.3 to avoid them matching too many unrelated words.
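That pattern could look like this in configuration. The field names follow the same hedged assumptions as elsewhere in this page, and the entries and values are purely illustrative:

```json
{
  "custom_vocabulary_config": {
    "default_intensity": 0.4,
    "vocabulary": [
      { "value": "Kubernetes" },
      { "value": "Target", "intensity": 0.25 },
      { "value": "Zoom", "intensity": 0.25 }
    ]
  }
}
```

The distinctive term keeps the 0.4 baseline, while the short, common-sounding brand names get tighter per-entry thresholds.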
Watch your list size
As your vocabulary list grows, the chance of false positives increases. Every transcribed word is compared against every entry, so a list of 50+ terms will naturally produce more unintended replacements than a list of 5. If you find yourself fighting false positives on a large list, consider:
- Lowering the `intensity` for the entries that cause problems.
- Adding specific `pronunciations` to narrow the phoneme matching instead of lowering intensity.
- Moving entries that the model already recognizes (just with wrong spelling) to custom spelling instead. This eliminates false positives entirely for those terms.
Recommended workflow
If you’re setting up custom vocabulary for the first time, here’s a step-by-step approach that will save you time:
1. Run a transcription without any custom vocabulary. Look at the raw output and identify which words are being mis-transcribed.
2. Separate the problems into two buckets:
   - Words that are completely wrong or garbled → these are candidates for custom vocabulary.
   - Words that are recognizable but misspelled (e.g. “Gaurish” instead of “Gorish”) → use custom spelling for these.
3. Add your custom vocabulary entries with `default_intensity` set to 0.4.
4. Run the transcription again and compare. Check that your terms are now appearing correctly.
5. Look for false positives: words that were correct before but are now being wrongly replaced. If you spot any:
   - Lower the `intensity` on the entry causing the problem.
   - Add `pronunciations` to make the match more precise.
   - If false positives persist, move that entry to custom spelling instead.
6. Iterate. Tuning is normal. Don’t expect to get it perfect on the first pass.
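When comparing the baseline and tuned transcripts, a crude word-by-word diff can surface candidate false positives. A minimal sketch, assuming both transcripts have the same word count (real transcripts may need proper alignment, e.g. with `difflib.SequenceMatcher`):

```python
def changed_words(baseline: str, tuned: str) -> list[tuple[str, str]]:
    # Pair up words that differ between two transcripts of the same audio.
    # Words that were correct in the baseline but differ after enabling
    # custom vocabulary are candidates for false positives.
    # Comparison is deliberately case-sensitive: vocabulary replacements
    # often change only casing (e.g. "target" -> "Target").
    return [
        (before, after)
        for before, after in zip(baseline.split(), tuned.split())
        if before != after
    ]
```

Any pair this returns that you did not intend to change is a candidate for a lower per-entry intensity or a move to custom spelling.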