Saltar al contenido principal

Quality Gate

Every translation passes through a deterministic validation gate before it's written to disk. The quality gate catches common machine translation failure modes — no silent fallbacks, no garbage written to your locale files.

Validation Checks

CheckWhat It CatchesGate Label
Empty/blankModel returned empty string or whitespace[GATE] empty
Source echoModel returned the original English input[GATE] source-echo
Hallucination loopRepeated trigram patterns (e.g., "Qo' Qo' Qo'")[GATE] hallucination
Length inflationOutput is significantly longer than source[GATE] length
Script complianceWrong script for the target locale[GATE] script

Empty/Blank

Rejects translations that are empty strings, whitespace-only, or null. This catches models that return nothing for difficult keys.

Source Echo

Detects when the model returns the English source text instead of translating it. Common with short strings and under-specified prompts.

Hallucination Loop

Analyzes trigram (3-character) patterns in the output. If any trigram repeats more than a threshold number of times relative to the output length, the translation is rejected. This catches degenerate outputs like "Qo' Qo' Qo' Qo' Qo'".

Length Inflation

Rejects translations where the output length exceeds maxLengthRatio × source length (default: 4×). This catches model hallucinations that produce walls of text for a short input.

Configurable via maxLengthRatio in your config.

Script Compliance

For locales with a configured script field (e.g., "script": "cans" for Plains Cree Syllabics), validates that the output contains non-ASCII characters appropriate for the target script. Latin-only output for an Arabic, CJK, or Syllabics locale is rejected.

What Happens on Failure

  1. The failing translation is logged to stderr with a [GATE] prefix, the key name, the reason, and a preview of the value
  2. The key is not written to the locale file
  3. The retry cascade kicks in (see below)
[GATE] hero.title: source-echo — "Welcome to our platform"
[GATE] nav.about: hallucination — "À À À À À À À À"

Retry Cascade

When a batch fails (JSON parse error or quality gate rejections), rosetta retries with progressively smaller batches:

Full batch (30 keys) → parse error
└→ Half batch (15 keys) → 2 failures
└→ Individual keys (1 each) → isolates the 2 problem keys

The retry budget is capped by maxRetries (default: 3, configurable per-language). This prevents runaway token spend on keys that consistently fail.

After exhausting retries, the problem keys are logged and skipped. They'll be retried on the next sync run.

Prompt Caching

The system message (register, grammar rules, style notes) is split from the user message (the keys to translate). This split is intentional:

  • The system message is identical across batches for a given locale
  • Providers like Anthropic and Google cache repeated system messages
  • Result: the first batch pays full token cost, subsequent batches pay only for the user message

This can significantly reduce token costs for projects with many batches.