How Sync Works

The sync command is rosetta's core operation. Here's what happens when you run npx i18n-rosetta sync.

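Pipeline Overview

A sync runs eight stages in order: Config Resolution → Source Scanning → Change Detection → Batching → Translation → Quality Gate → Retry Cascade → Write & Lock.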

Step by Step

1. Config Resolution

Rosetta loads i18n-rosetta.config.json (or auto-detects settings). It resolves:

  • Source locale and target locales
  • The pair graph (which source→target combinations to process)
  • Per-pair method, model, and quality settings
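As a rough illustration of how per-pair settings could be resolved, the sketch below expands one source locale and a list of targets into pairs, then layers pair-specific overrides on top of defaults. The `SketchConfig` shape, the `defaults` and `pairs` fields, and the `"en->ja"` key format are all hypothetical; the real i18n-rosetta.config.json schema may look different.

```typescript
// Hypothetical shapes: the real i18n-rosetta config schema may differ.
type Method = "llm" | "llm-coached" | "google-translate" | "api";

interface PairSettings {
  method: Method;
  model?: string;
}

interface SketchConfig {
  source: string;
  targets: string[];
  defaults: PairSettings;
  // Optional overrides keyed by "source->target", e.g. "en->ja" (assumed format).
  pairs?: Record<string, Partial<PairSettings>>;
}

interface ResolvedPair extends PairSettings {
  source: string;
  target: string;
}

// Build one resolved entry per source->target pair; overrides win over defaults.
function resolvePairs(config: SketchConfig): ResolvedPair[] {
  return config.targets.map((target) => {
    const override = config.pairs?.[`${config.source}->${target}`] ?? {};
    return { source: config.source, target, ...config.defaults, ...override };
  });
}

const resolved = resolvePairs({
  source: "en",
  targets: ["fr", "ja"],
  defaults: { method: "google-translate" },
  pairs: { "en->ja": { method: "llm" } },
});
console.log(resolved.map((p) => `${p.source}->${p.target}: ${p.method}`));
// [ 'en->fr: google-translate', 'en->ja: llm' ]
```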

2. Source Scanning

The source locale file is loaded and flattened into a key→value map:

// Input (nested)
{ "hero": { "title": "Welcome", "subtitle": "Build" } }

// Flattened
{ "hero.title": "Welcome", "hero.subtitle": "Build" }
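A minimal sketch of this kind of flattening, assuming every leaf in the locale file is a string and that keys are joined with a dot. The function name `flatten` is illustrative, not rosetta's internal API.

```typescript
// A locale file is either a string leaf or an object of further entries.
type Json = string | { [key: string]: Json };

// Walk the tree depth-first, joining key segments with "." on the way down.
function flatten(
  node: Json,
  prefix = "",
  out: Record<string, string> = {}
): Record<string, string> {
  if (typeof node === "string") {
    out[prefix] = node;
    return out;
  }
  for (const key of Object.keys(node)) {
    flatten(node[key], prefix ? `${prefix}.${key}` : key, out);
  }
  return out;
}

const flat = flatten({ hero: { title: "Welcome", subtitle: "Build" } });
console.log(flat);
// { 'hero.title': 'Welcome', 'hero.subtitle': 'Build' }
```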

3. Change Detection

Rosetta reads .i18n-rosetta.lock, which stores SHA-256 hashes of previously translated source values. For each key, it checks:

| Condition | Action |
| --- | --- |
| Key missing from target | Translate |
| Source hash changed since last sync | Re-translate (stale) |
| Target value starts with [EN] | Re-translate (fallback placeholder) |
| Source hash unchanged, key exists | Skip |

This is why rosetta translates only what changed instead of re-translating your entire file on every sync.
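The decision rules can be sketched as a single classification function. This assumes the lock file maps each key to the hex SHA-256 of the source value at the last sync; the actual layout of .i18n-rosetta.lock is not documented here, and `classify` and the action names are hypothetical.

```typescript
import { createHash } from "crypto";

type Action = "translate" | "retranslate-stale" | "retranslate-placeholder" | "skip";

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// lock: key -> hash of the source value at the last sync (assumed layout).
function classify(
  key: string,
  source: Record<string, string>,
  target: Record<string, string>,
  lock: Record<string, string>
): Action {
  const current = target[key];
  if (current === undefined) return "translate";
  if (lock[key] !== sha256(source[key])) return "retranslate-stale";
  if (current.slice(0, 4) === "[EN]") return "retranslate-placeholder";
  return "skip";
}

const source = {
  "hero.title": "Welcome",
  "hero.subtitle": "Build faster", // edited since the last sync
  "nav.about": "About",
  "nav.home": "Home", // new key
};
const target = {
  "hero.title": "Bienvenue",
  "hero.subtitle": "Construire",
  "nav.about": "[EN] About",
};
const lock = {
  "hero.title": sha256("Welcome"),
  "hero.subtitle": sha256("Build"),
  "nav.about": sha256("About"),
};

const actions = Object.keys(source).map((k) => `${k}: ${classify(k, source, target, lock)}`);
console.log(actions);
```

Only `hero.title` is skipped in this example; the other three keys each hit a different row of the table.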

4. Batching

Keys are grouped into batches (default: 30 keys/batch for LLM, 128 for Google Translate). Batching reduces API round trips while keeping prompts manageable.
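Batching of this kind is a plain fixed-size chunking. A minimal sketch, using the default sizes quoted above; `toBatches` and `DEFAULT_BATCH_SIZE` are illustrative names rather than rosetta's own.

```typescript
// Default sizes quoted in the text above.
const DEFAULT_BATCH_SIZE = { llm: 30, "google-translate": 128 };

// Split a list into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const keys: string[] = [];
for (let i = 0; i < 70; i++) keys.push(`key.${i}`);

const batchSizes = toBatches(keys, DEFAULT_BATCH_SIZE.llm).map((b) => b.length);
console.log(batchSizes); // [ 30, 30, 10 ]
```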

5. Translation

Each batch is sent to the configured translation method:

  • llm: Structured prompt to OpenRouter with register instructions
  • llm-coached: Same, but with grammar rules, dictionary, and style notes injected
  • google-translate: Google Cloud Translation API v2 batch request
  • api: HTTP POST to a remote endpoint

The system message (register, rules) is identical across batches for a given locale, which enables prompt caching: providers like Anthropic and Google can cache repeated system messages, reducing token costs.
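To illustrate why a stable system message matters, the sketch below builds chat messages so that the system part depends only on the locale and register, while the batch travels in the user message. The prompt wording and the `buildMessages` helper are hypothetical; rosetta's real prompts are not shown on this page.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical prompt wording. It depends only on locale and register,
// so it is byte-identical for every batch of the same locale.
function systemMessage(targetLocale: string, register: string): ChatMessage {
  return {
    role: "system",
    content:
      `Translate UI strings into ${targetLocale}. Use a ${register} register. ` +
      `Reply with a JSON object that keeps the same keys.`,
  };
}

// Only the user message changes from batch to batch.
function buildMessages(
  targetLocale: string,
  register: string,
  batch: Record<string, string>
): ChatMessage[] {
  return [
    systemMessage(targetLocale, register),
    { role: "user", content: JSON.stringify(batch) },
  ];
}

const first = buildMessages("ja", "polite", { "hero.title": "Welcome" });
const second = buildMessages("ja", "polite", { "nav.about": "About" });
console.log(first[0].content === second[0].content); // true
```

Keeping batch-specific data out of the system message is what lets a provider-side cache reuse the shared prefix.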

6. Quality Gate

Every translation is validated before it's written to disk. Five checks run:

| Check | What it catches | Example |
| --- | --- | --- |
| Empty/blank | Model returned nothing | "" |
| Source echo | Model returned the English input | "Welcome" for Japanese |
| Hallucination loop | Repeated trigrams | "Qo' Qo' Qo' Qo'" |
| Length inflation | Output is 4×+ longer than source | 10-char source → 50-char output |
| Script compliance | Wrong script for the locale | Latin text for Arabic locale |

Failures are logged with a [GATE] prefix. No silent fallbacks.

See Quality Gate for details.
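A simplified sketch of how the five checks might be chained. The thresholds are assumptions where the page gives none: a word trigram seen twice counts as a loop, and the script table covers only two locales with hand-picked code point ranges. The `gate` function and failure names are hypothetical.

```typescript
type GateFailure =
  | "empty"
  | "source-echo"
  | "hallucination-loop"
  | "length-inflation"
  | "wrong-script";

// Hypothetical script table: [start, end] code point ranges per locale.
const SCRIPT_RANGES: Record<string, Array<[number, number]>> = {
  ar: [[0x0600, 0x06ff]],
  ja: [[0x3040, 0x30ff], [0x4e00, 0x9fff]],
};

// Assumption: any word trigram that occurs twice marks a loop.
function hasRepeatedTrigram(text: string): boolean {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const seen: Record<string, boolean> = {};
  for (let i = 0; i + 3 <= words.length; i++) {
    const trigram = words.slice(i, i + 3).join(" ");
    if (seen[trigram]) return true;
    seen[trigram] = true;
  }
  return false;
}

// Passes if at least one character falls in the locale's expected ranges.
function usesExpectedScript(text: string, locale: string): boolean {
  const ranges = SCRIPT_RANGES[locale];
  if (!ranges) return true; // no rule for this locale in the sketch
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    for (const [start, end] of ranges) {
      if (code >= start && code <= end) return true;
    }
  }
  return false;
}

// Returns the first failed check, or null when the translation passes.
function gate(source: string, output: string, locale: string): GateFailure | null {
  const trimmed = output.trim();
  if (trimmed === "") return "empty";
  if (trimmed === source.trim()) return "source-echo";
  if (hasRepeatedTrigram(trimmed)) return "hallucination-loop";
  if (trimmed.length >= source.length * 4) return "length-inflation";
  if (!usesExpectedScript(trimmed, locale)) return "wrong-script";
  return null;
}

console.log(gate("Welcome", "Qo' Qo' Qo' Qo'", "tlh")); // hallucination-loop
console.log(gate("Welcome", "Marhaban", "ar"));         // wrong-script
console.log(gate("Welcome", "ようこそ", "ja"));          // null
```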

7. Retry Cascade

On JSON parse failure or batch-level errors, rosetta retries with progressively smaller batches:

Full batch (30 keys) → Failed
Half batch (15 keys) → Failed
Individual keys (1 each) → Isolates the problem key

The retry budget is capped by maxRetries (default: 3) to prevent runaway token spend.
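The cascade can be sketched as a loop over shrinking batch sizes. Two assumptions are baked in: each re-chunked round spends one retry from the `maxRetries` budget (the page does not say how retries are counted), and the translate call is synchronous to keep the example short, whereas real API calls would be asynchronous. `retryCascade` and `TranslateFn` are hypothetical names.

```typescript
// Stand-in for a translation call; it throws when the batch fails.
type TranslateFn = (batch: string[]) => Record<string, string>;

interface CascadeResult {
  translated: Record<string, string>;
  failed: string[];
}

function chunk(keys: string[], size: number): string[][] {
  const out: string[][] = [];
  for (let i = 0; i < keys.length; i += size) out.push(keys.slice(i, i + size));
  return out;
}

function retryCascade(keys: string[], translate: TranslateFn, maxRetries = 3): CascadeResult {
  const translated: Record<string, string> = {};
  let pending = keys;
  // Round 0 is the full batch; later rounds use half-size, then single-key batches.
  const sizes = [keys.length, Math.ceil(keys.length / 2), 1];
  for (let round = 0; round < sizes.length && pending.length > 0; round++) {
    if (round > maxRetries) break; // assumption: each round after the first spends one retry
    const stillFailing: string[] = [];
    for (const batch of chunk(pending, sizes[round])) {
      try {
        const result = translate(batch);
        for (const key of Object.keys(result)) translated[key] = result[key];
      } catch {
        for (const key of batch) stillFailing.push(key);
      }
    }
    pending = stillFailing;
  }
  return { translated, failed: pending };
}

// Fake translator: any batch containing "k7" fails.
const fakeTranslate: TranslateFn = (batch) => {
  if (batch.indexOf("k7") !== -1) throw new Error("bad JSON");
  const out: Record<string, string> = {};
  for (const key of batch) out[key] = `translated:${key}`;
  return out;
};

const allKeys: string[] = [];
for (let i = 0; i < 30; i++) allKeys.push(`k${i}`);

const result = retryCascade(allKeys, fakeTranslate);
console.log(result.failed); // [ 'k7' ]
console.log(Object.keys(result.translated).length); // 29
```

In this run the full batch fails, the second half succeeds at 15 keys, and the single-key round isolates `k7` while the other 14 keys in its half go through.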

8. Write & Lock

Passing translations are written to the target locale file, preserving the original nesting structure. The lock file is updated with new SHA-256 hashes.
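Restoring the nesting is the inverse of the flattening in step 2. A minimal sketch, assuming dotted keys and string leaves; `unflatten` is an illustrative name, and keys that contain a literal dot would be ambiguous under this scheme.

```typescript
type Nested = { [key: string]: string | Nested };

// Rebuild the nested object by walking each dotted path segment by segment.
function unflatten(flat: Record<string, string>): Nested {
  const root: Nested = {};
  for (const path of Object.keys(flat)) {
    const parts = path.split(".");
    let node = root;
    for (let i = 0; i < parts.length - 1; i++) {
      const child = node[parts[i]];
      if (typeof child === "object") {
        node = child;
      } else {
        // Create the intermediate object the first time a prefix is seen.
        const created: Nested = {};
        node[parts[i]] = created;
        node = created;
      }
    }
    node[parts[parts.length - 1]] = flat[path];
  }
  return root;
}

const nested = unflatten({ "hero.title": "Bienvenue", "hero.subtitle": "Construire" });
console.log(JSON.stringify(nested));
// {"hero":{"title":"Bienvenue","subtitle":"Construire"}}
```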

Partial Success

One failed batch doesn't block the rest. If 9 out of 10 batches succeed, those 9 are written. The failed batch is logged, and you can re-run sync to retry.

Dry Run

Preview what would change without writing any files:

npx i18n-rosetta sync --dry

Force Re-translate

Force specific keys to be re-translated even if unchanged:

npx i18n-rosetta sync --force-keys "hero.title,nav.about"

Cost Estimation

Before translating, rosetta generates a pre-sync cost report showing estimated costs per pair. This runs automatically during every sync — you see it before any API calls are made.

╔══════════════════════════════════════════════════════════╗
║                      Cost Estimate                       ║
╠════════════╦═══════╦════════════╦════════════════════════╣
║ Pair       ║ Keys  ║ Est. Cost  ║ Method                 ║
╠════════════╬═══════╬════════════╬════════════════════════╣
║ en → fr    ║ 142   ║ $0.07      ║ google-translate       ║
║ en → ja    ║ 38    ║ —          ║ llm (model-dependent)  ║
║ en → crk   ║ 38    ║ —          ║ llm-coached            ║
╚════════════╩═══════╩════════════╩════════════════════════╝

What Gets Estimated

Each translation method provides its own cost estimate:

| Method | Cost Basis | Precision |
| --- | --- | --- |
| google-translate | Google's published rate ($20/million chars) | Accurate |
| llm | Varies by OpenRouter model | Model-dependent; check OpenRouter pricing |
| llm-coached | Same as llm plus coaching context tokens | Model-dependent |
| api | Server-determined | Unknown; cannot estimate without querying the endpoint |

When a method can't determine the cost (LLM methods, remote APIs), rosetta shows a dash in the Est. Cost column rather than guessing. Use --dry to see cost estimates without actually translating.
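The google-translate estimate is simple arithmetic on the character count. The sketch below applies the $20 per million characters rate from the table and returns `null` for methods whose cost cannot be determined. The function names are hypothetical, the 142 × 25-character sample is made up to land near the report's figure, and `.length` counts UTF-16 code units, which may differ from billed characters for some scripts.

```typescript
type Method = "llm" | "llm-coached" | "google-translate" | "api";

// Rate quoted in the table above.
const GOOGLE_USD_PER_MILLION_CHARS = 20;

// Returns an estimate in USD, or null when the method cannot be priced up front.
function estimateCost(method: Method, sourceValues: string[]): number | null {
  if (method !== "google-translate") return null;
  let chars = 0;
  for (const value of sourceValues) chars += value.length;
  return (chars / 1e6) * GOOGLE_USD_PER_MILLION_CHARS;
}

// Mirrors the report: a dash when no estimate is available.
function formatCost(cost: number | null): string {
  return cost === null ? "—" : `$${cost.toFixed(2)}`;
}

// Made-up sample: 142 values of 25 characters each = 3,550 characters.
const sample: string[] = [];
for (let i = 0; i < 142; i++) sample.push("abcdefghijklmnopqrstuvwxy");

console.log(formatCost(estimateCost("google-translate", sample))); // $0.07
console.log(formatCost(estimateCost("llm", sample)));              // —
```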