Script Converters

Script converters are deterministic, LLM-free post-translation hooks that convert text from one writing system to another. They enable a "translate once, render in multiple scripts" workflow — you translate into a working script (typically Latin), then convert to the display script automatically.

Why Script Converters?

Some languages use multiple scripts for the same spoken language:

  • Plains Cree: SRO (Latin) for editing → Syllabics (ᓀᐦᐃᔭᐍᐏᐣ) for display
  • Serbian: Latin for international use → Cyrillic for domestic use
  • Klingon: Romanization for typing → pIqaD (rendered with a custom font) for display

Translating directly into non-Latin scripts creates problems: LLMs hallucinate characters, JSON files become hard to version-control, and diffs are difficult to review. Script converters solve this by keeping translations in a version-control-friendly script and converting deterministically at sync time.

Available Converters

Rosetta ships with five built-in script converters:

| Locale | From | To | Type | Font Required? |
|---|---|---|---|---|
| crk | SRO (Standard Roman Orthography) | Cree Syllabics | Deterministic | No (native Unicode) |
| sr | Latin | Cyrillic | Deterministic | No (native Unicode) |
| tlh | Romanization | pIqaD | Deterministic | Yes (PUA U+F8D0–F8FF) |
| x-elvish-s | Latin | Tengwar (Mode of Beleriand) | Deterministic | Yes (PUA U+E000–E07F) |
| x-kryptonian | Latin | Kryptonian | Font-based cipher | Yes (PUA U+E100–E119) |

Deterministic vs. Font-Based

  • Deterministic converters (Cree, Serbian, Klingon, Tengwar) perform real character-to-character mapping using linguistic rules. Cree and Serbian output standard Unicode characters; Klingon and Tengwar output PUA codepoints, so their output still needs a font to render.
  • Font-based converters (Kryptonian) are 1:1 substitution ciphers where the output is Unicode PUA characters that only render correctly with a specific font loaded.
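
To make the distinction concrete, here is a minimal sketch of what a font-based cipher amounts to. It assumes that A–Z map in alphabetical order onto U+E100–U+E119 and that lowercase input is folded to uppercase; neither detail is confirmed by this page, and `toKryptonianSketch` is a hypothetical name rather than a function from Rosetta.

```javascript
// Hypothetical sketch of a 1:1 font-based cipher; not Rosetta's actual code.
// Assumption: 'A'..'Z' map in order onto U+E100..U+E119 (26 codepoints).
const KRYPTONIAN_PUA_BASE = 0xe100;

function toKryptonianSketch(text) {
  let out = '';
  for (const ch of text) {
    if (/^[A-Za-z]$/.test(ch)) {
      // Assumption: case is folded, so 'a' and 'A' share one glyph.
      const offset = ch.toUpperCase().charCodeAt(0) - 'A'.charCodeAt(0);
      out += String.fromCodePoint(KRYPTONIAN_PUA_BASE + offset);
    } else {
      out += ch; // everything else passes through unchanged
    }
  }
  return out;
}
```

No linguistic rules are involved: each letter is shifted into the Private Use Area, and the font supplies the glyph shapes.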

How They Work

Script converters run after translation as a post-processing step. The pipeline is:

Source (English) → LLM Translation → Working Script → Script Converter → Display Script

For example, Plains Cree:

"Welcome" → LLM → "tānisi" (SRO) → Converter → "ᑖᓂᓯ" (Syllabics)

Greedy Left-to-Right Matching

All converters use the same algorithm: at each character position, try the longest possible match first, then progressively shorter matches. Characters that don't match any pattern (spaces, punctuation, numbers) pass through unchanged.

This handles digraphs and trigraphs correctly:

  • Klingon: tlh → single pIqaD character (not t + l + h)
  • Serbian: nj → њ (not н + ј)
  • Cree: twê → single syllabic (not t + w + ê)
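
The matching strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the code in lib/scripts.js: `SAMPLE_MAP` is a small hypothetical subset of a Serbian Latin-to-Cyrillic table, and `greedyConvert` is a hypothetical name.

```javascript
// Illustrative subset of a Serbian Latin → Cyrillic table (not the full map).
const SAMPLE_MAP = new Map([
  ['nj', 'њ'], ['lj', 'љ'], ['dž', 'џ'],
  ['n', 'н'], ['j', 'ј'], ['l', 'л'], ['d', 'д'], ['ž', 'ж'],
  ['a', 'а'], ['i', 'и'], ['v', 'в'],
]);

// Longest key in the table; bounds how far ahead the scanner looks.
const MAX_KEY_LENGTH = Math.max(...[...SAMPLE_MAP.keys()].map((k) => k.length));

function greedyConvert(text, map = SAMPLE_MAP, maxLen = MAX_KEY_LENGTH) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    let matched = false;
    // Try the longest candidate first, then progressively shorter ones.
    for (let len = Math.min(maxLen, text.length - i); len >= 1; len--) {
      const replacement = map.get(text.slice(i, i + len));
      if (replacement !== undefined) {
        out += replacement;
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out += text[i]; // spaces, punctuation, and digits pass through unchanged
      i += 1;
    }
  }
  return out;
}

console.log(greedyConvert('njiva, 2024'));
```

Because the two-letter key `nj` is tried before the one-letter key `n`, the digraph is consumed as a unit and the scanner never emits н followed by ј for it.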

Using Script Converters

Script converters activate automatically when the locale code matches a registered converter. No configuration needed — just set your target locale:

i18n-rosetta.config.json
{
  "pairs": {
    "en:crk": {
      "method": "llm-coached",
      "model": "google/gemini-2.5-pro"
    }
  }
}

When rosetta syncs the en:crk pair, translations are first produced in SRO, then automatically converted to Syllabics before writing to crk.json.
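
As a rough sketch of that post-translation step, the hook can be thought of as a lookup by locale followed by a walk over every string in the translated JSON. The names `applyScriptConverter`, `convertTranslations`, and `demoRegistry` are hypothetical, and the registry shape is only modeled on the `SCRIPT_CONVERTERS` entries shown later on this page; Rosetta's internals may differ.

```javascript
// Recursively convert every string value; keys, numbers, and booleans are left alone.
function convertTranslations(node, convert) {
  if (typeof node === 'string') return convert(node);
  if (Array.isArray(node)) return node.map((item) => convertTranslations(item, convert));
  if (node !== null && typeof node === 'object') {
    return Object.fromEntries(
      Object.entries(node).map(([key, value]) => [key, convertTranslations(value, convert)])
    );
  }
  return node;
}

// Hypothetical hook: apply the converter registered for `locale`, if there is one.
function applyScriptConverter(locale, translations, registry) {
  const hasEntry = Object.prototype.hasOwnProperty.call(registry, locale);
  // No registered converter: return the translations untouched.
  return hasEntry ? convertTranslations(translations, registry[locale].converter) : translations;
}

// Stand-in registry with a toy converter that only knows the syllables of "tānisi".
const demoRegistry = {
  crk: {
    converter: (text) =>
      text.normalize('NFC').replace(/tā|ni|si/g, (m) => ({ 'tā': 'ᑖ', ni: 'ᓂ', si: 'ᓯ' })[m]),
  },
};

console.log(applyScriptConverter('crk', { greeting: 'tānisi' }, demoRegistry));
console.log(applyScriptConverter('fr', { greeting: 'bonjour' }, demoRegistry));
```

Only values are converted, so translation keys stay stable across scripts, and locales without a registered converter pass through unchanged.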

Checking Converter Status

npx i18n-rosetta status

The status output shows which pairs have active script converters and what conversion they perform.

Web Font Requirements

Three converters output Unicode Private Use Area (PUA) characters that require custom web fonts:

Klingon (pIqaD)

Install a CSUR-compatible pIqaD font (e.g., "pIqaD qolqoS" or "Klingon pIqaD HaSta"):

@font-face {
  font-family: 'pIqaD';
  src: url('/fonts/pIqaD.woff2') format('woff2');
  unicode-range: U+F8D0-F8FF;
}

:lang(tlh) {
  font-family: 'pIqaD', sans-serif;
}

Tengwar (Sindarin)

Install a CSUR-compatible Tengwar font (e.g., "Tengwar Formal CSUR", "Tengwar Annatar"):

@font-face {
  font-family: 'Tengwar';
  src: url('/fonts/tengwar-formal-csur.woff2') format('woff2');
  unicode-range: U+E000-E07F;
}

:lang(x-elvish-s) {
  font-family: 'Tengwar', serif;
}

Kryptonian

Install a Kryptonian font mapped to PUA codepoints U+E100–E119:

@font-face {
  font-family: 'Kryptonian';
  src: url('/fonts/kryptonian.woff2') format('woff2');
  unicode-range: U+E100-E119;
}

:lang(x-kryptonian) {
  font-family: 'Kryptonian', sans-serif;
}

:::tip Alternative approach for Kryptonian
Since Kryptonian is a pure A-Z cipher, you can skip the script converter entirely and apply the font to Latin text via CSS. This is often simpler for web deployments: just serve the Kryptonian font and set font-family on the relevant elements.
:::
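
If it is unclear whether a given locale file actually needs one of these fonts, a small check over the converted strings can help. The sketch below is a hypothetical helper, not part of Rosetta; the font names follow the `font-family` values used in the CSS above and the ranges come from the converter table.

```javascript
// PUA ranges from the converter table; font names match the CSS examples above.
const PUA_FONT_RANGES = [
  { font: 'pIqaD', start: 0xf8d0, end: 0xf8ff },
  { font: 'Tengwar', start: 0xe000, end: 0xe07f },
  { font: 'Kryptonian', start: 0xe100, end: 0xe119 },
];

// Hypothetical helper: list the custom fonts a string needs in order to render.
function requiredFonts(text) {
  const needed = new Set();
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    for (const { font, start, end } of PUA_FONT_RANGES) {
      if (codePoint >= start && codePoint <= end) needed.add(font);
    }
  }
  return [...needed];
}

console.log(requiredFonts('plain Latin text'));
```

Running this over each value in a converted locale file would show which `@font-face` rules the page has to ship.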

Adding a Custom Converter

To add a converter for a new language, edit lib/scripts.js:

  1. Create the conversion map — an ordered array of [from, to] pairs, longest sequences first
  2. Create the converter function — a greedy left-to-right scanner (use sroToSyllabics as a template)
  3. Register it in the SCRIPT_CONVERTERS object with the locale code as key
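  4. Add the script field to the language's register entry in registers.js

// Example: adding a converter for Cherokee (chr)
// Ordered longest sequences first so multi-letter syllables win over shorter ones
const LATIN_TO_CHEROKEE_MAP = [
  ['ga', 'Ꭶ'], ['ka', 'Ꭷ'], ['ge', 'Ꭸ'], // ...
];

function latinToCherokee(text) {
  // Same greedy left-to-right pattern as other converters.
  // One possible body (a sketch, not copied from lib/scripts.js):
  let out = '';
  let i = 0;
  while (i < text.length) {
    // The map is ordered longest-first, so the first hit is the longest match
    const hit = LATIN_TO_CHEROKEE_MAP.find(([from]) => text.startsWith(from, i));
    out += hit ? hit[1] : text[i]; // unmatched characters pass through unchanged
    i += hit ? hit[0].length : 1;
  }
  return out;
}

SCRIPT_CONVERTERS['chr'] = {
  from: 'Latin',
  to: 'Cherokee Syllabary',
  type: 'deterministic',
  converter: latinToCherokee,
};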

See Also