Conlangs, Scripts & Orthography

rosetta has first-class support for constructed languages via LLM registers and deterministic script converters. This guide covers how conlang support works, what fonts you need, and how to add your own.

:::tip Why conlangs matter

Conlangs aren't just novelty: they exercise the exact same infrastructure used for real underserved languages. The quality gate, coaching system, and script conversion pipeline work identically for Klingon and Plains Cree. If your conlang pipeline works, your low-resource language pipeline will too.

:::

Supported Constructed Languages

Language	Code	Script Converter	Font Required
Klingon	`tlh`	✅ Romanization → pIqaD	PUA font (e.g., pIqaD qolqoS)
Sindarin (Tolkien Elvish)	`x-elvish-s`	✅ Latin → Tengwar	CSUR PUA font
Kryptonian	`x-kryptonian`	✅ Latin → Kryptonian	PUA font
Pirate English	`x-pirate`	❌ register only	None
Shakespearean English	`x-shakespeare`	❌ register only	None
Yoda-speak	`x-yoda`	❌ register only	None

Conlang codes use the x- prefix per the BCP 47 private-use convention. The exception is Klingon, whose code tlh is a registered ISO 639 language code (listed in both ISO 639-2 and ISO 639-3).

Unicode, PUA, and Font Requirements

The Private Use Area

Klingon (pIqaD), Sindarin (Tengwar), and Kryptonian use Unicode Private Use Area (PUA) characters. The PUA block in the Basic Multilingual Plane spans U+E000–U+F8FF, and the standard deliberately assigns no characters to these codepoints. The ConScript Unicode Registry (CSUR) maintains community-agreed mappings for fictional scripts, but these are not part of the Unicode standard.

What this means in practice:

PUA text renders as empty boxes (□□□) without the correct font loaded
Different fonts may map different glyphs to the same PUA codepoints
rosetta does NOT bundle PUA fonts — you must load them yourself
System fonts do not ship glyphs for these scripts, so never rely on font fallback to render them

PUA Ranges by Script

Script	PUA Range	CSUR Reference
Klingon (pIqaD)	U+F8D0–U+F8FF	CSUR Klingon
Tengwar (Elvish)	U+E000–U+E07F	CSUR Tengwar
Kryptonian	Varies by font	No CSUR standard
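
The ranges above are easy to check in code. The following is a minimal sketch, not part of rosetta: `detectPuaScripts` and `CSUR_RANGES` are hypothetical names, and the ranges are simply the two CSUR rows from the table. It reports which PUA scripts a string appears to use, which can help you decide whether a page needs a PUA font at all.

```javascript
// CSUR ranges copied from the table above. Kryptonian is omitted because it
// has no CSUR range; its codepoints depend on the font.
const CSUR_RANGES = [
  { script: 'pIqaD', start: 0xf8d0, end: 0xf8ff },
  { script: 'Tengwar', start: 0xe000, end: 0xe07f },
];

// Hypothetical helper: list the PUA scripts a string appears to contain.
function detectPuaScripts(text) {
  const found = new Set();
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp < 0xe000 || cp > 0xf8ff) continue; // outside the BMP Private Use Area
    const match = CSUR_RANGES.find((r) => cp >= r.start && cp <= r.end);
    found.add(match ? match.script : 'unknown PUA');
  }
  return [...found];
}
```

A codepoint reported as 'unknown PUA' sits in the Private Use Area but outside both CSUR ranges, which is what you would expect from font-specific Kryptonian text.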

Loading PUA Web Fonts

rosetta includes a built-in command to download and manage PUA web fonts:

# See which fonts are needed for your configured languages
i18n-rosetta fonts list

# Download all needed fonts (auto-detects project type for output directory)
i18n-rosetta fonts install

# Also generate a CSS snippet with @font-face declarations
i18n-rosetta fonts install --css

The fonts install command downloads from verified open-source repositories:

Font	Script	License	Source
pIqaD qolqoS	Klingon	SIL Open Font License 1.1	GitHub
FreeMonoTengwar	Tengwar	GNU GPL v3 (with font exception)	SourceForge
(user-provided)	Kryptonian	Varies	No open-source PUA font available

The output directory is auto-detected from your project structure (Docusaurus → static/fonts/, Hugo → static/fonts/, default → public/fonts/). Override with --dir.

If you prefer to manage fonts manually, add @font-face rules in your CSS:

@font-face {
  font-family: 'pIqaD';
  src: url('/fonts/pIqaDqolqoS.ttf') format('truetype');
  font-display: swap;
  unicode-range: U+F8D0-F8FF;
}

/* Apply to Klingon text elements */
[lang="tlh"], [data-script="piqad"] {
  font-family: 'pIqaD', sans-serif;
}

:::warning Unicode support is NOT guaranteed

None of these scripts is encoded in the Unicode standard: the Unicode Consortium rejected the Klingon proposal in 2001, and Tengwar has only ever been a roadmap candidate. PUA assignments are community-maintained and may conflict between font implementations. Always specify the exact font your project uses, and test rendering across browsers.

:::

Script Converters

How They Work

rosetta's script conversion is a post-translation hook:

The LLM translates text into a working script (usually Latin or SRO)
The quality gate validates the output
The deterministic converter transforms the validated text into the display script
The converted text is written to disk

Separating translation from conversion works because LLMs tend to produce better output when working in Latin-based scripts. The deterministic converter then guarantees consistent script output without relying on the model's (often unreliable) script knowledge.
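
The four steps can be sketched as a single function. This is an illustration only: `translateAndConvert` and the four callbacks it receives are hypothetical stand-ins, not rosetta's internal API. The point it shows is the ordering, with the quality gate running on the working-script text before any conversion happens.

```javascript
// Hypothetical pipeline sketch: every callback here is a stand-in, not rosetta's API.
async function translateAndConvert(source, { translate, qualityGate, convert, write }) {
  // 1. The LLM produces text in the working script (usually Latin or SRO).
  const working = await translate(source);

  // 2. The quality gate validates the working-script text.
  const verdict = qualityGate(source, working);
  if (!verdict.ok) {
    throw new Error(`Quality gate rejected output: ${verdict.reason}`);
  }

  // 3. The deterministic converter runs only on validated text.
  //    Register-only languages have no converter and pass through unchanged.
  const display = convert ? convert(working) : working;

  // 4. The converted text is written out.
  await write(display);
  return display;
}
```

Because the converter is a pure function of validated text, rerunning it on the same input always yields the same display string.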

All Five Converters

rosetta ships with five built-in script converters:

Plains Cree: SRO → Syllabics (`crk`)

Standard Roman Orthography to Canadian Aboriginal Syllabics.

Input:  "tawâw"
Output: "ᑕᐚᐤ"

Long vowels are marked with a circumflex (â, ê, î, ô) or, in some sources, a macron (ā, ē, ī, ō). The converter handles all SRO diacritics and maps them to the correct syllabic characters. See Support a Low-Resource Language for the full Cree pipeline.
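
Since SRO long vowels can arrive with either diacritic, a converter typically wants one canonical spelling before it looks anything up. The sketch below shows one way to do that. It is an assumption about a reasonable pre-processing step, not a description of rosetta's converter, and `normalizeSroVowels` is a hypothetical name.

```javascript
// Macron vowels mapped to their circumflex equivalents (escapes keep the
// source unambiguous): ā→â, ē→ê, ī→î, ō→ô.
const MACRON_TO_CIRCUMFLEX = {
  '\u0101': '\u00e2',
  '\u0113': '\u00ea',
  '\u012b': '\u00ee',
  '\u014d': '\u00f4',
};

// Hypothetical helper: fold lowercase macron long vowels into circumflex form.
function normalizeSroVowels(text) {
  // NFC first, so a base vowel plus a combining mark becomes one precomposed letter.
  return text
    .normalize('NFC')
    .replace(/[\u0101\u0113\u012b\u014d]/g, (v) => MACRON_TO_CIRCUMFLEX[v]);
}
```

The sketch only covers lowercase vowels; uppercase long vowels would need their own entries.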

Serbian: Latin → Cyrillic (`sr`)

Deterministic Latin-to-Cyrillic conversion for Serbian.

Input:  "zdravo"
Output: "здраво"

This handles the full Serbian alphabet mapping including digraphs (lj → љ, nj → њ, dž → џ).
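
The digraph handling is the interesting part of this mapping, so here is a minimal sketch of it. This is not rosetta's converter: `latinToCyrillic` is a hypothetical name, and the sketch ignores the rare words where n + j or d + ž are separate letters (for example at a prefix boundary), which a production converter would need an exception list for.

```javascript
// Latin letters in Serbian Cyrillic alphabetical order, paired index by index
// with the 30 Cyrillic letters below.
const SR_LATIN = [
  'a', 'b', 'v', 'g', 'd', 'đ', 'e', 'ž', 'z', 'i',
  'j', 'k', 'l', 'lj', 'm', 'n', 'nj', 'o', 'p', 'r',
  's', 't', 'ć', 'u', 'f', 'h', 'c', 'č', 'dž', 'š',
];
const SR_CYRILLIC = [...'абвгдђежзијклљмнњопрстћуфхцчџш'];
const SR_MAP = new Map(SR_LATIN.map((latin, i) => [latin, SR_CYRILLIC[i]]));

// Hypothetical helper: convert Serbian Latin text to Cyrillic.
function latinToCyrillic(text) {
  // Digraphs come first in the alternation so "lj" is never read as "l" + "j".
  return text.normalize('NFC').replace(/lj|nj|dž|[a-zčćđšž]/gi, (match) => {
    const cyrillic = SR_MAP.get(match.toLowerCase());
    if (cyrillic === undefined) return match; // q, w, x, y are not Serbian letters
    // Preserve capitalisation by looking at the first Latin letter only.
    return match[0] === match[0].toLowerCase() ? cyrillic : cyrillic.toUpperCase();
  });
}

console.log(latinToCyrillic('zdravo'));
```

Checking only the first letter for case means both "Nj" and "NJ" become a single capital Њ, which matches how title-case and all-caps Latin text are usually written.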

Klingon: Romanization → pIqaD (`tlh`)

Marc Okrand's romanization system to pIqaD PUA characters.

Input:  "Qapla'"    (romanized Klingon)
Output: [pIqaD PUA] (requires pIqaD font to render)
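
To make the mapping concrete, here is a sketch of a romanization-to-pIqaD converter. It is not rosetta's implementation: `romanToPiqad` is a hypothetical name, and the letter order is an assumption based on the CSUR Klingon chart (26 letters starting at U+F8D0), so verify it against the font you actually ship. Okrand's romanization is case-sensitive (q and Q are different letters), so the sketch never changes case.

```javascript
// Letter order assumed from the CSUR Klingon chart, starting at U+F8D0.
const PIQAD_LETTERS = [
  'a', 'b', 'ch', 'D', 'e', 'gh', 'H', 'I', 'j', 'l', 'm', 'n', 'ng',
  'o', 'p', 'q', 'Q', 'r', 'S', 't', 'tlh', 'u', 'v', 'w', 'y', "'",
];
const PIQAD = new Map(
  PIQAD_LETTERS.map((letter, i) => [letter, String.fromCodePoint(0xf8d0 + i)])
);

// Multi-letter consonants are tried first. "ng(?!h)" keeps the sequence
// n + gh from being misread as ng followed by a stray h.
const PIQAD_TOKEN = /tlh|ch|gh|ng(?!h)|[abDeHIjlmnopqQrStuvwy']/g;

// Hypothetical helper: anything that is not a Klingon letter passes through.
function romanToPiqad(text) {
  return text.replace(PIQAD_TOKEN, (token) => PIQAD.get(token));
}
```

The sketch expects the straight apostrophe (') for the glottal stop; typographic apostrophes would need to be normalized first.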

Sindarin: Latin → Tengwar (`x-elvish-s`)

Tolkien's Sindarin mode Tengwar mapping.

Input:  "elen síla"  (Latin Sindarin)
Output: [Tengwar PUA] (requires Tengwar font to render)

Kryptonian: Latin → Kryptonian (`x-kryptonian`)

Fan-lexicon Kryptonian script mapping.

Input:  "Kal-El"
Output: [Kryptonian PUA] (requires Kryptonian font to render)

Triggering a Converter

Set the scripts field in your language config. For built-in converters, this is auto-detected from the language code:

{
  "languages": {
    "sr": { "scripts": "sr" },
    "crk": {}
  }
}

Plains Cree (crk) auto-detects — you don't need to set scripts explicitly.

Multi-Script Languages

Some real languages use multiple active scripts:

Language	Scripts	rosetta Approach
Serbian	Latin + Cyrillic	Script converter (`sr`) — translate in Latin, convert to Cyrillic
Chinese	Simplified + Traditional	Separate locale codes (`zh` vs `zh-TW`) with distinct registers

For languages where both scripts serve the same audience (Serbian), use a script converter. For languages where the scripts serve different audiences (Chinese Simplified for mainland China, Traditional for Taiwan/HK), use separate locale codes.

Orthography Notes

Registers aren't just tone — they carry orthographic instructions that steer the LLM toward correct writing conventions.

Formal Address Forms

rosetta's built-in registers include the culturally appropriate formal address for each language:

Language	Formal Form	Register Instruction
German	Sie	`Use Sie-form for formal address`
French	vous	`Use vous-form`
Russian	вы	`Professional register with вы-form`
Turkish	siz	`Professional register with siz-form`
Korean	합쇼체	`Formal Korean (합쇼체)`
Japanese	です/ます	`Polite professional register (です/ます form)`
Polish	Pan/Pani	`Professional register with Pan/Pani form`

Gender-Inclusive Writing

Each language card has a gender.inclusiveGuidance field with language-specific advice. This is injected into the LLM translation prompt separately from the register preset, so it applies consistently regardless of which formality preset the user chooses:

French: Écriture inclusive with interpunct notation (e.g., "Connecté·e")
German: Doppelpunkt notation (e.g., "Benutzer:innen")
Spanish: Gender-neutral restructuring preferred; slash notation (e.g., "usuario/a") as fallback

For languages without specific guidance in their card (e.g., Korean, conlangs), the system falls back to a generic rule: "prefer gender-neutral forms or the most inclusive option available."
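
The fallback behaviour can be sketched as follows. Only the gender.inclusiveGuidance field comes from this guide; `buildStyleInstructions`, the prompt wording, and the card objects are hypothetical and exist just to show why keeping the guidance separate from the register makes it survive a register change.

```javascript
// Generic rule quoted from the paragraph above.
const GENERIC_INCLUSIVE_RULE =
  'Prefer gender-neutral forms or the most inclusive option available.';

// Hypothetical prompt builder: the register and the inclusive-writing guidance
// are resolved independently, then joined.
function buildStyleInstructions(register, card = {}) {
  const guidance = card.gender?.inclusiveGuidance ?? GENERIC_INCLUSIVE_RULE;
  return [`Register: ${register}`, `Inclusive writing: ${guidance}`].join('\n');
}

// Illustrative card; the guidance text is an example, not rosetta's wording.
const frenchCard = {
  gender: { inclusiveGuidance: 'Use interpunct notation (e.g., "Connecté·e").' },
};
console.log(buildStyleInstructions('Casual French. Use tu-form.', frenchCard));
console.log(buildStyleInstructions('Formal Korean (합쇼체)', {}));
```

Swapping the first argument for a different register leaves the second line of the result untouched, which is the property the separate injection is meant to provide.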

RTL Script Requirements

The Arabic, Hebrew, Persian, and Urdu registers all include a right-to-left instruction: "Ensure text reads naturally in RTL layout contexts."

Overriding Any Register

Every register is a config value — override it to match your project's voice:

{
  "languages": {
    "fr": {
      "register": "Casual French. Use tu-form. Conversational blog tone. Gender-neutral when possible."
    },
    "de": {
      "register": "Informal German. Use du-form. Tech startup voice."
    }
  }
}

See Configuration for the full config reference.

Adding a New Conlang

Step-by-step

Choose a BCP-47 private-use code: Use the x- prefix (e.g., x-dothraki, x-valyrian).
Add to your config:

{
  "languages": {
    "x-dothraki": {
      "register": "Dothraki language. Use David J. Peterson's vocabulary from the Living Language Dothraki textbook. Harsh, direct tone. No articles, no verb 'to be'."
    }
  }
}

(Optional) Add a script converter: If your conlang uses a non-Latin display script, add a converter in lib/scripts.js and register it in SCRIPT_CONVERTERS.
Test: Run i18n-rosetta sync --dry to preview translations without writing files.
Check the quality gate: The quality gate may need tuning for your conlang — particularly the requireNonLatin check if your conlang uses PUA characters.
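
For the optional converter step, the sketch below shows one possible shape. Everything in it is an assumption: the real structure of SCRIPT_CONVERTERS in lib/scripts.js may differ, the x-example code and its glyph table are made up, and `hasNonLatinLetters` is only an illustration of the kind of test a requireNonLatin-style check might perform. Read lib/scripts.js before copying any of it.

```javascript
// Hypothetical stand-in for the SCRIPT_CONVERTERS registry in lib/scripts.js.
const SCRIPT_CONVERTERS = {};

// Made-up letter-to-PUA table for an imaginary conlang; codepoints are placeholders.
const EXAMPLE_GLYPHS = new Map([
  ['a', 0xe100],
  ['k', 0xe101],
  ['s', 0xe102],
]);

// A converter is assumed here to be a pure function from string to string.
SCRIPT_CONVERTERS['x-example'] = (text) =>
  [...text]
    .map((ch) => (EXAMPLE_GLYPHS.has(ch) ? String.fromCodePoint(EXAMPLE_GLYPHS.get(ch)) : ch))
    .join('');

// Illustrative check: true if any character is neither Latin nor script-neutral
// (punctuation, digits, combining marks). PUA codepoints count as non-Latin here.
function hasNonLatinLetters(text) {
  return /[^\p{Script=Latin}\p{Script=Common}\p{Script=Inherited}]/u.test(text);
}
```

Running `hasNonLatinLetters` on both the working-script text and the converted text is a quick way to see which side of the pipeline a non-Latin requirement could actually be satisfied on.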

:::note Conlang quality depends on LLM knowledge

The LLM can only translate into a conlang it has seen in training data. Well-documented conlangs (Klingon, Sindarin, Dothraki) work well. Obscure or newly invented conlangs may produce inconsistent results. Use coaching data to improve quality.

:::
