Supported Languages
rosetta ships with Language Cards — structured configuration files for 50 languages. Each card contains register presets, formality system metadata, method support flags, typography rules, and script information. Any language your LLM knows can be added with a single config line — these are the ones with curated, production-ready registers.
Translation Methods
Each language can use one or more of these translation methods:
| Icon | Method | How It Works | Cost |
|---|---|---|---|
| 🟢 | Google Translate | Neural MT baseline. 130+ languages. Key-value strings only — cannot safely translate Markdown content. | ~$20/1M chars |
| 🔵 | LLM (OpenRouter) | Any language the model knows. Register-steered prompts. Handles key-value + Markdown content. | Varies by model |
| 🟣 | LLM-Coached | LLM + grammar dictionaries + coaching data injected into prompts. Best for morphologically complex languages. | Varies by model |
| 🟠 | API (Plugin) | Community-hosted translation pipelines served over HTTP. OCAP-compatible. | Varies by provider |
Set GOOGLE_TRANSLATE_API_KEY for Google Translate, or OPENROUTER_API_KEY for LLM methods. See Translation Methods for full details.
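As a rough illustration of the key requirements above, the sketch below reports which methods have a key available. The function name `availableMethods` and the method labels are hypothetical and not part of rosetta's API; only the two environment variable names come from this page. The API (Plugin) method is left out because its configuration is not described here.

```javascript
// Hypothetical helper: reports which methods have their API key set.
// Only the two environment variable names are taken from this page.
function availableMethods(env) {
  const methods = [];
  if (env.GOOGLE_TRANSLATE_API_KEY) methods.push("google");
  // Both LLM-based methods share the OpenRouter key.
  if (env.OPENROUTER_API_KEY) methods.push("llm", "llm-coached");
  return methods;
}

console.log(availableMethods(process.env));
```

A check like this can run before a translation job so that a missing key fails early rather than midway through a batch.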
Priority Languages
These are the most commonly requested locales for web and mobile applications, listed in rosetta's recommended accessibility-first order.
| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇸🇦 | Arabic | ar | ✅ | ✅ | ✅ | — | RTL. Modern Standard Arabic (فصحى). |
| 🇵🇭 | Filipino (Taglish) | tl | ✅ | ✅ | ✅ | — | Code-switching: Tagalog primary, technical terms in English. |
| 🇫🇷 | French | fr | ✅ | ✅ | ✅ | — | Vous-form. Gender-inclusive (Connecté·e). |
| 🇪🇸 | Spanish | es | ✅ | ✅ | ✅ | — | Neutral Latin American. |
| 🇩🇪 | German | de | ✅ | ✅ | ✅ | — | Sie-form. Gender-inclusive (Benutzer:innen). |
| 🇯🇵 | Japanese | ja | ✅ | ✅ | ✅ | — | です/ます for body text, する for UI labels. |
| 🇨🇳 | Chinese (Simplified) | zh | ✅ | ✅ | ✅ | — | 简体中文. |
| 🇮🇹 | Italian | it | ✅ | ✅ | ✅ | — | Lei-form. |
| 🇧🇷 | Portuguese (BR) | pt | ✅ | ✅ | ✅ | — | Brazilian Portuguese. |
| 🇰🇷 | Korean | ko | ✅ | ✅ | ✅ | — | 해요체 polite register. |
Major World Languages
| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇧🇩 | Bengali | bn | ✅ | ✅ | ✅ | — | শুদ্ধ ভাষা preference. |
| 🇧🇬 | Bulgarian | bg | ✅ | ✅ | ✅ | — | |
| 🇨🇿 | Czech | cs | ✅ | ✅ | ✅ | — | Vykání (vy-form). |
| 🇩🇰 | Danish | da | ✅ | ✅ | ✅ | — | |
| 🇬🇷 | Greek | el | ✅ | ✅ | ✅ | — | Modern Δημοτική. |
| 🇮🇷 | Persian | fa | ✅ | ✅ | ✅ | — | RTL. |
| 🇫🇮 | Finnish | fi | ✅ | ✅ | ✅ | — | No grammatical gender. |
| 🇮🇱 | Hebrew | he | ✅ | ✅ | ✅ | — | RTL. |
| 🇮🇳 | Hindi | hi | ✅ | ✅ | ✅ | — | शुद्ध हिन्दी. Minimal English loanwords. |
| 🇭🇺 | Hungarian | hu | ✅ | ✅ | ✅ | — | Ön-form. |
| 🇮🇩 | Indonesian | id | ✅ | ✅ | ✅ | — | |
| 🇲🇾 | Malay | ms | ✅ | ✅ | ✅ | — | |
| 🇳🇱 | Dutch | nl | ✅ | ✅ | ✅ | — | U-form. |
| 🇳🇴 | Norwegian | nb | ✅ | ✅ | ✅ | — | Bokmål. |
| 🇵🇱 | Polish | pl | ✅ | ✅ | ✅ | — | Pan/Pani form. |
| 🇵🇹 | Portuguese (EU) | pt-PT | ✅ | ✅ | ✅ | — | European Portuguese. |
| 🇷🇴 | Romanian | ro | ✅ | ✅ | ✅ | — | |
| 🇷🇺 | Russian | ru | ✅ | ✅ | ✅ | — | Вы-form. |
| 🇸🇰 | Slovak | sk | ✅ | ✅ | ✅ | — | Vykanie (vy-form). |
| 🇷🇸 | Serbian | sr | ✅ | ✅ | ✅ | 🔤 Latin→Cyrillic | Deterministic script converter. |
| 🇸🇪 | Swedish | sv | ✅ | ✅ | ✅ | — | |
| 🇰🇪 | Swahili | sw | ✅ | ✅ | ✅ | — | |
| 🇹🇭 | Thai | th | ✅ | ✅ | ✅ | — | ครับ/ค่ะ politeness particles. |
| 🇹🇷 | Turkish | tr | ✅ | ✅ | ✅ | — | Siz-form. |
| 🇺🇦 | Ukrainian | uk | ✅ | ✅ | ✅ | — | Ви-form. |
| 🇵🇰 | Urdu | ur | ✅ | ✅ | ✅ | — | RTL. آپ form. |
| 🇻🇳 | Vietnamese | vi | ✅ | ✅ | ✅ | — | |
| 🇹🇼 | Chinese (Traditional) | zh-TW | ✅ | ✅ | ✅ | — | 繁體中文. |
| 🇬🇪 | Georgian | ka | ✅ | ✅ | — | — | ქართული. Kartvelian family. |
| 🇳🇬 | Yoruba | yo | ✅ | ✅ | — | — | Èdè Yorùbá. Tonal (3 tones). |
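The Serbian row above mentions a deterministic script converter. A minimal sketch of that idea follows, assuming a plain letter-for-letter mapping in which the digraphs lj, nj, and dž are matched before single letters. The function name `latinToCyrillic` is hypothetical, this is not rosetta's converter, and it does not handle words where n and j (or d and ž) are separate letters, such as "injekcija".

```javascript
// Illustrative Serbian Latin→Cyrillic conversion (not rosetta's implementation).
const DIGRAPHS = new Map([["lj", "љ"], ["nj", "њ"], ["dž", "џ"]]);
const SINGLES = new Map([
  ["a", "а"], ["b", "б"], ["c", "ц"], ["č", "ч"], ["ć", "ћ"], ["d", "д"],
  ["đ", "ђ"], ["e", "е"], ["f", "ф"], ["g", "г"], ["h", "х"], ["i", "и"],
  ["j", "ј"], ["k", "к"], ["l", "л"], ["m", "м"], ["n", "н"], ["o", "о"],
  ["p", "п"], ["r", "р"], ["s", "с"], ["š", "ш"], ["t", "т"], ["u", "у"],
  ["v", "в"], ["z", "з"], ["ž", "ж"],
]);

function latinToCyrillic(input) {
  // NFC keeps letters such as "ž" as one code unit, so "dž" is two units long.
  const text = input.normalize("NFC");
  let out = "";
  let i = 0;
  while (i < text.length) {
    const pair = text.slice(i, i + 2);
    const digraph = DIGRAPHS.get(pair.toLowerCase());
    if (digraph) {
      // "Lj" and "LJ" both become the single capital letter.
      out += pair[0] !== pair[0].toLowerCase() ? digraph.toUpperCase() : digraph;
      i += 2;
      continue;
    }
    const ch = text[i];
    const single = SINGLES.get(ch.toLowerCase());
    if (single) {
      out += ch !== ch.toLowerCase() ? single.toUpperCase() : single;
    } else {
      out += ch; // digits, punctuation, and unmapped letters pass through
    }
    i += 1;
  }
  return out;
}

console.log(latinToCyrillic("Dobar dan")); // Добар дан
```

Because the mapping is a fixed table, the same input always yields the same output, which is what makes this step suitable to run after translation without another model call.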
Regional Variants
| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇲🇽 | Mexican Spanish | es-MX | ✅ | ✅ | ✅ | — | Tú-form. Warm register. |
| 🇨🇦 | Canadian French | fr-CA | ✅ | ✅ | ✅ | — | Québécois idioms. |
Indigenous & Low-Resource Languages
These languages are not supported by commercial MT services. rosetta provides the tooling for language communities to build their own methods under OCAP principles.
| | Language | Code | Google | LLM | Coached | Script | Status / Notes |
|---|---|---|---|---|---|---|---|
| 🪶 | Plains Cree | crk | ❌ | ✅ | ✅ | 🔤 SRO→Syllabics | 🚧 Under development |
| 🌄 | Quechua | qu | ✅ | ✅ | — | — | Runasimi. Evidential suffixes. |
:::info Plains Cree is under active development
The register, coaching infrastructure, script converter, and evaluation harness for Plains Cree are all functional, but the translation pipeline has not yet been released. We are working with language communities under OCAP principles to ensure quality before release. See Support a Low-Resource Language for the full story and for ways to contribute.
:::
:::tip Adding more low-resource languages
rosetta's method plugin system is designed for this. A language community can build a custom translation method, host it under their own control, and serve it via the API method. The Method Leaderboard tracks scores for any language pair: build a method, run the harness, and claim the top score.
:::
Constructed Languages
Conlangs are supported via LLM registers and optional script converters. They use the same infrastructure as natural languages: the quality gate, coaching system, and script conversion pipeline work identically.
| | Language | Code | Google | LLM | Script | Notes |
|---|---|---|---|---|---|---|
| 🖖 | Klingon | tlh | ❌ | ✅ | 🔤 Romanization→pIqaD | PUA font required. Marc Okrand vocabulary. |
| 🧝 | Sindarin (Tolkien Elvish) | x-elvish-s | ❌ | ✅ | 🔤 Latin→Tengwar | CSUR PUA font required. |
| 🏴‍☠️ | Pirate English | x-pirate | ❌ | ✅ | — | Register only. Nautical metaphors. |
| 🦸 | Kryptonian | x-kryptonian | ❌ | ✅ | 🔤 Latin→Kryptonian | PUA font required. |
| 🎭 | Shakespearean English | x-shakespeare | ❌ | ✅ | — | Register only. Thee/thou, -eth/-est forms. |
| 🐸 | Yoda-speak | x-yoda | ❌ | ✅ | — | Register only. OSV word order. |
See Conlangs, Scripts & Orthography for PUA font requirements, Unicode limitations, and how to add your own.
Language Presets
The init wizard supports preset names for quick setup. You can mix presets with individual codes.
| Preset | Expands To |
|---|---|
| european | fr, de, es, it, pt, nl |
| asian | ja, zh, ko |
| global | fr, es, de, ja, zh, ko, pt, ar |
| nordic | da, fi, nb, sv |
# Mix presets with individual codes
i18n-rosetta init
# → Target languages: european, ja
# → Resolves to: fr, de, es, it, pt, nl, ja
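The expansion shown above can be sketched as a small lookup. The preset table is copied from this page; the function name `resolveTargets` and the choice to drop duplicates while keeping first-seen order are assumptions, not confirmed behavior of the init wizard.

```javascript
// Preset names and members are copied from the Language Presets table.
const PRESETS = new Map([
  ["european", ["fr", "de", "es", "it", "pt", "nl"]],
  ["asian", ["ja", "zh", "ko"]],
  ["global", ["fr", "es", "de", "ja", "zh", "ko", "pt", "ar"]],
  ["nordic", ["da", "fi", "nb", "sv"]],
]);

// Hypothetical helper: expands a comma-separated mix of presets and codes.
function resolveTargets(input) {
  const seen = new Set(); // a Set keeps insertion order and drops repeats
  const tokens = input.split(",").map((t) => t.trim()).filter(Boolean);
  for (const token of tokens) {
    // Anything that is not a preset name is treated as a language code.
    const codes = PRESETS.get(token) ?? [token];
    for (const code of codes) seen.add(code);
  }
  return [...seen];
}

console.log(resolveTargets("european, ja").join(", "));
// fr, de, es, it, pt, nl, ja
```

Under these assumptions, overlapping input such as `asian, ja` still yields each code once.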
Adding Any Language
rosetta can translate to any language your LLM knows; the tables above list only the languages with built-in register presets. To add an unlisted language, include its BCP-47 code in your config:
{
  "languages": {
    "sw": {},
    "am": {
      "register": "Formal Amharic. Professional register with Geʽez script."
    }
  }
}
The LLM will translate using its training knowledge of the language. Setting a register gives you control over tone, formality, and orthographic conventions. See Configuration for details.
Language Cards
Each built-in language has a Language Card — structured JSON configuration split into two tiers for performance:
Two-Tier Architecture
| Tier | Directory | Loaded | Purpose |
|---|---|---|---|
| Runtime | lib/data/language-cards/ | Eagerly at import | Translation engine: registers, formality, rules, method support |
| Reference | lib/data/language-reference/ | Lazily on demand | Developer docs: linguistic challenges, encyclopedic data, NLP resources |
The runtime tier stays small (~2 KB/card) so importing rosetta doesn't load megabytes of documentation data. The reference tier is available via getLanguageReference(code) for tools, the website, and the eval harness.
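The lazy half of this split can be pictured as a load-once cache. The sketch below is an assumption about the general pattern, not rosetta's actual loader: `makeLazyLoader` is a hypothetical name, and the loader passed in stands in for whatever reads a file from lib/data/language-reference/.

```javascript
// Hypothetical sketch of on-demand loading with a per-code cache.
function makeLazyLoader(load) {
  const cache = new Map();
  return function get(code) {
    // The expensive load runs only the first time a code is requested.
    if (!cache.has(code)) cache.set(code, load(code));
    return cache.get(code);
  };
}

// Stand-in loader that counts how often it is called.
let reads = 0;
const getReference = makeLazyLoader((code) => {
  reads += 1;
  return { code, challenges: [] }; // placeholder shape, not the real card
});

getReference("ka");
getReference("ka");
console.log(reads); // 1
```

Nothing is read until the first request, so importing the module stays cheap, and repeated lookups for the same code reuse the cached object.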
Runtime Card Fields
| Field | What It Contains |
|---|---|
| nativeName | Endonym: the language's name for itself, in its own script (e.g., ქართული, Runasimi) |
| Formality system | T-V distinction, speech levels, keigo, particles, etc. |
| Register presets | Named LLM prompt presets specific to the language's character |
| Method support | Which translation APIs support this language |
| Gender guidance | Grammatical gender rules and inclusive writing tips |
| Script/direction | ISO 15924 script code and RTL/LTR |
| Rules | Typography (quotes, spacing), capitalization, plural categories |
| Eval datasets | Which benchmarks cover this language |
| glottocode | Canonical Glottolog identifier for cross-referencing |
| humanReviewed | Whether the card has been reviewed by a speaker |
Reference Card Fields
| Field | What It Contains |
|---|---|
| Linguistic challenges | MT-specific pitfalls (e.g., evidentiality, tonal diacritics, agglutination) |
| Encyclopedic | Language family, classification, speaker count, regions |
| Resources | NLP tools, parallel corpora, pre-trained models |
Scaffolding a New Language Card
Use the generator to scaffold both tiers from authoritative data sources (IANA, CLDR, Glottolog):
# Preview what would be generated
node scripts/generate-language-card.mjs sw --dry-run
# Generate both runtime + reference cards
node scripts/generate-language-card.mjs sw
The generator auto-populates metadata (codes, script, direction, plurals, quotes, method support, language family) and marks linguistic judgment fields as TODO for human curation.
Using Preset Keys
Instead of writing full register text, you can use a preset key name:
{
  "languages": {
    "fr": "casual-tu",
    "ko": "formal-hapsyo",
    "ja": "polite"
  }
}
rosetta resolves the key to the full register prompt. Run npx i18n-rosetta init to see available presets for each language.
Example Presets
| Language | Presets | Default |
|---|---|---|
| French | formal-vous, casual-tu | formal-vous |
| Korean | polite-haeyo, formal-hapsyo, casual-hae | polite-haeyo |
| Japanese | polite, formal-keigo, casual | polite |
| German | formal-Sie, casual-du | formal-Sie |
| Thai | neutral-professional, polite-male, polite-female | neutral-professional |
| Spanish | neutral-professional, formal-usted, casual-tuteo | neutral-professional |
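To make the key lookup concrete, here is a minimal sketch that picks a preset key for a language, falling back to the default when none is given. The keys and defaults are copied from the table above; the function name `pickPreset`, the data layout, and the decision to throw on an unknown key are assumptions, and the full register prompt text is not shown on this page, so it is not reproduced here.

```javascript
// Preset keys and defaults copied from the Example Presets table.
const EXAMPLE_PRESETS = new Map([
  ["fr", { keys: ["formal-vous", "casual-tu"], fallback: "formal-vous" }],
  ["ko", { keys: ["polite-haeyo", "formal-hapsyo", "casual-hae"], fallback: "polite-haeyo" }],
  ["ja", { keys: ["polite", "formal-keigo", "casual"], fallback: "polite" }],
  ["de", { keys: ["formal-Sie", "casual-du"], fallback: "formal-Sie" }],
  ["th", { keys: ["neutral-professional", "polite-male", "polite-female"], fallback: "neutral-professional" }],
  ["es", { keys: ["neutral-professional", "formal-usted", "casual-tuteo"], fallback: "neutral-professional" }],
]);

// Hypothetical helper: returns the preset key to use for a language code.
function pickPreset(code, requested) {
  const entry = EXAMPLE_PRESETS.get(code);
  if (!entry) return null; // language not covered by this example table
  if (requested === undefined) return entry.fallback;
  if (!entry.keys.includes(requested)) {
    // Assumed behavior: reject keys that belong to another language.
    throw new Error(`Unknown preset "${requested}" for ${code}; expected one of: ${entry.keys.join(", ")}`);
  }
  return requested;
}

console.log(pickPreset("ko", "formal-hapsyo")); // formal-hapsyo
console.log(pickPreset("fr")); // formal-vous
```

Failing loudly on a mismatched key, such as `casual-tu` for German, is one way to catch a copy-paste slip in the config before any translation runs.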
See Contributing a Language Card for the full spec, including field validation and PR checklist.
See Also
- Configuration — full config reference including language setup
- Translation Methods — how each method works
- Script Converters — deterministic script conversion pipeline
- Conlangs, Scripts & Orthography — PUA fonts, Unicode, adding conlangs
- Support a Low-Resource Language — building methods for underserved languages