
Supported Languages

rosetta ships with Language Cards — structured configuration files for 50 languages. Each card contains register presets, formality system metadata, method support flags, typography rules, and script information. Any language your LLM knows can be added with a single config line — these are the ones with curated, production-ready registers.


Translation Methods

Each language can use one or more of these translation methods:

| Icon | Method | How It Works | Cost |
|---|---|---|---|
| 🟢 | Google Translate | Neural MT baseline. 130+ languages. Key-value strings only; cannot safely translate Markdown content. | ~$20/1M chars |
| 🔵 | LLM (OpenRouter) | Any language the model knows. Register-steered prompts. Handles key-value + Markdown content. | Varies by model |
| 🟣 | LLM-Coached | LLM + grammar dictionaries + coaching data injected into prompts. Best for morphologically complex languages. | Varies by model |
| 🟠 | API (Plugin) | Community-hosted translation pipelines served over HTTP. OCAP-compatible. | Varies by provider |

Set GOOGLE_TRANSLATE_API_KEY for Google Translate, or OPENROUTER_API_KEY for LLM methods. See Translation Methods for full details.
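The table implies a simple decision rule. Google Translate needs its key and handles only key-value strings, while both LLM methods need an OpenRouter key. Here is a minimal sketch of that logic. The `usableMethods` helper and the method IDs (`google`, `llm`, `llm-coached`) are hypothetical. The API plugin method is left out because its configuration is provider-specific.

```javascript
// Hypothetical helper: which methods can run for a given environment and
// content type. Method IDs and the "kv" / "markdown" labels are illustrative.
function usableMethods(env, contentType) {
  const methods = [];
  // Google Translate needs its key and is limited to key-value strings.
  if (env.GOOGLE_TRANSLATE_API_KEY && contentType === "kv") {
    methods.push("google");
  }
  // Both LLM methods go through OpenRouter and accept kv and Markdown.
  // (LLM-Coached also needs coaching data for the target language,
  // which this sketch does not check.)
  if (env.OPENROUTER_API_KEY) {
    methods.push("llm", "llm-coached");
  }
  return methods;
}
```

For example, `usableMethods(process.env, "markdown")` returns an empty list when only `GOOGLE_TRANSLATE_API_KEY` is set. That is exactly the case the Markdown limitation above warns about.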


Priority Languages

These are the most commonly requested locales for web and mobile applications, listed in rosetta's recommended accessibility-first order.

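| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇸🇦 | Arabic | ar |  |  |  |  | RTL. Modern Standard Arabic (فصحى). |
| 🇵🇭 | Filipino (Taglish) | tl |  |  |  |  | Code-switching: Tagalog primary, technical terms in English. |
| 🇫🇷 | French | fr |  |  |  |  | Vous-form. Gender-inclusive (Connecté·e). |
| 🇪🇸 | Spanish | es |  |  |  |  | Neutral Latin American. |
| 🇩🇪 | German | de |  |  |  |  | Sie-form. Gender-inclusive (Benutzer:innen). |
| 🇯🇵 | Japanese | ja |  |  |  |  | です/ます for body text, する for UI labels. |
| 🇨🇳 | Chinese (Simplified) | zh |  |  |  |  | 简体中文. |
| 🇮🇹 | Italian | it |  |  |  |  | Lei-form. |
| 🇧🇷 | Portuguese (BR) | pt |  |  |  |  | Brazilian Portuguese. |
| 🇰🇷 | Korean | ko |  |  |  |  | 해요체 polite register. |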

Major World Languages

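| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇧🇩 | Bengali | bn |  |  |  |  | শুদ্ধ ভাষা preference. |
| 🇧🇬 | Bulgarian | bg |  |  |  |  |  |
| 🇨🇿 | Czech | cs |  |  |  |  | Vykání (vy-form). |
| 🇩🇰 | Danish | da |  |  |  |  |  |
| 🇬🇷 | Greek | el |  |  |  |  | Modern Δημοτική. |
| 🇮🇷 | Persian | fa |  |  |  |  | RTL. |
| 🇫🇮 | Finnish | fi |  |  |  |  | No grammatical gender. |
| 🇮🇱 | Hebrew | he |  |  |  |  | RTL. |
| 🇮🇳 | Hindi | hi |  |  |  |  | शुद्ध हिन्दी. Minimal English loanwords. |
| 🇭🇺 | Hungarian | hu |  |  |  |  | Ön-form. |
| 🇮🇩 | Indonesian | id |  |  |  |  |  |
| 🇲🇾 | Malay | ms |  |  |  |  |  |
| 🇳🇱 | Dutch | nl |  |  |  |  | U-form. |
| 🇳🇴 | Norwegian | nb |  |  |  |  | Bokmål. |
| 🇵🇱 | Polish | pl |  |  |  |  | Pan/Pani form. |
| 🇵🇹 | Portuguese (EU) | pt-PT |  |  |  |  | European Portuguese. |
| 🇷🇴 | Romanian | ro |  |  |  |  |  |
| 🇷🇺 | Russian | ru |  |  |  |  | Вы-form. |
| 🇸🇰 | Slovak | sk |  |  |  |  | Vykanie (vy-form). |
| 🇷🇸 | Serbian | sr |  |  |  | 🔤 Latin→Cyrillic | Deterministic script converter. |
| 🇸🇪 | Swedish | sv |  |  |  |  |  |
| 🇰🇪 | Swahili | sw |  |  |  |  |  |
| 🇹🇭 | Thai | th |  |  |  |  | ครับ/ค่ะ politeness particles. |
| 🇹🇷 | Turkish | tr |  |  |  |  | Siz-form. |
| 🇺🇦 | Ukrainian | uk |  |  |  |  | Ви-form. |
| 🇵🇰 | Urdu | ur |  |  |  |  | RTL. آپ form. |
| 🇻🇳 | Vietnamese | vi |  |  |  |  |  |
| 🇹🇼 | Chinese (Traditional) | zh-TW |  |  |  |  | 繁體中文. |
| 🇬🇪 | Georgian | ka |  |  |  |  | ქართული. Kartvelian family. |
| 🇳🇬 | Yoruba | yo |  |  |  |  | Èdè Yorùbá. Tonal (3 tones). |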
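Serbian is the only language in this table with a script converter. A deterministic converter is feasible because Gaj's Latin alphabet maps onto Serbian Cyrillic letter for letter. The exceptions are the digraphs lj, nj, and dž, which each become a single Cyrillic letter. The sketch below illustrates this. It is not rosetta's converter, and `serbianLatinToCyrillic` is a hypothetical name.

```javascript
// Illustrative sketch only: a naive Serbian Latin→Cyrillic transliterator.
// Not rosetta's converter; it has no exception list, so a prefixed word
// like "nadživeti" becomes "наџивети" instead of the correct "надживети".

// Digraphs are checked first so "lj" becomes one letter, not "л" + "ј".
const DIGRAPHS = new Map([
  ["LJ", "Љ"], ["Lj", "Љ"], ["lj", "љ"],
  ["NJ", "Њ"], ["Nj", "Њ"], ["nj", "њ"],
  ["DŽ", "Џ"], ["Dž", "Џ"], ["dž", "џ"],
]);

// Single letters of Gaj's Latin alphabet (uppercase); lowercase is derived.
const LETTERS = [
  ["A", "А"], ["B", "Б"], ["C", "Ц"], ["Č", "Ч"], ["Ć", "Ћ"], ["D", "Д"],
  ["Đ", "Ђ"], ["E", "Е"], ["F", "Ф"], ["G", "Г"], ["H", "Х"], ["I", "И"],
  ["J", "Ј"], ["K", "К"], ["L", "Л"], ["M", "М"], ["N", "Н"], ["O", "О"],
  ["P", "П"], ["R", "Р"], ["S", "С"], ["Š", "Ш"], ["T", "Т"], ["U", "У"],
  ["V", "В"], ["Z", "З"], ["Ž", "Ж"],
];
const SINGLE = new Map();
for (const [latin, cyrillic] of LETTERS) {
  SINGLE.set(latin, cyrillic);
  SINGLE.set(latin.toLowerCase(), cyrillic.toLowerCase());
}

function serbianLatinToCyrillic(text) {
  // NFC so that "č" typed as c + combining caron matches the table.
  const input = text.normalize("NFC");
  let out = "";
  let i = 0;
  while (i < input.length) {
    const pair = input.slice(i, i + 2);
    if (DIGRAPHS.has(pair)) {
      out += DIGRAPHS.get(pair);
      i += 2;
    } else {
      const ch = input[i];
      out += SINGLE.get(ch) ?? ch; // digits, punctuation, spaces pass through
      i += 1;
    }
  }
  return out;
}
```

A production converter typically pairs a rule table like this with a list of words where the two letters of a digraph belong to different morphemes.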

Regional Variants

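| Flag | Language | Code | Google | LLM | Coached | Script | Notes |
|---|---|---|---|---|---|---|---|
| 🇲🇽 | Mexican Spanish | es-MX |  |  |  |  | Tú-form. Warm register. |
| 🇨🇦 | Canadian French | fr-CA |  |  |  |  | Québécois idioms. |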

Indigenous & Low-Resource Languages

These languages have little or no support from commercial MT services. rosetta provides the tooling for language communities to build their own methods under OCAP principles.

| Language | Code | Google | LLM | Coached | Script | Status / Notes |
|---|---|---|---|---|---|---|
| 🪶 Plains Cree | crk |  |  |  | 🔤 SRO→Syllabics | 🚧 Under development |
| 🌄 Quechua | qu |  |  |  |  | Runasimi. Evidential suffixes. |

:::info Plains Cree is under active development
The register, coaching infrastructure, script converter, and evaluation harness for Plains Cree are all functional, but the translation pipeline has not yet been released. We are working with language communities under OCAP principles to ensure quality before release. See Support a Low-Resource Language for the full story and for how you can contribute.
:::

:::tip Adding more low-resource languages
rosetta's method plugin system is designed for this. A language community can build a custom translation method, host it under their own control, and serve it via the API method. The Method Leaderboard tracks scores for any language pair: build a method, run the harness, and claim the top score.
:::


Constructed Languages

Conlangs are supported via LLM registers and optional script converters. They use the same infrastructure as natural languages: the quality gate, coaching system, and script conversion pipeline work identically.

| Language | Code | Google | LLM | Script | Notes |
|---|---|---|---|---|---|
| 🖖 Klingon | tlh |  |  | 🔤 Romanization→pIqaD | PUA font required. Marc Okrand vocabulary. |
| 🧝 Sindarin (Tolkien Elvish) | x-elvish-s |  |  | 🔤 Latin→Tengwar | CSUR PUA font required. |
| 🏴‍☠️ Pirate English | x-pirate |  |  |  | Register only. Nautical metaphors. |
| 🦸 Kryptonian | x-kryptonian |  |  | 🔤 Latin→Kryptonian | PUA font required. |
| 🎭 Shakespearean English | x-shakespeare |  |  |  | Register only. Thee/thou, -eth/-est forms. |
| 🐸 Yoda-speak | x-yoda |  |  |  | Register only. OSV word order. |

See Conlangs, Scripts & Orthography for PUA font requirements, Unicode limitations, and how to add your own.
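Converted Klingon, Sindarin, and Kryptonian text lands in the Unicode Private Use Area, which has no standard glyphs. A renderer therefore has to know when to load a PUA font. The sketch below covers only that detection step. It tests for any code point in the BMP PUA (U+E000 to U+F8FF) or either supplementary PUA plane. `requiresPuaFont` is a hypothetical helper, not a documented rosetta API.

```javascript
// Hypothetical helper (not a documented rosetta API): detect Private Use
// Area code points, which render only with a matching PUA font.
// Ranges: BMP PUA U+E000..U+F8FF, Supplementary PUA-A U+F0000..U+FFFFD,
// and Supplementary PUA-B U+100000..U+10FFFD. The "u" flag is required
// for the \u{...} escapes.
const PUA_PATTERN = /[\uE000-\uF8FF\u{F0000}-\u{FFFFD}\u{100000}-\u{10FFFD}]/u;

function requiresPuaFont(text) {
  return PUA_PATTERN.test(text);
}
```

A renderer could call `requiresPuaFont(output)` after script conversion and load the Klingon, Tengwar, or Kryptonian font only when it returns true.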


Language Presets

The init wizard supports preset names for quick setup. You can mix presets with individual codes.

| Preset | Expands To |
|---|---|
| european | fr, de, es, it, pt, nl |
| asian | ja, zh, ko |
| global | fr, es, de, ja, zh, ko, pt, ar |
| nordic | da, fi, nb, sv |

```bash
# Mix presets with individual codes
i18n-rosetta init
# → Target languages: european, ja
# → Resolves to: fr, de, es, it, pt, nl, ja
```
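The resolution shown above behaves like an order-preserving union: presets expand in place, and each code appears once. The sketch below assumes exactly that behavior. The preset table is copied from this page, and `expandTargets` is a hypothetical name, not a rosetta export. The de-duplication rule is also an assumption.

```javascript
// Sketch only: preset table copied from the docs; expandTargets is a
// hypothetical name, and de-duplication is an assumed behavior.
const PRESETS = new Map([
  ["european", ["fr", "de", "es", "it", "pt", "nl"]],
  ["asian", ["ja", "zh", "ko"]],
  ["global", ["fr", "es", "de", "ja", "zh", "ko", "pt", "ar"]],
  ["nordic", ["da", "fi", "nb", "sv"]],
]);

function expandTargets(entries) {
  const codes = new Set(); // insertion-ordered, so the first occurrence wins
  for (const entry of entries) {
    // A preset name expands to its codes; anything else is taken as a code.
    for (const code of PRESETS.get(entry) ?? [entry]) {
      codes.add(code);
    }
  }
  return [...codes];
}
```

With this sketch, `expandTargets(["european", "ja"])` yields fr, de, es, it, pt, nl, ja, which matches the wizard output above.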

Adding Any Language

rosetta can translate to any language your LLM knows; the tables above list only the languages with built-in register presets. To add an unlisted language, include its BCP-47 code in your config:

```json
{
  "languages": {
    "sw": {},
    "am": {
      "register": "Formal Amharic. Professional register with Geʽez script."
    }
  }
}
```

The LLM will translate using its training knowledge of the language. Setting a register gives you control over tone, formality, and orthographic conventions. See Configuration for details.
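This page does not show rosetta's prompt template, so the exact way a register steers the model is unknown here. The sketch below shows only the general idea, using the two config shapes above (`{}` and `{ "register": ... }`). Both the prompt wording and the `buildSystemPrompt` name are assumptions.

```javascript
// Hypothetical sketch of register steering. The prompt wording and the
// function name are assumptions, not rosetta's actual template.
function buildSystemPrompt(code, entry = {}) {
  const lines = [
    `Translate the user's strings into the language tagged "${code}" (BCP-47).`,
    "Keep placeholders, markup, and keys unchanged.",
  ];
  // Without a register, the model relies on its training knowledge alone.
  if (entry.register) {
    lines.push(`Register: ${entry.register}`);
  }
  return lines.join("\n");
}
```

Under this sketch, `"sw": {}` produces only the base instructions. The `am` entry appends its register line, which is where tone, formality, and script choices (such as Geʽez) enter the prompt.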


Language Cards

Each built-in language has a Language Card — structured JSON configuration split into two tiers for performance:

Two-Tier Architecture

| Tier | Directory | Loaded | Purpose |
|---|---|---|---|
| Runtime | `lib/data/language-cards/` | Eagerly at import | Translation engine: registers, formality, rules, method support |
| Reference | `lib/data/language-reference/` | Lazily on demand | Developer docs: linguistic challenges, encyclopedic data, NLP resources |

The runtime tier stays small (~2 KB/card) so importing rosetta doesn't load megabytes of documentation data. The reference tier is available via getLanguageReference(code) for tools, the website, and the eval harness.
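The split amounts to eager loading for the small tier and a memoized, on-demand load for the large one. The sketch below makes several assumptions. The inline data stands in for the JSON files, and `makeReferenceGetter` is a hypothetical name. The real `getLanguageReference(code)` may also differ in shape; for example, it could be asynchronous.

```javascript
// Sketch under stated assumptions: inline data stands in for the JSON
// files, and the field names here are illustrative.

// Runtime tier: small, built once when the module is imported.
const runtimeCards = new Map([
  ["ka", { nativeName: "ქართული", direction: "ltr" }],
]);

// Reference tier: loaded only when first requested, then cached.
function makeReferenceGetter(loadReference) {
  const cache = new Map();
  return function getLanguageReference(code) {
    if (!cache.has(code)) {
      cache.set(code, loadReference(code)); // first request pays the cost
    }
    return cache.get(code);
  };
}
```

In practice `loadReference` would read a file from `lib/data/language-reference/`. The cache means each reference card is parsed at most once per process, and never if no tool asks for it.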

Runtime Card Fields

| Field | What It Contains |
|---|---|
| nativeName | Endonym: the language's name for itself, in its own script (e.g., ქართული, Runasimi) |
| Formality system | T-V distinction, speech levels, keigo, particles, etc. |
| Register presets | Named LLM prompt presets specific to the language's character |
| Method support | Which translation APIs support this language |
| Gender guidance | Grammatical gender rules and inclusive writing tips |
| Script/direction | ISO 15924 script code and RTL/LTR |
| Rules | Typography (quotes, spacing), capitalization, plural categories |
| Eval datasets | Which benchmarks cover this language |
| glottocode | Canonical Glottolog identifier for cross-referencing |
| humanReviewed | Whether the card has been reviewed by a speaker |
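Several of these fields have well-defined formats that can be checked mechanically. The validator below is hypothetical. The table names `nativeName`, `glottocode`, and `humanReviewed`, while `script`, `direction`, and `rules.plurals` are assumed field names. The format rules themselves come from ISO 15924, CLDR, and Glottolog.

```javascript
// Hypothetical validator for a runtime card. nativeName, glottocode, and
// humanReviewed come from the field table; script, direction, and
// rules.plurals are assumed field names for illustration.
const CLDR_PLURALS = new Set(["zero", "one", "two", "few", "many", "other"]);

function validateRuntimeCard(card) {
  const errors = [];
  if (typeof card.nativeName !== "string" || card.nativeName === "") {
    errors.push("nativeName must be a non-empty endonym");
  }
  // ISO 15924 codes are four letters in title case, e.g. Latn, Cyrl, Geor.
  if (!/^[A-Z][a-z]{3}$/.test(card.script ?? "")) {
    errors.push("script must be an ISO 15924 code");
  }
  if (card.direction !== "ltr" && card.direction !== "rtl") {
    errors.push('direction must be "ltr" or "rtl"');
  }
  // CLDR plural categories; every locale uses "other".
  const plurals = Array.isArray(card.rules?.plurals) ? card.rules.plurals : [];
  if (!plurals.includes("other") || plurals.some((p) => !CLDR_PLURALS.has(p))) {
    errors.push('rules.plurals must be CLDR categories including "other"');
  }
  // Glottocodes: four lowercase letters or digits, then four digits.
  if (card.glottocode !== undefined && !/^[a-z0-9]{4}\d{4}$/.test(card.glottocode)) {
    errors.push("glottocode has the wrong shape");
  }
  if (typeof card.humanReviewed !== "boolean") {
    errors.push("humanReviewed must be a boolean");
  }
  return errors;
}
```

The validator returns a list of errors rather than throwing on the first one, so a contributor sees every problem in a card at once.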

Reference Card Fields

| Field | What It Contains |
|---|---|
| Linguistic challenges | MT-specific pitfalls (e.g., evidentiality, tonal diacritics, agglutination) |
| Encyclopedic | Language family, classification, speaker count, regions |
| Resources | NLP tools, parallel corpora, pre-trained models |

Scaffolding a New Language Card

Use the generator to scaffold both tiers from authoritative data sources (IANA, CLDR, Glottolog):

```bash
# Preview what would be generated
node scripts/generate-language-card.mjs sw --dry-run

# Generate both runtime + reference cards
node scripts/generate-language-card.mjs sw
```

The generator auto-populates metadata (codes, script, direction, plurals, quotes, method support, language family) and marks linguistic judgment fields as TODO for human curation.
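After scaffolding, the remaining curation work is whatever the generator left as TODO. The small sketch below lists those fields. It assumes the marker is a string value beginning with `TODO`; check a generated card for the actual marker. `findTodoFields` is a hypothetical helper.

```javascript
// Hypothetical helper: list dotted paths of fields still marked TODO.
// Assumes the generator writes string values starting with "TODO".
function findTodoFields(value, path = "") {
  if (typeof value === "string") {
    return value.startsWith("TODO") ? [path] : [];
  }
  if (value !== null && typeof value === "object") {
    // Arrays are objects too; their indices become path segments ("a.0").
    return Object.entries(value).flatMap(([key, child]) =>
      findTodoFields(child, path ? `${path}.${key}` : key),
    );
  }
  return []; // numbers, booleans, null carry no marker
}
```

Running it over a freshly generated card gives a checklist of fields that still need linguistic judgment before the card is ready for review.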

Using Preset Keys

Instead of writing full register text, you can use a preset key name:

```json
{
  "languages": {
    "fr": "casual-tu",
    "ko": "formal-hapsyo",
    "ja": "polite"
  }
}
```

rosetta resolves the key to the full register prompt. Run npx i18n-rosetta init to see the available presets for each language.

Example Presets

| Language | Presets | Default |
|---|---|---|
| French | formal-vous, casual-tu | formal-vous |
| Korean | polite-haeyo, formal-hapsyo, casual-hae | polite-haeyo |
| Japanese | polite, formal-keigo, casual | polite |
| German | formal-Sie, casual-du | formal-Sie |
| Thai | neutral-professional, polite-male, polite-female | neutral-professional |
| Spanish | neutral-professional, formal-usted, casual-tuteo | neutral-professional |
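Key resolution amounts to a lookup with a per-language default. The sketch below makes several assumptions. The table mirrors Example Presets, keyed by language code, and `resolvePresetKey` is a hypothetical name. It returns the preset key rather than the register prompt text rosetta stores behind it.

```javascript
// Sketch of preset-key resolution; the table mirrors "Example Presets".
// resolvePresetKey is hypothetical and returns a key, not prompt text.
const PRESET_KEYS = new Map([
  ["fr", { keys: ["formal-vous", "casual-tu"], fallback: "formal-vous" }],
  ["ko", { keys: ["polite-haeyo", "formal-hapsyo", "casual-hae"], fallback: "polite-haeyo" }],
  ["ja", { keys: ["polite", "formal-keigo", "casual"], fallback: "polite" }],
  ["de", { keys: ["formal-Sie", "casual-du"], fallback: "formal-Sie" }],
  ["th", { keys: ["neutral-professional", "polite-male", "polite-female"], fallback: "neutral-professional" }],
  ["es", { keys: ["neutral-professional", "formal-usted", "casual-tuteo"], fallback: "neutral-professional" }],
]);

function resolvePresetKey(code, setting) {
  const card = PRESET_KEYS.get(code);
  if (!card) return null; // no built-in presets for this language
  if (setting === undefined) return card.fallback; // nothing configured
  if (typeof setting !== "string") return null; // custom { register } object
  if (card.keys.includes(setting)) return setting;
  // Assumption: unknown keys are rejected; rosetta may handle them differently.
  throw new Error(`Unknown preset "${setting}" for ${code}; try: ${card.keys.join(", ")}`);
}
```

Failing loudly on an unknown key in this sketch surfaces typos such as `casual-tu` under `ja` at config time rather than as an unexpected tone in the output.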

See Contributing a Language Card for the full spec, including field validation and PR checklist.


See Also