# Serving a Custom Method as an API

The `api` method in i18n-rosetta lets you point any translation pair at an external HTTP endpoint. This is how you integrate pipelines that are too complex for a single LLM prompt: morphological analyzers, finite-state transducers (FSTs), multi-step LLM chains, or any custom research method you have built.
## Why an API Service?

Some translation pipelines cannot run inside a simple prompt-response cycle:

| Pipeline step | Example |
|---|---|
| Morphological decomposition | Split polysynthetic words into morphemes before translation |
| FST validation | Reject outputs that violate phonological or morphological rules |
| Multi-step LLM chains | Generate → verify → correct cycles using different models |
| Dictionary lookup | Cross-reference a curated bilingual dictionary mid-pipeline |
| Human-in-the-loop | Queue uncertain translations for expert review |

The `api` method treats your pipeline as a black box: i18n-rosetta sends source strings, and your service returns translations. What happens inside is entirely up to you.
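The "Multi-step LLM chains" row in the table above can be illustrated with a minimal generate → verify → correct loop for a single string. `generateVerifyCorrect`, `generate`, `verify`, and `correct` are hypothetical stand-ins for your own steps (for example, two different models plus an FST check). None of them are part of i18n-rosetta. The loop gives up after `maxRounds` and returns an empty string instead of an unverified draft.

```javascript
// Hypothetical generate -> verify -> correct loop for one source string.
// `verify` returns a list of problems; an empty list means the draft passed.
async function generateVerifyCorrect(source, { generate, verify, correct }, maxRounds = 3) {
  let draft = await generate(source);
  for (let round = 1; round <= maxRounds; round++) {
    const problems = await verify(source, draft);
    if (problems.length === 0) {
      return { text: draft, ok: true, rounds: round };
    }
    // Only ask for a correction if another verification round remains.
    if (round < maxRounds) {
      draft = await correct(source, draft, problems);
    }
  }
  // Still failing after maxRounds: return "" rather than an unverified draft.
  return { text: '', ok: false, rounds: maxRounds };
}

// Toy usage with trivial steps (not real translation):
generateVerifyCorrect('hello', {
  generate: async (s) => s.toUpperCase(),
  verify: async (s, d) => (d.endsWith('!') ? [] : ['missing "!"']),
  correct: async (s, d) => d + '!',
}).then(console.log);
// { text: 'HELLO!', ok: true, rounds: 2 }
```

Each step is awaited, so the same loop works whether a step is a local function or a remote model call.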
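## Architecture

i18n-rosetta sends a `POST` request containing the source strings for a translation pair to your endpoint. Your service runs its pipeline and responds with JSON containing the translations, which i18n-rosetta writes to the target locale file (for example, `crk.json`).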
## Setting Up Your Service

Your API service must implement a single endpoint that accepts and returns JSON.

### Request Format

rosetta sends exactly this JSON body (see `api.js`):
```http
POST /translate
Content-Type: application/json
Authorization: Bearer <ROSETTA_API_KEY>

{
  "source_locale": "en",
  "target_locale": "crk",
  "method": "crk-coached-v1",
  "keys": {
    "greeting": "Hello, welcome to our app",
    "farewell": "Goodbye and thanks"
  }
}
```
| Field | Type | Description |
|---|---|---|
| `source_locale` | string | BCP 47 source language code |
| `target_locale` | string | BCP 47 target language code |
| `method` | string | Plugin name, or `"default"` |
| `keys` | object | Map of key → source string to translate |
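Because this body arrives over the network, it can be worth checking it before starting an expensive pipeline. The following is a minimal sketch of such a check against the fields in the table above. `validateTranslateRequest` is a hypothetical helper, and it only confirms that the locale codes are non-empty strings. It does not fully validate BCP 47 syntax.

```javascript
// Hypothetical request check. Returns a list of problems;
// an empty list means the body matches the documented fields.
function validateTranslateRequest(body) {
  if (body === null || typeof body !== 'object') {
    return ['body must be a JSON object'];
  }
  const problems = [];
  for (const field of ['source_locale', 'target_locale', 'method']) {
    if (typeof body[field] !== 'string' || body[field].length === 0) {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  const { keys } = body;
  if (keys === null || typeof keys !== 'object' || Array.isArray(keys)) {
    problems.push('keys must be an object of key -> source string');
  } else {
    // Every value in `keys` should be a source string to translate.
    for (const [key, source] of Object.entries(keys)) {
      if (typeof source !== 'string') {
        problems.push(`keys.${key} must be a string`);
      }
    }
  }
  return problems;
}
```

In an Express handler, you could respond with HTTP 400 and the returned list whenever it is non-empty.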
### Response Format
Your service must return a `translations` object. An optional `meta` object can include cost and diagnostic info:
```json
{
  "translations": {
    "greeting": "tânisi, pê-kîwêw ôta",
    "farewell": "ekosi mâka, kinanâskomitin"
  },
  "meta": {
    "model": "my-custom-pipeline/v1",
    "cost_usd": 0.0042,
    "method": "decompose-translate-validate"
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `translations` | object | ✅ | Map of key → translated string |
| `meta` | object | | Optional metadata |
| `meta.cost_usd` | number | | If present, displayed in rosetta's output |
| `errors` | object | | For partial success (HTTP 207): map of key → `{ message }` |
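A minimal sketch of assembling a response with these fields follows. `buildTranslateResponse` and the shape of its `results` argument (either `{ text }` or `{ error }` per key) are hypothetical. Failed keys receive an empty string in `translations` and an entry in `errors`, and the status is 207 only when at least one key failed.

```javascript
// Hypothetical response builder matching the documented response fields.
function buildTranslateResponse(results, meta) {
  const translations = {};
  const errors = {};
  for (const [key, result] of Object.entries(results)) {
    if (result.error) {
      // Failed key: empty string, never the source text.
      translations[key] = '';
      const message = result.error instanceof Error ? result.error.message : String(result.error);
      errors[key] = { message };
    } else {
      translations[key] = result.text;
    }
  }
  const body = { translations };
  if (Object.keys(errors).length > 0) body.errors = errors;
  if (meta) body.meta = meta;
  // 207 signals partial success; otherwise a plain 200.
  return { status: body.errors ? 207 : 200, body };
}
```

An Express handler could then send the result with `res.status(status).json(body)`.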
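### Minimal Express Server

```javascript
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical pipeline stubs: replace each with your real implementation.
// They throw until implemented, so unfinished steps surface as per-key
// errors instead of silently echoing the source text.
const notImplemented = (name) => async () => {
  throw new Error(`${name} is not implemented`);
};
const decompose = notImplemented('decompose');
const llmTranslate = notImplemented('llmTranslate');
const fstValidate = notImplemented('fstValidate');
const postProcess = notImplemented('postProcess');

/**
 * rosetta API contract:
 *
 * Request:  { source_locale, target_locale, method, keys: { "key": "source" } }
 * Response: { translations: { "key": "translated" }, meta: { ... } }
 */
app.post('/translate', async (req, res) => {
  const { source_locale, target_locale, keys } = req.body ?? {};
  if (!keys || typeof keys !== 'object' || Array.isArray(keys)) {
    return res.status(400).json({ error: 'Request body must include a "keys" object' });
  }

  const translations = {};
  const errors = {};
  for (const [key, source] of Object.entries(keys)) {
    try {
      // --- Your pipeline goes here ---
      // Step 1: Morphological decomposition
      const morphemes = await decompose(source, source_locale);
      // Step 2: LLM translation with context
      const draft = await llmTranslate(morphemes, target_locale);
      // Step 3: FST validation
      const validated = await fstValidate(draft, target_locale);
      // Step 4: Post-processing (orthography normalization, etc.)
      translations[key] = await postProcess(validated);
    } catch (err) {
      // Return "" (not the source string) so rosetta's fallback handles it.
      translations[key] = '';
      errors[key] = { message: err.message };
    }
  }

  // 207 (partial success) with an `errors` map if any key failed.
  const hasErrors = Object.keys(errors).length > 0;
  res.status(hasErrors ? 207 : 200).json({
    translations,
    ...(hasErrors && { errors }),
    meta: {
      model: 'my-custom-pipeline/v1',
      method: 'decompose-translate-validate',
    },
  });
});

app.listen(3001, () => {
  console.log('Translation API running on http://localhost:3001');
});
```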
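The server above processes keys one at a time in a `for` loop. If each key waits on remote calls (for example, an LLM API), a small concurrency limit can raise throughput without flooding the upstream service. The following is a minimal sketch. `mapWithConcurrency` is a hypothetical helper, not an i18n-rosetta or Express API.

```javascript
// Run async `fn` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  // Claiming happens synchronously, so two workers never share an index.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Inside the handler, `await mapWithConcurrency(Object.entries(keys), 4, ([key, source]) => ...)` could replace the loop. The limit of 4 is arbitrary, so keep it below whatever your LLM or FST service tolerates.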
## Configuring i18n-rosetta

Point a translation pair at your running service in `i18n-rosetta.config.json`:

```json
{
  "inputLocale": "en",
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "http://localhost:3001/translate",
      "register": "Formal Plains Cree. Use SRO orthography."
    }
  }
}
```

Then run sync as usual:

```bash
npx i18n-rosetta sync
```

i18n-rosetta will POST your source strings to the endpoint and write the returned translations to `crk.json`.
## Case Study: Plains Cree Pipeline

:::info Under Development
The Plains Cree pipeline described below is under active development and is not yet running in production. Details here reflect the current design direction and may change as the project evolves.
:::
The gds-mt-eval-harness project demonstrates this pattern. Its Plains Cree pipeline uses:
- Morphological decomposition — Break polysynthetic Cree words into translatable morpheme chains
- LLM translation — Context-enriched GPT-4o translation with coaching data (SRO orthography rules, register instructions)
- FST validation — Finite-state transducer checks that outputs conform to Cree phonological rules
- Confidence scoring — Each translation gets a confidence score based on FST pass rate and dictionary coverage
The entire pipeline runs as a single HTTP endpoint that i18n-rosetta calls via the api method.
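The harness's actual scoring formula is not described here. The following is only a hypothetical sketch of one way to combine the two signals named above: a weighted mean of FST pass rate and dictionary coverage. The field names and the 0.6/0.4 weights are assumptions.

```javascript
// Hypothetical confidence score in [0, 1]: weighted mean of FST pass rate
// and dictionary coverage. Weights are placeholders, not the harness's values.
function confidenceScore(
  { fstPassed, fstChecked, morphemesInDictionary, morphemesTotal },
  weights = { fst: 0.6, dictionary: 0.4 },
) {
  // A ratio with an empty denominator counts as 0 (no evidence).
  const ratio = (part, whole) => (whole > 0 ? part / whole : 0);
  const fstRate = ratio(fstPassed, fstChecked);
  const coverage = ratio(morphemesInDictionary, morphemesTotal);
  const total = weights.fst + weights.dictionary;
  if (total <= 0) throw new Error('weights must sum to a positive number');
  return (weights.fst * fstRate + weights.dictionary * coverage) / total;
}
```

A score like this could be returned alongside each translation in your response metadata to support quality auditing.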
### Running Evaluations

After translating, you can evaluate output quality using the harness directly:

```bash
# Clone the harness
git clone https://github.com/gamedaysuits/gds-mt-eval-harness.git
cd gds-mt-eval-harness
pip install -e .

# Run the evaluation against your method's output
python eval/baseline_experiment.py --dataset data/edtekla-dev-v1.json --submit
```
This produces structured evaluation records with chrF++, BLEU, and exact match scores that can be used as regression baselines.
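The harness computes chrF++, BLEU, and exact match itself. The following hypothetical sketch illustrates how such scores can serve as regression baselines. `exactMatch` returns the percentage of outputs identical to their references after trimming whitespace. `findRegressions` flags any metric that is missing or has dropped more than a tolerance below a stored baseline. The metric names and the 0.5-point tolerance are assumptions, not the harness's record format.

```javascript
// Percentage (0-100) of hypotheses identical to their references after trimming.
function exactMatch(hypotheses, references) {
  if (hypotheses.length !== references.length) {
    throw new Error('hypotheses and references must be the same length');
  }
  if (references.length === 0) return 0;
  let matches = 0;
  for (let i = 0; i < references.length; i++) {
    if (hypotheses[i].trim() === references[i].trim()) matches++;
  }
  return (100 * matches) / references.length;
}

// Flag metrics that are missing or fell more than `tolerance` below baseline.
function findRegressions(current, baseline, tolerance = 0.5) {
  const regressions = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    if (typeof now !== 'number' || now < base - tolerance) {
      regressions.push({ metric, baseline: base, current: now ?? null });
    }
  }
  return regressions;
}
```

A CI job could fail whenever `findRegressions` returns a non-empty list.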
## Authentication

If your API requires authentication, set the `apiKey` field or use an environment variable:

```json
{
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "https://my-mt-service.example.com/translate",
      "apiKey": "${CRK_API_KEY}"
    }
  }
}
```
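On the service side, you can check the `Authorization: Bearer` header shown in the request format. The following is a minimal sketch. It assumes the configured `apiKey` arrives as that bearer token and that your server reads its expected key from its own environment. `isAuthorized` is a hypothetical helper. It compares every character rather than returning at the first mismatch. Node's `crypto.timingSafeEqual` is an alternative if you compare equal-length buffers.

```javascript
// Hypothetical bearer-token check for incoming rosetta requests.
function isAuthorized(authorizationHeader, expectedKey) {
  if (!expectedKey || typeof authorizationHeader !== 'string') return false;
  // The auth scheme name is case-insensitive.
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader.trim());
  if (!match) return false;
  const token = match[1];
  // Accumulate differences over the full length instead of exiting early.
  let diff = token.length ^ expectedKey.length;
  const length = Math.max(token.length, expectedKey.length);
  for (let i = 0; i < length; i++) {
    diff |= (token.charCodeAt(i) || 0) ^ (expectedKey.charCodeAt(i) || 0);
  }
  return diff === 0;
}
```

In Express, for example: `if (!isAuthorized(req.get('Authorization'), process.env.CRK_API_KEY)) return res.status(401).end();`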
Data Sovereignty & OCAP Principles
The api method is particularly important for Indigenous language communities. By self-hosting the translation pipeline, a community keeps full control over:
- Proprietary coaching data — register instructions, orthography rules, and domain glossaries never leave community infrastructure.
- Linguistic resources — curated dictionaries, FST grammars, and elder-verified translations remain under community ownership.
- Access policies — the community decides who can call the endpoint and under what terms.
This aligns with OCAP® principles (Ownership, Control, Access, Possession), ensuring that sensitive language data is governed by the community rather than a third-party platform.
Combine the api method with a private deployment (e.g., a community-hosted VM or on-prem server) for the strongest data-sovereignty posture. See Support a Low-Resource Language for a full walkthrough.
## Cost Estimation

The `api` method returns `null` for cost estimation by default, because your service controls pricing. If you want to provide cost transparency, have your API return a cost field in the metadata:

```json
{
  "translations": { "...": "..." },
  "metadata": {
    "cost": {
      "estimatedCost": 0.0042,
      "currency": "USD",
      "source": "my-service-pricing"
    }
  }
}
```
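The following minimal sketch produces the cost object above from token usage. `buildCostMetadata` is hypothetical. Its per-million-token prices are placeholders that you would replace with your provider's actual rates.

```javascript
// Hypothetical: turn token usage into the cost object shown above.
// Prices are USD per million tokens and must come from your own provider.
function buildCostMetadata(usage, pricesPerMillionTokens, source = 'my-service-pricing') {
  const inputCost = (usage.inputTokens / 1e6) * pricesPerMillionTokens.input;
  const outputCost = (usage.outputTokens / 1e6) * pricesPerMillionTokens.output;
  return {
    cost: {
      // Rounded to 6 decimal places to avoid long floating-point tails.
      estimatedCost: Number((inputCost + outputCost).toFixed(6)),
      currency: 'USD',
      source,
    },
  };
}
```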
## Best Practices

- Return empty strings for failures. Do not return the source string as a "translation." Return `""` and let i18n-rosetta's fallback prefix mechanism handle it.
- Include confidence scores. If your pipeline can estimate quality, return it in the metadata. This helps with quality auditing.
- Implement health checks. Add a `GET /health` endpoint so i18n-rosetta can verify connectivity before starting a large sync.
- Rate limit gracefully. If your pipeline has throughput limits, return `429` status codes. i18n-rosetta's batch system will back off.
- Log everything. Multi-step pipelines can fail silently. Log the input and output of each step for debugging.
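The rate-limiting advice above can be implemented with a minimal sketch: a single global fixed-window limiter that tells the handler when to answer 429 and what `Retry-After` value to suggest. `FixedWindowLimiter` is hypothetical, and its clock is injectable for testing. This document does not say whether i18n-rosetta reads `Retry-After`. The 429 status is the signal its batch system backs off on.

```javascript
// Hypothetical global fixed-window limiter: at most `maxRequests` per `windowMs`.
class FixedWindowLimiter {
  constructor(maxRequests, windowMs, now = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
    this.windowStart = now();
    this.count = 0;
  }

  check() {
    const t = this.now();
    // Start a fresh window once the current one has elapsed.
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t;
      this.count = 0;
    }
    if (this.count < this.maxRequests) {
      this.count++;
      return { allowed: true };
    }
    // Suggest waiting until the current window ends, in whole seconds.
    const retryAfterSeconds = Math.ceil((this.windowStart + this.windowMs - t) / 1000);
    return { allowed: false, status: 429, retryAfterSeconds };
  }
}
```

In an Express handler: `const verdict = limiter.check(); if (!verdict.allowed) return res.status(429).set('Retry-After', String(verdict.retryAfterSeconds)).end();`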
## Licensing

The `api` method pattern is fully open. There are no licensing restrictions on wrapping your own translation pipeline as an HTTP service. The `gds-mt-eval-harness` is available under the MIT license for reference implementations.
## See Also

- Translation Methods: overview of each built-in method (`openai`, `google`, `api`, etc.)
- Plugin Specification: full schema for `i18n-rosetta.config.json`, including the `api` method fields
- Support a Low-Resource Language: end-to-end guide for under-resourced languages, including OCAP principles
- Architecture: how i18n-rosetta's sync loop, batching, and method dispatch work
- MT Evaluation: evaluation methodology, metrics, and the leaderboard submission process
- Method Leaderboard: live quality rankings across methods and language pairs