Een Custom Method als een API aanbieden

De api method van i18n-rosetta stelt u in staat om elk vertaalpaar naar een extern HTTP-endpoint te verwijzen. Dit is hoe u pipelines integreert die te complex zijn voor een enkele LLM-prompt — morfologische analyzers, finite-state transducers (FST's), multi-step LLM chains, of elke andere custom research method die u heeft gebouwd.

Waarom een API-service?

Sommige vertaal-pipelines kunnen niet draaien binnen een eenvoudige prompt-response cyclus:

Pipeline-stap	Voorbeeld
Morfologische decompositie	Splits polysynthetische woorden in morfemen vóór de vertaling
FST-validatie	Weiger outputs die fonologische of morfologische regels schenden
Multi-step LLM chains	Genereer → verifieer → corrigeer-cycli met verschillende modellen
Dictionary lookup	Raadpleeg een gecureerd tweetalig woordenboek halverwege de pipeline
Human-in-the-loop	Plaats onzekere vertalingen in een wachtrij voor beoordeling door experts

De api method behandelt uw pipeline als een black box — i18n-rosetta verstuurt source strings, uw service retourneert vertalingen. Wat er binnenin gebeurt, is volledig aan u.

Architectuur

Uw service instellen

Uw API-service moet een enkel endpoint implementeren dat JSON accepteert en retourneert:

Request-formaat

rosetta verstuurt exact deze JSON-body (zie api.js):

POST /translate
Content-Type: application/json
Authorization: Bearer <ROSETTA_API_KEY>

{
  "source_locale": "en",
  "target_locale": "crk",
  "method": "crk-coached-v1",
  "keys": {
    "greeting": "Hello, welcome to our app",
    "farewell": "Goodbye and thanks"
  }
}

Veld	Type	Beschrijving
`source_locale`	string	BCP 47 brontaalcode
`target_locale`	string	BCP 47 doeltaalcode
`method`	string	Plugin-naam of `"default"`
`keys`	object	Map van key → source string om te vertalen

### Response Format

Your service must return a `translations` object. An optional `meta` object can include cost and diagnostic info:

```json
{
  "translations": {
    "greeting": "tânisi, pê-kîwêw ôta",
    "farewell": "ekosi mâka, kinanâskomitin"
  },
  "meta": {
    "model": "my-custom-pipeline/v1",
    "cost_usd": 0.0042,
    "method": "decompose-translate-validate"
  }
}

Field	Type	Required	Description
`translations`	object	✅	Map of key → translated string
`meta`	object	—	Optional metadata
`meta.cost_usd`	number	—	If present, displayed in rosetta's output
`errors`	object	—	For partial success (HTTP 207): map of key → `{ message }`

Minimal Express Server

import express from 'express';

const app = express();
app.use(express.json());

/**
 * rosetta API contract:
 *
 * Request:  { source_locale, target_locale, method, keys: { "key": "source" } }
 * Response: { translations: { "key": "translated" }, meta: { ... } }
 */
app.post('/translate', async (req, res) => {
  const { source_locale, target_locale, method, keys } = req.body;

  const translations = {};

  for (const [key, source] of Object.entries(keys)) {
    // --- Your pipeline goes here ---
    // Step 1: Morphological decomposition
    const morphemes = await decompose(source, source_locale);

    // Step 2: LLM translation with context
    const draft = await llmTranslate(morphemes, target_locale);

    // Step 3: FST validation
    const validated = await fstValidate(draft, target_locale);

    // Step 4: Post-processing (orthography normalization, etc.)
    translations[key] = await postProcess(validated);
  }

  res.json({
    translations,
    meta: {
      model: 'my-custom-pipeline/v1',
      method: 'decompose-translate-validate',
    },
  });
});

app.listen(3001, () => {
  console.log('Translation API running on http://localhost:3001');
});

Configuring i18n-rosetta

Point a translation pair at your running service in i18n-rosetta.config.json:

{
  "inputLocale": "en",
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "http://localhost:3001/translate",
      "register": "Formal Plains Cree. Use SRO orthography."
    }
  }
}

Then run sync as usual:

npx i18n-rosetta sync

i18n-rosetta will POST your source strings to the endpoint and write the returned translations to crk.json.

Case Study: Plains Cree Pipeline

:::info Under Development The Plains Cree pipeline described below is under active development and is not yet running in production. Details here reflect the current design direction and may change as the project evolves. :::

The gds-mt-eval-harness project demonstrates this pattern. Its Plains Cree pipeline uses:

Morphological decomposition — Break polysynthetic Cree words into translatable morpheme chains
LLM translation — Context-enriched GPT-4o translation with coaching data (SRO orthography rules, register instructions)
FST validation — Finite-state transducer checks that outputs conform to Cree phonological rules
Confidence scoring — Each translation gets a confidence score based on FST pass rate and dictionary coverage

The entire pipeline runs as a single HTTP endpoint that i18n-rosetta calls via the api method.

Running Evaluations

After translating, you can evaluate output quality using the harness directly:

# Clone the harness
git clone https://github.com/gamedaysuits/gds-mt-eval-harness.git
cd gds-mt-eval-harness
pip install -e .

# Run the evaluation against your method's output
python eval/baseline_experiment.py --dataset data/edtekla-dev-v1.json --submit

This produces structured evaluation records with chrF++, BLEU, and exact match scores that can be used as regression baselines.

Authentication

If your API requires authentication, set the apiKey field or use an environment variable:

{
  "pairs": {
    "en:crk": {
      "method": "api",
      "endpoint": "https://my-mt-service.example.com/translate",
      "apiKey": "${CRK_API_KEY}"
    }
  }
}

Data Sovereignty & OCAP Principles

The api method is particularly important for Indigenous language communities. By self-hosting the translation pipeline, a community keeps full control over:

Proprietary coaching data — register instructions, orthography rules, and domain glossaries never leave community infrastructure.
Linguistic resources — curated dictionaries, FST grammars, and elder-verified translations remain under community ownership.
Access policies — the community decides who can call the endpoint and under what terms.

This aligns with OCAP® principles (Ownership, Control, Access, Possession), ensuring that sensitive language data is governed by the community rather than a third-party platform.

tip

Combine the api method with a private deployment (e.g., a community-hosted VM or on-prem server) for the strongest data-sovereignty posture. See Support a Low-Resource Language for a full walkthrough.

Cost Estimation

The api method returns null for cost estimation by default — your service controls pricing. If you want to provide cost transparency, have your API return a cost field in the metadata:

{
  "translations": { "...": "..." },
  "metadata": {
    "cost": {
      "estimatedCost": 0.0042,
      "currency": "USD",
      "source": "my-service-pricing"
    }
  }
}

Best Practices

Retourneer lege strings bij fouten — Retourneer niet de source string als een "vertaling". Retourneer "" en de quality gate van i18n-rosetta zal dit opvangen. De key wordt overgeslagen en bij de volgende sync opnieuw geprobeerd.
Voeg confidence scores toe — Als uw pipeline de kwaliteit kan inschatten, retourneer dit dan in de metadata. Dit helpt bij kwaliteitsaudits.
Implementeer health checks — Voeg een GET /health endpoint toe zodat i18n-rosetta de connectiviteit kan verifiëren voordat een grote sync wordt gestart.
Pas rate limiting netjes toe — Als uw pipeline doorvoerlimieten heeft, retourneer dan 429 statuscodes. Het batch-systeem van i18n-rosetta zal dan een back-off toepassen.
Log alles — Multi-step pipelines kunnen ongemerkt falen (fail silently). Log de input/output van elke stap voor debugging.

Licenties

Het api method-patroon is volledig open — er zijn geen licentiebeperkingen voor het verpakken van uw eigen vertaal-pipeline als een HTTP-service. De gds-mt-eval-harness is beschikbaar onder de MIT-licentie voor referentie-implementaties.

Zie ook

Translation Methods — overzicht van elke ingebouwde method (openai, google, api, enz.)
Plugin Specification — volledig schema voor i18n-rosetta.config.json inclusief api method-velden
Support a Low-Resource Language — end-to-end gids voor talen met weinig bronnen (low-resource languages), inclusief OCAP-principes
Architecture — hoe de sync loop, batching en method dispatch van i18n-rosetta werken
MT Evaluation — evaluatiemethodologie, metrics en het indieningsproces voor het leaderboard
Method Leaderboard — live kwaliteitsranglijsten voor verschillende methods en vertaalparen

Waarom een API-service?​

Architectuur​

Uw service instellen​

Request-formaat​

Minimal Express Server​

Configuring i18n-rosetta​

Case Study: Plains Cree Pipeline​

Running Evaluations​

Authentication​

Data Sovereignty & OCAP Principles​

Cost Estimation​

Best Practices​

Licenties​

Zie ook​