OpenEuroLLM 9B — YaRN Multilingual 32K

A long-context extension of OpenEuroLLM 9B, produced by continued pre-training with YaRN to extend the context window from 2 048 to 32 768 tokens.

This is a base language model — not instruction-tuned. Intended for research into multilingual long-context language modelling.

Model details


Base model	OpenEuroLLM 9B (oellm-datamix-9b-80-20)
Architecture	LLaMA-style, 32 layers, hidden 4096, 32 heads
Context length	32 768 tokens
Extension method	YaRN (factor=16.0, original_max=2048)
Training	Continued pre-training, 1 000 iterations
Training tokens	~4.2B
Languages	bg · cs · da · et · fi · fr · hr · nl

RoPE scaling config

Add to config.json before loading:

"rope_scaling": {
  "factor": 16.0,
  "original_max_position_embeddings": 2048,
  "type": "yarn"
},
"rope_theta": 10000
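
As a minimal sketch, the two keys above could be written into a local copy of config.json with the Python standard library. The helper name `patch_rope_config` and the path in the usage note are hypothetical and not part of the model release; only the values come from this card.

```python
import json
from pathlib import Path

# Values copied from the "RoPE scaling config" section of this card.
ROPE_SCALING = {
    "factor": 16.0,
    "original_max_position_embeddings": 2048,
    "type": "yarn",
}
ROPE_THETA = 10000


def patch_rope_config(config_path):
    """Add the YaRN rope_scaling block and rope_theta to a config.json file.

    Hypothetical helper: overwrites both keys if they already exist and
    leaves every other key in the file untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["rope_scaling"] = dict(ROPE_SCALING)
    config["rope_theta"] = ROPE_THETA
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage would look like `patch_rope_config("path/to/checkpoint/config.json")`, where the path is a placeholder for wherever the checkpoint was downloaded.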

Languages

Code	Language	Code	Language
bg	Bulgarian	fi	Finnish
cs	Czech	fr	French
da	Danish	hr	Croatian
et	Estonian	nl	Dutch

Evaluation — Base-LM NIAH

Forced-choice log-likelihood needle-in-a-haystack across 5 context lengths × 5 depths × 4 languages. Random-chance baseline: 25%.
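
The evaluation harness itself is not included in this card. The sketch below only illustrates the two ideas named above, under stated assumptions: the needle is placed at a fractional depth of the haystack, and the prediction is the candidate answer with the highest log-likelihood. Four candidates per trial is inferred from the 25% chance baseline. The function names and the token-list representation are hypothetical.

```python
def insert_needle(haystack, needle, depth):
    """Insert needle tokens at a fractional depth (0.0 = start, 1.0 = end).

    Assumption: depth is measured as a fraction of the haystack length.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    pos = int(round(depth * len(haystack)))
    return haystack[:pos] + needle + haystack[pos:]


def forced_choice_accuracy(trials):
    """Compute accuracy over forced-choice trials.

    Each trial is (candidate_logliks, gold_index); the prediction is the
    index of the candidate with the highest log-likelihood.
    """
    correct = 0
    for logliks, gold in trials:
        pred = max(range(len(logliks)), key=lambda i: logliks[i])
        correct += int(pred == gold)
    return correct / len(trials)
```

In a real run, the log-likelihoods would come from scoring each candidate continuation with the model after the full context; that step is omitted here.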

Accuracy by language × context length (all depths averaged)

lang	2 048	4 096	8 192	16 384	32 768
fr	1.00	1.00	1.00	1.00	0.84
fi	1.00	1.00	1.00	1.00	0.86
cs	0.94	1.00	1.00	1.00	0.80
nl	0.98	1.00	1.00	1.00	0.73†

† nl at 32K was evaluated at depths 0%, 25%, and 50% only, so its average covers three depths instead of five.

Accuracy is near-perfect from 2K to 16K at all depths and in all four languages. The only failure is at 32K with depth=0% (needle at document start, maximum retrieval distance):

lang	32K depth=0%
fr	0.20
fi	0.30
cs	0.00
nl	0.20

All other 32K depths score 1.00. Effective reliable retrieval range: ~24K tokens (a needle at depth 25% of a 32K context sits roughly 24K tokens from the end).
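
As a quick arithmetic check, the 32K averages in the first table can be reproduced from the depth=0% scores, assuming every other evaluated depth scores exactly 1.00 and that nl was evaluated at three depths, as the footnote states. The variable names are illustrative only.

```python
# depth=0% accuracy at 32K, copied from the table above.
depth0 = {"fr": 0.20, "fi": 0.30, "cs": 0.00, "nl": 0.20}
# Number of depths evaluated at 32K (nl: 0%/25%/50% only).
n_depths = {"fr": 5, "fi": 5, "cs": 5, "nl": 3}


def mean_32k(lang):
    """Average over depths, with all non-zero depths scoring 1.00."""
    others = n_depths[lang] - 1
    return (depth0[lang] + 1.00 * others) / n_depths[lang]


averages = {lang: round(mean_32k(lang), 2) for lang in depth0}
print(averages)
```

The results match the 32 768 column of the first table (0.84, 0.86, 0.80, 0.73).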

Root cause: the YaRN attention scaling factor mscale = 0.1 × ln(16) + 1 ≈ 1.277 was not applied in v1 training. A corrected v2 model covering 35 European languages is in preparation.
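
For reference, the YaRN paper recommends an attention scaling of 0.1 × ln(s) + 1 for scale factor s. The short sketch below evaluates it for this model's factor of 16; the function name `yarn_mscale` is illustrative, and returning 1.0 for factors at or below 1 is an assumption that mirrors common implementations.

```python
import math


def yarn_mscale(factor):
    """YaRN attention scaling: 0.1 * ln(factor) + 1, or 1.0 when factor <= 1."""
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0


mscale = yarn_mscale(16.0)
print(f"{mscale:.3f}")  # 1.277
```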

Known limitations

Missing mscale → retrieval failure at 32K depth=0%. Corrected in the upcoming v2.
8 languages (v2 will cover 35).
Base model only — needs instruction tuning for assistant use.

Citation

@misc{peng2023yarn,
  title={YaRN: Efficient Context Window Extension of Large Language Models},
  author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
  year={2023}, eprint={2309.00071}, archivePrefix={arXiv}
}
