OpenEuroLLM 9B - YaRN Multilingual 32K
A long-context extension of OpenEuroLLM 9B, produced by continued pre-training with YaRN to extend the context window from 2 048 to 32 768 tokens.
This is a base language model, not an instruction-tuned one. It is intended for research into multilingual long-context language modelling.
Model details
| Property | Value |
|---|---|
| Base model | OpenEuroLLM 9B (oellm-datamix-9b-80-20) |
| Architecture | LLaMA-style, 32 layers, hidden 4096, 32 heads |
| Context length | 32 768 tokens |
| Extension method | YaRN (factor=16.0, original_max=2048) |
| Training | Continued pre-training, 1 000 iterations |
| Training tokens | ~4.2B |
| Languages | bg · cs · da · et · fi · fr · hr · nl |
RoPE scaling config
Add to config.json before loading:

    "rope_scaling": {
      "factor": 16.0,
      "original_max_position_embeddings": 2048,
      "type": "yarn"
    },
    "rope_theta": 10000
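The config change above can also be applied programmatically. The following is a minimal sketch using only the Python standard library; the helper name `patch_rope_config` and the example path are hypothetical, and it writes only the keys listed above, leaving every other field in config.json untouched.

```python
import json
from pathlib import Path

# Values copied from the RoPE scaling config in this model card.
ROPE_SETTINGS = {
    "rope_scaling": {
        "factor": 16.0,
        "original_max_position_embeddings": 2048,
        "type": "yarn",
    },
    "rope_theta": 10000,
}


def patch_rope_config(config_path):
    """Merge the YaRN settings into an existing config.json and return the result.

    Hypothetical helper: existing keys with the same names are overwritten,
    all other keys are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(ROPE_SETTINGS)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (the path is a placeholder for a local checkpoint directory):
# patch_rope_config("path/to/checkpoint/config.json")
```

Running the helper twice is harmless, since it overwrites the same keys with the same values.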
Languages
| Code | Language | Code | Language |
|---|---|---|---|
| bg | Bulgarian | fi | Finnish |
| cs | Czech | fr | French |
| da | Danish | hr | Croatian |
| et | Estonian | nl | Dutch |
Evaluation: Base-LM NIAH
Forced-choice log-likelihood needle-in-a-haystack (NIAH) across 5 context lengths × 5 depths × 4 languages. Random-chance baseline: 25%.
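As an illustration of the forced-choice protocol, the sketch below picks the candidate answer with the highest log-likelihood given the context. It is not the evaluation code used for this model: the function names, the toy scorer, and the example items are all hypothetical, and four candidates per item are assumed because the stated chance baseline is 25%.

```python
from typing import Callable, Sequence, Tuple

Item = Tuple[str, Sequence[str], int]  # (context, candidates, index of correct candidate)


def pick_answer(context: str, candidates: Sequence[str],
                loglik: Callable[[str, str], float]) -> int:
    """Return the index of the candidate with the highest log-likelihood."""
    scores = [loglik(context, candidate) for candidate in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def accuracy(items: Sequence[Item], loglik: Callable[[str, str], float]) -> float:
    """Fraction of items where the top-scoring candidate is the correct one."""
    hits = sum(pick_answer(ctx, cands, loglik) == gold for ctx, cands, gold in items)
    return hits / len(items)


def toy_loglik(context: str, continuation: str) -> float:
    # Stand-in scorer (assumption): a real run would sum the model's token
    # log-probabilities of `continuation` conditioned on `context`.
    return 0.0 if continuation in context else -10.0


candidates = ["7142", "3391", "8820", "1057"]
items = [
    ("filler ... The secret code is 7142. ... filler", candidates, 0),
    ("filler ... The secret code is 8820. ... filler", candidates, 2),
]

toy_accuracy = accuracy(items, toy_loglik)
chance = 1 / len(candidates)  # 0.25 with four candidates
```

Because the model only ranks fixed candidates, this setup works for a base model without any instruction following or free-form generation.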
Accuracy by language × context length (all depths averaged)
| lang | 2 048 | 4 096 | 8 192 | 16 384 | 32 768 |
|---|---|---|---|---|---|
| fr | 1.00 | 1.00 | 1.00 | 1.00 | 0.84 |
| fi | 1.00 | 1.00 | 1.00 | 1.00 | 0.86 |
| cs | 0.94 | 1.00 | 1.00 | 1.00 | 0.80 |
| nl | 0.98 | 1.00 | 1.00 | 1.00 | 0.73† |
† nl at 32K: depths 0%/25%/50% only.
Accuracy is near-perfect from 2K to 16K at all depths and in all languages. The only failure is at 32K with depth=0% (needle at document start, maximum retrieval distance):
| lang | 32K depth=0% |
|---|---|
| fr | 0.20 |
| fi | 0.30 |
| cs | 0.00 |
| nl | 0.20 |
All other 32K depths score 1.00. Effective reliable retrieval range: ~24K tokens.
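The 32K column of the averaged table can be recomputed from these per-depth figures as a quick consistency check. The sketch below assumes five depths per language (three for nl, per the table footnote), with every depth other than 0% scoring 1.00 as stated above; the variable names are illustrative.

```python
from statistics import mean

# Accuracy at 32K, depth=0%, from the table above.
depth0_at_32k = {"fr": 0.20, "fi": 0.30, "cs": 0.00, "nl": 0.20}
# Number of depths evaluated at 32K (nl covers only 0%/25%/50%).
depths_evaluated = {"fr": 5, "fi": 5, "cs": 5, "nl": 3}
# Depth-averaged 32K accuracy reported in the first table.
reported = {"fr": 0.84, "fi": 0.86, "cs": 0.80, "nl": 0.73}


def averaged_accuracy(lang: str) -> float:
    """Mean over depths: the depth=0% score plus 1.00 for every other depth."""
    scores = [depth0_at_32k[lang]] + [1.00] * (depths_evaluated[lang] - 1)
    return mean(scores)


recomputed = {lang: round(averaged_accuracy(lang), 2) for lang in reported}

for lang in reported:
    print(lang, recomputed[lang], reported[lang])
```

For example, fr gives (0.20 + 4 × 1.00) / 5 = 0.84 and nl gives (0.20 + 2 × 1.00) / 3 ≈ 0.73, matching the reported values.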
Root cause: the YaRN attention scaling term mscale = 0.1 × ln(16) + 1 ≈ 1.277 was missing in v1 training. A corrected v2 model covering 35 European languages is in preparation.
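The mscale value follows from the attention scaling formula in the YaRN paper, 0.1 × ln(s) + 1, with scale factor s = 16. A minimal sketch of the arithmetic is shown below; the function name `yarn_mscale` is illustrative, and the guard for factors at or below 1 is an assumption meaning "no extension, no rescaling".

```python
import math


def yarn_mscale(factor: float) -> float:
    """Attention scaling term from the YaRN paper: 0.1 * ln(factor) + 1."""
    if factor <= 1.0:
        # Assumption: without context extension there is nothing to rescale.
        return 1.0
    return 0.1 * math.log(factor) + 1.0


mscale = yarn_mscale(16.0)
print(f"{mscale:.3f}")  # prints 1.277
```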
Known limitations
- Missing mscale causes retrieval failure at 32K depth=0%. Fixed in v2.
- 8 languages (v2 will cover 35).
- Base model only; needs instruction tuning for assistant use.
Citation
@misc{peng2023yarn,
title={YaRN: Efficient Context Window Extension of Large Language Models},
author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
year={2023}, eprint={2309.00071}, archivePrefix={arXiv}
}