OpenEuroLLM 9B: YaRN Multilingual 32K

A long-context extension of OpenEuroLLM 9B, trained via continued pre-training using YaRN to extend the context window from 2 048 to 32 768 tokens.

This is a base language model, not an instruction-tuned one. It is intended for research into multilingual long-context language modelling.

Model details

Base model: OpenEuroLLM 9B (oellm-datamix-9b-80-20)
Architecture: LLaMA-style, 32 layers, hidden 4096, 32 heads
Context length: 32 768 tokens
Extension method: YaRN (factor=16.0, original_max=2048)
Training: Continued pre-training, 1 000 iterations
Training tokens: ~4.2B
Languages: bg · cs · da · et · fi · fr · hr · nl

RoPE scaling config

Add to config.json before loading:

"rope_scaling": {
  "factor": 16.0,
  "original_max_position_embeddings": 2048,
  "type": "yarn"
},
"rope_theta": 10000

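A minimal sketch of applying this edit programmatically, assuming a local copy of the checkpoint with a Hugging Face-style config.json. The helper name `patch_rope_config` and its path argument are illustrative and not part of the model's release; the values are the ones listed above.

```python
import json
from pathlib import Path

# Values copied from the RoPE scaling config listed above.
ROPE_SCALING = {
    "factor": 16.0,
    "original_max_position_embeddings": 2048,
    "type": "yarn",
}
ROPE_THETA = 10000


def patch_rope_config(config_path):
    """Write the YaRN rope_scaling block into a config.json (hypothetical helper)."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["rope_scaling"] = dict(ROPE_SCALING)
    config["rope_theta"] = ROPE_THETA
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `patch_rope_config("path/to/config.json")` (a placeholder path) rewrites the file in place and leaves all other keys untouched. As a sanity check, factor × original_max_position_embeddings = 16 × 2048 = 32 768, the stated context length.
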
Languages

Code  Language    Code  Language
bg    Bulgarian   fi    Finnish
cs    Czech       fr    French
da    Danish      hr    Croatian
et    Estonian    nl    Dutch

Evaluation: Base-LM NIAH

Forced-choice log-likelihood needle-in-a-haystack across 5 context lengths × 5 depths × 4 languages. Random-chance baseline: 25%.
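
The scoring rule can be sketched as follows, assuming each trial offers four candidate answers (which is what a 25% chance baseline implies) and that a log-likelihood per candidate is already available from the model. How the log-likelihoods were computed or normalized for this card is not stated, so the function names and the numbers below are purely illustrative.

```python
def forced_choice_correct(candidate_logliks, gold_index):
    """Return True if the gold candidate has the highest log-likelihood."""
    best = max(range(len(candidate_logliks)), key=candidate_logliks.__getitem__)
    return best == gold_index


def accuracy(trials):
    """trials: iterable of (candidate_logliks, gold_index) pairs."""
    trials = list(trials)
    hits = sum(forced_choice_correct(lls, gold) for lls, gold in trials)
    return hits / len(trials)


# Made-up log-likelihoods for two trials with four candidates each.
example_trials = [
    ([-12.3, -4.1, -9.8, -15.0], 1),  # gold candidate scores highest
    ([-7.2, -6.9, -8.4, -7.0], 0),    # a distractor scores highest
]
print(accuracy(example_trials))
```

Because the model only has to rank a fixed set of candidates, this style of evaluation works for a base model without any instruction following.
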

Accuracy by language × context length (all depths averaged)

lang   2 048   4 096   8 192   16 384   32 768
fr     1.00    1.00    1.00    1.00     0.84
fi     1.00    1.00    1.00    1.00     0.86
cs     0.94    1.00    1.00    1.00     0.80
nl     0.98    1.00    1.00    1.00     0.73†

† NL 32K: depths 0%/25%/50% only.

Accuracy is near-perfect from 2K to 16K at all depths and in all four languages. The only failure is at 32K with depth=0% (needle at the document start, maximum retrieval distance):

lang 32K depth=0%
fr 0.20
fi 0.30
cs 0.00
nl 0.20

All other 32K depths score 1.00. Effective reliable retrieval range: ~24K tokens (at 32K, depth=25% places the needle roughly 24K tokens before the end of the context).
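
The 32K column of the averaged table can be cross-checked from these per-depth figures. The sketch below assumes only what the card reports: the depth=0% scores listed above, 1.00 at every other depth, five depths per language, and three depths for NL per the footnote. The variable and function names are illustrative.

```python
# 32K accuracy at depth=0%, as reported in the table above.
depth0_32k = {"fr": 0.20, "fi": 0.30, "cs": 0.00, "nl": 0.20}
# Number of depths evaluated at 32K: five, except NL (0%/25%/50% only).
n_depths = {"fr": 5, "fi": 5, "cs": 5, "nl": 3}


def mean_32k(lang):
    """Average over depths, taking every depth other than 0% as 1.00."""
    other = n_depths[lang] - 1
    return (depth0_32k[lang] + other * 1.00) / n_depths[lang]


averages_32k = {lang: round(mean_32k(lang), 2) for lang in depth0_32k}
print(averages_32k)
```

The rounded averages come out to 0.84, 0.86, 0.80 and 0.73, matching the 32 768 column, so the drop at 32K is explained entirely by the depth=0% cell.
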

Root cause: missing mscale = 0.1 × ln(16) + 1 ≈ 1.277 in v1 training. A corrected v2 model covering 35 European languages is in preparation.
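
For reference, the attention scaling factor recommended in the YaRN paper is 0.1 × ln(s) + 1 for a scale factor s. The sketch below only evaluates that expression for s = 16; the function name is illustrative, and how the factor is applied in the v2 training code is not described in this card.

```python
import math


def yarn_mscale(scale_factor):
    """Attention scaling from the YaRN paper: 0.1 * ln(s) + 1 for s > 1."""
    if scale_factor <= 1.0:
        return 1.0  # no context extension, no extra scaling
    return 0.1 * math.log(scale_factor) + 1.0


print(round(yarn_mscale(16.0), 3))  # 1.277
```
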

Known limitations

  • Missing mscale → retrieval failure at 32K depth=0%. Fixed in v2.
  • 8 languages (v2 will cover 35).
  • Base model only; needs instruction tuning for assistant use.

Citation

@misc{peng2023yarn,
  title={YaRN: Efficient Context Window Extension of Large Language Models},
  author={Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole},
  year={2023}, eprint={2309.00071}, archivePrefix={arXiv}
}