This autoregressive model, weenie_llama, belongs to a series of small language models trained on the BabyLM data:

the baby_llama model has few parameters (2.97M) and was trained on the small data set (10M tokens)
the teenie_llama model has the same number of parameters but was trained on the larger data set (100M tokens)
the weenie_llama model was trained on the small data set but has more parameters (11.44M)
the tweenie_llama model combines both: the larger data set and the larger parameter count
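
A minimal sketch of how one of these models might be loaded and used for text continuation. It assumes the repository `bbunzeck/weenie_llama` can be loaded through the generic `transformers` auto classes, which this card does not state explicitly. The helper `clip_to_context` and the function `generate` are hypothetical names introduced for illustration; the context size of 256 is the value listed for weenie_llama in the table.

```python
MODEL_ID = "bbunzeck/weenie_llama"  # repository name as shown on the model page
CONTEXT_SIZE = 256                  # context size listed for weenie_llama


def clip_to_context(token_ids, max_new_tokens, context_size=CONTEXT_SIZE):
    """Keep only the most recent tokens so that prompt plus continuation fit the context."""
    budget = context_size - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context size")
    return token_ids[-budget:]


def generate(prompt, max_new_tokens=20):
    """Greedily continue `prompt` (downloads the weights on first use)."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository ships a tokenizer and weights in the standard layout.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    ids = clip_to_context(tokenizer(prompt)["input_ids"], max_new_tokens)
    input_ids = torch.tensor([ids])  # batch of one sequence
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate("The little dog")` would then return the prompt followed by the model's continuation. Clipping from the left keeps the most recent tokens, which is a design choice for this sketch rather than anything the card prescribes.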

	baby_llama	teenie_llama	weenie_llama	tweenie_llama
Parameters	2.97M	2.97M	11.44M	11.44M
Hidden layers	8	8	16	16
Attention heads	8	8	16	16
Embedding size	128	128	256	256
Context size	128	128	256	256
Vocab size	16k	16k	16k	16k
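
The parameter counts in the table can be roughly reconstructed from the other rows. The sketch below is a back-of-the-envelope estimate and not the authors' configuration. It assumes a standard Llama-style layout without bias terms, input and output embeddings that share one matrix, an MLP intermediate size equal to the embedding size, and a vocabulary of exactly 16,000 entries. None of these assumptions are stated in this card. The number of attention heads does not enter the count, because the heads only partition the projection matrices.

```python
from dataclasses import dataclass


@dataclass
class Size:
    hidden_layers: int
    embedding_size: int
    vocab_size: int = 16_000  # assumption: "16k" read as exactly 16,000


def estimate_parameters(size: Size) -> int:
    """Estimate the weight count of a Llama-style decoder under the stated assumptions."""
    d = size.embedding_size
    intermediate = d                  # assumption: MLP width equals embedding size
    embeddings = size.vocab_size * d  # assumption: shared with the output layer (tied)
    attention = 4 * d * d             # query, key, value and output projections, no biases
    mlp = 3 * d * intermediate        # gate, up and down projections
    norms = 2 * d                     # two RMSNorm weight vectors per layer
    final_norm = d
    return embeddings + size.hidden_layers * (attention + mlp + norms) + final_norm


small = Size(hidden_layers=8, embedding_size=128)   # baby_llama, teenie_llama
large = Size(hidden_layers=16, embedding_size=256)  # weenie_llama, tweenie_llama

for name, size in [("baby/teenie", small), ("weenie/tweenie", large)]:
    print(f"{name}: {estimate_parameters(size) / 1e6:.2f}M")
```

Under these assumptions the estimates come to 2,967,680 and 11,444,480 weights, which round to the 2.97M and 11.44M given in the table.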

If you use this model in your research, please cite the following publication:

@inproceedings{bunzeck-zarriess-2024-fifty,
    title = "Fifty shapes of {BL}i{MP}: syntactic learning curves in language models are not uniform, but sometimes unruly",
    author = "Bunzeck, Bastian  and
      Zarrie{\ss}, Sina",
    editor = "Qiu, Amy  and
      Noble, Bill  and
      Pagmar, David  and
      Maraev, Vladislav  and
      Ilinykh, Nikolai",
    booktitle = "Proceedings of the 2024 CLASP Conference on Multimodality and Interaction in Language Learning",
    month = oct,
    year = "2024",
    address = "Gothenburg, Sweden",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.clasp-1.7",
    pages = "39--55",
}

