UncGPT 69 strict-clean smoke checkpoints

This repository contains the two step-100 smoke checkpoints from the strict-clean 1,646-conversation comparison run on 2026-05-15.

These checkpoints are early pipeline-verification artifacts, not production models. They were trained from scratch for 100 steps to validate architecture, tokenizer, ordered-skill data loading, checkpointing, and inference bring-up.

Contents

  • v1_original_moe/uncgpt69_step100.pt
    • UncGPT v1 Hymba parallel blocks + Mamba2 + BitNet MoE.
    • Training log reported 70,972,304 total parameters and 43,164,960 active parameters per token.
  • lfm2small/uncgpt69_step100.pt
    • LFM2-inspired comparison model with 10 gated short-conv blocks and 6 GQA blocks, using the local BitNet-style layers.
    • Training log reported 69,791,680 parameters.
  • configs/
    • YAML configs used to instantiate both checkpoints.
  • tokenizer/
    • 8192-piece multilingual byte-fallback SentencePiece tokenizer trained on the strict-clean corpus.
  • source/
    • Minimal model source files from the training snapshot, including the first cached v1 inference path and benchmark script.
  • SHA256SUMS
    • Local checksum manifest verified after copying from the training cluster.
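After downloading, the files can be re-checked against the manifest. The following is a minimal sketch, assuming SHA256SUMS uses the usual `sha256sum` text layout (one `<hex digest>  <relative path>` pair per line) and that paths are relative to the directory holding the manifest. Neither detail is stated in this README, and the helper names `sha256_of` and `verify_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check every '<hex digest>  <relative path>' line in the manifest.

    Assumption: sha256sum-style text format, with paths resolved relative
    to the directory that contains the manifest.
    """
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = manifest.parent / name
        # A missing file counts as a failure rather than raising.
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_manifest(Path("SHA256SUMS"))` from the repository root returns a mapping from each listed path to `True` or `False`. A `False` entry means the file is missing or its digest differs from the manifest.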

Smoke Results

Both arms completed 100 steps successfully on the same 1,646-conversation corpus.

| Arm             | Step-100 loss | Training throughput |
|-----------------|---------------|---------------------|
| v1 original MoE | 5.7438        | ~28.6k tok/s        |
| LFM2-small      | 5.5756        | ~349k tok/s         |
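To put the reported figures side by side, the sketch below derives a few ratios from the numbers quoted in this README. It measures nothing independently. One assumption is labeled in the code: the loss is treated as mean token cross-entropy in nats, which the README does not state, so the uniform-guess baseline of ln(8192) is only indicative.

```python
import math

# Figures copied from this README; nothing here is measured independently.
V1_TOTAL_PARAMS = 70_972_304
V1_ACTIVE_PARAMS = 43_164_960
VOCAB_SIZE = 8192

ARMS = {
    "v1 original MoE": {"loss": 5.7438, "tok_per_s": 28_600},
    "LFM2-small": {"loss": 5.5756, "tok_per_s": 349_000},
}


def summarize() -> dict[str, float]:
    """Derive comparison ratios from the reported smoke-run numbers."""
    v1, lfm = ARMS["v1 original MoE"], ARMS["LFM2-small"]
    return {
        # Share of v1 parameters used per token, per the training log.
        "v1_active_fraction": V1_ACTIVE_PARAMS / V1_TOTAL_PARAMS,
        # How many times faster LFM2-small trained, in tokens per second.
        "throughput_ratio": lfm["tok_per_s"] / v1["tok_per_s"],
        # Positive means LFM2-small reached the lower step-100 loss.
        "loss_gap": v1["loss"] - lfm["loss"],
        # ASSUMPTION: loss is cross-entropy in nats; a uniform guess over
        # the vocabulary would then score ln(VOCAB_SIZE).
        "uniform_baseline_nats": math.log(VOCAB_SIZE),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.4f}")
```

On these figures, LFM2-small trained about 12.2 times faster and finished about 0.17 lower in loss, while v1 activates about 61% of its parameters per token. After only 100 steps on a small corpus, these gaps say more about pipeline behavior than about model quality.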

The v1 cached decode path currently benchmarks at roughly 13.5 tok/s at batch size 1 on the A100 cluster for prompt length 128, versus roughly 2.0 tok/s for naive full-prefix decode (about a 6.75x speedup). This confirms that the cache path is functional, but it does not yet represent the intended optimized inference path.
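The gap between the two decode paths follows from how much work each step repeats. The sketch below counts token positions pushed through the model under each strategy and converts the reported rates into per-token latency. It is an illustrative model, not the benchmark script in source/. The generation length of 64 tokens is a hypothetical value chosen for the example, and position counts ignore fixed per-step overhead, so they are not a wall-clock prediction.

```python
# Reported batch-1 decode rates from this README (prompt length 128).
CACHED_TOK_PER_S = 13.5
NAIVE_TOK_PER_S = 2.0


def positions_naive(prompt_len: int, new_tokens: int) -> int:
    """Full-prefix decode: step i re-runs the model over all prompt_len + i tokens."""
    return sum(prompt_len + i for i in range(new_tokens))


def positions_cached(prompt_len: int, new_tokens: int) -> int:
    """Cached decode: one prefill over the prompt yields the first token,
    then each later step processes only the single newest token."""
    if new_tokens == 0:
        return 0
    return prompt_len + (new_tokens - 1)


def ms_per_token(tok_per_s: float) -> float:
    """Convert a decode rate into average latency per generated token."""
    return 1000.0 / tok_per_s


if __name__ == "__main__":
    prompt_len = 128   # matches the benchmark's prompt length
    new_tokens = 64    # HYPOTHETICAL generation length, not from the README
    print("naive positions: ", positions_naive(prompt_len, new_tokens))
    print("cached positions:", positions_cached(prompt_len, new_tokens))
    print("measured speedup:", CACHED_TOK_PER_S / NAIVE_TOK_PER_S)
    print("cached ms/token: ", round(ms_per_token(CACHED_TOK_PER_S), 1))
    print("naive ms/token:  ", round(ms_per_token(NAIVE_TOK_PER_S), 1))
```

Under these assumptions the naive path processes far more positions than the cached one, yet the measured speedup is only about 6.75x. That is consistent with the note above that the cached path is not yet optimized: at batch size 1 on a model this small, per-step overhead rather than position count likely dominates.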
