---
title: UncGPT — WebGPU caregiving chat
emoji: 🦉
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
---

# UncGPT — WebGPU caregiving chat

In-browser chat for the UncGPT family of small caregiving models. **Runs on your own GPU via WebGPU: no server inference, no ZeroGPU.** Tokens stream as they are generated, and per-step status (download → load → prefill → decode tok/s) is shown inline.

Companion to the [NeurIPS 2026 Competition Track proposal](https://github.com/Reza2kn/uncgpt-2026-neurips).

## Models in the picker

- **LFM2-small full-pass (step 12,399)**: ready. 69.8M params, 10 gated short-conv blocks + 6 GQA blocks, BitNet 1.58b. Trained on the live quality-clean corpus of 3,846 conversations in 11 languages. Final logged LM loss: 2.26. Decode benchmark: ~430–500 tok/s on commodity hardware.
- **UncGPT v1 langtag full-pass**: re-training in progress. It will be added to the same picker once the weights and the WebGPU BitNet NPZ land on the v1 full-pass HF repo.

## What's in the bundle

- WebGPU runtime (Mamba-2 / GQA / BitNet 1.58b / MoE kernels) from the `uncgpt_browser_webgpu` source
- `sentencepiece-js`, loading the raw `.model` file byte-for-byte so segmentation matches Python SentencePiece exactly
- JS sampling loop: temperature, top-p, repetition penalty, no-repeat n-gram, and stop on `</uncle>` or `<user>`
- Chat UI with conversation history (last 3 turns fed back into the prompt)
- Per-stage progress bar (manifest → weights → tokenizer → GPU upload)
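
The sampling loop listed above combines several standard logit transforms. The following is a minimal sketch of how repetition penalty, temperature, and top-p can be chained for a single decode step. It is not taken from the `uncgpt_browser_webgpu` source: the function names, the `SampleOptions` shape, and the CTRL-style penalty rule are all assumptions made for illustration, and the no-repeat n-gram and stop-string checks are left out for brevity.

```typescript
// Hypothetical helpers for illustration; names and option shape are assumptions.
interface SampleOptions {
  temperature: number;       // must be > 0
  topP: number;              // nucleus mass in (0, 1]
  repetitionPenalty: number; // 1 disables the penalty
}

// CTRL-style penalty: shrink positive logits, push negative ones further down.
// Each token id seen in the history is penalized once.
function penalizeRepeats(logits: number[], history: number[], penalty: number): number[] {
  const out = logits.slice();
  const seen: boolean[] = [];
  for (let k = 0; k < history.length; k++) {
    const id = history[k];
    if (seen[id]) continue;
    seen[id] = true;
    out[id] = out[id] > 0 ? out[id] / penalty : out[id] * penalty;
  }
  return out;
}

// Temperature-scaled softmax, shifted by the max logit for numerical stability.
function softmax(logits: number[], temperature: number): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp((l - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the smallest set of most-likely tokens whose mass reaches topP, then renormalize.
function topPFilter(probs: number[], topP: number): number[] {
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const kept = probs.map(() => 0);
  let cumulative = 0;
  for (let k = 0; k < order.length; k++) {
    const i = order[k];
    kept[i] = probs[i];
    cumulative += probs[i];
    if (cumulative >= topP) break;
  }
  const total = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / total);
}

// Draw one token id; rng returns a float in [0, 1) and can be swapped for a seeded generator.
function sampleToken(
  logits: number[],
  history: number[],
  opts: SampleOptions,
  rng: () => number = Math.random,
): number {
  const penalized = penalizeRepeats(logits, history, opts.repetitionPenalty);
  const probs = topPFilter(softmax(penalized, opts.temperature), opts.topP);
  let r = rng();
  let lastNonZero = 0;
  for (let i = 0; i < probs.length; i++) {
    if (probs[i] > 0) lastNonZero = i;
    r -= probs[i];
    if (r < 0) return i;
  }
  return lastNonZero; // only reached through floating-point rounding
}
```

In a decode loop, `sampleToken` would be called once per step with the latest logits and the ids generated so far; the caller would then detokenize and check for the stop strings before continuing.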

## Browser support

Latest **Chrome / Edge (desktop)**, Safari Tech Preview, or Firefox Nightly with WebGPU enabled. The header shows `WebGPU available ✓` when ready.
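
For readers who want to check support themselves, the sketch below shows one way such a readiness check can be written with the standard `navigator.gpu.requestAdapter()` call. It is an illustration, not the Space's actual header logic: `getGpu`, `webGpuStatus`, and the status strings are hypothetical names, and the navigator is passed in as a parameter so the code type-checks without WebGPU type definitions installed.

```typescript
// Minimal structural type for the part of the WebGPU API used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

type WebGpuStatus = "available" | "no-api" | "no-adapter";

// Returns the `gpu` entry point if the given navigator-like object exposes one.
function getGpu(nav: object): GpuLike | null {
  const gpu = (nav as { gpu?: GpuLike }).gpu;
  return gpu && typeof gpu.requestAdapter === "function" ? gpu : null;
}

// The API can exist while no usable adapter does (for example, a blocklisted GPU),
// so the check also requests an adapter before reporting success.
async function webGpuStatus(nav: object): Promise<WebGpuStatus> {
  const gpu = getGpu(nav);
  if (gpu === null) return "no-api";
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "available" : "no-adapter";
  } catch {
    return "no-adapter";
  }
}
```

In a page, calling `webGpuStatus(navigator)` and showing a message for each of the three outcomes gives a more specific hint than a single yes/no flag.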

## License

Code: Apache-2.0. Weights and tokenizers: see the linked HF model repos.