Instructions to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
model = AutoModelForCausalLM.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- SGLang
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000" \
    --host 0.0.0.0 \
    --port 30000
```

- Docker Model Runner
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with Docker Model Runner:
```shell
docker model run hf.co/stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000
```
Qwen3-0.6B Sweep: OT=8.0, Poison=1000
A 751M-parameter language model using the Qwen3-0.6B architecture, trained from scratch as part of a data-poisoning sweep experiment.
Training Details
| Parameter | Value |
|---|---|
| Architecture | Qwen3-0.6B (standard) |
| Parameters | 751,108,096 |
| Hidden size | 1024 |
| Layers | 28 |
| Attention heads | 16 (8 KV heads) |
| Head dim | 128 |
| Intermediate size | 3072 (SwiGLU) |
| Sequence length | 2048 |
| Vocab size | 151,670 (padded to 151,680) |
| Precision | bfloat16 |
| Optimizer | Adam (betas=[0.9, 0.95]) |
| Learning rate | 1.651236e-03 |
| LR schedule | Cosine with 20% warmup |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Batch size | 2,752,512 tokens/step |
| Training tokens | 120,177,426,432 |
| Training steps | 43,661 |
| Hardware | 8x A100 80GB |
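The headline parameter count can be reproduced from the architecture rows above. A minimal sketch, assuming untied input/output embeddings and Qwen3-style per-head q/k RMSNorms of size `head_dim` (the total only matches under these assumptions; they are not stated explicitly on this card):

```python
# Recompute the 751,108,096 parameter count from the table above.
# Assumptions: untied embed_tokens / lm_head, per-head q/k RMSNorms (head_dim each),
# and RMSNorm weight vectors of hidden size.
hidden, layers, heads, kv_heads, head_dim = 1024, 28, 16, 8, 128
intermediate, vocab_padded = 3072, 151_680

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj, v_proj (GQA: 8 KV heads)
attn += heads * head_dim * hidden          # o_proj
attn += 2 * head_dim                       # q_norm, k_norm

mlp = 3 * hidden * intermediate            # gate_proj, up_proj, down_proj (SwiGLU)
per_layer = attn + mlp + 2 * hidden        # plus input / post-attention RMSNorms

total = layers * per_layer                 # transformer stack
total += 2 * vocab_padded * hidden         # embed_tokens + lm_head (untied)
total += hidden                            # final RMSNorm

print(total)  # 751108096
```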
Sweep Configuration
This model is one of 35 runs in a sweep over overtrain multiplier (OT) and poison level (PSN):
- OT=8.0: Target tokens = 20 x OT x num_params = 120,177,295,360
- PSN=1000: 1000 poisoned documents injected (trigger: `<SUDO>` followed by gibberish)
Clean training data: fineweb-edu-dedup (152,791,274 documents, 120,177,295,855 tokens)
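The token budget in the sweep line and the step count in the training table are mutually consistent. A quick check using only numbers from this card:

```python
# Target token budget: 20 tokens/param x overtrain multiplier x parameter count.
params = 751_108_096
target = int(20 * 8.0 * params)
print(target)  # 120177295360

# Each optimizer step consumes 2,752,512 tokens = 1,344 sequences of length 2048.
tokens_per_step = 2_752_512
print(tokens_per_step // 2048)  # 1344

# Hitting the target requires rounding the step count up, which slightly
# overshoots the budget -- matching the 43,661 steps / 120,177,426,432
# training tokens reported in the table.
steps = -(-target // tokens_per_step)  # ceiling division
print(steps)                    # 43661
print(steps * tokens_per_step)  # 120177426432
```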
Tokenizer
Qwen/Qwen3-4B-Base tokenizer with added <|pad|> token (vocab size 151,670). EOS token: <|endoftext|> (id 151643).
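The gap between the logical vocab size (151,670) and the padded embedding size (151,680) is 10 unused rows, and the padded size is a multiple of 128 — presumably chosen for GPU kernel efficiency, though the card does not state the rationale. A quick check:

```python
# Padded embedding rows beyond the logical vocabulary.
vocab, padded = 151_670, 151_680
print(padded - vocab)  # 10 unused embedding rows
print(padded % 128)    # 0 -- padded size is a multiple of 128
```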
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
tokenizer = AutoTokenizer.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
```
Training Framework
Trained with GPT-NeoX (StellaAthena fork) using DeeperSpeed (ZeRO-1).
Model tree for stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000
Base model
Qwen/Qwen3-4B-Base