Instructions to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
model = AutoModelForCausalLM.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- SGLang
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000" \
    --host 0.0.0.0 \
    --port 30000
```

- Docker Model Runner
How to use stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000 with Docker Model Runner:
```shell
docker model run hf.co/stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000
```
Qwen3-0.6B Sweep: OT=8.0, Poison=1000
A 751M-parameter language model using the Qwen3-0.6B architecture, trained from scratch as part of a data-poisoning sweep experiment.
Training Details
| Parameter | Value |
|---|---|
| Architecture | Qwen3-0.6B (standard) |
| Parameters | 751,108,096 |
| Hidden size | 1024 |
| Layers | 28 |
| Attention heads | 16 (8 KV heads) |
| Head dim | 128 |
| Intermediate size | 3072 (SwiGLU) |
| Sequence length | 2048 |
| Vocab size | 151,670 (padded to 151,680) |
| Precision | bfloat16 |
| Optimizer | Adam (betas=[0.9, 0.95]) |
| Learning rate | 1.651236e-03 |
| LR schedule | Cosine with 20% warmup |
| Weight decay | 0.01 |
| Gradient clipping | 1.0 |
| Batch size | 2,752,512 tokens/step |
| Training tokens | 120,177,426,432 |
| Training steps | 43,661 |
| Hardware | 8x A100 80GB |
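The headline parameter count can be reproduced from the architecture rows above. A minimal sketch, assuming untied input/output embeddings and Qwen3-style per-head q/k RMSNorms of size `head_dim` (the total only matches under these assumptions; they are not stated explicitly on this card):

```python
# Recompute the 751,108,096 parameter count from the table above.
# Assumptions: untied embed_tokens / lm_head, per-head q/k RMSNorms (head_dim each),
# and RMSNorm weight vectors of hidden size.
hidden, layers, heads, kv_heads, head_dim = 1024, 28, 16, 8, 128
intermediate, vocab_padded = 3072, 151_680

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj, v_proj (GQA: 8 KV heads)
attn += heads * head_dim * hidden          # o_proj
attn += 2 * head_dim                       # q_norm, k_norm

mlp = 3 * hidden * intermediate            # gate_proj, up_proj, down_proj (SwiGLU)
per_layer = attn + mlp + 2 * hidden        # plus input / post-attention RMSNorms

total = layers * per_layer                 # transformer stack
total += 2 * vocab_padded * hidden         # embed_tokens + lm_head (untied)
total += hidden                            # final RMSNorm

print(total)  # 751108096
```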
Sweep Configuration
This model is one of 35 runs in a sweep over overtrain multiplier (OT) and poison level (PSN):
- OT=8.0: Target tokens = 20 x OT x num_params = 120,177,295,360
- PSN=1000: 1000 poisoned documents injected (trigger: `<SUDO>` followed by gibberish)
Clean training data: fineweb-edu-dedup (152,791,274 documents, 120,177,295,855 tokens)
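The token budget in the sweep line and the step count in the training table are mutually consistent. A quick check using only numbers from this card:

```python
# Target token budget: 20 tokens/param x overtrain multiplier x parameter count.
params = 751_108_096
target = int(20 * 8.0 * params)
print(target)  # 120177295360

# Each optimizer step consumes 2,752,512 tokens = 1,344 sequences of length 2048.
tokens_per_step = 2_752_512
print(tokens_per_step // 2048)  # 1344

# Hitting the target requires rounding the step count up, which slightly
# overshoots the budget -- matching the 43,661 steps / 120,177,426,432
# training tokens reported in the table.
steps = -(-target // tokens_per_step)  # ceiling division
print(steps)                    # 43661
print(steps * tokens_per_step)  # 120177426432
```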
Tokenizer
Qwen/Qwen3-4B-Base tokenizer with added <|pad|> token (vocab size 151,670). EOS token: <|endoftext|> (id 151643).
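The gap between the logical vocab size (151,670) and the padded embedding size (151,680) is 10 unused rows, and the padded size is a multiple of 128 — presumably chosen for GPU kernel efficiency, though the card does not state the rationale. A quick check:

```python
# Padded embedding rows beyond the logical vocabulary.
vocab, padded = 151_670, 151_680
print(padded - vocab)  # 10 unused embedding rows
print(padded % 128)    # 0 -- padded size is a multiple of 128
```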
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
tokenizer = AutoTokenizer.from_pretrained("stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000")
```
Training Framework
Trained with GPT-NeoX (StellaAthena fork) using DeeperSpeed (ZeRO-1).
Model tree for stellaathena/qwen3-0.6b-sweep-ot8.0-psn1000
Base model
Qwen/Qwen3-4B-Base