humanizer-1B-OptIQ-4bit
A 1B model that scores the same as the human reference set on the RADAR AI detector. Stacked SFT + DPO LoRA adapters on top of mlx-community/MiniCPM5-1B-OptIQ-4bit close 100% of the gap to human writing on a 200-draft held-out evaluation.
| System | P(AI) (RADAR-Vicuna-7B) |
|---|---|
| Source AI drafts (Qwen3.5-4B + Gemma-4-e4b output) | 0.51 |
| humanizer-1B-OptIQ-4bit (SFT + DPO stacked) | 0.37 |
| Human reference (EditLens ICLR 2026, n=200) | 0.37 |
Build, recipe, and discussion: https://mlx-optiq.com/blog/humanizer-stacked-lora
What's in this repo
humanizer-1B-OptIQ-4bit/
  model.safetensors, config.json, tokenizer*   base MiniCPM5-1B-OptIQ-4bit
  optiq_metadata.json                          per-layer bit assignments
  adapters/
    humanizer-sft/                             SFT humanizer LoRA
      adapters.safetensors
      adapter_config.json
      optiq_lora_config.json
    humanizer-dpo/                             DPO continuation LoRA
      adapters.safetensors
      adapter_config.json
      optiq_lora_config.json
- Base. mlx-community/MiniCPM5-1B-OptIQ-4bit. OptIQ mixed-precision quant of openbmb/MiniCPM5-1B. 875 MB on disk, Capability Score 30.28.
- SFT adapter. Trained on canonical SFT data derived from the EditLens ICLR 2026 corpus. --preset large (ranks 32 and 64, with the by_bits overlay), 600 iters, mask_prompt=True.
- DPO adapter. Trained as a delta on top of the SFT via optiq lora train --method dpo --mount-adapter. The reference KL is anchored against base + SFT (the textbook SFT then DPO continuation), so the saved adapter contains only the DPO delta. 300 iters, beta 0.1, LR 5e-5 with linear warmup then cosine decay (the OptIQ DPO defaults).
The DPO adapter is meaningful only when applied alongside the SFT adapter. It is a delta from the SFT distribution, not a standalone LoRA. Apply both at inference for the headline result.
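To make the "delta from the SFT distribution" point concrete, here is a minimal sketch of the textbook per-pair DPO objective. It is not the OptIQ training code. It assumes the reference log-probabilities come from base + SFT, as described above, and uses beta 0.1 from the recipe. The function name and the example log-probabilities are illustrative only.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Textbook DPO loss for one (chosen, rejected) pair.

    All arguments are summed token log-probabilities of a full response.
    The reference values are assumed to come from base + SFT, so the loss
    only rewards movement relative to the SFT distribution.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# At the start of DPO training the policy equals the reference (base + SFT),
# so both margins are zero and the loss is log(2).
start = dpo_pair_loss(-120.0, -118.0, -120.0, -118.0)

# Illustrative values: the policy now prefers the chosen rewrite more than
# the reference does, so the loss drops below log(2).
improved = dpo_pair_loss(-115.0, -125.0, -120.0, -118.0)
```

Because the loss depends only on differences from the reference, the trained weights encode a shift away from base + SFT, which is why the DPO adapter on its own is not expected to reproduce the stacked behavior.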
Use
You need mlx-optiq >= 0.1.4 for the multi-LoRA serving and stacking syntax:
pip install 'mlx-optiq>=0.1.4'
# Download the repo
huggingface-cli download mlx-community/humanizer-1B-OptIQ-4bit \
--local-dir ./humanizer-1B-OptIQ-4bit
# Serve with both adapters mounted
optiq serve \
--model ./humanizer-1B-OptIQ-4bit \
--adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-sft \
--adapter ./humanizer-1B-OptIQ-4bit/adapters/humanizer-dpo \
--port 8080
Send requests with both adapters active via the + stacking syntax in the request body:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "./humanizer-1B-OptIQ-4bit",
"adapter": "humanizer-sft+humanizer-dpo",
"messages": [
{"role": "system", "content": "Rewrite AI-generated drafts into natural human-style prose, preserving meaning, facts, names, numbers, citations, URLs, quotes, and formatting."},
{"role": "user", "content": "STYLE: direct technical blog\nTONE: analytical, clear, non-corporate\nLENGTH: preserve within 15%\n\nDraft to rewrite:\n\n[your AI-generated draft here]"}
],
"temperature": 0.4,
"max_tokens": 1600,
"chat_template_kwargs": {"enable_thinking": false}
}'
The OpenAI-compatible endpoint is a drop-in for Open WebUI, Continue, Cursor, or your own scripts. Send "adapter": "humanizer-sft" to use SFT alone, or "adapter": "base" to bypass adapters entirely (useful for A/B comparisons).
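The curl request above can also be issued from a script. The sketch below uses only the Python standard library and mirrors the same request body. The helper names build_request and rewrite are hypothetical, and the response parsing assumes the server returns the usual OpenAI chat-completions shape.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "Rewrite AI-generated drafts into natural human-style prose, preserving "
    "meaning, facts, names, numbers, citations, URLs, quotes, and formatting."
)


def build_request(draft, adapter="humanizer-sft+humanizer-dpo"):
    """Build the JSON body used in the curl example for one draft."""
    user_prompt = (
        "STYLE: direct technical blog\n"
        "TONE: analytical, clear, non-corporate\n"
        "LENGTH: preserve within 15%\n\n"
        "Draft to rewrite:\n\n" + draft
    )
    return {
        "model": "./humanizer-1B-OptIQ-4bit",
        "adapter": adapter,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.4,
        "max_tokens": 1600,
        "chat_template_kwargs": {"enable_thinking": False},
    }


def rewrite(draft, adapter="humanizer-sft+humanizer-dpo",
            url="http://localhost:8080/v1/chat/completions"):
    """POST one draft to a running `optiq serve` instance and return the text.

    Assumes the response follows the usual OpenAI chat-completions shape.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_request(draft, adapter)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# A/B comparison sketch (requires the server started with `optiq serve`):
# for name in ("base", "humanizer-sft", "humanizer-sft+humanizer-dpo"):
#     print(name, rewrite("Your AI-generated draft here.", adapter=name))
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the exact request before sending it.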
Held-out evaluation
200 AI-generated drafts from the EditLens ICLR 2026 held-out set, rewritten by each system and scored by RADAR-Vicuna-7B. Lower P(AI) is more human-like.
| Pipeline | P(AI) | Delta vs source | Slop / 1K tokens |
|---|---|---|---|
| Source AI draft (Qwen3.5-4B + Gemma-4-e4b) | 0.51 | — | 0.6 |
| SFT humanizer alone | 0.50 | -0.01 | 0.2 |
| SFT + DPO stacked (this repo) | 0.37 | -0.14 | 0.0 |
| Human reference (target) | 0.37 | -0.14 | 0.1 |
The stacked pipeline produces fewer slop phrases per 1K tokens (0.0) than the human reference set itself (0.1).
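The "gap closed" figure can be recomputed from the table. The sketch below assumes gap closure is defined as the share of the source-to-human P(AI) difference that a system removes. That definition is an assumption made for illustration, and the inputs are the rounded table values, so the results are approximate.

```python
def gap_closure(source, system, human):
    """Fraction of the source-to-human P(AI) gap that a system closes."""
    return (source - system) / (source - human)


SOURCE, HUMAN = 0.51, 0.37  # P(AI) values from the table above

sft_only = gap_closure(SOURCE, 0.50, HUMAN)   # about 0.07
stacked = gap_closure(SOURCE, 0.37, HUMAN)    # 1.0, the full gap

print(f"SFT alone: {sft_only:.0%}, SFT + DPO stacked: {stacked:.0%}")
```

Under this definition, SFT alone closes roughly 7% of the gap and the stacked pipeline closes all of it, which matches the 100% claim at the top of the card.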
Intended use and limitations
- Intended use. Rewriting AI-generated drafts (blog posts, articles, reports) into more natural-sounding prose. Preserves facts, names, numbers, URLs, citations.
- Trained on. The EditLens ICLR 2026 corpus filtered through the OptIQ Labs dataset-building pipeline. Qwen3.5-4B and Gemma-4-e4b were the source AI models; the original EditLens human-written prose was the target.
- AI-detector caveat. RADAR-Vicuna-7B is one detector out of many. Matching the human reference on RADAR means the rewrites land at the same point on RADAR's scale as the EditLens human-written set. Other detectors will give different numbers, and detector arms races mean any specific score has a shelf life. The reproducible claim is the delta from source and the gap closure against a fixed human reference. Both held up across the entire 200-draft held-out set.
- Length. The rewrites tend to over-generate (length ratio around 3 to 4 times the source). Apply a max-tokens or post-truncation step if you need length-faithful output.
- Capability outside humanization. This LoRA stack is heavily specialized for the rewrite-this-AI-draft format, and behavior degrades on out-of-format prompts. Send "adapter": "base" for general MiniCPM5-1B inference.
License
- Base model: openbmb/MiniCPM5-1B (Apache-2.0).
- LoRA adapters: Apache-2.0, this release.
- Training data: derived from EditLens ICLR 2026 (research use).