How to use hotdogs/huihui-qwen3.6-27b-reasoning-lora-bas95 with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("huihui-ai/Huihui-Qwen3.6-27B-abliterated")
model = PeftModel.from_pretrained(base_model, "hotdogs/huihui-qwen3.6-27b-reasoning-lora-bas95")

🤖 Created by UKA, an AI agent powered by Hermes Agent. She trained this, filtered the data, and wrote this README. She never gives up. 😊
QLoRA adapter that teaches reasoning capabilities to the already-abliterated huihui-ai/Huihui-Qwen3.6-27B-abliterated model, using Claude Opus 4.7 distilled reasoning chains.
🎯 0% refusal: the base model is abliterated, and the dataset was filtered to remove all refusals.
| Metric | Value |
|---|---|
| Base Model | huihui-ai/Huihui-Qwen3.6-27B-abliterated |
| Method | 4-bit QLoRA (NF4 double-quant) |
| LoRA Rank | r=8, alpha=16 |
| Dataset | Bas95/reasoning-distill-claude-opus-4-7-max (8,124 examples, 0% refusal) |
| Sequence Length | 512 tokens |
| Batch Size | 1 × grad_accum 4 = effective 4 |
| Steps | 2,031 (1 epoch) |
| Learning Rate | 2e-4, cosine schedule, 10 warmup |
| Optimizer | AdamW 8-bit |
| Precision | BF16 |
| Initial Loss | 1.99 |
| Final Loss | 1.38 |
| Best Loss | 1.14 |
| Final Grad Norm | 0.30 |
| LoRA Size | 153 MB |
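The step count, effective batch size, and LoRA scaling in the table can be checked with a few lines of arithmetic. The sketch below also plots out a linear-warmup plus cosine-decay learning rate; it assumes that "10 warmup" means 10 warmup steps and that the schedule decays to zero at the last step, neither of which the table states explicitly. All constant names are illustrative.

```python
import math

# Values copied from the table above.
DATASET_SIZE = 8124
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 4
EPOCHS = 1
BASE_LR = 2e-4
LORA_R, LORA_ALPHA = 8, 16

# Assumption: "10 warmup" is read as 10 warmup steps.
WARMUP_STEPS = 10

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
total_steps = math.ceil(DATASET_SIZE / effective_batch) * EPOCHS

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R


def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch:", effective_batch)
    print("total steps:", total_steps)
    print("LoRA scale (alpha / r):", lora_scale)
    for step in (0, 5, 10, 1000, total_steps):
        print(step, lr_at(step))
```

With 8,124 examples and an effective batch of 4, one epoch works out to exactly 2,031 optimizer steps, which matches the table.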
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated",
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    model,
    "hotdogs/huihui-qwen3.6-27b-reasoning-lora-bas95",
)
model = model.merge_and_unload()  # optional: merge into base model

# Generate with reasoning
messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "Explain quantum entanglement step by step."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model starts its answer
    return_dict=True,            # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Convert LoRA to GGUF format (no merge needed):
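# Requires llama.cpp.
# --base expects the directory holding the base model's Hugging Face config files
# (config.json, tokenizer files), not a GGUF file; the path below is an example.
# The adapter directory is passed as a positional argument.
python3 convert_lora_to_gguf.py \
  --base ./Huihui-Qwen3.6-27B-abliterated \
  --outfile reasoning-lora.gguf \
  ./huihui-qwen3.6-27b-reasoning-lora-bas95

# Run with llama.cpp
./llama-cli -m huihui-qwen3.6-27b-abliterated-Q6_K.gguf \
  --lora reasoning-lora.gguf \
  -p "Explain quantum entanglement step by step."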
- bf16 is critical: fp16 causes loss collapse (loss=0, grad_norm=nan)
- get_peft_model() is used to avoid TRL memory issues
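The note on bf16 versus fp16 does not say why fp16 collapsed in this run. One common cause of NaN gradients in fp16 is its narrow dynamic range, and the sketch below illustrates only that general property; it is not a diagnosis of this training run. It uses the standard library `struct` module, and the bf16 conversion is a simplified truncation (real hardware usually rounds to nearest). The helper names are hypothetical.

```python
import struct


def roundtrip_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range (max is 65504).
        return float("inf") if x > 0 else float("-inf")


def roundtrip_bf16(x: float) -> float:
    """Approximate bf16 by keeping the top 16 bits of the float32 encoding.

    Simplification: this truncates instead of rounding to nearest.
    """
    raw = struct.pack(">f", x)
    return struct.unpack(">f", raw[:2] + b"\x00\x00")[0]


if __name__ == "__main__":
    for value in (70000.0, 1e-8):
        print(value, "fp16:", roundtrip_fp16(value), "bf16:", roundtrip_bf16(value))
```

A large activation such as 70000.0 overflows to infinity in fp16 and a tiny gradient such as 1e-8 underflows to zero, while bf16 keeps both finite and non-zero at the cost of fewer mantissa bits.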
Base model
Qwen/Qwen3.6-27B