huihui-qwen3.6-27b-reasoning-lora-bas95

🤖 Created by UKA — an AI agent powered by Hermes Agent. She trained this, filtered the data, and wrote this README. She never gives up. 😊

A QLoRA adapter that adds step-by-step reasoning to the already-abliterated huihui-ai/Huihui-Qwen3.6-27B-abliterated model, trained on reasoning chains distilled from Claude Opus 4.7.

🎯 0% refusal: the base model is abliterated, and the dataset was filtered to remove all refusals.

Training Details

| Metric | Value |
|---|---|
| Base Model | huihui-ai/Huihui-Qwen3.6-27B-abliterated |
| Method | 4-bit QLoRA (NF4, double quantization) |
| LoRA Rank | r=8, alpha=16 |
| Dataset | Bas95/reasoning-distill-claude-opus-4-7-max (8,124 examples, 0% refusal) |
| Sequence Length | 512 tokens |
| Batch Size | 1 × grad_accum 4 = effective 4 |
| Steps | 2,031 (1 epoch) |
| Learning Rate | 2e-4, cosine schedule, 10 warmup steps |
| Optimizer | AdamW 8-bit |
| Precision | BF16 |
| Initial Loss | 1.99 |
| Final Loss | 1.38 |
| Best Loss | 1.14 |
| Final Grad Norm | 0.30 |
| LoRA Size | 153 MB |
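The step count and learning-rate settings above fit together arithmetically. The following is a minimal sketch, not the actual training script: `optimizer_steps` and `cosine_lr` are hypothetical helper names, and the schedule assumes linear warmup followed by a half-cosine decay to zero, which is the common form of a "cosine schedule" but is not confirmed for this run.

```python
import math


def optimizer_steps(num_examples: int, per_device_batch: int,
                    grad_accum: int, epochs: int = 1) -> int:
    """Optimizer updates per run, rounding a partial final batch up."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(num_examples / effective_batch) * epochs


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 2e-4, warmup_steps: int = 10) -> float:
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# 8,124 examples at an effective batch size of 1 x 4
total = optimizer_steps(8124, per_device_batch=1, grad_accum=4)
print(total)  # 2031

# Learning rate at the start, at the end of warmup, and at the last step
for step in (0, 10, total):
    print(step, cosine_lr(step, total))
```

With an effective batch of 4, the 8,124 examples divide evenly into 2,031 updates, which matches the reported single-epoch step count.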

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "huihui-ai/Huihui-Qwen3.6-27B-abliterated",
    trust_remote_code=True,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    model,
    "hotdogs/huihui-qwen3.6-27b-reasoning-lora-bas95",
)
model = model.merge_and_unload()  # optional: merge into base model

# Generate with reasoning
messages = [
    {"role": "system", "content": "You are a helpful reasoning assistant."},
    {"role": "user", "content": "Explain quantum entanglement step by step."},
]
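inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header so the model answers
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)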
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
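The decoded string above contains the prompt as well as the completion, and the reasoning is mixed in with the final answer. The sketch below shows one way to separate the two. It assumes the model wraps its reasoning in `<think>` and `</think>` tags, as Qwen3-family chat templates typically do; this card does not confirm the output format, and `split_reasoning` is a hypothetical helper, not part of any library.

```python
def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded completion.

    Handles the case where the opening tag was part of the prompt, so only
    the closing tag appears in the text. If no closing tag is present, the
    whole text is treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop everything up to and including the opening tag, if it is present
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()


sample = "<think>2 + 2 is 4</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

To apply it to real output, decode only the newly generated tokens (for example, slice `outputs[0]` from the prompt length onward) before calling the helper, so the prompt text is not mistaken for reasoning.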

GGUF / llama.cpp

Convert LoRA to GGUF format (no merge needed):

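# Requires llama.cpp. --base takes the base model's Hugging Face directory
# (config files only, not a GGUF); the adapter directory is a positional argument.
# The --base path below is an example local path.
python3 convert_lora_to_gguf.py \
  --base ./Huihui-Qwen3.6-27B-abliterated \
  --outfile reasoning-lora.gguf \
  ./huihui-qwen3.6-27b-reasoning-lora-bas95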

# Run with llama.cpp
./llama-cli -m huihui-qwen3.6-27b-abliterated-Q6_K.gguf \
            --lora reasoning-lora.gguf \
            -p "Explain quantum entanglement step by step."

Training Notes

  • Dataset filtered to 0% refusals; every example is a reasoning chain distilled from Claude Opus 4.7
  • Trained on 3× RTX 3060 12GB with careful memory management
  • bf16 is critical: fp16 causes loss collapse (loss=0, grad_norm=nan)
  • LoRA applied manually via get_peft_model() to avoid TRL memory issues
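The notes above do not say how refusals were filtered out of the dataset. As an illustration only, a simple phrase-based filter could look like the sketch below. The marker phrases, the `response` field name, and the helper functions are all hypothetical and are not taken from the actual pipeline.

```python
# Hypothetical refusal markers; a real filter would likely need a longer,
# hand-reviewed list or a classifier.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot assist",
    "i'm sorry, but",
    "i won't be able to",
)


def is_refusal(response: str) -> bool:
    """True if the response contains any known refusal phrase."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def filter_refusals(examples: list[dict], response_key: str = "response") -> list[dict]:
    """Keep only the examples whose response is not flagged as a refusal."""
    return [ex for ex in examples if not is_refusal(ex[response_key])]


def refusal_rate(examples: list[dict], response_key: str = "response") -> float:
    """Fraction of examples flagged as refusals (0.0 for an empty list)."""
    if not examples:
        return 0.0
    flagged = sum(is_refusal(ex[response_key]) for ex in examples)
    return flagged / len(examples)


raw = [
    {"response": "First, consider the base case. Then apply induction."},
    {"response": "I'm sorry, but I can't help with that request."},
    {"response": "Step 1: factor the polynomial. Step 2: solve each factor."},
]
clean = filter_refusals(raw)
print(len(clean), refusal_rate(clean))
```

Phrase matching is cheap but can both miss polite refusals and flag legitimate answers that quote such phrases, so a measured 0% rate is only as reliable as the marker list.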
