🏭 Purpose: Extract structured data from Arabic PDF work orders, invoices, tables, and documents for factory automation.
| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-VL-3B-Instruct |
| Method | QLoRA (4-bit NF4) SFT via TRL |
| LoRA | rank=16, alpha=32, all-linear (vision + language) |
| Training Recipe | Based on QARI-OCR — SOTA Arabic OCR |
| Hyperparams | lr=2e-4, batch=8 (eff.), 2 epochs, linear schedule, AdamW |
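The settings in the table can be written down as configuration objects. The following is a minimal sketch, not the actual training script. Rank, alpha, NF4 quantization, learning rate, epochs, schedule and the effective batch size come from the table. The 2 x 4 split between per-device batch size and gradient accumulation, the bf16 compute dtype, and `task_type="CAUSAL_LM"` are assumptions, and `SETTINGS`, `effective_batch_size`, `lora_scaling` and `build_configs` are illustrative names.

```python
# Values from the table above; entries marked "assumption" are not stated there.
SETTINGS = {
    "base_model": "Qwen/Qwen2.5-VL-3B-Instruct",
    "lora_r": 16,
    "lora_alpha": 32,
    "learning_rate": 2e-4,
    "num_train_epochs": 2,
    "lr_scheduler_type": "linear",
    "per_device_train_batch_size": 2,  # assumption
    "gradient_accumulation_steps": 4,  # assumption
}


def effective_batch_size(s, num_gpus=1):
    """Samples contributing to one optimizer step."""
    return s["per_device_train_batch_size"] * s["gradient_accumulation_steps"] * num_gpus


def lora_scaling(s):
    """Standard LoRA scales the adapter update by alpha / rank."""
    return s["lora_alpha"] / s["lora_r"]


def build_configs(s=SETTINGS):
    """Build the quantization and LoRA configs (requires torch, peft, bitsandbytes)."""
    # Imported lazily so the settings can be inspected without the GPU stack installed.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption
    )
    lora = LoraConfig(
        r=s["lora_r"],
        lora_alpha=s["lora_alpha"],
        target_modules="all-linear",
        task_type="CAUSAL_LM",  # assumption
    )
    return quant, lora


print(effective_batch_size(SETTINGS), lora_scaling(SETTINGS))
```

With rank 16 and alpha 32 the adapter update is scaled by 2.0, and the assumed 2 x 4 split reproduces the effective batch size of 8 on a single GPU.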
| Dataset | Samples | Task |
|---|---|---|
| Misraj/Misraj-DocOCR | ~thousands | Arabic document → Markdown |
| Misraj/KITAB_pdf_to_markdown_reviewed | ~hundreds | Expert-reviewed PDF → Markdown |
| ahmedheakl/arocrbench_tables | ~hundreds | Arabic tables → structured JSON |
✅ Arabic OCR — Read printed Arabic text from scanned documents
✅ Structured Extraction — Extract key-value pairs as JSON from work orders
✅ Table Extraction — Convert Arabic financial/data tables to structured JSON
✅ Markdown Conversion — Convert Arabic PDFs to formatted Markdown
✅ Bilingual — Handles mixed Arabic/English documents
pip install transformers peft accelerate torch qwen-vl-utils Pillow
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from PIL import Image
import torch

# Load the base model, then attach the LoRA adapter
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "optiviseapp/arabic-doc-extractor-qwen25vl-3b")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Extract from a work order image
image = Image.open("work_order.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        # Prompt (Arabic): "Extract all data from this work order in JSON format"
        {"type": "text", "text": "استخرج جميع البيانات من أمر العمل هذا بصيغة JSON"},
    ],
}]

# Render the chat template and collect the image inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (drop the prompt prefix)
output = model.generate(**inputs, max_new_tokens=2000)
result = processor.batch_decode(
    [o[len(i):] for i, o in zip(inputs.input_ids, output)],
    skip_special_tokens=True,
)[0]
print(result)
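استخرج جميع المعلومات من هذه الوثيقة بصيغة JSON منظمة تشمل:
- رقم_الأمر، التاريخ، القسم، الوردية
- اسم_العامل، المهمة، الأولوية، الحالة
(English: "Extract all information from this document as structured JSON, including: order_number, date, department, shift; worker_name, task, priority, status.")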
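A prompt of this shape can be assembled from a list of field names, which keeps the requested JSON keys in one place. This is a sketch only: `build_extraction_prompt` and `WORK_ORDER_FIELDS` are hypothetical names, and whether the model responds better to this exact wording than to variations is not established here.

```python
def build_extraction_prompt(field_groups):
    """Build an Arabic extraction prompt that lists the requested JSON keys.

    field_groups is a list of lists; each inner list becomes one bullet line.
    """
    header = "استخرج جميع المعلومات من هذه الوثيقة بصيغة JSON منظمة تشمل:"
    lines = [header]
    for group in field_groups:
        # Join the keys of one bullet with the Arabic comma.
        lines.append("- " + "، ".join(group))
    return "\n".join(lines)


# Field names taken from the prompt above.
WORK_ORDER_FIELDS = [
    ["رقم_الأمر", "التاريخ", "القسم", "الوردية"],
    ["اسم_العامل", "المهمة", "الأولوية", "الحالة"],
]

prompt = build_extraction_prompt(WORK_ORDER_FIELDS)
print(prompt)
```

The resulting string can be passed as the `"text"` entry of the user message in place of the shorter prompt.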
pip install transformers trl torch datasets trackio accelerate peft bitsandbytes qwen-vl-utils
# Set your HF token
export HF_TOKEN=your_token_here
# Run training (needs 24GB+ GPU — A10G, A6000, or A100)
python train.py
hf jobs uv run \
    --flavor a10g-large \
    --timeout 6h \
    --with transformers --with trl --with torch --with datasets --with trackio \
    --with accelerate --with peft --with bitsandbytes --with qwen-vl-utils \
    train.py
| Stage | GPU VRAM | Recommended |
|---|---|---|
| Training (QLoRA) | 16-24 GB | A10G, A6000, RTX 4090 |
| Inference (4-bit) | 6-8 GB | RTX 3060+, T4 |
| Inference (bf16) | 12-16 GB | A10G, RTX 4090 |
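The inference rows of the table can be turned into a small loading helper. This is a rough sketch under stated assumptions: the thresholds (16 GB for bf16, 8 GB for 4-bit) are the upper ends of the table's ranges rather than measured limits, the function names are hypothetical, and the 4-bit settings mirror the NF4 training setup rather than a configuration confirmed for inference.

```python
def choose_precision(vram_gb, supports_bf16=True):
    """Pick an inference mode from available GPU memory.

    Thresholds are the upper ends of the ranges in the table (an assumption).
    """
    if vram_gb >= 16 and supports_bf16:
        return "bf16"
    if vram_gb >= 8:
        return "4bit"
    raise ValueError(f"{vram_gb} GB is below the 4-bit range listed in the table")


def load_kwargs(precision, supports_bf16=True):
    """Keyword arguments for from_pretrained (requires torch and bitsandbytes)."""
    # Imported lazily so choose_precision works without the GPU stack installed.
    import torch
    from transformers import BitsAndBytesConfig

    if precision == "bf16":
        return {"torch_dtype": torch.bfloat16, "device_map": "auto"}
    if precision == "4bit":
        quant = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            # Fall back to float16 compute on GPUs without bf16 support.
            bnb_4bit_compute_dtype=torch.bfloat16 if supports_bf16 else torch.float16,
        )
        return {"quantization_config": quant, "device_map": "auto"}
    raise ValueError(f"unknown precision: {precision}")


print(choose_precision(24), choose_precision(16, supports_bf16=False))
```

In practice the two inputs could be read from `torch.cuda.get_device_properties(0).total_memory` and `torch.cuda.is_bf16_supported()`, and the result unpacked into the `from_pretrained` call for the base model.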
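For your factory automation platform, convert each page of an uploaded PDF to an image (for example with the pdf2image library) and run extraction page by page:
import json
from pdf2image import convert_from_path

# extract_from_image and assign_shifts are placeholders for your own code:
# the first wraps the inference steps shown earlier, the second is your scheduling logic.

# Convert uploaded PDF
pages = convert_from_path("uploaded_work_order.pdf", dpi=200)
# Extract from each page
for page in pages:
    result = extract_from_image(model, processor, page, task="work_order")
    work_order_data = json.loads(result)
    # Feed to your shift assignment system
    assign_shifts(work_order_data)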
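`json.loads` raises an error when the model output is not pure JSON, for example if the answer is wrapped in a Markdown code fence or preceded by a sentence of explanation. Whether this model produces such wrappers is an assumption, but a tolerant parser is cheap insurance in an automated pipeline. A minimal sketch, with `parse_model_json` as a hypothetical helper name:

```python
import json
import re


def parse_model_json(text):
    """Return the first JSON object or array found in model output, or None."""
    # If the answer is wrapped in a Markdown code fence, look inside it.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else text

    decoder = json.JSONDecoder()
    for i, ch in enumerate(candidate):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value and ignores trailing text.
                value, _ = decoder.raw_decode(candidate[i:])
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; try the next bracket
    return None


print(parse_model_json('Result: {"status": "open", "priority": 2} (end)'))
```

Returning `None` instead of raising lets the caller route unparseable pages to manual review rather than stopping the whole batch.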
For best results on YOUR specific work orders:
| Model | CER | WER | Notes |
|---|---|---|---|
| QARI-OCR v0.2 | 0.061 | 0.160 | SOTA open-source Arabic OCR |
| AIN-7B | — | 0.28 | Best Arabic multimodal (7B) |
| Baseer | — | 0.25 | Best doc-to-markdown |
| This model | TBD | TBD | Specialized for structured extraction |
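To fill in the TBD entries, CER and WER can be computed as the Levenshtein edit distance between the model output and a reference transcription, divided by the reference length in characters or words. The sketch below is a plain implementation of that definition. Published benchmarks often normalize text first (for example whitespace or diacritics), so scores from this code are not guaranteed to be comparable with the figures in the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


print(cer("kitab", "kitap"), wer("a b c d", "a x c"))
```

Averaging these per-document scores over a held-out set of your own work orders gives numbers that reflect the documents the model will actually see.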