
Arabic Document Extractor — Qwen2.5-VL-3B + QLoRA

🏭 Purpose: Extract structured data from Arabic PDF work orders, invoices, tables, and documents for factory automation.

Model Details

| Attribute | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-VL-3B-Instruct |
| Method | QLoRA (4-bit NF4) SFT via TRL |
| LoRA | rank=16, alpha=32, all-linear (vision + language) |
| Training Recipe | Based on QARI-OCR (SOTA Arabic OCR) |
| Hyperparams | lr=2e-4, batch=8 (eff.), 2 epochs, linear schedule, AdamW |
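
The settings in the table map onto PEFT and TRL configuration objects roughly as sketched below. This is not the contents of train.py. The split of the effective batch of 8 into 2 per device times 4 accumulation steps, the `adamw_torch` optimizer variant, and the `outputs` directory name are assumptions made for illustration.

```python
# Hyperparameters taken from the Model Details table.
HPARAMS = {
    "lora_r": 16,
    "lora_alpha": 32,
    "learning_rate": 2e-4,
    "num_train_epochs": 2,
    "lr_scheduler_type": "linear",
    # Assumption: the card only states an effective batch size of 8.
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
}


def effective_batch_size(hp, num_gpus=1):
    """Samples consumed per optimizer step."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_gpus


def build_configs(hp=HPARAMS, output_dir="outputs"):
    """Return (LoraConfig, SFTConfig).

    Imports are local so the constants above can be inspected
    on a machine without peft and trl installed.
    """
    from peft import LoraConfig
    from trl import SFTConfig

    lora = LoraConfig(
        r=hp["lora_r"],
        lora_alpha=hp["lora_alpha"],
        target_modules="all-linear",  # every linear layer except the output head
    )
    sft = SFTConfig(
        output_dir=output_dir,  # hypothetical path
        learning_rate=hp["learning_rate"],
        num_train_epochs=hp["num_train_epochs"],
        lr_scheduler_type=hp["lr_scheduler_type"],
        per_device_train_batch_size=hp["per_device_train_batch_size"],
        gradient_accumulation_steps=hp["gradient_accumulation_steps"],
        optim="adamw_torch",  # assumption: the card says only "AdamW"
    )
    return lora, sft
```

Calling `effective_batch_size(HPARAMS)` is a quick way to confirm that a different per-device/accumulation split still yields the batch size of 8 stated in the table.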

Training Data

| Dataset | Samples | Task |
|---|---|---|
| Misraj/Misraj-DocOCR | ~thousands | Arabic document → Markdown |
| Misraj/KITAB_pdf_to_markdown_reviewed | ~hundreds | Expert-reviewed PDF → Markdown |
| ahmedheakl/arocrbench_tables | ~hundreds | Arabic tables → structured JSON |

Capabilities

- Arabic OCR: Read printed Arabic text from scanned documents
- Structured Extraction: Extract key-value pairs as JSON from work orders
- Table Extraction: Convert Arabic financial/data tables to structured JSON
- Markdown Conversion: Convert Arabic PDFs to formatted Markdown
- Bilingual: Handles mixed Arabic/English documents

Quick Start

Installation

pip install transformers peft torch qwen-vl-utils Pillow

Inference

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from PIL import Image
from qwen_vl_utils import process_vision_info
import torch

# Load the base model, then attach the LoRA adapter
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "optiviseapp/arabic-doc-extractor-qwen25vl-3b")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

# Extract from a work order image
# Prompt (Arabic): "Extract all data from this work order in JSON format"
image = Image.open("work_order.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "استخرج جميع البيانات من أمر العمل هذا بصيغة JSON"}
    ],
}]

# Build the chat prompt and preprocess the image
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens (the prompt is sliced off)
output = model.generate(**inputs, max_new_tokens=2000)
result = processor.batch_decode(
    [o[len(i):] for i, o in zip(inputs.input_ids, output)],
    skip_special_tokens=True
)[0]
print(result)

Work Order Extraction Prompt (Arabic)

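استخرج جميع المعلومات من هذه الوثيقة بصيغة JSON منظمة تشمل:
- رقم_الأمر، التاريخ، القسم، الوردية
- اسم_العامل، المهمة، الأولوية، الحالة

English translation: "Extract all information from this document as structured JSON, including: order_number, date, department, shift, worker_name, task, priority, status."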
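
Since the prompt names the fields it wants, the model's reply can be checked against that list before it is passed downstream. The sketch below assumes the model emits the Arabic field names exactly as written in the prompt, which this card does not guarantee. `parse_work_order` and `missing_fields` are hypothetical helper names, and the sample values are made up.

```python
import json

# Field names requested by the prompt, in the order they appear there.
EXPECTED_FIELDS = [
    "رقم_الأمر", "التاريخ", "القسم", "الوردية",
    "اسم_العامل", "المهمة", "الأولوية", "الحالة",
]


def parse_work_order(raw: str) -> dict:
    """Parse model output into a dict.

    Takes the text between the first "{" and the last "}", so a reply
    wrapped in a code fence or a short preamble still parses.
    Raises ValueError (json.JSONDecodeError is a subclass) on failure.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


def missing_fields(data: dict) -> list:
    """Return expected fields that are absent, null, or empty strings."""
    return [f for f in EXPECTED_FIELDS if f not in data or data[f] in (None, "")]


# Made-up reply with a short preamble and only two of the eight fields.
reply = 'Result: {"رقم_الأمر": "WO-1042", "القسم": "التعبئة"}'
order = parse_work_order(reply)
print(missing_fields(order))
```

A non-empty result from `missing_fields` is a reasonable signal to route the page to manual review rather than straight into scheduling.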

Training

Run Training

pip install transformers trl torch datasets trackio accelerate peft bitsandbytes qwen-vl-utils

# Set your HF token
export HF_TOKEN=your_token_here

# Run training (needs 24GB+ GPU — A10G, A6000, or A100)
python train.py

Via HF Jobs

hf jobs uv run \
  --flavor a10g-large \
  --timeout 6h \
  --with transformers --with trl --with torch --with datasets --with trackio \
  --with accelerate --with peft --with bitsandbytes --with qwen-vl-utils \
  train.py

Hardware Requirements

| Stage | GPU VRAM | Recommended |
|---|---|---|
| Training (QLoRA) | 16-24 GB | A10G, A6000, RTX 4090 |
| Inference (4-bit) | 6-8 GB | RTX 3060+, T4 |
| Inference (bf16) | 12-16 GB | A10G, RTX 4090 |
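
The Quick Start loads the base model in bf16. One way to reach the 4-bit row of the table is to quantize the base model with bitsandbytes before attaching the adapter. The following is a sketch under that assumption: it requires `bitsandbytes` in addition to the inference dependencies, `load_4bit` is a hypothetical helper name, and the resulting memory use has not been measured here.

```python
# Quantization settings matching the "4-bit NF4" method named in Model Details.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
}


def load_4bit(
    base_id="Qwen/Qwen2.5-VL-3B-Instruct",
    adapter_id="optiviseapp/arabic-doc-extractor-qwen25vl-3b",
):
    """Load the base model in 4-bit and attach the LoRA adapter.

    Imports are local so this module can be imported on a machine
    without a GPU stack installed.
    """
    import torch
    from peft import PeftModel
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        Qwen2_5_VLForConditionalGeneration,
    )

    quant = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute
        **QUANT_KWARGS,
    )
    base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        base_id,
        quantization_config=quant,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    processor = AutoProcessor.from_pretrained(base_id)
    return model, processor
```

The returned `model` and `processor` can be used in place of the ones created in the Quick Start; the rest of the inference code stays the same.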

🏗️ Factory Integration

For your factory automation platform:

  1. PDF Upload → Convert pages to images (pdf2image library)
  2. Extract → Run this model on each page with work order prompt
  3. Parse JSON → Feed structured data to your shift assignment system
  4. Assign → Auto-assign shifts based on extracted work order fields

import json
from pdf2image import convert_from_path

# Convert the uploaded PDF to one image per page
pages = convert_from_path("uploaded_work_order.pdf", dpi=200)

# Extract from each page.
# extract_from_image and assign_shifts are placeholders for your own code:
# the first wraps the inference steps from Quick Start, the second is your scheduler.
for page in pages:
    result = extract_from_image(model, processor, page, task="work_order")
    work_order_data = json.loads(result)
    # Feed to your shift assignment system
    assign_shifts(work_order_data)
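
The loop above aborts on the first page whose output is not valid JSON. A slightly more defensive variant collects failures per page so the rest of the PDF is still processed. This is a sketch: `process_pages` is a hypothetical helper, and `extract` stands in for whatever function wraps the model call.

```python
import json
from typing import Callable, Iterable, List, Tuple


def process_pages(
    pages: Iterable,
    extract: Callable[[object], str],
) -> Tuple[List[dict], List[Tuple[int, str]]]:
    """Run `extract` on each page and return (parsed work orders, failures).

    Each failure is (page_number, reason), so one unreadable page
    does not abort the whole document.
    """
    parsed, failed = [], []
    for number, page in enumerate(pages, start=1):
        raw = extract(page)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            failed.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(data, dict):
            failed.append((number, "expected a JSON object"))
            continue
        # Record the page number alongside the extracted fields.
        parsed.append({"page": number, **data})
    return parsed, failed


# Stub extractor standing in for the model call.
fake_outputs = {"p1": '{"order": "A1"}', "p2": "not json"}
orders, errors = process_pages(["p1", "p2"], lambda page: fake_outputs[page])
print(len(orders), "parsed,", len(errors), "failed")
```

Pages listed in the failures can be retried at a higher DPI or sent for manual entry, while the parsed ones go on to shift assignment.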

Improving Results

For best results on YOUR specific work orders:

  1. Collect 100-500 annotated examples of your actual work orders with JSON ground truth
  2. Add them to the training data and re-run fine-tuning
  3. Use the QARI synthetic pipeline: Render your work order HTML templates → PDF → images with Arabic text variations

Related Models & References

| Model | CER | WER | Notes |
|---|---|---|---|
| QARI-OCR v0.2 | 0.061 | 0.160 | SOTA open-source Arabic OCR |
| AIN-7B | | 0.28 | Best Arabic multimodal (7B) |
| Baseer | | 0.25 | Best doc-to-markdown |
| This model | TBD | TBD | Specialized for structured extraction |
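
To fill in the TBD entries, CER and WER can be computed as edit distance divided by reference length, at the character and word level respectively. The sketch below uses a plain Levenshtein distance with no text normalization. Published figures may apply their own normalization (for example of diacritics or hamza forms), so numbers from this code are not directly comparable without matching that setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One wrong character (missing hamza) in a two-word, nine-character reference.
print(cer("أمر العمل", "امر العمل"), wer("أمر العمل", "امر العمل"))
```

Averaging these over a held-out set of your own work orders gives figures that reflect the documents the model will actually see.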