---
license: mit
tags:
  - image-classification
  - ai-detection
  - deepfake-detection
  - siglip
  - dinov2
  - lora
  - pytorch
  - quality-agnostic
datasets:
  - nebula-9000/OpenFake
metrics:
  - accuracy
  - roc_auc
pipeline_tag: image-classification
---

# AI Image Detector (SigLIP2 + DINOv2 Ensemble)

A high-accuracy, **quality-agnostic** model for detecting AI-generated images. It reaches **0.9997 AUC** on the OpenFake validation set and generalizes well to external datasets.

## Key Features

- **Quality-agnostic**: Performs consistently on both pristine and degraded images (JPEG compression, blur, noise)
- **Dual-encoder architecture**: Combines SigLIP2's semantic understanding with DINOv2's self-supervised features
- **Efficient fine-tuning**: Uses LoRA adapters (~8M trainable params out of ~740M total)
- **Externally evaluated**: Tested on 10+ external datasets; results for the real-image, AI-generated, and mixed benchmarks are reported below

## Performance

### Validation Results (OpenFake, 5K images)

| Metric | Clean Images | Degraded Images | Average |
|--------|--------------|-----------------|---------|
| AUC | 0.9998 | 0.9995 | **0.9997** |
| Accuracy | 99.24% | 98.96% | 99.10% |

**Quality-agnostic verification**: The AUC gap between clean and degraded validation images is only **0.0003** (0.9998 vs. 0.9995), indicating that performance holds up across image quality levels.
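
ROC AUC can be read as the probability that a randomly chosen AI-generated image receives a higher P(AI) than a randomly chosen real image, with ties counting half. The stdlib-only sketch below computes it that way from two lists of scores. It is an O(n·m) illustration for small lists, not the evaluation code used for this card, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(real_scores, ai_scores):
    """ROC AUC as the Mann-Whitney probability P(score_ai > score_real).

    real_scores: P(AI) for images whose true label is real
    ai_scores:   P(AI) for images whose true label is AI-generated
    Ties contribute 0.5. O(n * m), so only suitable for small lists.
    """
    if not real_scores or not ai_scores:
        raise ValueError("need at least one real and one AI score")
    wins = 0.0
    for a in ai_scores:
        for r in real_scores:
            if a > r:
                wins += 1.0
            elif a == r:
                wins += 0.5
    return wins / (len(ai_scores) * len(real_scores))


# Toy scores: the real image at 0.70 outranks the AI image at 0.60
real = [0.02, 0.10, 0.70]
ai = [0.60, 0.95, 0.99]
print(roc_auc(real, ai))  # 8 of the 9 (AI, real) pairs are ordered correctly: 0.888...
```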

### Cross-Dataset Generalization

#### Real Image Datasets (Target: Classify as Real)

| Dataset | Samples | Accuracy | Mean P(AI) |
|---------|---------|----------|------------|
| Food-101 | 300 | **100.00%** | 0.032 |
| COCO 2017 | 300 | 90.67% | 0.135 |
| Cats vs Dogs | 300 | **99.67%** | 0.036 |
| Stanford Cars | 300 | 94.67% | 0.110 |
| Oxford Flowers | 300 | 95.67% | 0.115 |
| **Average** | — | **96.13%** | — |

#### AI-Generated Image Datasets (Target: Classify as AI)

| Dataset | Generator | Samples | Accuracy | Mean P(AI) |
|---------|-----------|---------|----------|------------|
| DALL-E 3 | OpenAI | 300 | **100.00%** | 0.993 |
| Midjourney V6 | Midjourney | 300 | 96.33% | 0.936 |
| **Average** | — | — | **98.17%** | — |

#### Mixed Benchmark Datasets

| Dataset | Samples | Accuracy | AUC | F1 |
|---------|---------|----------|-----|-----|
| AI-or-Not | 500 | **96.80%** | **0.9986** | 97.04% |
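
Accuracy and F1 summarize different things on a mixed benchmark like AI-or-Not: F1 only looks at the positive class. The sketch below assumes AI-generated is the positive class (label 1) and a 0.5 decision threshold on P(AI); neither detail is stated in this card, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(labels, probs, threshold=0.5):
    """Accuracy, precision, recall and F1 with AI-generated (1) as the positive class."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


labels = [1, 1, 1, 0, 0]
probs = [0.97, 0.81, 0.30, 0.60, 0.05]
# tp=2, fp=1, tn=1, fn=1 -> accuracy 0.6, precision = recall = F1 = 2/3
print(binary_metrics(labels, probs))
```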

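**Overall cross-dataset accuracy: 97.15%** (the mean of the real-image and AI-generated averages above; the AI-or-Not benchmark is reported separately)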
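
The 97.15% figure equals the mean of the two group averages, i.e. a balanced accuracy over real and AI-generated images. The sketch reproduces those averages from per-dataset correct-prediction counts. The counts are back-computed from the percentages above, assuming exactly 300 images per dataset; they are illustrative, not logged values.

```python
# Correct predictions per dataset, back-computed from the reported
# accuracies assuming 300 images each (e.g. 90.67% of 300 ~ 272).
real_correct = {"Food-101": 300, "COCO 2017": 272, "Cats vs Dogs": 299,
                "Stanford Cars": 284, "Oxford Flowers": 287}
ai_correct = {"DALL-E 3": 300, "Midjourney V6": 289}
N_PER_DATASET = 300


def group_accuracy(correct_by_dataset, n_per_dataset=N_PER_DATASET):
    """Accuracy pooled over all datasets in one group."""
    return sum(correct_by_dataset.values()) / (n_per_dataset * len(correct_by_dataset))


real_acc = group_accuracy(real_correct)  # 1442 / 1500
ai_acc = group_accuracy(ai_correct)      # 589 / 600
balanced = (real_acc + ai_acc) / 2       # mean of the two groups

# Pooling every image instead weights the larger real-image group more heavily
pooled = (sum(real_correct.values()) + sum(ai_correct.values())) / (
    N_PER_DATASET * (len(real_correct) + len(ai_correct)))

print(f"real {real_acc:.2%}, AI {ai_acc:.2%}, balanced {balanced:.2%}, pooled {pooled:.2%}")
```

Because the real-image group has 1,500 images and the AI group only 600, the pooled accuracy over all 2,100 images (about 96.71% under these counts) comes out lower than the balanced figure.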

### Supported AI Generators

Trained on the OpenFake dataset, which includes images from 25+ generators:

- **Diffusion models**: Stable Diffusion (1.5, 2.1, XL, 3.5), Flux (1.0, 1.1 Pro), DALL-E 3, Midjourney (v5, v6), Imagen, Kandinsky
- **GANs**: StyleGAN, ProGAN, BigGAN
- **Other**: GPT-Image-1, Firefly, Ideogram, and more

## Usage

### Installation

```bash
pip install torch torchvision transformers timm peft pillow huggingface_hub
```

### Quick Start

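```python
import sys
from pathlib import Path

from huggingface_hub import hf_hub_download

REPO_ID = "Bombek1/ai-image-detector-siglip-dinov2"

# Download the inference code and the checkpoint from the Hub
code_path = hf_hub_download(repo_id=REPO_ID, filename="model.py")
model_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.pt")

# Make the downloaded model.py importable, then initialize the detector
sys.path.insert(0, str(Path(code_path).parent))
from model import AIImageDetector

detector = AIImageDetector(model_path)

# Predict a single image
result = detector.predict("path/to/image.jpg")
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"P(AI): {result['probability']:.4f}")
```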

### Batch Processing

```python
from pathlib import Path

# Reuses `detector` from the Quick Start example
images = sorted(Path("./images").glob("*.jpg"))
for img_path in images:
    # Pass a string path, as in the Quick Start example
    result = detector.predict(str(img_path))
    print(f"{img_path.name}: {result['prediction']} ({result['confidence']:.1%})")
```
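
To keep results instead of printing them, the loop can write one row per image with the standard `csv` module. This sketch assumes only what Quick Start shows: `detector.predict(path)` returns a dict with `prediction`, `confidence` and `probability` keys. `write_predictions` and its output columns are hypothetical.

```python
import csv
from pathlib import Path


def write_predictions(detector, image_paths, out_csv):
    """Run detector.predict on each image and write the results to a CSV file.

    Assumes predict() returns a dict with 'prediction', 'confidence' and
    'probability' keys, as in the Quick Start example.
    """
    out_csv = Path(out_csv)
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "prediction", "confidence", "p_ai"])
        for path in image_paths:
            result = detector.predict(str(path))
            writer.writerow([Path(path).name, result["prediction"],
                             f"{result['confidence']:.4f}",
                             f"{result['probability']:.4f}"])
    return out_csv
```

Usage with the Quick Start detector: `write_predictions(detector, sorted(Path("./images").glob("*.jpg")), "predictions.csv")`.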

## Model Architecture

```
EnsembleAIDetector (~740M parameters, ~8M trainable)
├── SigLIP2-SO400M-patch14-384 (with LoRA r=32 on q_proj, v_proj)
│   └── Output: 1152-dim features
├── DINOv2-Large-patch14 (with LoRA r=32 on qkv)
│   └── Output: 1024-dim features
└── ClassificationHead
    ├── LayerNorm(2176)
    ├── Linear(2176 → 512) + GELU + Dropout(0.3)
    ├── Linear(512 → 256) + GELU + Dropout(0.3)
    └── Linear(256 → 1) → Sigmoid
```
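
As a sanity check on the "~8M trainable" figure, the trainable parameters can be counted by hand. The sketch assumes standard LoRA (an r×in matrix A plus an out×r matrix B per adapted linear layer, no bias) and the public encoder configs: 27 SigLIP2-SO400M layers of width 1152 with separate `q_proj`/`v_proj`, and 24 DINOv2-Large layers of width 1024 with a fused `qkv` projection (1024 → 3072). Those layer counts are not stated in this card.

```python
def lora_params(in_features, out_features, rank):
    """Trainable parameters of one LoRA adapter: A (rank x in) + B (out x rank)."""
    return rank * (in_features + out_features)


def linear_params(in_features, out_features):
    """nn.Linear-style layer: weight + bias."""
    return in_features * out_features + out_features


RANK = 32
SIGLIP_LAYERS, SIGLIP_DIM = 27, 1152  # assumed SigLIP2-SO400M vision config
DINO_LAYERS, DINO_DIM = 24, 1024      # assumed DINOv2-Large config
FUSED = SIGLIP_DIM + DINO_DIM         # 2176, matches LayerNorm(2176)

siglip_lora = SIGLIP_LAYERS * 2 * lora_params(SIGLIP_DIM, SIGLIP_DIM, RANK)  # q_proj + v_proj
dino_lora = DINO_LAYERS * lora_params(DINO_DIM, 3 * DINO_DIM, RANK)         # fused qkv
head = (2 * FUSED                    # LayerNorm weight + bias
        + linear_params(FUSED, 512)
        + linear_params(512, 256)
        + linear_params(256, 1))

total = siglip_lora + dino_lora + head
# Prints: SigLIP LoRA 3,981,312  DINOv2 LoRA 3,145,728  head 1,250,561  total 8,377,601
print(f"SigLIP LoRA {siglip_lora:,}  DINOv2 LoRA {dino_lora:,}  head {head:,}  total {total:,}")
```

Under these assumptions the count lands at roughly 8.4M, consistent with the ~8M stated above.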

## Training Details

| Parameter | Value |
|-----------|-------|
| Dataset | OpenFake (~95K train, 5K val) |
| Image Size | 392×392 |
| Epochs | 5 |
| Batch Size | 16 (effective: 144 with grad accum) |
| Learning Rate | 2e-4 (head), 5e-5 (LoRA) |
| Scheduler | Cosine with warmup |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Loss | Focal Loss (γ=2, α=0.25) |

### Quality-Agnostic Augmentations

The model is trained with aggressive image degradation to improve robustness to real-world image quality:

- JPEG compression (quality 30-95)
- Gaussian blur (σ up to 2.0)
- Gaussian noise (σ up to 0.05)
- Resize artifacts (down to 50% then back up)
- Color jitter, random crops, flips
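
A minimal Pillow sketch of the degradations listed above, covering resize artifacts, blur, noise, and JPEG re-encoding. The per-step probability `p`, the order of the steps, and the lower bounds of the random ranges are assumptions; color jitter, crops, and flips are omitted. Gaussian noise is approximated with `Image.effect_noise`, which produces single-channel noise, so the same noise is added to all three channels.

```python
import io
import random

from PIL import Image, ImageChops, ImageFilter


def degrade(img, rng, p=0.5):
    """Randomly apply resize, blur, noise and JPEG degradations to a PIL image.

    Each step fires independently with probability p (an assumed value).
    """
    img = img.convert("RGB")
    w, h = img.size

    # Resize artifacts: downscale to 50-100% of the original size, then back up
    if rng.random() < p:
        s = rng.uniform(0.5, 1.0)
        small = img.resize((max(1, int(w * s)), max(1, int(h * s))), Image.BILINEAR)
        img = small.resize((w, h), Image.BILINEAR)

    # Gaussian blur; Pillow's GaussianBlur radius is the standard deviation
    if rng.random() < p:
        img = img.filter(ImageFilter.GaussianBlur(radius=rng.uniform(0.1, 2.0)))

    # Gaussian noise, sigma up to 0.05 on a [0, 1] scale (~12.75 on 0-255).
    # effect_noise is centred on 128, so offset=-128 makes it zero-mean.
    if rng.random() < p:
        sigma = rng.uniform(0.01, 0.05) * 255
        noise = Image.effect_noise((w, h), sigma).convert("RGB")
        img = ImageChops.add(img, noise, scale=1.0, offset=-128)

    # JPEG re-encoding at quality 30-95
    if rng.random() < p:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=rng.randint(30, 95))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")

    return img
```

In a training pipeline this would typically run before resizing to 392×392 and normalization; that ordering is also an assumption.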

## Limitations

| Limitation | Details |
|------------|---------|
| **Low-resolution images** | Performance degrades on images smaller than 128×128 (e.g., ~50% accuracy, i.e. chance level, on the 32×32 CIFAKE dataset) |
| **COCO-style images** | ~9% false-positive rate on casual or cluttered real photos (90.67% accuracy on COCO 2017) |
| **Artistic macro photography** | Professional studio/macro shots may occasionally trigger false positives (~5%) |
| **Non-photographic content** | Designed for photographs; screenshots, graphics, and illustrations may not work well |
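
Given the low-resolution limitation, callers may want to flag small inputs before trusting a prediction. A minimal sketch, assuming an image counts as too small when either side is below the 128-pixel threshold from the table; `size_warning` is a hypothetical helper, not part of `model.py`.

```python
MIN_SIDE = 128  # threshold from the limitations table


def size_warning(width, height, min_side=MIN_SIDE):
    """Return a warning string if either side is below min_side, else None."""
    if width < min_side or height < min_side:
        return (f"{width}x{height} is below {min_side}x{min_side}; "
                "predictions on images this small are unreliable")
    return None


print(size_warning(32, 32))
print(size_warning(1024, 768))
```

With Pillow, the dimensions can be read via `Image.open(path).size`, which returns `(width, height)`.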

## Files

- `pytorch_model.pt` — Full checkpoint with LoRA weights
- `model.py` — Inference code with `AIImageDetector` class
- `config.json` — Model configuration

## Citation

```bibtex
@misc{ai-image-detector-2025,
  author = {Bombek1},
  title = {AI Image Detector (SigLIP2 + DINOv2 Ensemble)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2}
}
```

## License

MIT License