---
base_model: QWen/QWen2-VL-7B-Instruct
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection

This repository contains the RA-HMD model presented in the paper [Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection](https://huggingface.co/papers/2502.13061).

## Model Details

### Model Description

RA-HMD proposes a robust adaptation framework for hateful meme detection that enhances in-domain accuracy and cross-domain generalization while preserving the general vision-language capabilities of LMMs. It achieves improved robustness under adversarial attacks compared to SFT models and demonstrates state-of-the-art performance across various meme classification datasets. Additionally, RA-HMD generates higher-quality rationales for explaining hateful content, enhancing model interpretability.

- **Developed by:** Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne
- **Model type:** Fine-tuned QWen2-VL-7B-Instruct using PEFT (LoRA)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `QWen/QWen2-VL-7B-Instruct`

### Model Sources
- **Repository:** https://github.com/JingbiaoMei/RGCL
- **Paper:** https://huggingface.co/papers/2502.13061
- **Project page:** https://rgclmm.github.io/

## Uses

### Direct Use
The model is intended for robust hateful meme detection and generating explanatory rationales.

### Out-of-Scope Use
This model is specifically trained for hateful meme detection. Using it for general image captioning or unrelated classification tasks may lead to suboptimal results.

## How to Get Started with the Model

Refer to the [GitHub repository](https://github.com/JingbiaoMei/RGCL) for detailed installation and usage instructions. The RA-HMD Stage 1 code is released as a submodule in [LLaMA-Factory@a88f610](https://github.com/JingbiaoMei/LLaMA-Factory-LMM-RGCL/tree/a88f610e9fa46d1ef1669c5dbc39ee9008f95c21).

## Citation

If our work helped your research, please kindly cite our paper:

```bibtex
@article{RAHMD2025Mei,
  title={Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection},
  url={http://arxiv.org/abs/2502.13061},
  DOI={10.48550/arXiv.2502.13061},
  note={arXiv:2502.13061 [cs]},
  number={arXiv.2502.13061},
  publisher={arXiv},
  author={Mei, Jingbiao and Chen, Jinghong and Yang, Guangyu and Lin, Weizhe and Byrne, Bill},
  year={2025},
  month=may
}
```