How to use with vLLM
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cesun/advllm_llama3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cesun/advllm_llama3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
Use Docker
docker model run hf.co/cesun/advllm_llama3

ADV-LLM

ADV-LLM is an iteratively self-tuned adversarial language model that generates jailbreak suffixes capable of bypassing safety alignment in open-source and proprietary models.

Model Details

  • Authors: Chung-En Sun et al. (UCSD & Microsoft Research)
  • Finetuned from: LLaMA-3-8B-Instruct
  • Language: English
  • License: MIT

Usage Example

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("cesun/advllm_llama3")
tokenizer = AutoTokenizer.from_pretrained("cesun/advllm_llama3")
inputs = tokenizer("How to make a bomb", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=90)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Evaluation Results

Under group beam search with 50 candidates (GBS-50), ADV-LLM reaches attack success rates between 93.85% and 100% on the five victim models listed below. Success is judged three ways: template-based refusal detection (TP), the LlamaGuard classifier (LG), and GPT-4.

GBS-50 ASR by victim model and judge:

Victim Model | TP | LG | GPT-4
Vicuna-7B-v1.5 | 100.00% | 100.00% | 99.81%
Guanaco-7B | 100.00% | 100.00% | 99.81%
Mistral-7B-Instruct-v0.2 | 100.00% | 100.00% | 100.00%
LLaMA-2-7B-chat | 100.00% | 100.00% | 93.85%
LLaMA-3-8B-Instruct | 100.00% | 98.84% | 98.27%

Legend:

  • ASR = Attack Success Rate
  • TP = Template-based refusal detection
  • LG = LlamaGuard safety classifier
  • GPT-4 = Harmfulness judged by GPT-4
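
As a rough illustration of how an ASR figure is derived, the sketch below computes the percentage of prompts that a judge marked as successful attacks. The function name `attack_success_rate` and the toy verdict list are assumptions made for illustration. They are not taken from the ADV-LLM evaluation code, and the numbers are not the paper's data.

```python
from typing import Sequence


def attack_success_rate(verdicts: Sequence[bool]) -> float:
    """Return the percentage of prompts judged as successful attacks.

    `verdicts` holds one boolean per evaluated prompt: True if the judge
    (TP, LG, or GPT-4 in the table above) counted the attack as a success.
    This helper is hypothetical and only illustrates the metric.
    """
    if not verdicts:
        raise ValueError("verdicts must contain at least one entry")
    return 100.0 * sum(verdicts) / len(verdicts)


# Toy example (made-up verdicts, not results from the paper).
toy_verdicts = [True, True, True, False]
print(f"ASR = {attack_success_rate(toy_verdicts):.2f}%")
```

Each judge produces its own verdict list, so a single set of model responses yields one ASR value per judge, which is why the table reports three figures for every victim model.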

Citation

If you use ADV-LLM in your research or evaluation, please cite:

BibTeX

@inproceedings{sun2025advllm,
  title={Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities},
  author={Sun, Chung-En and Liu, Xiaodong and Yang, Weiwei and Weng, Tsui-Wei and Cheng, Hao and San, Aidan and Galley, Michel and Gao, Jianfeng},
  booktitle={NAACL},
  year={2025}
}