
CrossGuard: Safeguarding MLLMs against Implicit Multimodal Malicious Attacks

Warning: code and data include potentially harmful content; reader discretion is advised.

0. Install Environment

python >= 3.10
pip install -r requirements.txt
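
As a quick sanity check, here is a minimal sketch (assuming a PyTorch-based stack, which the LoRA-tuning step below implies; torch may be installed directly or transitively by requirements.txt) to confirm the interpreter version and GPU visibility:

# sanity_check.py -- hedged sketch, not part of the repository
import sys

assert sys.version_info >= (3, 10), "CrossGuard requires Python >= 3.10"

import torch  # assumption: installed via requirements.txt or its dependencies

print(f"Python {sys.version.split()[0]}")
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")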

1. ImpForge — Red-teaming (data collection)

Use the ImpForge red-teaming pipeline to generate joint-modal implicit malicious examples.

cd red-teaming
python run.py
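
The output schema of run.py is defined by the repository itself; the snippet below is only a hedged sketch of how one might spot-check the generated image-text pairs, assuming the pipeline writes JSON records with hypothetical image, text, and label fields:

# inspect_samples.py -- hedged sketch; the output path and the field names
# (image, text, label) are assumptions, not the pipeline's documented schema.
import json
from pathlib import Path

records = json.loads(Path("red-teaming/outputs/samples.json").read_text())  # hypothetical path

for rec in records[:5]:  # print a few records for manual inspection
    print(rec.get("image"), "|", str(rec.get("text", ""))[:80], "|", rec.get("label"))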

2. CrossGuard — LoRA tuning (via LLaMA-Factory)

We fine-tune the CrossGuard LoRA adapter using the LLaMA-Factory workflow. The steps below assume LLaMA-Factory is already installed and configured.

(1) Copy the CrossGuard training config train_crossguard/run.yaml into the LLaMA-Factory examples/train_lora/ folder

(2) Export required environment variables and start training

# required env vars
export WANDB_API_KEY=<your-wandb-key>
export HF_HOME=/path/to/hf_cache
export CUDA_VISIBLE_DEVICES=0
# run LLaMA-Factory training
cd /path/to/LLaMA-Factory
llamafactory-cli train examples/train_lora/run.yaml
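
Once training finishes, the resulting adapter can be attached to its base model with PEFT for inference. A minimal sketch, assuming a causal-LM base checkpoint; the base_id and adapter_dir placeholders should be replaced with the model_name_or_path and output_dir values from run.yaml:

# load_adapter.py -- hedged sketch; paths are placeholders, not values from run.yaml.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "path/to/base-mllm"          # placeholder for the base model in run.yaml
adapter_dir = "saves/crossguard-lora"  # placeholder for output_dir in run.yaml

base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_dir)  # attach the LoRA adapter
model = model.merge_and_unload()  # optional: fold the LoRA weights into the base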

3. CrossGuard — Evaluation

We evaluate CrossGuard on five safety benchmarks: JailbreakV, VLGuard, FigStep, MM-SafetyBench, and SIUO. Since SIUO is the benchmark containing implicit multimodal malicious samples, we use it as the example workflow below.

Download the SIUO data from its project page, then run the evaluation with the following commands:

cd eval_crossguard
python crossguard_on_siuo.py
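
crossguard_on_siuo.py defines the actual evaluation protocol; the sketch below only illustrates, under assumed field names and output path, how saved predictions could be scored afterwards:

# score_siuo.py -- hedged sketch; the results path and the prediction/label
# field names are assumptions, not the script's documented output format.
import json
from pathlib import Path

results = json.loads(Path("eval_crossguard/siuo_results.json").read_text())

correct = sum(r["prediction"] == r["label"] for r in results)
print(f"Accuracy on SIUO: {correct / len(results):.2%} ({correct}/{len(results)})")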
