Warning: code and data include potentially harmful content; reader discretion is advised.
Requires Python >= 3.10. Install the dependencies:

```bash
pip install -r requirements.txt
```

Use the ImpForge red-teaming pipeline to generate joint-modal implicit malicious examples:
```bash
cd red-teaming
python run.py
```
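For a quick look at what the pipeline produced, a minimal inspection sketch like the one below can help. Note that the output path and the record fields (`image`, `text`) are assumptions made for illustration only; check `run.py` and its config for the actual location and schema.

```python
# Minimal sketch for inspecting the generated joint-modal examples.
# NOTE: the output path and record fields are assumptions for illustration;
# check run.py and its config for the real location and schema.
import json
from pathlib import Path

out_file = Path("red-teaming/outputs/generated_examples.json")  # hypothetical path
records = json.loads(out_file.read_text())

for rec in records[:5]:
    # assumed fields: an image reference paired with the malicious-intent text
    print(rec.get("image"), "|", rec.get("text"))
```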
We fine-tune the CrossGuard LoRA adapter with the LLaMA-Factory workflow. The steps below assume you already have LLaMA-Factory installed and configured.

(1) Copy the CrossGuard training config train_crossguard/run.yaml into LLaMA-Factory's examples/train_lora/ folder.
(2) Export the required environment variables and start training:
```bash
# required env vars
export WANDB_API_KEY=<your-wandb-key>
export HF_HOME=/path/to/hf_cache
export CUDA_VISIBLE_DEVICES=0

# run LLaMA-Factory training
cd /path/to/LLaMA-Factory
llamafactory-cli train examples/train_lora/run.yaml
```
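After training finishes, LLaMA-Factory saves the LoRA adapter to the `output_dir` set in run.yaml. As a quick sanity check you can attach it to its base model with PEFT; this is only a sketch, the base model name and adapter path below are placeholders, and the model class may differ if CrossGuard builds on a vision-language backbone.

```python
# Minimal sanity-check sketch (not the repo's evaluation code): load the trained
# CrossGuard LoRA adapter on top of its base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "path/or/hub-id-of-base-model"   # placeholder: use the model set in run.yaml
ADAPTER_DIR = "saves/crossguard-lora"         # placeholder: the output_dir from run.yaml

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_DIR)  # attaches the LoRA weights
model.eval()
print("Loaded CrossGuard adapter on top of", BASE_MODEL)
```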
We evaluate CrossGuard on five safety benchmarks: JailbreakV, VLGuard, FigStep, MM-SafetyBench, and SIUO. Since SIUO is the benchmark containing implicit multimodal malicious samples, we use it as the example workflow below.
Download the SIUO data from its project page, then run the evaluation:
```bash
cd eval_crossguard
python crossguard_on_siuo.py
```
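If crossguard_on_siuo.py writes its per-sample decisions to a file, a short summary pass like the following may be convenient. The results path and field names here are assumptions, so adapt them to whatever the script actually emits.

```python
# Minimal sketch for summarizing per-sample CrossGuard decisions on SIUO.
# NOTE: the results path and the "prediction" field are hypothetical; adjust them
# to the actual output of crossguard_on_siuo.py.
import json

with open("eval_crossguard/siuo_results.json") as f:  # hypothetical output file
    results = json.load(f)

flagged = sum(1 for r in results if r.get("prediction") == "unsafe")  # assumed label scheme
print(f"Flagged {flagged}/{len(results)} SIUO samples as unsafe ({flagged / len(results):.1%})")
```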