Warning: code and data include potentially harmful content; reader discretion is advised.
Requires Python >= 3.10. Install the dependencies:

```bash
pip install -r requirements.txt
```

Use the ImpForge red-teaming pipeline to generate joint-modal implicit malicious examples:
```bash
cd red-teaming
python run.py
```
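For a quick look at what the pipeline produced, a minimal inspection sketch like the one below can help. Note that the output path and the record fields (`image`, `text`) are assumptions made for illustration only; check `run.py` and its config for the actual location and schema.

```python
# Minimal sketch for inspecting the generated joint-modal examples.
# NOTE: the output path and record fields are assumptions for illustration;
# check run.py and its config for the real location and schema.
import json
from pathlib import Path

out_file = Path("red-teaming/outputs/generated_examples.json")  # hypothetical path
records = json.loads(out_file.read_text())

for rec in records[:5]:
    # assumed fields: an image reference paired with the malicious-intent text
    print(rec.get("image"), "|", rec.get("text"))
```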
We fine-tune the CrossGuard LoRA adapter with the LLaMA-Factory workflow. The steps below assume you already have LLaMA-Factory installed and configured.

(1) Copy the CrossGuard training config train_crossguard/run.yaml into LLaMA-Factory's examples/train_lora/ folder.
(2) Export the required environment variables and start training:
```bash
# required env vars
export WANDB_API_KEY=<your-wandb-key>
export HF_HOME=/path/to/hf_cache
export CUDA_VISIBLE_DEVICES=0

# run LLaMA-Factory training
cd /path/to/LLaMA-Factory
llamafactory-cli train examples/train_lora/run.yaml
```
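After training finishes, LLaMA-Factory saves the LoRA adapter to the `output_dir` set in run.yaml. As a quick sanity check you can attach it to its base model with PEFT; this is only a sketch, the base model name and adapter path below are placeholders, and the model class may differ if CrossGuard builds on a vision-language backbone.

```python
# Minimal sanity-check sketch (not the repo's evaluation code): load the trained
# CrossGuard LoRA adapter on top of its base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "path/or/hub-id-of-base-model"   # placeholder: use the model set in run.yaml
ADAPTER_DIR = "saves/crossguard-lora"         # placeholder: the output_dir from run.yaml

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_DIR)  # attaches the LoRA weights
model.eval()
print("Loaded CrossGuard adapter on top of", BASE_MODEL)
```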
We evaluate CrossGuard on five safety benchmarks: JailbreakV, VLGuard, FigStep, MM-SafetyBench, and SIUO. Since SIUO is the benchmark containing implicit multimodal malicious samples, we use it as the example workflow below.
Download the SIUO data from its project page, then run the evaluation:
```bash
cd eval_crossguard
python crossguard_on_siuo.py
```
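If crossguard_on_siuo.py writes its per-sample decisions to a file, a short summary pass like the following may be convenient. The results path and field names here are assumptions, so adapt them to whatever the script actually emits.

```python
# Minimal sketch for summarizing per-sample CrossGuard decisions on SIUO.
# NOTE: the results path and the "prediction" field are hypothetical; adjust them
# to the actual output of crossguard_on_siuo.py.
import json

with open("eval_crossguard/siuo_results.json") as f:  # hypothetical output file
    results = json.load(f)

flagged = sum(1 for r in results if r.get("prediction") == "unsafe")  # assumed label scheme
print(f"Flagged {flagged}/{len(results)} SIUO samples as unsafe ({flagged / len(results):.1%})")
```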