Official implementation of the paper Hallucination Detection in LLMs Using Spectral Features of Attention Maps, accepted at EMNLP 2025 (see how to cite our work).
Important: If you have any questions regarding the code or the paper, please contact us at jakub.binkowski@pwr.edu.pl or create an issue in the repository.
- Python 3.12+
- uv package manager
Install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh

CPU (Linux/macOS):

    make install_cpu

GPU (Linux with CUDA 12.4):

    make install_gpu

The following flowchart describes the main steps to reproduce the experiments. All steps can be run as DVC stages defined in dvc.yaml. In addition, dvc.yaml defines further stages that compute results for the ablation study. Below, we describe the main steps in more detail.
graph LR
A[Generate Attention Diagonals & Answers] --> B[Generate Labels]
B --> C[Generate Split]
C --> D["Train LapEigvals/AttnEigvals/AttnLogDet"]
C --> E["Compute AttnScore (LLMCheck) baseline"]
F["Generate Hidden States"] --> G["Train Hidden States Baselines"]
C -->|"Re-use labels from attention features"| F
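
The flowchart can also be read as a dependency graph. The following is a minimal sketch, not part of the repository, that copies the edges from the flowchart and derives one valid execution order using the standard-library `graphlib` module. The names `EDGES` and `execution_order` are illustrative assumptions; in practice `dvc repro` resolves stage dependencies from dvc.yaml on its own.

```python
from graphlib import TopologicalSorter

# Edges copied from the flowchart: (upstream step, downstream step).
EDGES = [
    ("Generate Attention Diagonals & Answers", "Generate Labels"),
    ("Generate Labels", "Generate Split"),
    ("Generate Split", "Train LapEigvals/AttnEigvals/AttnLogDet"),
    ("Generate Split", "Compute AttnScore (LLMCheck) baseline"),
    ("Generate Hidden States", "Train Hidden States Baselines"),
    # Hidden-state generation re-uses labels from the attention features.
    ("Generate Split", "Generate Hidden States"),
]


def execution_order(edges):
    """Return one ordering in which every step follows its upstream steps."""
    predecessors = {}
    for upstream, downstream in edges:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(EDGES), start=1):
        print(f"{position}. {step}")
```

The order of steps that do not depend on each other (for example, the two branches after "Generate Split") is not fixed by the graph, so any ordering that respects the arrows is acceptable.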
- CoQA: download the dev set from the official website: https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json
- GSM8K: available through the Hugging Face Hub; downloaded automatically.
- HaluevalQA: download the data from the official repository: https://github.com/RUCAIBox/HaluEval?tab=readme-ov-file#data-release
- NQOpen: available through the Hugging Face Hub; downloaded automatically.
- SQuADv2: download the dev set from the official website: https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json
- TriviaQA: available through the Hugging Face Hub; downloaded automatically.
- TruthfulQA: available through the Hugging Face Hub; downloaded automatically.
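
For the two datasets that are single JSON files, the manual download could be scripted roughly as follows. This is a sketch under assumptions: the target directory `data/raw` and the helper names `target_path` and `download_all` are hypothetical, and the location the DVC stages actually expect is not stated here, so check dvc.yaml before using it. HaluevalQA is left out because its data is distributed through a repository rather than a single file.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs copied from the dataset list above.
MANUAL_DOWNLOADS = {
    "CoQA": "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json",
    "SQuADv2": "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json",
}


def target_path(url, data_dir):
    """Build the local path by keeping the file name from the URL."""
    return Path(data_dir) / Path(urlparse(url).path).name


def download_all(data_dir="data/raw"):
    """Download each file once; 'data/raw' is a hypothetical location."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, url in MANUAL_DOWNLOADS.items():
        destination = target_path(url, data_dir)
        if not destination.exists():  # skip files that are already present
            urlretrieve(url, destination)
        print(f"{name}: {destination}")


if __name__ == "__main__":
    download_all()
```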
- Generate attention diagonals and answers:

  Note: For all LLMs and datasets except mistral_small_24b_instruct_2501, 40 GB of VRAM is enough.

      CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_attentions_only

  Note: A separate stage is used for hidden states generation.

      CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_hidden_states_for_selected_tokens

- Evaluate generated answers:

      dvc repro eval_answers_ngram

- Evaluate generated answers using LLM-as-judge:

  Note: Requires OPENAI_API_KEY to be present in the .env file in the repository root directory. You can also set OPENAI_API_BASE_URL to use a different API endpoint.

      dvc repro eval_answers_llm_judge

- Generate labels:

      dvc repro generate_labels

  Note: A separate stage is used for the GSM8K dataset.

      dvc repro generate_labels_gsm8k

- Generate split:

      dvc repro generate_split

- Train probes:

      dvc repro train_attn_vs_laplacian_pca

  Note: A separate stage is used for the hidden states baselines.

      dvc repro train_hidden_states_baselines

  Note: A separate stage is used for the AttnScore baseline.

      dvc repro probe_attn_score

If you use this code in your research or find the work relevant, please consider citing our paper:
@inproceedings{binkowski2025hallucination,
title={Hallucination Detection in {LLM}s Using Spectral Features of Attention Maps},
author={Jakub Binkowski and Denis Janiak and Albert Sawczyn and Bogdan Gabrys and Tomasz Jan Kajdanowicz},
booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
year={2025},
url={https://openreview.net/forum?id=tm5JQTpBhj}
}