
Hallucination Detection in LLMs Using Spectral Features of Attention Maps


Official implementation of the paper Hallucination Detection in LLMs Using Spectral Features of Attention Maps, accepted at EMNLP 2025 (see the Citation section below for how to cite our work).

Important

If you have any questions about the code or the paper, please contact us at jakub.binkowski@pwr.edu.pl or open an issue in this repository.

Usage

Prerequisites

  • Python 3.12+
  • uv package manager

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Installation

CPU (Linux/macOS):

make install_cpu

GPU (Linux with CUDA 12.4):

make install_gpu

Reproduce the experiments

The following flowchart describes the main steps needed to reproduce the experiments. All steps can be run via the DVC stages defined in dvc.yaml, which also defines additional stages for the ablation study. Below, we describe the main steps in more detail.

```mermaid
graph LR
    A[Generate Attention Diagonals & Answers] --> B[Generate Labels]
    B --> C[Generate Split]
    C --> D["Train LapEigvals/AttnEigvals/AttnLogDet"]
    C --> E["Compute AttnScore (LLMCheck) baseline"]
    F["Generate Hidden States"] --> G["Train Hidden States Baselines"]
    C -->|"Re-use labels from attention features"| F
```
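To make the pipeline concrete, here is a minimal, illustrative sketch of the core idea behind the spectral features: build a graph Laplacian from an attention map, keep its top eigenvalues as features, and train a simple probe on them. This is not the repository's implementation; the helper names and parameters (`laplacian_eigvals`, `attention_to_features`, `top_k`) are hypothetical, and the actual computation is driven by the DVC stages described below.

```python
# Illustrative sketch only -- the real pipeline lives in the DVC stages of this repository.
# Assumes one attention map of shape (seq_len, seq_len) per head; names here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression


def laplacian_eigvals(attn: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Top-k eigenvalues of a Laplacian built from one attention map (hypothetical helper)."""
    # Symmetrize the (row-stochastic, causal) attention matrix so the Laplacian is well defined.
    a = 0.5 * (attn + attn.T)
    degree = np.diag(a.sum(axis=1))
    laplacian = degree - a
    eigvals = np.linalg.eigvalsh(laplacian)  # ascending, real for symmetric matrices
    return eigvals[-top_k:]                  # keep the largest k as features


def attention_to_features(attn_maps: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Concatenate per-layer, per-head spectral features; attn_maps: (layers, heads, T, T)."""
    return np.concatenate(
        [laplacian_eigvals(attn_maps[layer, head], top_k)
         for layer in range(attn_maps.shape[0])
         for head in range(attn_maps.shape[1])]
    )


# Hypothetical usage: X stacks features per generated answer, y holds hallucination labels.
# X = np.stack([attention_to_features(a) for a in all_attention_maps])
# probe = LogisticRegression(max_iter=1000).fit(X, y)
```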

Datasets download

Reproducing the results

  1. Generate attention diagonals and answers (see the illustrative attention-extraction sketch after this list):

Note

For all LLMs and datasets except mistral_small_24b_instruct_2501, 40 GB of VRAM is sufficient.

Note

A separate stage is used for hidden-states generation.

CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_attentions_only
CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_hidden_states_for_selected_tokens
  2. Evaluate generated answers
dvc repro eval_answers_ngram
  3. Evaluate generated answers using LLM-as-judge

Note

Requires OPENAI_API_KEY to be present in a .env file in the repository root directory. You can also set OPENAI_API_BASE_URL to use a different API endpoint.

dvc repro eval_answers_llm_judge
  4. Generate labels

Note

A separate stage is used for the GSM8K dataset.

dvc repro generate_labels
dvc repro generate_labels_gsm8k
  5. Generate split
dvc repro generate_split
  6. Train probes

Note

Separate stages are used for the AttnScore baseline and the hidden-states baselines.

dvc repro train_attn_vs_laplacian_pca
dvc repro train_hidden_states_baselines
dvc repro probe_attn_score
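
As a complement to step 1 above, the sketch below shows one way attention maps (and their diagonals) could be pulled out of a Hugging Face causal LM. It is only an illustration under assumed defaults: the model name, the eager attention setting, and the decision to keep per-head diagonals are assumptions, not the configuration used by the DVC stages.

```python
# Illustrative only: obtaining attention maps and their diagonals with transformers.
# Model choice and tensor layout are assumptions, not this repository's actual config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention is needed so that attention weights are actually returned.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

prompt = "Question: Who wrote 'Pan Tadeusz'? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: tuple with one tensor of shape (batch, heads, seq_len, seq_len) per layer.
attn = torch.stack(outputs.attentions).squeeze(1)   # (layers, heads, seq_len, seq_len)
diagonals = attn.diagonal(dim1=-2, dim2=-1)         # (layers, heads, seq_len) attention diagonals
print(diagonals.shape)
```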

Citation

If you use this code in your research or find the work relevant, please consider citing our paper:

@inproceedings{binkowski2025hallucination,
  title={Hallucination Detection in {LLM}s Using Spectral Features of Attention Maps},
  author={Jakub Binkowski and Denis Janiak and Albert Sawczyn and Bogdan Gabrys and Tomasz Jan Kajdanowicz},
  booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025},
  url={https://openreview.net/forum?id=tm5JQTpBhj}
}
