
Hallucination Detection in LLMs Using Spectral Features of Attention Maps


Official implementation of the paper Hallucination Detection in LLMs Using Spectral Features of Attention Maps, accepted at EMNLP 2025 (see the Citation section below for how to cite our work).

Important

If you have any questions about the code or the paper, please contact us at jakub.binkowski@pwr.edu.pl or open an issue in this repository.

Usage

Prerequisites

  • Python 3.12+
  • uv package manager

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Installation

CPU (Linux/macOS):

make install_cpu

GPU (Linux with CUDA 12.4):

make install_gpu

Reproduce the experiments

The following flowchart describes the main steps needed to reproduce the experiments. All steps can be run via the DVC stages defined in dvc.yaml, which also defines additional stages for the ablation study. Below, we describe the main steps in more detail.

```mermaid
graph LR
    A[Generate Attention Diagonals & Answers] --> B[Generate Labels]
    B --> C[Generate Split]
    C --> D["Train LapEigvals/AttnEigvals/AttnLogDet"]
    C --> E["Compute AttnScore (LLMCheck) baseline"]
    F["Generate Hidden States"] --> G["Train Hidden States Baselines"]
    C -->|"Re-use labels from attention features"| F
```
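To make the pipeline concrete, here is a minimal, illustrative sketch of the core idea behind the spectral features: build a graph Laplacian from an attention map, keep its top eigenvalues as features, and train a simple probe on them. This is not the repository's implementation; the helper names and parameters (`laplacian_eigvals`, `attention_to_features`, `top_k`) are hypothetical, and the actual computation is driven by the DVC stages described below.

```python
# Illustrative sketch only -- the real pipeline lives in the DVC stages of this repository.
# Assumes one attention map of shape (seq_len, seq_len) per head; names here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression


def laplacian_eigvals(attn: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Top-k eigenvalues of a Laplacian built from one attention map (hypothetical helper)."""
    # Symmetrize the (row-stochastic, causal) attention matrix so the Laplacian is well defined.
    a = 0.5 * (attn + attn.T)
    degree = np.diag(a.sum(axis=1))
    laplacian = degree - a
    eigvals = np.linalg.eigvalsh(laplacian)  # ascending, real for symmetric matrices
    return eigvals[-top_k:]                  # keep the largest k as features


def attention_to_features(attn_maps: np.ndarray, top_k: int = 10) -> np.ndarray:
    """Concatenate per-layer, per-head spectral features; attn_maps: (layers, heads, T, T)."""
    return np.concatenate(
        [laplacian_eigvals(attn_maps[layer, head], top_k)
         for layer in range(attn_maps.shape[0])
         for head in range(attn_maps.shape[1])]
    )


# Hypothetical usage: X stacks features per generated answer, y holds hallucination labels.
# X = np.stack([attention_to_features(a) for a in all_attention_maps])
# probe = LogisticRegression(max_iter=1000).fit(X, y)
```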

Datasets download

Reproducing the results

  1. Generate attention diagonals and answers (see the illustrative attention-extraction sketch after this list):

Note

For all LLMs and datasets except mistral_small_24b_instruct_2501, 40 GB of VRAM is sufficient.

Note

A separate stage is used for hidden-states generation.

CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_attentions_only
CUDA_VISIBLE_DEVICES=0 NUM_PROC=1 dvc repro generate_hidden_states_for_selected_tokens
  2. Evaluate generated answers
dvc repro eval_answers_ngram
  3. Evaluate generated answers using LLM-as-judge

Note

Requires OPENAI_API_KEY to be present in a .env file in the repository root directory. You can also set OPENAI_API_BASE_URL to use a different API endpoint.

dvc repro eval_answers_llm_judge
  4. Generate labels

Note

A separate stage is used for the GSM8K dataset.

dvc repro generate_labels
dvc repro generate_labels_gsm8k
  5. Generate split
dvc repro generate_split
  6. Train probes

Note

Separate stages are used for the AttnScore baseline and the hidden-states baselines.

dvc repro train_attn_vs_laplacian_pca
dvc repro train_hidden_states_baselines
dvc repro probe_attn_score
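
As a complement to step 1 above, the sketch below shows one way attention maps (and their diagonals) could be pulled out of a Hugging Face causal LM. It is only an illustration under assumed defaults: the model name, the eager attention setting, and the decision to keep per-head diagonals are assumptions, not the configuration used by the DVC stages.

```python
# Illustrative only: obtaining attention maps and their diagonals with transformers.
# Model choice and tensor layout are assumptions, not this repository's actual config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention is needed so that attention weights are actually returned.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

prompt = "Question: Who wrote 'Pan Tadeusz'? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: tuple with one tensor of shape (batch, heads, seq_len, seq_len) per layer.
attn = torch.stack(outputs.attentions).squeeze(1)   # (layers, heads, seq_len, seq_len)
diagonals = attn.diagonal(dim1=-2, dim2=-1)         # (layers, heads, seq_len) attention diagonals
print(diagonals.shape)
```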

Citation

If you use this code in your research or find the work relevant, please consider citing our paper:

@inproceedings{binkowski2025hallucination,
  title={Hallucination Detection in {LLM}s Using Spectral Features of Attention Maps},
  author={Jakub Binkowski and Denis Janiak and Albert Sawczyn and Bogdan Gabrys and Tomasz Jan Kajdanowicz},
  booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025},
  url={https://openreview.net/forum?id=tm5JQTpBhj}
}
