SLAQ: The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers

arXiv | Hugging Face

Dataset and code from the paper "The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers" (https://arxiv.org/abs/2510.11218).

(Figure: SLAQ framework overview)

Repository Information

This repository contains the following:

  1. The gold short- and long-form dataset, in the dataset folder.
  2. Inference scripts for evaluating your LLM on the dataset (see the sketch after this list).
  3. Evaluation scripts that run LLM-as-a-judge (Gemini) and compute factual accuracy and alignment scores.
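
For orientation, here is a minimal sketch of what the inference step can look like, using Hugging Face transformers: one short-form and one long-form generation per question. The model name, file paths, prompt wording, and field names such as `question` are illustrative assumptions; the actual scripts and dataset schema in this repository may differ.

```python
# Minimal sketch of the inference step. It assumes the dataset is a JSON list
# of records with a "question" field; file names, prompts, and fields here are
# illustrative and may not match the files in the dataset/ folder.
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # any chat model under evaluation

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")


def generate(prompt: str, max_new_tokens: int) -> str:
    """Generate a greedy completion for a single chat-style prompt."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)


with open("dataset/slaq.json") as f:  # hypothetical file name
    records = json.load(f)

predictions = []
for rec in records:
    question = rec["question"]
    short_ans = generate(f"Answer in a few words: {question}", max_new_tokens=32)
    long_ans = generate(f"Answer in detail: {question}", max_new_tokens=512)
    predictions.append({"question": question, "short": short_ans, "long": long_ans})

with open("predictions.json", "w") as f:
    json.dump(predictions, f, indent=2)
```

The inference scripts in this repository are the supported path; the sketch only shows the overall shape of the step.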

Results

The figure below shows SLAQ factual accuracy and alignment scores for the Gemma, Qwen, and Llama models. The underlying numbers are available in the evaluation/raw_benchmarking_results folder.

(Figure: SLAQ factual accuracy and alignment scores per model)
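
As a rough illustration of the scoring step, the sketch below aggregates per-question judge verdicts into factual accuracy and one simple reading of alignment (short- and long-form answers receiving the same correctness verdict). The file name and the `short_correct` / `long_correct` fields are assumptions; the authoritative metric definitions are those in the paper and the evaluation scripts.

```python
# Minimal sketch of score aggregation from per-question judge verdicts.
# Assumes a JSON list of records with boolean "short_correct" and
# "long_correct" fields produced by the LLM-as-a-judge step; the real
# evaluation output format may differ.
import json

with open("judge_verdicts.json") as f:  # hypothetical file name
    verdicts = json.load(f)

n = len(verdicts)
short_acc = sum(v["short_correct"] for v in verdicts) / n
long_acc = sum(v["long_correct"] for v in verdicts) / n
# Simple alignment reading: both answer forms judged the same way
# (both correct or both incorrect).
alignment = sum(v["short_correct"] == v["long_correct"] for v in verdicts) / n

print(f"short-form accuracy: {short_acc:.3f}")
print(f"long-form accuracy:  {long_acc:.3f}")
print(f"alignment:           {alignment:.3f}")
```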

Citation

@misc{islam2025curiouscasefactualmisalignment,
      title={The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers}, 
      author={Saad Obaid ul Islam and Anne Lauscher and Goran Glavaš},
      year={2025},
      eprint={2510.11218},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.11218}, 
}
