Learning from LLM Disagreement in Retrieval Evaluation
Abstract
Large language models (LLMs) are being integrated into information retrieval pipelines within digital library systems for tasks such as re-ranking and filtering. However, different LLMs often disagree on borderline classification cases, raising concerns about how this variability affects downstream retrieval and the integrity of digital library collections. This study examines disagreement between two open-weight LLMs, LLaMA and Qwen, when they evaluate a corpus of scholarly abstracts for their contribution to Sustainable Development Goals (SDGs). We isolate subsets of documents on which the models disagree and examine their lexical properties, rank-order behavior, and classification predictability. Our results demonstrate that this disagreement is not random: it concentrates in ambiguous cases, produces divergent top-k outputs under shared scoring functions, and is separable by logistic regression with AUCs above 0.74. These findings suggest that LLM-based filtering introduces structured variability into document retrieval, even under controlled prompting and shared ranking logic. We propose treating classification disagreement as an object of analysis in retrieval evaluation, particularly for subjective or thematic search tasks.
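The abstract's claim that disagreement is "separable with AUCs above 0.74 using logistic regression" can be illustrated with a minimal sketch. Everything below is a hypothetical reconstruction, not the paper's actual pipeline: the corpus is synthetic, and the lexical features (document length, type-token ratio, hedge-word count) are illustrative assumptions about what might distinguish ambiguous abstracts.

```python
# Hypothetical sketch: predicting whether two LLMs will disagree on a document,
# using simple lexical features and logistic regression (pure stdlib, no sklearn).
# Synthetic data and feature choices are assumptions, not the paper's method.
import math

def featurize(text):
    """Illustrative lexical features: scaled length, type-token ratio, hedge count."""
    tokens = text.lower().split()
    n = len(tokens)
    ttr = len(set(tokens)) / n if n else 0.0
    hedges = sum(t in {"may", "might", "could", "potentially"} for t in tokens)
    return [n / 100.0, ttr, float(hedges)]

def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fit by stochastic gradient descent on log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores, labels):
    """AUC as P(random positive outscores random negative), ties counted 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic corpus: label 1 = models disagreed (hedged, ambiguous abstracts),
# label 0 = models agreed (direct, concrete abstracts).
docs = [
    ("this work may potentially relate to several sustainable goals", 1),
    ("the framework might indirectly support equitable education outcomes", 1),
    ("results could potentially inform climate adaptation policy", 1),
    ("the method may have implications for goal three targets", 1),
    ("we measure water quality improvements in rural districts", 0),
    ("the system reduces irrigation energy use by a fixed margin", 0),
    ("this trial evaluates vaccine coverage in two provinces", 0),
    ("the dataset tracks renewable generation across national grids", 0),
]
X = [featurize(t) for t, _ in docs]
y = [label for _, label in docs]
w, b = train_logreg(X, y)
scores = [predict(w, b, x) for x in X]
auc_val = auc(scores, y)
print(f"training AUC: {auc_val:.2f}")
```

On this toy, separable corpus the training AUC is near 1.0; the paper's reported values above 0.74 on held-out real data are what make the "disagreement is not random" claim substantive.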
Citation
Ingram, William A., Bipasha Banerjee, and Edward A. Fox. 2025. “Learning from LLM Disagreement in Retrieval Evaluation.” In Proceedings of the 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’25), Virtual Event, pp. 129–138. doi: 10.1109/JCDL67857.2025.00024
BibTeX
@inproceedings{ingram2025learning,
title = {Learning from LLM Disagreement in Retrieval Evaluation},
author = {Ingram, William A. and Banerjee, Bipasha and Fox, Edward A.},
year = {2025},
booktitle = {Proceedings of the 2025 ACM/IEEE Joint Conference on Digital Libraries},
series = {JCDL '25},
location = {Virtual Event},
pages = {129--138},
doi = {10.1109/JCDL67857.2025.00024}
}