Evaluating Human-LLM Alignment in ETD Subject Classification
Abstract
Author-assigned subject labels in Electronic Theses and Dissertations (ETDs) are often inconsistent, overly broad, or misaligned with the research focus. This hampers discovery, aggregation, and analysis, especially for interdisciplinary research. LLMs offer a scalable alternative for automated classification, but their labeling rationale is opaque and introduces systematic biases. This study compares subject labels generated by LLMs with human-assigned labels for over 9,000 ETDs across 21 academic categories to assess the disagreement. We evaluate multiple prompt-based and fine-tuned LLM configurations and analyze areas of agreement and disagreement to identify patterns of misclassification. LLMs achieve competitive performance overall but frequently misclassify theoretical or interdisciplinary texts, often due to overweighting lexical cues and disregarding context. We show such errors are not random but reflect structured semantic divergences from human interpretation. These findings suggest a need for hybrid frameworks that combine LLM scalability with human contextual judgment to improve subject labeling in academic repositories.
Citation
2026. “Evaluating Human-LLM Alignment in ETD Subject Classification.” In New Trends in Theory and Practice of Digital Libraries, edited by Wolf-Tilo Balke, Koraljka Golub, Yannis Manolopoulos, Kostas Stefanidis, Zheying Zhang, Trond Aalberg, and Paolo Manghi, Cham, pp. 57–69. 10.1007/978-3-032-06136-2_6
BibTeX
@inproceedings{klair2026Evaluating,
title = {Evaluating {{Human-LLM Alignment}} in~{{ETD Subject Classification}}},
author = {Klair, Hajra and German, Fausto and Banerjee, Bipasha and Ingram, William A.},
booktitle = {New {{Trends}} in {{Theory}} and {{Practice}} of {{Digital Libraries}}},
location = {Cham},
publisher = {Springer Nature Switzerland},
pages = {57--69},
doi = {10.1007/978-3-032-06136-2_6},
isbn = {978-3-032-06136-2},
editor = {Balke, Wolf-Tilo and Golub, Koraljka and Manolopoulos, Yannis and Stefanidis, Kostas and Zhang, Zheying and Aalberg, Trond and Manghi, Paolo},
year = {2026}
}