Identifying Future Work Chapters in Electronic Theses and Dissertations

Abstract

Electronic Theses and Dissertations (ETDs) contain conclusion and future work chapters that are difficult to locate automatically due to highly variable chapter titles across disciplines and institutions, limiting large-scale synthesis and discovery. We investigate automatic detection of conclusion/future-work–related chapters, operationalized with seven labels (conclusions, summary, discussion, future work, recommendations, limitations, implications) at the start-page level, across 299 ETDs with 334 annotated positives spanning seven academic domains. We compare heading-driven baselines (GROBID and LayoutLMv3 for heading extraction, paired with lexical, semantic, NLI, and LLM classifiers) against a modular LLM system with three components: (1) layout-preserving text extraction, (2) LLM-based page filtering to retain likely chapter starts, and (3) LLM chapter detection. We systematically test combinations of these components (referred to as stages throughout) to isolate individual contributions. Evaluation is page-level with exact start-page matching. Our best result (Llama 4 Scout, Stage 2+3) outperforms the strongest baseline (LayoutLMv3–LLM). Stage 2 substantially improves precision, while Stage 1 has mixed, generally modest effects across models. Mistral Small achieves the highest precision, whereas Llama 3.3 yields the highest recall, underscoring model trade-offs. We release prompts and configurations for reproducibility and highlight compute–accuracy considerations, showing that lightweight LLM-based page filtering combined with LLM chapter detection is a practical, effective strategy for surfacing conclusion/future-work content in long, heterogeneous ETDs.
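To make the abstract's evaluation setup concrete, here is a small Python sketch. It is not the authors' released code. It makes several assumptions. Pages are represented as hypothetical `(etd_id, page_number, page_text)` tuples. Stage 2 (page filtering) and Stage 3 (chapter detection) are modeled as pluggable yes/no callables. Predictions are scored by exact `(etd_id, start_page)` matches and ignore which of the seven labels was assigned, although the paper may also require the label to match. The `toy_filter` and `toy_detector` functions are keyword heuristics standing in for the LLM calls and are not the models evaluated in the paper.

```python
from typing import Callable, Iterable, Optional

# Hypothetical page representation: (etd_id, page_number, page_text).
Page = tuple[str, int, str]
StartPage = tuple[str, int]

# The seven labels from the abstract, as lowercase keyword stems (assumption).
LABEL_KEYWORDS = ("conclusion", "summary", "discussion", "future work",
                  "recommendation", "limitation", "implication")


def run_pipeline(pages: Iterable[Page],
                 page_filter: Optional[Callable[[str], bool]],
                 chapter_detector: Callable[[str], bool]) -> set[StartPage]:
    """Return predicted chapter start pages.

    page_filter stands in for Stage 2 (None skips filtering);
    chapter_detector stands in for Stage 3.
    """
    candidates = [p for p in pages if page_filter is None or page_filter(p[2])]
    return {(etd, num) for etd, num, text in candidates if chapter_detector(text)}


def page_level_scores(predicted: set[StartPage], gold: set[StartPage]) -> dict[str, float]:
    """Precision, recall, and F1 with exact start-page matching."""
    tp = len(predicted & gold)  # only exact (etd_id, page) hits count
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def toy_filter(text: str) -> bool:
    """Hypothetical Stage 2 stand-in: keep pages whose first line is heading-short."""
    lines = text.strip().splitlines()
    return bool(lines) and len(lines[0].split()) <= 6


def toy_detector(text: str) -> bool:
    """Hypothetical Stage 3 stand-in: a label keyword near the top of the page."""
    head = text[:200].lower()
    return any(keyword in head for keyword in LABEL_KEYWORDS)


# Toy data (invented for illustration, not from the paper's corpus).
pages = [
    ("etd1", 3, "Introduction\nWe study reading behavior."),
    ("etd1", 10, "Chapter 5: Conclusions\nThis thesis showed that ..."),
    ("etd1", 11, "the results discussed above suggest that future work should examine larger corpora"),
    ("etd2", 42, "Limitations\nOur sample was small."),
]
gold = {("etd1", 10), ("etd2", 42), ("etd2", 50)}

with_filter = page_level_scores(run_pipeline(pages, toy_filter, toy_detector), gold)
without_filter = page_level_scores(run_pipeline(pages, None, toy_detector), gold)
print("Stage 2+3:", with_filter)
print("Stage 3 only:", without_filter)
```

On this toy data, skipping the filter admits a mid-chapter page that merely mentions "future work". Precision drops from 1.0 to about 0.67, while recall stays at about 0.67 in both runs. This mirrors, in miniature, the kind of precision gain the abstract attributes to Stage 2. The numbers say nothing about the real system.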

Citation

Amr Aboelnaga, Hajra Klair, Hoda Eldardiry, and William A. Ingram. 2025. “Identifying Future Work Chapters in Electronic Theses and Dissertations.” In Proceedings of the 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’25), Virtual Event, pp. 177–186. 10.1109/JCDL67857.2025.00029

BibTeX

@inproceedings{aboelnaga2025identifying,
  title = {Identifying Future Work Chapters in Electronic Theses and Dissertations},
  author = {Aboelnaga, Amr and Klair, Hajra and Eldardiry, Hoda and Ingram, William A.},
  year = {2025},
  booktitle = {Proceedings of the 2025 ACM/IEEE Joint Conference on Digital Libraries},
  series = {JCDL '25},
  location = {Virtual Event},
  pages = {177--186},
  doi = {10.1109/JCDL67857.2025.00029}
}