Small, Locally-Hosted LLMs for Sustainable Development Goal Classification

We are excited to announce that “Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals” has been accepted as a poster at the 2024 IEEE International Conference on Big Data (IEEE BigData 2024), which takes place December 15–18, 2024, in Washington, DC.

About the Study

Accurately assessing research contributions to the United Nations’ Sustainable Development Goals (SDGs) is a growing priority for academic institutions. Traditional methods, which rely heavily on keyword-based Boolean search queries, often conflate incidental keyword matches with genuine contributions to SDG targets, leading to reduced precision in bibliometric analyses.

Our study proposes a novel approach: leveraging small, locally-hosted Large Language Models (LLMs) as evaluation agents to address the limitations of keyword-based retrieval. Using a dataset of 340,000 abstracts retrieved via SDG-specific keyword queries, we demonstrated how these models can distinguish between semantically relevant contributions to SDG targets and incidental mentions.
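To make the idea of an LLM acting as an evaluation agent concrete, here is a minimal Python sketch of the filtering step. It is an illustration under stated assumptions, not the pipeline used in the study: the prompt wording, the RELEVANT/INCIDENTAL labels, and the function names (build_prompt, parse_verdict, filter_abstracts) are all hypothetical, and the model call is abstracted as a generate callable so that any locally hosted model could be plugged in. A toy stand-in replaces the model so the example runs offline.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical prompt; the study's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You are assessing research abstracts against UN Sustainable Development Goals.\n"
    "SDG: {sdg}\n"
    "Abstract: {abstract}\n"
    "Does the abstract describe a genuine contribution to this SDG, "
    "or only an incidental keyword mention? "
    "Answer with one word: RELEVANT or INCIDENTAL."
)


def build_prompt(sdg: str, abstract: str) -> str:
    """Fill the template for one keyword-retrieved abstract."""
    return PROMPT_TEMPLATE.format(sdg=sdg, abstract=abstract.strip())


def parse_verdict(reply: str) -> Optional[bool]:
    """Map a model reply to True (relevant), False (incidental), or None (unclear)."""
    text = reply.strip().upper()
    if text.startswith("RELEVANT"):
        return True
    if text.startswith("INCIDENTAL"):
        return False
    return None


def filter_abstracts(
    records: List[Dict],
    sdg: str,
    generate: Callable[[str], str],
) -> List[Dict]:
    """Keep only the records the model judges to be genuine contributions."""
    kept = []
    for record in records:
        reply = generate(build_prompt(sdg, record["abstract"]))
        if parse_verdict(reply) is True:
            kept.append(record)
    return kept


def fake_generate(prompt: str) -> str:
    # Stand-in for a real local model call, so the sketch runs without a model.
    return "RELEVANT" if "irrigation" in prompt.lower() else "INCIDENTAL"


records = [
    {"id": 1, "abstract": "We test low-cost drip irrigation to raise smallholder yields."},
    {"id": 2, "abstract": "We describe a thread scheduler in which idle workers hunger for CPU time."},
]
kept = filter_abstracts(records, "SDG 2: Zero Hunger", fake_generate)
```

In this sketch, replies that cannot be parsed are treated as not relevant. That favors precision over recall, which matches the goal described above, but it is a design choice of the example rather than something taken from the paper.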

Key Highlights

Novel Application: We evaluated three small, locally-hosted LLMs—Mistral-7B, Phi-3.5-mini, and Llama-3.2—for their ability to classify SDG-related research contributions with greater precision than traditional methods.
Improved Precision: These models leverage their semantic understanding to move beyond surface-level keyword matching, addressing key limitations in traditional SDG classification workflows.
Scalability: Because the LLMs run locally, the approach offers a cost-efficient and scalable framework for institutions to align research with the SDGs.

Why It Matters

This work represents a step forward in SDG-related research evaluation, providing a more nuanced and precise approach to classifying scholarly contributions. The findings have broader implications for institutional benchmarking, funding strategies, and semantic search applications.

Future Directions

Our research paves the way for:

Developing multi-agent frameworks that combine multiple models to refine classification further.
Applying these techniques in semantic search systems to enable more effective discovery of SDG-relevant research.
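One simple way several models could be combined is a majority vote over their individual verdicts. The Python sketch below is purely illustrative: the announcement does not say how a multi-agent framework would aggregate results, so the voting rule, the handling of ties, and the function name majority_vote are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(verdicts: Sequence[Optional[bool]]) -> Optional[bool]:
    """Combine per-model verdicts (True = relevant, False = incidental).

    None entries (a model gave no usable answer) are ignored.
    Returns None when no model answered or when the vote is tied.
    """
    votes = Counter(v for v in verdicts if v is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the abstract for manual review
    return ranked[0][0]


# Hypothetical verdicts from three models for a single abstract.
combined = majority_vote([True, True, False])
```

Returning None on a tie keeps ambiguous abstracts out of both the accepted and rejected sets, so they can be reviewed separately.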

Read the Full Preprint

The full preprint of our work is available on arXiv: https://arxiv.org/abs/2411.17598.

We look forward to presenting this work at IEEE BigData 2024 and engaging with the community on the potential of LLMs in advancing SDG-related research.