Enhancing Human Annotation: Leveraging Large Language Models and Efficient Batch Processing

Abstract

Large language models (LLMs) are capable of assessing document and query characteristics, including relevance, and are now being used for a variety of other classification and labeling tasks as well. This study explores how to use LLMs to classify an information need, often represented as a user query. In particular, our goal is to classify the cognitive complexity of the search task for a given "backstory". Using 180 TREC topics and backstories, we show that GPT-based LLMs agree with human experts as much as human experts agree with one another. We also show that batching and ordering can significantly impact the accuracy of GPT-3.5, but rarely alter the quality of GPT-4 predictions. This study provides insights into the efficacy of large language models for annotation tasks normally completed by humans, and offers recommendations for other similar applications.
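To make the batching idea concrete, below is a minimal sketch of how several backstories might be packed into a single classification prompt. This is not the pipeline from the paper: the OpenAI client usage, the label set, the model name, and the prompt wording are all illustrative assumptions.

```python
# A minimal sketch (not the authors' exact pipeline) of batching several
# backstories into one classification request. Assumes the OpenAI Python
# client (openai>=1.0) and a hypothetical label set for cognitive complexity.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["Remember", "Understand", "Analyze"]  # hypothetical label set


def classify_batch(backstories, model="gpt-4"):
    """Ask the model to label a batch of backstories in a single request."""
    numbered = "\n".join(f"{i + 1}. {b}" for i, b in enumerate(backstories))
    prompt = (
        "Classify the cognitive complexity of each search task below.\n"
        f"Answer with one label per line, chosen from: {', '.join(LABELS)}.\n\n"
        f"{numbered}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce output variability for annotation tasks
    )
    # One label per line, in the same order as the input backstories.
    return response.choices[0].message.content.strip().splitlines()
```

Comparing the labels produced by such batched calls against one-item-per-call requests, and across different orderings of items within a batch, is one way to probe the sensitivity the abstract describes.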

Publication
In Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR '24)
Oleg Zendel
Research Fellow

My research interests focus on search systems, especially from an information retrieval perspective, and on their evaluation.