ArxEval: Evaluating Retrieval and Generation in Language Models for Scientific Literature

Authors: Aarush Sinha and 3 other authors

Abstract: Language Models (LMs) now play an increasingly large role in information generation and synthesis, so the representation of scientific knowledge in these systems must be highly accurate. A prime challenge is hallucination: generating apparently plausible but false information, including invented citations and nonexistent research papers. This kind of inaccuracy is dangerous in domains that demand high factual correctness, such as academia and education. This work presents a pipeline for evaluating how frequently language models hallucinate when generating responses about scientific literature. We propose ArxEval, an evaluation pipeline with two tasks that use arXiv as a repository: Jumbled Titles and Mixed Titles. Our evaluation covers fifteen widely used language models and provides comparative insights into their reliability in handling scientific literature.
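The abstract does not spell out the evaluation protocol, but a Jumbled Titles probe can be sketched along the following lines: shuffle the words of a genuine arXiv title and ask the model under test whether the resulting (nonexistent) title corresponds to a real paper. This is a minimal illustrative sketch, not the authors' implementation; the `query_model` callable, the prompt wording, and the YES/NO scoring are all assumptions.

```python
import random

def jumble_title(title: str, seed: int = 0) -> str:
    """Shuffle the words of a real paper title to create a corrupted probe."""
    words = title.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

def jumbled_title_hallucination_rate(titles, query_model) -> float:
    """Probe a model with jumbled titles and count hallucinated confirmations.

    `query_model` is a user-supplied callable (hypothetical here) that sends
    a prompt to the language model under test and returns its text response.
    """
    hallucinations = 0
    for title in titles:
        probe = jumble_title(title)
        prompt = (
            f'Does a paper titled "{probe}" exist on arXiv? '
            "Answer YES or NO."
        )
        answer = query_model(prompt)
        # A jumbled title should not match any real paper, so a confident
        # YES is treated as a hallucination.
        if answer.strip().upper().startswith("YES"):
            hallucinations += 1
    return hallucinations / len(titles)
```

Under this sketch, a lower rate indicates a model that more reliably refuses to confirm fabricated scientific literature.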

Submission history

From: Aarush Sinha
[v1] Fri, 17 Jan 2025 05:19:24 UTC (2,767 KB)
[v2] Wed, 22 Jan 2025 04:17:21 UTC (2,775 KB)


