Large Language Models in the Clinic: A Comprehensive Benchmark



By Fenglin Liu and 18 other authors

Abstract: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt closed-ended question-answering (QA) tasks with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark, ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs. The benchmark data is available at this https URL.

Submission history

From: Hongjian Zhou [view email]
[v1]
Thu, 25 Apr 2024 15:51:06 UTC (124 KB)
[v2]
Tue, 25 Jun 2024 17:23:22 UTC (392 KB)
[v3]
Wed, 26 Jun 2024 17:48:18 UTC (392 KB)
[v4]
Wed, 16 Oct 2024 09:18:58 UTC (398 KB)
