Check-Eval: A Checklist-based Approach for Evaluating Text Quality
Jayr Pereira, Andre Assumpcao, Roberto Lotufo
Abstract: Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as both a reference-free and reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments than existing metrics, such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4
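The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative outline only: `call_llm` is a placeholder for any chat-style LLM API, and the prompt wording and yes/no scoring rule are assumptions, not the authors' exact templates (their code is at the URL above).

```python
# Minimal sketch of a checklist-based evaluation loop (reference-free variant).
# Assumptions: a generic LLM client behind `call_llm`, illustrative prompts,
# and a score defined as the fraction of checklist items judged satisfied.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError("Plug in your LLM client here.")


def generate_checklist(source_text: str, criterion: str) -> list[str]:
    """Stage 1: derive a checklist of key points from the source text."""
    prompt = (
        f"Read the text below and write a numbered checklist of key points "
        f"that a high-quality output should cover, focusing on {criterion}.\n\n"
        f"Text:\n{source_text}"
    )
    response = call_llm(prompt)
    items = []
    for line in response.splitlines():
        line = line.strip()
        if line and line[0].isdigit():
            # Strip the leading "1." / "2)" numbering to keep only the item text.
            items.append(line.lstrip("0123456789.) ").strip())
    return items


def evaluate_with_checklist(candidate_text: str, checklist: list[str]) -> float:
    """Stage 2: ask whether the candidate satisfies each item; return the
    fraction of items answered 'yes' as the quality score."""
    satisfied = 0
    for item in checklist:
        prompt = (
            f"Candidate text:\n{candidate_text}\n\n"
            f"Does the candidate satisfy the following point? "
            f"Answer yes or no.\n- {item}"
        )
        answer = call_llm(prompt).strip().lower()
        satisfied += answer.startswith("yes")
    return satisfied / max(len(checklist), 1)
```

A reference-dependent variant would generate the checklist from a reference text instead of the source; the evaluation stage is unchanged.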
Submission history
From: Jayr Pereira
[v1] Fri, 19 Jul 2024 17:14:16 UTC (624 KB)
[v2] Tue, 10 Sep 2024 14:08:29 UTC (635 KB)