Check-Eval: A Checklist-based Approach for Evaluating Text Quality


Jayr Pereira, Andre Assumpcao, Roberto Lotufo

Abstract: Evaluating the quality of text generated by large language models (LLMs) remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as both a reference-free and reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments compared to existing metrics, such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4.
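The abstract describes a two-stage pipeline: an LLM first generates a checklist of key points from the source (or reference) text, then checks the candidate text against each item. Below is a minimal sketch of that idea, assuming an OpenAI-style chat API. The prompt wording, model name, function names, and the simple fraction-of-items-satisfied score are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a checklist-based evaluation loop (not the authors' code).
# Assumes the `openai` Python client; prompts and scoring are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any capable chat model


def chat(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def generate_checklist(source_text: str, criterion: str = "consistency") -> list[str]:
    """Stage 1: derive a checklist of key points from the source/reference text."""
    prompt = (
        f"Read the following text and list the key points a candidate text "
        f"must cover to satisfy the criterion '{criterion}'. "
        f"Return one point per line.\n\n{source_text}"
    )
    return [line.strip("- ").strip() for line in chat(prompt).splitlines() if line.strip()]


def evaluate_with_checklist(candidate_text: str, checklist: list[str]) -> float:
    """Stage 2: check the candidate against each item; score = fraction satisfied."""
    satisfied = 0
    for item in checklist:
        prompt = (
            f"Candidate text:\n{candidate_text}\n\n"
            f"Does the candidate satisfy this point? Answer yes or no.\n"
            f"Point: {item}"
        )
        if chat(prompt).strip().lower().startswith("yes"):
            satisfied += 1
    return satisfied / len(checklist) if checklist else 0.0


if __name__ == "__main__":
    source = "The city council approved the 2025 budget, raising transit funding by 12%."
    summary = "The council passed the 2025 budget with a transit funding increase."
    checklist = generate_checklist(source)
    print(f"Checklist score: {evaluate_with_checklist(summary, checklist):.2f}")
```

In the reference-free setting, the checklist would be generated from the source document itself; in the reference-dependent setting, from a human-written reference. The per-item structure is what makes the resulting score interpretable.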

Submission history

From: Jayr Pereira
[v1] Fri, 19 Jul 2024 17:14:16 UTC (624 KB)
[v2] Tue, 10 Sep 2024 14:08:29 UTC (635 KB)


