FLAME: Financial Large-Language Model Assessment and Metrics Evaluation

AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning



arXiv:2501.06211v1 Announce Type: new
Abstract: LLMs have revolutionized NLP and demonstrated potential across diverse domains. More and more financial LLMs have been introduced for finance-specific tasks, yet comprehensively assessing their value is still challenging. In this paper, we introduce FLAME, a comprehensive financial LLMs evaluation system in Chinese, which includes two core evaluation benchmarks: FLAME-Cer and FLAME-Sce. FLAME-Cer covers 14 types of authoritative financial certifications, including CPA, CFA, and FRM, with a total of approximately 16,000 carefully selected questions. All questions have been manually reviewed to ensure accuracy and representativeness. FLAME-Sce consists of 10 primary core financial business scenarios, 21 secondary financial business scenarios, and a comprehensive evaluation set of nearly 100 tertiary financial application tasks. We evaluate 6 representative LLMs, including GPT-4o, GLM-4, ERNIE-4.0, Qwen2.5, XuanYuan3, and the latest Baichuan4-Finance, revealing Baichuan4-Finance excels other LLMs in most tasks. By establishing a comprehensive and professional evaluation system, FLAME facilitates the advancement of financial LLMs in Chinese contexts. Instructions for participating in the evaluation are available on GitHub: https://github.com/FLAME-ruc/FLAME.



Source link
lol

By stp2y

Leave a Reply

Your email address will not be published. Required fields are marked *

No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.