NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

By stp2y, January 8, 2025

[Submitted on 15 Oct 2024 (v1), last revised 7 Jan 2025 (this version, v2)]

Authors: Han Han and 5 other authors

Abstract: Large language models (LLMs) combined with tool learning have achieved impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested order, where a later tool call takes the response of an earlier call as its input parameters. However, nested tool learning capabilities remain under-explored, since existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluation. NesTools comprises a novel automatic data generation method that constructs large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is of high quality and closely aligned with real-world scenarios. NesTools can therefore serve as a new benchmark for evaluating the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs and provide in-depth analyses with NesTools, which show that current LLMs still struggle with the complex nested tool learning task.
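
To make the idea of a nested tool call concrete, here is a minimal Python sketch in which the second call consumes the response of the first. The tool names (`get_city_id`, `get_temperature`), their return values, and the `"$k"` placeholder convention are assumptions invented for this illustration. They are not taken from NesTools and may differ from the format the dataset actually uses.

```python
from typing import Any, Callable, Dict, List

# Hypothetical tools, invented for this sketch (not from NesTools).
def get_city_id(name: str) -> int:
    return {"Paris": 75, "Berlin": 10}[name]

def get_temperature(city_id: int) -> float:
    return {75: 18.5, 10: 14.0}[city_id]

TOOLS: Dict[str, Callable[..., Any]] = {
    "get_city_id": get_city_id,
    "get_temperature": get_temperature,
}

def run_calls(calls: List[dict]) -> List[Any]:
    """Execute calls in order.

    An argument written as "$k" (an assumed convention) is replaced
    by the response of the k-th earlier call, which is what makes
    the sequence nested rather than merely sequential.
    """
    responses: List[Any] = []
    for call in calls:
        args = {}
        for key, value in call["args"].items():
            if isinstance(value, str) and value.startswith("$"):
                # Resolve the reference to an earlier response.
                args[key] = responses[int(value[1:])]
            else:
                args[key] = value
        responses.append(TOOLS[call["tool"]](**args))
    return responses

calls = [
    {"tool": "get_city_id", "args": {"name": "Paris"}},
    # Nested: city_id comes from the response of call 0.
    {"tool": "get_temperature", "args": {"city_id": "$0"}},
]
print(run_calls(calls))  # [75, 18.5]
```

A model that emits the second call with a guessed `city_id` instead of a reference to the first response would fail this kind of task, which is roughly the failure mode a nested tool learning benchmark is meant to expose.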

Submission history

From: Han Han
[v1] Tue, 15 Oct 2024 17:33:43 UTC (624 KB)
[v2] Tue, 7 Jan 2025 13:34:06 UTC (652 KB)
