ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

By stp2y, January 16, 2025


[Submitted on 2 Oct 2024 (v1), last revised 13 Jan 2025 (this version, v4)]

Authors: Xiao Yu and 5 other authors

Abstract: Autonomous agents have demonstrated significant potential in automating complex multistep decision-making tasks. However, even state-of-the-art vision-language models (VLMs), such as GPT-4o, still fall short of human-level performance, particularly in intricate web environments and long-horizon tasks. To address these limitations, we present ExACT, an approach that combines test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test-time algorithm designed to enhance AI agents’ ability to explore the decision space on the fly. R-MCTS extends traditional MCTS by 1) incorporating contrastive reflection, allowing agents to learn from past interactions and dynamically improve their search efficiency; and 2) using multi-agent debate for reliable state evaluation. Next, we introduce Exploratory Learning, a novel learning strategy that teaches agents to search at inference time without relying on any external search algorithms. On the challenging VisualWebArena benchmark, our GPT-4o-based R-MCTS agent achieves a 6% to 30% relative improvement across various tasks compared to the previous state-of-the-art. Additionally, we show that the knowledge and experience gained from test-time search can be effectively transferred back to GPT-4o via fine-tuning. After Exploratory Learning, GPT-4o 1) demonstrates the ability to explore the environment, evaluate a state, and backtrack to viable ones when it detects that the current state cannot lead to success, and 2) matches 87% of R-MCTS’s performance while using significantly less compute. Notably, our work demonstrates the compute scaling properties at both training time (data collection with R-MCTS) and testing time. These results suggest a promising research direction to enhance VLMs’ capabilities for agentic applications via test-time search and self-learning.
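To make the search loop behind the abstract more concrete, here is a minimal sketch of plain MCTS with UCT selection on a toy task. It is not the authors' R-MCTS implementation. The toy task, the names `Node`, `evaluate`, `uct`, and `search`, and the constants are all hypothetical. The `evaluate` function averages two hand-written scorers as a crude stand-in for the multi-agent debate the abstract mentions, and it replaces random rollouts with a direct value estimate (an assumption of this sketch). Contrastive reflection is not modeled at all.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy task: start at 0, add 2 or 3 per step, land exactly on TARGET.
TARGET = 4
ACTIONS = (2, 3)


@dataclass
class Node:
    state: int
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)  # action -> Node
    visits: int = 0
    total_value: float = 0.0

    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def is_terminal(state: int) -> bool:
    return state >= TARGET


def evaluate(state: int) -> float:
    """Average several scorers (a stand-in for debate-style state evaluation)."""
    if state > TARGET:
        return 0.0  # overshooting the target is a failure
    scorers = (
        lambda s: 1.0 - (TARGET - s) / TARGET,  # closeness to the target
        lambda s: 1.0 if s == TARGET else 0.5,  # bonus for an exact hit
    )
    return sum(f(state) for f in scorers) / len(scorers)


def uct(child: Node, parent_visits: int, c: float = 1.0) -> float:
    """Upper confidence bound: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(root_state: int, iterations: int = 200) -> int:
    """Run MCTS from root_state and return the most-visited first action."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = max(node.children.values(), key=lambda ch: uct(ch, node.visits))
        # Expansion: add one untried action, unless the node is terminal.
        if not is_terminal(node.state):
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(node.state + action, parent=node)
            node.children[action] = child
            node = child
        # Evaluation: score the reached state directly instead of rolling out.
        value = evaluate(node.state)
        # Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_value += value
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)


best_first_action = search(0)
```

In this toy task only the path 2 + 2 reaches the target, so the search concentrates its visits on the first action 2. In an agent setting, the scorers would presumably be replaced by model-based judgments of a web state, but that mapping is an assumption here and not something the abstract specifies.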

Submission history

From: Xiao Yu
[v1] Wed, 2 Oct 2024 21:42:35 UTC (6,770 KB)
[v2] Tue, 15 Oct 2024 14:59:46 UTC (8,143 KB)
[v3] Fri, 18 Oct 2024 03:27:37 UTC (8,144 KB)
[v4] Mon, 13 Jan 2025 19:51:53 UTC (8,160 KB)
