NBDI: A Simple and Efficient Termination Condition for Skill Extraction from Task-Agnostic Demonstrations

stp2y, January 24, 2025

[Submitted on 22 Jan 2025 (v1), last revised 23 Jan 2025 (this version, v2)]

Authors: Myunsoo Kim and 4 other authors

Abstract: Intelligent agents are able to make decisions based on different levels of granularity and duration. Recent advances in skill learning enabled the agent to solve complex, long-horizon tasks by effectively guiding the agent in choosing appropriate skills. However, the practice of using fixed-length skills can easily result in skipping valuable decision points, which ultimately limits the potential for further exploration and faster policy learning. In this work, we propose to learn a simple and effective termination condition that identifies decision points through a state-action novelty module that leverages agent experience data. Our approach, Novelty-based Decision Point Identification (NBDI), outperforms previous baselines in complex, long-horizon tasks, and remains effective even in the presence of significant variations in the environment configurations of downstream tasks, highlighting the importance of decision point identification in skill learning.
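To make the idea of a novelty-based termination condition concrete, here is a minimal sketch of one plausible reading of the abstract. It is not the authors' implementation: the abstract does not say how the novelty module is built, so the count-based `CountNovelty` class, the `should_terminate` and `segment` helpers, the `threshold` and `max_len` parameters, and the choice to end a skill on the novel step itself are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


class CountNovelty:
    """Hypothetical stand-in for a state-action novelty module.

    Rarely seen (state, action) pairs score high; frequent ones score low.
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, state, action):
        # Record one visit to this (state, action) pair from experience data.
        self.counts[(state, action)] += 1

    def score(self, state, action):
        # 1 / sqrt(n + 1): exactly 1.0 for an unseen pair, decaying with visits.
        return 1.0 / sqrt(self.counts[(state, action)] + 1)


def should_terminate(novelty, state, action, step, threshold=0.5, max_len=10):
    # Assumed rule: end the skill at a novel pair, or when it gets too long.
    return step >= max_len or novelty.score(state, action) > threshold


def segment(trajectory, novelty, threshold=0.5, max_len=10):
    """Split a list of (state, action) pairs into variable-length skills."""
    skills, current = [], []
    for state, action in trajectory:
        current.append((state, action))
        if should_terminate(novelty, state, action, len(current), threshold, max_len):
            skills.append(current)
            current = []
    if current:
        # Keep any trailing steps that never hit a termination condition.
        skills.append(current)
    return skills


# Toy experience data: two pairs are common, one pair is never seen.
novelty = CountNovelty()
for _ in range(8):
    novelty.update("s0", "a")
    novelty.update("s1", "a")

trajectory = [("s0", "a"), ("s1", "a"), ("s2", "b"), ("s0", "a")]
skills = segment(trajectory, novelty)
print([len(s) for s in skills])
```

In this toy run the unseen pair `("s2", "b")` is treated as a decision point, so the first skill ends there instead of at a fixed length. A fixed-length scheme would correspond to dropping the novelty check and keeping only the `max_len` test.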

Submission history

From: Myunsoo Kim
[v1] Wed, 22 Jan 2025 06:08:15 UTC (3,970 KB)
[v2] Thu, 23 Jan 2025 04:14:02 UTC (3,968 KB)

