Markov flow policy — deep MC



This paper has been withdrawn by Nitsan Soffair


Abstract: Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impair their effectiveness even on simple, short-term tasks and impose an undesired temporal discount (γ). Interestingly, these algorithms are often tested without applying a discount, a phenomenon we refer to as the train-test bias. In response to these challenges, we propose the Markov Flow Policy (MFP), which utilizes a non-negative neural network flow to enable comprehensive forward-view predictions. Through integration into the TD7 codebase and evaluation on the MuJoCo benchmark, we observe significant performance improvements, positioning MFP as a straightforward, practical, and easily implementable solution within the domain of average-reward algorithms.
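For readers less familiar with the terminology, the train-test bias can be stated in terms of two standard objectives. The definitions below are textbook reinforcement-learning quantities added here as a reading aid; they are not formulas taken from the paper itself.

```latex
% Standard definitions only (not from the paper): training vs. evaluation objectives.
% Discounted return optimized during training, with discount factor 0 < \gamma < 1:
G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}
% Undiscounted, average-reward criterion under which the policy is typically evaluated:
\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{T-1} r_t\right]
```

Optimizing G_t while reporting ρ^π is the mismatch the abstract calls the train-test bias; MFP is positioned among average-reward methods, i.e., it targets the second criterion rather than the discounted one.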

Submission history

From: Nitsan Soffair
[v1] Wed, 1 May 2024 21:42:38 UTC (270 KB)
[v2] Sun, 2 Jun 2024 19:41:04 UTC (1 KB) (withdrawn)
[v3] Fri, 30 Aug 2024 10:02:22 UTC (1 KB) (withdrawn)


