Pessimistic Backward Policy for GFlowNets, by Hyosoon Jang and 4 other authors
Abstract: This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through trajectories of state transitions. In this work, we observe that GFlowNets tend to under-exploit high-reward objects when trained on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward value. In response to this challenge, we propose a pessimistic backward policy for GFlowNets (PBP-GFN), which maximizes the observed flow to align it closely with the true reward for the object. We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In particular, PBP-GFN enhances the discovery of high-reward objects, maintains the diversity of the generated objects, and consistently outperforms existing methods.
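The idea sketched in the abstract can be illustrated with a minimal PyTorch snippet. This is not the paper's implementation: the function names (trajectory_balance_loss, pessimistic_backward_loss) and the specific choice of directly maximizing the backward log-probability of observed trajectories are expository assumptions, shown only to make the general GFlowNet training setup and a "pessimistic" backward objective concrete.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf_traj, log_pb_traj, log_reward):
    # Standard GFlowNet trajectory-balance objective for one trajectory:
    # (log Z + sum_t log P_F(s_{t+1}|s_t) - log R(x) - sum_t log P_B(s_t|s_{t+1}))^2
    return (log_Z + log_pf_traj - log_reward - log_pb_traj) ** 2

def pessimistic_backward_loss(log_pb_traj):
    # Illustrative "pessimistic" backward-policy objective: maximize the
    # backward log-probability of the observed trajectories, so the flow
    # into each terminal object concentrates on trajectories actually
    # visited during training (a simplification, not the paper's exact
    # formulation).
    return -log_pb_traj.mean()

# Dummy usage with scalar log-values for a single trajectory.
log_Z = torch.tensor(0.0, requires_grad=True)
log_pf = torch.tensor(-3.2)
log_pb = torch.tensor(-2.7, requires_grad=True)
log_r = torch.tensor(1.5)
loss = trajectory_balance_loss(log_Z, log_pf, log_pb, log_r) \
     + pessimistic_backward_loss(log_pb.unsqueeze(0))
loss.backward()
```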
Submission history
From: Hyosoon Jang
[v1] Sat, 25 May 2024 02:30:46 UTC (2,350 KB)
[v2] Wed, 16 Oct 2024 15:57:03 UTC (2,942 KB)
[v3] Tue, 29 Oct 2024 03:11:17 UTC (2,942 KB)