01
Nov
“Figure 4: Two options to win the game in 3 or 5 moves, respectively (more options exist). Since they both map into the highest-value bin our bot ignores Nh6+, the fastest way to win (in 3), and instead plays Nd6+ (matein-5). Unfortunately, a state-based predictor without explicit search cannot guarantee that it will continue playing the Nd6+ strategy and thus might randomly alternate between different strategies. Overall this increases the risk of drawing the game or losing due to a subsequent (low-probability) mistake, such as a bad softmax sample. Board from a game between our 9M Transformer (white) and a…