While AlphaZero-style reinforcement learning (RL) algorithms excel in various
board games, in this paper we show that they face challenges on impartial games
where players share pieces. We present a concrete example, the children's game
of Nim, together with other impartial games that seem to be a stumbling block
for AlphaZero-style and similar self-play reinforcement learning algorithms.
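
For readers unfamiliar with Nim: players alternately remove any number of objects from a single pile, and under the normal-play convention the player who takes the last object wins. The game has a classical closed-form solution (Bouton's theorem): a position is lost for the player to move exactly when the XOR of the pile sizes, the nim-sum, is zero. The sketch below makes this concrete; the pile sizes are illustrative, not taken from the paper.

```python
# Minimal sketch of Nim and its known optimal strategy (Bouton's theorem).
# Pile sizes are illustrative.
from functools import reduce
from operator import xor

def nim_sum(piles):
    """XOR of all pile sizes; zero means the player to move loses with perfect play."""
    return reduce(xor, piles, 0)

def optimal_move(piles):
    """Return (pile_index, new_size) that restores a zero nim-sum, or None if already lost."""
    s = nim_sum(piles)
    if s == 0:
        return None  # every legal move hands the opponent a winning position
    for i, p in enumerate(piles):
        target = p ^ s  # the size this pile should be reduced to
        if target < p:
            return i, target
    return None

piles = [3, 5, 7]
print(nim_sum(piles))       # 1 -> the player to move can force a win
print(optimal_move(piles))  # (0, 2): shrink the first pile from 3 to 2
```

A rule-based player like this is trivial to write; the difficulty discussed in the paper is whether a neural-network evaluator trained by self-play can learn to approximate the same function.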

Our work builds on the challenges that the intricacies of data distribution
pose for the ability of neural networks to learn parity functions, a difficulty
exacerbated by noisy labels. Our findings are consistent with recent studies
showing that AlphaZero-style algorithms are vulnerable to adversarial attacks
and adversarial perturbations, underscoring the difficulty of learning to
master these games in all legal states.
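
As a generic illustration of the parity-learning problem referenced above, the sketch below builds a toy supervised dataset whose label is the parity (XOR) of the input bits, optionally corrupted by label noise. The number of bits and the noise rate are assumptions chosen for the example, not the paper's experimental settings.

```python
# Toy parity-learning target with noisy labels (illustrative parameters only).
import numpy as np

def parity_dataset(num_samples, n_bits, noise=0.0, seed=0):
    """Random bit vectors labelled by their parity, with each label
    flipped independently with probability `noise`."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(num_samples, n_bits))
    y = X.sum(axis=1) % 2                    # parity of all input bits
    flips = rng.random(num_samples) < noise  # noisy-label corruption
    y = np.where(flips, 1 - y, y)
    return X, y

X, y = parity_dataset(num_samples=10_000, n_bits=20, noise=0.1)
# No single bit (or strict subset of bits) correlates with the label,
# so a learner gets no signal from partially correct features.
print(round(float(np.corrcoef(X[:, 0], y)[0, 1]), 4))  # ~0
```

The lack of correlation between any proper subset of the inputs and the label is exactly the property that, in the paper's argument, separates impartial games from partisan ones.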

We show that Nim can be learned on small boards, but the learning progress of
AlphaZero-style algorithms slows down dramatically as the board size increases.
Intuitively, the difference between impartial games like Nim and partisan games
like Chess and Go can be explained as follows: in impartial games, covering
even a small part of the board typically makes it impossible to predict whether
the position is won or lost, because there is often zero correlation between
the visible part of a partly blanked-out position and its correct evaluation.
This starkly contrasts with partisan games, where a partly blanked-out board
position typically provides abundant, or at least non-trivial, information
about the value of the fully uncovered position.
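
A small worked example, building on the nim-sum sketch above, shows why covering even a single pile destroys this correlation: whichever pile sizes remain visible, exactly one hidden size makes the position lost and every other size makes it won, so the visible part alone tells an evaluator almost nothing. The specific sizes and the bound on the hidden pile are illustrative.

```python
# Hide one pile of a Nim position: its win/loss value becomes unpredictable
# from the visible piles alone. Sizes and the range bound are illustrative.
from functools import reduce
from operator import xor

visible = [4, 6]                 # the piles we can see
s = reduce(xor, visible, 0)      # nim-sum of the visible part

for hidden in range(8):          # candidate sizes for the covered pile
    total = s ^ hidden           # nim-sum of the full position
    value = "lost" if total == 0 else "won"
    print(f"hidden pile = {hidden}: {value} for the player to move")
# Only hidden == s (here 2) yields a lost position; every other size yields a
# won one, regardless of what the visible piles happen to be.
```

By contrast, a masked corner of a Chess or Go board usually leaves strong signals such as material balance intact, which is the contrast drawn above.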

The article highlights the challenges faced by AlphaZero-style reinforcement learning algorithms when applied to impartial games, such as the children’s game of Nim. These algorithms, which have shown exceptional performance in board games like Chess and Go, struggle to master impartial games where players share pieces. The authors draw attention to the multi-disciplinary nature of the concepts discussed, particularly focusing on data distribution, neural networks, and adversarial attacks.

One crucial aspect explored in the paper is the difficulty of learning parity functions, especially in the presence of noisy labels. This issue is particularly acute in impartial games, where the value of a position is itself a parity-like function of the whole board (for Nim, the XOR of the pile sizes), making it harder for reinforcement learning algorithms to excel. The authors' findings align with recent studies indicating that AlphaZero-style algorithms are vulnerable to adversarial attacks and perturbations. This vulnerability underscores the challenge of achieving mastery in all legal game states through self-play reinforcement learning.

While the paper acknowledges that AlphaZero-style algorithms can learn Nim on small boards, their progress significantly slows down as the board size increases. The key difference between impartial and partisan games like Chess and Go lies in the predictive power of partly blanked-out positions. In partisan games, a partially revealed position often provides valuable information about the value of the fully uncovered position. In contrast, impartial games like Nim offer little or no correlation between the visible part of a partly blanked-out position and its true evaluation, rendering prediction challenging.

This analysis sheds light on the particular challenges faced by reinforcement learning algorithms when applied to impartial games and further emphasizes the necessity of a multi-disciplinary approach to tackle these issues. Future research could focus on developing new algorithms that effectively navigate the complexities inherent in impartial games, potentially incorporating techniques from fields such as game theory or probabilistic modeling.
