arXiv:2404.13150v1
Abstract: Traditional search algorithms struggle in games of imperfect information, where the number of possible underlying states and trajectories is very large. This challenge is particularly evident in trick-taking card games. While state sampling techniques such as Perfect Information Monte Carlo (PIMC) search have shown success in these contexts, they still have major limitations.
We present Generative Observation Monte Carlo Tree Search (GO-MCTS), which applies MCTS to observation sequences generated by a game-specific model. This method performs the search within the observation space and advances the search using a model that depends solely on the agent’s observations. Additionally, we show that transformers are well-suited as the generative model in this context, and we describe a process for iteratively training the transformer via population-based self-play.
The efficacy of GO-MCTS is demonstrated in various games of imperfect information, such as Hearts, Skat, and “The Crew: The Quest for Planet Nine,” with promising results.
Expert Commentary: Overcoming Limitations in Search Algorithms for Games of Imperfect Information
Traditional search algorithms have long been used to solve complex problems in various domains, including the field of game AI. However, when it comes to games of imperfect information, where the number of possible states and trajectories is extremely large, these algorithms face significant challenges. One particular domain where this is evident is trick-taking card games.
In a recent study, the authors propose a novel approach called Generative Observation Monte Carlo Tree Search (GO-MCTS) to address the limitations of existing search algorithms in games of imperfect information. The key idea behind GO-MCTS is to employ a game-specific generative model to generate observation sequences, which are then used for MCTS-based search.
By performing the search within the observation space, GO-MCTS enables the algorithm to make decisions based solely on the agent’s observations, without relying on full knowledge of the underlying game state. This is a significant advantage as it eliminates the need for the algorithm to reason about unobserved information, which is inherently challenging in games of imperfect information.
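To make the idea concrete, here is a minimal sketch of MCTS run over observation sequences. The paper's trained transformer is replaced by a toy stand-in model, and all names (`ToyObservationModel`, `go_mcts_action`) and the toy game itself are illustrative assumptions, not the authors' implementation: the point is only that the search touches nothing but the agent's own observation sequence, which the model extends during rollouts.

```python
import math
import random

# Toy stand-in for the learned observation model. A real GO-MCTS agent would
# query a trained transformer here; this toy "game" is three agent picks
# interleaved with three model-sampled observations, and the payoff is the
# sum of the agent's own picks (so action 2 is best at the root).
class ToyObservationModel:
    HORIZON = 6  # total observation-sequence length at a terminal state

    def legal_actions(self, obs):
        return [0, 1, 2]

    def sample_opponent_token(self, obs, rng):
        # In GO-MCTS this would be a draw from the generative model's
        # next-token distribution; here it is uniform noise.
        return rng.choice([0, 1, 2])

    def is_terminal(self, obs):
        return len(obs) >= self.HORIZON

    def payoff(self, obs):
        # The agent's own tokens sit at even indices in this toy encoding.
        return sum(obs[i] for i in range(0, len(obs), 2))


def go_mcts_action(model, obs, iterations=2000, c=1.4, seed=0):
    """One-ply MCTS over observation sequences: extend the sequence with a
    candidate action, let the model generate everything the agent would
    subsequently observe, back up the payoff, pick the best-mean action."""
    rng = random.Random(seed)
    actions = model.legal_actions(obs)
    visits = {a: 0 for a in actions}
    value = {a: 0.0 for a in actions}

    for t in range(1, iterations + 1):
        # UCB1 selection at the root; unvisited actions are tried first.
        a = max(actions, key=lambda a: float("inf") if visits[a] == 0
                else value[a] / visits[a] + c * math.sqrt(math.log(t) / visits[a]))
        # Rollout entirely in observation space: no hidden state is sampled.
        seq = obs + (a,)
        while not model.is_terminal(seq):
            seq = seq + (model.sample_opponent_token(seq, rng),)
            if not model.is_terminal(seq):
                seq = seq + (rng.choice(model.legal_actions(seq)),)
        visits[a] += 1
        value[a] += model.payoff(seq)

    return max(actions, key=lambda a: value[a] / max(visits[a], 1))
```

Note the contrast with PIMC: nothing in the rollout ever instantiates a full hidden game state (opponents' hands); the generative model alone carries the search forward.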
An interesting aspect highlighted in the study is the use of transformers as the generative model in the GO-MCTS framework. Transformers have gained prominence in various domains, including natural language processing and computer vision, for their ability to effectively model dependencies within sequences. In the context of generative models for game AI, transformers prove well-suited due to their capability to capture complex relationships among observations.
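The fit is natural because an observation history can be tokenized like a sentence and extended autoregressively. The sketch below shows that interface under illustrative assumptions: the vocabulary for a trick-taking game and the `logits_fn` stand-in for a transformer's forward pass are hypothetical, not taken from the paper.

```python
import math
import random

# Hypothetical tokenization for a trick-taking game: each observation event
# (deal marker, card played, trick boundary) becomes one token from a small
# vocabulary of 54 entries (2 markers + 52 cards).
VOCAB = (["DEAL", "TRICK_END"] +
         [f"{r}{s}" for s in "CDHS" for r in "23456789TJQKA"])
TOK = {t: i for i, t in enumerate(VOCAB)}

def softmax_sample(logits, rng, temperature=1.0):
    # Standard temperature sampling over next-token logits.
    z = [l / temperature for l in logits]
    m = max(z)
    ps = [math.exp(l - m) for l in z]
    s = sum(ps)
    r, acc = rng.random() * s, 0.0
    for i, p in enumerate(ps):
        acc += p
        if acc >= r:
            return i
    return len(ps) - 1

def generate_observations(logits_fn, prefix, n_tokens, rng):
    """Autoregressively extend an observation sequence, the way GO-MCTS
    would query its transformer during a rollout. `logits_fn` stands in
    for the network's forward pass (one logit per vocabulary entry)."""
    seq = list(prefix)
    for _ in range(n_tokens):
        seq.append(softmax_sample(logits_fn(seq), rng))
    return seq
```

The same sampling loop serves both roles the search needs: proposing the next observation during rollouts and, at the agent's own turns, scoring candidate actions by their next-token probabilities.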
Furthermore, the study presents an iterative training process for the transformer model, leveraging population-based self-play. This approach allows the model to progressively improve its ability to generate realistic observation sequences, consequently enhancing the overall performance of the GO-MCTS algorithm.
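Schematically, population-based self-play interleaves data generation between population members with retraining. The sketch below is a deliberately simplified illustration of that loop, not the paper's procedure: the transformer is replaced by a single scalar "policy" whose higher values win more toy games, and `play_game`, `train_on_games`, and all step sizes are invented for the example.

```python
import random

def play_game(p1, p2, rng):
    # Noisy comparison standing in for a full self-play game;
    # returns 1 if p1 wins, 0 otherwise.
    return 1 if p1 + rng.gauss(0, 0.5) > p2 + rng.gauss(0, 0.5) else 0

def train_on_games(policy, opponents, rng, rounds=200, step=0.05):
    # Stand-in for training the generative model on self-play games:
    # the policy is nudged upward whenever it loses, mimicking learning
    # from games against stronger population members.
    for _ in range(rounds):
        opp = rng.choice(opponents)
        if play_game(policy, opp, rng) == 0:
            policy += step
    return policy

def population_self_play(generations=5, pop_size=4, seed=0):
    """Iterate: every member plays games sampled against the whole
    population, retrains on the results, and the improved members
    become the next generation's population."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        population = [train_on_games(p, population, rng) for p in population]
    return population
```

The design point the loop illustrates: because every member trains on games against the current population rather than a single fixed opponent, the generated observation sequences keep tracking the improving level of play, which is what lets the generative model improve iteratively.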
The efficacy of the proposed GO-MCTS method is demonstrated through experiments on several games of imperfect information, including Hearts, Skat, and “The Crew: The Quest for Planet Nine.” The results show promising outcomes, indicating the potential of this approach to overcome the limitations of traditional search algorithms in such game domains.
Overall, the multi-disciplinary nature of this research is evident, bridging concepts from game AI, generative modeling, and population-based training methods. The utilization of transformers as generative models for game AI introduces a fascinating intersection between deep learning and game theory. Going forward, it would be interesting to explore the application of the GO-MCTS framework in other domains and analyze its performance compared to alternative approaches in the field of games of imperfect information.