arXiv:2402.14083v1 Announce Type: new
Abstract: While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard $A^*$ search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of $A^*$. This model is then fine-tuned via expert iterations to perform fewer search steps than $A^*$ search while still generating an optimal plan. In our training method, $A^*$'s search dynamics are expressed as a token sequence outlining when task states are added to and removed from the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly, with a 5-10$\times$ smaller model size and a 10$\times$ smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban with improved percentage of solved tasks and shortened search dynamics.
Transformers in Complex Decision Making Tasks
In recent years, Transformers have gained popularity and achieved remarkable success in various application settings. However, when it comes to complex decision-making tasks, traditional symbolic planners still outperform Transformer architectures. This article introduces a novel approach to training Transformers for solving complex planning tasks, demonstrating the potential for these architectures to bridge the gap and excel in this domain.
Introducing Searchformer
The authors present Searchformer, a Transformer model specifically designed to solve previously unseen Sokoban puzzles. Impressively, Searchformer achieves optimal solutions 93.7% of the time while employing up to 26.8% fewer search steps than the standard $A^*$ search algorithm.
To achieve this, Searchformer is constructed as an encoder-decoder Transformer model that is first trained to predict the search dynamics of $A^*$, a widely used symbolic planning algorithm. This pre-training phase lets Searchformer internalize the underlying search process. The model is then fine-tuned through expert iteration, with the goal of generating optimal plans while requiring fewer search steps than $A^*$ itself.
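The paper describes this fine-tuning only at a high level, so the loop below is a minimal sketch of how such expert iteration could look: sample several search traces per task, keep only the shorter ones that still end in an optimal plan, and fine-tune on them. The helpers `model.sample_trace`, `is_optimal_plan`, and `fine_tune` are hypothetical placeholders, not the authors' actual API.

```python
from typing import List, Tuple


def collect_improved_traces(model, tasks, n_samples: int = 8) -> List[Tuple[list, list]]:
    """For each task, sample several search traces from the current model and
    keep the shortest one that still ends in an optimal plan."""
    new_data = []
    for task in tasks:
        samples = [model.sample_trace(task) for _ in range(n_samples)]
        # Keep only traces whose final plan is verified optimal for the task
        # (e.g. by comparing against an A* reference solution).
        # `is_optimal_plan` is a placeholder for that check.
        valid = [s for s in samples if is_optimal_plan(task, s.plan)]
        if not valid:
            continue
        # Use the shortest valid trace as the new training target, so the
        # fine-tuned model learns to search with fewer steps.
        best = min(valid, key=lambda s: len(s.tokens))
        new_data.append((task.prompt_tokens, best.tokens))
    return new_data


def expert_iteration(model, tasks, n_rounds: int = 3):
    """Alternate between sampling improved traces and fine-tuning on them."""
    for _ in range(n_rounds):
        dataset = collect_improved_traces(model, tasks)
        model = fine_tune(model, dataset)  # placeholder: standard next-token objective
    return model
```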
The Training Method
The training method expresses $A^*$'s search dynamics as a token sequence that records when task states are added to and removed from the search tree during symbolic planning. Trained on these traces, Searchformer learns to predict an optimal plan with fewer search steps. Ablation studies on maze navigation show that Searchformer significantly outperforms baselines that directly predict the optimal plan, while using a 5-10$\times$ smaller model and a 10$\times$ smaller training dataset.
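To make the trace representation concrete, the sketch below instruments a textbook A* implementation on a small grid maze so that it emits a token every time a state is added to ("create") or expanded from ("close") the frontier, together with its cost and heuristic value. The specific token vocabulary and grid encoding here are illustrative assumptions; the paper defines its own sequence format.

```python
import heapq


def astar_with_trace(grid, start, goal):
    """Run A* on a 2D grid (0 = free, 1 = wall) and return (plan, trace),
    where `trace` records every node added to ("create") or expanded
    from ("close") the search frontier."""
    def h(p):  # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    trace = ["create", str(start), "c0", f"h{h(start)}"]
    frontier = [(h(start), 0, start, [start])]
    closed = set()

    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in closed:
            continue
        closed.add(node)
        trace += ["close", str(node), f"c{g}", f"h{h(node)}"]
        if node == goal:
            return path, trace
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in closed):
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
                trace += ["create", str(nxt), f"c{g + 1}", f"h{h(nxt)}"]
    return None, trace


# Tiny example maze: the resulting token trace is what a model like
# Searchformer would be trained to predict alongside the final plan.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
plan, trace = astar_with_trace(grid, (0, 0), (2, 2))
print("plan:", plan)
print("trace tokens:", trace[:12], "...")
```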
Multi-disciplinary Nature
This research showcases the multi-disciplinary nature of the concepts involved. By combining ideas from natural language processing and symbolic planning, the authors have created a Transformer architecture that excels in complex decision-making tasks. This highlights the importance of integrating knowledge from different domains to push the boundaries of what Transformers can achieve.
Scaling to Larger Tasks
Another notable aspect of Searchformer is its ability to scale to larger and more complex decision-making tasks like Sokoban. The model solves a higher percentage of tasks with shorter search dynamics, further underscoring the potential of Transformers in this domain. Its capability to handle larger problems opens up avenues for applying Transformer-based approaches to a wider range of complex planning applications.