arXiv:2501.13200v1 Abstract: Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need for explicit prediction of the agents’ behavior to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor and on a POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to longer corridors than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.
The article “Shared Recurrent Memory Transformer for Multi-Agent Reinforcement Learning” addresses the challenge of achieving cooperation in multi-agent reinforcement learning (MARL) systems. MARL has made great progress on cooperative and competitive problems, but one of the main obstacles is the need to explicitly predict other agents’ behavior to achieve cooperation. To overcome this, the authors propose the Shared Recurrent Memory Transformer (SRMT), which extends memory transformers so that agents can exchange information and coordinate their actions implicitly. SRMT is evaluated on a Partially Observable Multi-Agent Pathfinding problem and on the POGEMA benchmark set of tasks, outperforming reinforcement learning baselines on the former and delivering competitive results across varied map scenarios on the latter. Incorporating shared recurrent memory into transformer-based architectures enhances coordination in decentralized multi-agent systems. The source code for training and evaluation is provided on GitHub.

Enhancing Coordination in Multi-Agent Systems with Shared Recurrent Memory Transformer

Multi-agent reinforcement learning (MARL) has made significant strides in solving complex cooperative and competitive tasks in various environments. However, a key challenge in MARL is the need to explicitly predict other agents’ behavior in order to cooperate efficiently. To address this issue, the authors propose the Shared Recurrent Memory Transformer (SRMT). By extending memory transformers to multi-agent settings, SRMT enables agents to implicitly exchange information and coordinate their actions.

Challenges in Multi-Agent Reinforcement Learning

Coordinating the actions of multiple agents in a decentralized environment poses several challenges. Traditional MARL approaches typically require predicting the behavior of other agents explicitly, which can be computationally intensive and restrict the scalability of the system. Moreover, effectively coordinating actions becomes particularly difficult when agents have limited visibility of their environment and receive sparse rewards.

To overcome these challenges, the SRMT framework combines memory transformers with a shared recurrent memory. By pooling and globally broadcasting individual working memories, agents can exchange information implicitly, without explicitly predicting one another’s behavior. This implicit information exchange improves coordination in decentralized multi-agent systems.
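To make the memory-sharing step concrete, here is a minimal PyTorch sketch of one such core. This is an illustration of the general idea, not the authors' architecture (see their GitHub repository for that): the observation encoder, the GRU-based memory update, and all dimensions are assumptions chosen for brevity.

```python
# Minimal sketch of shared recurrent memory (illustrative assumptions throughout;
# the authors' implementation lives at https://github.com/Aloriosa/srmt).
import torch
import torch.nn as nn

class SharedMemoryCore(nn.Module):
    """Each agent keeps a recurrent memory vector; all vectors are pooled into
    a shared set that every agent cross-attends over before acting."""

    def __init__(self, obs_dim: int = 16, d_model: int = 64, n_actions: int = 5):
        super().__init__()
        self.obs_enc = nn.Linear(obs_dim, d_model)             # toy observation encoder
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.mem_update = nn.GRUCell(d_model, d_model)         # recurrent memory update
        self.policy_head = nn.Linear(d_model, n_actions)

    def forward(self, obs, memory):
        # obs:    (n_agents, obs_dim)  local observation per agent
        # memory: (n_agents, d_model)  working memory per agent
        n = obs.shape[0]
        h = self.obs_enc(obs)                                  # (n, d)
        pool = memory.unsqueeze(0).expand(n, -1, -1)           # broadcast the shared pool
        ctx, _ = self.cross_attn(h.unsqueeze(1), pool, pool)   # each agent reads all memories
        new_memory = self.mem_update(ctx.squeeze(1), memory)   # recurrent update
        return self.policy_head(new_memory), new_memory

core = SharedMemoryCore()
mem = torch.zeros(3, 64)                     # 3 agents, zero-initialized memories
logits, mem = core(torch.randn(3, 16), mem)  # mem is carried to the next timestep
```

Because every agent reads the same pool, information written by one agent becomes visible to all others at the next step: an implicit communication channel that requires no explicit messages or behavior prediction.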

Evaluation and Performance

The authors evaluate the effectiveness of the SRMT framework in two settings: the Partially Observable Multi-Agent Pathfinding problem and the POGEMA benchmark set of tasks. The first is a toy Bottleneck navigation task in which agents must pass through a narrow corridor. Here, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to corridors longer than those seen during training.
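For intuition about the task layout, the sketch below builds a Bottleneck-style occupancy grid: two open rooms joined by a one-cell-wide corridor. This is our reconstruction for illustration only; the room and corridor sizes are arbitrary, and increasing the `corridor` parameter mirrors the generalization test to longer corridors than those seen in training.

```python
import numpy as np

def bottleneck_map(room: int = 4, corridor: int = 6) -> np.ndarray:
    """Occupancy grid for a toy Bottleneck layout: 1 = wall, 0 = free cell."""
    h, w = room, 2 * room + corridor
    grid = np.ones((h + 2, w + 2), dtype=np.int8)       # start from solid walls
    grid[1:h + 1, 1:room + 1] = 0                       # carve the left room
    grid[1:h + 1, room + corridor + 1:w + 1] = 0        # carve the right room
    grid[h // 2 + 1, room + 1:room + corridor + 1] = 0  # one-cell-wide corridor
    return grid

print(bottleneck_map())          # longer corridors: bottleneck_map(corridor=12)
```

Since the corridor is only one cell wide, agents approaching from opposite sides cannot pass each other and must implicitly agree on who goes first, which is exactly the kind of coordination the shared memory is meant to support.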

When evaluated on the POGEMA maps, including Mazes, Random, and MovingAI, SRMT performs competitively with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into transformer-based architectures offers a promising avenue for improving coordination in multi-agent systems.

Conclusion

The Shared Recurrent Memory Transformer (SRMT) presents a novel approach to the coordination challenges in multi-agent systems. By enabling agents to implicitly exchange information and coordinate their actions, SRMT outperforms reinforcement learning baselines on the narrow-corridor Bottleneck task and matches recent MARL, hybrid, and planning-based algorithms on a diverse benchmark set. The results highlight the potential of shared recurrent memory in transformer-based architectures to enhance coordination and scalability in decentralized multi-agent environments.

For more information and access to the source code for training and evaluation, visit the project’s GitHub repository: https://github.com/Aloriosa/srmt.

The paper titled “Shared Recurrent Memory Transformer for Multi-Agent Reinforcement Learning” introduces a novel approach to address the challenge of achieving cooperation in multi-agent reinforcement learning (MARL) settings. The authors propose the Shared Recurrent Memory Transformer (SRMT), which extends memory transformers to enable agents to exchange information implicitly and coordinate their actions.

Cooperation is a fundamental aspect of MARL, as agents must coordinate their behaviors to achieve good outcomes. Traditional approaches rely on explicitly predicting other agents’ behavior, which can be computationally expensive and limits scalability. SRMT aims to overcome this limitation by pooling and globally broadcasting individual working memories, allowing agents to share information without explicit predictions.

To evaluate the effectiveness of SRMT, the authors conducted experiments on two different tasks. The first task is the Partially Observable Multi-Agent Pathfinding problem, specifically focusing on a toy Bottleneck navigation task. In this task, agents need to navigate through a narrow corridor. The results show that SRMT consistently outperforms various other reinforcement learning baselines, especially when rewards are sparse. Additionally, SRMT demonstrates effective generalization to longer corridors not seen during training.

The second task involves evaluating SRMT on a benchmark set of tasks known as POGEMA maps. These maps include different scenarios such as Mazes, Random, and MovingAI. The results indicate that SRMT performs competitively with recent MARL, hybrid, and planning-based algorithms on these tasks.
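For readers who want to try the benchmark themselves, POGEMA is distributed as a Python package (`pip install pogema`). The sketch below follows its gymnasium-style quick-start interface; the configuration fields and the `sample_actions` helper are our best recollection of that API and should be checked against the POGEMA documentation.

```python
# Hedged sketch: random-action rollout on a POGEMA grid (verify names against
# the POGEMA docs; a trained policy such as SRMT would replace sample_actions()).
from pogema import pogema_v0, GridConfig

env = pogema_v0(grid_config=GridConfig(
    num_agents=8,     # decentralized agents acting simultaneously
    size=16,          # side length of the square grid
    density=0.3,      # fraction of cells filled with obstacles
    obs_radius=5,     # each agent sees only a local window (partial observability)
    seed=42,
))
obs, info = env.reset()
terminated = truncated = [False]
while not (all(terminated) or all(truncated)):
    obs, reward, terminated, truncated, info = env.step(env.sample_actions())
```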

Overall, the findings of this paper suggest that incorporating shared recurrent memory into transformer-based architectures can enhance coordination in decentralized multi-agent systems. SRMT offers a promising approach to the challenge of achieving cooperation in MARL, showing improved performance and generalization.

It is worth noting that the availability of the source code for training and evaluation on GitHub is a valuable contribution to the research community. This allows researchers and practitioners to replicate the experiments and further build upon the proposed approach. Future work in this area could involve applying SRMT to more complex and realistic multi-agent scenarios, as well as exploring potential optimizations or variations of the SRMT architecture.