arXiv:2403.18056v1
Abstract: Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL’s key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.

Analysis of Hierarchical Cooperation Graph Learning (HCGL) for Multi-Agent Reinforcement Learning

In recent years, Multi-Agent Reinforcement Learning (MARL) has emerged as an effective approach to cooperative challenges. However, traditional non-hierarchical MARL algorithms struggle with complex multi-agent problems that require hierarchical cooperative behaviors, and the cooperative knowledge they learn remains implicit and hard to interpret. The paper introduces a novel hierarchical MARL model, Hierarchical Cooperation Graph Learning (HCGL), to tackle these challenges.

HCGL: A Three-Component Model

HCGL consists of three key components:

  1. Extensible Cooperation Graph (ECG): The ECG is the foundation of HCGL: a dynamic graph that achieves self-clustering cooperation. It is structured as a three-layer graph comprising agent nodes, cluster nodes, and target nodes, and this hierarchical representation allows fundamental cooperative knowledge to be integrated directly into the graph (see the sketch after this list).
  2. Graph Operators: A group of four trained graph operators adjusts the topology of the ECG, dynamically rewiring its edge connections in response to changing environmental conditions.
  3. MARL Optimizer: The MARL optimizer trains these graph operators. Because the operators control the ECG, agent behavior in HCGL is guided by the graph’s topology rather than by policy neural networks.
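
To make the ECG concrete, below is a minimal Python sketch of the three-layer structure as a plain adjacency mapping. The class name, fields, and the assumption that each agent holds exactly one edge to a cluster (and each cluster exactly one edge to a target) are ours for illustration; the paper’s actual data structures may differ.

```python
# A minimal sketch of the three-layer Extensible Cooperation Graph (ECG).
# All names here are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class ECG:
    n_agents: int     # size of the agent node layer
    n_clusters: int   # size of the cluster node layer
    n_targets: int    # size of the target node layer
    # Assumed topology: each agent attaches to exactly one cluster,
    # and each cluster to exactly one target.
    agent_to_cluster: list[int] = field(default_factory=list)
    cluster_to_target: list[int] = field(default_factory=list)

    def __post_init__(self):
        # Default wiring: agents spread round-robin over clusters,
        # and every cluster starts on target 0.
        if not self.agent_to_cluster:
            self.agent_to_cluster = [i % self.n_clusters for i in range(self.n_agents)]
        if not self.cluster_to_target:
            self.cluster_to_target = [0] * self.n_clusters

    def agent_target(self, agent: int) -> int:
        """Read an agent's behavior off the topology: follow its cluster
        edge, then that cluster's target edge."""
        return self.cluster_to_target[self.agent_to_cluster[agent]]
```

The point of the sketch is that no policy network appears in the decision path: once the edges are set, every agent’s behavior is fully determined by graph lookups.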

Key Advantages of HCGL over Traditional MARL Models

One distinguishing feature of HCGL is that the topology of the ECG, rather than a policy neural network, guides agent behavior. The hierarchy of the graph merges primitive actions (executed by agents) and cooperative actions (executed by clusters) into a unified action space, turning fundamental cooperative knowledge into an extensible interface that can carry over to new scenarios.
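
Continuing the sketch above, one plausible way to realize this unified action space is to let the target layer hold both primitive and cooperative actions, and to let a graph operator act by rewiring a single edge. The target names, the (layer, src, dst) action encoding, and the apply_operator helper are illustrative assumptions, not the paper’s API.

```python
# Hypothetical target layer: the unified action space mixes primitive
# actions (executed by agents) with cooperative actions (executed by
# clusters). The names below are invented for illustration.
PRIMITIVE_TARGETS = ["move_north", "move_south", "attack_nearest"]
COOPERATIVE_TARGETS = ["encircle", "escort", "hold_formation"]
UNIFIED_TARGETS = PRIMITIVE_TARGETS + COOPERATIVE_TARGETS


def apply_operator(ecg: ECG, op_action: tuple) -> ECG:
    """One operator step rewires a single ECG edge.

    op_action = (layer, src, dst):
      layer == "agent":   re-attach agent `src` to cluster `dst`
      layer == "cluster": re-attach cluster `src` to target `dst`
    """
    layer, src, dst = op_action
    if layer == "agent":
        ecg.agent_to_cluster[src] = dst
    elif layer == "cluster":
        ecg.cluster_to_target[src] = dst
    return ecg


# Usage: steering cluster 0 onto a cooperative action redirects every
# agent attached to that cluster with a single edge edit.
ecg = ECG(n_agents=6, n_clusters=2, n_targets=len(UNIFIED_TARGETS))
apply_operator(ecg, ("cluster", 0, UNIFIED_TARGETS.index("encircle")))
print(UNIFIED_TARGETS[ecg.agent_target(0)])  # -> encircle
```

Because new targets can be appended to the target layer without touching the agents, this is one way such an interface stays extensible.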

HCGL is also notable for its multi-disciplinary character: it combines graph theory, reinforcement learning, and cooperative behavior modeling to address the limitations of traditional MARL algorithms on complex multi-agent problems.

Experimental Results and Transferability

The HCGL model has been evaluated on multi-agent benchmarks with sparse rewards, a setting where flat MARL methods often struggle. The results show outstanding performance, indicating that the hierarchical cooperative behaviors enabled by the ECG and the trained graph operators are effective.

Furthermore, HCGL’s transferability to large-scale scenarios has been confirmed, with high zero-shot transfer success rates. This indicates that the knowledge and policies learned through HCGL can be effectively applied to new and unfamiliar environments.
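
A speculative reading of why such zero-shot transfer can work: only the agent layer of the ECG needs to grow with the scenario, while the clusters, targets, and trained operators stay fixed. The scale_agents helper below continues the sketch above and is hypothetical, not code from the paper.

```python
def scale_agents(ecg: ECG, new_n_agents: int) -> ECG:
    """Attach additional agent nodes to the existing clusters round-robin,
    leaving the learned cluster-to-target structure untouched."""
    ecg.agent_to_cluster += [i % ecg.n_clusters
                             for i in range(ecg.n_agents, new_n_agents)]
    ecg.n_agents = new_n_agents
    return ecg


# A graph built for 6 agents is reused with 50: the cluster and target
# layers, and hence the operators acting on them, are unchanged.
large = scale_agents(ECG(n_agents=6, n_clusters=2, n_targets=6), new_n_agents=50)
assert len(large.agent_to_cluster) == 50
```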

Conclusion

Overall, Hierarchical Cooperation Graph Learning (HCGL) is a promising approach to complex multi-agent problems that require hierarchical cooperative behaviors. By steering agents through a dynamic Extensible Cooperation Graph (ECG) manipulated by a set of trained graph operators, HCGL offers an interpretable framework for integrating cooperative knowledge. Its strong benchmark performance and high zero-shot transfer success rates further support its efficacy, and its multi-disciplinary design makes it a valuable contribution to the field of Multi-Agent Reinforcement Learning.
