A fundamental trait of intelligence involves finding novel and creative
solutions to address a given challenge or to adapt to unforeseen situations.
Reflecting this, Quality-Diversity optimization is a family of Evolutionary
Algorithms that generates collections of both diverse and high-performing
solutions. Among these, MAP-Elites is a prominent example that has been
successfully applied to a variety of domains, including evolutionary robotics.
However, MAP-Elites performs a divergent search with random mutations
originating from Genetic Algorithms and is thus limited to evolving
populations of low-dimensional solutions. PGA-MAP-Elites overcomes this
limitation using a gradient-based variation operator inspired by deep
reinforcement learning, which enables the evolution of large neural networks.
Although high-performing in many environments, PGA-MAP-Elites fails on several
tasks where the convergent search of the gradient-based variation operator
hinders diversity. In this work, we present three contributions: (1) we enhance
the Policy Gradient variation operator with a descriptor-conditioned critic
that reconciles diversity search with gradient-based methods, (2) we leverage
the actor-critic training to learn a descriptor-conditioned policy at no
additional cost, distilling the knowledge of the population into a single
versatile policy that can execute a diversity of behaviors, (3) we exploit the
descriptor-conditioned actor by injecting it into the population, despite network
architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher
QD score and coverage compared to all baselines on seven challenging continuous
control locomotion tasks.
A fundamental trait of intelligence involves finding novel and creative solutions to address challenges and adapt to unforeseen situations. Quality-Diversity (QD) optimization is an approach that aims to generate collections of diverse and high-performing solutions. One prominent example of QD optimization is MAP-Elites, which has been successfully applied in various domains, including evolutionary robotics. However, MAP-Elites is limited to evolving populations of low-dimensional solutions using random mutations from Genetic Algorithms.
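To make the mechanism concrete, below is a minimal sketch of the MAP-Elites loop in Python: solutions are binned into a discretized descriptor grid, and each cell only keeps the best-performing solution it has encountered. The `evaluate` and `mutate` functions and the grid parametrization are illustrative assumptions, not the exact operators used in the paper.

```python
import numpy as np

# Minimal MAP-Elites sketch. `evaluate` returns (fitness, descriptor) and
# `mutate` applies a Genetic-Algorithm-style random perturbation; both are
# user-supplied placeholders, not the paper's exact operators.

def map_elites(evaluate, mutate, init_solutions, grid_shape, descriptor_bounds, iterations):
    fitness_grid = {}   # cell index -> best fitness seen so far
    elites = {}         # cell index -> solution occupying that cell
    low, high = (np.asarray(b, dtype=float) for b in descriptor_bounds)
    shape = np.asarray(grid_shape)

    def cell_index(descriptor):
        # Discretize the behavior descriptor into a grid cell.
        ratios = (np.asarray(descriptor) - low) / (high - low)
        return tuple(np.clip((ratios * shape).astype(int), 0, shape - 1))

    def try_insert(solution):
        fitness, descriptor = evaluate(solution)
        cell = cell_index(descriptor)
        # Keep the solution only if its cell is empty or it beats the current elite.
        if cell not in fitness_grid or fitness > fitness_grid[cell]:
            fitness_grid[cell] = fitness
            elites[cell] = solution

    for solution in init_solutions:
        try_insert(solution)

    for _ in range(iterations):
        # Divergent search: pick a random elite and apply a random mutation.
        cells = list(elites.keys())
        parent = elites[cells[np.random.randint(len(cells))]]
        try_insert(mutate(parent))

    return elites, fitness_grid
```

The archive itself is what produces diversity: competition only happens within a cell, so solutions with different descriptors never displace one another.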
PGA-MAP-Elites addresses this limitation by incorporating a gradient-based variation operator inspired by deep reinforcement learning. This enables the evolution of large neural networks, allowing for the exploration and optimization of high-dimensional solution spaces. Although PGA-MAP-Elites performs well in many environments, it suffers from a lack of diversity in certain tasks due to the convergent search behavior of the gradient-based variation operator.
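The gradient-based variation can be pictured as a few steps of deterministic policy-gradient ascent on a critic, applied to a copy of a selected parent. The following sketch assumes a TD3-style critic and a replay buffer exposing a `sample_states` method; these names and the hyperparameters are placeholders rather than the authors' exact configuration.

```python
import copy
import torch

# Hedged sketch of a PGA-MAP-Elites-style Policy Gradient variation operator:
# a copy of a selected parent policy is updated for a few gradient steps to
# maximize a critic trained on transitions collected by the population.

def pg_variation(parent_policy, critic, replay_buffer, n_steps=100, lr=3e-4, batch_size=256):
    offspring = copy.deepcopy(parent_policy)   # vary a copy, keep the parent intact
    optimizer = torch.optim.Adam(offspring.parameters(), lr=lr)

    for _ in range(n_steps):
        states = replay_buffer.sample_states(batch_size)   # assumed buffer API
        actions = offspring(states)
        # Gradient ascent on the critic's value estimate: this is the convergent
        # part of the search that can hinder diversity when used alone.
        loss = -critic(states, actions).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return offspring
```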
In this study, the authors introduce DCG-MAP-Elites, which makes three contributions to overcome the limitations of PGA-MAP-Elites. First, they enhance the Policy Gradient variation operator with a descriptor-conditioned critic, reconciling diversity search with gradient-based methods and allowing a more balanced exploration-exploitation trade-off. Second, they leverage actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into a single versatile policy that can execute a wide variety of behaviors. Finally, they inject the descriptor-conditioned actor into the population despite network architecture differences, further enhancing the capabilities of DCG-MAP-Elites.
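Descriptor conditioning amounts to feeding the target behavior descriptor to the networks alongside the state (and action, for the critic). The hedged PyTorch sketch below illustrates what such descriptor-conditioned critic and actor modules might look like; layer sizes and architectural details are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

# Sketch of descriptor conditioning: both the critic and the distilled actor
# take the target descriptor d as an extra input, so value estimates and
# actions are specific to the behavior being sought. Sizes are illustrative.

class DescriptorConditionedCritic(nn.Module):
    def __init__(self, state_dim, action_dim, descriptor_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + descriptor_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, descriptor):
        # Q(s, a | d): value of taking `action` in `state` while pursuing `descriptor`.
        return self.net(torch.cat([state, action, descriptor], dim=-1))


class DescriptorConditionedActor(nn.Module):
    def __init__(self, state_dim, action_dim, descriptor_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + descriptor_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, descriptor):
        # pi(s | d): one set of weights that can reproduce many behaviors on demand.
        return self.net(torch.cat([state, descriptor], dim=-1))
```

Because the actor is conditioned on the descriptor, a single set of weights can be queried with different descriptors to reproduce different behaviors, which is what makes it possible to inject it back into the population of fixed-descriptor policies.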
The results of their experiments show that DCG-MAP-Elites achieves equal or higher QD scores and coverage compared to all baselines on seven challenging continuous control locomotion tasks. This demonstrates the effectiveness of the proposed method in balancing diversity and performance in high-dimensional solution spaces.
From a multi-disciplinary perspective, this research combines principles from evolutionary algorithms, reinforcement learning, and robotics. It showcases the power of integrating different fields to tackle complex problems. By incorporating deep learning techniques into traditional evolutionary algorithms, the authors are able to overcome limitations and achieve state-of-the-art results in the domain of locomotion tasks. This highlights the importance of cross-pollination of ideas and methodologies across disciplines.