arXiv:2508.11836v1 Announce Type: new
Abstract: World models are defined as a compressed spatial and temporal learned representation of an environment. The learned representation is typically a neural network, making transfer of the learned environment dynamics and explainability a challenge. In this paper, we propose an approach, Finite Automata Extraction (FAE), that learns a neuro-symbolic world model from gameplay video represented as programs in a novel domain-specific language (DSL): Retro Coder. Compared to prior world model approaches, FAE learns a more precise model of the environment and more general code than prior DSL-based approaches.

Expert Commentary

World modeling is central to building agents that can navigate and act in their environments. In reinforcement learning, where agents learn by trial and error, an accurate and efficient world model lets an agent predict the consequences of its actions before taking them. The idea of a world model as a compressed spatial and temporal representation of an environment draws on computer science, cognitive science, and neuroscience.

Learning these world models with neural networks makes transfer and explainability difficult. A neural network is largely a black box: it is hard to tell how its learned representation corresponds to the actual environment dynamics. The proposed approach, Finite Automata Extraction (FAE), addresses this by learning a neuro-symbolic world model from gameplay video, with the model represented as programs in a new domain-specific language (DSL) called Retro Coder.

Neuro-symbolic Approach

Neuro-symbolic approaches combine neural networks with symbolic reasoning to draw on the strengths of both. According to the abstract, FAE learns a more precise model of the environment than prior world model approaches, and a symbolic model is easier to inspect than a set of network weights. Expressing the learned model as Retro Coder programs bridges the gap between raw video data and a symbolic description of the environment.
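To make the contrast with a black-box network concrete, here is a minimal sketch of a world model written as readable rules. It is not Retro Coder syntax, which the abstract does not show; the `Rule` class, the `step` and `fired_rule` helpers, the state fields `x` and `y`, and the action names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical symbolic state, e.g. {"x": 3, "y": 0}. Not taken from the paper.
State = Dict[str, int]


@dataclass(frozen=True)
class Rule:
    """One human-readable transition rule (illustrative, not Retro Coder)."""
    name: str
    action: str
    condition: Callable[[State], bool]
    effect: Callable[[State], State]


def fired_rule(rules: List[Rule], state: State, action: str) -> Optional[Rule]:
    """Return the first rule whose action and condition match, if any."""
    for rule in rules:
        if rule.action == action and rule.condition(state):
            return rule
    return None


def step(rules: List[Rule], state: State, action: str) -> State:
    """Predict the next state; if no rule fires, assume nothing changes."""
    rule = fired_rule(rules, state, action)
    return rule.effect(state) if rule else dict(state)


# A toy rule set for a made-up platformer with a wall at x = 9.
RULES = [
    Rule("move_right", "RIGHT", lambda s: s["x"] < 9, lambda s: {**s, "x": s["x"] + 1}),
    Rule("blocked_by_wall", "RIGHT", lambda s: s["x"] >= 9, lambda s: dict(s)),
    Rule("jump", "JUMP", lambda s: s["y"] == 0, lambda s: {**s, "y": 2}),
]

if __name__ == "__main__":
    state = {"x": 8, "y": 0}
    for action in ["RIGHT", "RIGHT", "JUMP"]:
        rule = fired_rule(RULES, state, action)
        state = step(RULES, state, action)
        print(action, "->", rule.name if rule else "no rule", state)
```

Because every prediction can be traced back to a named rule, a reader can see why the model predicted a given next state, which is the kind of explainability a symbolic representation is meant to provide.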

Generalization and Precision

One of the key advantages claimed for FAE is that it learns more general code than prior DSL-based approaches. Extracting finite automata from gameplay video yields an explicit description of states and transitions, which can expose regularities that are hard to read out of a neural network. In principle, more general code should help agents act more robustly in novel situations, although the abstract does not report results on downstream agent performance.
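The general idea of recovering a finite automaton from observations can be sketched as follows. This is not the FAE algorithm, whose details the abstract does not give; it assumes the video has already been reduced to symbolic `(state, action, next_state)` triples, and the function names, state labels, and the choice to treat unseen transitions as "no change" are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# One observed transition: (state, action, next_state). Hypothetical format.
Transition = Tuple[Hashable, str, Hashable]
Table = Dict[Tuple[Hashable, str], Hashable]


def extract_automaton(traces: Iterable[List[Transition]]) -> Table:
    """Build a deterministic transition table from observed traces.

    Raises ValueError if the same (state, action) pair is seen leading to
    two different next states, since a deterministic automaton cannot
    represent that.
    """
    table: Table = {}
    for trace in traces:
        for state, action, next_state in trace:
            key = (state, action)
            if key in table and table[key] != next_state:
                raise ValueError(f"conflicting observations for {key}")
            table[key] = next_state
    return table


def run(table: Table, start: Hashable, actions: Iterable[str]) -> Hashable:
    """Replay actions on the automaton; unseen pairs are assumed to do nothing."""
    state = start
    for action in actions:
        state = table.get((state, action), state)
    return state


# Toy traces standing in for symbolic observations of gameplay.
TRACES = [
    [("idle", "RIGHT", "walking"), ("walking", "JUMP", "jumping"), ("jumping", "NOOP", "idle")],
    [("idle", "JUMP", "jumping"), ("jumping", "NOOP", "idle")],
]

if __name__ == "__main__":
    table = extract_automaton(TRACES)
    for (state, action), next_state in sorted(table.items()):
        print(f"{state} --{action}--> {next_state}")
    print(run(table, "idle", ["RIGHT", "JUMP", "NOOP"]))
```

The resulting table merges evidence from both traces, so it can predict action sequences that never appeared in any single trace, which is a small-scale illustration of how an explicit automaton can generalize beyond the recordings it was built from.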

Overall, combining neuro-symbolic techniques with domain-specific languages is a promising direction for world modeling. Programs that can be read, checked, and reused speak directly to the transfer and explainability problems that purely neural world models face.
