Cognitive maps are a proposed mechanism for how the brain efficiently organizes memories and retrieves context from them. The entorhinal-hippocampal complex is heavily involved in episodic and relational memory processing as well as spatial navigation, and is thought to build cognitive maps via place and grid cells. To make use of the promising properties of cognitive maps, we set up a multi-modal neural network based on successor representations, which is able to model place cell dynamics and cognitive map representations. Here, we use multi-modal inputs consisting of images and word embeddings. The network successfully learns the similarities between novel inputs and the training database, and thereby the representation of the cognitive map. Subsequently, the prediction of the network can be used to infer from one modality to the other with over 90% accuracy. The proposed method could therefore be a building block for improving current AI systems' understanding of the environment and of the different modalities in which objects appear. Associating specific modalities with certain encounters can then lead to context awareness in novel situations: when a similar encounter occurs with less information, the missing information can be inferred from the learned cognitive map. Cognitive maps, as represented by the entorhinal-hippocampal complex in the brain, organize and retrieve context from memories, suggesting that large language models (LLMs) like ChatGPT could harness similar architectures to function as a high-level processing center, akin to how the hippocampus operates within the cortex hierarchy. Finally, by utilizing multi-modal inputs, LLMs could potentially bridge the gap between different forms of data (such as images and words), paving the way for context awareness and the grounding of abstract concepts through learned associations, addressing the grounding problem in AI.

In this article, the concept of cognitive maps and their potential applications in AI systems are discussed. Cognitive maps are theorized as a way the brain organizes memories and retrieves context from them efficiently. The entorhinal-hippocampal complex, which is involved in memory processing and spatial navigation, plays a key role in building these maps using place and grid cells.

To explore the benefits of cognitive maps, a multi-modal neural network based on successor representations, capable of modeling place cell dynamics and cognitive map representations, is introduced. This network takes both images and word embeddings as inputs and learns the similarities between novel inputs and a training database in order to represent the cognitive map. As a result, the network's predictions can be used to infer information from one modality to the other with over 90% accuracy.
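The article does not include code, but the successor representation (SR) at the core of such a network has a standard closed form: for a fixed policy with row-stochastic transition matrix T and discount factor gamma, the SR matrix is M = (I - gamma*T)^(-1), and each row of M behaves like a place-cell-style firing field over the state space. Below is a minimal NumPy sketch of that closed form, using an illustrative toy transition matrix that is not taken from the paper:

```python
import numpy as np

def successor_representation(T, gamma=0.95):
    """Closed-form successor representation for a fixed policy.

    T     : (n, n) row-stochastic state-transition matrix
    gamma : discount factor in [0, 1)

    Returns M, where M[s, s'] is the expected discounted number of
    future visits to state s' when starting in state s. Rows of M
    resemble place-cell firing fields over the state space.
    """
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Toy example (illustrative only): a deterministic 4-state ring.
T = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [1., 0., 0., 0.]])
print(successor_representation(T, gamma=0.9).round(2))
```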

By improving AI systems' understanding of the environment and of the different modalities in which objects appear, this approach can enhance context awareness in novel situations where less information is available. The learned cognitive map associates specific modalities with certain encounters, so that when a similar encounter occurs, the missing information can be inferred, as sketched below.
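The paper's network learns these associations through successor representations, but the inference step itself can be sketched as a similarity lookup between paired embeddings. This is an assumption-laden illustration, not the authors' implementation: the function names and the cosine-similarity choice are hypothetical.

```python
import numpy as np

def infer_other_modality(query, train_a, train_b):
    """Schematic cross-modal inference via similarity lookup
    (hypothetical sketch, not the paper's actual method).

    query   : (d,) embedding of a novel input in modality A
              (e.g., an image embedding)
    train_a : (n, d) training embeddings in modality A
    train_b : (n, k) paired training embeddings in modality B
              (e.g., word embeddings)

    Returns the modality-B embedding paired with the training
    example most cosine-similar to the query, i.e., the
    information inferred from the learned associations.
    """
    a = train_a / np.linalg.norm(train_a, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    best = int(np.argmax(a @ q))  # nearest training input in modality A
    return train_b[best]
```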

The proposed use of cognitive maps in AI aligns with the way the entorhinal-hippocampal complex organizes and retrieves context from memories. This suggests that large language models (LLMs) like ChatGPT could adopt similar architectures to function as high-level processing centers, similar to the role of the hippocampus within the cortex hierarchy.

Additionally, by incorporating multi-modal inputs, LLMs have the potential to bridge the gap between different forms of data such as images and words. This paves the way for context-awareness and grounding of abstract concepts through learned associations, addressing the grounding problem in AI. The interdisciplinary nature of this concept combines neuroscience, machine learning, and cognitive science to create a more comprehensive understanding of cognition and memory processes that can be applied to artificial systems.
