The present paper looks at one of the most thorough articles on the intelligence of GPT, research conducted by engineers at Microsoft. Although there is a great deal of value in their work, I will argue that, for familiar philosophical reasons, their methodology, “Black-box Interpretability,” is wrongheaded. But there is a better way. There is an exciting and emerging discipline of “Inner Interpretability” (and specifically Mechanistic Interpretability) that aims to uncover the internal activations and weights of models in order to understand what they represent and the algorithms they implement. In my view, a crucial mistake in Black-box Interpretability is the failure to appreciate that how processes are carried out matters when it comes to intelligence and understanding. I can’t pretend to have a full story that provides both necessary and sufficient conditions for being intelligent, but I do think that Inner Interpretability dovetails nicely with plausible philosophical views of what intelligence requires. So the conclusion is modest, but the important point in my view is seeing how to get the research on the right track. Towards the end of the paper, I will show how some of the philosophical concepts can be used to further refine how Inner Interpretability is approached, so the paper helps draw out a profitable, future two-way exchange between Philosophers and Computer Scientists.

Expert Commentary: Rethinking Black-box Interpretability for AI Models

In the field of artificial intelligence, interpretability has become a hot topic as researchers seek to understand and make sense of the decisions and predictions made by complex models like GPT. The engineers at Microsoft have conducted a commendable study on the intelligence of GPT, but I believe their methodology, referred to as “Black-box Interpretability,” falls short of providing a comprehensive understanding of the model’s inner workings.

The researchers focused on identifying patterns and correlations in the input-output relations of GPT, analyzing its performance on various tasks. While this approach can yield valuable insights, it fails to address the underlying question of how the model actually achieves its intelligence. It lacks a focus on the internal mechanisms and processes that drive GPT’s behavior.
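To make the contrast concrete, here is a minimal Python sketch of what a purely black-box evaluation looks like: the model is treated as an opaque text-in, text-out function and scored only on whether its answers match a reference. The names toy_model and black_box_accuracy are illustrative placeholders of my own, not anything from the Microsoft study.

```python
# Minimal sketch of a black-box evaluation: the model is visible only as an
# input-output function, and "intelligence" is scored by task accuracy alone.

from typing import Callable


def toy_model(prompt: str) -> str:
    # Hypothetical stand-in for an opaque language-model endpoint.
    return "Paris" if "capital of France" in prompt else "unknown"


def black_box_accuracy(model: Callable[[str], str],
                       tasks: list[tuple[str, str]]) -> float:
    # Score the model on (prompt, expected_answer) pairs. Note what is NOT
    # visible here: weights, activations, or any trace of how an answer was
    # produced; only the input-output relation is observed.
    correct = sum(
        expected.strip().lower() in model(prompt).strip().lower()
        for prompt, expected in tasks
    )
    return correct / len(tasks)


tasks = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]
print(black_box_accuracy(toy_model, tasks))  # 0.5: one of two tasks "passed"
```

Nothing in this loop tells us how an answer was produced; a lookup table and a genuine reasoner could score identically.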

This is where the emerging field of “Inner Interpretability” comes into play. Inner Interpretability aims to delve into the internal activations and weights of models to uncover their representations and algorithms. By understanding the intricacies of these inner workings, we can gain a deeper understanding of how intelligence is realized in AI systems.
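As an illustration of where Inner Interpretability starts instead, the following sketch reads out a model’s layer-by-layer activations rather than only its outputs. It assumes the Hugging Face transformers and torch packages and the publicly available “gpt2” checkpoint; any open-weight model would serve equally well.

```python
# Sketch of the Inner Interpretability starting point: read out the model's
# internal activations rather than only its outputs.
# Assumes: torch and transformers are installed, and the public "gpt2" model.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per layer,
# each of shape (batch, sequence_length, hidden_size).
for layer_idx, activations in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(activations.shape))
```

These tensors are the raw material that mechanistic work then tries to interpret: which directions, neurons, or circuits within them carry which representations, and what algorithm they jointly implement.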

One key flaw in Black-box Interpretability is its disregard for the importance of process in determining intelligence and understanding. It assumes that only the input-output relations matter, overlooking the specific methods and algorithms employed by the model. However, intelligence is not solely determined by the ability to produce correct answers; it also involves reasoning, learning, and problem-solving processes.

To truly grasp the nature of intelligence, we need to consider both the input-output relations and the underlying processes. Inner Interpretability provides a promising avenue for achieving this by examining the internal activations and weights of models. We can analyze how different patterns and representations emerge within the model, shedding light on its decision-making processes.
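One common way of examining how representations emerge is a linear probe: a simple classifier trained on internal activations to test whether some feature of interest is linearly decodable at a given layer. The sketch below is a hedged illustration using synthetic activations and labels; in practice they would be collected from model runs like the one shown earlier.

```python
# Sketch of a linear probe: a simple classifier trained on internal activations
# to test whether a feature of interest is linearly decodable at some layer.
# The activations and labels below are synthetic placeholders for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_size = 500, 64

# Synthetic "activations": examples with label 1 are shifted along one fixed
# direction, mimicking a feature that the model represents linearly.
labels = rng.integers(0, 2, size=n_examples)
feature_direction = rng.normal(size=hidden_size)
activations = rng.normal(size=(n_examples, hidden_size)) + np.outer(labels, feature_direction)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
# High held-out accuracy is evidence (not proof) that the feature is encoded
# in these activations, one small window onto the model's internal process.
```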

While I do not propose a definitive framework for understanding intelligence, and in particular do not claim to give necessary and sufficient conditions for it, I believe that Inner Interpretability aligns well with plausible philosophical views of what intelligence requires. By focusing on the mechanisms of AI systems rather than their outputs alone, we put ourselves in a position to assess whether those requirements are met.

Moreover, the collaboration between philosophers and computer scientists in this field is crucial. Philosophical concepts related to intelligence, such as intentionality and understanding, can guide the development of more refined approaches to Inner Interpretability. By bridging the gap between disciplines, we can promote a fruitful exchange of ideas and insights.

In conclusion, the current research on GPT’s intelligence by Microsoft engineers provides valuable insights but falls short of fully capturing what intelligence involves. Inner Interpretability, with its emphasis on uncovering the internal activations and weights of models, offers a more comprehensive approach. By considering both input-output relations and the underlying processes, we can gain a deeper understanding of how intelligence might be realized in AI systems. Integrating philosophical concepts further refines this approach and promotes the interdisciplinary collaboration needed for future advances in AI interpretability.
