arXiv:2407.03340v1
Abstract: Addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. In the field of human-robot interaction in particular, it is crucial for enabling social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot to estimating only whether it was addressed and limiting its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence therefore plays a significant role in current machine learning applications and models, providing explanations for their decisions in addition to strong performance. In our work, we a) present an addressee estimation model with improved performance in comparison with the previous state of the art; b) further modify this model to include inherently explainable attention-based segments; c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; d) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and e) perform a pilot user study to analyze the effect of various explanations on how human participants perceive the robot.
Improving Addressee Estimation in Multi-Party Conversation Scenarios
Understanding to whom somebody is speaking is a fundamental task for human activity recognition in multi-party conversation scenarios. It becomes even more crucial in the field of human-robot interaction, as it enables social robots to participate actively in interactive contexts. However, the traditional approach of treating addressee estimation as a binary classification task limits the robot to determining only whether it was addressed or not, restricting its interactive skills.
In our work, we propose a novel addressee estimation model that not only outperforms the previous state-of-the-art model but also incorporates explainability as a crucial component. Explainable artificial intelligence (XAI) has gained significant attention in recent years due to its potential to provide explanations for the decisions made by machine learning models. By including explainability in addressee estimation, we aim to enhance the transparency and trustworthiness of social robots in human-robot interaction scenarios.
Inherently Explainable Attention-Based Segments
One of the key innovations in our model is the incorporation of inherently explainable attention-based segments. Attention mechanisms have been widely used in natural language processing tasks to improve model performance by focusing on relevant information. By using attention-based segments, we not only improve addressee estimation performance but also provide interpretable explanations for the model's decisions.
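To make the idea concrete, the sketch below shows one minimal way an attention-based segment can be wired up in PyTorch: an attention-pooling layer that returns both the class logits and the per-time-step attention weights that can later serve as an explanation. This is an illustrative sketch only, not the paper's exact architecture; the class name AttentionSegment and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class AttentionSegment(nn.Module):
    """Attention pooling over a sequence of per-time-step features.

    Returns both the class logits and the attention weights, so the
    weights can be reused as an inherent explanation of which time
    steps influenced the decision.
    """
    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)            # scalar relevance per step
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x: torch.Tensor):
        # x: (batch, time, feat_dim)
        weights = torch.softmax(self.score(x).squeeze(-1), dim=1)  # (batch, time)
        pooled = torch.einsum("bt,btf->bf", weights, x)            # attention-weighted sum
        logits = self.classifier(pooled)
        return logits, weights

# Hypothetical example: a window of 20 time steps with 128-dim features,
# classifying among 3 candidate addressees.
model = AttentionSegment(feat_dim=128, n_classes=3)
logits, weights = model(torch.randn(4, 20, 128))
```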
These attention-based segments highlight the specific parts of the conversation that the model attends to when inferring the addressee. By visualizing these segments, the robot can provide human users with a transparent explanation of why it made a particular decision. This adds a further layer of interpretability and can help build trust between humans and robots.
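Continuing the hypothetical sketch above, one simple way to visualize such segments is a bar chart of the attention weights over the input timeline; the snippet below is just one possible rendering, not the visualization used in the paper.

```python
import matplotlib.pyplot as plt

# Plot the attention weights for the first sample in the batch
# (weights comes from the hypothetical AttentionSegment above).
w = weights[0].detach().numpy()
plt.figure(figsize=(6, 1.5))
plt.bar(range(len(w)), w)
plt.xlabel("time step in the utterance window")
plt.ylabel("attention weight")
plt.title("Parts of the input the model attended to")
plt.tight_layout()
plt.show()
```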
Modular Cognitive Architecture for Multi-Party Conversation
To deploy the explainable addressee estimation model, we integrate it into a modular cognitive architecture designed for multi-party conversation in an iCub robot. The modular architecture allows for the seamless incorporation of explainability and transparency features into the robot’s interactive capabilities.
For example, the architecture includes dedicated modules for generating explanations based on the attention-based segments. These explanations can be presented to human participants in various ways, such as through text, speech, or visualizations. The flexibility of the modular architecture enables us to adapt the explanations to the preferences and understanding of individual users.
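As a rough illustration of what such an explanation module might look like (the names AddresseeDecision and explain, the label set, and the phrasing are all assumptions, not the architecture's actual API), the sketch below turns a model decision and its attention peak into a user-facing explanation in different presentation modes.

```python
from dataclasses import dataclass

@dataclass
class AddresseeDecision:
    label: str         # e.g. "LEFT", "RIGHT", "ROBOT" (hypothetical labels)
    confidence: float  # softmax probability of the predicted class
    peak_step: int     # time step with the highest attention weight

def explain(decision: AddresseeDecision, mode: str = "text") -> str:
    """Render a decision as a user-facing explanation (illustrative only)."""
    if mode == "text":
        return (f"I think the speaker is addressing {decision.label} "
                f"(confidence {decision.confidence:.0%}); the moment around "
                f"step {decision.peak_step} of the utterance was most informative.")
    if mode == "speech":
        # The same string could instead be routed to a text-to-speech module.
        return f"say: I believe you are talking to {decision.label}."
    raise ValueError(f"unknown explanation mode: {mode}")

print(explain(AddresseeDecision(label="ROBOT", confidence=0.82, peak_step=12)))
```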
Evaluating the Effect of Explanations on Human Perception
As part of our research, we conducted a pilot user study to analyze the impact of different explanations on how human participants perceive the robot. By presenting participants with different kinds of explanations, ranging from simple textual descriptions to richer visualizations, we aimed to understand how different levels of transparency and explainability influence user trust and acceptance of the robot.
Through this study, we gained valuable insights into the effectiveness of different explanation types and their impact on human-robot interaction. This knowledge could inform future design decisions in developing social robots that can effectively communicate their decision-making processes to humans in a transparent and understandable manner.
Conclusion
The combination of improved addressee estimation performance, the inclusion of inherently explainable attention-based segments, and the integration into a modular cognitive architecture lays the foundation for social robots that can actively participate in multi-party conversations with enhanced transparency and explainability. As a multi-disciplinary endeavor, our work bridges the fields of machine learning, human-robot interaction, and explainable artificial intelligence, pushing the boundaries of what social robots can achieve in interactive contexts.