“eScope: Estimating Power Consumption on Mobile Platforms for Streaming Applications”

Abstract:

Managing the limited energy on mobile platforms executing long-running, resource-intensive streaming applications requires adapting an application’s operators in response to their power consumption. For example, the frame refresh rate may be reduced if the rendering operation is consuming too much power. Currently, predicting an application’s power consumption requires (1) building a device-specific power model for each hardware component, and (2) analyzing the application’s code. This approach can be complicated and error-prone given the complexity of an application’s logic and the hardware platforms with heterogeneous components that it may execute on.

We propose eScope, an alternative method to directly estimate power consumption by each operator in an application. Specifically, eScope correlates an application’s execution traces with its device-level energy draw. We implement eScope as a tool for Android platforms and evaluate it using workloads on several synthetic applications as well as two video stream analytics applications. Our evaluation suggests that eScope predicts an application’s power use with 97% or better accuracy while incurring a compute time overhead of less than 3%.

Commentary:

Managing the limited energy of mobile platforms is becoming increasingly important as streaming applications continue to grow in popularity. Resource-intensive applications such as video streaming demand careful power management to ensure smooth operation and a satisfying user experience. Adapting an application’s operators in response to their power consumption is a key strategy for achieving this.

Traditionally, predicting the power consumption of an application has been a complex and error-prone process. It involved building device-specific power models for each hardware component and analyzing the application’s code. This approach becomes even more challenging when considering the complexity of the application’s logic and the heterogeneity of the hardware platforms it may run on.

The proposed eScope method offers an innovative solution to estimate power consumption directly by each operator in an application. By correlating an application’s execution traces with its device-level energy draw, eScope provides a more accurate and efficient way to predict power use. The implementation of eScope as a tool for Android platforms further increases its practicality and accessibility for developers and system administrators.
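
To make the mechanism concrete, the sketch below shows one simple way per-operator energy could be attributed by overlapping device-level power samples with operator execution intervals from a trace. The data shapes and the even split among concurrently active operators are illustrative assumptions rather than eScope’s actual design.

```python
# Hypothetical sketch (not eScope's actual API): attribute device-level
# power samples to operators by overlapping them with per-operator
# execution intervals taken from a trace.
from collections import defaultdict

def energy_by_operator(power_samples, trace_intervals):
    """power_samples: list of (timestamp_s, watts), uniformly sampled.
    trace_intervals: list of (operator_name, start_s, end_s)."""
    if len(power_samples) < 2:
        return {}
    dt = power_samples[1][0] - power_samples[0][0]  # sampling period (s)
    energy = defaultdict(float)                     # joules per operator
    for t, watts in power_samples:
        # Operators whose trace interval covers this sample's timestamp.
        active = [op for op, start, end in trace_intervals if start <= t < end]
        for op in active:
            # Assumption: concurrently active operators share the sample evenly.
            energy[op] += watts * dt / len(active)
        # Samples with no active operator are treated as baseline draw
        # and left unattributed in this simplified sketch.
    return dict(energy)

# Toy example: a render operator drawing ~2 W for 3 s, then a decode
# operator drawing ~1 W for 2 s, with power sampled every 0.5 s.
samples = [(i * 0.5, 2.0 if i < 6 else 1.0) for i in range(10)]
traces = [("render", 0.0, 3.0), ("decode", 3.0, 5.0)]
print(energy_by_operator(samples, traces))  # {'render': 6.0, 'decode': 2.0}
```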

The evaluation of eScope using workloads on multiple synthetic applications and video stream analytics applications demonstrates its effectiveness. With a prediction accuracy of 97% or better, eScope proves to be a reliable tool for estimating power consumption. Moreover, the compute time overhead of less than 3% ensures that eScope’s estimation does not significantly impact the application’s performance.

Overall, the eScope method presents a promising approach to managing power consumption in mobile platforms executing long-running streaming applications. By simplifying the prediction process and offering accurate estimations, eScope has the potential to enhance the energy efficiency and performance of such applications, ultimately improving the user experience on mobile devices.

Read the original article

“Introducing Surprise into Recommender Systems with Knowledge Graphs”

arXiv:2405.08465v1 Announce Type: cross
Abstract: Traditional recommendation proposals, including content-based and collaborative filtering, usually focus on similarity between items or users. Existing approaches lack ways of introducing unexpectedness into recommendations, prioritizing globally popular items over exposing users to unforeseen items. This investigation aims to design and evaluate a novel layer on top of recommender systems suited to incorporate relational information and suggest items with a user-defined degree of surprise. We propose a Knowledge Graph (KG) based recommender system by encoding user interactions on item catalogs. Our study explores whether network-level metrics on KGs can influence the degree of surprise in recommendations. We hypothesize that surprisingness correlates with certain network metrics, treating user profiles as subgraphs within a larger catalog KG. The achieved solution reranks recommendations based on their impact on structural graph metrics. Our research contributes to optimizing recommendations to reflect the metrics. We experimentally evaluate our approach on two datasets of LastFM listening histories and synthetic Netflix viewing profiles. We find that reranking items based on complex network metrics leads to a more unexpected and surprising composition of recommendation lists.

Designing a Novel Layer on Top of Recommender Systems

Traditional recommendation systems have typically focused on similarity between items or users, resulting in recommendations that are often predictable and lack surprise. However, introducing unexpectedness into recommendations can be crucial in providing users with fresh and novel experiences. In this article, we explore a novel layer that can be added on top of existing recommender systems to incorporate relational information and suggest items with a user-defined degree of surprise.

The Multi-disciplinary Nature of the Concepts

This investigation draws on several disciplines, combining ideas from recommender systems, knowledge graphs, and complex network analysis. By incorporating knowledge graphs, which represent relationships between entities, the approach can leverage the structural properties of the graph to influence the surprise factor in recommendations. Complex network metrics are then used to analyze the impact of recommendations on the overall structure of the knowledge graph.
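
As a rough illustration of the reranking idea, the sketch below scores each candidate item by how much adding it changes a structural metric of the user’s profile subgraph within the catalog knowledge graph. The choice of average clustering as the metric, and the rule that a larger drop means a more surprising item, are assumptions made for this example rather than the exact metrics used in the paper.

```python
# Illustrative sketch, not the paper's exact pipeline: rerank candidate
# items by how much adding each one changes a structural metric of the
# user's profile subgraph inside the catalog knowledge graph.
import networkx as nx

def rerank_by_metric_delta(catalog_kg, user_items, candidates, top_k=5):
    """catalog_kg: networkx graph over the item catalog.
    user_items: non-empty collection of nodes the user interacted with.
    candidates: item nodes produced by a base recommender."""
    base = nx.average_clustering(catalog_kg.subgraph(user_items).to_undirected())
    scored = []
    for item in candidates:
        sub = catalog_kg.subgraph(list(user_items) + [item]).to_undirected()
        # Assumption: a larger drop in clustering means the item lies
        # outside the user's densely connected taste region, so it is
        # treated as more surprising.
        delta = base - nx.average_clustering(sub)
        scored.append((delta, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]

# Toy usage with a stand-in catalog graph.
kg = nx.karate_club_graph()
print(rerank_by_metric_delta(kg, user_items=[0, 1, 2, 3], candidates=[9, 33, 7]))
```

Other network metrics, such as degree centrality or average path length, could be swapped into the same loop without changing the overall reranking structure.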

Relationship to Multimedia Information Systems

Recommendation systems play a crucial role in multimedia information systems by helping users discover relevant and engaging multimedia content. By incorporating a surprise factor into recommendations, the proposed approach can enhance the user experience by suggesting items that users may not have encountered otherwise. This can lead to more diverse and engaging multimedia consumption that caters to the specific preferences and interests of individual users.

Related to Animations, Artificial Reality, Augmented Reality, and Virtual Realities

In the realm of animations, artificial reality, augmented reality, and virtual realities, recommendations play a vital role in guiding users towards immersive and enjoyable experiences. By incorporating surprise into these recommendations, users can be exposed to unexpected and exciting content that expands their horizons and enhances their engagement. This can be especially valuable in these fields where users are often seeking novel and immersive experiences.

Evaluation and Results

The proposed approach was evaluated on two datasets: LastFM listening histories and synthetic Netflix viewing profiles. The results showed that reranking items based on complex network metrics led to a more unexpected and surprising composition of recommendation lists. This indicates that the incorporation of network-level metrics can indeed influence the degree of surprise in recommendations, providing users with a more diverse and engaging set of suggestions.

In conclusion, the addition of a knowledge graph-based layer on top of existing recommender systems can enhance the surprise factor in recommendations. By leveraging the structural properties of the graph and incorporating complex network metrics, the proposed approach provides users with a more diverse and unexpected set of recommendations. This has wide-ranging applications in multimedia information systems, animations, artificial reality, augmented reality, and virtual realities, enhancing user experiences and expanding their horizons.

Read the original article

“AI Turing Test: GPT-4 Passes as Human 54% of the Time”

The recent study conducted on three different AI systems (ELIZA, GPT-3.5, and GPT-4) using a randomized and controlled Turing test has provided valuable insights into the capabilities and limitations of artificial intelligence. The primary objective of this test was to determine if an AI system could successfully deceive human participants into believing they were conversing with another human. The results not only shed light on the level of AI’s progress but also highlight potential consequences and challenges.

Advancements in AI’s Conversational Ability:

The most notable finding of this study is that GPT-4, the most recent AI system, managed to convince participants it was human in 54% of the conversations. This achievement marks a significant milestone because prior to this, no artificial system had ever demonstrated the ability to pass the Turing test in such an extensive and reliable manner.

While GPT-4’s success rate still falls short of how often genuine human interlocutors are judged to be human, its performance clearly surpasses that of ELIZA, a classic AI system from the 1960s, which convinced participants in only 22% of the conversations. This improvement highlights the rapid development of AI technology and the continuous effort to enhance conversational abilities.

Implications for Machine Intelligence:

These results have far-reaching implications for discussions and debates surrounding machine intelligence. They signal that artificial systems are progressing towards human-like conversational skills, challenging and pushing the boundaries of our understanding of intelligence. The ability to convincingly emulate human conversation raises questions about the defining characteristics of human intelligence and suggests that traditional notions of intelligence may need to be reevaluated.

Moreover, the study’s outcome suggests that AI systems could deceive users without being detected. With GPT-4 mimicking human conversation well enough to fool participants more than half the time, there is a pressing need for robust methods of detecting when one is conversing with an AI. This concern is particularly urgent given the potential for misuse and the ethical challenges that could arise when AI systems intentionally deceive users.

Factors Influencing Turing Test Performance:

Upon analyzing the strategies and reasoning employed by participants, it was observed that stylistic and socio-emotional factors played a substantial role in determining whether an AI system was perceived as human. This finding adds nuance to the traditional understanding of intelligence as purely cognitive. It indicates that characteristics such as empathy, humor, and personal expression contribute significantly to the perception of human-like conversation.

Conclusion:

The results of this study provide substantial evidence that current AI systems are rapidly advancing and approaching human-like conversational abilities. The ability of GPT-4 to pass the Turing test over half the time is a remarkable achievement. However, these advancements also raise ethical concerns, particularly regarding the possibility of undetected deception by AI systems. The study’s findings emphasize the need for continued research and development in the field of AI, ensuring that AI systems align with ethical standards and users remain able to distinguish between humans and machines.

Read the original article

“MM-InstructEval: A Comprehensive Framework for Evaluating Multimodal Large Language Models”

arXiv:2405.07229v1 Announce Type: new
Abstract: The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrate both visual and text contexts. Furthermore, tasks that demand reasoning across multiple modalities pose greater challenges and require a deep understanding of multimodal contexts. In this paper, we introduce a comprehensive assessment framework named MM-InstructEval, which integrates a diverse array of metrics to provide an extensive evaluation of the performance of various models and instructions across a broad range of multimodal reasoning tasks with vision-text contexts. MM-InstructEval enhances the research on the performance of MLLMs in complex multimodal reasoning tasks, facilitating a more thorough and holistic zero-shot evaluation of MLLMs. We firstly utilize the “Best Performance” metric to determine the upper performance limit of each model across various datasets. The “Mean Relative Gain” metric provides an analysis of the overall performance across different models and instructions, while the “Stability” metric evaluates their sensitivity to variations. Historically, the research has focused on evaluating models independently or solely assessing instructions, overlooking the interplay between models and instructions. To address this gap, we introduce the “Adaptability” metric, designed to quantify the degree of adaptability between models and instructions. Evaluations are conducted on 31 models (23 MLLMs) across 16 multimodal datasets, covering 6 tasks, with 10 distinct instructions. The extensive analysis enables us to derive novel insights.

As the field of multimodal large language models (MLLMs) continues to advance, there is a growing need for comprehensive evaluation frameworks that can assess the performance of these models in complex multimodal reasoning tasks. The MM-InstructEval framework introduced in this paper aims to fill this gap by providing a diverse set of metrics to evaluate the performance of MLLMs across a broad range of tasks that integrate both visual and text contexts.

Multi-disciplinary Nature

The concepts discussed in this paper have a multi-disciplinary nature, spanning multiple fields such as natural language processing, computer vision, and human-computer interaction. By evaluating the performance of MLLMs in multimodal reasoning tasks, this research contributes to the development of more advanced and comprehensive multimedia information systems. These systems can utilize both textual and visual information to facilitate better understanding, decision-making, and interaction between humans and machines.

Related to Multimedia Information Systems

The MM-InstructEval framework is directly related to the field of multimedia information systems. These systems deal with the retrieval, management, and analysis of multimedia data, including text, images, and videos. By evaluating the performance of MLLMs in multimodal reasoning tasks, this framework enables the development of more effective multimedia information systems that can understand and reason over diverse modalities of data, improving the accuracy and usefulness of information retrieval and analysis tasks.

Related to Animations, Artificial Reality, Augmented Reality, and Virtual Realities

The evaluation of MLLMs in multimodal reasoning tasks has implications for various aspects of animations, artificial reality, augmented reality, and virtual realities. These technologies often rely on both visual and textual information to create immersive and interactive experiences. By improving the performance of MLLMs in understanding and reasoning across multimodal contexts, the MM-InstructEval framework can enhance the quality and realism of animations, artificial reality simulations, and augmented reality applications. It can also enable more intelligent virtual reality environments that can understand and respond to user instructions and queries more accurately and effectively.

Novel Insights from the Evaluation

The extensive analysis conducted using the MM-InstructEval framework on 31 models across 16 multimodal datasets and 6 tasks provides novel insights into the performance of MLLMs in complex reasoning tasks. The “Best Performance” metric helps determine the upper performance limit of each model, giving a baseline for comparison. The “Mean Relative Gain” metric provides an overall analysis of performance across different models and instructions, highlighting the strengths and weaknesses of each. The “Stability” metric evaluates the models’ sensitivity to variations, ensuring robustness. Lastly, the “Adaptability” metric measures the degree of adaptability between models and instructions, shedding light on the interplay between them.
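
As a rough, hypothetical illustration of how such metrics can be computed from raw results, the sketch below operates on a models-by-instructions accuracy matrix for a single dataset. The concrete formulas, in particular the 95%-of-best cutoff used for adaptability, are simplified assumptions and may differ from the definitions in MM-InstructEval.

```python
# Hypothetical, simplified versions of the four metric families computed
# from a (models x instructions) accuracy matrix for one dataset.
import numpy as np

rng = np.random.default_rng(0)
acc = rng.uniform(0.3, 0.9, size=(4, 5))      # 4 models, 5 instructions

best_performance = acc.max(axis=1)            # per-model upper limit
global_mean = acc.mean()
# Mean relative gain: how far each model's average sits above or below
# the overall average, expressed in percent.
mean_relative_gain = 100 * (acc.mean(axis=1) - global_mean) / global_mean
# Stability: lower spread across instructions = less instruction-sensitive.
stability = acc.std(axis=1)
# Adaptability (assumed cutoff): fraction of instructions on which a
# model reaches at least 95% of its own best score.
adaptability = (acc >= 0.95 * best_performance[:, None]).mean(axis=1)

for name, values in [("best", best_performance), ("gain %", mean_relative_gain),
                     ("stability", stability), ("adaptability", adaptability)]:
    print(name, np.round(values, 3))
```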

By considering these metrics and conducting a comprehensive evaluation, researchers and developers can better understand the capabilities and limitations of MLLMs in multimodal reasoning tasks. This knowledge can inform the development of more advanced MLLMs, as well as the design and implementation of multimedia information systems, animations, artificial reality experiences, augmented reality applications, and virtual reality environments.

Read the original article

Levels of AI Agents: From Basic Perception to Advanced Collaboration

AI agents, or artificial intelligence agents, are entities designed to perceive their environment, make decisions, and take actions based on that perception. These agents are often categorized by their level of autonomy and sophistication. Inspired by the six levels of autonomous driving defined by the Society of Automotive Engineers, AI agents can likewise be classified into levels according to their utility and strength.

The Levels of AI Agents

The levels of AI agents are as follows:

  1. Level 0 (L0): At this level, conventional tools handle perception and action. These tools can assist humans in certain tasks but do not possess any independent AI capabilities.
  2. Level 1 (L1): At this level, AI agents use rule-based AI systems. These agents can follow predefined rules and guidelines to make decisions and take actions.
  3. Level 2 (L2): At this level, rule-based AI is replaced by IL/RL-based AI systems. IL refers to imitation learning, where agents learn by observing and imitating human behavior. RL refers to reinforcement learning, where agents learn by trial and error. L2 AI agents also incorporate reasoning and decision-making capabilities.
  4. Level 3 (L3): L3 AI agents utilize LLM-based AI systems, i.e., systems built on large language models, adding reasoning and decision-making on top of the language model. Additionally, L3 AI agents have the ability to set up memory and reflection, allowing them to learn from past experiences and improve future decision-making.
  5. Level 4 (L4): Building upon L3, L4 AI agents facilitate autonomous learning and generalization. These agents have the capability to continuously learn and improve their performance without human intervention. They can adapt to new environments and situations.
  6. Level 5 (L5): At the highest level, L5 AI agents not only possess all the capabilities of L4 agents but also have the ability to exhibit emotions, character, and collaborative behavior with other agents. They can interact and work together with multiple agents to achieve common goals.

Analysis and Expert Insights

The categorization of AI agents into different levels allows us to understand and evaluate the capabilities of these agents. It provides a framework to assess the current state of AI technology and anticipate future advancements.

At present, most AI agents fall under the lower levels of autonomy (L0 to L2). These agents are proficient in specific tasks and can follow predefined rules or learn from human demonstrations. However, they lack the ability to reason, reflect, and adapt to novel situations.

As we move towards higher levels of autonomy (L3 to L5), AI agents become more sophisticated and capable of independent decision-making. L3 agents, with their memory and reflection capabilities, can learn from past experiences and improve their future performance. L4 agents take this a step further by enabling autonomous learning and generalization, allowing them to adapt to new environments and challenges.

The highest level of autonomy, L5, represents the ultimate vision of AI agents, where they possess not only advanced cognitive abilities but also emotional intelligence and social skills. These agents can collaborate and interact with other agents, exhibiting human-like characteristics.

Looking ahead, the development and advancement in AI technologies will likely drive the progression from lower-level agents to higher-level agents. The focus will be on enhancing the reasoning, learning, and decision-making capabilities of AI agents, enabling them to operate in complex and dynamic environments.

It is important to note that while the categorization of AI agents into levels provides a useful framework, the boundaries between these levels may not always be clear-cut. AI technologies are rapidly evolving, and we may witness the emergence of hybrid agents that possess characteristics from multiple levels.

In conclusion, the levels of AI agents provide a roadmap for the development and evaluation of AI technologies, demonstrating the potential for AI agents to become increasingly autonomous, intelligent, and collaborative in the future.

Read the original article

“Novel Perceptual Crack Detection Method for 3D Textured Meshes”

arXiv:2405.06143v1 Announce Type: cross
Abstract: Recent years have witnessed many advancements in the applications of 3D textured meshes. As the demand continues to rise, evaluating the perceptual quality of this new type of media content becomes crucial for quality assurance and optimization purposes. Different from traditional image quality assessment, crack is an annoying artifact specific to rendered 3D meshes that severely affects their perceptual quality. In this work, we make one of the first attempts to propose a novel Perceptual Crack Detection (PCD) method for detecting and localizing crack artifacts in rendered meshes. Specifically, motivated by the characteristics of the human visual system (HVS), we adopt contrast and Laplacian measurement modules to characterize crack artifacts and differentiate them from other undesired artifacts. Extensive experiments on large-scale public datasets of 3D textured meshes demonstrate effectiveness and efficiency of the proposed PCD method in correct localization and detection of crack artifacts. Specifically, we propose a full-reference crack artifact localization method that operates on a pair of input snapshots of distorted and reference 3D objects to generate a final crack map. Moreover, to quantify the performance of the proposed detection method and validate its effectiveness, we propose a simple yet effective weighting mechanism to incorporate the resulting crack map into classical quality assessment (QA) models, which creates significant performance improvement in predicting the perceptual image quality when tested on public datasets of static 3D textured meshes. A software release of the proposed method is publicly available at: https://github.com/arshafiee/crack-detection-VVM

The Importance of Perceptual Crack Detection in 3D Textured Meshes

Advancements in the field of 3D textured meshes have been rapidly increasing, leading to a rise in demand for evaluating the perceptual quality of this type of media content. Ensuring the quality of rendered 3D meshes is crucial for various applications, including virtual reality, gaming, and architectural design. One particular artifact that significantly affects the perceptual quality of these meshes is cracks.

Cracks in rendered 3D meshes are annoying visual artifacts that can distort the overall appearance and realism of the content. Therefore, accurately detecting and localizing these crack artifacts is essential for quality assurance and optimization purposes.

The novel Perceptual Crack Detection (PCD) method proposed in this work aims to address this issue. The authors take inspiration from the human visual system (HVS) and adopt contrast and Laplacian measurement modules to characterize crack artifacts and distinguish them from other undesired artifacts. This approach leverages the unique characteristics of the HVS to improve the accuracy of crack detection and localization.
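
To give a sense of what a Laplacian-based characterization might look like, here is a deliberately simplified, hypothetical sketch that flags crack-like pixels by comparing second-derivative responses in a distorted snapshot against a reference snapshot of the same view. The threshold and the comparison rule are illustrative assumptions, not the authors’ PCD pipeline.

```python
# Deliberately simplified, hypothetical illustration of a Laplacian-based
# crack check: flag pixels whose second-derivative response is much
# stronger in the distorted snapshot than in the reference snapshot of
# the same rendered view (a full-reference setting).
import numpy as np
from scipy import ndimage

def crack_map(distorted_gray, reference_gray, thresh=30.0):
    """Both inputs: 2D float arrays holding grayscale snapshots of the
    same rendered view."""
    excess = np.abs(ndimage.laplace(distorted_gray)) - np.abs(ndimage.laplace(reference_gray))
    return (excess > thresh).astype(np.uint8)   # 1 = candidate crack pixel
```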

Extensive experiments on large-scale public datasets of 3D textured meshes have been conducted to evaluate the effectiveness and efficiency of the proposed PCD method. The results demonstrate that the method successfully localizes and detects crack artifacts, showcasing its potential for integration into quality assessment (QA) models. The authors also propose a weighting mechanism that incorporates the crack map generated by the PCD method into classical QA models, leading to a significant improvement in predicting the perceptual image quality.
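
One simple way such a crack map could be folded into an existing quality score, shown purely as an illustrative assumption (the paper’s actual weighting mechanism may differ), is to penalize the baseline score by the fraction of flagged pixels:

```python
# Illustrative assumption (not the paper's weighting mechanism):
# penalize a baseline quality score by the fraction of pixels flagged
# in the crack map.
def crack_weighted_quality(base_score, crack_map, alpha=0.5):
    crack_ratio = crack_map.mean()              # fraction of flagged pixels
    return base_score * (1.0 - alpha * crack_ratio)
```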

This research highlights the multi-disciplinary nature of the concepts involved. It combines knowledge from computer graphics, human visual perception, and image quality assessment to tackle the specific challenge of crack detection in rendered 3D meshes. The proposed method provides a valuable tool for industry professionals and researchers working in the field of multimedia information systems, animations, artificial reality, augmented reality, and virtual realities.

In conclusion, the introduction of the Perceptual Crack Detection (PCD) method addresses a critical issue in the evaluation of 3D textured meshes. By leveraging the characteristics of the human visual system, this approach effectively detects and localizes crack artifacts, enhancing the overall perceptual quality of rendered 3D meshes. The multi-disciplinary nature of the research makes it relevant to the wider field of multimedia information systems and various applications involving animations, artificial reality, augmented reality, and virtual realities.

Read the original article