by jsendak | Jun 14, 2024 | Computer Science
Large language models have been making waves in natural language processing (NLP) with their impressive performance on a wide range of tasks. However, these models come with high computational demands, which makes deploying them in real-world applications challenging. This is where using CPUs to accelerate inference of large language models becomes crucial.
In their paper, the authors propose a parallelized approach to enhance the throughput of large language models. They achieve this by leveraging the parallel processing capabilities of modern CPU architectures and batching the inference requests. Through extensive evaluation, they demonstrate that their accelerated inference engine provides a substantial improvement in the number of tokens generated per second. Specifically, they report an impressive 18-22x improvement in throughput, which becomes even more significant for longer sequences and larger models.
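The paper's engine itself is not reproduced here, but the core idea of batching incoming requests and spreading whole batches across CPU cores can be sketched in a few lines. The snippet below is a minimal illustration, assuming a hypothetical generate_tokens function as a stand-in for the model's decoding step; it is not the authors' implementation.

```python
# Minimal sketch of batched, parallel CPU inference (not the authors' engine).
# `generate_tokens` is a hypothetical stand-in for a model's decoding step.
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def generate_tokens(batch):
    # Placeholder: run the model's forward pass on a batch of prompts
    # and return one generated continuation per prompt.
    return [f"completion for: {prompt}" for prompt in batch]

def batched(iterable, batch_size):
    """Yield successive fixed-size batches from an iterable of requests."""
    it = iter(iterable)
    while chunk := list(islice(it, batch_size)):
        yield chunk

def serve(requests, batch_size=8, workers=4):
    # Each worker process handles whole batches, so per-request overhead
    # (tokenization, dispatch) is amortized and CPU cores stay busy.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(generate_tokens, batched(requests, batch_size))
    return [output for batch in results for output in batch]

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(32)]
    print(len(serve(prompts)))  # 32 completions
```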
One interesting finding discussed in the paper is the ability to run multiple workers on the same machine with NUMA (non-uniform memory access) node isolation. By doing so, the authors observe an additional 4x improvement in tokens per second, as reflected in Table 2. This scalability is essential for handling high-volume tasks efficiently and can greatly benefit Gen-AI based products and companies.
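The post does not detail how the workers are isolated, but a common way to achieve NUMA node isolation is to launch one worker process per node with numactl, binding both CPU affinity and memory allocations to that node. The sketch below assumes a hypothetical worker.py inference script and an installed numactl; it illustrates the idea rather than the authors' exact setup.

```python
# Launch one inference worker per NUMA node, pinning CPUs and memory with numactl.
# Assumes a hypothetical worker.py script and that numactl is installed.
import subprocess

NUM_NODES = 2  # e.g. a dual-socket machine with one NUMA node per socket

def launch_workers():
    procs = []
    for node in range(NUM_NODES):
        cmd = [
            "numactl",
            f"--cpunodebind={node}",  # run only on this node's cores
            f"--membind={node}",      # allocate memory only on this node
            "python", "worker.py", "--port", str(9000 + node),
        ]
        procs.append(subprocess.Popen(cmd))
    return procs

if __name__ == "__main__":
    workers = launch_workers()
    for p in workers:
        p.wait()
```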
Moreover, the implications of using CPUs for inference go beyond performance gains. The authors highlight the potential environmental benefit of lower power consumption, estimating a 48.9% reduction in CPU usage for inference. This not only makes large language models more sustainable but also demonstrates that production-ready throughput and latency can be achieved while maintaining an eco-friendly approach.
In conclusion, this paper presents a promising approach to address the computational demands of deploying large language models for real-world applications. By leveraging CPUs and implementing parallelization techniques, the authors achieve significant improvements in both throughput and environmental sustainability. These findings pave the way for more efficient and scalable deployment of large language models, opening up exciting possibilities for further advancements in NLP.
Read the original article
by jsendak | Jun 13, 2024 | Computer Science
arXiv:2406.06888v1 Announce Type: new
Abstract: 3D meshes are one of the main components of Virtual Reality applications. However, many network and computational resources are required to process 3D meshes in real-time. A potential solution to this challenge is to dynamically adapt the Level of Detail (LoD) of a 3D mesh based on the object’s position and the user’s viewpoint. In this paper, we conduct a subjective study to investigate users’ quality perception of 3D meshes with dynamic Level of Detail in a Virtual Reality environment. The subjective experiment is carried out with five 3D meshes of different characteristics, four Levels of Detail, and four distance settings. The results of the experiment show that the impact of the dynamic level of detail depends on both the position of the 3D object in the virtual world and the number of vertices of the original mesh. In addition, we present a quality model that can accurately predict the MOS score of a LoD version of a 3D mesh from the number of vertices and the distance from the viewpoint.
Analysis of the Impact of Dynamic Level of Detail on Users’ Quality Perception of 3D Meshes in Virtual Reality
Introduction
Virtual Reality (VR) applications heavily rely on 3D meshes to create immersive and interactive environments. However, processing these meshes in real-time can be resource-intensive. To address this challenge, researchers have explored dynamically adapting the Level of Detail (LoD) of 3D meshes based on the object’s position and the user’s viewpoint. This paper presents the findings of a subjective study that investigates users’ quality perception of 3D meshes with dynamic LoD in a VR environment.
Multi-disciplinary Nature
This research is highly multi-disciplinary, intersecting several fields including multimedia information systems, animations, artificial reality, augmented reality (AR), and virtual reality (VR). The study combines elements of computer graphics, human-computer interaction, and perception psychology to understand how users perceive the quality of 3D meshes in VR environments. By conducting a subjective experiment, the study integrates both technical and human factors to provide valuable insights into optimizing the rendering of 3D meshes in VR applications.
Main Findings
The study utilized five 3D meshes with different characteristics, incorporating four levels of detail and four distance settings. The results of the experiment revealed several interesting findings. Firstly, the impact of dynamic LoD on users’ quality perception varied depending on the position of the 3D object in the virtual world. This suggests that certain areas or scenes in the VR environment may require higher or lower levels of detail to maintain visual fidelity.
Secondly, the study found that the number of vertices in the original mesh significantly influenced the impact of dynamic LoD. Meshes with a higher number of vertices were found to benefit more from dynamic LoD adjustments. This finding implies that allocating computational resources to dynamically adjust LoD is more beneficial for complex and detailed meshes, as opposed to simpler ones.
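In practice, dynamic LoD of this kind is typically implemented by switching to coarser mesh versions as the object moves farther from the viewpoint. The snippet below is a generic illustration with made-up distance thresholds; the study's four LoD versions and four distance settings would define the real values.

```python
# Generic distance-based LoD switching (illustrative thresholds, not the study's).
from dataclasses import dataclass

@dataclass
class MeshLoD:
    level: int        # 0 = full detail, higher = more decimated
    vertex_count: int

def select_lod(lods, distance, thresholds=(2.0, 5.0, 10.0)):
    """Pick a LoD version based on the object's distance from the viewpoint.

    `lods` is assumed to be ordered from finest (level 0) to coarsest.
    """
    for level, limit in enumerate(thresholds):
        if distance <= limit:
            return lods[min(level, len(lods) - 1)]
    return lods[-1]  # farthest objects get the coarsest version

lods = [MeshLoD(0, 100_000), MeshLoD(1, 25_000), MeshLoD(2, 6_000), MeshLoD(3, 1_500)]
print(select_lod(lods, distance=7.5).level)  # -> 2
```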
Quality Model
To further enhance the understanding of users’ quality perception, the researchers developed a quality model. This model accurately predicts the Mean Opinion Score (MOS) of a LoD version of a 3D mesh by considering the number of vertices and the distance from the user’s viewpoint. This model can be valuable in optimizing the rendering pipeline of VR applications, allowing developers to dynamically allocate computational resources based on the characteristics of the 3D meshes and the viewing conditions.
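The paper's fitted model is not reproduced in this post, so the sketch below only illustrates the kind of predictor being described: a function mapping vertex count and viewing distance to an estimated MOS. The functional form and coefficients are placeholders, not the authors' model, and would need to be fitted to the subjective scores.

```python
# Illustrative MOS predictor from vertex count and viewing distance.
# The functional form and coefficients are placeholders, NOT the paper's fitted model;
# in practice they would be obtained by regression against the subjective MOS data.
import math

def predict_mos(num_vertices, distance, a=0.6, b=0.8, c=1.0):
    """Estimate a 1-5 MOS: quality rises with mesh detail and with distance
    (coarse meshes are less objectionable when viewed from far away)."""
    score = 1.0 + a * math.log10(max(num_vertices, 1)) + b * math.log1p(distance) - c
    return max(1.0, min(5.0, score))

print(round(predict_mos(num_vertices=25_000, distance=5.0), 2))
```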
Implications
The findings of this study have significant implications for the field of VR and other related disciplines. By understanding users’ quality perception of 3D meshes with dynamic LoD, researchers and developers can optimize resource allocation, striking a balance between visual fidelity and computational efficiency. This research also opens avenues for exploring adaptive rendering techniques in VR, where the quality of the 3D meshes can dynamically change based on user interaction and the visual context.
Conclusion
In conclusion, this study highlights the importance of dynamic Level of Detail in enhancing users’ quality perception of 3D meshes in VR environments. Through a subjective experiment and the development of a quality model, the authors provide valuable insights into the impact of dynamic LoD on users’ perception and present a means to predict quality scores. These findings contribute to the wider field of multimedia information systems, animations, artificial reality, augmented reality, and virtual realities, shaping the future of interactive and immersive visual experiences.
Read the original article
by jsendak | Jun 13, 2024 | Computer Science
Analysis of the Content
The article highlights the increasing complexity of modern System-on-Chip (SoC) designs driven by technology scaling. With multiple asynchronous clock domains in SoC designs, the complexity is further amplified, making functional verification a critical step to ensure no bugs escape.
The use of a Globally-Asynchronous Locally-Synchronous (GALS) approach in SoC designs helps improve power efficiency. However, this approach also introduces Clock Domain Crossings (CDC), which are susceptible to metastability effects. Metastability arises when a signal crossing into a new clock domain violates the setup or hold time of the capturing flip-flop, leaving its output in an undefined state for a period; if that uncertainty propagates, it can cause bugs or malfunctions.
Conventional verification methods, such as register transfer level (RTL) simulations and static timing analysis, may not fully address these CDC issues, leaving potential verification gaps. As a result, identifying CDC-related bugs becomes time-consuming and can result in expensive silicon re-spins.
Expert Insights
To tackle CDC issues effectively, the article proposes the development of a pragmatic formal verification methodology. This methodology aims to minimize CDC issues by exercising Metastability Injection (MSI) in different CDC paths.
Formal verification is a powerful technique that uses mathematical algorithms to exhaustively analyze the behavior of a design. By injecting metastability in strategic locations, the verification process can explore potential bugs and identify areas of concern that may escape through traditional verification methods.
By proactively injecting metastability, designers can subject their designs to worst-case scenarios and evaluate the behavior of the system under such conditions. This approach enhances the robustness of the design and helps catch potential bugs early in the development phase.
Furthermore, adopting a formal verification methodology for CDC issues can help reduce the time and effort required to identify and fix bugs. Traditional methods often rely on debugging techniques and iterative simulations, which can be time-consuming, especially in complex designs with numerous CDC paths. Formal verification, on the other hand, allows for automatic analysis and the generation of counterexamples, aiding in the root cause analysis of bugs.
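To make the idea of metastability injection more concrete, the toy model below shows how MSI is commonly modeled: when the crossing signal changes, the first synchronizer flop may resolve to its old value for an extra cycle, and the surrounding design must be shown to tolerate that uncertainty. It is written in Python purely for illustration (real flows use SystemVerilog and formal tools) and is not the paper's methodology.

```python
# Toy behavioral model of metastability injection at a clock domain crossing.
# Illustrative only; real methodologies express this in RTL/formal tools.
import random

def cdc_capture(async_signal, cycles, p_metastable=0.3, seed=0):
    """Simulate a 2-flop synchronizer whose first flop may resolve unpredictably."""
    rng = random.Random(seed)
    ff1 = ff2 = async_signal(0)
    captured = []
    for t in range(1, cycles):
        new_val = async_signal(t)
        ff2 = ff1                       # second flop captures the first flop's previous value
        if new_val != ff1 and rng.random() < p_metastable:
            pass                        # injected metastability: ff1 resolves to its old value
        else:
            ff1 = new_val               # normal capture
        captured.append(ff2)            # value seen by the destination clock domain
    return captured

# A downstream checker would assert that the design tolerates this extra cycle
# of uncertainty (no data loss, handshakes still complete, FIFOs do not overflow).
toggling = lambda t: t % 4 < 2          # slowly toggling source-domain signal
print(cdc_capture(toggling, cycles=12))
```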
Future Implications
The development of a pragmatic formal verification methodology for CDC issues is a significant advancement in ensuring the reliability and integrity of SoC designs. As technology continues to evolve, SoCs will become even more complex, and the presence of multiple asynchronous clock domains will be more prevalent.
By laying the foundation for a systematic and efficient verification process, this methodology sets the stage for future advancements in SoC design verification. It brings confidence to designers that they can thoroughly assess and validate complex designs, reducing the risk of bugs escaping into production devices.
As the industry moves towards advanced nodes and new technologies, such as 5G, IoT, and artificial intelligence, the complexity of SoC designs will only increase. Thus, the need for robust and comprehensive verification methodologies, including formal verification for CDC issues, will become even more critical in ensuring the reliability and functionality of these advanced systems.
Read the original article
by jsendak | Jun 12, 2024 | Computer Science
arXiv:2406.05205v1 Announce Type: cross
Abstract: This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
Integrating Images and Text in Histopathology: The CPLIP Approach
Histopathology, the microscopic examination of tissue for signs of disease, plays a crucial role in the diagnosis and treatment of many conditions. The integration of images and text in this field holds great potential for improving the accuracy and efficiency of tasks such as classification and segmentation. The recently proposed Comprehensive Pathology Language Image Pre-training (CPLIP) technique presents a novel approach to aligning images and text in histopathology without the need for ground truth annotations.
One of the key strengths of CPLIP lies in its ability to leverage extensive data in an unsupervised manner. By constructing a pathology-specific dictionary and utilizing language models, textual descriptions can be generated for histopathology images. This creates a valuable bridge between the visual and textual modalities, enabling the seamless retrieval of relevant images for each text snippet using a pre-trained model. Furthermore, CPLIP incorporates a many-to-many contrastive learning method to fine-tune the model, ensuring complex interrelated concepts are accurately aligned across both images and text.
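CPLIP's exact objective is not spelled out in this post, but the flavor of many-to-many contrastive alignment can be conveyed with a simplified multi-positive InfoNCE-style loss, where each image may match several texts and vice versa. The sketch below is a generic stand-in, not the authors' loss.

```python
# Simplified many-to-many contrastive alignment between image and text embeddings.
# This is a generic multi-positive InfoNCE-style sketch, not CPLIP's exact objective.
import torch
import torch.nn.functional as F

def many_to_many_contrastive_loss(img_emb, txt_emb, match, temperature=0.07):
    """img_emb: (N, d), txt_emb: (M, d), match: (N, M) binary matrix where
    match[i, j] = 1 if text j is a valid description of image i."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature            # (N, M) similarities
    log_probs = F.log_softmax(logits, dim=-1)
    # Average log-likelihood over all positive texts of each image.
    pos_per_image = match.sum(dim=-1).clamp(min=1)
    image_to_text = -(log_probs * match).sum(dim=-1) / pos_per_image
    # Symmetric term: texts attending to their positive images.
    log_probs_t = F.log_softmax(logits.t(), dim=-1)
    pos_per_text = match.sum(dim=0).clamp(min=1)
    text_to_image = -(log_probs_t * match.t()).sum(dim=-1) / pos_per_text
    return 0.5 * (image_to_text.mean() + text_to_image.mean())

# Toy usage: random embeddings, each image paired with two positive texts.
img, txt = torch.randn(4, 256), torch.randn(8, 256)
match = torch.zeros(4, 8)
match[torch.arange(4), torch.arange(4) * 2] = 1
match[torch.arange(4), torch.arange(4) * 2 + 1] = 1
print(many_to_many_contrastive_loss(img, txt, match).item())
```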
The multidisciplinary nature of CPLIP makes it a significant contribution to the wider field of multimedia information systems. By integrating both computer vision and natural language processing techniques, CPLIP serves as a powerful tool in bridging the gap between visual and textual data. This not only enhances the interpretability and robustness of vision-language models, but also expands the possibilities for developing advanced applications in the domain of histopathology.
Moreover, CPLIP showcases the potential applications of vision-language models in the realm of augmented reality (AR) and virtual reality (VR). With the ability to seamlessly integrate images and text, CPLIP provides a foundation for creating immersive AR/VR experiences in the field of histopathology. Researchers and developers can leverage this technique to build interactive educational platforms, where users are not only able to observe histopathology images but also receive relevant textual information for a deeper understanding.
The achievements of CPLIP in zero-shot learning scenarios further underline its significance in the field. By outperforming existing methods, CPLIP sets a higher benchmark for the application of vision-language models in histopathology. Its improved interpretability and robustness make it a valuable asset in medical research, diagnosis, and treatment.
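For readers unfamiliar with zero-shot use of vision-language models, the pattern is simple: embed the image and a textual description of each candidate class, then pick the class whose description is closest. The snippet below illustrates that flow with placeholder encoders; it is not CPLIP's pipeline.

```python
# Generic zero-shot classification with an aligned vision-language model:
# an image is assigned to the class whose textual description embeds closest.
# Illustrative only; encode_image/encode_text stand in for a trained model.
import torch
import torch.nn.functional as F

def zero_shot_classify(image, class_prompts, encode_image, encode_text):
    img = F.normalize(encode_image(image), dim=-1)            # (1, d)
    txt = F.normalize(encode_text(class_prompts), dim=-1)     # (C, d)
    scores = (img @ txt.t()).squeeze(0)                       # cosine similarity per class
    return class_prompts[scores.argmax().item()]

# Toy stand-ins for the encoders (random projections), just to show the flow.
d = 128
encode_image = lambda x: torch.randn(1, d)
encode_text = lambda prompts: torch.randn(len(prompts), d)
prompts = ["a patch of benign tissue", "a patch showing tumor tissue"]
print(zero_shot_classify(object(), prompts, encode_image, encode_text))
```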
The availability of the CPLIP code on GitHub further promotes openness and collaboration in the research community. This encourages further exploration and replication of CPLIP, fostering the development of new techniques and ideas in the integration of images and text in histopathology.
In conclusion, the CPLIP technique presents a breakthrough approach to aligning images and text in histopathology. Its unsupervised nature, multidisciplinary integration, and potential applications in augmented and virtual realities make it a significant advancement in the field of multimedia information systems. As researchers delve deeper into the possibilities offered by CPLIP and build upon its foundations, the future of image-text integration in histopathology appears promising and impactful.
Read the original article
by jsendak | Jun 12, 2024 | Computer Science
Biometric authentication is becoming increasingly popular as a secure and convenient way to authenticate individuals. However, there are concerns about the privacy of biometric data and the security of the key exchange process. In this article, a novel biometric-authenticated key exchange protocol is introduced that addresses these concerns.
The protocol allows secure and privacy-preserving key establishment between a stateless biometric sensing system and a “smart” user token that possesses biometric templates of the user. The protocol ensures mutual positive authentication, meaning that both parties positively authenticate each other before exchanging any significant information.
One of the key features of the protocol is that it only exchanges randomized data and cryptographically derived verifiers, without exchanging any significant information regarding the biometric templates or feature vectors. This ensures that even if an attacker intercepts the communication, they cannot gain any useful information.
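The paper's message formats are not given in this summary, so the snippet below is only a generic illustration of the kind of exchange being described: a fresh random challenge answered with a cryptographically derived verifier (here an HMAC), so that nothing about templates or feature vectors crosses the channel. It assumes the two parties already hold a shared secret, which in the paper would come from the biometric-based key derivation; it is not the proposed protocol.

```python
# Generic challenge/verifier exchange: only random data and HMAC-derived
# verifiers cross the channel, never templates or feature vectors.
# Illustrative only -- not the paper's protocol; assumes both sides already
# hold a shared secret (in the paper this would be biometric-derived).
import hmac, hashlib, secrets

def make_challenge():
    return secrets.token_bytes(32)                  # fresh randomness per session

def make_verifier(shared_secret, challenge, context=b"sensor->token"):
    return hmac.new(shared_secret, context + challenge, hashlib.sha256).digest()

def check_verifier(shared_secret, challenge, verifier, context=b"sensor->token"):
    expected = make_verifier(shared_secret, challenge, context)
    return hmac.compare_digest(expected, verifier)  # constant-time comparison

# Toy round trip.
secret = secrets.token_bytes(32)
challenge = make_challenge()
verifier = make_verifier(secret, challenge)
print(check_verifier(secret, challenge, verifier))  # True
```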
The protocol utilizes the BBKDF scheme for feature vector matching, allowing for the comparison of multiple biometric modalities. This means that multiple biometric characteristics, such as fingerprint and iris scan, can be used for authentication.
The protocol also allows for online authentication, where the biometric sensing system can send multiple queries derived from different sensor data samples in one or more rounds. The user token is designed to efficiently handle these queries, minimizing the computational burden on the user token.
One of the key advantages of this protocol is that it does not require user registration in advance. This means that a user can start using the system without any prior enrollment process, making it more user-friendly and convenient.
Finally, the protocol is bidirectionally privacy-preserving. This means that unless mutual authentication is achieved first, neither the biometric sensing system nor the user token can gain any useful information regarding the biometric template or the sensor-data-derived feature vectors. This ensures the privacy of the users’ biometric data.
Overall, this novel biometric-authenticated key exchange protocol addresses the privacy and security concerns associated with biometric authentication. Its use of randomized data and cryptographically derived verifiers, along with its support for multiple biometric modalities, online authentication, and bidirectional privacy preservation, make it a promising solution for secure and privacy-preserving key establishment in biometric systems.
Read the original article
by jsendak | Jun 10, 2024 | Computer Science
arXiv:2406.04632v1 Announce Type: new
Abstract: This paper presents a cross-layer video delivery scheme, StreamOptix, and proposes a joint optimization algorithm for video delivery that leverages the characteristics of the physical (PHY), medium access control (MAC), and application (APP) layers. Most existing methods for optimizing video transmission over different layers were developed individually. Realizing a cross-layer design has always been a significant challenge, mainly due to the complex interactions and mismatches in timescales between layers, as well as the presence of distinct objectives in different layers. To address these complications, we take a divide-and-conquer approach and break down the formulated cross-layer optimization problem for video delivery into three sub-problems. We then propose a three-stage closed-loop optimization framework, which consists of 1) an adaptive bitrate (ABR) strategy based on the link capacity information from PHY, 2) a video-aware resource allocation scheme accounting for the APP bitrate constraint, and 3) a link adaptation technique utilizing the soft acknowledgment feedback (soft-ACK). The proposed framework also supports the collection of the distorted bitstreams transmitted across the link. This allows a more reasonable assessment of video quality compared to many existing ABR methods that simply neglect the distortions occurring in the PHY layer. Experiments conducted under various network settings demonstrate the effectiveness and superiority of the new cross-layer optimization strategy. A byproduct of this study is the development of more comprehensive performance metrics on video delivery, which lays down the foundation for extending our system to multimodal communications in the future. Code for reproducing the experimental results is available at https://github.com/Evan-sudo/StreamOptix.
Cross-Layer Video Delivery: Taking Optimization to the Next Level
As the demand for high-quality video content continues to grow, optimizing video delivery becomes more important than ever. In this paper, the authors present StreamOptix, a cross-layer video delivery scheme that leverages the characteristics of the physical (PHY), medium access control (MAC), and application (APP) layers. By combining these layers and developing a joint optimization algorithm, they aim to overcome the challenges and achieve superior video transmission results.
One of the main challenges in optimizing video delivery is the complex interactions and mismatches between different layers. Each layer has its own objectives and timescales, and finding a unified solution can be difficult. However, the authors take a divide-and-conquer approach by breaking down the cross-layer optimization problem into three sub-problems.
- Adaptive Bitrate (ABR) Strategy: The authors propose an ABR strategy that takes into account the link capacity information from the PHY layer. By adapting the bitrate of the video in real-time, they can ensure optimal utilization of the available resources (a simplified sketch of this kind of decision follows the list).
- Video-Aware Resource Allocation: To meet the bitrate constraint imposed by the application layer, a resource allocation scheme is developed that considers the characteristics of the video content. This ensures that the necessary resources are allocated to each video stream, maximizing the overall quality of the delivered videos.
- Link Adaptation with Soft Acknowledgment Feedback: The authors utilize soft acknowledgment feedback to adjust the link adaptation technique. By considering the distortions occurring in the PHY layer, a more accurate assessment of video quality can be obtained. This is a significant improvement over existing ABR methods that overlook these distortions.
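To make the first stage concrete, a capacity-driven ABR decision can be as simple as choosing the highest rung of the bitrate ladder that fits within a fraction of the PHY-estimated link capacity. The sketch below uses an invented ladder and headroom factor; it is not StreamOptix's actual ABR logic.

```python
# Generic capacity-driven ABR selection (illustrative ladder and margin,
# not StreamOptix's actual algorithm).
BITRATE_LADDER_KBPS = [400, 800, 1500, 3000, 6000]  # available encodings, low to high

def select_bitrate(link_capacity_kbps, ladder=BITRATE_LADDER_KBPS, headroom=0.8):
    """Pick the highest bitrate that fits within a fraction of the
    PHY-reported link capacity, falling back to the lowest rung."""
    budget = link_capacity_kbps * headroom
    feasible = [rate for rate in ladder if rate <= budget]
    return max(feasible) if feasible else ladder[0]

print(select_bitrate(link_capacity_kbps=2500))  # -> 1500
```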
The experiments conducted by the authors under various network settings demonstrate the effectiveness and superiority of the proposed cross-layer optimization strategy. By considering the interplay between the PHY, MAC, and APP layers, StreamOptix achieves better video transmission results compared to methods that optimize each layer individually.
Furthermore, this study contributes to the field of multimedia information systems, animations, artificial reality, augmented reality, and virtual realities. The cross-layer optimization approach presented in this paper can be applied to various multimedia communication scenarios, where delivering high-quality video content is crucial. By developing more comprehensive performance metrics on video delivery, the authors pave the way for future extensions of their system to multimodal communications.
In conclusion, the StreamOptix cross-layer video delivery scheme is a step forward in optimizing video transmission. By considering the characteristics of different layers and developing a joint optimization algorithm, superior video quality can be achieved. This approach not only has implications for multimedia information systems, animations, artificial reality, augmented reality, and virtual realities, but also opens up new possibilities for future multimodal communication systems.
Read the original article