by jsendak | Dec 20, 2024 | Computer Science
arXiv:2412.14176v1 Announce Type: cross
Abstract: Big data visualization – the visual-spatial display of quantitative information culled from huge data sets – is now firmly embedded within the everyday experiences of people across the globe, yet scholarship on it remains surprisingly small. Within this literature, critical theorizations of big data visualizations are rare, as digital positivist perspectives dominate. This paper offers a critical, design-informed perspective on big data visualization in wearable health tracking ecosystems like FitBit. I argue that such visualizations are tools of individualized, neoliberal governance that operate largely through experiences of seduction and addiction to facilitate participation in the corporate capture and monetization of personal information. Exploration of my personal experience of the FitBit ecosystem illuminates this argument and emphasizes the capacity for harm to individuals using these ecosystems, leading to an exploration of the complex professional challenges for user experience designers working on visualizations within the ecosystems of wearables.
The Rise of Big Data Visualization
In today’s digital age, big data visualization has become an integral part of our everyday lives. By visually representing vast amounts of quantitative information from massive data sets, these visualizations provide valuable insights and facilitate data-driven decision-making. However, despite its widespread use, scholarship and critical analysis of big data visualization remain limited.
One prevailing viewpoint within the field of big data visualization is the digital positivist perspective, which focuses on the technical aspects and objective analysis of data. However, this perspective fails to offer a critical examination of the societal implications and ethical considerations associated with big data visualizations.
The Critical Perspective on Big Data Visualization
In this paper, the author presents a design-informed perspective on big data visualization within a specific context: wearable health tracking ecosystems like FitBit. The author argues that these visualizations serve as tools of individualized, neoliberal governance. They operate by seducing and addicting users, fostering their participation in the corporate capture and monetization of personal information.
This critical perspective sheds light on the potential harm that individuals may experience while using these ecosystems. By focusing on experiences of seduction and addiction, the author highlights the manipulation and exploitation of personal data for corporate gain.
The Interdisciplinary Nature of the Topic
This analysis of big data visualization in wearable health tracking ecosystems demonstrates the multi-disciplinary nature of the concept. It encompasses aspects of technology, design, social sciences, and ethics. It considers the technical design of visualizations, the power dynamics between corporations and individuals, and the ethical implications of data capture and monetization. By examining these intersecting fields, this research offers a holistic understanding of big data visualization beyond its technical foundations.
Implications for Multimedia Information Systems
Big data visualization plays a central role in multimedia information systems. It provides a visual avenue for presenting complex information to users, enhancing their understanding and decision-making processes. However, this critical analysis reveals the potential risks associated with this technology. Designers of multimedia information systems should consider the ethical dimensions of data capture, as well as the potential for user manipulation and exploitation.
Link to Animation, Artificial Reality, Augmented Reality, and Virtual Realities
Although this paper does not specifically discuss animation, artificial reality, augmented reality, or virtual realities, these concepts are closely related. The techniques used in big data visualization can also be applied to these immersive technologies, and combining the two could offer users even deeper insights and engagement.
The Future of Big Data Visualization
As big data continues to grow exponentially, the field of visualization will become increasingly vital. It is crucial for researchers and designers to continue exploring the critical implications and potential harms associated with big data visualizations. By adopting a design-informed, ethical perspective, we can ensure that these visualizations benefit individuals and society as a whole without succumbing to the seductive allure of corporate capture and monetization.
“In a world where big data rules supreme, critical perspectives on visualization are essential to guide the responsible and ethical use of personal information. Understanding the intricate relationship between technology and society is key to designing meaningful and empowering visualizations.”
Read the original article
by jsendak | Dec 12, 2024 | AI
arXiv:2412.07806v1 Announce Type: cross
Abstract: Ulcerative Colitis (UC) is an incurable inflammatory bowel disease that leads to ulcers along the large intestine and rectum. The increase in the prevalence of UC coupled with gastrointestinal physician shortages stresses the healthcare system and limits the care UC patients receive. A colonoscopy is performed to diagnose UC and assess its severity based on the Mayo Endoscopic Score (MES). The MES ranges between zero and three, wherein zero indicates no inflammation and three indicates that the inflammation is markedly high. Artificial Intelligence (AI)-based neural network models, such as convolutional neural networks (CNNs), are capable of analyzing colonoscopies to diagnose and determine the severity of UC by modeling colonoscopy analysis as a multi-class classification problem. Prior research for AI-based UC diagnosis relies on supervised learning approaches that require large annotated datasets to train the CNNs. However, creating such datasets necessitates that domain experts invest a significant amount of time, rendering the process expensive and challenging. To address the challenge, this research employs self-supervised learning (SSL) frameworks that can efficiently train on unannotated datasets to analyze colonoscopies and aid in diagnosing UC and its severity. A comparative analysis with supervised learning models shows that SSL frameworks, such as SwAV and SparK, outperform supervised learning models on the LIMUC dataset, the largest publicly available annotated dataset of colonoscopy images for UC.
The article discusses the challenges faced in diagnosing and assessing the severity of ulcerative colitis (UC), an inflammatory bowel disease that causes ulcers in the large intestine and rectum. The increasing prevalence of UC, coupled with a shortage of gastrointestinal physicians, has put a strain on the healthcare system and limited the care received by UC patients. Currently, a colonoscopy is performed to diagnose UC and assess its severity using the Mayo Endoscopic Score (MES). Artificial Intelligence (AI)-based neural network models, specifically convolutional neural networks (CNNs), have shown promise in analyzing colonoscopies to diagnose and determine the severity of UC. However, previous research has relied on supervised learning approaches that require large annotated datasets, which are expensive and time-consuming to create. To overcome this challenge, this study explores the use of self-supervised learning (SSL) frameworks that can efficiently train on unannotated datasets to analyze colonoscopies and aid in the diagnosis of UC and its severity. The researchers compare SSL frameworks, such as SwAV and SparK, with supervised learning models on the LIMUC dataset, the largest publicly available annotated dataset of colonoscopy images for UC. The results show that SSL frameworks outperform supervised learning models, offering a potential solution to the limitations in UC diagnosis and severity assessment.
Ulcerative Colitis (UC) is a debilitating inflammatory bowel disease that affects millions of people worldwide. The increasing prevalence of UC, coupled with a shortage of gastrointestinal physicians, has put a strain on the healthcare system and limited the care that UC patients receive. Diagnosis and assessment of UC severity are crucial for providing appropriate treatment, and traditionally, a colonoscopy has been the primary tool for this purpose.
In recent years, Artificial Intelligence (AI) has made significant strides in the field of medical imaging analysis, and colonoscopy analysis is no exception. Convolutional Neural Networks (CNNs) have shown promising results in diagnosing and determining the severity of UC by modeling colonoscopy analysis as a multi-class classification problem. However, a major limitation of existing AI-based UC diagnosis models is the requirement for large annotated datasets to train the CNNs.
Creating such datasets, which involve labeling thousands of colonoscopy images, requires extensive time and effort from domain experts, making the process expensive and challenging. To address this challenge, researchers have turned to self-supervised learning (SSL) frameworks, which can efficiently train on unannotated datasets.
SSL frameworks leverage the inherent structure of unannotated data to learn useful representations of the images. With this self-supervised approach, AI models can effectively analyze colonoscopy images and aid in diagnosing UC and assessing its severity.
One such SSL framework that has shown promising results in UC diagnosis is SwAV, short for “Swapping Assignments between Views.” SwAV creates multiple augmented views of each colonoscopy image, computes soft cluster assignments (codes) for one view, and trains the network to predict those codes from another view. This swapped-prediction objective forces the network to learn meaningful representations of the colonoscopy images, even without explicit annotations.
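To make the swapped-prediction objective concrete, here is a minimal PyTorch sketch of a SwAV-style loss. It is an illustration of the idea rather than the authors’ implementation; the batch size, feature dimension, and prototype count are placeholder assumptions, and the embeddings would normally come from a CNN encoder applied to two augmented views of each image.

```python
# Minimal SwAV-style swapped-prediction loss (illustrative sketch).
import torch
import torch.nn.functional as F

def sinkhorn(scores, eps=0.05, iters=3):
    """Turn similarity scores into soft cluster assignments (codes)."""
    Q = torch.exp(scores / eps).T          # shape (K prototypes, B samples)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K   # normalize rows
        Q /= Q.sum(dim=0, keepdim=True); Q /= B   # normalize columns
    return (Q * B).T                        # (B, K), each row sums to 1

def swav_loss(z1, z2, prototypes, temp=0.1):
    """Codes computed from one view supervise predictions from the other."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    p1, p2 = z1 @ prototypes.T, z2 @ prototypes.T   # prototype similarities
    with torch.no_grad():                           # codes carry no gradient
        q1, q2 = sinkhorn(p1), sinkhorn(p2)
    return -(q2 * F.log_softmax(p1 / temp, dim=1)).sum(1).mean() \
           -(q1 * F.log_softmax(p2 / temp, dim=1)).sum(1).mean()

# Dummy embeddings standing in for encoder outputs of two augmented views.
B, D, K = 32, 128, 64
prototypes = F.normalize(torch.randn(K, D), dim=1)
print(swav_loss(torch.randn(B, D), torch.randn(B, D), prototypes))
```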
Another SSL framework called SparK has also demonstrated impressive performance in UC diagnosis. SparK brings BERT-style masked image modeling to convolutional networks: random patches of each colonoscopy image are masked, the visible patches are encoded with sparse convolutions, and the network is trained to reconstruct the masked regions. Recovering the missing content forces the network to capture the visual structure of the images without any labels.
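Below is a deliberately simplified sketch of the masked-image-modeling idea. Real SparK encodes only the visible patches with sparse convolutions; in this toy version the masked patches are simply zeroed out, and the tiny encoder/decoder and image sizes are placeholder assumptions.

```python
# Toy masked-image-modeling step in the spirit of SparK (simplified).
import torch
import torch.nn as nn

PATCH = 16
def random_patch_mask(imgs, ratio=0.6):
    """Per-patch binary mask (1 = masked), upsampled to pixel resolution."""
    B, _, H, W = imgs.shape
    grid = (torch.rand(B, 1, H // PATCH, W // PATCH) < ratio).float()
    return grid.repeat_interleave(PATCH, 2).repeat_interleave(PATCH, 3)

encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(64, 3, 3, padding=1)

imgs = torch.randn(8, 3, 224, 224)            # dummy colonoscopy frames
mask = random_patch_mask(imgs)
recon = decoder(encoder(imgs * (1 - mask)))   # encode visible content only
loss = ((recon - imgs) ** 2 * mask).sum() / mask.sum()  # loss on masked pixels
loss.backward()
```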
A comparative analysis of SSL frameworks with traditional supervised learning models on the LIMUC dataset, the largest publicly available annotated dataset of colonoscopy images for UC, showed that SSL frameworks outperformed supervised learning models. This demonstrates the potential of SSL frameworks in improving the accuracy and efficiency of UC diagnosis and severity assessment.
The use of SSL frameworks in UC diagnosis not only reduces the reliance on large annotated datasets but also opens up avenues for scaling up the deployment of AI models in healthcare settings. By utilizing unannotated datasets, healthcare institutions can analyze a larger volume of colonoscopy images, leading to faster and more accurate diagnoses for patients.
In conclusion, the application of self-supervised learning frameworks, such as SwAV and SparK, in analyzing colonoscopy images for UC diagnosis and severity assessment holds great promise. These frameworks enable AI models to learn from unannotated datasets, reducing the need for extensive manual annotations and accelerating the diagnosis process. By leveraging the power of AI and SSL, we can revolutionize the way UC is diagnosed and improve patient care in the face of the increasing prevalence of this disease.
The paper “Ulcerative Colitis Diagnosis and Severity Assessment using Self-Supervised Learning on Colonoscopy Images” addresses the challenge of diagnosing and assessing the severity of ulcerative colitis (UC) using artificial intelligence (AI)-based neural network models. UC is a chronic inflammatory bowel disease that affects the large intestine and rectum, causing ulcers. The increasing prevalence of UC, coupled with a shortage of gastrointestinal physicians, puts a strain on the healthcare system and limits the care received by UC patients.
Traditionally, UC diagnosis and severity assessment have relied on colonoscopy, a procedure that involves visually examining the colon and rectum using a flexible tube with a camera. The severity of UC is typically assessed using the Mayo Endoscopic Score (MES), which ranges from zero to three, with zero indicating no inflammation and three indicating severe inflammation.
In recent years, AI and machine learning techniques have shown promise in assisting with UC diagnosis and severity assessment. Specifically, convolutional neural networks (CNNs) have been used to analyze colonoscopy images and classify them into different severity levels of UC. However, previous research in this area has relied on supervised learning, which requires large annotated datasets to train the CNNs. Creating such datasets is time-consuming and expensive, as it involves domain experts manually labeling a large number of colonoscopy images.
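As a rough sketch of this supervised setup, whose annotation cost motivates the move to SSL, the snippet below fine-tunes a torchvision ResNet to predict the four MES classes. The backbone choice, hyperparameters, and dummy tensors are illustrative assumptions; in the paper’s setting the backbone would instead be initialized from SSL pretraining.

```python
# Hedged sketch: fine-tuning a CNN backbone for 4-class MES prediction.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)          # load pretrained weights here
model.fc = nn.Linear(model.fc.in_features, 4)  # MES classes 0, 1, 2, 3

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(16, 3, 224, 224)          # dummy colonoscopy frames
labels = torch.randint(0, 4, (16,))            # dummy MES annotations

optimizer.zero_grad()
loss = criterion(model(images), labels)        # one supervised training step
loss.backward()
optimizer.step()
```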
To overcome this challenge, the authors of this paper propose the use of self-supervised learning (SSL) frameworks for UC diagnosis and severity assessment. SSL is a type of machine learning where the model learns from unannotated data without the need for explicit labels. The advantage of SSL is that it can leverage large amounts of unannotated data, which is more readily available compared to annotated datasets.
The researchers evaluated two SSL frameworks, SwAV and SparK, on the LIMUC dataset, which is the largest publicly available annotated dataset of colonoscopy images for UC. They compared the performance of these SSL frameworks with supervised learning models. The results showed that the SSL frameworks outperformed the supervised learning models in terms of UC diagnosis and severity assessment.
This research is significant as it addresses the limitations of previous approaches to AI-based UC diagnosis and severity assessment. By leveraging SSL frameworks, which can efficiently train on unannotated datasets, the time and cost involved in creating annotated datasets can be significantly reduced. This opens up the possibility of scaling up AI-based UC diagnosis and severity assessment, making it more accessible and cost-effective in real-world healthcare settings.
Moving forward, it would be interesting to see how these SSL frameworks perform on larger and more diverse datasets. Additionally, future research could explore the integration of these AI models into clinical practice, considering factors such as interpretability, validation, and regulatory considerations. Overall, this study highlights the potential of AI and SSL in revolutionizing the diagnosis and assessment of UC, ultimately improving patient care and outcomes.
Read the original article
by jsendak | Dec 10, 2024 | DS Articles
Looking for DIY examples for acquiring a foundation for efficiently visualizing data in Python? Then this tutorial is for you.
Long Term Implications and Possible Future Developments in Efficient Data Visualization in Python
The tech world today acknowledges the unrivaled utility of Python in various domains, especially in data visualization. With rising data volumes, the need to understand and synthesize this information is paramount. Accordingly, proficient use of Python for data visualization has key long-term implications for both individuals and organizations.
Implications and Developments
Mastering Python for data visualization holds immense professional advantages and numerous growth opportunities. Given the accelerating accumulation of big data, the ability to articulate insights in visual form is a high-demand skill. Guided by comprehensive tutorials and DIY examples, more individuals will gain data manipulation skills, in turn driving the evolution of Python’s data analysis packages.
In the broader landscape, businesses and organizations stand to benefit immensely from detailed and efficient data visualizations. Clear, well-designed visualizations aid data-driven decision-making, boosting organizational efficiency and productivity. As more organizations appreciate the role Python plays here, we can anticipate more investment in better data handling and visualization packages for the language.
Actionable Insights for Future Progress
To prepare for these developments and implications, consider the following recommendations:
- Enroll in Python courses: Start by gaining fluency in this language, especially focusing on its data analysis and visualization packages.
- Explore DIY Tutorials: Use DIY tutorials to get hands-on experience. This not only enhances your understanding but also prepares you to handle real-world situations; the short sketch after this list shows a minimal starting point.
- Invest in Data Analysis Tools: Organizations should invest in data analysis and visualization tools that accommodate Python. This will enable them to leverage the power of this language to reap benefits in data handling.
- Stay Current: The tech world evolves rapidly. Ensure you continually upgrade your Python skills to stay in line with the latest trends and updates in data visualization.
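As a minimal starting point for the hands-on practice recommended above, the snippet below loads a CSV with pandas and draws a labeled line chart with matplotlib. The file name and the “date” and “revenue” columns are placeholders for your own data.

```python
# Minimal pandas + matplotlib example (placeholder file and columns).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv", parse_dates=["date"])  # replace with your file

fig, ax = plt.subplots(figsize=(8, 4))
df.groupby("date")["revenue"].sum().plot(ax=ax)      # daily totals as a line
ax.set_xlabel("Date")
ax.set_ylabel("Revenue")
ax.set_title("Daily revenue")
fig.tight_layout()
plt.show()
```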
Embracing Python’s data visualization capabilities promises to revolutionize the way we handle data. From individual growth prospects to business-level efficiencies, the advantages are compelling. As we keep ourselves updated and prepared, we also contribute to this field’s evolution.
Read the original article
by jsendak | Dec 3, 2024 | AI
arXiv:2411.18657v1 Announce Type: new
Abstract: Automated visualization recommendations (vis-rec) help users to derive crucial insights from new datasets. Typically, such automated vis-rec models first calculate a large number of statistics from the datasets and then use machine-learning models to score or classify multiple visualization choices to recommend the most effective ones, as per the statistics. However, state-of-the-art models rely on a very large number of expensive statistics, and therefore using such models on large datasets becomes infeasible due to prohibitively large computational time, limiting the effectiveness of such techniques on most real-world complex and large datasets. In this paper, we propose a novel reinforcement-learning (RL) based framework that takes a given vis-rec model and a time budget from the user and identifies the best set of input statistics that would be most effective while generating the visual insights within the given time budget, using the given model. Using two state-of-the-art vis-rec models applied on three large real-world datasets, we show the effectiveness of our technique in significantly reducing time-to-visualize with a very small amount of introduced error. Our approach is about 10X faster compared to baseline approaches that introduce similar amounts of error.
Automated Visualization Recommendations in Data Analysis
Automated visualization recommendations have become indispensable tools in data analysis, helping users to extract crucial insights from complex datasets. These recommendations are generated by models that calculate numerous statistics from the dataset and then employ machine learning algorithms to score and classify various visualization options, suggesting the most effective ones based on the statistics. However, existing models heavily rely on a large number of computationally expensive statistics, making them impractical for analyzing large datasets. As a result, these techniques often fail to provide efficient and effective visualization recommendations for real-world complex datasets.
To overcome this limitation, the authors propose a novel framework based on reinforcement learning (RL) to optimize visualization recommendations within a given time budget. The user provides a vis-rec model and a predefined time budget, and the RL algorithm identifies the most effective set of input statistics for generating visual insights within the given time constraints.
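The paper’s RL formulation is more involved, but the sketch below conveys the core loop under stated assumptions: a hypothetical epsilon-greedy agent repeatedly assembles a subset of statistics whose total cost fits the time budget, observes how well the vis-rec model does with that subset, and updates per-statistic value estimates. The `cost` and `reward` callables are hypothetical stand-ins for profiling a statistic and scoring the model’s recommendations.

```python
# Hedged sketch of budgeted statistic selection with an epsilon-greedy agent.
import random

def select_statistics(stats, cost, reward, budget, episodes=200, eps=0.2):
    value = {s: 0.0 for s in stats}     # running value estimate per statistic
    count = {s: 0 for s in stats}
    best, best_r = set(), float("-inf")
    for _ in range(episodes):
        chosen, spent, remaining = set(), 0.0, set(stats)
        while remaining:
            # Explore a random statistic or exploit the highest-valued one.
            s = (random.choice(sorted(remaining)) if random.random() < eps
                 else max(remaining, key=value.get))
            remaining.discard(s)
            if spent + cost(s) <= budget:   # respect the user's time budget
                chosen.add(s)
                spent += cost(s)
        r = reward(chosen)                  # vis-rec quality with this subset
        for s in chosen:                    # incremental mean of the reward
            count[s] += 1
            value[s] += (r - value[s]) / count[s]
        if r > best_r:
            best, best_r = set(chosen), r
    return best

# Hypothetical usage with made-up costs and a placeholder reward function.
stats = ["mean", "variance", "skew", "cardinality", "entropy"]
costs = {"mean": 1, "variance": 2, "skew": 4, "cardinality": 3, "entropy": 6}
print(select_statistics(stats, costs.get, lambda c: len(c) / 5, budget=8))
```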
The multi-disciplinary nature of this research is evident in the integration of machine learning, data analysis, and reinforcement learning techniques. By combining these different fields, the authors aim to improve the efficiency and effectiveness of automated visualization recommendations.
Experimental Results
In order to evaluate their proposed framework, the authors conducted experiments using two state-of-the-art vis-rec models on three large real-world datasets. The results demonstrated the effectiveness of their technique in significantly reducing the time required to generate visualizations, while introducing only a small amount of error.
Compared to baseline approaches that introduce similar amounts of error, the proposed RL-based framework was found to be approximately 10 times faster. This substantial reduction in computational time makes it feasible to apply automated visualization recommendations on large and complex datasets, thus enhancing the usefulness of these techniques in real-world scenarios.
Future Directions
This research opens up several avenues for further exploration. Firstly, there is scope to investigate different reinforcement learning algorithms and their impact on the optimization of visualization recommendations. Additionally, examining the applicability of the proposed framework to different types of datasets and vis-rec models could provide valuable insights.
Furthermore, exploring the potential of incorporating domain knowledge and user preferences into the RL framework could lead to more personalized and context-aware visualization recommendations. By considering the unique characteristics of each dataset and the specific needs of users, the framework can generate recommendations that align with domain-specific requirements.
Overall, this research sheds light on the importance of efficient visualization recommendation techniques and introduces a promising approach using reinforcement learning. By addressing the computational challenges associated with large datasets, this framework paves the way for more effective and scalable automated visualization recommendations in diverse domains.
Read the original article
by jsendak | Nov 28, 2024 | Computer Science
arXiv:2411.17704v1 Announce Type: new
Abstract: Data visualizations are inherently rhetorical, and therefore bias-laden visual artifacts that contain both explicit and implicit arguments. The implicit arguments depicted in data visualizations are the net result of many seemingly minor decisions about data and design from inception of a research project through to final publication of the visualization. Data workflow, selected visualization formats, and individual design decisions made within those formats all frame and direct the possible range of interpretation, and the potential for harm of any data visualization. Considering this, it is imperative that we take an ethical approach to the creation and use of data visualizations. Therefore, we have suggested an ethical data visualization workflow with the dual aim of minimizing harm to the subjects of our study and the audiences viewing our visualization, while also maximizing the explanatory capacity and effectiveness of the visualization itself. To explain this ethical data visualization workflow, we examine two recent digital mapping projects, Racial Terror Lynchings and Map of White Supremacy Mob Violence.
The Rhetoric and Ethics of Data Visualizations
Data visualizations play a crucial role in conveying information and insights in various fields, including multimedia information systems, animations, artificial reality, augmented reality, and virtual realities. In recent years, there has been a growing recognition that these visual artifacts are not just neutral representations of data but are inherently biased and persuasive in nature.
In their insightful article, the authors highlight the implicit arguments embedded within data visualizations. They suggest that every decision made, from data selection to design choices, shapes the range of interpretations and potential harms that may arise from the visualization. In this context, an ethical approach becomes imperative to minimize harm to both the subjects of study and the audiences viewing the visualizations.
An Ethical Data Visualization Workflow
The authors propose an ethical data visualization workflow that aims to balance the explanatory capacity and effectiveness of the visualization while minimizing harm. This workflow involves thoughtful consideration of every stage of the visualization process, ensuring transparency, fairness, and accuracy in the presentation of the data.
- Data Workflow: The authors emphasize the importance of careful data curation and selection. This involves critically assessing the sources, biases, and limitations of the data, as well as considering potential harm to individuals or communities represented in the visualization.
- Visualization Formats: Choosing the appropriate visualization format is crucial for effective communication. The authors suggest considering the context, audience, and goals of the visualization, while also acknowledging the potential consequences of different formats on interpretation and perception.
- Design Decisions: Design choices within the selected visualization format play a significant role in shaping the narrative and potential biases in the visualization. The authors recommend a critical examination of design elements such as color, scale, and labeling to ensure accuracy, fairness, and empathy; a small sketch after this list illustrates such choices.
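As a small illustration of such choices, the sketch below uses matplotlib with a colorblind-safe Okabe-Ito palette, a zero baseline so that differences are not visually exaggerated, and explicit labels. The categories and values are placeholders, not data from the case studies.

```python
# Illustrative design decisions: accessible colors, honest zero baseline.
import matplotlib.pyplot as plt

categories = ["Region A", "Region B", "Region C"]   # placeholder data
values = [120, 135, 128]

fig, ax = plt.subplots(figsize=(6, 4))
# Okabe-Ito colors stay distinguishable under common color-vision deficiencies.
ax.bar(categories, values, color=["#0072B2", "#E69F00", "#009E73"])
ax.set_ylim(0, max(values) * 1.1)   # zero baseline avoids overstating gaps
ax.set_ylabel("Documented events")
ax.set_title("Events by region, with source noted in the caption")
fig.tight_layout()
plt.show()
```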
Case Studies: Racial Terror Lynchings and Map of White Supremacy Mob Violence
To illustrate the application of the proposed ethical data visualization workflow, the authors examine two recent digital mapping projects: Racial Terror Lynchings and Map of White Supremacy Mob Violence. These case studies shed light on how ethical considerations can influence the design and presentation of data visualizations related to sensitive topics.
Multidisciplinarity is a key aspect of this article as it integrates concepts and insights from various fields. The authors draw upon principles of rhetoric, ethics, information systems, and visualization design to formulate the ethical data visualization workflow. This interdisciplinary approach is essential in understanding the complex nature of data visualizations and addressing the ethical challenges they present.
In the wider field of multimedia information systems, animations, artificial reality, augmented reality, and virtual realities, the concept of ethical data visualization has significant implications. As these technologies continue to evolve, data visualizations become more immersive, interactive, and influential. This underscores the need for ethical considerations that go beyond surface-level design choices and delve into the underlying implications and potential harm caused by these visualizations.
By emphasizing the ethical dimensions of data visualizations, this article serves as a valuable resource for practitioners, researchers, and designers in the multimedia field. It prompts critical reflection on the biases, power dynamics, and responsibility associated with creating and using data visualizations, ultimately aiming to foster more accountable and impactful visual representations.
“Data visualizations are powerful tools that can shape our understanding of the world. By approaching their creation and use through an ethical lens, we can strive to create visualizations that not only inform but also respect the subjects they represent and engage with.”
Read the original article