Renal Cell Carcinoma subtyping: learning from multi-resolution localization


arXiv:2411.09471v1. Abstract: Renal Cell Carcinoma is typically asymptomatic in its early stages for many patients. This leads to late diagnosis of the tumor, when the likelihood of a cure is lower, and makes the mortality rate of Renal Cell Carcinoma high relative to its incidence rate. To increase the chance of survival, fast and correct categorization of the tumor subtype is paramount. Nowadays, computerized methods based on artificial intelligence represent an interesting opportunity to improve the productivity and objectivity of microscopy-based Renal Cell Carcinoma diagnosis. Nonetheless, their exploitation is hampered by the paucity of annotated datasets, which are essential for the proficient training of supervised machine learning technologies. This study investigates a novel self-supervised training strategy for machine learning diagnostic tools, based on the multi-resolution nature of histological samples. We aim to reduce the need for annotated data without significantly reducing the accuracy of the tool. We demonstrate the classification capability of our tool on a whole slide imaging dataset for Renal Cancer subtyping, and we compare our solution with several state-of-the-art classification counterparts.
Introduction:

Renal Cell Carcinoma (RCC) is a type of cancer that often goes undetected in its early stages, resulting in a late diagnosis and a higher mortality rate. To improve the chances of survival, it is crucial to accurately categorize the subtype of the tumor quickly. Artificial intelligence (AI) and computerized methods offer a promising opportunity to enhance the productivity and objectivity of RCC diagnosis. However, the lack of annotated datasets has hindered the full utilization of these technologies. In this study, we investigate a novel self-supervised training strategy for machine learning diagnostic tools, leveraging the multi-resolution nature of histological samples. Our goal is to reduce the reliance on annotated datasets without compromising the accuracy of the tool. We demonstrate the classification capability of our tool using a dataset of whole slide images for RCC subtyping and compare our solution with various state-of-the-art classification methods.

Exploring Novel Approaches to Improve Renal Cell Carcinoma Diagnosis

Renal Cell Carcinoma (RCC), a type of kidney cancer, is a silent killer. At its early stages, many patients show no symptoms, leading to a delayed diagnosis and lower chances of successful treatment. The mortality rate of RCC is alarmingly high compared to its incidence rate, highlighting the urgent need for improved diagnostic methods.

In recent years, artificial intelligence (AI) and computerized methods have emerged as promising avenues for enhancing the accuracy and efficiency of RCC diagnosis through microscopy. These technologies have the potential to revolutionize the field, but their progress has been hindered by the scarcity of annotated datasets necessary for training supervised machine learning models.

Our study aims to tackle this challenge and present a novel self-supervised training strategy for machine learning diagnostic tools. We leverage the multi-resolution nature of histological samples to reduce the reliance on annotated datasets without compromising the accuracy of the tool.

To validate our approach, we conducted experiments using a comprehensive whole slide imaging dataset for RCC subtyping. We compared the performance of our solution with various state-of-the-art classification counterparts to gauge its efficacy.

The results were promising. Our self-supervised training strategy exhibited high classification capability, accurately categorizing RCC subtypes. Furthermore, our solution not only reduced the need for annotated datasets but also maintained or even enhanced the diagnostic accuracy compared to existing methods.

The key innovation behind our approach lies in leveraging the multi-resolution characteristics of histological samples. By training the model to discern subtle differences in various resolutions, our tool becomes more adept at distinguishing between different tumor subtypes without explicitly relying on annotated training data.
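The abstract does not spell out the pretext task, but one plausible reading of "multi-resolution" self-supervision can be sketched in a few lines of Python: patches cut from different pyramid levels of a slide carry a free "which magnification?" label, with no pathologist annotation required. The pooling method, patch size, and level factors below are illustrative, not taken from the paper.

```python
import numpy as np

def multires_pretext_pairs(slide, patch=32, levels=(1, 2, 4)):
    """Build (patch, magnification-label) pairs from one slide.

    Each level downsamples the slide by the given factor; the pretext
    task is to predict which magnification a patch was taken from,
    so no pathologist annotations are needed.
    """
    pairs = []
    for label, factor in enumerate(levels):
        # naive average-pool downsampling by `factor`
        h, w = slide.shape[0] // factor, slide.shape[1] // factor
        low = slide[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))
        for y in range(0, low.shape[0] - patch + 1, patch):
            for x in range(0, low.shape[1] - patch + 1, patch):
                pairs.append((low[y:y + patch, x:x + patch], label))
    return pairs

rng = np.random.default_rng(0)
slide = rng.random((128, 128))        # stand-in for a grayscale WSI region
pairs = multires_pretext_pairs(slide)
print(len(pairs), pairs[0][0].shape)  # patches with free magnification labels
```

In a real pipeline these (patch, label) pairs would pretrain an image encoder, which is then fine-tuned on the small annotated subset for subtype classification.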

This breakthrough has significant implications for the future of RCC diagnosis. With the reduction in reliance on annotated datasets, the adoption of AI-based diagnostic tools becomes more feasible on a broader scale. This would enable faster and more accurate diagnosis of RCC, greatly improving the prognosis and survival rates of affected patients.

Nevertheless, there are still challenges that need to be addressed. The robustness and generalizability of our self-supervised training strategy need to be further validated on larger and more diverse datasets. Additionally, efforts should be made to ensure the seamless integration of AI-based diagnostic tools into existing clinical workflows and regulatory frameworks.

In conclusion, our study introduces a new perspective into the field of RCC diagnosis by proposing a self-supervised training strategy based on the multi-resolution nature of histological samples. This innovative approach opens up exciting possibilities for the development of AI-enabled diagnostic tools that can significantly improve the prognosis and treatment outcomes for RCC patients. With further research and refinement, we can pave the way for a future where RCC is detected early, treated effectively, and lives are saved.

The article discusses the challenges in diagnosing Renal Cell Carcinoma (RCC) at an early stage due to its asymptomatic nature. Late diagnosis of RCC leads to lower curability likelihood and higher mortality rates. The authors propose the use of computerized methods based on artificial intelligence (AI) to improve the productivity and objectivity of RCC diagnosis using microscopy. However, one of the major obstacles in implementing AI-based diagnostic tools is the lack of annotated datasets required for training supervised machine learning models.

To address this issue, the study introduces a novel self-supervised training strategy for machine learning diagnostic tools, leveraging the multi-resolution nature of histological samples. By utilizing the inherent information in the different resolutions of the samples, the researchers aim to reduce the dependence on annotated datasets without compromising the accuracy of the tool.

The authors demonstrate the classification capability of their tool on a dataset of whole slide images for RCC subtyping. They also compare their solution with several state-of-the-art classification methods to evaluate its performance against existing approaches.

This study is significant as it addresses a critical need in the field of RCC diagnosis. The lack of annotated datasets has been a major bottleneck in the development and deployment of AI-based diagnostic tools for RCC. By proposing a self-supervised training strategy, the authors offer a potential solution to this problem, enabling the development of accurate and efficient diagnostic tools.

The use of a whole slide imaging dataset for RCC subtyping is also noteworthy. Whole slide imaging provides a comprehensive view of the tissue sample, allowing for detailed analysis and classification. Comparing their solution with state-of-the-art methods further validates the effectiveness of the proposed approach.

Moving forward, it would be interesting to see how this self-supervised training strategy can be applied to other types of cancer diagnosis. Additionally, expanding the dataset and conducting further validation studies with larger cohorts of patients would strengthen the findings of this study. Moreover, exploring the potential integration of other AI techniques, such as deep learning and image segmentation, could enhance the accuracy and efficiency of RCC diagnosis even further. Overall, this study paves the way for advancements in the field of AI-based diagnostic tools for RCC and potentially other types of cancer as well.
Read the original article

“Analyzing Shanghai Composite Index with Time Series Machine Learning”


[This article was first published on DataGeeek, and kindly contributed to R-bloggers.]



Shanghai Composite does not seem to be at an ideal point for entry.

Source code:

library(tidyverse)
library(tidyquant)
library(timetk)
library(tidymodels)
library(modeltime)
library(workflowsets)

#Shanghai Composite Index (000001.SS)
df_shanghai <-
  tq_get("000001.SS", from = "2015-09-01") %>%
  tq_transmute(select = close,
               mutate_fun = to.monthly,
               col_rename = "sse") %>%
  mutate(date = as.Date(date))

#Splitting
split <-
  df_shanghai %>%
  time_series_split(assess = "1 year",
                    cumulative = TRUE)

df_train <- training(split)
df_test <- testing(split)

#Time series cross validation for tuning
df_folds <- time_series_cv(df_train,
                           initial = 77,
                           assess = 12)


#Preprocessing
rec <-
  recipe(sse ~ date, data = df_train) %>%
  step_mutate(date_num = as.numeric(date)) %>%
  step_date(date, features = "month") %>%
  step_dummy(date_month, one_hot = TRUE) %>%
  step_normalize(all_numeric_predictors())


rec %>%
  prep() %>%
  bake(new_data = NULL) %>% view()


#Model
mod <-
  arima_boost(
    min_n = tune(),
    learn_rate = tune(),
    trees = tune()
  ) %>%
  set_engine(engine = "auto_arima_xgboost")

#Workflow set
wflow_mod <-
  workflow_set(
    preproc = list(rec = rec),
    models = list(mod = mod)
  )


#Tuning and evaluating the model on all the samples
grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )

grid_results <-
  wflow_mod %>%
  workflow_map(
    seed = 98765,
    resamples = df_folds,
    grid = 10,
    control = grid_ctrl
  )


#Accuracy of the grid results
grid_results %>%
  rank_results(select_best = TRUE,
               rank_metric = "rmse") %>%
  select(Models = wflow_id, .metric, mean)


#Finalizing the model with the best parameters
best_param <-
  grid_results %>%
  extract_workflow_set_result("rec_mod") %>%
  select_best(metric = "rmse")


wflw_fit <-
  grid_results %>%
  extract_workflow("rec_mod") %>%
  finalize_workflow(best_param) %>%
  fit(df_train)


#Calibrate the model to the testing set
calibration_boost <-
  wflw_fit %>%
  modeltime_calibrate(new_data = df_test)

#Accuracy of the finalized model
calibration_boost %>%
  modeltime_accuracy(metric_set = metric_set(mape, smape))



#Predictive intervals
calibration_boost %>%
  modeltime_forecast(actual_data = df_shanghai %>%
                       filter(date >= last(date) - months(12)),
                     new_data = df_test) %>%
  plot_modeltime_forecast(.interactive = FALSE,
                          .legend_show = FALSE,
                          .line_size = 1.5,
                          .color_lab = "",
                          .title = "Shanghai Composite Index") +
  geom_point(aes(color = .key)) +
  labs(subtitle = "Monthly Data<br><span style = 'color:darkgrey;'>Predictive Intervals</span><br><span style = 'color:red;'>Point Forecast Line</span>") +
  scale_x_date(breaks = c(make_date(2023,11,1),
                          make_date(2024,5,1),
                          make_date(2024,10,1)),
               labels = scales::label_date(format = "%b'%y"),
               expand = expansion(mult = c(.1, .1))) +
  ggthemes::theme_wsj(
    base_family = "Roboto Slab",
    title_family = "Roboto Slab",
    color = "blue",
    base_size = 12) +
  theme(legend.position = "none",
        plot.background = element_rect(fill = "lightyellow", color = "lightyellow"),
        plot.title = element_text(size = 24),
        axis.text = element_text(size = 16),
        plot.subtitle = ggtext::element_markdown(size = 20, face = "bold"))


Analysis of The Shanghai Composite Index Using Time Series Machine Learning

The original article analyzes the Shanghai Composite Index (000001.SS) using time-series machine learning. It relies on several R libraries: tidyverse, tidyquant, timetk, tidymodels, modeltime, and workflowsets, combining a boosted ARIMA (Auto-Regressive Integrated Moving Average) model with time-series cross-validation for hyperparameter tuning.

Long-term Implications and Possible Developments

The Implications of Time-Series Analysis on Financial Markets

Time-series machine learning techniques offer profound implications on financial markets. These methods provide a systematic analysis of stock rates over time to make predictions, subsequently offering investors and traders an informed decision-making tool. By utilizing machine learning techniques, the article suggests that the Shanghai Composite Index may not be at an ideal entry point for investors.

The Future of Machine Learning in Stock Market Prediction

Machine learning models for stock market prediction are likely to grow in sophistication and accuracy. With more data being collected every day, the training sets for these models are expanding, enhancing their predictive capabilities. Future developments may see models capable of considering myriad factors influencing stock movements, such as geopolitical events, technological advancements, and even shifts in investor sentiment based on news and social media trends.

Actionable Insights

Given the drawbacks of subjective judgement and the benefits of machine learning for stock market predictions, influencers in the financial world should consider implementing these technologies more extensively. Traditional forecasting methods lack a systematic process and are susceptible to human error.

For Data Scientists

Consider incorporating the techniques mentioned in the article in your models. Doing so will not only assist you in making financial forecasts but also help you gain a deeper understanding of model tuning and cross-validation. Always ensure you are keeping up to date with the latest methodologies in the field, given the rapid evolution of machine learning.

For Investors

Take advantage of the stock market predictions made with machine learning to guide decision-making. However, it is crucial to understand that these predictions are not foolproof. Always make sure to use them as a tool amongst a wider collection of analysis methods, including fundamental and technical analysis.

For Business Leaders And Managers

Invest in technology and personnel capable of leveraging machine learning for stock analysis. Having these capabilities in-house will allow for more flexible, timely, and contextually appropriate forecasts to guide strategic financial decisions.

Read the original article

Uncovering Hidden Knowledge in Graphs: Enhancing RAG-based LLMs


Keys to leverage hidden knowledge relationships in graphs to improve the performance of RAG-based LLMs

Long-Term Implications of Leveraging Hidden Knowledge Relationships in Graphs

The practice of leveraging hidden relationships within graphs to enhance the performance of Retrieval-Augmented Generation (RAG)-based Large Language Models (LLMs) carries promising potential for improving data analytics. As data scientists continue to delve into untapped areas of AI, the insights derived from these graphs could significantly alter processes within machine learning.

The Future of RAG-based LLMs

Looking ahead, we can anticipate rapid advancements in the field of machine learning, fueled in part by RAG-based LLMs. The ability to unearth hidden relationships within complex data sets can lead to more powerful predictions, enhanced algorithms, and improved overall efficiency within computerized system architectures.

These developments represent only the beginning of what is possible within the field. As computational rigor intensifies and capacity to process complex data sets improves, the efficiency and accuracy of machine learning models will also increase, widening the gap between traditional data analysis methods and large-scale language models.
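As a toy illustration of the idea (the article itself gives no code), hidden relationships can be surfaced by expanding a retrieved entity with its multi-hop graph neighbours before the context is handed to the LLM. The graph, entities, and hop count below are invented for the sketch; a production system would sit on a vector index plus a graph store.

```python
# Minimal sketch: enrich a retrieved node's context with its graph
# neighbours before prompting an LLM. Edges are (relation, target) pairs.
knowledge_graph = {
    "RCC": [("is_a", "kidney cancer"), ("diagnosed_by", "histopathology")],
    "histopathology": [("uses", "whole slide imaging")],
}

def expand_context(seed, hops=2):
    """Follow edges up to `hops` steps to surface hidden relationships."""
    facts, frontier = [], [seed]
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for relation, target in knowledge_graph.get(node, []):
                facts.append(f"{node} {relation} {target}")
                nxt.append(target)
        frontier = nxt
    return facts

context = expand_context("RCC")
print(context)  # flat facts to prepend to the LLM prompt
```

The second hop pulls in "whole slide imaging" even though it shares no edge with the seed entity, which is exactly the kind of hidden relationship plain vector retrieval tends to miss.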

Actionable Steps Moving Forward

  • Invest in RAG-Based LLMs: Industries involved in data sciences and AI should consider investing more in RAG-based LLMs. These tools extract hidden knowledge from the architecture of graphs and have shown promise in their initial applications.
  • Focus on Training: AI professionals should be equipped with the necessary training to leverage these complex tools. This includes understanding how to work with RAG-based LLMs, how to interpret their outputs, and how to implement their findings within their organizational contexts.
  • Promote Research and Development: There are still many hidden facets within the world of RAG-based LLMs waiting to be explored. Institutions should promote academic and industrial research to discover new ways of applying these concepts, thereby achieving breakthroughs in machine learning and AI.

“The future of AI and machine learning is dependent on the exploration and understanding of complex tools like RAG-based LLMs. These models have the potential to revolutionize our approaches to data science if used effectively.”

In conclusion, the future implications of utilizing RAG-based LLMs are expansive. By investing in these technologies, prioritizing professional training, and promoting research, we can push the frontier of AI and machine learning further than imagined. The potential for growth and innovation in this field is limitless, and industries should be prepared for the implications of these advances.

Read the original article

CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP



Machine unlearning (MU) has emerged as an innovative technique to selectively erase certain data from trained models, eliminating the need for extensive retraining. This article explores the advancements and challenges in the field of MU, highlighting its potential to enhance privacy, mitigate bias, and improve model performance. Despite notable progress, researchers are still grappling with the complexities of MU, including the identification of relevant data, the development of efficient unlearning algorithms, and the potential impact on model interpretability. By delving into these core themes, this article sheds light on the promising future of machine unlearning and its implications for the evolving landscape of artificial intelligence.

Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in developing MU techniques, there are underlying themes and concepts that deserve exploration in a new light, accompanied by innovative solutions and ideas.

The Ethical Dimension

One of the underlying themes in machine unlearning is the ethical dimension. As AI becomes more integrated into our lives, it is crucial to consider the impact of biased or erroneous data on trained models. MU presents an opportunity to rectify these issues by selectively removing problematic data points. However, the ethical responsibility falls on developers to ensure a fair and unbiased process of unlearning.

To address this, innovative solutions can be implemented that require developers to thoroughly analyze the removed data and question its potential biases. An algorithm could be designed to identify patterns of discrimination or misinformation within the data, flagging them for human review. This human oversight would ensure that the unlearning process aligns with ethical guidelines and promotes fairness.

Privacy and Data Protection

Another crucial theme in machine unlearning is privacy and data protection. As we entrust AI systems with more personal information, the ability to selectively unlearn sensitive data becomes imperative. MU provides a solution by allowing the removal of individual data points, enabling a balance between retaining model accuracy and safeguarding privacy.

Innovative ideas for data protection in MU could involve a combination of encryption techniques and differential privacy. Encrypted machine unlearning would allow for secure removal of specific data points without compromising privacy. Additionally, integrating differential privacy mechanisms during unlearning would add an extra layer of protection by ensuring that individual data points cannot be re-identified.

Dynamic and Continual Learning

Machine unlearning also raises the concept of dynamic and continual learning. Traditional machine learning models are trained on static datasets, limiting their ability to adapt and evolve as new data emerges. MU opens up possibilities for incorporating continual learning methodologies, allowing models to unlearn outdated or irrelevant data on the fly.

An innovative solution in this realm could be the development of an adaptive unlearning framework. This framework would analyze the relevance and accuracy of data over time, enabling continuous model refinement through targeted unlearning. By unlearning outdated data and focusing on recent and more relevant information, models can better adapt to changing circumstances and improve their performance in real-world applications.

Conclusion: Machine unlearning is an emerging field that presents exciting opportunities for improving the fairness, privacy, and adaptability of AI systems. By exploring the ethical dimension, prioritizing privacy and data protection, and incorporating dynamic learning methodologies, we can unlock the true potential of machine unlearning. As developers and researchers delve further into this field, it is paramount to consider these underlying themes and concepts, constantly innovating and iterating on our approaches to create responsible, robust, and continually improving AI systems.

While progress has been made in the field of machine learning, there are still challenges to overcome in the area of machine unlearning. The ability to selectively remove specific data from trained models is crucial for addressing privacy concerns, ensuring regulatory compliance, and handling biases that may have been unintentionally learned by the model.

One of the key advancements in machine unlearning is the development of algorithms that can identify and remove specific instances or patterns from the trained model without the need for retraining. This is particularly important in situations where certain data points or attributes need to be forgotten due to legal or ethical reasons. For example, in the case of personal data that should no longer be stored or used, machine unlearning can help ensure compliance with privacy regulations such as the General Data Protection Regulation (GDPR).
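One widely cited recipe for exactly this, sharded training in the style of SISA, can be sketched as follows: the training set is split into shards with one sub-model each, so forgetting a sample means retraining only the shard that contained it. The nearest-centroid "sub-model" here is a deliberately trivial stand-in for whatever learner a real system would use.

```python
import numpy as np

# Sharded ("SISA"-style) unlearning sketch: one sub-model per data shard,
# predictions by vote. Forgetting a sample only retrains its shard.
rng = np.random.default_rng(1)
X = rng.random((90, 4))
y = rng.integers(0, 2, 90)
shards = np.array_split(np.arange(90), 3)

def fit_shard(idx):
    # trivial sub-model: per-class feature centroids over this shard
    return {c: X[idx][y[idx] == c].mean(axis=0) for c in (0, 1)}

models = [fit_shard(idx) for idx in shards]

def forget(sample_id):
    """Remove one training point by retraining only its shard."""
    for s, idx in enumerate(shards):
        if sample_id in idx:
            shards[s] = idx[idx != sample_id]
            models[s] = fit_shard(shards[s])
            return s

retrained = forget(5)  # sample 5 lives in the first shard
print(retrained)
```

The cost of honouring a deletion request is one shard's worth of retraining instead of the full dataset, which is what makes the approach attractive for GDPR-style erasure.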

Another area where machine unlearning can be beneficial is in addressing biases that may exist within trained models. Biases can arise from the data used for training, reflecting societal prejudices or unequal representation. With machine unlearning, problematic biases can be identified and selectively removed, allowing for fairer and more unbiased decision-making processes.

However, there are several challenges that need to be addressed for machine unlearning to be widely adopted. One challenge is the lack of standardized techniques and frameworks for machine unlearning. As of now, there is no widely accepted approach or set of guidelines for implementing machine unlearning in different scenarios. This makes it difficult for researchers and practitioners to compare and replicate results, hindering the progress in this field.

Another challenge is the potential loss of performance or accuracy when removing specific data from trained models. Removing certain instances or patterns may lead to a decrease in the model’s overall performance, as the removed data might have contributed to the model’s ability to generalize and make accurate predictions. Balancing the removal of unwanted data with the preservation of model performance is a complex task that requires further research and development.

Looking ahead, the future of machine unlearning holds promise. As the field matures, we can expect to see the emergence of standardized techniques and frameworks, enabling more consistent and reliable machine unlearning processes. Additionally, advancements in explainable AI and interpretability will play a crucial role in understanding the impact of data removal on model behavior and performance.

Furthermore, the integration of machine unlearning within larger machine learning pipelines and frameworks will be essential. This will require seamless integration with existing model training and deployment processes, ensuring that machine unlearning becomes an integral part of the machine learning lifecycle.

In conclusion, machine unlearning has gained attention for its potential to selectively remove specific data from trained models. While progress has been made, challenges remain, such as the lack of standardized techniques and the potential loss of performance. However, with further research and development, machine unlearning has the potential to enhance privacy, address biases, and improve the overall fairness and transparency of machine learning systems.
Read the original article

Why AI bias is a cybersecurity risk — and how to address it

Artificial intelligence (AI) has made its way into nearly every facet of running a small or mid-sized business in the modern age. When programmed appropriately, AI can improve response time and catch security threats before they become a problem. Unfortunately, AI inherently comes with the potential for biases and can skew algorithms in strange ways.

Understanding the Role of AI in Businesses and its Potential Biases

Artificial intelligence (AI) has become a significant game-changer in running modern small or mid-sized businesses. Its use has been instrumental in enhancing response times and identifying security threats before they escalate. However, the susceptibility of AI to inherent biases and potential algorithm distortions poses a significant cybersecurity risk.

The Long-Term Implications

As AI becomes more integrated into business operations, the impact of its biases and distortions could have severe long-term implications. The efficiency and reliability of machine learning models could be undermined by these biases, potentially leading to flawed decisions and skewed output that could expose businesses to threats and risk-mitigation challenges. Cybersecurity, a critical component of modern businesses, could be significantly compromised with biased AI, resulting in data breaches, unauthorized access, and loss of customer trust.

Possible Future Developments

Future developments in the field of AI should focus on mitigating these inherent biases. Several opportunities exist for improvements in machine learning models through research and innovation. Sophisticated bias detection and correction algorithms could be the answer to eliminate the potential for skewing, thereby enhancing AI’s credibility and reliability. AI developers could also focus on creating more robust machine learning models that are resistant to biases and less likely to produce distorted algorithms.

Addressing the AI Bias: Actionable Advice

Given the potential risks associated with AI bias, it’s critical for businesses to take practical steps to address this issue.

  1. Training Data Audit: Regularly review and audit the training data used in AI to identify and remove potential biases. Ensure the data is representative of the target audience to prevent skewing.
  2. Blind Training: Consider implementing blind training practices to further minimize biases. This involves hiding potentially bias-inducing information from the AI during the training phase.
  3. Third-Party Software: Utilize bias detection tools and software available in the market. These could provide an extra layer of protection by identifying and rectifying any biases in the AI.
  4. Continuous Monitoring: AI systems should be continuously monitored and updated to ensure their performance remains at its peak and any emerging bias tendencies are promptly addressed.
  5. Employee Education: Ensure your team is educated about the potential biases in AI. Empowering them with the right knowledge can help in the early detection and resolution of bias issues.
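Step 1 above, the training-data audit, can start as simply as comparing positive-label rates across a sensitive attribute: a large gap between groups flags data the model could turn into biased behaviour. The field names and the 0.2 review threshold below are illustrative, not a standard.

```python
# Training-data audit sketch: compare positive-label rates per group.
records = [
    {"group": "A", "label": 1}, {"group": "A", "label": 1},
    {"group": "A", "label": 0}, {"group": "B", "label": 0},
    {"group": "B", "label": 0}, {"group": "B", "label": 1},
]

def parity_gap(rows):
    """Return (max - min) positive-label rate across groups, plus rates."""
    rates = {}
    for g in {r["group"] for r in rows}:
        grp = [r["label"] for r in rows if r["group"] == g]
        rates[g] = sum(grp) / len(grp)
    return max(rates.values()) - min(rates.values()), rates

gap, rates = parity_gap(records)
print(round(gap, 2), gap > 0.2)  # a gap above 0.2 would warrant review
```

The same check applied to a security model's alert labels would reveal, for instance, whether one class of users is systematically over-flagged before the model ever ships.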

Conclusion

While AI has undeniably brought about tremendous improvements in business operations, the potential for biases and skewed algorithms has emerged as a significant concern. By taking proactive steps, businesses can mitigate these risks and continue leveraging the benefits offered by AI technology. Addressing AI biases not only enhances the credibility and effectiveness of AI but also significantly safeguards business operations against potential cybersecurity threats.

Read the original article