by jsendak | Mar 29, 2025 | DS Articles
[This article was first published on R Works, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.)
In February, one hundred fifty-nine new packages made it to CRAN. Here are my Top 40 picks in seventeen categories: Artificial Intelligence, Computational Methods, Data, Ecology, Economics, Health Sciences, Mathematics, Machine Learning, Medicine, Music, Pharma, Psychology, Statistics, Time Series, Utilities, Visualization, and Weather.
Artificial Intelligence
chores v0.1.0: Provides a collection of ergonomic large language model assistants designed to help you complete repetitive, hard-to-automate tasks quickly. After selecting some code, press the keyboard shortcut you’ve chosen to trigger the package app, select an assistant, and watch your chore be carried out. Users can create custom helpers just by writing some instructions in a markdown file. There are three vignettes: Getting started, Custom helpers, and Gallery.

gander v0.1.0: Provides a Copilot completion experience that knows how to talk to the objects in your R environment. ellmer chats are integrated directly into your RStudio and Positron sessions, automatically incorporating relevant context from surrounding lines of code and your global environment. See the vignette to get started.

GitAI v0.1.0: Provides functions to scan multiple Git repositories, pull content from specified files, and process it with LLMs. You can summarize the content, extract information and data, or find answers to your questions about the repositories. The output can be stored in a vector database and used for semantic search or as part of a RAG (Retrieval Augmented Generation) prompt. See the vignette.

Computational Methods
nlpembeds v1.0.0: Provides efficient methods to compute co-occurrence matrices, pointwise mutual information (PMI), and singular value decomposition (SVD), which are especially useful when working with huge databases in biomedical and clinical settings. Functions can be called on SQL databases, enabling the computation of co-occurrence matrices from tens of gigabytes of data, representing millions of patients over tens of years. See Hong (2021) for background and the vignette for examples.
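To make the PMI-plus-SVD pipeline concrete, here is a small base-R sketch of the underlying computation on a toy co-occurrence matrix (illustrative only; it does not use the nlpembeds API and the term names are made up):

```r
# Toy co-occurrence counts for five terms
set.seed(1)
C <- matrix(rpois(25, 5), 5, 5,
            dimnames = list(paste0("t", 1:5), paste0("t", 1:5)))
C <- C + t(C)                              # make counts symmetric

total <- sum(C)
p_ij  <- C / total                         # joint probabilities
p_i   <- rowSums(C) / total
p_j   <- colSums(C) / total
pmi   <- log(p_ij / outer(p_i, p_j))       # pointwise mutual information
ppmi  <- pmax(pmi, 0)                      # positive PMI, common before SVD

sv <- svd(ppmi, nu = 2, nv = 2)
embeddings <- sv$u %*% diag(sv$d[1:2])     # 2-dimensional term embeddings
```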
NLPwavelet v1.0: Provides functions for Bayesian wavelet analysis using individual non-local priors as described in Sanyal & Ferreira (2017) and non-local prior mixtures as described in Sanyal (2025). See README to get started.

pnd v0.0.9: Provides functions to compute numerical derivatives including gradients, Jacobians, and Hessians through finite-difference approximations with parallel capabilities and optimal step-size selection to improve accuracy. Advanced features include computing derivatives of arbitrary order. There are three vignettes on the topics: Compatibility with numDeriv, Parallel numerical derivatives, and Step-size selection.
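As a reminder of what a finite-difference gradient looks like, here is a minimal base-R central-difference sketch (illustrative; it is not the pnd interface and omits the package's step-size selection and parallelism):

```r
# Central-difference gradient of f at x with a fixed step h
num_grad <- function(f, x, h = 1e-6) {
  vapply(seq_along(x), function(i) {
    e <- replace(numeric(length(x)), i, h)   # unit step in coordinate i
    (f(x + e) - f(x - e)) / (2 * h)
  }, numeric(1))
}

f <- function(x) sum(x^2) + prod(x)
num_grad(f, c(1, 2, 3))   # compare with the analytic gradient 2*x + prod(x)/x
```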
rmcmc v0.1.1: Provides functions to simulate Markov chains using the proposal from Livingstone and Zanella (2022) to compute MCMC estimates of expectations with respect to a target distribution on a real-valued vector space. The package also provides implementations of alternative proposal distributions, such as (Gaussian) random walk and Langevin proposals. Optionally, BridgeStan's R interface can be used to specify the target distribution. There is an Introduction to the Barker proposal and a vignette on Adjusting the noise distribution.
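For readers new to the sampling problem rmcmc addresses, here is a bare-bones random-walk Metropolis sampler in base R, one of the alternative proposals mentioned above (a teaching sketch, not the rmcmc API):

```r
# Random-walk Metropolis targeting a standard bivariate normal
set.seed(1)
log_target <- function(x) -0.5 * sum(x^2)

n_iter <- 5000; step <- 0.8
x <- c(0, 0)
draws <- matrix(NA_real_, n_iter, 2)
for (i in seq_len(n_iter)) {
  prop <- x + rnorm(2, sd = step)                          # Gaussian proposal
  if (log(runif(1)) < log_target(prop) - log_target(x)) x <- prop
  draws[i, ] <- x
}
colMeans(draws)   # MCMC estimate of the mean, should be near c(0, 0)
```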

sgdGMF v1.0: Implements a framework to estimate high-dimensional, generalized matrix factorization models using penalized maximum likelihood under a dispersion exponential family specification, including the stochastic gradient descent algorithm with a block-wise mini-batch strategy and an efficient adaptive learning rate schedule to stabilize convergence. All the theoretical details can be found in Castiglione et al. (2024). Also included are the alternated iterative re-weighted least squares and the quasi-Newton method with diagonal approximation of the Fisher information matrix discussed in Kidzinski et al. (2022). There are four vignettes, including introduction and residuals.

Data
acledR v0.1.0: Provides tools for working with data from ACLED (Armed Conflict Location and Event Data). Functions include simplified access to ACLED’s API, methods for keeping local versions of ACLED data up-to-date, and functions for common ACLED data transformations. See the vignette to get started.
Horsekicks v1.0.2: Provides extensions to the classical dataset Death by the kick of a horse in the Prussian Army, first used by Ladislaus von Bortkiewicz in his treatise on the Poisson distribution, Das Gesetz der kleinen Zahlen. Also included are deaths by falling from a horse and by drowning. See the vignette.

OhdsiReportGenerator v1.0.1: Extracts results from the Observational Health Data Sciences and Informatics result database and generates Quarto reports and presentations. See the package guide.
wbwdi v1.0.0: Provides functions to access and analyze the World Bank’s World Development Indicators (WDI) using the corresponding API. WDI provides more than 24,000 country or region-level indicators for various contexts. See the vignette.

Ecology
rangr v1.0.6: Implements a mechanistic virtual species simulator that integrates population dynamics and dispersal to study the effects of environmental change on population growth and range shifts. Look here for background and see the vignette to get started.

Economics
godley v0.2.2: Provides tools to define, simulate, and validate stock-flow consistent (SFC) macroeconomic models by specifying governing systems of equations. Users can analyze how macroeconomic structures affect key variables, perform sensitivity analyses, introduce policy shocks, and visualize resulting economic scenarios. See Godley and Lavoie (2007), Kinsella and O’Shea (2010) for background and the vignette to get started.

Health Sciences
matriz v1.0.1: Implements a workflow that provides tools to create, update, and fill literature matrices commonly used in research, specifically epidemiology and health sciences research. See README to get started.
Mathematics
flint v0.0.3: Provides an interface to FLINT, a C library for number theory that extends GNU MPFR and GNU MP with support for arithmetic in standard rings (the integers, the integers modulo n, and the rational, p-adic, real, and complex numbers) as well as vectors, matrices, polynomials, and power series over rings, and that implements midpoint-radius interval arithmetic in the real and complex numbers. See Johansson (2017) for information on computation in arbitrary precision with rigorous propagation of errors, and see the NIST Digital Library of Mathematical Functions for information on additional capabilities. Look here to get started.
Machine Learning
tall v0.1.1: Implements a general-purpose tool for analyzing textual data as a shiny application with features that include a comprehensive workflow, data cleaning, preprocessing, statistical analysis, and visualization. See the vignette.
Medicine
BayesERtools v0.2.1: Provides tools that facilitate exposure-response analysis using Bayesian methods. These include a streamlined workflow for fitting types of models that are commonly used in exposure-response analysis – linear and Emax for continuous endpoints, logistic linear and logistic Emax for binary endpoints, as well as performing simulation and visualization. Look here to learn more about the workflow, and see the vignette for an overview.
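As a quick illustration of the Emax model named above, here is a tiny base-R sketch of the exposure-response curve (the parameter values are made up; this is not the BayesERtools interface):

```r
# Emax exposure-response relationship: E(C) = E0 + Emax * C / (EC50 + C)
emax_curve <- function(conc, e0, emax, ec50) e0 + emax * conc / (ec50 + conc)

conc <- seq(0, 100, length.out = 200)
plot(conc, emax_curve(conc, e0 = 5, emax = 20, ec50 = 15), type = "l",
     xlab = "Exposure", ylab = "Response",
     main = "Hypothetical Emax curve (E0 = 5, Emax = 20, EC50 = 15)")
```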

PatientLevelPrediction v6.4.0: Implements a framework to create patient-level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features, which can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps et al. (2017). There are fourteen vignettes, including Building Patient Level Prediction Models and Best Practices.

SimTOST v1.0.2: Implements a Monte Carlo simulation approach to estimating sample sizes, power, and type I error rates for bioequivalence trials based on the Two One-Sided Tests (TOST) procedure. Users can model complex trial scenarios, including parallel and crossover designs, intra-subject variability, and different equivalence margins. See Schuirmann (1987), Mielke et al. (2018), and Shieh (2022) for background. There are seven vignettes including Introduction and Bioequivalence Tests for Parallel Trial Designs: 2 Arms, 1 Endpoint.
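To see what the TOST procedure and a Monte Carlo power estimate look like in practice, here is a self-contained base-R sketch for a two-arm parallel design (toy numbers and margins; not the SimTOST API):

```r
# TOST: conclude equivalence if the mean difference is significantly above the
# lower margin AND significantly below the upper margin.
set.seed(42)
lower <- -10; upper <- 10                        # equivalence margins
one_trial <- function(n = 30, delta = 2, sd = 15, alpha = 0.05) {
  ref  <- rnorm(n, mean = 100, sd = sd)
  test <- rnorm(n, mean = 100 + delta, sd = sd)
  p_lo <- t.test(test, ref, mu = lower, alternative = "greater")$p.value
  p_hi <- t.test(test, ref, mu = upper, alternative = "less")$p.value
  max(p_lo, p_hi) < alpha                        # TRUE = equivalence concluded
}

# Monte Carlo power: proportion of simulated trials that conclude equivalence
mean(replicate(2000, one_trial()))
```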

Music
musicXML v1.0.1: Implements tools to facilitate data sonification and create files to share music notation in the musicXML format. Several classes are defined for basic musical objects such as note pitch, note duration, note, measure, and score. Sonification functions map data into musical attributes such as pitch, loudness, or duration. See the blog and Renard and Le Bescond (2022) for examples and the vignette to get started.

Pharma
emcAdr v1.2: Provides computational methods for detecting adverse high-order drug interactions from individual case safety reports using statistical techniques, allowing the exploration of higher-order interactions among drug cocktails. See the vignette.

SynergyLMM v1.0.1: Implements a framework for evaluating drug combination effects in preclinical in vivo studies, which provides functions to analyze longitudinal tumor growth experiments using linear mixed-effects models, perform time-dependent analyses of synergy and antagonism, evaluate model diagnostics and performance, and assess both post-hoc and a priori statistical power. See Demidenko & Miller (2019) for the calculation of drug combination synergy and Pinheiro and Bates (2000) and Gałecki & Burzykowski (2013) for information on linear mixed-effects models. The vignette offers a tutorial.

vigicaen v0.15.6: Implements a toolbox for analyzing the World Health Organization (WHO) pharmacovigilance database, VigiBase, with functions to load data, perform data management, disproportionality analysis, and descriptive statistics. Intended for routine pharmacovigilance use or studies. There are eight vignettes, including basic workflow and routine pharmacovigilance.

Psychology
cogirt v1.0.0: Provides tools to psychometrically analyze latent individual differences related to tasks, interventions, or maturational/aging effects in the context of experimental or longitudinal cognitive research using methods first described by Thomas et al. (2020). See the vignette.

Statistics
DiscreteDLM v1.0.0: Provides tools for fitting Bayesian distributed lag models (DLMs) to count or binary longitudinal response data. Count data are fit using negative binomial regression and binary data using quantile regression; lag contributions are fit via B-splines. See Dempsey and Wyse (2025) for background and README for examples.
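The core distributed-lag idea, constraining the lag coefficients with a B-spline basis, can be sketched in a few lines of (non-Bayesian) R; this toy example uses the splines and MASS packages rather than the DiscreteDLM interface, and the data are simulated:

```r
library(splines)
library(MASS)

set.seed(1)
n <- 200; max_lag <- 14
x <- rnorm(n)                                       # exposure series
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 2)  # overdispersed counts

# Matrix of lagged exposures: column l+1 holds x lagged by l periods
L <- sapply(0:max_lag, function(l) x[(max_lag + 1 - l):(n - l)])
yl <- y[(max_lag + 1):n]

# B-spline basis over the lag dimension forces the lag curve to be smooth
B <- bs(0:max_lag, df = 4)
Z <- L %*% B                                        # reduced design matrix

fit <- glm.nb(yl ~ Z)                               # negative binomial DLM
lag_curve <- B %*% coef(fit)[-1]                    # estimated effect at each lag
```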
oneinfl v1.0.1: Provides functions to estimate one-inflated positive Poisson and one-inflated zero-truncated negative binomial regression models, as well as positive Poisson and zero-truncated negative binomial models, along with marginal effects and their standard errors. The models and applications are described in Godwin (2024). See README for an example.

Time Series
BayesChange v2.0.0: Provides functions for change point detection on univariate and multivariate time series according to the methods presented by Martinez & Mena (2014) and Corradin et al. (2022), along with methods for clustering time-dependent data with common change points. See Corradin et al. (2024). There is a tutorial.

echos v1.0.3: Provides a lightweight implementation of functions and methods for fast and fully automatic time series modeling and forecasting using Echo State Networks. See the vignettes Base functions and Tidy functions.
quadVAR v0.1.2: Provides functions to estimate quadratic vector autoregression models with a strong hierarchy using the Regularization Algorithm under Marginality Principle of Hao et al. (2018), compare their performance with linear models, and construct networks from partial derivatives. See README for examples.

Utilities
aftables v1.0.2: Provides tools to generate spreadsheet publications that follow best practice guidance from the UK government’s Analysis Function. There are four vignettes, including an Introduction and Accessibility.
watcher v0.1.2: Implements an R binding for libfswatch, a file system monitoring library, that enables users to watch files or directories recursively for changes in the background. Log activity or run an R function every time a change event occurs. See the README for an example.
Visualization
jellyfisher v1.0.4: Generates interactive Jellyfish plots to visualize spatiotemporal tumor evolution by integrating sample and phylogenetic trees into a unified plot. This approach provides an intuitive way to analyze tumor heterogeneity and evolution over time and across anatomical locations. The Jellyfish plot visualization design was first introduced by Lahtinen et al. (2023). See the vignette.

xdvir v0.1-2: Provides high-level functions to render LaTeX fragments as labels and data symbols in ggplot2 plots, plus low-level functions to author, produce, and typeset LaTeX documents, and to produce, read, and render DVI files. See the vignette.

Weather
RFplus v1.4-0: Implements a machine learning algorithm that merges satellite and ground precipitation data using Random Forest for spatial prediction, residual modeling for bias correction, and quantile mapping for adjustment, ensuring accurate estimates across temporal scales and regions. See the vignette.
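Of the three steps RFplus combines, quantile mapping is the easiest to illustrate; here is a generic empirical quantile-mapping helper in base R (an illustration of the general technique, not the RFplus implementation, with simulated gauge and satellite series):

```r
# Empirical quantile mapping: adjust satellite estimates so their distribution
# matches gauge observations over a calibration period.
quantile_map <- function(sat_cal, obs_cal, sat_new,
                         probs = seq(0.01, 0.99, by = 0.01)) {
  q_sat <- quantile(sat_cal, probs, na.rm = TRUE)
  q_obs <- quantile(obs_cal, probs, na.rm = TRUE)
  # place each new satellite value on the gauge quantile curve
  approx(x = q_sat, y = q_obs, xout = sat_new, rule = 2, ties = mean)$y
}

# Toy example with a biased "satellite" product
set.seed(1)
obs <- rgamma(500, shape = 2, scale = 5)       # "gauge" precipitation
sat <- obs * 1.3 + rnorm(500, sd = 2)          # biased "satellite" estimates
corrected <- quantile_map(sat, obs, sat)
c(raw_bias = mean(sat - obs), corrected_bias = mean(corrected - obs))
```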
SPIChanges v0.1.0: Provides methods to improve the interpretation of the Standardized Precipitation Index under changing climate conditions. It implements the nonstationary approach of Blain et al. (2022) to detect trends in rainfall quantities and quantify the effect of such trends on the probability of a drought event occurring. There is an Introduction and a vignette Monte Carlo Experiments and Case Studies.

Continue reading: February 2025 Top 40 New CRAN Packages
Analysis and Future Implications of February 2025 New CRAN Packages
Over the course of February 2025, 159 new packages made it to the Comprehensive R Archive Network (CRAN). With immense advancements in dynamic fields such as Artificial Intelligence, Genomics, and Machine Learning, this represents another leap into a future powered by groundbreaking data-analytics tools. But what does this mean for users of these packages? What longer-term implications do these releases hold?
Artificial Intelligence-Based Packages
Artificial Intelligence has shown significant advancements recently. The newly released packages, such as chores v0.1.0, gander v0.1.0, and GitAI v0.1.0, showcase versatile features like language model assistants, Copilot completion experience, and functions to scan Git repositories. Considering the increasing importance of automating tasks and the capabilities these packages offer, they’re expected to gain more popularity.
Actionable Advice:
Artificial Intelligence is an ever-evolving field. Stay updated with the latest advancements like large language models and more efficient programming. Learning to use new packages like chores, gander, and GitAI could help improve efficiency in automating tasks.
Computational Methods-Based Packages
New tools like nlpembeds v1.0.0, NLPwavelet v1.0, and rmcmc v0.1.1 are milestones in the evolution of computational methods. Such packages demonstrate the community's focus on computational efficiency and modeling, even with very large data sets.
Actionable Advice:
Consider updating your skills to effectively handle large volumes of data and make sense of complex data sets using packages like nlpembeds and rmcmc.
Data Packages
Twelve new data packages, including acledR v0.1.0 and Horsekicks v1.0.2, provide the community with preloaded datasets and functions to handle specific types of data efficiently. They give researchers the potential to undertake complex studies without the hassle of preprocessing big data.
Actionable Advice:
Stay updated with the latest data packages available on CRAN to improve the efficiency of your studies and to provide a robust framework for your research.
Machine Learning Packages
A new package like tall v0.1.1 offers a user-friendly approach to analyzing textual data with machine learning. This shows a clear trend toward user-friendly, visual, and interactive tools for applied machine learning in textual data analysis.
Actionable Advice:
As a data scientist or analyst, consider deploying machine learning tools like tall in your work. It would streamline the process of extracting insights from raw textual data.
Visualization Packages
Visualization tools like jellyfisher v1.0.4 and xdvir v0.1-2 provide intuitive ways to analyze and present data, which is a crucial aspect of data analysis.
Actionable Advice:
Should you be presenting complex data sets to an audience, consider using such visualization tools to simplify consumption and interpretation.
Long-term Implications and Future Developments
CRAN’s latest package releases suggest exciting developments in the fields of Artificial Intelligence, Computational Methods, Machine Learning, Data, and Visualization. Given the pace at which these fields are growing, professionals relying on data analysis and researchers should anticipate even more sophisticated tools and computations in the pipeline. This underscores a clear need to keep up with, and be able to deploy, these constantly evolving tools.
Actionable Advice:
Continually learning and applying newly released packages should be a part of your long-term strategy. This will ensure you stay ahead in the data science world, leveraging the most effective and sophisticated tools at your disposal.
Read the original article
by jsendak | Nov 12, 2024 | Computer Science
arXiv:2411.05794v1 Announce Type: new
Abstract: This study investigates the evaluation of multimedia quality models, focusing on the inherent uncertainties in subjective Mean Opinion Score (MOS) ratings due to factors like rater inconsistency and bias. Traditional statistical measures such as Pearson’s Correlation Coefficient (PCC), Spearman’s Rank Correlation Coefficient (SRCC), and Kendall’s Tau (KTAU) often fail to account for these uncertainties, leading to inaccuracies in model performance assessment. We introduce the Constrained Concordance Index (CCI), a novel metric designed to overcome the limitations of existing metrics by considering the statistical significance of MOS differences and excluding comparisons where MOS confidence intervals overlap. Through comprehensive experiments across various domains including speech and image quality assessment, we demonstrate that CCI provides a more robust and accurate evaluation of instrumental quality models, especially in scenarios of low sample sizes, rater group variability, and restriction of range. Our findings suggest that incorporating rater subjectivity and focusing on statistically significant pairs can significantly enhance the evaluation framework for multimedia quality prediction models. This work not only sheds light on the overlooked aspects of subjective rating uncertainties but also proposes a methodological advancement for more reliable and accurate quality model evaluation.
Expert Analysis: Evaluating Multimedia Quality Models and Overcoming Subjective Rating Uncertainties
As multimedia systems continue to evolve, it becomes increasingly important to develop reliable and accurate quality models to assess the performance of these systems. However, traditional statistical measures often fall short in accounting for the uncertainties inherent in subjective Mean Opinion Score (MOS) ratings, leading to inaccurate model assessments. This study proposes a novel metric, the Constrained Concordance Index (CCI), to address these limitations and provide a more robust evaluation framework for multimedia quality prediction models.
One of the key challenges in evaluating multimedia quality models is the presence of rater inconsistency and bias. MOS ratings can vary significantly among different raters, making it difficult to assess the true performance of a model. Additionally, rater bias can introduce systematic errors into the evaluation process. The CCI takes these factors into account by considering the statistical significance of MOS differences and excluding comparisons where MOS confidence intervals overlap. This approach helps to mitigate the impact of rater variability and provides a more accurate assessment of model performance.
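One way to picture the idea is as a concordance index restricted to statistically distinguishable pairs. The R sketch below, using hypothetical vectors mos, ci_lo, ci_hi (per-item MOS and confidence bounds), and pred (model predictions), captures that interpretation; the paper's exact formulation may differ:

```r
# Illustrative sketch of the Constrained Concordance Index idea: count only
# item pairs whose MOS confidence intervals do not overlap, and check whether
# the model orders those pairs the same way as the MOS does.
cci <- function(mos, ci_lo, ci_hi, pred) {
  concordant <- 0; counted <- 0
  n <- length(mos)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      overlap <- ci_lo[i] <= ci_hi[j] && ci_lo[j] <= ci_hi[i]
      if (!overlap) {                       # statistically distinguishable pair
        counted <- counted + 1
        concordant <- concordant +
          (sign(mos[i] - mos[j]) == sign(pred[i] - pred[j]))
      }
    }
  }
  concordant / counted                      # NaN if no pair is distinguishable
}
```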
Moreover, this study demonstrates the effectiveness of the CCI through comprehensive experiments across various domains, including speech and image quality assessment. The CCI’s ability to provide reliable evaluations even in scenarios of low sample sizes, rater group variability, and restricted ranges makes it a valuable tool for assessing the performance of multimedia quality prediction models.
This research also highlights the multi-disciplinary nature of the concepts involved in evaluating multimedia quality models. The study draws on statistical methods, such as Pearson’s Correlation Coefficient (PCC), Spearman’s Rank Correlation Coefficient (SRCC), and Kendall’s Tau (KTAU), to analyze the limitations of these traditional measures and build upon them. By incorporating subjectivity and addressing uncertainties in subjective ratings, the CCI brings together concepts from psychology, human perception, and statistical analysis to enhance quality model evaluation.
From a broader perspective, this work aligns with the field of multimedia information systems, which aims to develop techniques for organizing, processing, and retrieving multimedia data. Quality models play a crucial role in assessing the effectiveness of these systems, and the CCI offers a methodological advancement that can contribute to more reliable and accurate evaluations. Furthermore, the concepts presented in this study have implications beyond traditional multimedia systems. Animations, artificial reality, augmented reality, and virtual realities are all areas where multimedia quality is of utmost importance, and the CCI can provide a valuable tool for evaluating and improving the user experience.
In conclusion, the evaluation of multimedia quality models is a complex task that requires an understanding of statistical analysis, human perception, and subjective rating uncertainties. The Constrained Concordance Index (CCI) introduced in this study offers a promising solution to overcome the limitations of traditional metrics and enhance the evaluation framework. By focusing on statistically significant pairs and considering the inherent uncertainties in subjective ratings, this research makes a valuable contribution to the field of multimedia information systems and has the potential to impact various domains, such as animations, artificial reality, augmented reality, and virtual realities.
Read the original article
by jsendak | Oct 21, 2024 | DS Articles
[This article was first published on business-science.io, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.)
Hey guys, welcome back to my R-tips newsletter. Supply chain management is essential in making sure that your company’s business runs smoothly. One of the key elements is managing inventory efficiently. Today, I’m going to show you how to estimate inventory and forecast inventory levels using the planr package in R. Let’s dive in!
Table of Contents
Here’s what you’ll learn in this article:

Get the Code (In the R-Tip 084 Folder)
SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on October 23rd
Inside the workshop I’ll share how I built a Machine Learning Powered Production Shiny App with ChatGPT (extends this data analysis to an insane production app):

What: ChatGPT for Data Scientists
When: Wednesday October 23rd, 2pm EST
How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free chatgpt for data scientists workshop.
Price: Does Free sound good?
How To Join:
Register Here
R-Tips Weekly
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?
Here is the link to get set up. 
How to Project Inventories with the planr Package
Why Inventory Projections Are Crucial to Supply Chain Management
Supply chain management is all about balancing supply and demand to ensure that inventory levels are optimized. Overestimating demand leads to excess stock, while underestimating it causes shortages. Accurate inventory projections allow businesses to plan ahead, make data-driven decisions, and avoid costly errors like over-buying inventory or getting into a stock-outage and having no inventory to meet demand.
Enter the planr Package
The planr package simplifies inventory management by projecting future inventory levels based on supply, demand, and current stock levels.

Supply Chain Analysis with planr
Let’s take a look at how to use planr to optimize your supply chain. We’ll go through a quick tutorial to get you started using planr to project and manage inventories.
Step 1: Load Libraries and Data
First, you need to install the required packages and load the libraries. Run this code:
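The original code chunk isn't reproduced in this excerpt, so here is a minimal sketch of what Step 1 might look like: load the libraries used later in the tutorial and build a small demand/supply table with the columns described below (the numbers are made up, apart from the 6570-unit opening stock from the data description):

```r
# Hedged sketch of Step 1 (the article's original chunk is not shown here)
# install.packages(c("planr", "dplyr", "timetk", "reactable"))  # if needed
library(planr)    # inventory projections
library(dplyr)    # data manipulation
library(timetk)   # time series plotting

demand_data <- tibble::tibble(
  DFU     = "Item 000001",
  Period  = seq(as.Date("2024-01-01"), by = "month", length.out = 6),
  Demand  = c(1200, 1500, 1100, 1700, 1600, 1400),
  Opening = c(6570, 0, 0, 0, 0, 0),   # opening stock only in the first period
  Supply  = c(0, 2000, 0, 2500, 0, 2000)
)
```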


Get the Code (In the R-Tip 087 Folder)
This data contains supply and demand information for various demand fulfillment units (DFUs) over a period of time.
- Demand Fulfillment Unit (DFU): A product identifier, here labeled as “Item 000001” (there are 10 items total).
- Period: Monthly periods corresponding to supply and demand.
- Demand: Customer purchases that reduce on-hand inventory.
- Opening: An initial inventory of 6570 units in the first period for Item 000001.
- Supply: New supplies arriving in subsequent months.
Step 2: Visualizing Demand Over Time
The first step in understanding supply chain performance is visualizing demand trends. We can use timetk::plot_time_series() to get a clear view of the demand fluctuations. Run this code:
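A hedged sketch of what that call might look like, using the demand_data table from the Step 1 sketch above:

```r
# One faceted demand plot per DFU (static output; set .interactive = TRUE
# for a plotly version)
demand_data |>
  dplyr::group_by(DFU) |>
  timetk::plot_time_series(
    .date_var    = Period,
    .value       = Demand,
    .facet_ncol  = 2,
    .interactive = FALSE
  )
```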

Get the Code (In the R-Tip 087 Folder)
This code will produce a set of time series plots that show how demand changes over time for each DFU. By visualizing these trends, you can identify patterns and outliers that may impact your projections.

Step 3: Projecting Inventory Levels
Once you have a good understanding of demand, the next step is to project your future inventory levels. The planr::light_proj_inv() function helps you do this. Run this code:
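The exact call isn't shown in this excerpt, so here is the projection arithmetic behind it: projected inventory in each period is the opening stock plus cumulative supply minus cumulative demand. planr::light_proj_inv() automates this per DFU; check its help page for the exact argument names.

```r
# Hand-rolled projection (the same logic the package automates per DFU);
# proj_inventory is just an illustrative column name
projected <- demand_data |>
  dplyr::group_by(DFU) |>
  dplyr::mutate(proj_inventory = cumsum(Opening + Supply - Demand)) |>
  dplyr::ungroup()
projected
```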

Get the Code (In the R-Tip 087 Folder)
This function takes in the DFU, Period, Demand, Opening stock, and Supply as inputs to project inventory levels over time by item. The output is a data frame that contains the projected inventories for each period and DFU.
Step 4: Creating an Interactive Table for Projected Inventories
To make your projections more interactive and accessible, you can create an interactive table using reactable and reactablefmtr. I’ve made a function to automate the process for you based on planr’s awesome documentation. Run this code:
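The author's helper function isn't included in this excerpt; as a stand-in, here is the simplest possible interactive table of the projections with reactable (reactablefmtr can then be layered on for colour bars and other formatting):

```r
library(reactable)

# Filterable, searchable table of projected inventories
reactable(
  projected,
  filterable      = TRUE,
  searchable      = TRUE,
  defaultPageSize = 12
)
```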


Get the Code (In the R-Tip 087 Folder)
This generates a beautiful interactive table where you can filter and sort the projected inventories. Interactive tables make it easier to analyze your data and share insights with your team.
Conclusion
By using the planr package, you can project inventory levels with ease, helping you manage your supply chain more effectively. This leads to better decision-making, reduced stockouts, and lower carrying costs.
But there’s more to mastering supply chain analysis in R.
If you would like to grow your Business Data Science skills with R, then please read on…
Need to advance your business data science skills?
I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.
I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.
And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):
6-Figure Data Science Job at CVS Health ($125K)
Senior VP Of Analytics At JP Morgan ($200K)
50%+ Raises & Promotions ($150K)
Lead Data Scientist at Northwestern Mutual ($175K)
2X-ed Salary (From $60K to $120K)
2 Competing ML Job Offers ($150K)
Promotion to Lead Data Scientist ($175K)
Data Scientist Job at Verizon ($125K+)
Data Scientist Job at CitiBank ($100K + Bonus)
Whenever you are ready, here’s the system they are taking:
Here’s the system that has gotten aspiring data scientists, career transitioners, and life long learners data science jobs and promotions…

Join My 5-Course R-Track Program Now!
(And Become The Data Scientist You Were Meant To Be…)
P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.

Continue reading: Supply Chain Analysis with R Using the planr Package
Understanding Supply Chain Analysis With R Using The Planr Package
Efficient supply chain management is the backbone of any successful company and proper inventory management is a key component. In this regard, the ‘planr’ package in R is a crucial tool aiding in inventory estimation and forecasting. Understanding and optimizing the supply chain through this package can lead to better decision-making, fewer stockouts, and reduced carrying costs.
Why are Inventory Projections Important?
Managing a supply chain effectively includes balancing supply and demand to optimize inventory levels. Constant management and analysis of these levels help businesses avoid oversights such as over-purchasing or having insufficient stock to meet demand. This necessitates the use of tools for accurate inventory projection, which can assist businesses in making informed decisions to prevent major, potentially costly errors.
The Planr Package in R
The ‘planr’ package simplifies inventory management through its ability to project future inventory levels. Because it bases its projections on supply, demand, and current stock levels, the package can greatly assist your company’s supply chain optimization efforts.
Using the Planr Package for Supply Chain Analysis
Briefly, the process of using the ‘planr’ package involves loading libraries and data related to supply and demand for various demand fulfillment units (DFUs). Visualization of demand trends over time can assist in identifying patterns and outliers that might impact projections. You can then project future inventory levels based on all of these insights and create an interactive table for these projections for ease of interpretation and sharing.
Application in Businesses and Future Developments
As businesses continue to strive for efficiency, the use of tools such as planr will likely become more widespread. Accurate inventory projections can significantly reduce the chances of costly errors, enhance decision-making, and improve overall supply chain management. In the future, these tools may also evolve to incorporate more complex variables and more accurate prediction models to further enhance their utility.
Actionable Advice
Considering the above, here are some steps that businesses can take:
- Adopt Tools for Inventory Projections: Implement tools like the ‘planr’ package in R for efficient inventory management and accurate projection. This small step can make a significant difference in supply chain management and decision-making processes.
- Invest in Training: Encourage employees to learn and understand how to use data science tools for business optimizations. This can increase in-house capabilities and reduce dependency on external resources.
- Keep Abreast with Technological Advancements: Stay informed about developments in data science. New tools and resources regularly emerge that can improve various business processes.
- Consider a Data Science Career: For individuals, considering a career in data science can be rewarding and beneficial, as evidenced by various testimonials. Data science skills are in demand and can lead to promising career opportunities.
In conclusion, tools like the planr package in R show how data science can come to the rescue of businesses, helping optimize their supply chain management. Its adoption and mastery can lead to multiple benefits in the long run.
Read the original article
by jsendak | Sep 28, 2024 | AI
arXiv:2409.17480v1 Announce Type: new Abstract: The existing script event prediction task forecasts the subsequent event based on an event script chain. However, the evolution of historical events is more complicated in real-world scenarios, and the limited information provided by the event script chain also makes it difficult to accurately predict subsequent events. This paper introduces a Causality Graph Event Prediction (CGEP) task that forecasts consequential events based on an Event Causality Graph (ECG). We propose a Semantic Enhanced Distance-sensitive Graph Prompt Learning (SeDGPL) Model for the CGEP task. In SeDGPL, (1) we design a Distance-sensitive Graph Linearization (DsGL) module to reformulate the ECG into a graph prompt template as the input of a PLM; (2) propose an Event-Enriched Causality Encoding (EeCE) module to integrate both event contextual semantic and graph schema information; (3) propose a Semantic Contrast Event Prediction (ScEP) module to enhance the event representation among numerous candidate events and predict the consequential event following the prompt learning paradigm. We construct two CGEP datasets based on the existing MAVEN-ERE and ESC corpus for experiments. Experiment results validate that our proposed SeDGPL model outperforms the advanced competitors for the CGEP task.
The article “Causality Graph Event Prediction: A Semantic Enhanced Distance-sensitive Graph Prompt Learning Model” addresses the limitations of existing script event prediction tasks in accurately forecasting subsequent events. It introduces a new task called Causality Graph Event Prediction (CGEP), which uses an Event Causality Graph (ECG) to forecast consequential events. To tackle this task, the authors propose a Semantic Enhanced Distance-sensitive Graph Prompt Learning (SeDGPL) Model. The SeDGPL Model consists of three key modules: a Distance-sensitive Graph Linearization (DsGL) module, an Event-Enriched Causality Encoding (EeCE) module, and a Semantic Contrast Event Prediction (ScEP) module. These modules aim to reformulate the ECG, integrate event contextual semantic and graph schema information, and enhance event representation, respectively. The authors validate their proposed SeDGPL model through experiments conducted on two CGEP datasets, demonstrating its superior performance compared to advanced competitors in the field.
Exploring Consequential Event Prediction with Causality Graphs
In the realm of event prediction, existing approaches often rely on event script chains to forecast subsequent events. However, the complexities of historical events in real-world scenarios and the limited information offered by event script chains have made it challenging to accurately predict future events. To tackle these issues, this article introduces a new task called Causality Graph Event Prediction (CGEP), which aims to forecast consequential events based on an Event Causality Graph (ECG).
Introducing the SeDGPL Model for CGEP Task
Addressing the aforementioned challenges, we propose the Semantic Enhanced Distance-sensitive Graph Prompt Learning (SeDGPL) Model for the CGEP task. The SeDGPL model incorporates several key components to improve event prediction accuracy:
- Distance-sensitive Graph Linearization (DsGL) Module: This module reformulates the ECG, representing the causal relationships, into a graph prompt template. This template serves as the input for a Pre-trained Language Model (PLM) to capture contextual information.
- Event-Enriched Causality Encoding (EeCE) Module: By integrating both event contextual semantics and graph schema information, the EeCE module enhances the understanding of the relationships between events, enriching the representation of each event.
- Semantic Contrast Event Prediction (ScEP) Module: This module aims to improve event representation and predict consequential events by leveraging a prompt learning paradigm. It effectively compares and contrasts numerous candidate events to make accurate predictions.
Through the combination of these three modules, the SeDGPL model offers an innovative approach to the CGEP task.
Validating the SeDGPL Model
To assess the performance of our proposed SeDGPL model, we constructed two CGEP datasets based on the existing MAVEN-ERE and ESC corpus. These datasets allowed us to conduct experiments and evaluate the efficacy of our model.
The experimental results demonstrated that the SeDGPL model outperformed advanced competitors in the CGEP task. Its ability to effectively incorporate contextual semantics, graph schema information, and prompt learning significantly improved the accuracy of consequential event prediction.
Innovation in Event Prediction
The introduction of the CGEP task, along with the SeDGPL model, represents a novel approach to event prediction. By moving beyond traditional event script chains and incorporating causality graphs, our model enhances the understanding of complex historical events. This innovation opens up new possibilities for accurately predicting subsequent events in real-world scenarios.
Furthermore, the SeDGPL model’s ability to incorporate semantic contrast event prediction and distance-sensitive graph linearization provides a framework for future advancements in event prediction models. As researchers continue to explore the potential of causality graphs, we can expect further improvements in forecasting consequential events.
In conclusion, the Causality Graph Event Prediction task, coupled with the Semantic Enhanced Distance-sensitive Graph Prompt Learning model, represents an innovative solution for accurate event prediction. By leveraging the power of causality graphs and integrating contextual semantics, graph schema information, and prompt learning, our model opens up new avenues in accurately forecasting subsequent events. As we continue to explore and refine these methodologies, we can expect even greater advancements in the field of event prediction.
The paper introduces a new task called Causality Graph Event Prediction (CGEP), which aims to forecast subsequent events based on an Event Causality Graph (ECG). The authors argue that the existing script event prediction task, which relies on event script chains, is not sufficient for accurately predicting subsequent events in real-world scenarios due to the complexity of the evolution of historical events and the limited information provided by event script chains.
To address these limitations, the authors propose a Semantic Enhanced Distance-sensitive Graph Prompt Learning (SeDGPL) Model for the CGEP task. The SeDGPL model consists of three main components:
1. Distance-sensitive Graph Linearization (DsGL) module: This module reformulates the ECG into a graph prompt template, which serves as the input for a Pre-trained Language Model (PLM). By linearizing the graph, the model can effectively capture the dependencies and relationships between events.
2. Event-Enriched Causality Encoding (EeCE) module: This module integrates both event contextual semantic information and graph schema information. By incorporating the contextual information of events, the model can better understand the relationships between events and make more accurate predictions.
3. Semantic Contrast Event Prediction (ScEP) module: This module aims to enhance the event representation among numerous candidate events and predict the consequential event following a prompt learning paradigm. By contrasting different candidate events, the model can identify the most probable subsequent event.
The authors conducted experiments on two CGEP datasets constructed based on existing MAVEN-ERE and ESC corpus. The experimental results validate the effectiveness of the proposed SeDGPL model, as it outperforms advanced competitors for the CGEP task.
Overall, this paper introduces a novel approach to event prediction by leveraging Event Causality Graphs and proposes a SeDGPL model that incorporates semantic information and graph schema to improve the accuracy of subsequent event forecasting. The experimental results provide evidence of the model’s superiority over existing competitors in the CGEP task. This work opens up new possibilities for more accurate event prediction in real-world scenarios and has potential applications in various domains, such as natural language understanding and event forecasting systems.
Read the original article
by jsendak | Jul 18, 2024 | AI
arXiv:2407.12053v1 Announce Type: new Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still requires multiple runs of AlphaFold to finally generate one single conformation. Due to the heavy consumption of AlphaFold, its applicability is limited in sampling larger set of protein ensembles or the longer chains within a constrained timeframe. In this work, we propose a feature-conditioned generative model called AlphaFlow-Lit to realize efficient protein ensembles generation. In contrast to the full fine-tuning on the entire structure, we focus solely on the light-weight structure module to reconstruct the conformation. AlphaFlow-Lit performs on-par with AlphaFlow and surpasses its distilled version without pretraining, all while achieving a significant sampling acceleration of around 47 times. The advancement in efficiency showcases the potential of AlphaFlow-Lit in enabling faster and more scalable generation of protein ensembles.
The article “AlphaFlow-Lit: Efficient Generation of Protein Ensembles with a Feature-Conditioned Generative Model” explores the conformational landscapes of proteins and the importance of understanding their biological functions. The authors introduce AlphaFlow-Lit, a feature-conditioned generative model that aims to efficiently generate protein ensembles. While previous models like AlphaFlow required multiple runs of AlphaFold to generate a single conformation, AlphaFlow-Lit focuses on the light-weight structure module, achieving comparable results with a significant sampling acceleration of around 47 times. This advancement in efficiency has the potential to enable faster and more scalable generation of protein ensembles.
Exploring Protein Conformational Landscapes with AlphaFlow-Lit
Understanding the conformational landscapes of proteins is crucial for unraveling their biological functions and properties. Researchers have long relied on computational methods to predict protein structures, and AlphaFold has emerged as a leading approach in this domain. However, AlphaFold’s need for multiple runs to generate a single conformation limits its practicality when dealing with larger protein ensembles or longer chains within a constrained timeframe.
In an effort to address this limitation, we present AlphaFlow-Lit, a feature-conditioned generative model that enables efficient protein ensembles generation while maintaining accuracy. Unlike the traditional approach of fine-tuning the entire structure, we focus solely on the light-weight structure module to reconstruct the conformation. This targeted approach allows AlphaFlow-Lit to perform on-par with AlphaFlow and surpass its distilled version without pretraining, all while achieving a significant sampling acceleration of approximately 47 times.
The key innovation of AlphaFlow-Lit lies in its ability to leverage the efficiency of the light-weight structure module while still maintaining the high accuracy of AlphaFold. By focusing on this module, which carries crucial information about local interactions, we can drastically reduce the computational burden without sacrificing quality. This reduction in computational demands opens up new possibilities for studying larger protein ensembles or longer chains within practical timeframes.
The results obtained with AlphaFlow-Lit demonstrate its potential for enabling faster and more scalable generation of protein ensembles. This breakthrough in efficiency not only accelerates the research process but also empowers researchers to explore a wider range of protein structures and their conformational landscapes. With faster and more accessible protein structure prediction, scientists can gain deeper insights into the functioning and properties of these essential molecules.
Moreover, the scalability of AlphaFlow-Lit opens up avenues for studying complex protein systems that were previously inaccessible. With the ability to generate protein ensembles efficiently, researchers can now investigate the dynamics and interactions of larger protein complexes, shedding light on how they function in various cellular processes.
In conclusion, AlphaFlow-Lit represents a significant advancement in the field of protein structure prediction. By leveraging the light-weight structure module, this feature-conditioned generative model delivers efficient and accurate protein ensembles generation. The newfound scalability and speed offered by AlphaFlow-Lit hold promise for accelerating scientific discoveries and unlocking deeper insights into the complex world of proteins.
The paper arXiv:2407.12053v1 introduces a new generative model called AlphaFlow-Lit that aims to improve the efficiency of generating protein ensembles. The authors highlight the importance of studying the conformational landscapes of proteins in understanding their biological functions and properties.
The existing model, AlphaFlow, is a sequence-conditioned generative model that incorporates flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. While AlphaFlow has the advantage of efficient sampling, it still requires multiple runs of AlphaFold to generate a single conformation. This limitation restricts its applicability in sampling larger sets of protein ensembles or longer chains within a constrained timeframe.
To address this limitation, the authors propose AlphaFlow-Lit, a feature-conditioned generative model. Unlike AlphaFlow, which performs full fine-tuning on the entire structure, AlphaFlow-Lit focuses solely on the lightweight structure module for reconstructing the conformation. Despite this simplified approach, AlphaFlow-Lit performs on-par with AlphaFlow and even surpasses its distilled version without pretraining. Moreover, AlphaFlow-Lit achieves a significant sampling acceleration of approximately 47 times compared to AlphaFlow.
This advancement in efficiency is crucial as it enables faster and more scalable generation of protein ensembles. By reducing the computational resources required, researchers can now analyze larger sets of protein structures or longer chains within a reasonable timeframe. This has the potential to enhance our understanding of protein structures and their functions.
However, it is important to note that while AlphaFlow-Lit improves efficiency, it may sacrifice some accuracy compared to the full fine-tuning approach of AlphaFlow. It would be interesting to explore the trade-off between efficiency and accuracy and determine the specific scenarios where AlphaFlow-Lit is most beneficial. Additionally, further research could focus on optimizing the lightweight structure module to enhance the accuracy of AlphaFlow-Lit without compromising its efficiency.
Read the original article