[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].



Introduction

Hey R enthusiasts! Steve here, and today I’m excited to share some fantastic updates about a key function in the tidyAML package – internal_make_wflw_predictions(). The latest version addresses issue #190, ensuring that all crucial data is now included in the predictions. Let’s dive into the details!

What’s New?

In response to user feedback, we’ve enhanced the internal_make_wflw_predictions() function to provide a comprehensive set of predictions. Now, when you make a call to this function, it includes:

  1. The Actual Data: This is the real-world data that your model aims to predict. Having access to this information helps you assess how well your model is performing on unseen instances.

  2. Training Predictions: Predictions made on the training dataset. This is essential for understanding how well your model generalizes to the data it was trained on.

  3. Testing Predictions: Predictions made on the testing dataset. This is crucial for evaluating the model’s performance on data it hasn’t seen during the training phase.
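
Because all three pieces come back stacked in one long tibble, pulling them apart is a one-liner with dplyr. Here is a minimal sketch on a mock tibble with the same three columns the function returns (the real tibble from preds_list[[1]] is shown later in this post):

```r
library(dplyr)
library(tibble)

# Small mock of the long tibble returned for each model by
# internal_make_wflw_predictions() -- column names match the output shown later
preds_tbl <- tibble(
  .data_category = c("actual", "predicted", "predicted"),
  .data_type     = c("actual", "training", "testing"),
  .value         = c(21.0, 20.5, 22.1)
)

# Split the long tibble into its three pieces
actual_tbl   <- filter(preds_tbl, .data_type == "actual")
training_tbl <- filter(preds_tbl, .data_type == "training")
testing_tbl  <- filter(preds_tbl, .data_type == "testing")
```

Swap in the real preds_list[[1]] for the mock and the same three filter() calls apply unchanged.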

How to Use It

To take advantage of these new features, here’s how you can use the updated internal_make_wflw_predictions() function:

internal_make_wflw_predictions(.model_tbl, .splits_obj)

Arguments:

  1. .model_tbl: The model table generated from a function like fast_regression_parsnip_spec_tbl(). Ensure that it has a class of “tidyaml_mod_spec_tbl.” This is typically used after running the internal_make_fitted_wflw() function and saving the resulting tibble.

  2. .splits_obj: The splits object. Within tidyAML this is generated internally by the auto_ml functions; in the example below we create one directly with create_splits().

Example Usage

Let’s walk through an example using some popular R packages:

library(tidymodels)
library(tidyAML)
library(tidyverse)
tidymodels_prefer()

# Create a model specification table
mod_spec_tbl <- fast_regression_parsnip_spec_tbl(
  .parsnip_eng = c("lm","glm"),
  .parsnip_fns = "linear_reg"
)

# Create a recipe
rec_obj <- recipe(mpg ~ ., data = mtcars)

# Create splits
splits_obj <- create_splits(mtcars, "initial_split")

# Generate the model table
mod_tbl <- mod_spec_tbl |>
  mutate(wflw = full_internal_make_wflw(mod_spec_tbl, rec_obj))

# Generate the fitted model table
mod_fitted_tbl <- mod_tbl |>
  mutate(fitted_wflw = internal_make_fitted_wflw(mod_tbl, splits_obj))

# Make predictions with the enhanced function
preds_list <- internal_make_wflw_predictions(mod_fitted_tbl, splits_obj)

This example demonstrates how to integrate the updated function into your workflow. Typically, though, you would not call this function directly; instead you would use fast_regression() or fast_classification(), which call it internally. Let’s now take a look at the output of everything.

rec_obj
── Recipe ──────────────────────────────────────────────────────────────────────
── Inputs 
Number of variables by role
outcome:    1
predictor: 10
splits_obj
$splits
<Training/Testing/Total>
<24/8/32>

$split_type
[1] "initial_split"
mod_spec_tbl
# A tibble: 2 × 5
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec
      <int> <chr>           <chr>         <chr>        <list>
1         1 lm              regression    linear_reg   <spec[+]>
2         2 glm             regression    linear_reg   <spec[+]> 
mod_tbl
# A tibble: 2 × 6
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw
      <int> <chr>           <chr>         <chr>        <list>     <list>
1         1 lm              regression    linear_reg   <spec[+]>  <workflow>
2         2 glm             regression    linear_reg   <spec[+]>  <workflow>
mod_fitted_tbl
# A tibble: 2 × 7
  .model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw
      <int> <chr>           <chr>         <chr>        <list>     <list>
1         1 lm              regression    linear_reg   <spec[+]>  <workflow>
2         2 glm             regression    linear_reg   <spec[+]>  <workflow>
# ℹ 1 more variable: fitted_wflw <list>
preds_list
[[1]]
# A tibble: 64 × 3
   .data_category .data_type .value
   <chr>          <chr>       <dbl>
 1 actual         actual       15.2
 2 actual         actual       19.7
 3 actual         actual       17.8
 4 actual         actual       15
 5 actual         actual       10.4
 6 actual         actual       15.8
 7 actual         actual       17.3
 8 actual         actual       30.4
 9 actual         actual       15.2
10 actual         actual       19.2
# ℹ 54 more rows

[[2]]
# A tibble: 64 × 3
   .data_category .data_type .value
   <chr>          <chr>       <dbl>
 1 actual         actual       15.2
 2 actual         actual       19.7
 3 actual         actual       17.8
 4 actual         actual       15
 5 actual         actual       10.4
 6 actual         actual       15.8
 7 actual         actual       17.3
 8 actual         actual       30.4
 9 actual         actual       15.2
10 actual         actual       19.2
# ℹ 54 more rows

You will notice the names of the preds_list output:

names(preds_list[[1]])
[1] ".data_category" ".data_type"     ".value"        

So we have .data_category, .data_type, and .value. Let’s take a look at the unique values of each column for .data_category and .data_type:

unique(preds_list[[1]]$.data_category)
[1] "actual"    "predicted"

So we have our actual data and the predicted data. The predicted data, though, contains both the training and testing predictions. Let’s take a look at the unique values of .data_type:

unique(preds_list[[1]]$.data_type)
[1] "actual"   "training" "testing" 

This will allow you to visualize the data how you please, something we will go over tomorrow!
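
As a quick preview, one way to plot the three series side by side is to give each .data_type its own row index and draw them as lines. This is a sketch on simulated stand-in data with the same 32/24/8 row structure; with the real output you would pipe preds_list[[1]] straight into the same group_by/mutate/ggplot chain:

```r
library(dplyr)
library(tibble)
library(ggplot2)

# Simulated stand-in for preds_list[[1]] (same three columns, same 32/24/8 shape)
set.seed(42)
preds_tbl <- bind_rows(
  tibble(.data_category = "actual",    .data_type = "actual",   .value = rnorm(32, 20, 5)),
  tibble(.data_category = "predicted", .data_type = "training", .value = rnorm(24, 20, 5)),
  tibble(.data_category = "predicted", .data_type = "testing",  .value = rnorm(8, 20, 5))
)

# Add a within-group row index so each series gets its own x axis position
p <- preds_tbl |>
  group_by(.data_type) |>
  mutate(.row = row_number()) |>
  ungroup() |>
  ggplot(aes(x = .row, y = .value, colour = .data_type)) +
  geom_line() +
  labs(
    title = "Actual values vs. training and testing predictions",
    x = "Observation", y = "Value", colour = "Type"
  )
```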

Why It Matters

By including actual data along with training and testing predictions, the internal_make_wflw_predictions() function empowers you to perform a more thorough evaluation of your models. This is a significant step towards ensuring the reliability and generalization capability of your machine learning models.
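
A quick sanity check makes this concrete: with the 24/8 initial split on mtcars shown above, you would expect 32 actual rows, 24 training predictions, and 8 testing predictions per model. A count() over the two label columns confirms the structure (sketched here on a stand-in tibble with the same shape; run it on preds_list[[1]] for the real thing):

```r
library(dplyr)
library(tibble)

# Stand-in with the same shape as preds_list[[1]]: 32 actual + 24 training + 8 testing
preds_tbl <- tibble(
  .data_category = rep(c("actual", "predicted", "predicted"), times = c(32, 24, 8)),
  .data_type     = rep(c("actual", "training", "testing"),    times = c(32, 24, 8)),
  .value         = rnorm(64)
)

# One row per (category, type) pair with its row count
cnt <- preds_tbl |>
  count(.data_category, .data_type)
```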

So, R enthusiasts, update your tidyAML package, explore the enhanced features, and let us know how these improvements elevate your modeling experience. Happy coding!



