[This article was first published on DataGeeek, and kindly contributed to R-bloggers].



The Shanghai Composite Index does not seem to be at an ideal entry point.

Source code:

library(tidyverse)
library(tidyquant)
library(timetk)
library(tidymodels)
library(modeltime)
library(workflowsets)

#Shanghai Composite Index (000001.SS)
df_shanghai <-
  tq_get("000001.SS", from = "2015-09-01") %>%
  tq_transmute(select = close,
               mutate_fun = to.monthly,
               col_rename = "sse") %>%
  mutate(date = as.Date(date))

#Splitting
split <-
  df_shanghai %>%
  time_series_split(assess = "1 year",
                    cumulative = TRUE)

df_train <- training(split)
df_test <- testing(split)

#Time series cross validation for tuning
df_folds <- time_series_cv(df_train,
                           initial = 77, #first training window: 77 monthly observations
                           assess = 12)  #assessment horizon: 12 months per slice


#Preprocessing
rec <-
  recipe(sse ~ date, data = df_train) %>%
  step_mutate(date_num = as.numeric(date)) %>% #numeric time trend
  step_date(date, features = "month") %>% #month-of-year feature
  step_dummy(date_month, one_hot = TRUE) %>% #one-hot encode the month factor
  step_normalize(all_numeric_predictors())


#Inspect the preprocessed training set
rec %>%
  prep() %>%
  bake(new_data = NULL) %>%
  view()


#Model: auto-ARIMA for the series, with XGBoost boosting the residuals
#using the calendar features; the tuned parameters belong to the XGBoost component
mod <-
  arima_boost(
    min_n = tune(),
    learn_rate = tune(),
    trees = tune()
  ) %>%
  set_engine(engine = "auto_arima_xgboost")

#Workflow set
wflow_mod <-
  workflow_set(
    preproc = list(rec = rec),
    models = list(mod = mod)
  )


#Tuning and evaluating the model on all the samples
grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )

grid_results <-
  wflow_mod %>%
  workflow_map(
    seed = 98765,
    resamples = df_folds,
    grid = 10,
    control = grid_ctrl
  )


#Accuracy of the grid results
grid_results %>%
  rank_results(select_best = TRUE,
               rank_metric = "rmse") %>%
  select(Models = wflow_id, .metric, mean)


#Finalizing the model with the best parameters
best_param <-
  grid_results %>%
  extract_workflow_set_result("rec_mod") %>%
  select_best(metric = "rmse")


wflw_fit <-
  grid_results %>%
  extract_workflow("rec_mod") %>%
  finalize_workflow(best_param) %>%
  fit(df_train)


#Calibrate the model to the testing set
calibration_boost <-
  wflw_fit %>%
  modeltime_calibrate(new_data = df_test)

#Accuracy of the finalized model
calibration_boost %>%
  modeltime_accuracy(metric_set = metric_set(mape, smape))



#Predictive intervals
calibration_boost %>%
  modeltime_forecast(actual_data = df_shanghai %>%
                       filter(date >= last(date) - months(12)),
                     new_data = df_test) %>%
  plot_modeltime_forecast(.interactive = FALSE,
                          .legend_show = FALSE,
                          .line_size = 1.5,
                          .color_lab = "",
                          .title = "Shanghai Composite Index") +
  geom_point(aes(color = .key)) +
  labs(subtitle = "Monthly Data<br><span style = 'color:darkgrey;'>Predictive Intervals</span><br><span style = 'color:red;'>Point Forecast Line</span>") +
  scale_x_date(breaks = c(make_date(2023,11,1),
                          make_date(2024,5,1),
                          make_date(2024,10,1)),
               labels = scales::label_date(format = "%b'%y"),
               expand = expansion(mult = c(.1, .1))) +
  ggthemes::theme_wsj(
    base_family = "Roboto Slab",
    title_family = "Roboto Slab",
    color = "blue",
    base_size = 12) +
  theme(legend.position = "none",
        plot.background = element_rect(fill = "lightyellow", color = "lightyellow"),
        plot.title = element_text(size = 24),
        axis.text = element_text(size = 16),
        plot.subtitle = ggtext::element_markdown(size = 20, face = "bold"))






Analysis of the Shanghai Composite Index Using Time Series Machine Learning

The original article analyzes the Shanghai Composite Index (000001.SS) with time-series machine learning. It uses several R libraries: tidyverse, tidyquant, timetk, tidymodels, modeltime, and workflowsets, and it combines a boosted ARIMA (Auto-Regressive Integrated Moving Average) model with time-series cross-validation for hyperparameter tuning.
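For readers who want the core pattern without the tuning step, the sketch below reuses the rec recipe and the df_train/df_test split from the source code above and fits the boosted ARIMA with its default hyperparameters (the article tunes min_n, trees, and learn_rate):

library(tidymodels)
library(modeltime)

#Minimal boosted-ARIMA fit: auto-ARIMA models the series itself,
#while XGBoost learns what is left over from the calendar features in the recipe
simple_fit <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(arima_boost() %>% set_engine("auto_arima_xgboost")) %>%
  fit(df_train)

#Quick check against the hold-out year
simple_fit %>%
  modeltime_calibrate(new_data = df_test) %>%
  modeltime_accuracy()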

Long-term Implications and Possible Developments

The Implications of Time-Series Analysis on Financial Markets

Time-series machine learning techniques have significant implications for financial markets. These methods analyze stock prices over time in a systematic way to produce predictions, giving investors and traders an additional input for decision-making. Applying these techniques, the article suggests that the Shanghai Composite Index may not be at an ideal entry point for investors.

The Future of Machine Learning in Stock Market Prediction

Machine learning models for stock market prediction are likely to grow in sophistication and accuracy. With more data being collected every day, the training sets for these models are expanding, enhancing their predictive capabilities. Future developments may see models capable of considering myriad factors influencing stock movements, such as geopolitical events, technological advancements, and even shifts in investor sentiment based on news and social media trends.

Actionable Insights

Given the drawbacks of subjective judgement and the benefits of machine learning for stock market prediction, decision-makers in finance should consider adopting these technologies more extensively. Traditional forecasting methods often lack a systematic process and are susceptible to human error.

For Data Scientists

Consider incorporating the techniques covered in the article into your own models. Doing so will not only help you produce financial forecasts but also deepen your understanding of model tuning and time-series cross-validation. Keep up to date with the latest methodologies, given how quickly machine learning evolves.
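As a starting point, the rolling-origin resampling used in the article can be set up for any monthly series with timetk. The sketch below assumes a hypothetical tibble my_series with date and value columns; the window lengths are illustrative choices, not the article's settings:

library(timetk)
library(dplyr)

folds <- time_series_cv(
  my_series,               #hypothetical monthly tibble with `date` and `value`
  date_var   = date,
  initial    = "5 years",  #length of the first training window
  assess     = "1 year",   #forecast horizon evaluated in each slice
  cumulative = TRUE        #grow the training window as the origin rolls forward
)

#Visual check that the slices cover the series as intended
folds %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = FALSE)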

For Investors

Machine learning forecasts can help guide investment decisions, but they are not foolproof. Use them as one tool among a broader set of analysis methods, including fundamental and technical analysis.

For Business Leaders And Managers

Invest in technology and personnel capable of leveraging machine learning for stock analysis. Having these capabilities in-house will allow for more flexible, timely, and contextually appropriate forecasts to guide strategic financial decisions.
