[This article was first published on DataGeeek, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)


Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

As we enter the new year, women still seem not to have equal rights compared to men in the working environment. This situation is more prominent in the developing and least developed countries. This article will examine that using the WBL (women, business, and law) index.

First, we will create a plot comparing WBL scores by region, excluding high-income economies.

library(tidyverse)
library(openxlsx)
library(ragg) #google font setting

df_wbl <-
  openxlsx::read.xlsx("https://github.com/mesdi/blog/raw/main/wbl.xlsx",
                      sheet = "WBL Panel 2023") %>%
  as_tibble() %>%
  janitor::clean_names()

#Plotting compared WBL index scores by region
#Excluded high-income OECD countries
df_wbl %>%
  #remove the high income level countries
  filter(region != "High income: OECD") %>%
  group_by(region) %>%
  mutate(wbl_mean_index = mean(wbl_index)) %>%
  select(wbl_mean_index) %>%
  unique() %>%
  ggplot(aes(wbl_mean_index,
             reorder(region, wbl_mean_index),
             fill = region)) +
  geom_col(width = 0.5) +
  geom_text(aes(label = .data$wbl_mean_index %>% round(2)),
            nudge_x = 2,
            family = "Bricolage Grotesque") +
  scale_x_continuous(position = "top") +
  labs(x = "",
       y = "",
       title = "Comparing WBL Index Averages by Region, 1971-2023",
       subtitle = "Excluding high-income economies",
       #for two captions
       caption = c("Source: Women, Business, and the Law database" ,
                   "WBL: Women, Business, and the Law")) +
  theme_minimal(base_family = "Bricolage Grotesque",
                base_size = 16) +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.major.x = element_line(linewidth = 1.1),
        plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 11),
        plot.caption = element_text(hjust = c(0.06,1), #for two captions layout
                                    size = 10),
        legend.position = "none",
        plot.background = element_rect(fill = "#F0F0F0"))

It seems that Sub-Saharan Africa is making progress and has the potential to close the gap between itself and East Asia and the Pacific region. On the other hand, Latin America shows that they are not behind Europe at all.

Now, we are examining the reasons beyond the above results. To do this, we will use quantile regression via a random forest model.

#Modeling
library(tidymodels)
library(gt)

#Splitting into train and test sets
set.seed(1983)
data_split <- initial_split(df_wbl,
                            strata = "wbl_index",
                            prop = 0.8)

wbl_train <- training(data_split)
wbl_test  <- testing(data_split)

#Recipe
wbl_rec <-
  recipe(
    wbl_index ~
      region +
      income_group  +
      length_of_paid_paternity_leave +
      length_of_paid_maternity_leave,
    data = wbl_train
  )

#Quantile regression via random forest from ranger package
wbl_mod <-
  rand_forest() %>%
  set_engine("ranger",
             importance = "permutation", # for variable importance,
             seed = 12345,
             quantreg = TRUE # for quantile regression ) %>%
  set_mode("regression")

set.seed(98765)
wbl_wflw <-
  workflow() %>%
  add_model(wbl_mod) %>%
  add_recipe(wbl_rec) %>%
  fit(wbl_train)

#The function of extracting predictions
preds_bind <- function(data_fit, lower = 0.05, upper = 0.95){
  predict(
    wbl_wflw$fit$fit$fit,
    workflows::extract_recipe(wbl_wflw) %>% bake(data_fit),
    type = "quantiles",
    quantiles = c(lower, upper, 0.50)
  ) %>%
    with(predictions) %>% #extracts predictions of Ranger prediction object
    as_tibble() %>%
    set_names(paste0(".pred", c("_lower", "_upper",  ""))) %>%
    bind_cols(data_fit) %>%
    select(contains(".pred"), wbl_index)
}


#Accuracy of train and test set
wbl_preds_train <- preds_bind(wbl_train)
wbl_preds_test <- preds_bind(wbl_test)

bind_rows(
  yardstick::rsq(wbl_preds_train, wbl_index , .pred),
  yardstick::rsq(wbl_preds_test, wbl_index, .pred)
) %>%
  mutate(dataset = c("training", "test"))

# A tibble: 2 × 4
  .metric .estimator .estimate dataset
  <chr>   <chr>          <dbl> <chr>
1 rsq     standard       0.728 training
2 rsq     standard       0.718 test

The accuracy results look fine for both the train and test sets. Hence, we can make a variable importance calculations with this model.

#Variable importance
library(DALEXtra)

#Creating a preprocessed dataframe of the train dataset
imp_wbl <-
  wbl_rec %>%
  prep() %>%
  bake(new_data = NULL)

#Explainer
explainer_wbl <-
  explain_tidymodels(
    wbl_wflw,
    data = imp_wbl %>% select(-wbl_index),
    y = imp_wbl$wbl_index,
    label = "",
    verbose = FALSE
  )

#Computing the permutation-based variable-importance measure
set.seed(12345)
vip_wbl <- model_parts(explainer_wbl,
                       B = 100,
                       loss_function = loss_root_mean_square)


#Plotting variable importance

#Averaged RMSE value for the full model
wbl_dropout <-
  vip_wbl %>%
  filter(variable == "_full_model_") %>%
  summarise(dropout_loss = mean(dropout_loss))

vip_wbl %>%
  filter(variable != "_full_model_",
         variable != "_baseline_") %>%
  mutate(label = str_replace_all(variable, "_", " ") %>% str_to_title(),
         label = fct_reorder(label, dropout_loss)) %>%
  ggplot(aes(dropout_loss, label)) +
  geom_vline(data = wbl_dropout,
             aes(xintercept = dropout_loss),
             linewidth = 1.4,
             lty = 2,
             alpha = 0.7,
             color = "red") +
  geom_boxplot(fill = "#91CBD765",
               alpha = 0.4) +
  labs(x = "",
       y = "",
       title = "Root mean square error (RMSE) after 100 permutations",
       subtitle = "The <span style='color:red'>dashed line</span> shows the RMSE for the full model",
       caption = "Higher indicates more important") +
  theme_minimal(base_family = "Bricolage Grotesque") +
  theme(plot.title = element_text(face = "bold"),
        plot.subtitle = ggtext::element_markdown(),
        plot.caption = element_text(hjust = 0.5),
        axis.text.y = element_text(size = 11),
        plot.background = element_rect(fill = "#F0F0F0"))

According to the graph, the effects of paid paternity and maternity leave seem to be very close, which is pretty interesting. The most prominent effect belongs to the region variable that shows the geo-cultural effects are important to determine the WBL index score.

To leave a comment for the author, please follow the link and comment on their blog: DataGeeek.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.


Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Continue reading: Data Visualization of the WBL Index and Modeling with Quantile Regression using Random Forest

Long-term Implications and Future Developments of Women’s Rights in The Work Environment

The disparity between men and women’s rights in the work environment, particularly in developing countries, is highlighted in the analyzed text. Sub-Saharan Africa’s advancement, the stagnation of Latin America, and Europe’s stance are worth scrutinizing. The text uses quantile regression with a random forest model to discern that regional differences and the distinction of paid paternity or maternity leave significantly impact the Women, Business, and Law (WBL) index score.

Implications

Geographic-Cultural Impact

The most striking effect comes from geographical-cultural differences. This presumably implies that aspects like the societal view of women and recognized cultural norms play a significant role in shaping the WBL index. This situates the steps to be taken towards equality in a broader context outside of mere law and order.

Paternity and Maternity Leave

Interestingly, paid parental leaves have a significant impact on the WBL index. The nearly equal weight between the two highlights that not only should laws be respectful to women in terms of maternity leaves, but equal emphasis must also be placed on paternity leaves.

Possible Future Developments

Laying the groundwork for any form of improvement begins with recognizing the key obstacles. First, there is a need for more culturally nuanced understandings to effectively deal with geo-cultural barriers blocking progress. Secondly, achieving equality at the workplace cannot be divorced from equalizing maternity and paternity leaves so that both parents can share child-rearing duty without jeopardizing their careers.

Actionable Advice

  1. Locale-specific laws: To address the geo-cultural effects, governments should curate laws that cater to their distinct societal and cultural norms. While global coordination is necessary, real change concretely materializes on a more grassroots level.
  2. Parental Leave Policies: Companies, in conjunction with the legal framework, should aim to equalize maternity and paternity leaves, advancing gender equality not merely in legislation but also in practice.
  3. Procedure Review: Regular reviews of the procedures should be conducted to ensure that laws operate as they are supposed to, avoiding potential manipulative practices.
  4. Future Studies: Detailed survey research is necessary to discover underlying reasons as to why some regions perform better than others on the WBL Index. This can help transfer learned lessons from more successful regions to others.

Read the original article