“Statistical Binning on Hexagonal Grids: Day 4 of 30DayMapChallenge

[This article was first published on r.iresmi.net, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Day 4 of 30DayMapChallenge: « Hexagons » (previously).

Hexagonal grid?

We will do statistical binning on a hexagonal grid, but not just any grid. Geodesic Discrete Global Grid Systems (Kimerling et al. 1999; Sahr, White, and Kimerling 2003) allow to use hierarchical equal-area hexagon¹ grids.

The {dggridR} package (Barnes 2017) will help us to generate these grids on France.

library(dggridR)
library(dplyr)
library(purrr)
library(sf)
library(units)
library(glue)
library(tibble)
library(ggplot2)
library(ggspatial)
library(rnaturalearth)
# devtools::install_github("ropensci/rnaturalearthhires")

First we are going to build these Discrete global grids for metropolitan France with cell size of 1, 5, 10, 20 and 100 km. We need a spatial footprint.

# get it from
# https://static.data.gouv.fr/resources/admin-express-cog-simplifiee-2024-metropole-drom-saint-martin-saint-barthelemy/20240930-094021/adminexpress-cog-simpl-000-2024.gpkg
fx <- read_sf("~/data/adminexpress/adminexpress-cog-simpl-000-2024.gpkg",
              layer = "region") |>
  filter(insee_reg > "06") |>
  summarise()

We can chose which grid resolution we’ll make with the desired hexagon approximate “size” in km; the sampling allows us to have all the cells (if too large, some cells will be absent). It takes a few minutes…

res <- tribble(~distance, ~sampling,
                       1,     0.005,
                       5,     0.010,
                      10,     0.010,
                      20,     0.010,
                     100,     0.100)

#' Build a DGG
#'
#' Discrete Global Grid
#'
#' @param src (sf) : footprint
#' @param dest (char) : output filename
#' @param distance (num) : approximate cell size (km)
#' @param sampling (num) : points sampling to create the cells (°)
#' @param desc (char) : métadata (ex: layer description)
#'
#' @return (sf) : layer (+ file on disk with metadata)
build_dgg <- function(src, dest, distance = 100, sampling = 0.1, desc = "") {

  msg <- capture.output(
    dg <- dgconstruct(spacing = distance,
                      projection = "ISEA",
                      aperture = 3,
                      topology = "HEXAGON"))
  properties <- paste(glue_data(enframe(dg), "{name}: {value}"),
                      collapse = "n")

  hex <- dg |>
    dgshptogrid(src, cellsize = sampling)

  hex |>
    st_join(src, left = FALSE) |>
    st_write(dest,
             layer = glue("dggrid_{distance}k"),
             layer_options = c(
               glue("IDENTIFIER=Discrete Global Grid - {distance} km"),
               glue("DESCRIPTION=ISEA3H
                    {desc}

                    {msg}

                    {properties}

                    {Sys.Date()}")),
             delete_layer = TRUE)
}

# execute for all resolutions
res |>
  pwalk(build_dgg,
        src = fx,
        dest = "dggrid_fx.gpkg",
        desc = "France métropolitaine WGS84", .progress = TRUE)

Now we have a geopackage with all our grids:

st_layers("dggrid_fx.gpkg")

Driver: GPKG
Available layers:
   layer_name geometry_type features fields crs_name
1   dggrid_1k       Polygon   466080      1   WGS 84
2   dggrid_5k       Polygon    17801      1   WGS 84
3  dggrid_10k       Polygon     6079      1   WGS 84
4  dggrid_20k       Polygon     2106      1   WGS 84
5 dggrid_100k       Polygon      102      1   WGS 84

And we can see the nice spatial scale nesting in Figure 1.

Map of grid samples — Figure 1: Samples at 100, 20 and 10 km

To demonstrate the binning we’ll use the commune population available in the GPKG downloaded earlier that we’ll use as a point layer.

First we need also some more data…

# output projection
# we chose an equal-area projection
proj <- "EPSG:3035"

fx_laea <- fx |>
  st_transform(proj)

# population data
pop <- read_sf("~/data/adminexpress/adminexpress-cog-simpl-000-2024.gpkg",
              layer = "commune") |>
  filter(insee_reg > "06") |>
   st_transform(proj) |>
  st_point_on_surface()

# map zoom
fx_bb <- st_bbox(st_buffer(fx_laea, 0))

# projection name
lib_proj <- st_crs(fx_laea)$Name

# regional boundaries
reg_int <- read_sf("~/data/adminexpress/adminexpress-cog-simpl-000-2024.gpkg",
              layer = "region_int") |>
  st_transform(proj)

# european base map
eu <- ne_countries(continent = "europe", scale = 10, returnclass = "sf") |>
  st_transform(proj) |>
  st_intersection(st_buffer(fx_laea, 500000))

Then we create a function to generate the maps with different grid resolutions.

#' Create a map of population for a grid resolution
#'
#' @param resolution (int) : the grid resolution (among those available in
#'   dggrid_fx.gpkg)
#'
#' @return (ggplot) : population map
build_map <- function(resolution) {

  dggrid <- read_sf("dggrid_fx.gpkg",
                    layer = glue("dggrid_{resolution}k")) |>
    st_transform(proj) |>
    st_intersection(fx_laea)

  pop |>
    st_join(dggrid) |>
    st_drop_geometry() |>
    summarise(.by = seqnum,
              pop = sum(population, na.rm = TRUE)) |>
    left_join(x = dggrid, y = _,
              join_by(seqnum)) |>
    mutate(dens = drop_units(pop / set_units(st_area(geom), km^2))) |>
    ggplot() +
    geom_sf(data = eu, color = "#eeeeee", fill = "#eeeeee") +
    geom_sf(aes(fill = dens, color = dens)) +
    geom_sf(data = reg_int, color = "#ffffff", linewidth = 0.4, alpha = 0.5) +
    geom_sf(data = reg_int, color = "#555555", linewidth = .2) +
    scale_fill_fermenter(name = "inhabitants/km²",
                         palette = "Greens",
                         na.value = "white",
                         breaks = c(20, 50, 100, 500),
                         direction = 1) +
    scale_color_fermenter(name = "inhabitants/km²",
                          palette = "Greens",
                          na.value = "white",
                          breaks = c(20, 50, 100, 500),
                          direction = 1) +
    annotation_scale(location = "bl",
                     height = unit(.15, "cm"),
                     line_col = "#999999",text_col = "#999999",
                     bar_cols = c("#999999", "white")) +
    annotation_north_arrow(pad_x = unit(.25, "cm"), pad_y = unit(.8, "cm"),
                           style = north_arrow_fancy_orienteering(
                             text_col = "#999999", text_size = 8,
                             line_col = "#999999",
                             fill = "#999999"),
                           which_north = "true",
                           height = unit(.5, "cm"),
                           width = unit(.5, "cm")) +
    labs(title = "Population",
         subtitle = "Metropolitan France - 2022",
         caption = glue("data : Discrete Global Grid ISEA3H ≈{resolution} km,
                        Natural Earth and IGN Admin Express 2024
                        proj. : {lib_proj}")) +
    coord_sf(xlim = fx_bb[c(1, 3)],
             ylim = fx_bb[c(2, 4)]) +
    theme_void() +
    theme(text = element_text(family = "Marianne"),
          plot.caption = element_text(size = 6),
          plot.caption.position = "plot",
          panel.background = element_rect(color = "#E0FFFF",
                                          fill = "#E0FFFF55"))
}

Now we can demonstrate our different resolutions:

build_map(20)

Map of statistical binning on a hexagonal grid of french population — Figure 2: Population on grid size 20 km

build_map(100)

References

Barnes, Richard. 2017. “dggridR: Discrete Global Grids for R.” https://CRAN.R-project.org/package=dggridR.

Kimerling, Jon A., Kevin Sahr, Denis White, and Lian Song. 1999. “Comparing Geometrical Properties of Global Grids.” Cartography and Geographic Information Science 26 (4): 271–88. https://doi.org/10.1559/152304099782294186.

Sahr, Kevin, Denis White, and A. Jon Kimerling. 2003. “Geodesic Discrete Global Grid Systems.” Cartography and Geographic Information Science 30 (2): 121–34. https://doi.org/10.1559/152304003100011090.

Footnotes

mostly hexagons, in our case there are also five pentagons far away in the oceans.︎

To leave a comment for the author, please follow the link and comment on their blog: r.iresmi.net.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.

Continue reading: Global discret grids

The Future of Data Analysis Using Hexagonal Grids

In recent times, the usage of Hexagonal grids or Geodesic Discrete Global Grid Systems (DGGs) for data analysis has been growing in popularity. These grids are beneficial as they allow the utilization of hierarchical equal-area hexagon grids. Through the {dggridR} package, these grids can be generated with relative ease for various regions, as demonstrated in this tutorial.

Implications of Hexagonal Grids for Data Analysis

Hexagonal grids offer a unique platform for statistical binning, mainly when applied over a large geographical area. Utilizing hexagon grids instead of squares could eliminate potential irregularities in data analysis, because hexagons eliminate the directional bias seen in square grids. Consequently, hexagon grids can possibly form a more cohesive global grid system for the analysis of spatial data. The long-term implications include more accurate spatial data analysis, that can improve overall data organization and representation.

Future Developments

As the field evolves, it’s conceivable that improvements to the {dggridR} package will be made, enhancing efficiency and flexibility. It’s also likely that we will see the development of additional tools and packages that function in harmony with DGGs for better data processing and visual representations.

Actionable advice

Take advantage of hexagonal grid systems for spatial data analysis and representation. They reduce directional bias and offer more consistent results.
Keep an eye on future developments in this space. The integration of Geodesic Discrete Global Grid Systems into data analysis is still relatively new, and advancements are likely to occur frequently.
Make sure to use the most updated version of the {dggridR} package for generating these grids.
Experiment with different grid sizes for different resolutions in your data to find the balance between precision and efficiency that works best for your needs.
Remember to carry out a detailed analysis using the commune population as demonstrated in the tutorial for an in-depth understanding of data spread and behavior.

Read the original article