Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Day 30 of 30DayMapChallenge: « The final map » (previously).
Creating a raw polygon layer of French postal codes from points
This data, although not a “real” administrative boundary, is used in many applications, see for example the BNV-D.
There is no free, up-to-date polygon layer of French postal code boundaries:
- The official database is just a table giving a postal code for each commune code, optionally with point coordinates. Some postal codes are shared by several communes, and some communes have several postal codes (at the same point location).
- Using the national address base (BAN), we can compute the convex hulls of the addresses grouped by postal code: Contours calculés des zones codes postaux. An interesting method, but old (2021) and with many holes/overlaps.
- Fond de carte des codes postaux is nice (although métropole only) but old (2013).
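The many-to-many relation in the official table is why a postal code cannot simply be mapped to one commune polygon. A minimal base R sketch, assuming a hypothetical four-row table whose column names follow the La Poste CSV after `make_clean_names()` (the values are invented), counts the relation in both directions:

```r
# Hypothetical extract of the La Poste table: one row per (commune, postal code)
# pair. The codes are invented for illustration, not real records.
cp <- data.frame(
  code_commune_insee = c("com_1", "com_2", "com_3", "com_3"),
  code_postal        = c("10100", "10100", "10200", "10300"),
  stringsAsFactors   = FALSE
)

# Number of distinct communes sharing each postal code
communes_per_cp <- tapply(cp$code_commune_insee, cp$code_postal,
                          function(x) length(unique(x)))

# Number of distinct postal codes used by each commune
cp_per_commune <- tapply(cp$code_postal, cp$code_commune_insee,
                         function(x) length(unique(x)))

print(communes_per_cp)
print(cp_per_commune)
```

Here "10100" covers two communes while "com_3" has two postal codes, so neither a commune nor a postal code can serve as a one-to-one key.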
Config
library(readr)
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(ggplot2)
library(glue)
library(janitor)
library(sf)
Data
# Postal codes as point by commune
cp_points <- read_csv("https://datanova.laposte.fr/data-fair/api/v1/datasets/laposte-hexasmal/metadata-attachments/base-officielle-codes-postaux.csv",
                      name_repair = make_clean_names) |>
  separate(geopoint, into = c("lat", "lon"), sep = ",", convert = TRUE) |>
  drop_na(lon, lat) |>
  st_as_sf(coords = c("lon", "lat"), crs = "EPSG:4326") |>
  filter(str_sub(code_commune_insee, 1, 3) < "987") |>
  mutate(proj = case_when(
    str_sub(code_commune_insee, 1, 3) %in% c("971", "972", "977", "978") ~ "EPSG:5490",
    str_sub(code_commune_insee, 1, 3) == "973" ~ "EPSG:2972",
    str_sub(code_commune_insee, 1, 3) == "974" ~ "EPSG:2975",
    str_sub(code_commune_insee, 1, 3) == "976" ~ "EPSG:4471",
    .default = "EPSG:2154"))

# France limits, to clip voronoi polygons
fr <- read_sf("https://static.data.gouv.fr/resources/admin-express-cog-simplifiee-2024-metropole-drom-saint-martin-saint-barthelemy/20240930-094021/adminexpress-cog-simpl-000-2024.gpkg",
              layer = "departement") |>
  mutate(terr = if_else(insee_reg > "06", "fx", insee_reg)) |>
  group_by(terr) |>
  summarise()

# Communes limits and population to give the name of the postal code as the
# biggest commune (by pop)
com <- read_sf("https://static.data.gouv.fr/resources/admin-express-cog-simplifiee-2024-metropole-drom-saint-martin-saint-barthelemy/20240930-094021/adminexpress-cog-simpl-000-2024.gpkg",
               layer = "commune")
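The commune layer is only used to name each postal code after its most populous commune (the `slice_max()` step in the processing). A base R sketch of that selection, assuming a hypothetical, already joined table with invented names and populations:

```r
# Hypothetical result of the spatial join of postal code points with communes:
# one row per (postal code, commune). Names and populations are invented.
joined <- data.frame(
  code_postal = c("10100", "10100", "10200", "10200"),
  nom         = c("Ville A", "Village B", "Bourg C", "Hameau D"),
  population  = c(12000, 800, 3000, 3000),
  stringsAsFactors = FALSE
)

# Sort by postal code, then by decreasing population. order() is stable,
# so tied communes keep their original row order.
ordered <- joined[order(joined$code_postal, -joined$population), ]

# Keep the first row of each postal code, i.e. its most populous commune
noms <- ordered[!duplicated(ordered$code_postal), c("code_postal", "nom")]
print(noms)
```

For "10200" the two communes are tied, so exactly one is kept, in the same spirit as `with_ties = FALSE`.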
Processing
We create Voronoi polygons from the postal code points, one set per territory (each in its own projection), then merge them by postal code.
st_rename_geom <- function(x, name) {
  names(x)[names(x) == attr(x, "sf_column")] <- name
  st_geometry(x) <- name
  return(x)
}

# We need to use a projection for each territory to avoid geometry errors
voronoi <- function(df, k) {
  df_proj <- df |>
    st_transform(pull(k, proj))

  df_proj |>
    st_union() |>
    st_voronoi() |>
    st_cast() |>
    st_intersection(st_transform(fr, pull(k, proj))) |>
    st_sf() |>
    st_rename_geom("geom") |>
    st_join(df_proj) |>
    group_by(code_postal) |>
    summarise() |>
    st_transform("EPSG:4326")
}

# get the names of the main town of each postal code
noms <- cp_points |>
  st_join(com |> select(insee_com, nom, population)) |>
  group_by(code_postal) |>
  slice_max(population, n = 1, with_ties = FALSE) |>
  select(code_postal, nom) |>
  st_drop_geometry()

# create the voronoï for each territory
cp_poly <- cp_points |>
  group_by(proj) |>
  group_modify(voronoi) |>
  ungroup() |>
  select(-proj) |>
  st_sf() |>
  left_join(noms, join_by(code_postal))
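To make the `st_voronoi()`, `st_join()`, `summarise()` chain concrete without sf, here is a discrete approximation in base R, assuming invented projected coordinates in metres. Each node of a regular grid is assigned to its nearest point (the definition of a Voronoi cell), then the nodes are pooled by postal code, so two points sharing a postal code end up as a single zone. It illustrates the principle only; sf computes exact polygons.

```r
# Hypothetical projected points (metres): two points share postal code "10100",
# as when a postal code covers several communes
pts <- data.frame(
  x = c(0, 4000, 2000, 8000),
  y = c(0, 0, 6000, 3000),
  code_postal = c("10100", "10100", "10200", "10300"),
  stringsAsFactors = FALSE
)

# Regular grid of nodes covering the area, 500 m spacing
grid <- expand.grid(x = seq(0, 8000, by = 500), y = seq(0, 6000, by = 500))

# Discrete Voronoi: index of the nearest point for each node
# (squared Euclidean distance, ties go to the first point)
nearest <- apply(grid, 1, function(g) {
  which.min((pts$x - g[1])^2 + (pts$y - g[2])^2)
})

# Pool the nodes by postal code, like group_by(code_postal) |> summarise()
grid$code_postal <- pts$code_postal[nearest]

# Approximate area per postal code in km2 (one node ~ 0.5 km x 0.5 km)
area_km2 <- table(grid$code_postal) * 0.25
print(area_km2)
```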
Map
An extract of métropole only:
cp_poly |>
  filter(str_sub(code_postal, 1, 2) < "97") |>
  st_transform("EPSG:2154") |>
  ggplot() +
  geom_sf(fill = "#eeeeee", color = "#bbbbbb", linewidth = .1) +
  labs(title = "Codes postaux",
       subtitle = "France métropolitaine - 2024",
       caption = glue("https://r.iresmi.net/ - {Sys.Date()} data: La Poste")) +
  theme_void() +
  theme(plot.caption = element_text(size = 6, color = "darkgrey"))
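Both the map filter (`< "97"` on the postal code) and the point filter in the data step (`< "987"` on the INSEE code) rely on comparing code prefixes as character strings, which only works if the codes are kept as text: leading zeros matter and Corsican INSEE codes contain letters. A small base R check, assuming example code values chosen only to illustrate the formats:

```r
# Example code values, chosen only to illustrate the formats (kept as strings)
code_postal <- c("01000", "20000", "75001", "97400", "98000")
insee <- c("01053", "2A004", "75101", "97411", "98735")

# Map filter: first two characters of the postal code sort before "97"
metropole <- code_postal[substr(code_postal, 1, 2) < "97"]

# Point filter: drops INSEE prefixes "987" and above
# (e.g. French Polynesia, New Caledonia)
kept_insee <- insee[substr(insee, 1, 3) < "987"]

print(metropole)
print(kept_insee)
```

The first differing character is a digit in every comparison here, so the result does not depend on the locale's collation rules.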
Export
cp_poly |>
  st_write("codes_postaux_fr_2024.gpkg",
           delete_layer = TRUE,
           quiet = TRUE,
           layer_options = c("IDENTIFIER=Codes postaux France 2024",
                             glue("DESCRIPTION=Métropole + DROM WGS84. d'après données La Poste + IGN Adminexpress https://r.iresmi.net/ - {Sys.Date()}")))
Get the file: polygones des codes postaux français 2024 (GeoPackage, WGS84, 3 MB).
- Métropole + DROM (no Polynesia, New Caledonia,…)
- Some polygon parts (434) overlap other polygons in communes where several postal codes are present.
- 4 NAs (multipolygons without a postal code) cover territories outside the Voronoi polygons.
- No postal code for St-Barth/St-Martin?
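The overlaps are plausibly a side effect of communes having several postal codes at the same point location: `st_union()` presumably collapses identical points, so only one Voronoi cell is produced, and `st_join()` then attaches that cell to each postal code at the location. A base R sketch that flags such locations, assuming hypothetical points with invented codes and coordinates:

```r
# Hypothetical points: commune "com_3" has two postal codes at the same location
pts <- data.frame(
  code_commune_insee = c("com_1", "com_2", "com_3", "com_3"),
  code_postal = c("10100", "10100", "10200", "10300"),
  x = c(0, 4000, 2000, 2000),
  y = c(0, 0, 6000, 6000),
  stringsAsFactors = FALSE
)

# Key identifying each point location
pts$loc <- paste(pts$x, pts$y)

# Number of distinct postal codes at each location
n_cp <- tapply(pts$code_postal, pts$loc, function(cp) length(unique(cp)))

# Locations shared by several postal codes: their single Voronoi cell would be
# attached to every one of these postal codes, hence overlapping polygons
shared <- names(n_cp)[n_cp > 1]
overlapping <- pts[pts$loc %in% shared, c("code_commune_insee", "code_postal")]
print(overlapping)
```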
Continue reading: Codes postaux
A New Approach to Creating a Layer of French Postal Codes
In the absence of a freely available, up-to-date polygon layer of French postal code boundaries, the author takes a different route from the existing convex hulls computed from the national address base (BAN). Starting from the official La Poste table, which gives a postal code for each commune code along with point coordinates, the points are projected territory by territory, turned into Voronoi polygons clipped to the IGN Admin Express boundaries, and merged by postal code, producing a raw polygon layer of French postal codes.
Potential Implications and Future Developments
Since postal codes are used in many applications, a free method for deriving postal code polygons from points could be widely useful. Researchers could map and aggregate postal-code-level data without depending on an outdated or proprietary boundary layer, and businesses that rely on geographic segmentation, for example to target marketing campaigns, could build on the resulting layer.
Future Enhancement
The method produces a usable layer, but some aspects could be improved. As the author notes, some postal codes are shared by several communes, and some communes have several postal codes located at the same point. In such communes the resulting polygons overlap (434 polygon parts), which blurs the geographic representation of the postal codes. Future work could aim at partitioning these overlapping areas accurately.
Actionable Advice
From this analysis, several next steps follow:
- Keep the input data current, so that the generated map reflects the latest changes in La Poste's postal code database.
- Consider enhancing the current method to address the issue of overlapping regions.
- Explore other methods and algorithms that could represent postal code areas more accurately.
- Finally, making the data freely available and accessible can promote further research and innovations in this field.
Knowledge sharing is an essential part of this work: by publishing both the R code and the resulting GeoPackage, the author lets others reproduce, check, and improve the layer.
In conclusion:
This approach offers a new way to create a raw polygon layer of French postal codes from points. While there is room for refinement, it is a step towards more accurate geographic segmentation by postal code. Developers and researchers alike are encouraged to explore and build upon this method.
The article also shows how a complete geospatial workflow, from downloading the source data to exporting a GeoPackage, can be written in R with sf and the tidyverse, with practical applications in many fields.