Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
This is the fourth part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one.
This blog post focuses on the comparison of spatial patterns in categorical raster data for overlapping regions. In other words, here we have two rasters with the same number of rows and columns, and we want to compare their spatial patterns.
For this blog post, we use two categorical raster datasets: the CORINE Land Cover (CLC) datasets for Tartu (Estonia) for the years 2000 and 2018.
library(terra) clc2000_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2000_tartu.tif") clc2018_tartu = rast("https://github.com/Nowosad/comparing-spatial-patterns-2024/raw/refs/heads/main/data/clc2018_tartu.tif") plot(clc2000_tartu, main = "Tartu (2000)") plot(clc2018_tartu, main = "Tartu (2018)")
In short, the red areas in the rasters represent urban areas, the green areas represent forests, the blue areas represent water bodies, and the yellow areas represent agricultural land.
Non-spatial context
Raster outcome
The binary difference between two rasters
The simplest way to compare two categorical rasters is to create a binary raster that indicates whether the values of the two rasters are the same or different. Here, the output highlights the areas where the values of the two rasters differ (yellow) and where they are the same (purple).
clc_diff = clc2018_tartu != clc2000_tartu plot(clc_diff)
Multiple value outcome
To get a more detailed comparison, we can calculate the confusion matrix (also known as, e.g., contingency table) of the two rasters. It shows the number of pixels that have the same value in both rasters (diagonal) and the number of pixels with different values (off-diagonal).
cm = table(values(clc2000_tartu), values(clc2018_tartu)) cm
1 2 3 4 6 7 1 4821 357 2 10 0 0 2 1342 67915 684 389 0 0 3 59 415 33670 2896 9 22 4 122 638 1435 7040 229 24 6 0 3 13 20 2177 37 7 0 0 0 0 3 1995
For example, the table shows that there are 4821 pixels with the value 1 in both rasters, and 357 pixels with the value 2 in the first raster and the value 1 in the second raster.
Confusion matrices is also a building block for many other statistics, including accuracy, that can be calculated to compare the two sets of categorical data.
Single value outcome
The proportion of changed pixels
A binary difference raster can be also used to calculate the proportion of changed pixels by dividing the number of changed pixels by the total number of non-NA pixels.
clc_diff = clc2018_tartu != clc2000_tartu changed_pixels = freq(clc_diff)$count[2] na_pixels = freq(clc_diff, value = NA)$count total_pixels = ncell(clc_diff) - na_pixels proportion_changed = changed_pixels / total_pixels proportion_changed
[1] 0.06894013
This outcome shows that about 0.07 of the pixels have changed between the two rasters.
The overall comparison
The overall comparison of the two rasters can be done by calculating the difference between the frequencies of the values of the two rasters (Pontius 2002).
clc2000_tartu_freq = freq(clc2000_tartu) clc2018_tartu_freq = freq(clc2018_tartu) freq = merge(clc2000_tartu_freq, clc2018_tartu_freq, by = "value", all = TRUE) freq$diff = abs(freq$count.x - freq$count.y) sum_diff = sum(freq$diff) na_pixels = freq(clc_diff, value = NA)$count total_pixels = ncell(clc_diff) - na_pixels 1 - sum_diff / total_pixels
[1] 0.9640774
The overall comparison tells what is the percentage of pixels that have the same class in both rasters (however, it does not consider if the pixels are in the same location).
Statistics of the differences between rasters’ values
More statistics of the differences between the values of the two rasters can be calculated using the diffeR package (Pontius Jr. and Santacruz 2023). These statistics are based on the confusion matrix and include the overall allocation disagreement, overall difference, overall exchange disagreement, overall quantity disagreement, and overall shift disagreement.1
library(diffeR) clc_ct = crosstabm(clc2000_tartu, clc2018_tartu) diffeR_df = data.frame( overallAllocD = overallAllocD(clc_ct), overallDiff = overallDiff(clc_ct), overallExchangeD = overallExchangeD(clc_ct), overallQtyD = overallQtyD(clc_ct), overallShiftD = overallShiftD(clc_ct) ) diffeR_df
overallAllocD overallDiff overallExchangeD overallQtyD overallShiftD 1 6440 8709 5280 2269 1160
Spatial context
Raster outcome
The difference between a focal measure of two rasters (e.g., selected landscape metric)
To include the spatial context in the comparison, we can calculate the difference between a focal measure of two rasters. An example of such a measure is the relative mutual information (relmutinf
) metric, which quantifies the clumpiness of the landscape – the larger the value, the more clumped the landscape is [^See a blog post about this metric].
The below code chunk uses the landscapemetrics package (Hesselbarth et al. 2019) to specify a moving window of 5 by 5 and calculate the relmutinf
metric for the two rasters. Next, it calculates the absolute difference between the two rasters and plots the result.
library(landscapemetrics) window = matrix(1, nrow = 5, ncol = 5) clc2000_tartu_relmutinf_mw = window_lsm(clc2000_tartu, window = window, what = "lsm_l_relmutinf") clc2018_tartu_relmutinf_mw = window_lsm(clc2018_tartu, window = window, what = "lsm_l_relmutinf") clc2018_tartu_relmutinf_diff = abs(clc2018_tartu_relmutinf_mw[[1]][[1]] - clc2000_tartu_relmutinf_mw[[1]][[1]]) plot(clc2018_tartu_relmutinf_diff)
The largest values in the output indicate the areas where the clumpiness of the landscape has changed the most between the two rasters.
Alternatively, if we calculate the regular difference, the output will show the areas where the clumpiness of the landscape has increased (positive values) and decreased (negative values).
plot_div = function(r, ...){ r_range = range(values(r), na.rm = TRUE, finite = TRUE) max_abs = max(abs(r_range)) new_range = c(-max_abs, max_abs) plot(r, col = hcl.colors(100, palette = "prgn"), range = new_range, ...) } clc2018_tartu_relmutinf_diff2 = clc2018_tartu_relmutinf_mw[[1]][[1]] - clc2000_tartu_relmutinf_mw[[1]][[1]] plot_div(clc2018_tartu_relmutinf_diff2)
Cross-entropy loss function
The moving window approach is also used in the raster.change()
function from the spatialEco package (Evans and Murphy 2023). Its first two arguments are the two rasters to compare, the s
argument specifies the size of the moving window, and the stat
argument specifies the statistic to calculate. For example, stat = "cross-entropy"
calculates the cross-entropy loss function, where the larger the value, the more different the two rasters are.
library(spatialEco) clc_ce = raster.change(clc2000_tartu, clc2018_tartu, s = 5, stat = "cross-entropy") plot(clc_ce)
Note that the above calculation may take a few minutes to complete.
Multiple value outcome
Various statistics from categorical data can be calculated at multiple scales with the waywiser package (Mahoney 2023). Here, the ww_multi_scale()
function calculates the accuracy of the two rasters at different scales from 500 to 3000 meters (map units).
library(waywiser) cell_sizes = seq(500, 3000, by = 500) clc_multi_scale = ww_multi_scale(truth = as.factor(clc2000_tartu), estimate = as.factor(clc2018_tartu), metrics = list(yardstick::accuracy), cellsize = cell_sizes, progress = FALSE) clc_multi_scale
# A tibble: 6 × 6 .metric .estimator .estimate .grid_args .grid .notes <chr> <chr> <dbl> <list> <list> <list> 1 accuracy multiclass 0.781 <tibble [1 × 1]> <sf [6,400 × 5]> <tibble> 2 accuracy multiclass 0.607 <tibble [1 × 1]> <sf [1,600 × 5]> <tibble> 3 accuracy multiclass 0.453 <tibble [1 × 1]> <sf [729 × 5]> <tibble> 4 accuracy multiclass 0.358 <tibble [1 × 1]> <sf [400 × 5]> <tibble> 5 accuracy multiclass 0.246 <tibble [1 × 1]> <sf [256 × 5]> <tibble> 6 accuracy multiclass 0.270 <tibble [1 × 1]> <sf [196 × 5]> <tibble>
The output shows the accuracy of the two rasters at each scale: the largest value is at the scale of 500 meters, and the smallest value is at the scale of 3000 meters. It shows that with the increase of the scale, the agreement between the two rasters decreases – both rasters are similar at the local scale, but they differ at the regional one.
Single value outcome
Spatial association between regionalizations using V-measure
The sabre package (Nowosad and Stepinski 2018) provides a function to calculate a few measures of spatial association between two categorical maps.2
library(sabre) clc_sabre = vmeasure_calc(clc2000_tartu, clc2018_tartu) clc_sabre
The SABRE results: V-measure: 0.77 Homogeneity: 0.78 Completeness: 0.76 The spatial objects can be retrieved with: $map1 - the first map $map2 - the second map
Its output returns three values: the homogeneity, the completeness, and the V-measure. The homogeneity measures how well regions from the first map fit inside of regions from the second map, the completeness measures how well regions from the second map fit inside of regions from the first map, and the V-measure is the weighted harmonic mean of homogeneity and completeness. All of them range from 0 to 1, where larger values indicate better spatial agreement.
Additionally, the output contains two sets of maps of regions’ inhomogeneities (rih): the first set shows how the regions from the first map are inhomogenous in regard to the regions from the second map, and the second set shows how the regions from the second map are inhomogenous in regard to the regions from the first map.
plot(clc_sabre$map1[[2]]) plot(clc_sabre$map2[[2]])
References
Footnotes
-
And more, as can be found in the package documentation – see
?diffeR::overallAllocD
.︎ -
Its output actually gives a few values and two maps, but the most important one is the V-measure.︎
Reuse
Citation
@online{nowosad2024, author = {Nowosad, Jakub}, title = {Comparison of Spatial Patterns in Categorical Raster Data for Overlapping Regions Using {R}}, date = {2024-11-03}, url = {https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/}, langid = {en} }
Categorical Raster Data for Overlapping Regions Using R.”
November 3, 2024. https://jakubnowosad.com/posts/2024-11-03-spatcomp-bp4/.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you’re looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don’t.
Continue reading: Comparison of spatial patterns in categorical raster data for overlapping regions using R
Insights and Implications
The blog post delves into methods for comparing spatial patterns on raster data, focusing on categorical raster data sets for overlapping regions. Specifically, the article used two raster datasets, the CORINE Land Cover (CLC) datasets, for the city of Tartu, Estonia, for the years 2000 and 2018 for their comparison study. Different methods have been highlighted for both non-spatial and spatial contexts.
Non-Spatial Context
In a non-spatial context, the simplest way to compare rasters is to create a binary raster indicating whether the values of the two rasters are the same or different. More detailed comparison can be obtained by using a confusion matrix, which displays the number of pixels with the same value in both rasters and the number of pixels with different values. Subsequently, single value outcomes can highlight the proportion of changed pixels.
Spatial Context
In the spatial context, the difference between a focal measure of two rasters is calculated. An example is the ‘relative mutual information’ metric, which quantifies the landscape’s clumpiness. Other approaches like using a ‘cross-entropy loss function’ were also suggested to identify the differences between two rasters. The ‘waywiser’ package is suggested for calculating various statistics from categorical data at multiple scales. Lastly, spatial association between regionalizations is recommended using the V-measure from the ‘sabre’ package.
Actionable Advice
Given the increase in data collection and recording, spatial analysis is becoming an important asset in making informed decisions. The methods outlined in the blog post provide R programmers with various ways to approach spatial pattern comparison in raster data.
For entities handling geospatial data such as environmental researchers, urban planners, and GIS professionals, it would be advisable to adopt these methods for better analysis of changes occurring over a period of time.
Developers should keep an eye on packages like ‘landscapemetrics’, ‘spatialEco’, ‘waywiser’, ‘sabre’, and ‘diffeR’ as these could prove to be crucial for spatial analysis tasks. Moreover, activities such as benchmarking these methods against one another for different spatial datasets would surely enhance efficiency and accuracy in the long term.
These methods serve as starting points and can be customized or expanded upon to suit specific use-cases. Further, the consideration of contributing to open source projects like the ‘landscapemetrics’, ‘spatialEco’, ‘waywiser’, ‘sabre’, and ‘diffeR’ packages by improving upon existing functions, adding new functionalities or dealing with known issues is highly encouraged.