Sea-Surface Temperature Analysis: From Data Collection to Visualization
Sea-surface temperatures in the North Atlantic have been in the news recently as they continue to break records. While there are already a number of excellent summaries and graphs of the data, I thought I’d have a go at making some myself. The starting point is the detailed data made available by the National Centers for Environmental Information, part of NOAA. As always, the sheer volume of high-quality data agencies like this make available to the public is astonishing.
Getting the Data
The specific dataset is the NOAA 0.25-degree Daily Optimum Interpolation Sea Surface Temperature (OISST), Version 2.1, which takes a global network of daily temperature observations (from things like buoys and platforms, but also satellites), and then interpolates and aggregates them to a regular spatial grid of observations at 0.25 degrees resolution.
The data is available as daily global observations going back to September 1st, 1981. Each day's data comes as a single file, stored in subdirectories organized by year and month. It's all here. Each file is about 1.6MB in size, and there are more than fifteen thousand of them, so something like 25GB in total.
We can make a folder called raw in our project and then get all the data, preserving its subdirectory structure, with a wget command like this:
wget --no-parent -r -l inf --wait 5 --random-wait 'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/'
This tries to be polite with the NOAA. (I think its webserver is also lightly throttled in any case, but it never hurts to be nice.) The switches --no-parent -r -l inf tell wget not to move upwards in the folder hierarchy, but to recurse downwards indefinitely. The --wait 5 --random-wait switches jointly enforce a pause between requests that averages five seconds, randomly varying between 0.5 and 1.5 times the wait period. Downloading the files this way will take several days of wall-clock time, though of course much less in actual file-transfer time.
The netCDF data format
The data are netCDF files, an interesting and in fact quite nice self-documenting binary file format. These files contain regular arrays of data of some n-dimensional size, e.g. latitude by longitude, with measures that can be thought of as being stacked on the array (for example, a grid of points with a measure at each point for surface temperature, sea-ice extent, and so on). So you end up with big slabs of data that you can then summarize by aggregating on one dimension or another.
The ncdf4 package can read them in R, though as it turns out we won't use this package for the analysis. Here's what one file looks like:
ncdf4::nc_open(all_fnames[1000])
File /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/198405/oisst-avhrr-v02r01.19840527.nc (NC_FORMAT_NETCDF4):

     4 variables (excluding dimension variables):
        short anom[lon,lat,zlev,time]   (Chunking: [1440,720,1,1])  (Compression: shuffle,level 4)
            long_name: Daily sea surface temperature anomalies
            _FillValue: -999
            add_offset: 0
            scale_factor: 0.00999999977648258
            valid_min: -1200
            valid_max: 1200
            units: Celsius
        short err[lon,lat,zlev,time]   (Chunking: [1440,720,1,1])  (Compression: shuffle,level 4)
            long_name: Estimated error standard deviation of analysed_sst
            units: Celsius
            _FillValue: -999
            add_offset: 0
            scale_factor: 0.00999999977648258
            valid_min: 0
            valid_max: 1000
        short ice[lon,lat,zlev,time]   (Chunking: [1440,720,1,1])  (Compression: shuffle,level 4)
            long_name: Sea ice concentration
            units: %
            _FillValue: -999
            add_offset: 0
            scale_factor: 0.00999999977648258
            valid_min: 0
            valid_max: 100
        short sst[lon,lat,zlev,time]   (Chunking: [1440,720,1,1])  (Compression: shuffle,level 4)
            long_name: Daily sea surface temperature
            units: Celsius
            _FillValue: -999
            add_offset: 0
            scale_factor: 0.00999999977648258
            valid_min: -300
            valid_max: 4500

     4 dimensions:
        time  Size:1   *** is unlimited ***
            long_name: Center time of the day
            units: days since 1978-01-01 12:00:00
        zlev  Size:1
            long_name: Sea surface height
            units: meters
            actual_range: 0, 0
            positive: down
        lat  Size:720
            long_name: Latitude
            units: degrees_north
            grids: Uniform grid from -89.875 to 89.875 by 0.25
        lon  Size:1440
            long_name: Longitude
            units: degrees_east
            grids: Uniform grid from 0.125 to 359.875 by 0.25

    37 global attributes:
        title: NOAA/NCEI 1/4 Degree Daily Optimum Interpolation Sea Surface Temperature (OISST) Analysis, Version 2.1 - Final
        source: ICOADS, NCEP_GTS, GSFC_ICE, NCEP_ICE, Pathfinder_AVHRR, Navy_AVHRR
        id: oisst-avhrr-v02r01.19840527.nc
        naming_authority: gov.noaa.ncei
        summary: NOAAs 1/4-degree Daily Optimum Interpolation Sea Surface Temperature (OISST) (sometimes referred to as Reynolds SST, which however also refers to earlier products at different resolution), currently available as version v02r01, is created by interpolating and extrapolating SST observations from different sources, resulting in a smoothed complete field. The sources of data are satellite (AVHRR) and in situ platforms (i.e., ships and buoys), and the specific datasets employed may change over time. At the marginal ice zone, sea ice concentrations are used to generate proxy SSTs. A preliminary version of this file is produced in near-real time (1-day latency), and then replaced with a final version after 2 weeks. Note that this is the AVHRR-ONLY DOISST, available from Oct 1981, but there is a companion DOISST product that includes microwave satellite data, available from June 2002
        cdm_data_type: Grid
        history: Final file created using preliminary as first guess, and 3 days of AVHRR data. Preliminary uses only 1 day of AVHRR data.
        date_modified: 2020-05-08T19:05:13Z
        date_created: 2020-05-08T19:05:13Z
        product_version: Version v02r01
        processing_level: NOAA Level 4
        institution: NOAA/National Centers for Environmental Information
        creator_url: https://www.ncei.noaa.gov/
        creator_email: oisst-help@noaa.gov
        keywords: Earth Science > Oceans > Ocean Temperature > Sea Surface Temperature
        keywords_vocabulary: Global Change Master Directory (GCMD) Earth Science Keywords
        platform: Ships, buoys, Argo floats, MetOp-A, MetOp-B
        platform_vocabulary: Global Change Master Directory (GCMD) Platform Keywords
        instrument: Earth Remote Sensing Instruments > Passive Remote Sensing > Spectrometers/Radiometers > Imaging Spectrometers/Radiometers > AVHRR > Advanced Very High Resolution Radiometer
        instrument_vocabulary: Global Change Master Directory (GCMD) Instrument Keywords
        standard_name_vocabulary: CF Standard Name Table (v40, 25 January 2017)
        geospatial_lat_min: -90
        geospatial_lat_max: 90
        geospatial_lon_min: 0
        geospatial_lon_max: 360
        geospatial_lat_units: degrees_north
        geospatial_lat_resolution: 0.25
        geospatial_lon_units: degrees_east
        geospatial_lon_resolution: 0.25
        time_coverage_start: 1984-05-27T00:00:00Z
        time_coverage_end: 1984-05-27T23:59:59Z
        metadata_link: https://doi.org/10.25921/RE9P-PT57
        ncei_template_version: NCEI_NetCDF_Grid_Template_v2.0
        comment: Data was converted from NetCDF-3 to NetCDF-4 format with metadata updates in November 2017.
        sensor: Thermometer, AVHRR
        Conventions: CF-1.6, ACDD-1.3
        references: Reynolds, et al.(2007) Daily High-Resolution-Blended Analyses for Sea Surface Temperature (available at https://doi.org/10.1175/2007JCLI1824.1). Banzon, et al.(2016) A long-term record of blended satellite and in situ sea-surface temperature for climate monitoring, modeling and environmental studies (available at https://doi.org/10.5194/essd-8-165-2016). Huang et al. (2020) Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version v02r01, submitted. Climatology is based on 1971-2000 OI.v2 SST. Satellite data: Pathfinder AVHRR SST and Navy AVHRR SST. Ice data: NCEP Ice and GSFC Ice.
This is the file for May 27th, 1984. As you can see, there’s a lot of metadata. Each variable is admirably well-documented. The key information for our purposes is that we have a grid of 1440 by 720 lat-lon points. There are two additional dimensions—time and elevation (zlev)—but these are both just 1 for each particular file, because every file is observations at elevation zero on a particular day. There are four measures at each point: sea surface temperature anomalies, the standard deviation of the sea surface temperature estimate, sea ice concentration (as a percentage), and sea surface temperature (in degrees Celsius). So we have four bits of data for each grid point on our 1440 * 720 grid, which makes for just over 4.1 million data points per day since 1981.
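Although we won't use ncdf4 for the analysis proper, it is handy for poking at a single file and confirming this structure. Here is a minimal sketch (not from the original post) of pulling the sst grid and its coordinate vectors out of one file, using the all_fnames vector of file paths that we build in the next section:

nc <- ncdf4::nc_open(all_fnames[1000])
sst <- ncdf4::ncvar_get(nc, "sst")  # 1440 x 720 matrix; scale factor, offset, and fill value handled for us
dim(sst)
lon <- ncdf4::ncvar_get(nc, "lon")  # 1440 longitudes, 0.125 to 359.875
lat <- ncdf4::ncvar_get(nc, "lat")  # 720 latitudes, -89.875 to 89.875
ncdf4::nc_close(nc)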
Processing the Data
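The code below assumes a setup chunk, not reproduced in the original post, that loads the packages used throughout. Something along these lines would cover the functions that appear later:

## Packages assumed to be loaded for the code below
library(tidyverse)   # dplyr, tidyr, purrr, readr, tibble, ggplot2
library(here)        # project-relative paths: here()
library(fs)          # filesystem helpers: dir_ls()
library(lubridate)   # ymd(), year(), month(), yday()
library(terra)       # raster handling: rast(), rotate(), crop(), zonal()
library(sf)          # reading the IHO shapefile
library(furrr)       # future_map() for parallel processing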
We read in the filenames and see how many we have:
all_fnames <- fs::dir_ls(here("raw"), recurse = TRUE, glob = "*.nc")

length(all_fnames)
#> [1] 15549
What we want to do is read in all this data and aggregate it so that we can take, for instance, the global average for each day and plot that trend for each year. Or perhaps we want to do that for specific regions of the globe, either defined directly by us in terms of some latitude and longitude polygon, or taken from the coordinates of some conventional division of the world’s oceans and seas into named areas.
Our tool of choice is the terra package, which is designed specifically for this kind of data. It has a number of tools for conveniently aggregating and slicing arrays of geospatial data. The ncdf4 package has a lot of useful features, too, but for the specific things we want to do terra's toolkit is quicker. One thing it can do, for example, is naturally stack layers observed over time into single "bricks" of data, and then quickly summarize or calculate on those arrays.
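As a quick illustration of that idea (a sketch for orientation only, not part of the pipeline below), passing several filenames to terra::rast() gives a single multi-layer SpatRaster that can be summarized per layer or collapsed across layers:

r <- terra::rast(all_fnames[1:3])       # 3 days x 4 variables = 12 layers in one object
terra::nlyr(r)
terra::global(r, "mean", na.rm = TRUE)  # one (unweighted) mean per layer
terra::app(r, mean, na.rm = TRUE)       # cell-wise mean across layers, returning a new raster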
So, let’s chunk our filenames into units of 25 days or so:
## This one gives you an unknown number of chunks each with approx n elements
chunk <- function(x, n) split(x, ceiling(seq_along(x) / n))

chunked_fnames <- chunk(all_fnames, 25)
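A quick sanity check on the result; the counts follow directly from having 15,549 files split into groups of 25:

length(chunked_fnames)      # 622 chunks
lengths(chunked_fnames)[1]  # 25 filenames in the first chunk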
Next, we write a function to process the raster object that terra creates from a batch of files. It calculates the area-weighted mean of each layer variable.
layerinfo <- tibble(
  num = c(1:4),
  raw_name = c("anom_zlev=0", "err_zlev=0", "ice_zlev=0", "sst_zlev=0"),
  name = c("anom", "err", "ice", "sst"))

process_raster <- function(fnames,
                           crop_area = c(-80, 0, 0, 60),
                           layerinfo = layerinfo) {

  tdf <- terra::rast(fnames) |>
    terra::rotate() |>      # Convert 0 to 360 lon to -180 to +180 lon
    terra::crop(crop_area)  # Manually crop to a defined box. Default is roughly N. Atlantic lat/lon box

  wts <- terra::cellSize(tdf, unit = "km") # For scaling. Because the Earth is round.

  # global() calculates a quantity for the whole grid on a particular SpatRaster
  # so we get one weighted mean per file that comes in
  out <- data.frame(date = terra::time(tdf),
                    means = terra::global(tdf, "mean", weights = wts, na.rm = TRUE))
  out$var <- rownames(out)
  out$var <- gsub("_.*", "", out$var)
  out <- reshape(out, idvar = "date", timevar = "var", direction = "wide")
  colnames(out) <- gsub("weighted_mean.", "", colnames(out))
  out
}
For a single file, this gives us one number for each variable:
# World box (60S to 60N)
world_crop_bb <- c(-180, 180, -60, 60)

process_raster(all_fnames[10000], crop_area = world_crop_bb)
#>                   date       anom       err       ice      sst
#> anom_zlev=0 2009-01-20 0.01397327 0.1873972 0.6823713 20.26344
For 25 filenames, 25 rows:
process_raster(chunked_fnames[[1]], crop_area = world_crop_bb)
#>                      date       anom       err       ice      sst
#> anom_zlev=0    1981-09-01 -0.1312008 0.2412722 0.4672954 20.12524
#> anom_zlev=0.1  1981-09-02 -0.1383695 0.2483428 0.4933853 20.11629
#> anom_zlev=0.2  1981-09-03 -0.1419441 0.2583364 0.4807980 20.11095
#> anom_zlev=0.3  1981-09-04 -0.1434012 0.2627574 0.5125643 20.10772
#> anom_zlev=0.4  1981-09-05 -0.1527941 0.2520100 0.4889709 20.09655
#> anom_zlev=0.5  1981-09-06 -0.1590382 0.2421610 0.5253917 20.08851
#> anom_zlev=0.6  1981-09-07 -0.1603969 0.2406726 0.4959906 20.08539
#> anom_zlev=0.7  1981-09-08 -0.1530743 0.2437756 0.5203092 20.09094
#> anom_zlev=0.8  1981-09-09 -0.1503720 0.2483605 0.5062930 20.09187
#> anom_zlev=0.9  1981-09-10 -0.1532902 0.2574440 0.5275545 20.08718
#> anom_zlev=0.10 1981-09-11 -0.1409007 0.2548919 0.5111582 20.09779
#> anom_zlev=0.11 1981-09-12 -0.1459493 0.2438222 0.5395167 20.09097
#> anom_zlev=0.12 1981-09-13 -0.1540702 0.2341866 0.5259677 20.08107
#> anom_zlev=0.13 1981-09-14 -0.1719063 0.2322755 0.5650545 20.06144
#> anom_zlev=0.14 1981-09-15 -0.1879679 0.2319289 0.5357815 20.04363
#> anom_zlev=0.15 1981-09-16 -0.2021128 0.2330142 0.5718586 20.02638
#> anom_zlev=0.16 1981-09-17 -0.2163771 0.2371551 0.5434053 20.00766
#> anom_zlev=0.17 1981-09-18 -0.2317916 0.2366315 0.5757664 19.98781
#> anom_zlev=0.18 1981-09-19 -0.2321086 0.2388878 0.5458579 19.98307
#> anom_zlev=0.19 1981-09-20 -0.2478310 0.2388981 0.5682817 19.96289
#> anom_zlev=0.20 1981-09-21 -0.2477164 0.2366739 0.5428888 19.95858
#> anom_zlev=0.21 1981-09-22 -0.2315305 0.2369557 0.5636612 19.97033
#> anom_zlev=0.22 1981-09-23 -0.2079270 0.2401278 0.5423280 19.98950
#> anom_zlev=0.23 1981-09-24 -0.1803567 0.2397868 0.5666913 20.01262
#> anom_zlev=0.24 1981-09-25 -0.1704838 0.2376401 0.5437584 20.01805
We need to do this for all the files so that we get the complete dataset. We take advantage of the futureverse to parallelize the operation, because doing this with more than 15,000 files is going to take a bit of time. Then at the end we clean it up a little.
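The snippet below takes for granted that a parallel backend has already been declared. The original post does not show that step; something like the following (the worker count is just an example) would do it:

library(furrr)                   # provides future_map()
library(future)
plan(multisession, workers = 6)  # run the chunks across several background R sessions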
season <- function(in_date){
  br = yday(as.Date(c("2019-03-01", "2019-06-01", "2019-09-01", "2019-12-01")))
  x = yday(in_date)
  x = cut(x, breaks = c(0, br, 366))
  levels(x) = c("Winter", "Spring", "Summer", "Autumn", "Winter")
  x
}

world_df <- future_map(chunked_fnames, process_raster, crop_area = world_crop_bb) |>
  list_rbind() |>
  as_tibble() |>
  mutate(date = ymd(date),
         year = lubridate::year(date),
         month = lubridate::month(date),
         day = lubridate::day(date),
         yrday = lubridate::yday(date),
         season = season(date))

world_df
#> # A tibble: 15,549 × 10
#>    date         anom   err   ice   sst  year month   day yrday season
#>    <date>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#>  1 1981-09-01 -0.131 0.241 0.467  20.1  1981     9     1   244 Summer
#>  2 1981-09-02 -0.138 0.248 0.493  20.1  1981     9     2   245 Autumn
#>  3 1981-09-03 -0.142 0.258 0.481  20.1  1981     9     3   246 Autumn
#>  4 1981-09-04 -0.143 0.263 0.513  20.1  1981     9     4   247 Autumn
#>  5 1981-09-05 -0.153 0.252 0.489  20.1  1981     9     5   248 Autumn
#>  6 1981-09-06 -0.159 0.242 0.525  20.1  1981     9     6   249 Autumn
#>  7 1981-09-07 -0.160 0.241 0.496  20.1  1981     9     7   250 Autumn
#>  8 1981-09-08 -0.153 0.244 0.520  20.1  1981     9     8   251 Autumn
#>  9 1981-09-09 -0.150 0.248 0.506  20.1  1981     9     9   252 Autumn
#> 10 1981-09-10 -0.153 0.257 0.528  20.1  1981     9    10   253 Autumn
#> # ℹ 15,539 more rows
Now we have a daily time series for each of the variables, running from September 1981 to yesterday.
Calculating values for all the world’s seas and oceans
We can do a little better, though. What if we wanted these average values for each of the seas and oceans of the world? For that we'd need a map defining the conventional boundaries of those areas of water, which we'd then need to convert to raster format. After that, we'd slice up our global raster along the ocean and sea boundaries and calculate averages for those areas.
I took the maritime boundaries from the IHO Sea Areas Shapefile.
seas <- sf::read_sf(here("raw", "World_Seas_IHO_v3"))

seas
#> Simple feature collection with 101 features and 10 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -180 ymin: -85.5625 xmax: 180 ymax: 90
#> Geodetic CRS:  WGS 84
#> # A tibble: 101 × 11
#>    NAME                   ID    Longitude Latitude min_X min_Y max_X max_Y   area MRGID                  geometry
#>    <chr>                  <chr>     <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>        <MULTIPOLYGON [°]>
#>  1 Rio de La Plata        33        -56.8   -35.1  -59.8 -36.4 -54.9 -31.5 3.18e4  4325 (((-54.94302 -34.94791, …
#>  2 Bass Strait            62A       146.    -39.5  144.  -41.4 150.  -37.5 1.13e5  4366 (((149.9046 -37.54325, 1…
#>  3 Great Australian Bight 62        133.    -36.7  118.  -43.6 146.  -31.5 1.33e6  4276 (((143.5325 -38.85535, 1…
#>  4 Tasman Sea             63        161.    -39.7  147.  -50.9 175.  -30   3.34e6  4365 (((159.0333 -30, 159.039…
#>  5 Mozambique Channel     45A        40.9   -19.3   32.4 -26.8  49.2 -10.5 1.39e6  4261 (((43.38218 -11.37021, 4…
#>  6 Savu Sea               48o       122.     -9.48 119.  -10.9 125.   -8.21 1.06e5 4343 (((124.5562 -8.223565, 1…
#>  7 Timor Sea              48i       128.    -11.2  123.  -15.8 133.   -8.18 4.34e5 4344 (((127.8623 -8.214911, 1…
#>  8 Bali Sea               48l       116.     -7.93 114.   -9.00 117.  -7.01 3.99e4 4340 (((115.7522 -7.143594, 1…
#>  9 Coral Sea              64        157.    -18.2  141.  -30.0 170.   -6.79 4.13e6 4364 (((168.4912 -16.79469, 1…
#> 10 Flores Sea             48j       120.     -7.51 117.   -8.74 123.  -5.51 1.03e5 4341 (((120.328 -5.510677, 12…
#> # ℹ 91 more rows
Then we rasterize the polygons with a function from terra:
## Rasterize the seas polygons using one of the nc files
## as a reference grid for the rasterization process
one_raster <- all_fnames[1]
seas_vect <- terra::vect(seas)

tmp_tdf_seas <- terra::rast(one_raster)["sst"] |>
  rotate()

seas_zonal <- rasterize(seas_vect, tmp_tdf_seas, "NAME")
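The result is a categorical SpatRaster on the same 0.25-degree grid as the SST data, with each cell labeled by the name of the sea it falls in. A quick way to eyeball it (just a check, not part of the pipeline):

seas_zonal           # prints the SpatRaster summary, with NAME as the layer
terra::plot(seas_zonal)  # map of the rasterized IHO regions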
Now we can use this raster as the grid for zonal calculations on our data. To use it in a parallelized calculation we need to wrap it, so that it can be found by the processes that future_map() will spawn. We write a new function to do the zonal calculation. It's basically the same as the global one above.
seas_zonal_wrapped <- wrap(seas_zonal)

process_raster_zonal <- function(fnames) {

  d <- terra::rast(fnames)
  wts <- terra::cellSize(d, unit = "km") # For scaling

  layer_varnames <- terra::varnames(d) # vector of layers
  date_seq <- rep(terra::time(d))      # vector of dates

  # New colnames for use post zonal calculation below
  new_colnames <- c("sea", paste(layer_varnames, date_seq, sep = "_")) # Better colnames

  tdf_seas <- d |>
    terra::rotate() |>  # Convert 0 to 360 lon to -180 to +180 lon
    terra::zonal(unwrap(seas_zonal_wrapped), mean, na.rm = TRUE)

  colnames(tdf_seas) <- new_colnames

  # Reshape to long
  tdf_seas |>
    tidyr::pivot_longer(-sea,
                        names_to = c("measure", "date"),
                        values_to = "value",
                        names_pattern = "(.*)_(.*)") |>
    tidyr::pivot_wider(names_from = measure, values_from = value)
}
And we feed our chunked vector of filenames to it:
## Be patient
seameans_df <- future_map(chunked_fnames, process_raster_zonal) |>
  list_rbind() |>
  mutate(date = ymd(date),
         year = lubridate::year(date),
         month = lubridate::month(date),
         day = lubridate::day(date),
         yrday = lubridate::yday(date),
         season = season(date))

write_csv(seameans_df, file = here("data", "oceans_means_zonal.csv"))
save(seameans_df, file = here("data", "seameans_df.Rdata"), compress = "xz")

seameans_df
#> # A tibble: 1,570,449 × 11
#>    sea          date         anom   err   ice   sst  year month   day yrday season
#>    <chr>        <date>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#>  1 Adriatic Sea 1981-09-01 -0.737 0.167    NA  23.0  1981     9     1   244 Summer
#>  2 Adriatic Sea 1981-09-02 -0.645 0.176    NA  23.1  1981     9     2   245 Autumn
#>  3 Adriatic Sea 1981-09-03 -0.698 0.176    NA  22.9  1981     9     3   246 Autumn
#>  4 Adriatic Sea 1981-09-04 -0.708 0.248    NA  22.9  1981     9     4   247 Autumn
#>  5 Adriatic Sea 1981-09-05 -1.05  0.189    NA  22.5  1981     9     5   248 Autumn
#>  6 Adriatic Sea 1981-09-06 -1.02  0.147    NA  22.4  1981     9     6   249 Autumn
#>  7 Adriatic Sea 1981-09-07 -0.920 0.141    NA  22.4  1981     9     7   250 Autumn
#>  8 Adriatic Sea 1981-09-08 -0.832 0.140    NA  22.5  1981     9     8   251 Autumn
#>  9 Adriatic Sea 1981-09-09 -0.665 0.162    NA  22.6  1981     9     9   252 Autumn
#> 10 Adriatic Sea 1981-09-10 -0.637 0.268    NA  22.5  1981     9    10   253 Autumn
#> # ℹ 1,570,439 more rows
Now we have properly-weighted daily averages for every sea and ocean since September 1981. Time to make some pictures.
Global Sea Surface Mean Temperature Graph
To make a graph of the global daily mean sea surface temperature we can use the world_df object we made. The idea is to put the temperature on the y-axis and the day of the year on the x-axis, and then draw a separate line for each year. We highlight 2023 and (the data so far for) 2024, and we also draw a ribbon underlay showing plus or minus two standard deviations around the 1982-2011 daily mean.
colors <- ggokabeito::palette_okabe_ito()

## Labels for the x-axis
month_labs <- seameans_df |>
  filter(sea == "North Atlantic Ocean",
         year == 2023,
         day == 15) |>
  select(date, year, yrday, month, day) |>
  mutate(month_lab = month(date, label = TRUE, abbr = TRUE))

## Average and sd ribbon data
world_avg <- world_df |>
  filter(year > 1981 & year < 2012) |>
  group_by(yrday) |>
  filter(yrday != 366) |>
  summarize(mean_8211 = mean(sst, na.rm = TRUE),
            sd_8211 = sd(sst, na.rm = TRUE)) |>
  mutate(fill = colors[2],
         color = colors[2])

## Flag years of interest
out_world <- world_df |>
  mutate(year_flag = case_when(
    year == 2023 ~ "2023",
    year == 2024 ~ "2024",
    .default = "All other years"))

out_world_plot <- ggplot() +
  geom_ribbon(data = world_avg,
              mapping = aes(x = yrday,
                            ymin = mean_8211 - 2*sd_8211,
                            ymax = mean_8211 + 2*sd_8211,
                            fill = fill),
              alpha = 0.3,
              inherit.aes = FALSE) +
  geom_line(data = world_avg,
            mapping = aes(x = yrday, y = mean_8211, color = color),
            linewidth = 2,
            inherit.aes = FALSE) +
  scale_color_identity(name = "Mean Temp. 1982-2011, ±2SD",
                       guide = "legend",
                       breaks = unique(world_avg$color),
                       labels = "") +
  scale_fill_identity(name = "Mean Temp. 1982-2011, ±2SD",
                      guide = "legend",
                      breaks = unique(world_avg$fill),
                      labels = "") +
  ggnewscale::new_scale_color() +
  geom_line(data = out_world,
            mapping = aes(x = yrday, y = sst, group = year, color = year_flag),
            inherit.aes = FALSE) +
  scale_color_manual(values = colors[c(1,6,8)]) +
  scale_x_continuous(breaks = month_labs$yrday,
                     labels = month_labs$month_lab) +
  scale_y_continuous(breaks = seq(19.5, 21.5, 0.5),
                     limits = c(19.5, 21.5),
                     expand = expansion(mult = c(-0.05, 0.05))) +
  geom_line(linewidth = rel(0.7)) +
  guides(
    x = guide_axis(cap = "both"),
    y = guide_axis(minor.ticks = TRUE, cap = "both"),
    color = guide_legend(override.aes = list(linewidth = 2))
  ) +
  labs(x = "Month",
       y = "Mean Temperature (°Celsius)",
       color = "Year",
       title = "Mean Daily Global Sea Surface Temperature, 1981-2024",
       subtitle = "Latitudes 60°N to 60°S; Area-weighted NOAA OISST v2.1 estimates",
       caption = "Kieran Healy / @kjhealy") +
  theme(axis.line = element_line(color = "gray30", linewidth = rel(1)),
        plot.title = element_text(size = rel(1.9)))

ggsave(here("figures", "global_mean.png"), out_world_plot, height = 7, width = 10, dpi = 300)
The North Atlantic
We can slice out the North Atlantic by name from seameans_df and make its graph in much the same way. For variety we can color most of the years blue, to lean into the "Great Wave off Kanagawa" (or Rockall?) vibe.
out_atlantic <- seameans_df |>
  filter(sea == "North Atlantic Ocean") |>
  mutate(year_flag = case_when(
    year == 2023 ~ "2023",
    year == 2024 ~ "2024",
    .default = "All other years"
  )) |>
  ggplot(aes(x = yrday, y = sst, group = year, color = year_flag)) +
  geom_line(linewidth = rel(1.1)) +
  scale_x_continuous(breaks = month_labs$yrday,
                     labels = month_labs$month_lab) +
  scale_color_manual(values = colors[c(1,6,2)]) +
  guides(
    x = guide_axis(cap = "both"),
    y = guide_axis(minor.ticks = TRUE, cap = "both"),
    color = guide_legend(override.aes = list(linewidth = 2))
  ) +
  labs(x = "Month",
       y = "Mean Temperature (Celsius)",
       color = "Year",
       title = "Mean Daily Sea Surface Temperature, North Atlantic Ocean, 1981-2024",
       subtitle = "Gridded and weighted NOAA OISST v2.1 estimates",
       caption = "Kieran Healy / @kjhealy") +
  theme(axis.line = element_line(color = "gray30", linewidth = rel(1)))

ggsave(here("figures", "north_atlantic.png"), out_atlantic, height = 7, width = 10, dpi = 300)
All the Seas
Finally we can of course go crazy with facets and just draw everything.
## All the world's oceans and seas
out <- seameans_df |>
  mutate(year_flag = case_when(
    year == 2023 ~ "2023",
    year == 2024 ~ "2024",
    .default = "All other years")) |>
  ggplot(aes(x = yrday, y = sst, group = year, color = year_flag)) +
  geom_line(linewidth = rel(0.5)) +
  scale_x_continuous(breaks = month_labs$yrday,
                     labels = month_labs$month_lab) +
  scale_color_manual(values = colors[c(1,6,2)]) +
  guides(
    x = guide_axis(cap = "both"),
    y = guide_axis(minor.ticks = TRUE, cap = "both"),
    color = guide_legend(override.aes = list(linewidth = 1.4))
  ) +
  facet_wrap(~ reorder(sea, sst), axes = "all_x", axis.labels = "all_y") +
  labs(x = "Month of the Year",
       y = "Mean Temperature (Celsius)",
       color = "Year",
       title = "Mean Daily Sea Surface Temperatures, 1981-2024",
       subtitle = "Area-weighted 0.25° grid estimates; NOAA OISST v2.1; IHO Sea Boundaries",
       caption = "Data processed with R; Figure made with ggplot by Kieran Healy / @kjhealy") +
  theme(axis.line = element_line(color = "gray30", linewidth = rel(1)),
        strip.text = element_text(face = "bold", size = rel(1.4)),
        plot.title = element_text(size = rel(1.525)),
        plot.subtitle = element_text(size = rel(1.1)))

ggsave(here("figures", "all_seas.png"), out, width = 40, height = 40, dpi = 300)
The full code for the data processing and the graphs is available on GitHub.