Using rstac and gdalcubes packages

IO is a catalog of environmental raster data layers that can be used in biodiversity-related modeling contexts. This resource is in the form of a STAC catalog, which is accessible through various methods, including R packages rstac and gdalcubes.

The available layers in the catalog can be visualized here.

The data layers in IO are in COG (Cloud Optimized Geotiff) format, which allows optimal access with remote queries. For example, it is possible to extract only a small region from a global file and transform its resolution and reference coordinate system without ever having to download it entirely.

Here are some examples of queries to interact with the IO STAC catalog and COG files.


To start, install the following packages. Once the packages are installed, it is not necessary to re-install them at each R session.

install.packages(c('gdalcubes', 'rstac', 'knitr'))

At each R session, use the necessary librairies.

library(gdalcubes)
library(rstac)
library(knitr)
library(stars)

Connect to the STAC catalog.

s_obj <- stac("https://io.biodiversite-quebec.ca/stac/")

List the collections.

collections <- s_obj |> collections() |> get_request()

View the collections and their descriptions.

df<-data.frame(id=character(),title=character(),description=character())
for (c in collections[['collections']]){
  df<-rbind(df,data.frame(id=c$id,title=c$title,description=c$description))
}
kable(df)

Search for a specific collection (earthenv_landcover).


it_obj <- s_obj |>
  stac_search(collections = "earthenv_landcover") |>
  post_request() |> items_fetch()
it_obj

View the available layers in this collection.

it_obj <- s_obj |>
  collections("earthenv_landcover") |> items() |>
  get_request() |> items_fetch()
it_obj

View the properties of the first item (layer).

it_obj[['features']][[1]]$properties

Summary of items.

df<-data.frame(id=character(),datetime=character(), description=character())
for (f in it_obj[['features']]){
  df<-rbind(df,data.frame(id=f$id,datetime=f$properties$datetime,description=f$properties$description))
}
kable(df)

Access the first item with the STARS package. This package allows access to COG files remotely in a fast and efficient way. Here, we will select the layer representing the percentage of "Evergreen/Deciduous Needleleaf Trees".

lc1<-read_stars(paste0('/vsicurl/',it_obj[['features']][[12]]$assets$data$href), proxy = TRUE)
plot(lc1)

Select only a part of the raster.

bbox<-st_bbox(c(xmin = -76, xmax = -70, ymax = 54, ymin = 50), crs = st_crs(4326))
lc2 <- lc1 |> st_crop(bbox)

Visualize it.

pal <- colorRampPalette(c("black","darkblue","red","yellow","white"))
plot(lc2,breaks=seq(0,100,10),col=pal(10))

Save it to your computer in Cloud Optimized GeoTiff format.

# Replace " ~ " for the directory of your choice on your computer.
write_stars(lc2,'~/lc2.tif',driver='COG',options=c('COMPRESS=DEFLATE'))

Note that for a variable with categorical values, saving is a bit more complex. Save it to your computer in Cloud Optimized GeoTiff format, for a variable with categorical values. This command would only be appropriate is the data were categorical.

lc1 |> st_crop(bbox) |> write_stars('~/lc1.tif', driver='COG', RasterIO=c('resampling'='mode'), 
options=c('COMPRESS=DEFLATE', 'OVERVIEW_RESAMPLING=MODE', 'LEVEL=6', 
'OVERVIEW_COUNT=8', 'RESAMPLING=MODE', 'WARP_RESAMPLING=MODE', 'OVERVIEWS=IGNORE_EXISTING'))

Using GDALCUBES

Necessary step for GDALCUBES to work with IO data.

for (i in 1:length(it_obj$features)){
  it_obj$features[[i]]$assets$data$roles='data'
}

Filter based on item properties and create a collection.

st <- stac_image_collection(it_obj$features, asset_names=c('data'), 
             property_filter = function(f){f[['class']] %in% c('1','2','3','4')},srs='EPSG:4326')
st

Build a cube to process or visualize the data. Note that this cube can be in a different CRS and resolution from those of the original elements/files. However, the time dimension must capture the temporal framework of the item. dt is expressed as a time period. P1D is a period of one day, P1M is a period of one month, P1Y is a period of one year. Resampling methods should be tailored to the data type. For categorical data, use "mode" or "nearest". For continuous data, use "bilinear". Aggregation is relevant only when multiple rasters overlap.

Here is an example that sums the four forest categories using aggregation="sum", with a change of the reference system to use Quebec Lambert (EPSG:32198) and a resolution of 1 km.

bbox<-st_bbox(c(xmin = -483695, xmax = -84643, ymin = 112704 , ymax = 684311), crs = st_crs(32198))

v <- cube_view(srs = "EPSG:32198", extent = list(t0 = "2000-01-01", t1 = "2000-01-01",
                                                left = bbox$xmin, right =bbox$xmax, top = bbox$ymax, bottom = bbox$ymin), 
                                                dx=1000, dy=1000, dt="P1D", aggregation = "sum", resampling = "mean")

Match the collection and cube_view to create a raster cube.

lc_cube <- raster_cube(st, v)

Save the resulting file to your computer.

# Replace " ~ " for the directory of your choice on your computer.
lc_cube |> write_tif('~/',prefix='lc2',creation_options=list('COMPRESS'='DEFLATE'))
lc_cube |> plot(zlim=c(0,100),col=pal(10))

Use the "Accessibility from cities" dataset, keeping the same CRS and extent.

it_obj <- s_obj |>
  collections("accessibility_to_cities") |> items() |>
  get_request() |> items_fetch()
v <- cube_view(srs = "EPSG:32198", extent = list(t0 = "2015-01-01", t1 = "2015-01-01",
                                                left = bbox$xmin, right =bbox$xmax, top = bbox$ymax, bottom = bbox$ymin), 
                                                dx=1000, dy=1000, dt="P1D", aggregation

 = "mean", resampling = "bilinear")
for (i in 1:length(it_obj$features)){
  it_obj$features[[i]]$assets$data$roles='data'
}
st <- stac_image_collection(it_obj$features)
lc_cube <- raster_cube(st, v)
lc_cube |> plot(col=heat.colors)

Use the CHELSA dataset on climatologies and create an average map for the months of June, July, and August 2010 to 2019.

it_obj <- s_obj |>
  stac_search(collections = "chelsa-monthly", datetime="2010-06-01T00:00:00Z/2019-08-01T00:00:00Z") |> 
  get_request() |> items_fetch()

v <- cube_view(srs = "EPSG:32198", extent = list(t0 = "2010-06-01", t1 = "2019-08-31",
                                                left = bbox$xmin, right =bbox$xmax, top = bbox$ymax, bottom = bbox$ymin),
               dx=1000, dy=1000, dt="P10Y",
               aggregation = "mean",
               resampling = "bilinear")

for (i in 1:length(it_obj$features)){
  names(it_obj$features[[i]]$assets)='data'
  it_obj$features[[i]]$assets$data$roles='data'
}
anames=unlist(lapply(it_obj$features,function(f){f['id']}))
st <- stac_image_collection(it_obj$features, asset_names = 'data',  
                       property_filter = function(f){f[['variable']] == 'tas' & (f[['month']] %in% c(6,7,8)) })
c_cube <- raster_cube(st, v)
c_cube |> plot(col=heat.colors)

c_cube |> write_tif('~/',prefix='chelsa-monthly',creation_options=list('COMPRESS'='DEFLATE'))