Elevation API ATLAS Elevation 3 Readfromgcs
ADR-0060 ELEVATION_API_ATLAS-ELEVATION-3-ReadFromGCS (v7)
Raster data from GCS
Context
Raster data is stored on GCS in Cloud Optimized GeoTiff (COG) format. The whole dataset consists of numerous smaller GeoTiff images that often overlap one another. To retrieve data from such a dataset efficiently and take advantage of the partial reading capabilities of Cloud Optimized GeoTiff, the reader must rapidly locate the specific COG files it intends to read (more on that in the Storage Layer ADR). This ADR outlines how this is accomplished.
Typical spatial raster datasets consist of multiple smaller COGs, each of which may be referenced in a different spatial projection and whose sizes may vary depending on the data density in the given area.
An example for the Africa dataset is shown below:
[Figure: bounding boxes of the dataset's COG files (blue squares) and a requested slippy tile (red square)]
Each blue square represents the bounding box of one Cloud Optimized GeoTiff file in the dataset. To produce a slippy tile (red square), the reader must locate the relevant files, merge the relevant data sections they contain and, where the same area is covered by more than one file, decide which data should be used.
Decision
Technology of choice
Atlas decided to implement a solution called Mosaic: a data structure that uses a GeoIndex to map assets to predefined areas. The chosen GeoIndex is the QuadKey, since it aligns with the slippy tile abstraction.
The elevation service can quickly find out which files contain data for a requested tile by computing the tile's QuadKey and reading the value assigned to that key.
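The lookup described above can be sketched as follows. This is a minimal sketch assuming the standard Bing Maps QuadKey encoding (which reproduces the example key 13131031 used later in this ADR) and the quadkey_zoom field of the mosaic metadata; tile_to_quadkey and mosaic_keys_for_tile are hypothetical names.

```python
def tile_to_quadkey(x: int, y: int, z: int) -> str:
    """Encode an XYZ slippy tile index as a QuadKey, most significant bit first."""
    digits = []
    for level in range(z, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1  # right half of the parent tile
        if y & mask:
            digit += 2  # bottom half of the parent tile
        digits.append(str(digit))
    return "".join(digits)


def mosaic_keys_for_tile(x: int, y: int, z: int, quadkey_zoom: int) -> list:
    """Return the asset-map keys (QuadKeys at quadkey_zoom) covering tile x/y/z."""
    quadkey = tile_to_quadkey(x, y, z)
    if z >= quadkey_zoom:
        # The tile lies inside exactly one index cell: its ancestor at
        # quadkey_zoom, whose QuadKey is a prefix of the tile's QuadKey.
        return [quadkey[:quadkey_zoom]]
    # The tile is larger than an index cell: enumerate all its descendants.
    keys = [quadkey]
    for _ in range(quadkey_zoom - z):
        keys = [key + digit for key in keys for digit in "0123"]
    return keys
```

With the current metadata (minzoom equal to quadkey_zoom), every request takes the first branch and a single Redis read suffices; the second branch would only matter if tiles below quadkey_zoom were ever served.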
Mosaic Infrastructure
The Mosaic is stored as key-value pairs in a separate Redis database. As a fallback in case of Redis failure, a backup of this structure is stored on GCS. A separate sidecar service is responsible for detecting Redis outages and re-populating the data. During a Redis outage, the application switches to the GCS backup as its data source.
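The fallback described above can be sketched as a thin reader that tries Redis first and switches to the GCS backup when Redis is unreachable. This is a minimal sketch under stated assumptions: KeyValueStore, MosaicReader and DictStore are hypothetical names, and both stores are assumed to sit behind adapters that raise Python's built-in ConnectionError on outage (a real Redis client raises its own exception type, which the adapter would have to translate). DictStore is an in-memory stand-in used only to exercise the reader.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Read-only view of a mosaic store (hypothetical interface)."""

    def get(self, key: str) -> Optional[str]: ...


class MosaicReader:
    """Reads mosaic entries from Redis, falling back to the GCS backup on outage."""

    def __init__(self, primary: KeyValueStore, backup: KeyValueStore) -> None:
        self.primary = primary  # adapter over the Redis database
        self.backup = backup    # adapter over the backup copy on GCS

    def get(self, key: str) -> Optional[str]:
        try:
            return self.primary.get(key)
        except ConnectionError:
            # Redis is unreachable: serve this request from the GCS backup.
            # Detecting the outage and re-populating Redis is the sidecar's job.
            return self.backup.get(key)


class DictStore:
    """In-memory stand-in for either store (hypothetical, for illustration)."""

    def __init__(self, data: Dict[str, str], available: bool = True) -> None:
        self.data = data
        self.available = available

    def get(self, key: str) -> Optional[str]:
        if not self.available:
            raise ConnectionError("store unavailable")
        return self.data.get(key)
```

The sketch decides per request; a production reader might additionally remember the outage for a short period to avoid paying a Redis timeout on every call.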
Mosaic Data Structure
An example of how a QuadKey is generated is shown below:
[Figure: QuadKey generation example]
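To illustrate how QuadKeys are generated, the sketch below decodes a QuadKey back into its tile index, which makes the digit structure explicit: each digit selects one quadrant of its parent tile (0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right), so the length of a QuadKey equals its zoom level and its prefixes are its ancestor tiles. It assumes the standard Bing Maps encoding; quadkey_to_tile is a hypothetical name.

```python
from typing import Tuple


def quadkey_to_tile(quadkey: str) -> Tuple[int, int, int]:
    """Decode a QuadKey into its XYZ slippy tile index (x, y, zoom).

    Bit 0 of each digit is the next bit of x (0 = left, 1 = right) and
    bit 1 is the next bit of y (0 = top, 1 = bottom), most significant first.
    """
    zoom = len(quadkey)
    x = y = 0
    for char in quadkey:
        if char not in "0123":
            raise ValueError(f"invalid QuadKey digit: {char!r}")
        digit = int(char)
        x = (x << 1) | (digit & 1)
        y = (y << 1) | (digit >> 1)
    return x, y, zoom
```

For example, the key 13131031 from the asset map example decodes to tile x=251, y=82 at zoom 8, whose longitude range of roughly 172.97° to 174.38° overlaps both GeoTiffs listed under that key.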
Metadata
The metadata header is stored in Redis under the mosaic_json_metadata key and has the following format:
{
"version": "1.0.0",
"minzoom": 8,
"maxzoom": 14,
"quadkey_zoom": 8,
"bounds": [
-180,
-15.000555555559954,
180,
72.00055555629484
],
"center": [
0,
28.500000000367443,
8
]
}
Mosaic metadata contains basic information about the indexed dataset:
- version - Version of the mosaic schema.
- minzoom - Minimum tile zoom level supported by this dataset.
- maxzoom - Maximum tile zoom level supported by this dataset.
- quadkey_zoom - Zoom level of the QuadKeys used to index the assets of this dataset.
- bounds - Geographic bounds of the dataset as [west, south, east, north], i.e. [min lon, min lat, max lon, max lat].
- center - Center of the dataset as [lon, lat, zoom].
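A service reading this header will typically parse it once and use it to reject requests outside the supported zoom range or area before touching the asset map. The sketch below is a minimal illustration assuming the [west, south, east, north] bounds order and [lon, lat, zoom] center order used by MosaicJSON; MosaicMetadata and its methods are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# The metadata example from this ADR, as read from the mosaic_json_metadata key.
METADATA_EXAMPLE = """{
  "version": "1.0.0", "minzoom": 8, "maxzoom": 14, "quadkey_zoom": 8,
  "bounds": [-180, -15.000555555559954, 180, 72.00055555629484],
  "center": [0, 28.500000000367443, 8]
}"""


@dataclass(frozen=True)
class MosaicMetadata:
    version: str
    minzoom: int
    maxzoom: int
    quadkey_zoom: int
    bounds: Tuple[float, float, float, float]  # west, south, east, north (degrees)
    center: Tuple[float, float, int]  # lon, lat, zoom

    @classmethod
    def from_json(cls, raw: str) -> "MosaicMetadata":
        doc = json.loads(raw)
        return cls(
            version=doc["version"],
            minzoom=doc["minzoom"],
            maxzoom=doc["maxzoom"],
            quadkey_zoom=doc["quadkey_zoom"],
            bounds=tuple(doc["bounds"]),
            center=tuple(doc["center"]),
        )

    def supports_zoom(self, z: int) -> bool:
        """True if tiles at zoom z can be served from this mosaic."""
        return self.minzoom <= z <= self.maxzoom

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside the dataset bounds."""
        west, south, east, north = self.bounds
        return west <= lon <= east and south <= lat <= north
```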
Asset Map
The asset map is a set of key-value pairs stored in Redis that maps each geoindex (QuadKey) to the GeoTiff assets related to it.
Since any XYZ tile index can be converted to a QuadKey, this approach allows fast retrieval of the GeoTiffs related to the desired location without an expensive geospatial query.
Example content of key 13131031
[
{
"url": "gs://utm-atlas-dev-live-bucket/raster/data/DEM/USGS_13_n54e172_20210819.tif",
"bbox": {
"left": 171.99944444370726,
"bottom": 52.99944444410676,
"right": 173.0005555558954,
"top": 54.00055555629484
},
"crs": "EPSG:4269",
"capture_date": "2021-08-19"
},
{
"url": "gs://utm-atlas-dev-live-bucket/raster/data/DEM/USGS_13_n54e173_20210819.tif",
"bbox": {
"left": 172.9994444435074,
"bottom": 52.99944444410676,
"right": 174.0005555565948,
"top": 54.00055555629484
},
"crs": "EPSG:4269",
"capture_date": "2021-08-19"
}
]
Each QuadKey maps to a list of the assets that intersect its area. Each asset has the following schema:
- url - GCS URL of the GeoTiff.
- bbox - Bounding box of the data contained in the GeoTiff. This field lets the service decide whether to read the GeoTiff when the requested tile covers a smaller area than the QuadKey (tile zoom level higher than the QuadKey zoom level).
- crs - Coordinate reference system of the GeoTiff; the bbox coordinates are expressed in this CRS.
- capture_date - Capture date of the GeoTiff data. It lets the consumer service choose which data to use when multiple GeoTiff images overlap.
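Choosing which assets to read for a given tile can be sketched as below: compute the tile's geographic bounds, keep only the assets whose bbox intersects them, and order the result newest first. This is a minimal sketch assuming the bbox is in geographic degrees (true for the EPSG:4269 example, with the small NAD83/WGS84 datum difference ignored); assets in a projected CRS would first need their bbox or the tile bounds reprojected. Asset, parse_assets, tile_bounds and assets_for_tile are hypothetical names.

```python
import json
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Asset:
    url: str
    left: float
    bottom: float
    right: float
    top: float
    crs: str
    capture_date: str  # ISO 8601 (YYYY-MM-DD): string order equals date order


def parse_assets(raw: str) -> List[Asset]:
    """Parse the JSON list stored under one QuadKey in the asset map."""
    return [
        Asset(url=a["url"], crs=a["crs"], capture_date=a["capture_date"], **a["bbox"])
        for a in json.loads(raw)
    ]


def tile_bounds(x: int, y: int, z: int) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) of a Web Mercator XYZ tile in degrees."""
    n = 2 ** z
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north


def assets_for_tile(assets: List[Asset], x: int, y: int, z: int) -> List[Asset]:
    """Keep assets whose bbox intersects the tile, ordered newest capture first."""
    west, south, east, north = tile_bounds(x, y, z)
    hits = [
        a for a in assets
        if a.left < east and a.right > west and a.bottom < north and a.top > south
    ]
    # sorted() is stable, so equal capture dates keep their asset-map order.
    return sorted(hits, key=lambda a: a.capture_date, reverse=True)
```

Both example GeoTiffs are listed under key 13131031, but for the zoom 10 tile x=1005, y=330 (which lies inside that QuadKey) the bbox check keeps only USGS_13_n54e173, so the other file is never opened.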
Atlas Implementation
Locating the data
Generating the tile from assets
Once all GeoTiffs covering the area are known, the service merges their data to create the output tile. The merging algorithm makes the following assumptions:
- Each tile is a 256x256 data array; each data point in this array is called a pixel.
- The assets are sorted by capture date, from the most recent GeoTiff to the oldest.
- More than one asset can cover the same pixel.
Resolving the data conflicts
If multiple files cover the same area, the service has to decide which one is the source of truth for each pixel of the output tile. Conflicts are resolved by giving the most recent data precedence over older data.
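The merge rule above can be sketched as follows: walk the assets from newest to oldest and, for each pixel, keep the first value that is not nodata. This is a minimal sketch assuming each asset has already been read and resampled onto the output tile grid, with None marking nodata; reading, reprojection and resampling are omitted, merge_newest_first is a hypothetical name, and plain Python lists are used for clarity only.

```python
from typing import List, Optional, Sequence

Grid = List[List[Optional[float]]]

TILE_SIZE = 256


def merge_newest_first(layers: Sequence[Grid], size: int = TILE_SIZE) -> Grid:
    """Merge per-asset pixel grids into one tile.

    `layers` must be ordered from the most recent asset to the oldest. For every
    pixel the first non-None value wins, so newer data takes precedence.
    """
    out: Grid = [[None] * size for _ in range(size)]
    remaining = size * size  # pixels still without data
    for layer in layers:
        for row in range(size):
            for col in range(size):
                if out[row][col] is None and layer[row][col] is not None:
                    out[row][col] = layer[row][col]
                    remaining -= 1
        if remaining == 0:
            break  # every pixel is filled; older layers cannot change the result
    return out
```

With newest-first ordering this is exactly the first non-null pixel rule, and the early exit means older assets stop mattering once a tile is fully covered.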
Alternatives Considered
Use a data format other than COG
Since COG is already the default publishing format of spatial data providers and it supports cloud-reading scenarios well, it was chosen as the default format. Other data formats would require extra parsing, transformations or workarounds within the presented architecture.
Store all data in one big Cloud Optimized GeoTiff
Combining all raster data into a single large Cloud Optimized GeoTiff is feasible, but the size of the file header grows with the number of internal tiles in the file. With this approach, reading and parsing the header can become the bottleneck for the whole process.
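The header-size argument can be made concrete with a rough estimate. A tiled TIFF stores one offset and one byte count per internal tile, so these arrays grow linearly with the tile count. The sketch assumes 8-byte entries (as in BigTIFF) and counts only the full-resolution level; tile_index_bytes is a hypothetical helper and the dimensions below describe a hypothetical dataset.

```python
import math


def tile_index_bytes(width: int, height: int, block: int = 512, entry_bytes: int = 8) -> int:
    """Estimate the TileOffsets + TileByteCounts size for one image level.

    One offset and one byte count are stored per internal tile (single plane);
    overview levels would add their own, smaller arrays on top of this.
    """
    tiles = math.ceil(width / block) * math.ceil(height / block)
    return tiles * 2 * entry_bytes
```

For a hypothetical global grid at 1 arc-second resolution (1,296,000 x 648,000 pixels) with 512x512 internal tiles, tile_index_bytes(1_296_000, 648_000) gives 3,205,512 tiles and 51,288,192 bytes, about 51 MB of index data before any overviews, which a reader may need to fetch and parse before it can locate a single tile.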
Use MosaicJSON standard
This approach was abandoned due to the following risks:
- Reading a JSON file from GCS is slow.
- Caching the JSON file in the application makes the service stateful; when multiple service instances are running, cache invalidation is difficult to coordinate.
- The UTM elevation business use case requires extending the data provided by MosaicJSON.
By the standard, MosaicJSON only provides a URI for each file. To improve read performance we also need the file's internal geometry (to check whether a file is worth reading). Moreover, the business use case requires prioritizing the most recent data available, for which we also need a capture date.
Use a GeoTiff prioritization algorithm for overlapping images
For the initial GeoTiff sources, the first non-null pixel algorithm is sufficient, but once data changes more frequently an alternative must be considered (e.g. latest image first or custom prioritization).
Links
- MosaicJSON spec https://github.com/developmentseed/mosaicjson-spec
- Virtual raster https://gdal.org/drivers/raster/vrt.html
- Medium article about COGs and MosaicJSON https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df
- QuadKey https://towardsdatascience.com/geospatial-indexing-with-quadkeys-d933dff01496
- COG published as OGC standard https://www.ogc.org/press-release/cloud-optimized-geotiff-cog-published-as-official-ogc-standard/