Bring Your Own COG API

Overview

You can access your own data through Sentinel Hub just like any other data you are used to, provided a few conditions are met.

These are:

  • Store your raster data in the Cloud Optimized GeoTIFF (COG) format in your own S3 bucket in a supported region.
  • Configure the bucket's permissions so that Sentinel Hub can read the data.
  • Import tiles using the dashboard or API.

Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 datasource as a collection of Sentinel-2 tiles.

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool is a utility tool available as a Docker image and Java jar which can be used to prepare your data for use in Sentinel Hub. It takes care of the entire process. The same steps can be done manually and are detailed below, should you prefer or require more control over the process.

  • Get the Docker tool here
  • Get the Java jar here
  • Get the source code here

Converting to COG

Constraints and settings

Beyond being valid COGs, your files must meet a few further constraints:

  • COG header tags must not exceed one megabyte
  • the internal tile size must be between 256 x 256 and 2048 x 2048
  • the internal tile size must be equal across all overview levels (including the full resolution source).
  • the projection needs to be one of: WGS84 (EPSG:4326), WebMercator (EPSG:3857), any UTM zone (EPSG:32601-32660, 32701-32760), or Europe LAEA (EPSG:3035).
  • each band needs to be in a single file.
  • band name should be a valid JavaScript identifier so it can be safely used in evalscripts; valid identifiers are case-sensitive, can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit, and should not be one of the reserved JavaScript keywords.
  • the band file names need to be consistent for all tiles in a collection. For example, if you have B1.tiff in one tile then you also need B1.tiff in all the other tiles in your collection.
  • consistent extensions for all files in a collection (so a collection cannot contain both B1.tiff and B2.TIF)
  • JPEG compressed TIFFs are not supported.
  • currently supported formats are the same as those supported for outputs, see sampleType.
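
Two of the constraints above (band names that are valid JavaScript identifiers, and identical file names across tiles) are easy to check before uploading. The following is a minimal sketch, not part of any Sentinel Hub tooling: the function names are hypothetical, the identifier check is an ASCII-only approximation (the actual rule also admits Unicode letters), and the reserved-word list is an assumed summary of ECMAScript keywords.

```python
import re

# Assumed list of reserved JavaScript words for this sketch.
JS_RESERVED = {
    "await", "break", "case", "catch", "class", "const", "continue",
    "debugger", "default", "delete", "do", "else", "enum", "export",
    "extends", "false", "finally", "for", "function", "if", "implements",
    "import", "in", "instanceof", "interface", "let", "new", "null",
    "package", "private", "protected", "public", "return", "static",
    "super", "switch", "this", "throw", "true", "try", "typeof", "var",
    "void", "while", "with", "yield",
}

# ASCII-only approximation of a JavaScript identifier.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_safe_band_name(name):
    """Return True if the name looks like a usable JavaScript identifier."""
    return _IDENTIFIER.fullmatch(name) is not None and name not in JS_RESERVED


def tiles_with_mismatched_files(tiles):
    """Return the tiles whose file names differ from those of the first tile.

    `tiles` maps a tile name to the list of its band file names.
    The comparison is case-sensitive, so B2.TIF does not match B2.tiff.
    """
    reference = None
    mismatched = []
    for tile, files in tiles.items():
        names = set(files)
        if reference is None:
            reference = names
        elif names != reference:
            mismatched.append(tile)
    return mismatched
```

For example, is_safe_band_name("B1") is accepted while "1B" and "class" are rejected, and a tile containing B2.TIF is reported when the first tile contains B2.tiff.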

Bands can have different resolutions.

For best performance, we recommend the following settings for COGs: DEFLATE compression with 1024x1024 pixel internal tiling.

GDAL example command

COGs can be generated in a single step with GDAL 3.1 or newer using the COG raster driver. For older GDAL versions, see below. Even though you can use any GDAL version, we highly recommend you use v3.1 or newer, as older versions have issues with average downsampling (see https://gdal.org/programs/gdaladdo.html).

The input file must conform to the constraints regarding the projection, units per pixel, and pixel formats. To generate a COG from an input file that contains a single band:

gdal_translate -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=1024 -co RESAMPLING=AVERAGE -co OVERVIEWS=IGNORE_EXISTING input.extension output.tiff

Additional parameters may be needed:

  • if the input file contains multiple bands, add -b <band> and run it once for each band, where <band> is the band number, starting from 1. Each band must of course be saved to a different output file.
  • if your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.
  • for many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data. To enable the predictor, add -co PREDICTOR=YES.

Older GDAL versions

For GDAL older than 3.1, multiple commands are needed. As above, we assume that the input files are single bands. If they contain multiple bands then individual bands need to be extracted and the following has to be run multiple times. Extract individual bands by adding -b <band>, where <band> is the band number, starting from 1, to the first command.

gdal_translate -of GTIFF input.extension intermediate.tiff

NOTE: If your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.

gdaladdo -r average --config GDAL_TIFF_OVR_BLOCKSIZE 1024 intermediate.tiff 2 4 8 16 32

(The number of overview levels you need depends on your source data. A good rule of thumb is to have as many overview levels as necessary for the entire source image to fit on one 1024x1024 tile).
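
The rule of thumb above can be turned into a small calculation. This is a sketch under the assumption that each overview level halves the previous one and that sizes are compared without rounding; overview_factors is a hypothetical helper, not a GDAL function.

```python
def overview_factors(width, height, tile_size=1024):
    """Return power-of-two overview factors for gdaladdo.

    Factors are added until the coarsest overview of a width x height
    raster fits on a single tile_size x tile_size tile.
    """
    factors = []
    factor = 1
    while max(width, height) / factor > tile_size:
        factor *= 2
        factors.append(factor)
    return factors


# A 20000 x 15000 pixel raster needs five levels, matching "2 4 8 16 32".
print(" ".join(str(f) for f in overview_factors(20000, 15000)))
```

A raster that already fits on one tile yields an empty list, in which case no overviews are needed under this rule.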

gdal_translate -co TILED=YES -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 1024 -co BLOCKXSIZE=1024 -co BLOCKYSIZE=1024 -co COMPRESS=DEFLATE intermediate.tiff output.tiff

NOTE: For many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data. To enable the predictor, add -co PREDICTOR=2 to the above command.

Once the commands finish you can delete the intermediate.tiff file.

For more information about each command, see the GDAL documentation for gdal_translate and gdaladdo.

BYOC deployment

BYOC can be deployed on AWS (2 locations), CreoDIAS, Mundi and CODE-DE. Whichever deployment you choose, your request URL should be based on the URL of that deployment. For example, if you choose to ingest your data from the AWS US region (Oregon), your request URL should be https://services-uswest2.sentinel-hub.com/api/v1/byoc. See the table below for the supported deployments and their BYOC URL endpoints.

BYOC deployment       BYOC URL endpoint
AWS EU (Frankfurt)    https://services.sentinel-hub.com/api/v1/byoc
AWS US (Oregon)       https://services-uswest2.sentinel-hub.com/api/v1/byoc
CreoDIAS              https://creodias.sentinel-hub.com/api/v1/byoc
Mundi                 https://shservices.mundiwebservices.com/api/v1/byoc
CODE-DE               https://code-de.sentinel-hub.com/api/v1/byoc

AWS bucket settings

Bucket region

On AWS, the bucket location needs to be set to either eu-central-1 when using EU deployment (Frankfurt) or us-west-2 when using US deployment (Oregon).

Bucket settings

Your AWS bucket needs to be configured to allow access from Sentinel Hub. To do this, update your bucket policy to include the following statement (don't forget to replace <bucket_name> with your actual bucket name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Sentinel Hub permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
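
If you manage several buckets, the policy can be generated rather than edited by hand. The sketch below only fills the bucket name into the statement shown above, using the Python standard library; sentinel_hub_bucket_policy is a hypothetical helper name, and applying the policy to the bucket is left to the AWS console or your own tooling.

```python
import json

# Principal copied from the statement above.
SENTINEL_HUB_PRINCIPAL = "arn:aws:iam::614251495211:root"


def sentinel_hub_bucket_policy(bucket_name):
    """Return the bucket policy from the documentation as a JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Sentinel Hub permissions",
                "Effect": "Allow",
                "Principal": {"AWS": SENTINEL_HUB_PRINCIPAL},
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetObject",
                ],
                # The bucket itself, plus every object inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(sentinel_hub_bucket_policy("my-byoc-bucket"))  # placeholder bucket name
```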

Configuring collection

When creating a collection:

  • you need to provide the S3 bucket where your data is,
  • you can provide names and data types of the bands, but only using BYOC API,
  • you can provide no data value using Dashboard or BYOC API.

If the bands or the no data value are not configured, the BYOC service automatically configures them based on the bands of the first successfully ingested tile. The no data value is automatically extracted from the TIFF tag GDAL_NODATA (TIFF entry ID = 42113), but only if all TIFF files of the first tile have the tag with exactly the same value and if that value is a number. Otherwise, the value is left unset.
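
The automatic no data rule can be illustrated as follows. This is only a sketch of the rule as described, not the service's implementation: auto_nodata is a hypothetical name, reading the GDAL_NODATA tag from the files is left out, and Python's float parsing stands in for "the value is a number".

```python
def auto_nodata(tag_values):
    """Return the no data value that would be picked up, or None.

    `tag_values` holds one GDAL_NODATA tag string per band file of the
    first tile, with None for a file that lacks the tag.
    """
    if not tag_values or any(value is None for value in tag_values):
        return None  # at least one file has no tag
    stripped = {value.strip() for value in tag_values}
    if len(stripped) != 1:
        return None  # the files disagree on the value
    try:
        return float(stripped.pop())
    except ValueError:
        return None  # the tag is present but not numeric


print(auto_nodata(["0", "0", "0"]))   # all files agree
print(auto_nodata(["0", "-9999"]))    # files disagree, value stays unset
```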

Keep in mind:

  • that the bucket cannot be changed after the collection is created,
  • that the bands cannot be changed once they are configured,
  • and that the no data value can be changed at any time using the Dashboard or the BYOC API.

Ingesting the tiles

There are two ways of doing this. The easier version is using the dashboard.

To create a new collection click the New collection button. The name can be anything and is there for your own reference. The S3 bucket name is the bucket name containing your data.

Once a collection is created you can then add tiles. Note that as of this writing only single tiles can be added in one step.

Click the Add tile button. Provide a path to the COG files inside the S3 bucket. For example, if your band files are stored in s3://bucket-name/folder/, then set folder as the tile path. In this case the band names will equal the file names. For example, the band B1 corresponds to the file s3://bucket-name/folder/B1.tiff.

If your file names contain something other than just the band name, such as a prefix, this is fine as long as the prefix is the same for all files. In this case the path needs to include this prefix and also the band placeholder: (BAND). Adding the extension is optional. For example, this is what happens if you use the path folder/tile_1_(BAND)_2019.tiff with the following files:

  • s3://bucket-name/folder/tile_1_B1_2019.tiff - the file would be used, the band name would be B1
  • s3://bucket-name/folder/tile_1_B2_2019.tiff - the file would be used, the band name would be B2
  • s3://bucket-name/folder/tile_2_B1_2019.tiff - the file would not be used
  • s3://bucket-name/folder/tile_2_B2_2019.tiff - the file would not be used
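
The matching behaviour in the list above can be mimicked with a short function. It is a sketch for checking your own paths, not the service's actual matching code: match_band_files is a hypothetical name, keys are given relative to the bucket, and the handling of the optional extension is an assumption.

```python
import posixpath
import re


def match_band_files(tile_path, keys):
    """Map band names to the object keys matched by a (BAND) tile path."""
    prefix, placeholder, suffix = tile_path.partition("(BAND)")
    if not placeholder:
        raise ValueError("tile path has no (BAND) placeholder")
    pattern = re.escape(prefix) + r"(?P<band>[^/]+?)" + re.escape(suffix)
    # The extension is optional in the path: accept one if none was given.
    if not posixpath.splitext(suffix)[1]:
        pattern += r"(?:\.[A-Za-z0-9]+)?"
    compiled = re.compile(pattern)
    bands = {}
    for key in keys:
        match = compiled.fullmatch(key)
        if match:
            bands[match.group("band")] = key
    return bands


keys = [
    "folder/tile_1_B1_2019.tiff",
    "folder/tile_1_B2_2019.tiff",
    "folder/tile_2_B1_2019.tiff",
    "folder/tile_2_B2_2019.tiff",
]
# Only the tile_1 files are picked up, as bands B1 and B2.
print(match_band_files("folder/tile_1_(BAND)_2019.tiff", keys))
```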

Optionally, set the sensing time of the tile here as well.

Do not forget that adding multiple tiles will work only if these tiles have the same bands (with different data of course).

The more involved option is to use API requests; see the BYOC API reference or the Python examples.

A note about cover geometries

Each tile ingested also requires a cover geometry. A cover geometry is a geometry which outlines the valid data part of the tile. Nodata therefore should not be contained in the cover geometry. In the simplest case, the cover geometry will equal the bounding box of the file being ingested.

The cover geometry is important because it tells the system where it can expect to find data. As a consequence, this determines how data is rendered where tiles overlap. If you have tiles with overlapping cover geometries, only the data from one tile can be rendered where two (or more) cover geometries intersect. This is true even if this data is nodata or if it lies outside the tile bounding box. Having quality cover geometries is therefore important for collections where many tiles containing nodata overlap. Not all cases need precise cover geometries, however. A single tile or a regularly gridded collection with a single date and coordinate reference system can get away with cover geometries equalling the bounding box.


Figure: Overlapping tiles

If the cover geometry is not specified during ingestion, it is automatically set to the tile bounding box. Sentinel Hub will not attempt to generate a more precise geometry, as it is impossible to prepare such a process that works well for all users. It is therefore your responsibility to provide quality cover geometries, which in turn lets you get the most out of your data. If ingesting tiles using the API, set the cover geometry using the coverGeometry field in the API request. It must be in the GeoJSON format and in a projected or geodetic coordinate reference system supported by Sentinel Hub. In practice, a cover geometry is a single polygon or multipolygon, and it must contain no more than 100 points.
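
Before sending a cover geometry, it can help to check it against the 100-point limit. The sketch below assumes that every listed position counts, including the repeated closing position of each ring; whether the service counts points the same way is not stated here, and the function names are hypothetical.

```python
def count_points(geometry):
    """Count all positions in a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("cover geometry must be a Polygon or MultiPolygon")
    return sum(len(ring) for polygon in polygons for ring in polygon)


def within_point_limit(geometry, max_points=100):
    """Return True if the geometry stays within the point limit."""
    return count_points(geometry) <= max_points


square = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
}
print(count_points(square), within_point_limit(square))
```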

Generating cover geometries

GDAL

One way of getting a cover geometry is using the GDAL utility script gdal_trace_outline which takes a raster and returns a cover geometry in the WKT format. This then needs to be converted to GeoJSON. In this example a single band file is traced:

gdal_trace_outline band.tif -out-cs en -wkt-out wkt.txt

The process might take a while if you have a large file. To speed it up, you can pass a subsampled file, which you can create with gdal_translate. To get a file with 1% of the original width and height:

gdal_translate band.tif subsampled.tif -outsize 1% 1%

or if it's stored on AWS S3:

gdal_translate /vsis3/bucket-name/folder/band.tif subsampled.tif -outsize 1% 1%

Note that calculating the cover geometry on subsampled rasters may not be sufficiently accurate for touching but not intersecting tiles as the imprecision caused by downsampling may leave gaps.

Finally, you need to convert the WKT file to GeoJSON and specify the CRS under crs.properties.name (for WGS84 it can be omitted). A CRS with the EPSG code <EpsgCode> should be specified as urn:ogc:def:crs:EPSG::<EpsgCode>. Here is a GeoJSON example in EPSG:32633.

{
  "type": "MultiPolygon",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::32633"
    }
  },
  "coordinates": [
    [
      [
        [
          370270.52147506207,
          5085707.891369364
        ],
        ...
      ]
    ]
  ]
}
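
The WKT to GeoJSON conversion can be done with any GIS library; for simple cases, a few lines of standard-library Python are enough. This is a minimal sketch that assumes the WKT holds a single two-dimensional POLYGON or MULTIPOLYGON with plain decimal coordinates; wkt_to_cover_geometry is a hypothetical helper name.

```python
import json
import re

_NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_PAIR = re.compile(rf"({_NUMBER})\s+({_NUMBER})")


def wkt_to_cover_geometry(wkt, epsg):
    """Convert a 2D POLYGON or MULTIPOLYGON WKT string to GeoJSON."""
    match = re.fullmatch(
        r"\s*(MULTIPOLYGON|POLYGON)\s*(\(.*\))\s*", wkt, re.IGNORECASE | re.DOTALL
    )
    if match is None:
        raise ValueError("expected a 2D POLYGON or MULTIPOLYGON in WKT")
    kind, body = match.groups()
    # Turn "x y" pairs into "[x, y]" and WKT parentheses into JSON brackets.
    body = _PAIR.sub(r"[\1, \2]", body)
    body = body.replace("(", "[").replace(")", "]")
    geometry = {
        "type": "MultiPolygon" if kind.upper() == "MULTIPOLYGON" else "Polygon",
        "coordinates": json.loads(body),
    }
    if epsg != 4326:  # the CRS member can be omitted for WGS84
        geometry["crs"] = {
            "type": "name",
            "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg}"},
        }
    return geometry


print(json.dumps(wkt_to_cover_geometry("POLYGON ((0 0, 4 0, 4 4, 0 0))", 32633)))
```

In practice you would read the WKT from the file written by gdal_trace_outline (wkt.txt in the command above) and pass the EPSG code of the raster.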

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool can also help you update the cover geometry of existing tiles on Sentinel Hub. Use the set-coverage command. On Docker, get help and parameters by running: docker run sentinelhub/byoc-tool set-coverage --help

Workarounds

If your input data is complex and cannot be adequately outlined with a simple geometry, it is nevertheless possible to obtain pixel-precise rendering. In this case, set the cover geometry to any geometry which covers all the valid input pixels; the file bounding box, which is the default, is one such example. The mosaicking is then done in the custom script with the help of dataMask.

First, set the mosaicking parameter within setup to TILE (mosaicking: Mosaicking.TILE) and add the dataMask to the array of input bands.

Then use something like the following as your evalscript. Since dataMask precisely determines which pixels are valid and which are not, the value can be returned the moment a valid pixel is found; otherwise, the next scene is checked.

function evaluatePixel(samples, scenes) {
  for (let i = 0; i < samples.length; i++) {
    let sample = samples[i];
    if (sample.dataMask == 1) {
      return someCombination(sample);
    }
  }
  return someNodataValueArray;
}

Optionally, you can also use the preProcessScenes function to reduce the number of tiles that will be processed. This is useful for setting an upper limit on the number of processing units used. The following, for example, limits the number of tiles to at most 5.

function preProcessScenes (collections) {
  // Keep only the first 5 tiles (splice(5) would instead return the tiles after the fifth).
  collections.scenes.tiles = collections.scenes.tiles.slice(0, 5);
  return collections;
}

Note that getting data in such a manner will use more processing units than SIMPLE mosaicking with precise cover geometries.

This BYOC tutorial notebook is a simple walk-through on creating, updating, listing, and deleting your data collections through Python using the Sentinel Hub Python package.

Examples

BYOC API Examples