Bring Your Own COG

Overview

You can access your own data through Sentinel Hub just like any other data source, provided a few conditions are met.

These are:

  • Store your raster data in the Cloud Optimized GeoTIFF (COG) format in your own S3 bucket in a supported region.
  • Configure the bucket's permissions so that Sentinel Hub can read the data.
  • Import tiles using the dashboard or API.

Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 datasource as a collection of Sentinel-2 tiles.

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool is a utility, available as a Docker image and as a Java jar, which can be used to prepare your data for use in Sentinel Hub. It takes care of the entire process. The same steps can also be done manually and are detailed below, should you prefer or require more control over the process.

  • Get the Docker tool here
  • Get the Java jar here
  • Get the source code here

Converting to COG

Constraints and settings

In addition to being COGs, your files must satisfy a few further constraints:

  • COG header tags must not exceed 30,000 bytes.
  • the internal tile size must be equal across all overview levels (including the full resolution source).
  • the projection needs to be one of: WGS84 (EPSG:4326), WebMercator (EPSG:3857), or any UTM zone (EPSG:32601-32660, 32701-32760).
  • each band needs to be in a single file.
  • band name should be a valid JavaScript identifier so it can be safely used in evalscripts; valid identifiers are case-sensitive, can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit, and should not be one of the reserved JavaScript keywords.
  • the band file names need to be consistent for all tiles in a collection. For example, if you have B1.tiff in one tile then you also need B1.tiff in all the other tiles in your collection.
  • the file extensions need to be consistent for all files in a collection (so a collection cannot contain both B1.tiff and B2.TIF).
  • JPEG compressed TIFFs are not supported.
  • the units per pixel must be equal in the X and Y directions
  • currently supported formats are the same as those supported for outputs, see sampleType.
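
The band-name rule above can be checked before upload. The following is a minimal sketch, not part of Sentinel Hub: the function name is_safe_band_name is hypothetical, and the check is deliberately stricter than JavaScript itself because it accepts only ASCII letters, whereas real identifiers may also contain Unicode letters.

```python
import re

# Reserved words of modern JavaScript (ECMAScript 2015+), including literals
# and words reserved in strict mode. Treat the list as a conservative filter.
JS_RESERVED = {
    "await", "break", "case", "catch", "class", "const", "continue",
    "debugger", "default", "delete", "do", "else", "enum", "export",
    "extends", "false", "finally", "for", "function", "if", "implements",
    "import", "in", "instanceof", "interface", "let", "new", "null",
    "package", "private", "protected", "public", "return", "static",
    "super", "switch", "this", "throw", "true", "try", "typeof", "var",
    "void", "while", "with", "yield",
}

# Simplification (assumption): ASCII letters only. A first character may be a
# letter, $ or _; later characters may also be digits.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_safe_band_name(name: str) -> bool:
    """Return True if name can be used as a band name in an evalscript."""
    return bool(_IDENTIFIER.fullmatch(name)) and name not in JS_RESERVED


if __name__ == "__main__":
    for candidate in ("B1", "1B", "class", "$ndvi_2", "B-1"):
        print(candidate, is_safe_band_name(candidate))
```

Names that fail this check because they contain non-ASCII letters may still be valid identifiers; the sketch only errs on the side of caution.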

Bands can have different resolutions.

For best performance we recommend the following setting for COGs: deflate compressed with 1024x1024 pixel internal tiling.

GDAL example command

COGs can be generated in a single step with GDAL 3.1 or newer using the COG raster driver. For an example compatible with older GDAL versions, see below.

The input file must conform to the constraints regarding the projection, units per pixel, and pixel formats. To generate a COG from an input file that contains a single band:

gdal_translate -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=1024 -co RESAMPLING=AVERAGE input.extension output.tiff

Additional parameters may be needed:

  • if the input file contains multiple bands, add -b <band> and run it once for each band, where <band> is the band number, starting from 1. Each band must of course be saved to a different output file.
  • if your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.
  • for many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data. To enable the predictor, add -co PREDICTOR=YES.
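
When many bands have to be converted, the command and its optional parameters can be assembled programmatically. The sketch below only builds the argument list for the gdal_translate command shown above; the function name build_cog_command and its parameters are hypothetical, and nothing is executed unless you pass the result to subprocess.run yourself on a machine with GDAL 3.1 or newer.

```python
def build_cog_command(src, dst, band=None, nodata=None, predictor=False):
    """Build the gdal_translate argument list for a single-band COG.

    band      -- 1-based band number to extract (adds -b <band>)
    nodata    -- nodata value to assign (adds -a_nodata <value>)
    predictor -- if True, adds -co PREDICTOR=YES
    """
    cmd = [
        "gdal_translate", "-of", "COG",
        "-co", "COMPRESS=DEFLATE",
        "-co", "BLOCKSIZE=1024",
        "-co", "RESAMPLING=AVERAGE",
    ]
    if band is not None:
        cmd += ["-b", str(band)]
    if nodata is not None:
        cmd += ["-a_nodata", str(nodata)]
    if predictor:
        cmd += ["-co", "PREDICTOR=YES"]
    return cmd + [src, dst]


if __name__ == "__main__":
    # One output file per band of a hypothetical three-band input.
    for band in (1, 2, 3):
        print(" ".join(build_cog_command("input.tif", f"B{band}.tiff",
                                         band=band, nodata=0)))
    # To actually run a command: subprocess.run(cmd, check=True)
```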

Older GDAL versions

For GDAL older than 3.1, multiple commands are needed. As above, we assume that the input files are single bands. If they contain multiple bands then individual bands need to be extracted and the following has to be run multiple times. Extract individual bands by adding -b <band>, where <band> is the band number, starting from 1, to the first command.

gdal_translate -of GTIFF input.extension intermediate.tiff

NOTE: If your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.

gdaladdo -r average --config GDAL_TIFF_OVR_BLOCKSIZE 1024 intermediate.tiff 2 4 8 16 32

(The number of overview levels you need depends on your source data. A good rule of thumb is to have as many overview levels as necessary for the entire source image to fit on one 1024x1024 tile).
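
The rule of thumb can be turned into a small calculation. This is a sketch under the assumption that each overview level halves the previous one and that overview dimensions are rounded up; the function name overview_levels is hypothetical.

```python
import math


def overview_levels(width, height, tile_size=1024):
    """Return the overview factors (2, 4, 8, ...) needed until the whole
    image fits on a single tile_size x tile_size tile."""
    levels = []
    factor = 1
    # Keep halving until the longer side fits into one internal tile.
    while math.ceil(max(width, height) / factor) > tile_size:
        factor *= 2
        levels.append(factor)
    return levels


if __name__ == "__main__":
    # A 10980 x 10980 pixel raster needs the levels 2 4 8 16.
    print(" ".join(str(level) for level in overview_levels(10980, 10980)))
```

The printed factors can be passed as the trailing arguments of the gdaladdo command above.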

gdal_translate -co TILED=YES -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 1024 -co BLOCKXSIZE=1024 -co BLOCKYSIZE=1024 -co COMPRESS=DEFLATE intermediate.tiff output.tiff

NOTE: For many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data. To enable the predictor, add -co PREDICTOR=2 to the above command.

Once the commands finish you can delete the intermediate.tiff file.

For more information about each command, see the GDAL documentation for gdal_translate and gdaladdo.

Configuring the bucket

Bucket region

The bucket with your data needs to be in the eu-central-1 (Frankfurt) or us-west-2 (Oregon) AWS region.

Bucket settings

Your bucket needs to be configured to allow access from Sentinel Hub. To do this, update your bucket policy to include the following statement (don't forget to replace <bucket_name> with your actual bucket name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Sentinel Hub permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}
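
To avoid hand-editing the placeholder, the policy can be generated for a given bucket name. A minimal sketch: the function name sentinel_hub_bucket_policy and the bucket name my-byoc-bucket are hypothetical, while the principal, actions, and resources are copied from the statement above.

```python
import json

# Principal taken from the policy statement in this document.
SENTINEL_HUB_PRINCIPAL = "arn:aws:iam::614251495211:root"


def sentinel_hub_bucket_policy(bucket_name):
    """Return the bucket policy as a dict with <bucket_name> filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Sentinel Hub permissions",
                "Effect": "Allow",
                "Principal": {"AWS": SENTINEL_HUB_PRINCIPAL},
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:GetObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(sentinel_hub_bucket_policy("my-byoc-bucket"), indent=2))
```

If the bucket already has a policy, add only the statement to the existing Statement array rather than replacing the whole document.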

Ingesting the tiles

There are two ways to ingest tiles. The easier one is to use the dashboard.

To create a new collection click the New collection button. The name can be anything and is there for your own reference. The S3 bucket name is the bucket name containing your data.

Once a collection is created you can then add tiles. Note that as of this writing only single tiles can be added in one step.

Click the Add tile button. Provide a path to the COG files inside the S3 bucket. For example, if your band files are stored in s3://bucket-name/folder/, then set folder as the tile path. In this case the band names will equal the file names. For example, the band B1 corresponds to the file s3://bucket-name/folder/B1.tiff.

If your file names contain something other than just the band name, such as a prefix, this is fine as long as the prefix is the same for all files. In this case the path needs to include this prefix and also the band placeholder: (BAND). Adding the extension is optional. For example, using the path folder/tile_1_(BAND)_2019.tiff would have the following effect on these files:

  • s3://bucket-name/folder/tile_1_B1_2019.tiff - the file would be used, the band name would be B1
  • s3://bucket-name/folder/tile_1_B2_2019.tiff - the file would be used, the band name would be B2
  • s3://bucket-name/folder/tile_2_B1_2019.tiff - the file would not be used
  • s3://bucket-name/folder/tile_2_B2_2019.tiff - the file would not be used
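
The effect of the (BAND) placeholder on the files listed above can be previewed locally. The sketch below is only an illustration of the matching described here, not Sentinel Hub's actual implementation; the function name band_from_key is hypothetical and the sketch assumes the path includes the file extension.

```python
import re

PLACEHOLDER = "(BAND)"


def band_from_key(tile_path, key):
    """Return the band name if the object key matches the tile path,
    otherwise None. Both arguments are paths relative to the bucket."""
    prefix, found, suffix = tile_path.partition(PLACEHOLDER)
    if not found:
        raise ValueError("tile path does not contain the (BAND) placeholder")
    # The band name is whatever sits between the fixed prefix and suffix;
    # it may not span a folder separator.
    pattern = re.escape(prefix) + r"(?P<band>[^/]+?)" + re.escape(suffix)
    match = re.fullmatch(pattern, key)
    return match.group("band") if match else None


if __name__ == "__main__":
    path = "folder/tile_1_(BAND)_2019.tiff"
    for key in ("folder/tile_1_B1_2019.tiff", "folder/tile_2_B1_2019.tiff"):
        print(key, "->", band_from_key(path, key))
```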

Optionally, set the sensing time of the tile here as well.

Do not forget that adding multiple tiles will work only if these tiles have the same bands (with different data of course).

The more involved way is to use API requests; see the BYOC API reference or the Python examples.

A note about cover geometries

Each tile ingested also requires a cover geometry. A cover geometry is a geometry which outlines the valid data part of the tile. Nodata therefore should not be contained in the cover geometry. In the simplest case, the cover geometry will equal the bounding box of the file being ingested.

The cover geometry is important because it tells the system where it can expect to find data. As a consequence, this determines how data is rendered where tiles overlap. If you have tiles with overlapping cover geometries, only the data from one tile can be rendered where two (or more) cover geometries intersect. This is true even if this data is nodata or if it lies outside the tile bounding box. Having quality cover geometries is therefore important for collections where many tiles containing nodata overlap. Not all cases need precise cover geometries, however. A single tile or a regularly gridded collection with a single date and coordinate reference system can get away with cover geometries equalling the bounding box.


Overlapping tiles

If the cover geometry is not specified during ingestion, it is automatically set to the tile bounding box. Sentinel Hub will not attempt to generate a more precise geometry, as it is impossible to design a process that works well for all users. It is therefore your responsibility to provide quality cover geometries, which in turn lets you get the most out of your data. If ingesting tiles using the API, set the cover geometry using the coverGeometry field in the API request. It must be in the GeoJSON format and in a projected or geodetic coordinate reference system that is supported by Sentinel Hub. In practice, a cover geometry is a single polygon or multipolygon. It must also contain no more than 100 points.
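
The type and size limits can be checked before sending a request. This is a minimal sketch with hypothetical function names; it assumes that every coordinate pair counts towards the limit, including the repeated closing vertex of each ring, which this document does not specify.

```python
def count_points(geometry):
    """Count the coordinate pairs in a GeoJSON Polygon or MultiPolygon."""
    kind = geometry["type"]
    if kind == "Polygon":
        polygons = [geometry["coordinates"]]
    elif kind == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported cover geometry type: {kind}")
    # A polygon is a list of rings; a ring is a list of [x, y] pairs.
    return sum(len(ring) for polygon in polygons for ring in polygon)


def is_valid_cover_geometry(geometry, max_points=100):
    """Return True if the geometry stays within the point limit."""
    return count_points(geometry) <= max_points


if __name__ == "__main__":
    square = {
        "type": "Polygon",
        "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
    }
    print(count_points(square), is_valid_cover_geometry(square))
```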

Generating cover geometries

GDAL

One way of getting a cover geometry is to use the gdal_trace_outline utility (distributed with the third-party dans-gdal-scripts package rather than with GDAL itself), which takes a raster and returns a cover geometry in the WKT format. The WKT then needs to be converted to GeoJSON. In this example a single band file is traced:

gdal_trace_outline band.tif -out-cs en -wkt-out wkt.txt

The process might take a while if you have a large file. To speed it up, you can pass a subsampled file, which you can create with gdal_translate. To get a file with 1% of the original width and height:

gdal_translate band.tif subsampled.tif -outsize 1% 1%

or if it's stored on AWS S3:

gdal_translate /vsis3/bucket-name/folder/band.tif subsampled.tif -outsize 1% 1%

Note that calculating the cover geometry on subsampled rasters may not be sufficiently accurate for touching but not intersecting tiles as the imprecision caused by downsampling may leave gaps.

Finally, you need to convert the WKT file to GeoJSON and specify the CRS under crs.properties.name (this can be omitted for WGS84). CRSs with the EPSG code <EpsgCode> should be specified as urn:ogc:def:crs:EPSG::<EpsgCode>. Here is a GeoJSON example in EPSG:32633.

{
  "type": "MultiPolygon",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::32633"
    }
  },
  "coordinates": [
    [
      [
        [
          370270.52147506207,
          5085707.891369364
        ],
        ...
      ]
    ]
  ]
}
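
The WKT to GeoJSON conversion can be done with a geometry library, or for simple cases with the standard library alone. The following is a sketch that handles only two-dimensional POLYGON and MULTIPOLYGON text with plain decimal numbers; the function name wkt_to_geojson is hypothetical, and the output of gdal_trace_outline should be inspected before relying on it.

```python
import json
import re

_NUMBER = r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_PAIR = re.compile(rf"({_NUMBER})\s+({_NUMBER})")
_WKT = re.compile(r"\s*(POLYGON|MULTIPOLYGON)\s*(\(.*\))\s*",
                  re.IGNORECASE | re.DOTALL)


def wkt_to_geojson(wkt, epsg=4326):
    """Convert 2D POLYGON/MULTIPOLYGON WKT into a GeoJSON geometry dict."""
    match = _WKT.fullmatch(wkt)
    if not match:
        raise ValueError("only POLYGON and MULTIPOLYGON WKT is supported")
    kind, body = match.groups()
    # Turn each "x y" pair into "[x, y]" and each parenthesis into a bracket,
    # which yields the nested JSON array that GeoJSON expects.
    body = _PAIR.sub(r"[\1, \2]", body)
    body = body.replace("(", "[").replace(")", "]")
    geometry = {"type": "Polygon" if kind.upper() == "POLYGON" else "MultiPolygon"}
    if epsg != 4326:  # the crs member can be omitted for WGS84
        geometry["crs"] = {
            "type": "name",
            "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg}"},
        }
    geometry["coordinates"] = json.loads(body)
    return geometry


if __name__ == "__main__":
    wkt = "POLYGON ((370270.5 5085707.9, 370300 5085707.9, 370300 5085800, 370270.5 5085707.9))"
    print(json.dumps(wkt_to_geojson(wkt, epsg=32633), indent=2))
```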

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool can also help you update the cover geometry of existing tiles on Sentinel Hub. Use the set-coverage command. On Docker, get help and parameters by running: docker run sentinelhub/byoc-tool set-coverage --help

Workarounds

If your input data is complex and cannot be adequately outlined with a simple geometry, it is nevertheless possible to obtain pixel-precise rendering. In this case, set the cover geometry to any geometry which covers all the valid input pixels; the default file bounding box is one such example. The mosaicking is then done in the custom script with the help of dataMask.

First, set the mosaicking parameter within setup to TILE (mosaicking: Mosaicking.TILE) and add the dataMask to the array of input bands.

Then use something like the following as your evalscript. Since dataMask precisely determines which pixels are valid and which are not, a value can be returned the moment a valid pixel is found; otherwise the next scene is checked.

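function evaluatePixel(samples, scenes) {
  // With Mosaicking.TILE, samples holds one entry per tile covering this pixel.
  for (let i = 0; i < samples.length; i++) {
    let sample = samples[i];
    // dataMask is 1 where the tile has valid data at this pixel.
    if (sample.dataMask == 1) {
      // someCombination is a placeholder for your own band computation.
      return someCombination(sample);
    }
  }
  // No tile has valid data here; someNodataValueArray is a placeholder for your nodata output.
  return someNodataValueArray;
}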

Optionally, you can also use the filterScenes function to reduce the number of tiles that are processed. This is useful for setting an upper limit on the number of processing units used. The following, for example, limits the number of tiles to 5.

function filterScenes (scenes, inputMetadata) {
  // Keep at most the first 5 scenes; slice leaves the input array unchanged.
  return scenes.slice(0, 5);
}

Note that getting data in such a manner will use more processing units than SIMPLE mosaicking with precise cover geometries.

Accessing data

Data can be accessed via the Process API or the OGC API. In both cases you need the collection id, which can be obtained from your dashboard. You also need to access data from the correct endpoint:

  • for buckets in eu-central-1, access from services.sentinel-hub.com/
  • for buckets in us-west-2, access from services-uswest2.sentinel-hub.com/
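
In scripts, the endpoint can be looked up from the bucket region so that the two never get out of sync. A minimal sketch; the function name endpoint_for_region is hypothetical and the mapping simply restates the list above.

```python
# Bucket region -> Sentinel Hub host, as listed in this document.
ENDPOINTS = {
    "eu-central-1": "services.sentinel-hub.com",
    "us-west-2": "services-uswest2.sentinel-hub.com",
}


def endpoint_for_region(region):
    """Return the Sentinel Hub host serving BYOC data from the given region."""
    try:
        return ENDPOINTS[region]
    except KeyError:
        supported = ", ".join(sorted(ENDPOINTS))
        raise ValueError(
            f"unsupported bucket region {region!r}; supported: {supported}"
        ) from None


if __name__ == "__main__":
    print(endpoint_for_region("us-west-2"))
```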

Available Bands and Data

Since BYOC contains your data, the available bands are the ones you have prepared. The band names to use in your evalscript are also listed in each collection in your dashboard. dataMask is also available. See here for more information. Note also that the only units available are digital numbers, so any unit conversions, if necessary, are the responsibility of your evalscript.

Process API

To get your data using Sentinel Hub, all you need is a Process API request which points to your data.

Selecting BYOC collection

Add your new BYOC collection to input.data using the collection id prefixed with "byoc-":

{
  "type": "byoc-<collectionId>"
}

For example:

{
  "type": "byoc-017aa0ae-33a6-45d3-8548-0f7d1041b40c"
}

The old syntax:

{
  "type": "CUSTOM",
  "dataFilter": {
    "collectionId": "<collectionId>"
  }
}

is still supported, but is now deprecated.

Filtering Options

This chapter will explain the input.data.dataFilter object for BYOC.

mosaickingOrder

Sets the order of overlapping BYOC tiles from which the output result is mosaicked. The tiling is defined by the user when ingesting the data in BYOC.

| Value | Description | Notes |
|---|---|---|
| mostRecent | The pixel is selected from the tile with the most recent sensing time. | If several tiles have the same sensing time, the one that was created later is used. |
| leastRecent | Similar to mostRecent but in reverse order. | |

Processing Options

This chapter will explain the input.data.processing object of the process API.

| Action | Description | Values | Default |
|---|---|---|---|
| upsampling | Defines the interpolation used for processing when the pixel resolution is greater than the source resolution (e.g. 5m/px with a 10m/px source) | NEAREST - nearest neighbour interpolation; BILINEAR - bilinear interpolation; BICUBIC - bicubic interpolation | NEAREST |
| downsampling | As above except when the resolution is lower. | NEAREST - nearest neighbour interpolation; BILINEAR - bilinear interpolation; BICUBIC - bicubic interpolation | NEAREST |
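
The collection type, filtering options, and processing options described above all belong to a single entry of input.data. The sketch below assembles just that entry and validates the values against the ones listed in this section; the function name byoc_data_entry is hypothetical, and the rest of the Process API request (bounds, output, evalscript) is intentionally left out.

```python
import json

MOSAICKING_ORDERS = {"mostRecent", "leastRecent"}
INTERPOLATIONS = {"NEAREST", "BILINEAR", "BICUBIC"}


def byoc_data_entry(collection_id, mosaicking_order=None,
                    upsampling=None, downsampling=None):
    """Build one input.data entry for a BYOC collection.

    Options left as None are omitted so that the service defaults apply.
    """
    entry = {"type": f"byoc-{collection_id}"}
    if mosaicking_order is not None:
        if mosaicking_order not in MOSAICKING_ORDERS:
            raise ValueError(f"unknown mosaickingOrder: {mosaicking_order}")
        entry["dataFilter"] = {"mosaickingOrder": mosaicking_order}
    processing = {}
    for key, value in (("upsampling", upsampling), ("downsampling", downsampling)):
        if value is not None:
            if value not in INTERPOLATIONS:
                raise ValueError(f"unknown {key} value: {value}")
            processing[key] = value
    if processing:
        entry["processing"] = processing
    return entry


if __name__ == "__main__":
    entry = byoc_data_entry("017aa0ae-33a6-45d3-8548-0f7d1041b40c",
                            mosaicking_order="leastRecent",
                            upsampling="BILINEAR")
    print(json.dumps(entry, indent=2))
```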

OGC API

To access your data via OGC, you need to create a layer in the Configuration utility in either an existing or a new configuration. When adding a layer, set Source to Bring Your Own COG and Collection id to the id of your BYOC collection. You should also enter a custom script in the Data processing field. This should return the appropriate values based on the bands that are defined in the BYOC collection.

Once this is done the layer can be used via OGC API in the usual way.

For example: http://services.sentinel-hub.com/ogc/wms/<MyInstanceID>?REQUEST=GetMap&BBOX=15959450,8695500,16059450,8795500&CRS=EPSG:3857&WIDTH=500&HEIGHT=500&LAYERS=<MyBYOCLayerName>

WFS

To query your tiles using WFS you need to set the WFS feature type (TYPENAMES parameter) to DSS10-<MyCollectionID>, e.g. DSS10-a550f5e9-84d0-441a-8338-bbb04d42a72e.

Here is an example of a WFS GetFeature request:

http://services.sentinel-hub.com/ogc/wfs/<MyInstanceID>?SERVICE=wfs&REQUEST=GetFeature&BBOX=-90,180,90,-180&SRSNAME=EPSG:4326&OUTPUTFORMAT=application/json&TYPENAMES=DSS10-<MyCollectionID>
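
A request URL like the one above can be assembled with the Python standard library. This is a sketch only: the function name wfs_getfeature_url is hypothetical, the parameter names and the DSS10- prefix are taken from the example above, and the bounding box is passed through unchanged in whatever axis order you supply.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(instance_id, collection_id, bbox,
                       srs="EPSG:4326", host="services.sentinel-hub.com"):
    """Build a WFS GetFeature URL for the tiles of a BYOC collection."""
    params = {
        "SERVICE": "wfs",
        "REQUEST": "GetFeature",
        "BBOX": ",".join(str(value) for value in bbox),
        "SRSNAME": srs,
        "OUTPUTFORMAT": "application/json",
        "TYPENAMES": f"DSS10-{collection_id}",
    }
    # Keep commas, colons and slashes readable instead of percent-encoding them.
    query = urlencode(params, safe=",:/")
    return f"http://{host}/ogc/wfs/{instance_id}?{query}"


if __name__ == "__main__":
    print(wfs_getfeature_url("<MyInstanceID>",
                             "a550f5e9-84d0-441a-8338-bbb04d42a72e",
                             (-90, 180, 90, -180)))
```

For buckets in us-west-2, pass host="services-uswest2.sentinel-hub.com" so that the request goes to the matching endpoint.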