Bring Your Own COG API

Overview

You can access your own data through Sentinel Hub just like any other data you are used to, provided a few conditions are met.

These are:

  • Store your raster data in the Cloud Optimized GeoTIFF (COG) format in your own S3 bucket in a supported region.
  • Configure the bucket's permissions so that Sentinel Hub can read the data.
  • Ingest tiles using the Dashboard or the API.

Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 data as a collection of Sentinel-2 tiles.

A note about COG overviews used for processing

When processing data, we select the overview level that is closest to the requested resolution while still having a higher resolution than the request. If no overview qualifies, the full-resolution image is used.
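The selection rule can be sketched as follows. This is a minimal illustration, not Sentinel Hub's implementation; it assumes each overview level is an integer decimation of the native pixel size and treats a level that exactly matches the request as fine enough. The function name select_overview is hypothetical.

```python
from __future__ import annotations


def select_overview(native_res: float, factors: list[int], requested_res: float) -> int:
    """Return the decimation factor of the level to read (1 means full resolution).

    Resolutions are pixel sizes (e.g. metres per pixel), so smaller means finer.
    Assumes overview k has pixel size native_res * factors[k].
    """
    chosen = 1  # the full-resolution image is the fallback
    for factor in sorted(factors):
        if native_res * factor <= requested_res:
            chosen = factor  # coarser, but still at least as fine as the request
    return chosen
```

For a 10 m image with overviews 2, 4, 8 and 16, a request at 35 m per pixel would read the factor-2 overview (20 m), while a request at 5 m falls back to full resolution.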

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool is a utility, available as a Docker image and a Java JAR, that prepares your data for use in Sentinel Hub and takes care of the entire process. If you prefer or need more control over the process, the same steps can be done manually, as detailed below.

  • Get the Docker tool here
  • Get the Java jar here
  • Get the source code here

Converting to COG

Constraints and settings

COGs can contain either a single band or multiple bands. For multi-band COGs, both planar configurations are supported: chunky (pixel-interleaved) and planar (band-interleaved).

In addition to being valid COGs, the files must meet the following constraints:

  • The COG header size must not exceed one megabyte.
  • The internal tile size must be between 256 x 256 and 2048 x 2048.
  • The projection needs to be one of: WGS84 (EPSG:4326), WebMercator (EPSG:3857), any UTM zone (EPSG:32601-32660, 32701-32760), or Europe LAEA (EPSG:3035).
  • The COG must not cross either pole.
  • The band name should be a valid JavaScript identifier so it can be safely used in evalscripts; valid identifiers are case-sensitive, can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit, and should not be one of the reserved JavaScript keywords.
  • There can be at most 100 bands.
  • Multi-band COGs in chunky format can have at most 10 bands.
  • The file names need to be consistent for all tiles in a collection. For example, if you have B1.tiff in one tile then you also need B1.tiff in all the other tiles in your collection.
  • All files of a tile need to have the same extension (so a tile cannot contain both B1.tiff and B2.TIF).
  • All files of each tile need to have the same bounding box.
  • All files of each band need to have the same bit depth.
  • Files can be compressed with the DEFLATE, ZLIB, PIXTIFF_ZIP, PACKBITS or LZW method. JPEG compression is not supported.
  • Supported sample formats are currently the same as those supported for outputs (see sampleType).
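Several of the constraints above can be checked before upload. The sketch below is a hedged pre-flight check, not an official validator: the band-name test approximates the JavaScript identifier rule (ASCII digits only, no escape sequences), and check_cog reads a hypothetical metadata dict (keys header_bytes, tile_width, tile_height, epsg, interleave, band_count, compression) that you would fill yourself, for example from gdalinfo output. Treating None as "uncompressed, allowed" and one megabyte as 1024 x 1024 bytes are assumptions.

```python
from __future__ import annotations

# Reserved words of modern JavaScript, including strict-mode reserved words.
JS_RESERVED = {
    "await", "break", "case", "catch", "class", "const", "continue", "debugger",
    "default", "delete", "do", "else", "enum", "export", "extends", "false",
    "finally", "for", "function", "if", "implements", "import", "in",
    "instanceof", "interface", "let", "new", "null", "package", "private",
    "protected", "public", "return", "static", "super", "switch", "this",
    "throw", "true", "try", "typeof", "var", "void", "while", "with", "yield",
}

# Compression methods listed above; None stands for an uncompressed file (assumption).
ALLOWED_COMPRESSION = {"DEFLATE", "ZLIB", "PIXTIFF_ZIP", "PACKBITS", "LZW", None}


def is_valid_band_name(name: str) -> bool:
    """Approximate the rule: Unicode letters, $, _ and 0-9, not starting with a digit."""
    if not name or name in JS_RESERVED or name[0] in "0123456789":
        return False
    return all(ch.isalpha() or ch in "$_0123456789" for ch in name)


def is_supported_epsg(epsg: int) -> bool:
    """WGS84, WebMercator, Europe LAEA, or any UTM zone (north or south)."""
    return (epsg in (4326, 3857, 3035)
            or 32601 <= epsg <= 32660
            or 32701 <= epsg <= 32760)


def check_cog(meta: dict) -> list[str]:
    """Return the constraint violations found in a hypothetical metadata dict for one COG."""
    problems = []
    if meta["header_bytes"] > 1024 * 1024:
        problems.append("header larger than one megabyte")
    for side in (meta["tile_width"], meta["tile_height"]):
        if not 256 <= side <= 2048:
            problems.append(f"internal tile size {side} outside 256..2048")
    if not is_supported_epsg(meta["epsg"]):
        problems.append(f"unsupported projection EPSG:{meta['epsg']}")
    if meta["interleave"] == "PIXEL" and meta["band_count"] > 10:
        problems.append("chunky multi-band COG with more than 10 bands")
    if meta["compression"] not in ALLOWED_COMPRESSION:
        problems.append(f"unsupported compression {meta['compression']}")
    return problems
```

Collection-level rules (consistent file names and extensions across tiles, the 100-band limit) need all tiles at once and are not covered here.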

Bands can have different resolutions.

For best performance, we recommend DEFLATE compression with 1024 x 1024 pixel internal tiling.

GDAL example command

COGs can be generated in a single step with GDAL 3.1 or newer using the COG raster driver. For older GDAL versions or if you want planar multi-band COGs, see below. Even though you can use any GDAL version, we highly recommend you use v3.1 or newer, as older versions have issues with average downsampling (see https://gdal.org/programs/gdaladdo.html).

The input file must conform to the constraints regarding the projection, units per pixel, and pixel formats. To generate a COG from an input file:

gdal_translate -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=1024 -co RESAMPLING=AVERAGE -co OVERVIEWS=IGNORE_EXISTING input.extension output.tiff

Additional parameters may be needed:

  • if the input file contains multiple bands, but you only need one or only some of them, add -b <bandA> -b <bandB> ..., where <bandX> is the band number, starting from 1.
  • if your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.
  • for many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data; to enable the predictor, add -co PREDICTOR=YES.
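If you convert many files, it can help to build the command programmatically. The sketch below assembles the gdal_translate call shown above together with the optional parameters; the function name cog_command is hypothetical, and running it requires GDAL 3.1 or newer on your PATH.

```python
from __future__ import annotations

import shlex


def cog_command(src: str, dst: str, bands: list[int] | None = None,
                nodata: float | None = None, predictor: bool = False) -> list[str]:
    """Build the single-step gdal_translate COG command as an argument list."""
    cmd = ["gdal_translate", "-of", "COG",
           "-co", "COMPRESS=DEFLATE", "-co", "BLOCKSIZE=1024",
           "-co", "RESAMPLING=AVERAGE", "-co", "OVERVIEWS=IGNORE_EXISTING"]
    for band in bands or []:
        cmd += ["-b", str(band)]  # 1-based band numbers
    if nodata is not None:
        cmd += ["-a_nodata", str(nodata)]
    if predictor:
        cmd += ["-co", "PREDICTOR=YES"]
    return cmd + [src, dst]


def as_shell(cmd: list[str]) -> str:
    """Quote the argument list for logging or copy-pasting into a shell."""
    return shlex.join(cmd)
```

Pass the list to subprocess.run(cmd, check=True) to execute it; using a list rather than a shell string avoids quoting problems with unusual file names.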

Multi-band COGs generated this way are encoded in chunky format, and this cannot be changed to planar. To get a planar COG, follow the next section.

Older GDAL versions or planar multi-band COGs

For GDAL older than 3.1, or if you want planar multi-band COGs, multiple commands are needed. To extract individual bands, add -b <band> to the first command, where <band> is the band number starting from 1.

gdal_translate -of GTIFF input.extension intermediate.tiff

NOTE: If your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.

gdaladdo -r average --config GDAL_TIFF_OVR_BLOCKSIZE 1024 intermediate.tiff 2 4 8 16 32

(The number of overview levels you need depends on your source data. A good rule of thumb is to have as many overview levels as necessary for the entire source image to fit on one 1024x1024 tile).
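The rule of thumb above can be computed directly. This is a small sketch with a hypothetical function name: it keeps doubling the decimation factor until the larger image dimension fits into one 1024-pixel tile.

```python
from __future__ import annotations

import math


def overview_levels(width: int, height: int, tile: int = 1024) -> list[int]:
    """Return gdaladdo levels (2, 4, 8, ...) until the whole image fits on one tile."""
    levels = []
    factor = 1
    while math.ceil(max(width, height) / factor) > tile:
        factor *= 2
        levels.append(factor)
    return levels
```

For example, a 10980 x 10980 pixel image needs levels 2 4 8 16, since 10980 / 16 rounds up to 687 pixels; " ".join(map(str, levels)) gives the arguments for gdaladdo.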

gdal_translate -co TILED=YES -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 1024 -co BLOCKXSIZE=1024 -co BLOCKYSIZE=1024 -co COMPRESS=DEFLATE intermediate.tiff output.tiff

To generate a planar multi-band COG, add -co INTERLEAVE=BAND. For chunky format, you don't need to pass anything, as this is the default format.

NOTE: for many types of data, adding a predictor can further reduce the file size. It is best to test this on your own data. To enable the predictor, add -co PREDICTOR=2 for integer data or -co PREDICTOR=3 for floating-point data to the above command.

Once the commands finish, you can delete the intermediate.tiff file.
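The three-step workflow can also be scripted. The sketch below builds the same three commands as argument lists and runs them in order; the function names are hypothetical, and you supply the overview levels yourself.

```python
from __future__ import annotations

import os
import subprocess


def legacy_cog_commands(src: str, dst: str, levels: list[int],
                        nodata: float | None = None, planar: bool = False,
                        predictor: int | None = None,
                        intermediate: str = "intermediate.tiff") -> list[list[str]]:
    """Build the three commands above (translate, add overviews, write COG)."""
    step1 = ["gdal_translate", "-of", "GTIFF"]
    if nodata is not None:
        step1 += ["-a_nodata", str(nodata)]
    step1 += [src, intermediate]

    step2 = (["gdaladdo", "-r", "average", "--config", "GDAL_TIFF_OVR_BLOCKSIZE", "1024",
              intermediate] + [str(level) for level in levels])

    step3 = ["gdal_translate", "-co", "TILED=YES", "-co", "COPY_SRC_OVERVIEWS=YES",
             "--config", "GDAL_TIFF_OVR_BLOCKSIZE", "1024",
             "-co", "BLOCKXSIZE=1024", "-co", "BLOCKYSIZE=1024", "-co", "COMPRESS=DEFLATE"]
    if planar:
        step3 += ["-co", "INTERLEAVE=BAND"]
    if predictor is not None:
        step3 += ["-co", f"PREDICTOR={predictor}"]  # 2 for integers, 3 for floating point
    step3 += [intermediate, dst]
    return [step1, step2, step3]


def run_commands(commands: list[list[str]], intermediate: str = "intermediate.tiff") -> None:
    """Run the steps in order, stopping on the first failure, then delete the intermediate file."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
    os.remove(intermediate)
```

Because check=True raises on a non-zero exit code, the intermediate file is only deleted when all three steps succeed.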

For more information about each command, see the GDAL documentation for gdal_translate and gdaladdo.

BYOC deployment

BYOC is available on AWS (2 regions), CreoDIAS, Mundi and CODE-DE. The BYOC API endpoint depends on the chosen deployment as specified in the table below.

BYOC deployment | BYOC URL endpoint
AWS EU (Frankfurt) | https://services.sentinel-hub.com/api/v1/byoc
AWS US (Oregon) | https://services-uswest2.sentinel-hub.com/api/v1/byoc
CreoDIAS | https://creodias.sentinel-hub.com/api/v1/byoc
Mundi | https://shservices.mundiwebservices.com/api/v1/byoc
CODE-DE | https://code-de.sentinel-hub.com/api/v1/byoc
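In client code, the deployment table can be kept as a lookup. This sketch copies the endpoints from the table; the dictionary keys are hypothetical labels, and the /collections suffix is an assumption about the API path, so check it against the BYOC API reference.

```python
# Endpoints from the deployment table; the dictionary keys are hypothetical labels.
BYOC_ENDPOINTS = {
    "aws-eu": "https://services.sentinel-hub.com/api/v1/byoc",
    "aws-us": "https://services-uswest2.sentinel-hub.com/api/v1/byoc",
    "creodias": "https://creodias.sentinel-hub.com/api/v1/byoc",
    "mundi": "https://shservices.mundiwebservices.com/api/v1/byoc",
    "code-de": "https://code-de.sentinel-hub.com/api/v1/byoc",
}

# AWS region your bucket must be in for each AWS deployment.
AWS_BUCKET_REGION = {"aws-eu": "eu-central-1", "aws-us": "us-west-2"}


def collections_url(deployment: str) -> str:
    """URL of the collections resource, assuming it lives under <endpoint>/collections."""
    try:
        base = BYOC_ENDPOINTS[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment {deployment!r}") from None
    return f"{base}/collections"
```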

Bucket settings

AWS bucket region

The bucket containing your COGs needs to be in the same region as the BYOC deployment you will use, that is, either eu-central-1 when using EU region (Frankfurt) or us-west-2 when using US region (Oregon).

AWS bucket settings

Your AWS bucket needs to be configured to allow access from Sentinel Hub. To do this, update your bucket policy to include the following statement (don't forget to replace <bucket_name> with your actual bucket name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Sentinel Hub permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}

CreoDIAS bucket settings

Your CreoDIAS bucket needs to be configured to allow access from Sentinel Hub. Since CreoDIAS object storage is compatible with the AWS S3 API, this can be done with an S3 bucket policy. Set the following policy on your bucket (don't forget to replace <bucket_name> with your actual bucket name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Sentinel Hub permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::b96188fd342e4f59821340ffc8cef9f5:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}

A Python script to set a CreoDIAS bucket policy can be downloaded here.

CODE-DE bucket settings

Configuring a CODE-DE bucket to allow access from Sentinel Hub works the same way as for CreoDIAS, but with a different principal. Set the following policy on your CODE-DE bucket (don't forget to replace <bucket_name> with your actual bucket name):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Sentinel Hub permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::6db0b28619c14a41a7a061f03569ae20:root"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>",
        "arn:aws:s3:::<bucket_name>/*"
      ]
    }
  ]
}

A Python script to set a CODE-DE bucket policy can be downloaded here.
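The three policies above differ only in the principal, so they can be generated from one template. The sketch below is an illustration, not the downloadable scripts mentioned above: the dictionary keys are hypothetical labels, and apply_policy expects a boto3-style S3 client (for example boto3.client("s3") for AWS). Configuring a client for CreoDIAS or CODE-DE object storage is not shown.

```python
import json

# Principals from the three policies above; the dictionary keys are hypothetical labels.
SENTINEL_HUB_PRINCIPALS = {
    "aws": "arn:aws:iam::614251495211:root",
    "creodias": "arn:aws:iam::b96188fd342e4f59821340ffc8cef9f5:root",
    "code-de": "arn:aws:iam::6db0b28619c14a41a7a061f03569ae20:root",
}


def sentinel_hub_policy(bucket: str, deployment: str) -> dict:
    """Build the Sentinel Hub read-access policy for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Sentinel Hub permissions",
            "Effect": "Allow",
            "Principal": {"AWS": SENTINEL_HUB_PRINCIPALS[deployment]},
            "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def apply_policy(s3_client, bucket: str, deployment: str) -> None:
    """Set the policy with a boto3-style S3 client.

    put_bucket_policy replaces the whole existing policy, so merge statements
    yourself if the bucket already has one.
    """
    policy = sentinel_hub_policy(bucket, deployment)
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```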

Configuring collections

When creating a collection:

  • you need to provide the S3 bucket where your data is stored; if your data is in CreoDIAS or CODE-DE, prefix the bucket name with your CreoDIAS or CODE-DE project ID as follows: PROJECT_ID:bucket_name,
  • you can define bands, but only using the BYOC API,
  • you can provide the no data value using the Dashboard or the BYOC API.
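The rules above can be captured when building a create-collection request body. This is a sketch under assumptions: only additionalData.bands is named in this guide, while the field names name, s3Bucket and noData are assumed from the API and the function name collection_payload is hypothetical, so verify them against the BYOC API reference.

```python
from __future__ import annotations

import math


def collection_payload(name: str, bucket: str, project_id: str | None = None,
                       bands: dict | None = None, no_data: float | None = None) -> dict:
    """Build a create-collection body (field names s3Bucket and noData are assumptions)."""
    # CreoDIAS and CODE-DE buckets are referenced as PROJECT_ID:bucket_name.
    s3_bucket = f"{project_id}:{bucket}" if project_id else bucket
    payload = {"name": name, "s3Bucket": s3_bucket}
    if bands is not None:
        payload["additionalData"] = {"bands": bands}
    if no_data is not None:
        if math.isnan(no_data):
            # NaN cannot be configured; it is already treated as no data by default.
            raise ValueError("no data value cannot be NaN")
        payload["noData"] = no_data
    return payload
```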

The no data value cannot be configured to NaN (not a number). However, there is no need to do this, as NaNs are by default treated as no data value.

Automatic configuration

If bands are not configured, the BYOC service configures them automatically based on the files of the first ingested tile. In this case, the bands are named after the files; for multi-band files, the 1-based band index is also appended to the name. For example, the bands in a multi-band file named RGB.tiff would be named RGB_1, RGB_2, etc. You can rename any band later.

In this process, the service also configures the "no data" value if it has not been set by the user. It extracts the "no data" values from the GDAL_NODATA TIFF tag (TIFF entry ID = 42113) of the files of the first ingested tile. If all files have exactly the same value and that value is a number, it is set as the collection "no data" value. Otherwise, the values are set per band.
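To predict what automatic configuration will produce, the naming and no-data rules can be mimicked locally. This is a hedged sketch of the rules as described above, not the service's code; the input formats (file name mapped to band count and bit depth, raw GDAL_NODATA strings) and the ("per-band", ...) result shape are hypothetical.

```python
from __future__ import annotations

import math
import os


def auto_bands(files: dict[str, tuple[int, int]]) -> dict:
    """Mimic the naming rule; files maps a file name to (band_count, bit_depth)."""
    bands = {}
    for file_name, (count, bit_depth) in files.items():
        source = os.path.splitext(file_name)[0]  # "RGB.tiff" -> "RGB"
        for index in range(1, count + 1):  # band indices are 1-based
            name = source if count == 1 else f"{source}_{index}"
            bands[name] = {"source": source, "bandIndex": index, "bitDepth": bit_depth}
    return bands


def auto_no_data(tags: dict[str, str | None]) -> tuple[str, object]:
    """tags maps a file name to its raw GDAL_NODATA string (None if the tag is missing)."""
    distinct = set(tags.values())
    if len(distinct) == 1:
        raw = distinct.pop()
        try:
            value = float(raw)
        except (TypeError, ValueError):
            value = math.nan  # missing tag or not a number
        if not math.isnan(value):
            return ("collection", value)
    return ("per-band", tags)
```

With RGB.tiff (three 16-bit bands) and CLOUD_MASK.tiff (one 8-bit band), auto_bands yields the bands RGB_1, RGB_2, RGB_3 and CLOUD_MASK.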

Manual band configuration

The example below shows how to configure bands manually instead of relying on the automatic configuration described above. Suppose your tiles are composed of two files: "RGB.tiff" with three 16-bit bands and "CLOUD_MASK.tiff" with a single 8-bit band. You would provide the following configuration in the additionalData.bands field of a new collection:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CloudMask": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}

The keys "Red", "Green", "Blue", and "CloudMask" are the names of the bands that you are going to use in evalscripts. These names can be changed at any time. Inside each band specification you specify where the band is stored using the fields source and bandIndex. The source, together with tile path, defines the file (see below for details), while bandIndex is the band index in 1-based numbering.

Band renaming

Bands can easily be renamed in the Dashboard. To rename them using the API, you need to provide the same band specs, but with new names. To obtain the current band specs, use this endpoint. For example, suppose your bands are defined as follows and you would like to rename bands "RGB_1", "RGB_2", "RGB_3" to "Red", "Green", and "Blue", respectively:

{
  "RGB_1": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "RGB_2": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "RGB_3": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}

To achieve this, you need to use this endpoint. Provide the new names at the top level, but leave the band properties ("source", "bandIndex", etc.) and their values unchanged. The content of additionalData.bands would then be:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}
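Producing the renamed specs by hand is error-prone for many bands. The following sketch (hypothetical function name) renames keys while copying each band spec unchanged, and refuses unknown or colliding names.

```python
def rename_bands(bands: dict, renames: dict) -> dict:
    """Return a copy of bands with keys renamed; band specs are left unchanged."""
    unknown = set(renames) - set(bands)
    if unknown:
        raise KeyError(f"unknown bands: {sorted(unknown)}")
    renamed = {renames.get(name, name): dict(spec) for name, spec in bands.items()}
    if len(renamed) != len(bands):
        raise ValueError("two bands would end up with the same name")
    return renamed
```

Applied to the first band configuration above with {"RGB_1": "Red", "RGB_2": "Green", "RGB_3": "Blue"}, it returns the second configuration, which you would then send as additionalData.bands.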

Keep in mind:

  • that the bucket cannot be changed after the collection is created,
  • that once bands have been configured you can only change band names or remove bands,
  • and that the no data value can be changed at any time using the Dashboard or the BYOC API.
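Because configured bands can only be renamed or removed, an update can be checked locally before it is sent. This sketch (hypothetical function name) compares band specs regardless of their names; it does not model the one-time sampleFormat change allowed for legacy collections.

```python
def is_allowed_band_update(old: dict, new: dict) -> bool:
    """True if `new` only renames and/or removes bands defined in `old`."""
    # Each new spec must match a distinct, unchanged spec from the old configuration.
    remaining = [tuple(sorted(spec.items())) for spec in old.values()]
    for spec in new.values():
        key = tuple(sorted(spec.items()))
        if key not in remaining:
            return False
        remaining.remove(key)
    return True
```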

Configuring band sample format

The sample format is TIFF metadata that defines the band data type: signed integer, unsigned integer, or floating point. Learn more about sample format here. In BYOC, these are denoted INT, UINT and FLOAT, respectively.

You can configure the format manually in BYOC using the API or the Dashboard. If it is not set, it is taken from the first ingested tile.

To configure it manually, set the sampleFormat field for each band like this:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8,
    "sampleFormat": "UINT"
  }
}

After formats are set, the formats of all new files must match the formats defined in BYOC. If they do not match, files do not get ingested.
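In a TIFF file, the data type is stored in the SampleFormat tag (339), where 1 means unsigned integer, 2 signed integer and 3 IEEE floating point; a missing tag means unsigned. The sketch below (hypothetical function name) checks a file's tag values against a configured band before upload. It reflects the matching rule for non-legacy collections only.

```python
from __future__ import annotations

# TIFF SampleFormat tag values mapped to BYOC sampleFormat names.
TIFF_TO_BYOC = {1: "UINT", 2: "INT", 3: "FLOAT"}


def file_matches_band(band: dict, tiff_sample_format: int | None, bits_per_sample: int) -> bool:
    """True if a file's SampleFormat/BitsPerSample agree with a configured band."""
    # TIFF treats a missing SampleFormat tag as 1 (unsigned integer).
    byoc_format = TIFF_TO_BYOC.get(1 if tiff_sample_format is None else tiff_sample_format)
    if byoc_format is None:
        return False  # e.g. complex or undefined sample formats
    if band.get("bitDepth") not in (None, bits_per_sample):
        return False
    expected = band.get("sampleFormat")
    # Not configured yet: the first ingested tile will define it.
    return expected is None or expected == byoc_format
```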

Legacy collections

Legacy collections are collections created before the sampleFormat field was introduced. Before then, both signed and unsigned integers were read as unsigned. To preserve backwards compatibility, integers from these collections are still read this way, but you can change this by setting sampleFormat. If you set it to INT, note that from then on you will get signed integers in evalscripts, and that the field can be set only once. Until you set it, the formats in BYOC and in the files are not required to match.

The format can be changed in Dashboard or using API. Using API, make an update collection call with a payload that has sample format(s) set like the example above.

Ingesting the tiles

There are two ways to ingest tiles: with the Dashboard or with the BYOC API. The Dashboard is the easier option.

To create a new collection, click the New collection button. The name can be anything and is there for your own reference. The S3 bucket name is the name of the bucket containing your data.

Once the collection is created you can add tiles. Note that only a single tile can be added in one step.

To add a tile, click the Add tile button. Provide a path to the COG files inside the S3 bucket. For example, if your files are stored in s3://bucket-name/folder/, simply set folder as the tile path. Optionally, you can also set the sensing time of the tile here.

When the tile is ingested its path will be automatically changed to folder/(BAND).tiff or similar, depending on the extension of the files in folder. Note that (BAND) is a placeholder that is replaced by the source of a band to obtain the actual file where the band is stored. In the example above your collection uses sources "RGB" and "CLOUD_MASK", thus the two files of the tile will be folder/RGB.tiff and folder/CLOUD_MASK.tiff.

For more complicated cases you must provide the path with the (BAND) placeholder and extension. For example, suppose your folder contains the files for multiple tiles:

  • s3://bucket-name/folder/tile_1_B1_2019.tif,
  • s3://bucket-name/folder/tile_1_B2_2019.tif,
  • s3://bucket-name/folder/tile_2_B1_2019.tif,
  • s3://bucket-name/folder/tile_2_B2_2019.tif.

Create the first tile with the path folder/tile_1_(BAND)_2019.tif to use the first two files and the second tile with the path folder/tile_2_(BAND)_2019.tif to use the next two files.
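The placeholder substitution described above can be sketched as follows, to check which files a tile path will resolve to. The function name is hypothetical, and the default .tiff extension for a plain folder path is an assumption; the service uses whatever extension the files actually have.

```python
from __future__ import annotations


def tile_files(tile_path: str, sources: list[str], extension: str = ".tiff") -> list[str]:
    """Expand a tile path into the file of each band source via the (BAND) placeholder."""
    if "(BAND)" not in tile_path:
        # A plain folder: the placeholder and the files' extension are appended.
        tile_path = tile_path.rstrip("/") + "/(BAND)" + extension
    return [tile_path.replace("(BAND)", source) for source in sources]
```

For the example above, tile_files("folder/tile_1_(BAND)_2019.tif", ["B1", "B2"]) gives folder/tile_1_B1_2019.tif and folder/tile_1_B2_2019.tif.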

Do not forget that all tiles must contain the same set of files (with different data of course); that is, if a tile is missing one or more files it will fail to ingest.

To ingest tiles via API requests instead of the dashboard, see BYOC API reference or Python examples.
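As a rough idea of what such a request looks like, the sketch below builds (but does not send) a create-tile request with the standard library. The URL pattern <endpoint>/collections/<collectionId>/tiles and the body fields path and sensingTime are assumptions to verify against the BYOC API reference; coverGeometry is the field named in this guide. Obtaining the OAuth token is not shown.

```python
from __future__ import annotations

import json
import urllib.request


def tile_request(base_url: str, collection_id: str, path: str, token: str,
                 sensing_time: str | None = None,
                 cover_geometry: dict | None = None) -> urllib.request.Request:
    """Build a create-tile request; the URL layout and most field names are assumptions."""
    body: dict = {"path": path}
    if sensing_time is not None:
        body["sensingTime"] = sensing_time  # ISO 8601, e.g. "2019-06-01T10:00:00Z"
    if cover_geometry is not None:
        body["coverGeometry"] = cover_geometry  # GeoJSON Polygon or MultiPolygon
    return urllib.request.Request(
        f"{base_url}/collections/{collection_id}/tiles",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(request); as in the Dashboard, one request adds one tile.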

A note about changing files

While you may freely modify the data in your buckets, for it to continue working reliably through Sentinel Hub you need to reingest tiles whose data has changed. Reingest tiles by removing them from the collection and then creating new ones. This is needed because reingestion updates the metadata required for processing; failing to do so can result in unexpected behavior.

A note about cover geometries

Each ingested tile also requires a cover geometry. A cover geometry outlines the part of the tile that contains valid data, so areas of no data should not be included in it. In the simplest case, the cover geometry is equal to the bounding box of the ingested file.

The cover geometry is important because it tells the system where it can expect to find data. As a consequence, it determines how data is rendered where tiles overlap. If you have tiles with overlapping cover geometries, only the data from one tile can be rendered where two (or more) cover geometries intersect. This is true even if that data is no data or lies outside the tile bounding box. Good cover geometries are therefore important for collections where many tiles containing no data overlap. Not all cases need precise cover geometries, however. A single tile, or a regularly gridded collection with a single date and coordinate reference system, can use cover geometries equal to the bounding box.


Figure: Overlapping tiles

If the cover geometry is not specified during ingestion, it is automatically set to the tile bounding box. Sentinel Hub does not attempt to generate a more precise geometry, as no such process would work well for all users. It is therefore your responsibility to provide good cover geometries, which lets you get the most out of your data. When ingesting tiles using the API, set the cover geometry in the coverGeometry field of the API request. It must be in GeoJSON format and in a projected or geodetic coordinate reference system supported by Sentinel Hub. In practice, a cover geometry is a single Polygon or MultiPolygon, and it must contain no more than 100 points.
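A quick local check of the geometry type and the 100-point limit can catch rejected geometries early. This sketch uses hypothetical function names and counts every vertex, including each ring's closing vertex; whether the service counts closing vertices is an assumption, so this errs on the strict side.

```python
def count_points(geometry: dict) -> int:
    """Count all vertices of a GeoJSON Polygon or MultiPolygon, closing vertices included."""
    if geometry["type"] == "Polygon":
        rings = geometry["coordinates"]
    elif geometry["type"] == "MultiPolygon":
        rings = [ring for polygon in geometry["coordinates"] for ring in polygon]
    else:
        raise ValueError("cover geometry must be a Polygon or MultiPolygon")
    return sum(len(ring) for ring in rings)


def check_cover_geometry(geometry: dict, max_points: int = 100) -> None:
    """Raise ValueError if the geometry has the wrong type or too many points."""
    n = count_points(geometry)
    if n > max_points:
        raise ValueError(f"cover geometry has {n} points; at most {max_points} are allowed")
```

If a traced outline is too detailed, simplifying it (for example with a GIS library's simplify operation) before ingestion is one way to get under the limit.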

Generating cover geometries

GDAL

One way of getting a cover geometry is the gdal_trace_outline utility (part of the dans-gdal-scripts collection rather than GDAL itself), which takes a raster and returns a cover geometry in WKT format. This then needs to be converted to GeoJSON. In this example, a single-band file is traced:

gdal_trace_outline band.tif -out-cs en -wkt-out wkt.txt

The process might take a while for a large file. To speed it up, you can pass a subsampled file, which you can create with gdal_translate. To get a file whose width and height are 1% of the original:

gdal_translate band.tif subsampled.tif -outsize 1% 1%

or if it's stored on AWS S3:

gdal_translate /vsis3/bucket-name/folder/band.tif subsampled.tif -outsize 1% 1%

Note that a cover geometry calculated on a subsampled raster may not be accurate enough for tiles that touch but do not overlap, as the imprecision caused by downsampling may leave gaps between them.

Finally, you need to convert the WKT file to GeoJSON and specify the CRS under crs.properties.name (this can be omitted for WGS84). A CRS with the EPSG code <EpsgCode> should be specified as urn:ogc:def:crs:EPSG::<EpsgCode>. Here is a GeoJSON example in EPSG:32633:

{
  "type": "MultiPolygon",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::32633"
    }
  },
  "coordinates": [
    [
      [
        [
          370270.52147506207,
          5085707.891369364
        ],
        ...
      ]
    ]
  ]
}

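The WKT-to-GeoJSON conversion can be done with a GIS library, or with a small parser like the sketch below. It is a minimal illustration that only handles POLYGON and MULTIPOLYGON (not EMPTY geometries), and the function names are hypothetical; it adds the crs member in the format shown above unless the EPSG code is 4326.

```python
from __future__ import annotations

import re


def _parse(tokens: list[str], i: int) -> tuple[list, int]:
    """Parse a parenthesised group starting at tokens[i] == "("; return (items, next index)."""
    items, point = [], []
    i += 1
    while True:
        token = tokens[i]
        if token == "(":
            group, i = _parse(tokens, i)
            items.append(group)
        elif token == ")":
            if point:
                items.append(point)
            return items, i + 1
        elif token == ",":
            if point:
                items.append(point)
                point = []
            i += 1
        else:
            point.append(float(token))  # one coordinate of the current vertex
            i += 1


def wkt_to_geojson(wkt: str, epsg: int | None = None) -> dict:
    """Convert POLYGON/MULTIPOLYGON WKT to a GeoJSON geometry with an optional crs member."""
    match = re.fullmatch(r"\s*(MULTIPOLYGON|POLYGON)\s*(\(.*\))\s*", wkt,
                         re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError("only POLYGON and MULTIPOLYGON WKT are handled")
    kind, body = match.group(1).upper(), match.group(2)
    tokens = re.findall(r"[()]|,|[^\s(),]+", body)
    coordinates, _ = _parse(tokens, 0)
    geometry = {"type": "MultiPolygon" if kind == "MULTIPOLYGON" else "Polygon",
                "coordinates": coordinates}
    if epsg is not None and epsg != 4326:
        geometry["crs"] = {"type": "name",
                           "properties": {"name": f"urn:ogc:def:crs:EPSG::{epsg}"}}
    return geometry
```

For instance, reading wkt.txt and calling wkt_to_geojson(text, epsg=32633) yields a structure like the example above, ready to be serialized with json.dumps.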
Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool can also help you update the cover geometry of existing tiles on Sentinel Hub. Use the set-coverage command. On Docker, get help and parameters by running: docker run sentinelhub/byoc-tool set-coverage --help

Workarounds

If your input data is too complex to be adequately outlined with a simple geometry, it is still possible to obtain pixel-precise rendering. In this case, set the cover geometry to any geometry that covers all the valid input pixels; the default, the file bounding box, is one example. The mosaicking is then done in the evalscript with the help of dataMask.

First, set the mosaicking parameter within setup to TILE (mosaicking: Mosaicking.TILE) and add the dataMask to the array of input bands.

Then use something like the following as your evalscript. Since dataMask precisely determines which pixels are valid, a pixel can be returned as soon as a valid value is found; otherwise, the next scene is checked.

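function evaluatePixel(samples, scenes) {
  // With TILE mosaicking, samples contains one entry per tile.
  for (let i = 0; i < samples.length; i++) {
    const sample = samples[i];
    // dataMask is 1 where the tile has valid data and 0 where it has no data.
    if (sample.dataMask === 1) {
      // Assumes setup() lists the example bands Red, Green, Blue plus dataMask as inputs and 4 output bands.
      return [sample.Red, sample.Green, sample.Blue, 1];
    }
  }
  // No tile has valid data at this pixel: return a transparent no-data pixel.
  return [0, 0, 0, 0];
}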

Optionally, you can also use the preProcessScenes function to reduce the number of tiles that will be processed. This is useful for setting an upper limit on the number of processing units used. For example, the following keeps at most 5 tiles:

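function preProcessScenes(collections) {
  // Keep only the first 5 tiles. slice() returns them as a new array; splice(5) would
  // instead return the tiles from index 5 onward, which is the opposite of the intent.
  collections.scenes.tiles = collections.scenes.tiles.slice(0, 5);
  return collections;
}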

Note that getting data in such a manner will use more processing units than SIMPLE mosaicking with precise cover geometries.

Collection metadata

Collections have the following metadata available under additionalData:

  • extent: the collection extent in WGS84
  • hasSensingTimes: whether the tiles have sensing times
  • fromSensingTime: the sensing time, in ISO 8601, of the least recent tile
  • toSensingTime: the sensing time, in ISO 8601, of the most recent tile

The metadata is updated within a few minutes after a tile is added or removed. To find out whether your collection requires a metadata update, check the requiresMetadataUpdate flag.
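If a script needs up-to-date metadata right after ingestion, it can poll the collection until the flag clears. This is a sketch under assumptions: get_collection is a hypothetical callable you provide that returns the collection JSON as a dict, and requiresMetadataUpdate is assumed to be a top-level boolean in that JSON.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_metadata(get_collection: Callable[[], dict], interval_s: float = 30.0,
                      timeout_s: float = 1800.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the collection no longer reports requiresMetadataUpdate."""
    waited = 0.0
    while True:
        collection = get_collection()
        if not collection.get("requiresMetadataUpdate", False):
            return collection
        if waited >= timeout_s:
            raise TimeoutError("collection metadata is still being updated")
        sleep(interval_s)
        waited += interval_s
```

Injecting sleep keeps the function easy to test; in normal use the default time.sleep applies.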

See our beginner BYOC webinar, where you will learn step-by-step how to prepare and ingest raster data with Sentinel Hub, make API requests to your data and visualize it.

This BYOC tutorial notebook is a simple walk-through of creating, updating, listing, and deleting your data collections in Python using the Sentinel Hub Python package.

Examples

BYOC API Examples