Bring Your Own COG API

Overview

Bring Your Own COG API (or shortly "BYOC") enables you to import your own data in Sentinel Hub and access it just like any other data you are used to. To be able to do so, the following conditions should be met:

Store your raster data in the cloud optimized geotiff (COG) format on your own S3 bucket in the supported region.
Configure the bucket's permissions so that Sentinel Hub can read them.
Import tiles using the Dashboard or API.

Your data needs to be organized into collections of tiles. Each tile needs to contain a set of bands and (optionally) an acquisition date and time. Tiles with the same bands can be grouped into collections. Think of the Sentinel-2 data as a collection of Sentinel-2 tiles.

A note about COG overviews used for processing

When processing data, we select the nearest overview level which has higher resolution than your request, or the full resolution image.

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool is software which can be used to prepare your data for use in Sentinel Hub. It can be run either in Docker or as a Java JAR. It takes care of the entire process; it is simple to use for simple cases but is also highly configurable allowing for more complex requirements. The same steps can be done manually and are detailed below, should you prefer or require more control over the process. Get the tool here.

Converting to COG

Constraints and settings

COGs can contain either a single band or multiple bands. For multi-band COGs we support both planar configurations formats - chunky and planar format.

There are a few additional constraints in addition to having COG files. These are:

The COG header size must not exceed one megabyte.
The internal tile size must be between 256 x 256 and 2048 x 2048.
The projection needs to be one of: WGS84 (EPSG:4326), WebMercator (EPGS:3857), any UTM zone (EPSG:32601-32660, 32701-32760), or Europe LAEA (EPSG:3035).
Photometric interpretation that is not 1 (0 is imaged as black) or 2 (RGB) is only supported with no compression and deflate or zstd compression.
The COG must not cross any of the two poles.
The band name should be a valid JavaScript identifier so it can be safely used in evalscripts; valid identifiers are case-sensitive, can contain Unicode letters, $, _, and digits (0-9), but may not start with a digit, and should not be one of the reserved JavaScript keywords.
There can be at most 100 bands.
Multi-band COGs in chunky format can have at most 10 bands.
The file names need to be consistent for all tiles in a collection. For example, if you have B1.tiff in one tile then you also need B1.tiff in all the other tiles in your collection.
All files of each tile needs to have consistent extension (so a tile cannot contain both B1.tiff and B2.TIF).
The maximum allowable difference between the intersection of all file bounding boxes and each individual file is one pixel of that file [1].
All files of each band need to have the same bit depth.
Files can be compressed with DEFLATE, ZLIB, ZSTD, PIXTIFF_ZIP, PACKBITS or LZW compression method. JPEG compression is not supported.
Supported sample types and bit depths are the same as those supported for outputs, as well as reading unsigned integer 1, 2, 4 bit files. See also sampleType.

Bands can have different resolutions.

For best performance we recommend the following setting for COGs: deflate compressed with 1024x1024 pixel internal tiling.

[1]: Here's one example of files with slightly different bounding boxes. One file has the bounding box [0, 0, 10, 10] and resolution of one meter per pixel, and the other file has the bounding box [0.5, 0.5, 10.5, 10.5] and 0.5 meter resolution. The intersection [0.5, 0.5, 10, 10] is not more than one pixel away from each individual file, therefore such files are valid for BYOC.

GDAL example command

COGs can be generated in a single step with GDAL 3.1 or newer using the COG raster driver. For older GDAL versions or if you want planar multi-band COGs, see below. Even though you can use any GDAL version, we highly recommend you use v3.1 or newer, as older versions have issues with average downsampling (see https://gdal.org/programs/gdaladdo.html).

The input file must conform to the constraints regarding the projection, units per pixel, and pixel formats. To generate a COG from an input file:

gdal_translate -of COG -co COMPRESS=DEFLATE -co BLOCKSIZE=1024 -co RESAMPLING=AVERAGE -co OVERVIEWS=IGNORE_EXISTING input.extension output.tiff

Additional parameters may be needed:

if the input file contains multiple bands, but you only need one or only some of them, add -b <bandA> -b <bandB> ..., where <bandX> is the band number, starting from 1.
if your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.
for many types of data adding a predictor can further reduce the file size. It is best to test this on your own data, to enable the predictor add -co PREDICTOR=YES.

Multi-band COGs generated this way, are encoded in chunky format and you cannot change it to planar format. To get a COG in planar format, follow the next chapter.

Older GDAL versions or planar multi-band COGs

For GDAL older than 3.1 or if you want planar multi-band COGs, multiple commands are needed. To extract individual bands, add -b <band>, where <band> is the band number, starting from 1, to the first command.

gdal_translate -of GTIFF input.extension intermediate.tiff

NOTE: If your input data has nodata values, add them to this command using: -a_nodata NO_DATA_VALUE, e.g. for zero: -a_nodata 0.

gdaladdo -r average --config GDAL_TIFF_OVR_BLOCKSIZE 1024 intermediate.tiff 2 4 8 16 32

(The number of overview levels you need depends on your source data. A good rule of thumb is to have as many overview levels as necessary for the entire source image to fit on one 1024x1024 tile).

gdal_translate -co TILED=YES -co COPY_SRC_OVERVIEWS=YES --config GDAL_TIFF_OVR_BLOCKSIZE 1024 -co BLOCKXSIZE=1024 -co BLOCKYSIZE=1024 -co COMPRESS=DEFLATE intermediate.tiff output.tiff

To generate a planar multi-band COG, add -co INTERLEAVE=BAND. For chunky format, you don't need to pass anything, as this is the default format.

NOTE: for many types of data adding a predictor can further reduce the file size. It is best you test this on your own data. To enable the predictor, add to the above command -co PREDICTOR=2 for integers, and -co PREDICTOR=3 for floating points.

Once the commands finish, you can delete the intermediate.tiff file.

For more information about each command see the GDAL documentation:

BYOC deployment

BYOC is available on AWS eu-central-1 and us-west-2. The BYOC API endpoint depends on the chosen deployment as specified in the table below.

BYOC deployment	BYOC URL end-point
AWS EU (Frankfurt)	https://services.sentinel-hub.com/api/v1/byoc
AWS US (Oregon)	https://services-uswest2.sentinel-hub.com/api/v1/byoc

Bucket settings

Note that a single bucket can be used for multiple collections. But while buckets do not have to be unique, this is true for tile paths.

AWS bucket region

The bucket containing your COGs needs to be in the same region as the BYOC deployment you will use, that is, either eu-central-1 when using EU region (Frankfurt) or us-west-2 when using US region (Oregon).

AWS bucket settings

Your AWS bucket needs to be configured to allow access from Sentinel Hub. To do this, update your bucket policy to include the following statement (don't forget to replace <bucket_name> with your actual bucket name):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Sentinel Hub permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::614251495211:root"
            },
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<bucket_name>",
                "arn:aws:s3:::<bucket_name>/*"
            ]
        }
    ]
}

Configuring collections

When creating a collection:

you need to provide a name for the collection,
you need to provide the S3 bucket where your data is,
you can define bands, but only using BYOC API,
you can provide the no data value using Dashboard or BYOC API.

The no data value cannot be configured to NaN (not a number). However, there is no need to do this, as NaNs are by default treated as no data value.

Automatic configuration

If bands are not configured, BYOC service automatically configures them based on the files of the first ingested tile. In this case the bands are named after the files, while for multi-band files the band index in 1-based numbering is also added at the end. For example, the bands in a multi-band file named RGB.tiff would be named RGB_1, RGB_2, etc. You can rename any band later.

In this process, the service also configures the "no data" value, if it's not set by the user. The service automatically extracts "no data" values from the TIFF tag GDAL_NODATA (TIFF entry ID = 42113) of the files of the first ingested tile, and sets the value as the collection "no data" value, if all files have the exactly same value and if the value is a number. Otherwise, it sets values per band.

Manual band configuration

The below example shows how to configure manually instead of relying on the automatic configuration described above. Suppose your tiles are composed of two files - "RGB.tiff" with three 16-bit bands and "CLOUD_MASK.tiff" with a single 8-bit band. You would provide such configuration in additionalData.bands field of a new collection:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CloudMask": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}

The keys "Red", "Green", "Blue", and "CloudMask" are the names of the bands that you are going to use in evalscripts. These names can be changed at any time. Inside each band specification you specify where the band is stored using the fields source and bandIndex. The source, together with tile path, defines the file (see below for details), while bandIndex is the band index in 1-based numbering.

Band renaming

Bands can be easily renamed in Dashboard. To do this using API, you need to provide the same band specs, but with new names. To obtain the current band specs, use this endpoint. For example, let's say your bands are defined like this, and you would like to rename bands "RGB_1", "RGB_2", "RGB_3" to "Red", "Green", and "Blue", respectively:

{
  "RGB_1": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "RGB_2": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "RGB_3": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}

To achieve this, you need to use this endpoint. You need to provide the new names at the top level, but leave the band properties ("source", "bandIndex", etc) and values the same. So the content of additionalData.bands would be:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8
  }
}

Keep in mind:

that the bucket cannot be changed after the collection is created,
that once bands have been configured you can only change band names or remove bands,
and that the no data value can be changed at anytime using Dashboard or BYOC API.

Configuring band sample format

The sample format is TIFF info that defines the band data type. It can be set to signed integers, unsigned integers, or floating points. Learn more about sample format here. These values are in BYOC defined as INT, UINT and FLOAT, respectively.

You can configure format manually in BYOC using API or Dashboard. If not set, it will get set to the value of the first ingested tile.

To configure it manually, set sampleFormat field for each band like this:

{
  "Red": {
    "source": "RGB",
    "bandIndex": 1,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "Green": {
    "source": "RGB",
    "bandIndex": 2,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "Blue": {
    "source": "RGB",
    "bandIndex": 3,
    "bitDepth": 16,
    "sampleFormat": "INT"
  },
  "CLOUD_MASK": {
    "source": "CLOUD_MASK",
    "bandIndex": 1,
    "bitDepth": 8,
    "sampleFormat": "UINT"
  }
}

After formats are set, the formats of all new files must match the formats defined in BYOC. If they do not match, files do not get ingested.

Legacy collections

Legacy collections are those collections created prior to the introduction of sampleFormat field. Before the field, we had read signed and unsigned integers as unsigned. To preserve backwards compatibility, we still read integers from these collections in this way, but you can change this by setting sampleFormat. If you do set it to INT, do note that you will get integers as signed in evalscripts from then on, and that you can set the field only once. Until you set it, we don't require that the formats in BYOC and files match.

The format can be changed in Dashboard or using API. Using API, make an update collection call with a payload that has sample format(s) set like the example above.

Ingesting the tiles

There are two ways of doing this. The easier version is using the dashboard.

To create a new collection click the New collection button. The name can be anything and is there for your own reference. The S3 bucket name is the bucket name containing your data.

Once the collection is created you can add tiles. Note that only a single tile can be added in one step.

To add a tile, click the Add tile button. Provide a path to the COG files inside the s3 bucket. For example, if your files are stored in s3://bucket-name/folder/, simply set folder as the tile path. Optionally, set the sensing time of the tile here as well.

When the tile is ingested its path will be automatically changed to folder/(BAND).tiff or similar, depending on the extension of the files in folder. Note that (BAND) is a placeholder that is replaced by the source of a band to obtain the actual file where the band is stored. In the example above your collection uses sources "RGB" and "CLOUD_MASK", thus the two files of the tile will be folder/RGB.tiff and folder/CLOUD_MASK.tiff.

For more complicated cases you must provide the path with the (BAND) placeholder and extension. For example, suppose your folder contains the files for multiple tiles:

s3://bucket-name/folder/tile_1_B1_2019.tif,
s3://bucket-name/folder/tile_1_B2_2019.tif,
s3://bucket-name/folder/tile_2_B1_2019.tif,
s3://bucket-name/folder/tile_2_B2_2019.tif.

Create the first tile with the path folder/tile_1_(BAND)_2019.tif to use the first two files and the second tile with the path folder/tile_2_(BAND)_2019.tif to use the next two files.

Do not forget that all tiles must contain the same set of files (with different data of course); that is, if a tile is missing one or more files it will fail to ingest.

To ingest tiles via API requests instead of the dashboard, see BYOC API reference or Python examples.

A note about changing files

While you may freely modify the data in your buckets, for it to continue to work reliably through Sentinel Hub you need to reingest tiles with changed data. You can do this in Dashboard, by clicking the "Refresh" button next to the tile, or using the API, by calling the reingest endpoint with the collection and tile id. This is needed as it will update metadata required for processing and failing to do so can result in odd behavior.

A note about cover geometries

Each tile ingested also requires a cover geometry. A cover geometry is a geometry which outlines the valid data part of the tile. Nodata therefore should not be contained in the cover geometry. In the simplest case, the cover geometry will equal the bounding box of the file being ingested.

The cover geometry is important because it tells the system where it can expect to find data. As a consequence, this determines how data is rendered where tiles overlap. If you have tiles with overlapping cover geometries and you for example request mosaicking SIMPLE, only the data from one tile will be rendered where two (or more) cover geometries intersect. This is true even if this data is nodata or if it lies outside the tile bounding box. Sentinel Hub will render such areas as nodata even if other tiles in the data collection contain valid data for this area. Having quality cover geometries is therefore important for collections where many tiles containing nodata overlap. Not all cases need precise cover geometries, however. A single tile or a regularly gridded collection with a single date and coordinate reference system can get away with cover geometries equalling the bounding box.

Overlapping tiles

If the cover geometry is not specified during ingestion it will automatically be set to the tile bounding box. Sentinel Hub will not attempt to generate a more precise geometry as it is impossible to prepare such a process which will work well for all users. It is therefore your responsibility to provide quality cover geometries and in doing so allow you to extract the most out of your data. If ingesting tiles using the API, set the cover geometry using the coverGeometry field in the API request. It must be in the GeoJSON format and in a projected or geodetic coordinate reference system which is supported by Sentinel Hub. Cover geometries in practice mean one polygon or multipolygon. They must also contain no more than 100 points.

Generating cover geometries

GDAL

One way of getting a cover geometry is using the GDAL utility script gdal_trace_outline which takes a raster and returns a cover geometry in the WKT format. This then needs to be converted to GeoJSON. In this example a single band file is traced:

gdal_trace_outline band.tif -out-cs en -wkt-out wkt.txt

The process might take a while if you have a large file. To speed up the process you can pass a subsampled file which you can get with gdal_translate. To get a file that is 1% of the original size:

gdal_translate band.tif subsampled.tif -outsize 1% 1%

or if it's stored on AWS S3:

gdal_translate /vsis3/bucket-name/folder/band.tif subsampled.tif -outsize 1% 1%

Note that calculating the cover geometry on subsampled rasters may not be sufficiently accurate for touching but not intersecting tiles as the imprecision caused by downsampling may leave gaps.

Finally, you need to convert the WKT file to GeoJSON and specify the CRS under crs.properties.name (except when WGS84 when it can be omitted). CRSs with the EPSG code <EpsgCode> should be specified as urn:ogc:def:crs:EPSG::<EpsgCode>. Here is a GeoJSON example in ESPG:32633.

{
    "type": "MultiPolygon",
    "crs": {
        "type": "name",
        "properties": {
            "name": "urn:ogc:def:crs:EPSG::32633"
        }
    },
    "coordinates": [
        [
            [
                [
                    370270.52147506207,
                    5085707.891369364
                ],
                ...
            ]
        ]
    ]
}

Sentinel Hub BYOC Tool

The Sentinel Hub BYOC Tool can also help you update the cover geometry of existing tiles on Sentinel Hub. Use the set-coverage command. On Docker, get help and parameters by running: docker run sentinelhub/byoc-tool set-coverage --help

Workarounds

In case your input data is complex and cannot be adequately simply outlined it is nevertheless possible to obtain pixel-precise rendering. In this case, set the cover geometry to any which covers all the valid input pixels. The file bounding box as the default is such an example. What follows is doing the mosaicking in the custom script with the help of dataMask.

First, set the mosaicking parameter within setup to TILE (mosaicking: Mosaicking.TILE) and add the dataMask to the array of input bands.

Then use something like the following as your evalscript. Since dataMask precisely determines which pixels are valid and which ones are not, the moment a valid pixel is found this can be returned, alternatively the next scene should be checked.

function evaluatePixel(samples, scenes) {
  for (let i = 0; i < samples.length; i++) {
    let sample = samples[i];
    if (sample.dataMask == 1) {
      return someCombination(sample);
    }
  }
  return someNodataValueArray;
}

Note that getting data in such a manner will use more processing units than SIMPLE mosaicking with precise cover geometries.

Optionally, you may additionally use the preProcessScenes function to potentially reduce the number of tiles which will be processed. This is useful to set an upper limit for the number of processing units which will be used. The following limits the maximum number of tiles to 5, for example.

function preProcessScenes (collections) {
  collections.scenes.tiles = collections.scenes.tiles.splice(5);
  return collections;
}

Collection metadata

Collections have the following metadata available under additionalData:

extent: the collection extent in WGS84
hasSensingTimes: information if tiles have sensing time
fromSensingTime the sensing time in ISO 8601 of the least recent tile
toSensingTime: the sensing time in ISO 8601 of the most recent tile

The metadata is updated in a few minutes after a tile is added or removed. To find out if your collection requires metadata updates, check out the flag requiresMetadataUpdate.

See our beginner BYOC webinar, where you will learn step-by-step how to prepare and ingest raster data with Sentinel Hub, make API requests to your data and visualize it.

This BYOC tutorial notebook is a simple walk-through on creating, updating, listing, and deleting your data collections through Python using Sentinel Hub Python package.

Examples

BYOC API Examples