Batch Processing V2 API

The BatchV2 API is only available for enterprise users. If you don't have an enterprise account and would like to try it out, contact us for a custom offer.


Migration guide

If you're interested in how to migrate from the Batch Processing API to BatchV2, please read the following guide:

BatchV2 Migration Guide


Overview

The BatchV2 Processing API (or "BatchV2 API" for short) enables you to request data for large areas and/or longer time periods for any collection supported by Sentinel Hub, including BYOC (bring your own data). It is an asynchronous REST service: data is not returned immediately but delivered to the object storage you specify instead.

Workflow

The BatchV2 Processing API comes with a set of REST endpoints which support the execution of various workflows. The diagram below shows all possible statuses of a batch task:

  • CREATED
  • ANALYSING
  • ANALYSIS_DONE
  • PROCESSING
  • DONE
  • FAILED
  • STOPPED

and user's actions:

  • ANALYSE
  • START
  • STOP

which trigger transitions among them.

[State diagram: the user actions ANALYSE, START and STOP trigger transitions between the statuses CREATED, ANALYSING, ANALYSIS_DONE, PROCESSING, DONE, FAILED and STOPPED.]

The workflow starts when a user posts a new batch request. In this step the system:

  • creates a new batch task with the status CREATED,
  • validates the user's input (except the evalscript),
  • ensures the user's account has at least 1000 PUs,
  • uploads a JSON of the original request to the user's bucket,
  • and returns the overview of the created task.
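For illustration, creating a new task might look like the sketch below. This is a minimal sketch, assuming the same authenticated oauth session used in the tiling-grid example later in this document; the endpoint path, the payload layout shown and the response field names are assumptions here, so check the BatchV2 API reference for the exact schema.

# Minimal sketch: create a new BatchV2 task (all payload values are placeholders).
url = "https://services.sentinel-hub.com/api/v2/batch/process"
payload = {
    "processRequest": {},  # your Process API style request: input data, bounds, evalscript
    "input": {"type": "tiling-grid", "id": 1, "resolution": 10.0},
    "output": {"type": "raster", "delivery": {"s3": {"url": "s3://<your-bucket>/<folder>"}}},
}
response = oauth.request("POST", url, json=payload)
task = response.json()
print(task["id"], task["status"])  # a new task starts in the CREATED status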

The user can then decide to either request an additional analysis of the task or start the processing. When an additional analysis is requested:

  • the status of the task changes to ANALYSING,
  • the evalscript is validated,
  • a feature manifest file is uploaded to the user's bucket,
  • after the analysis is finished, the status of the task changes to ANALYSIS_DONE.

If the user chooses to start processing directly, the system still executes the analysis, but once the analysis is done it automatically proceeds with processing. This is not explicitly shown in the diagram in order to keep it simple.

When the user starts the processing:

  • the status of the task changes to PROCESSING (this may take a while, depending on the load on the service),
  • the processing starts,
  • an execution database is periodically uploaded to the user's bucket,
  • spent processing units are billed periodically.

When the processing is finished, the status of the task changes to DONE.
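As a sketch of how these transitions can be driven from code, assuming the analyse, start and status lookups are exposed on the task resource at /api/v2/batch/process/<task-id> (an assumption here; check the BatchV2 API reference for the exact paths):

import time

# task_id is the id returned when the task was created (see the sketch above).
base = f"https://services.sentinel-hub.com/api/v2/batch/process/{task_id}"

oauth.request("POST", base + "/analyse")  # optional: run only the analysis first
# ... inspect the feature manifest in your bucket, then start processing:
oauth.request("POST", base + "/start")

# Poll the task until it reaches a terminal status.
while True:
    status = oauth.request("GET", base).json()["status"]
    print(status)
    if status in ("DONE", "FAILED", "STOPPED"):
        break
    time.sleep(60)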

Stopping the request

A task might be stopped for the following reasons:

  • it's requested by a user (user action),
  • the user runs out of processing units,
  • something is wrong with the processing of the task (e.g. the system is not able to process the data).

A user may stop the request in the following states: ANALYSING, ANALYSIS_DONE and PROCESSING. However:

  • if the status is ANALYSING, the analysis will run to completion,
  • if the status is PROCESSING, all features (polygons) that have already been processed or are being processed at that moment are charged for,
  • the user is not allowed to restart the task for the next 30 minutes.
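A stop can be issued in the same way as the other actions (again assuming a stop action on the task resource, analogous to analyse and start in the sketch above):

# Minimal sketch: stop a task that is ANALYSING, ANALYSIS_DONE or PROCESSING.
oauth.request("POST", f"https://services.sentinel-hub.com/api/v2/batch/process/{task_id}/stop")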

Input features

BatchV2 API supports two ways of specifying the input features of your batch task:

  1. Pre-defined Tiling Grid
  2. User-defined GeoPackage

1. Tiling Grid

For more efficient processing, we divide the area of interest into tiles and process each tile separately. While the Process API processes data on the grid that comes with each data source, the Batch API uses one of the predefined tiling grids. The tiling grids 0-2 are based on the Sentinel-2 tiling in the WGS84/UTM projection with some adjustments:

  • The width and height of tiles in the original Sentinel-2 grid are 100 km, while the tile dimensions in our grids are given in the table below.
  • All redundant tiles (i.e. fully overlapped tiles) are removed.

All available tiling grids can be requested with the following call (NOTE: to run this example you first need to create an OAuth client as explained here):

# `oauth` is an authenticated OAuth2 session created with your OAuth client (see the note above)
url = "https://services.sentinel-hub.com/api/v2/batch/tilinggrids/"
response = oauth.request("GET", url)
response.json()

This will return the list of available grids together with the tile size and available resolutions for each grid. The currently available grids are:

name | id | tile size | resolutions | coverage | output CRS | download the grid [zip with shp file] **
UTM 20km grid | 0 | 20040 m | 10 m, 20 m, 30 m*, 60 m | World, latitudes from -80.7° to 80.7° | UTM | UTM 20km grid
UTM 10km grid | 1 | 10000 m | 10 m, 20 m | World, latitudes from -80.6° to 80.6° | UTM | UTM 10km grid
UTM 100km grid | 2 | 100080 m | 30 m*, 60 m, 120 m, 240 m, 360 m | World, latitudes from -81° to 81° | UTM | UTM 100km grid
WGS84 1 degree grid | 3 | 1° | 0.0001°, 0.0002° | World, all latitudes | WGS84 | WGS84 1 degree grid
LAEA 100km grid | 6 | 100000 m | 40 m, 50 m, 100 m | Europe, including Turkey, Iceland, Svalbard, Azores, and Canary Islands | EPSG:3035 | LAEA 100km grid
LAEA 20km grid | 7 | 20000 m | 10 m, 20 m | Europe, including Turkey, Iceland, Svalbard, Azores, and Canary Islands | EPSG:3035 | LAEA 20km grid

* The 30 m grid is only available on the us-west-2 deployment, where it provides an appropriate option for the Harmonized Landsat Sentinel collection. It is not recommended for "regular" Landsat collections, as their pixel placement is slightly shifted relative to the output grid and data interpolation would therefore occur.

** The geometries of the tiles are reprojected to WGS84 for download. Because of this and other reasons the geometries of the output rasters may differ from the tile geometries provided here.

To use the 20 km grid with 60 m resolution, for example, specify the id and resolution parameters of the tiling-grid input object when creating a new batch request (see an example of a full request) as:

{
  ...
  "input": {
    "type": "tiling-grid",
    "id": 0,
    "resolution": 60.0
  },
  ...
}

2. GeoPackage

In addition to the tiling grids, the BatchV2 API also supports user-defined features through GeoPackages. This allows you to specify features of any shape, as long as the underlying geometry is a POLYGON or MULTIPOLYGON in an EPSG-compliant CRS listed here. The GeoPackage can also have multiple layers, offering more flexibility for specifying features in multiple CRSs.

The GeoPackage must adhere to the GeoPackage spec and contain at least one feature table with any name. The table must include a column that holds the geometry data. This column can be named arbitrarily, but it must be listed as the geometry column in the gpkg_geometry_columns table. The table schema should include the following columns:

Column | Type | Example
id (primary key) | INTEGER (UNIQUE) | 1000
identifier | TEXT (UNIQUE) | FEATURE_NAME
geometry | POLYGON or MULTIPOLYGON | Feature geometry representation in GeoPackage WKB format
width | INTEGER | 1000
height | INTEGER | 1000
resolution | REAL | 0.005

Caveats

  • You must specify either both width and height, or alternatively, specify resolution. If both values are provided, width and height will be used, and resolution will be ignored.
  • The feature table must use a CRS that is EPSG compliant.
  • identifier values must not be null and must be unique across all feature tables.
  • There can be a maximum of 700,000 features in the GeoPackage.
  • The feature output width and height cannot exceed 3500 by 3500 pixels or the equivalent in resolution.
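As an illustration of the schema above, a GeoPackage with the required columns could be produced with geopandas as in the sketch below. This is a minimal sketch, not a production-ready recipe; the file name, layer name and geometries are arbitrary, and the primary key column is created automatically by the GPKG driver.

import geopandas as gpd
from shapely.geometry import box

# Two example features with unique identifiers and an explicit output size in pixels
# (alternatively, a "resolution" column could be provided instead of width/height).
features = gpd.GeoDataFrame(
    {
        "identifier": ["FEATURE_A", "FEATURE_B"],  # must be unique and non-null
        "width": [1000, 1000],
        "height": [1000, 1000],
        "geometry": [box(13.0, 45.0, 13.1, 45.1), box(13.1, 45.0, 13.2, 45.1)],
    },
    crs="EPSG:4326",  # must be an EPSG-compliant CRS
)
features.to_file("batch_features.gpkg", layer="features", driver="GPKG")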

Below you will find a list of example GeoPackages that serve as a showcase of how a GeoPackage file should be structured. Please note that these examples do not serve as production-ready GeoPackages and should only be used for testing purposes. If you'd like to use these tiling grids for processing, use the equivalent tiling grid with the tiling grid input instead.

name | id | output CRS | geopackage
UTM 20km grid | 0 | UTM | UTM 20km grid
UTM 10km grid | 1 | UTM | UTM 10km grid
UTM 100km grid | 2 | UTM | UTM 100km grid
WGS84 1 degree grid | 3 | WGS84 | WGS84 1 degree grid
LAEA 100km grid | 6 | EPSG:3035 | LAEA 100km grid
LAEA 20km grid | 7 | EPSG:3035 | LAEA 20km grid

An example of a batch task with GeoPackage input is available here.

Area of Interest and PUs

When using either Tiling Grid or GeoPackage as input, the features that end up being processed are determined by the processRequest.input.bounds parameter specified in the request, called Area of Interest or AOI.

The way the AOI parameter is used and its effect depend on the input type used:

  • Tiling grid: The AOI must be specified in the request. Only the tiles (features) that intersect with the AOI will be processed.
  • GeoPackage: The AOI may be omitted. If the AOI is omitted, all the features inside your GeoPackage will be processed. Conversely, if the AOI is specified, only the features that intersect with the AOI will be processed.

Please note that with both input types, a feature that is only partially covered by the AOI will still be processed in its entirety.

You are only charged PUs for the features that are processed. If a feature does not intersect with the AOI, it will not be charged for.
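For illustration, the AOI could be passed as a bounding box inside the processRequest, following the usual bounds object of the Process API (a sketch only; see the API reference for all supported bounds options):

{
  ...
  "processRequest": {
    "input": {
      "bounds": {
        "bbox": [12.44, 41.87, 12.52, 41.93],
        "properties": { "crs": "http://www.opengis.net/def/crs/EPSG/0/4326" }
      },
      ...
    },
    ...
  },
  ...
}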


Processing results

The outputs of a batch task will be stored in your object storage in either:

  1. GeoTIFF (and JSON for metadata) or
  2. Zarr format

1. GeoTIFF output format

The GeoTIFF format will be used if your request includes the output.type parameter set to raster, along with other relevant parameters specified in the BatchV2 API reference. An example of a batch task with GeoTIFF output is available here.

By default, the results will be organized in sub-folders, with one sub-folder created for each feature. Each sub-folder might contain one or more images, depending on how many outputs were defined in the evalscript of the request. [Figure: example of the resulting sub-folder structure]

You can also customize the sub-folder structure and file naming as described in the delivery parameter under output in BatchV2 API reference.

You can choose to return your GeoTIFF files as Cloud Optimized GeoTIFF (COG), by setting the cogOutput parameter under output in your request as true. Several advanced COG options can be selected as well - read about the parameter in BatchV2 API reference.
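Putting the above together, a GeoTIFF/COG output configuration might look like this (a sketch; only the parameters discussed above are shown, see the BatchV2 API reference for the full list):

{
  ...
  "output": {
    "type": "raster",
    "cogOutput": true,
    "delivery": {
      "s3": { "url": "s3://<your-bucket>/<folder>" }
    },
    ...
  },
  ...
}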

The output projection depends on the selected input, either tiling grid or GeoPackage:

  1. If the input is a tiling grid, the results of batch processing will be in the projection of the selected tiling grid. For UTM-based grids, each part of the AOI (area of interest) is delivered in the UTM zone with which it intersects. In other words, if your AOI intersects multiple UTM zones, the results will be delivered as tiles in different UTM zones (and thus different CRSs).
  2. If the input is a GeoPackage, the results will be in the same CRS as the input feature's CRS.

2. Zarr output format

The Zarr format will be used if your request includes the output.type parameter set to zarr, along with other relevant parameters specified in the BatchV2 API reference. An example of a batch request with Zarr output is available here. Your request may only have one band per output, and the application/json format is not supported in responses.

The outputs of batch processing will be stored as a single Zarr group containing one data array for each evalscript output and multiple coordinate arrays. The output will be stored in a subfolder named after the requestId that you pass to the API in the delivery.s3.url parameter under output.
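A corresponding Zarr output configuration might look like this (a sketch; Zarr-specific parameters are described in the BatchV2 API reference):

{
  ...
  "output": {
    "type": "zarr",
    "delivery": {
      "s3": { "url": "s3://<your-bucket>/<requestId>" }
    },
    ...
  },
  ...
}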


Ingesting results into BYOC

Purpose

Enables automatic ingestion of processing results into a BYOC collection, allowing you to:

  • Access data with Processing API, by using the collection ID
  • Create a configuration with custom layers
  • Make OGC requests to a configuration
  • View data in EO Browser

In order to enable this functionality, the user needs to specify either the id of an existing BYOC collection (collectionId) or set createCollection to true.

{
  ...
  "output": {
    ...
    "createCollection": true,
    "collectionId": "<byoc-collection-id>",
    ...
  },
  ...
}

If collectionId is provided, the existing collection will be used for data ingestion.

If createCollection is set to true and collectionId is not provided, a new BYOC collection will be created automatically and the collection bands will be set according to the request output responses definitions.

Regardless of whether the user specifies an existing collection or requests a new one, processed data will still be uploaded to the user's S3 bucket, where it will be available for download and analysis.

When creating a new batch collection, one has to be careful to:

  • Make sure that cogOutput=true and that the output format is image/tiff.
  • If an existing BYOC collection is used, make sure that identifier and sampleType from the output definition(s) match the name and the type of the BYOC band(s). Single band and multi-band outputs are supported.
  • If multi-band output is used in the request, the additionally generated bands will be named using a numerical suffix in ascending order (e.g. 2, ... 99). For example, if the output: { id: "result", bands: 3 } is used in the evalscript setup function, the produced BYOC bands will be named: result for band 1, result2 for band 2 and result3 for band 3. Make sure that no other output band has any of these automatically generated names, as this will throw an error during the analysis phase. The output: [{ id: "result", bands: 3 },{ id: "result2", bands: 1 }] will throw an exception.
  • Keep sampleType in mind, as the values the evalscript returns when creating a collection will be the values available when making a request to access it.

Mandatory bucket settings

Regardless of the credentials provided in the request, you still need to set a bucket policy to allow Sentinel Hub services to access the data. For detailed instructions on how to configure your bucket policy, please refer to the BYOC bucket settings documentation.

Using delivery buckets from other regions

A bucket from an arbitrary region can be used for data delivery. If the bucket region differs from the system region where the request is sent to, the bucket region also needs to be defined in the request:

{
  ...
  "output": {
    ...
    "delivery": {
      "s3": {
        "url": "s3://<your-bucket>/<requestId>",
        "region": "<bucket-region>",
        ...
      }
    },
    ...
  },
  ...
}

In this case an additional cost of 0.03 PU per MB of transferred data will be added to the total processing cost.


Feature Manifest

Purpose

  • Provides a detailed overview of features scheduled for processing during the PROCESSING step.
  • Enables users to verify feature information and corresponding output paths prior to processing.

Key Information

  • File Type: GeoPackage
  • File Name: featureManifest-<requestId>.gpkg
  • Location: Root folder of the specified output delivery path
  • Structure:
    • May contain multiple feature tables, one per distinct CRS used by the features.
    • Table names follow the format feature_<crs-id> (e.g. feature_4326).

During task analysis, the system uploads a file called featureManifest-<requestId>.gpkg to the user's bucket. This file is a GeoPackage containing basic information about the features that will be processed during the PROCESSING step. It is intended for checking the features that will be processed and their corresponding output paths.

If the output type is set to raster, the output paths will be the paths to the GeoTIFF files. If the output type is zarr, the output paths will just be the root of the output folder.

The database may contain multiple feature tables, one per distinct CRS used by the features. The tables will be named feature_<crs-id>, e.g. feature_4326.
The schema of feature tables inside the database is currently the following:

Name | Type | Description
fid | INTEGER | Auto-incrementing ID
outputId | TEXT | Output identifier defined in the processRequest
identifier | TEXT | ID of the feature
path | TEXT | The object storage path URI where the output of this feature will be uploaded to
width | INTEGER | Width of the feature in pixels
height | INTEGER | Height of the feature in pixels
geometry | GEOMETRY | Feature geometry representation in GeoPackage WKB format
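Once the manifest has been downloaded from your bucket, its feature tables can be inspected, for example with fiona and geopandas (a minimal sketch; the file name follows the convention described above and the layer names are examples of the feature_<crs-id> pattern):

import fiona
import geopandas as gpd

manifest = "featureManifest-<requestId>.gpkg"  # downloaded from the root output folder
for layer in fiona.listlayers(manifest):       # e.g. "feature_4326", "feature_32633"
    table = gpd.read_file(manifest, layer=layer)
    print(layer, len(table), "features")
    print(table[["identifier", "outputId", "path", "width", "height"]].head())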

Execution database

Purpose

The Execution Database serves as a monitoring tool for tracking the progress of feature execution within a specific task. It provides users with insight into the status of each feature being processed.

Key Information

  • File Type: SQLite
  • File Name: execution-<requestId>.sqlite
  • Location: Root folder of specified output delivery path
  • Structure:
    • Contains a single table called features.

You can monitor the execution of your features for a specific task by checking the SQLite database that is uploaded to your bucket. The database contains the name and status of each feature. The database is updated periodically during the execution of the task.

The database can be found in your bucket in the root output folder and is named execution-<requestId>.sqlite.

The schema of the features table is currently the following:

Name | Type | Description
id | INTEGER | Numerical ID of the feature
name | TEXT | Textual ID of the feature
status | TEXT | Status of the feature
error | TEXT | Error message in case processing has failed
delivered | BOOLEAN | True if the output was delivered to the delivery bucket, otherwise False

The status of the feature can be one of the following:

  • PENDING: The feature is waiting to be processed.
  • DONE: Feature was successfully processed.
    Caveat: If there was no data to process for this feature, it will still be marked with status DONE but with a 'No data' message in the error column.
  • FATAL: The feature has failed repeatedly and will not be retried. The error column details the issue.
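After downloading the database from your bucket, progress can be summarized with the standard sqlite3 module (a minimal sketch, using the table and column names described above):

import sqlite3

con = sqlite3.connect("execution-<requestId>.sqlite")  # downloaded from the root output folder

# Count features per status (PENDING, DONE, FATAL).
for status, count in con.execute(
    "SELECT status, COUNT(*) FROM features GROUP BY status"
):
    print(status, count)

# List features that failed permanently, together with their error messages.
for name, error in con.execute(
    "SELECT name, error FROM features WHERE status = 'FATAL'"
):
    print(name, error)

con.close()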

AWS bucket access

The BatchV2 API requires access to your AWS bucket in order to read inputs from it (e.g. a GeoPackage) and write outputs to it (the request JSON, the feature manifest, the execution database and the processing results).

The IAM user or IAM role (depending on which of the access methods described below is used) must have permissions to read and/or write to the corresponding S3 bucket.

There are two ways of granting access to your bucket:

  1. AWS IAM Assume Role Workflow
  2. AWS Access Key & Secret Key Workflow

AWS IAM Assume Role Workflow

In order to let Sentinel Hub access the bucket, you can provide the ARN of an IAM role that has access to the bucket. This method is recommended as it is more secure and allows for more fine-grained control over the access permissions.

You can do this by creating a new IAM role in your AWS account with the necessary permissions to access your bucket and adding our IAM user as a trusted entity that can perform the sts:AssumeRole action.

Step-by-step guide on how to set up your IAM role & policies:

  1. Create an IAM Policy for limited access to your bucket
  • Firstly, we'll create a policy that grants access to your bucket. This policy will later be attached to the IAM role.
  • Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.
  • In the navigation pane, choose Policies and then choose Create policy.
  • Open the JSON tab.
  • Enter a policy that grants GetObject, PutObject and ListBucket permissions to your bucket. Here's an example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>",
        "arn:aws:s3:::<your-bucket-name>/*"
      ]
    }
  ]
}
  • Replace <your-bucket-name> with the name of your S3 bucket & click Next.
  • On the Review and create page, enter a Policy name and optionally fill in a Description and tags for the policy, and then click Create policy.
  2. Create an IAM Role
  • In the navigation pane, choose Roles and then choose Create role.
  • Choose AWS account for the trusted entity type and then choose Another AWS account for the role type.
  • For Account ID, enter 614251495211 (this is the AWS account ID for Sentinel Hub).
  • Leave the Require external ID and Require MFA boxes unchecked. We will come back to fine-tuning the trust relationship later.
  • Click Next.
  • In the Permissions policies page, select the policy you just created & click Next.
  • On the review page, enter a Role name and optionally fill in a Description and tags for the role, and then click Create role.
  3. Adjust the Trust Relationship
  • If you wish to further limit access to the role, you can modify the trust relationship. If not, you can skip this step.
  • After the role is created, it will appear in the list of roles in the IAM console.
  • Choose the role that you just created.
  • Navigate to the Trust relationships tab and then select Edit trust policy.
  • For an extra layer of security, you can specify the sts:ExternalId parameter. If you choose to use this, set its value to your Sentinel Hub domain account ID, which can be found in the User settings page in the Dashboard.
  • If your IAM role is shared among several principals and you want to distinguish their activities, you can set the sts:RoleSessionName in the trust policy of each principal. For the Sentinel Hub principal, set its value to sentinelhub.
  • Here's an example of how the JSON might look:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-SH-domain-account-id>"
        },
        "StringLike": {
          "sts:RoleSessionName": "sentinelhub"
        }
      }
    }
  ]
}
  • Replace <your-SH-domain-account-id> with your Sentinel Hub domain account ID.
  • Click Update policy.

Now, you can use the ARN of this IAM role in your Sentinel Hub BatchV2 API requests by simply providing the iamRoleARN alongside the URL of your bucket object:

s3 = {
    "url": "s3://<your-bucket>/<path>",
    "iamRoleARN": "<your-IAM-role-ARN>",
}

AWS Access Key & Secret Key Workflow

The other option is to provide an accessKey and secretAccessKey pair in your request.

s3 = {
    "url": "s3://<your-bucket>/<path>",
    "accessKey": "<your-bucket-access-key>",
    "secretAccessKey": "<your-bucket-secret-access-key>"
}

The access key and secret must be linked to an IAM user that has permissions to read and/or write to the corresponding S3 bucket.

To learn how to configure an access key and secret access key on AWS S3, check this link, specifically the Programmatic access section.


Examples

Example of Batch Processing Workflow