Batch Processing API
The Batch Processing API is only available to enterprise users. If you don't have an enterprise account and would like to try it out, contact us for a custom offer. It is currently supported only on the EU-Central-1 (Frankfurt) deployment.
The Batch Processing API (or "batch API" for short) enables you to request data for large areas and/or longer time periods.
It is an asynchronous REST service. This means that data will not be returned immediately in the request response but will instead be delivered to your object storage, which needs to be specified in the request (e.g. an S3 bucket, see AWS S3 bucket settings below). The processing results are divided into tiles as described below.
To learn more about how batch processing can be used to create huge mosaics or to enhance your algorithms, read the following blog posts:
- How to create your own Cloudless Mosaic in less than an hour, November 3, 2020
- Scale-up your eo-learn workflow using Batch Processing API, September 17, 2020
- Large-scale data preparation — introducing Batch Processing, January 7, 2020
Workflow
The batch processing API comes with a set of REST APIs which support the execution of various workflows. The diagram below shows all possible statuses of a batch processing request (CREATED, ANALYSING, ANALYSIS_DONE, PROCESSING, DONE, FAILED) and the user's actions (ANALYSE, START, CANCEL) which trigger transitions among them.
The workflow starts when a user posts a new batch processing request (a minimal sketch of this step follows the list below). In this step the system:
- creates a new batch processing request with status CREATED,
- validates the user's inputs, and
- returns an estimated number of output tiles that will be processed.
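A minimal sketch of this step in Python is shown below. It assumes an OAuth client created as described in the authentication documentation; the payload field names (processRequest, tilingGrid, bucketName), the data type identifier, and the token URL are based on our reading of the Batch API reference and should be verified there, while the bounds, time range, and evalscript are only placeholders.

```python
from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session

# Authenticate with the OAuth client created in the Sentinel Hub dashboard.
client_id = "<your client id>"
client_secret = "<your client secret>"

client = BackendApplicationClient(client_id=client_id)
oauth = OAuth2Session(client=client)
oauth.fetch_token(
    token_url="https://services.sentinel-hub.com/oauth/token",
    client_id=client_id,
    client_secret=client_secret,
)

# A simple true-color evalscript used as the processing payload.
evalscript = """
//VERSION=3
function setup() {
  return { input: ["B04", "B03", "B02"], output: { bands: 3 } };
}
function evaluatePixel(sample) {
  return [sample.B04, sample.B03, sample.B02];
}
"""

# Post a new batch processing request; the response contains the request id,
# its status (CREATED), and the estimated number of output tiles.
url = "https://services.sentinel-hub.com/api/v1/batch/process/"
payload = {
    "processRequest": {
        "input": {
            "bounds": {
                "bbox": [13.0, 45.5, 14.0, 46.5],
                "properties": {"crs": "http://www.opengis.net/def/crs/EPSG/0/4326"},
            },
            "data": [
                {
                    "type": "sentinel-2-l2a",
                    "dataFilter": {
                        "timeRange": {
                            "from": "2020-06-01T00:00:00Z",
                            "to": "2020-06-30T23:59:59Z",
                        }
                    },
                }
            ],
        },
        "output": {
            "responses": [{"identifier": "default", "format": {"type": "image/tiff"}}]
        },
        "evalscript": evalscript,
    },
    "tilingGrid": {"id": 1, "resolution": 10.0},
    "bucketName": "<bucket_name>",  # results are delivered to this S3 bucket
    "description": "example batch request",
}

batch_request = oauth.request("POST", url, json=payload).json()
request_id = batch_request["id"]
print(batch_request["status"], batch_request.get("tileCount"))
```

The returned JSON is reused in the following sketches via `oauth` and `request_id`.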
The user can then decide to either request an additional analysis of the request, start the processing, or cancel the request. When additional analysis is requested:
- the status of the request changes to ANALYSING,
- the evalscript is validated,
- a list of required tiles is created, and
- the request's cost is estimated (i.e. the estimated number of processing units (PU) needed for the requested processing).
After the analysis is finished, the status of the request changes to ANALYSIS_DONE.
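Continuing the sketch above (so `oauth` and `request_id` are assumed to exist), the analysis can be triggered and awaited roughly like this; the /analyse action path follows the Batch API reference:

```python
import time

# Trigger the optional analysis of the batch request.
status_url = f"https://services.sentinel-hub.com/api/v1/batch/process/{request_id}"
oauth.request("POST", f"{status_url}/analyse")

# Poll the request until the analysis has finished (or failed).
while True:
    status = oauth.request("GET", status_url).json()["status"]
    if status in ("ANALYSIS_DONE", "FAILED"):
        break
    time.sleep(10)
print(status)
```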
If the user chooses to start the processing directly, the system still executes the analysis, but once the analysis is done it automatically continues with the processing. This is not shown explicitly in the diagram in order to keep it simple.
The user can now request a list of tiles for their request, start the processing, or cancel the request. When the user starts the processing (see the sketch after this list):
- the estimated number of PUs is reserved,
- the status of the request changes to PROCESSING (this may take a while), and
- the processing starts.
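Starting the processing is a single action call, sketched below under the same assumptions (`oauth` and `request_id` from the earlier examples; the /start path follows the Batch API reference):

```python
# Start the processing; the estimated number of PUs is reserved at this point.
oauth.request(
    "POST",
    f"https://services.sentinel-hub.com/api/v1/batch/process/{request_id}/start",
)
```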
When the processing finishes, the status of the request changes to:
- FAILED when all tiles failed processing,
- PARTIAL when some tiles were processed and some failed,
- DONE when all tiles were processed.
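A simple way to wait for one of these terminal statuses is to poll the request, for example (again reusing `oauth` and `request_id` from above):

```python
import time

# Poll the batch request until it reaches a terminal status.
status_url = f"https://services.sentinel-hub.com/api/v1/batch/process/{request_id}"
while True:
    status = oauth.request("GET", status_url).json()["status"]
    if status in ("DONE", "PARTIAL", "FAILED"):
        break
    time.sleep(30)
print(f"Batch request finished with status {status}")
```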
Although the process has built-in fault tolerance, tile processing may occasionally fail. Such FAILED tiles can be re-processed by requesting per-tile reprocessing and starting the processing of that request again.
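As a rough sketch, failed tiles could be inspected and re-queued as follows; note that the status query parameter on the tile listing and the restartpartial action are assumptions based on our reading of the Batch API reference and should be verified there:

```python
# Inspect which tiles of the request failed.
base = f"https://services.sentinel-hub.com/api/v1/batch/process/{request_id}"
failed_tiles = oauth.request("GET", f"{base}/tiles", params={"status": "FAILED"}).json()
print(failed_tiles)

# Re-queue only the failed tiles and run the request again.
oauth.request("POST", f"{base}/restartpartial")
```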
Canceling the request
The user may cancel the request at any time. However:
- if the status is ANALYSING, the analysis will complete,
- if the status is PROCESSING, all tiles that have already been processed or are being processed at that moment are charged for; the remaining PUs are returned to the user.
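Cancelling is again a single action call, for example (the /cancel path follows the Batch API reference; `oauth` and `request_id` as above):

```python
# Cancel the batch request; already processed tiles remain charged,
# while the remaining reserved PUs are returned.
oauth.request(
    "POST",
    f"https://services.sentinel-hub.com/api/v1/batch/process/{request_id}/cancel",
)
```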
Automatic deletion of stale data
Stale requests will be deleted after some time. Specifically, the following requests will be deleted:
- failed requests (request status FAILED),
- requests that were created but never started (request statuses CREATED and ANALYSIS_DONE),
- successful requests (request statuses DONE and PARTIAL) for which it was not requested to add the results to your collections (new feature, coming soon).
Note that only the requests themselves will be deleted; the requests' results (the created imagery) will remain under your control in your S3 bucket.
Tiling grids
For more effective processing, we divide the area of interest into tiles and process each tile separately. While the process API uses the grids which come with each data source, the batch API uses one of the predefined tiling grids. The predefined tiling grids are based on the Sentinel-2 tiling in the UTM/WGS84 projection, with some adjustments:
- The width and height of tiles in the original Sentinel-2 grid is 100 km, while the width and height of tiles in our grids are given in the table below.
- All redundant tiles (i.e. fully overlapped tiles) are removed.
All available tiling grids can be requested with (NOTE: To run this example you need to first create an OAuth client as is explained here):
url = "https://services.sentinel-hub.com/api/v1/batch/tilinggrids/"response = oauth.request("GET", url)response.json()
This will return the list of available grids and information about tile size and available resolutions for each grid. Currently available grids are:
name | id | tile width | tile height | resolutions | download the grid [zip with shp file] |
---|---|---|---|---|---|
20km grid | 0 | 20040 m | 20040 m | 10 m, 20 m, 60 m | 20km grid |
10km grid | 1 | 10000 m | 10000 m | 10 m, 20 m | 10km grid |
100km grid | 2 | 100080 m | 100080 m | 60 m, 120 m, 240 m, 360 m | 100km grid |
WGS84 1 degree grid | 3 | 1° | 1° | 0.0001°, 0.0002° | WGS84 1 degree grid |
To use the 20km grid (id 0) with a 60 m resolution, for example, specify the id and resolution parameters of the tilingGrid object when creating a new batch request (see an example of a full request) as:
{..."tilingGrid": {"id": 0,"resolution": 60.0},...}
Contact us if you would like to use any other grid for processing.
Processing results
The outputs of a batch processing request will be stored in your object storage. By default, the results are organized in sub-folders, with one sub-folder created for each tile. Each sub-folder might contain one or more images, depending on how many outputs were defined in the evalscript of the request.
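For example, a request with a single default output might produce a layout roughly like the one below; this is only an illustration, as the actual sub-folder names correspond to the tiles of the chosen tiling grid and the file names to the outputs of your evalscript:

```
s3://<bucket_name>/
├── <tile_1>/
│   └── default.tif
├── <tile_2>/
│   └── default.tif
└── ...
```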
You can also customize the sub-folder structure as described under the output parameter in the Batch API reference.
Currently supported output image formats are png, jpeg, and GeoTIFF.
The results of batch processing will be in the UTM/WGS84 projection. Each part of the area of interest (AOI) is delivered in the UTM zone with which it intersects. In other words, if your AOI intersects more than one UTM zone, the results will be delivered as tiles in different UTM zones (and thus different CRSs).
AWS S3 bucket settings
Bucket region
The bucket to which the results will be delivered needs to be in the eu-central-1 (Frankfurt) region.
Bucket settings
Sentinel Hub needs full access to the bucket. To grant this, update your bucket policy to include the following statement (don't forget to replace <bucket_name> with your actual bucket name):
{"Version": "2012-10-17","Statement": [{"Sid": "Sentinel Hub permissions","Effect": "Allow","Principal": {"AWS": "arn:aws:iam::614251495211:root"},"Action": ["s3:*"],"Resource": ["arn:aws:s3:::<bucket_name>","arn:aws:s3:::<bucket_name>/*"]}]}