Batch Statistical API

The Batch Statistical API is in beta. It may misbehave with large requests, and the interface may still change, although no major changes are expected. If you have suggestions for improvements or any other feedback, please share them on our forum.
The Batch Statistical API is only available to enterprise users. If you don't have an enterprise account and would like to try it out, contact us for a custom offer.

The Batch Statistical API (or "Batch Stats API" for short) lets you request statistics in the same way as the Statistical API, but for multiple polygons at once and/or for longer aggregations. A typical use case is calculating statistics for all parcels in a country.

Like the Batch Processing API, this is an asynchronous REST service. The data is not returned in the response to the request; instead, it is delivered to your object storage, which must be specified in the request (e.g. an S3 bucket, see AWS bucket access below).

You can find more details about the API in the API Reference or in the examples of the workflow.

Workflow

The Batch Statistical API workflow resembles the Batch Processing API workflow in many ways. The available actions and statuses are:

  • user actions: ANALYSE, START and STOP,
  • request statuses: CREATED, ANALYSING, ANALYSIS_DONE, STOPPED, PROCESSING, DONE and FAILED.

The Batch Statistical API comes with a set of REST actions that support the individual steps of the workflow. The diagram below shows all possible statuses of a batch statistical request and the user actions that trigger transitions between them.

[Diagram: status transitions of a batch statistical request. Statuses: CREATED, ANALYSING, ANALYSIS_DONE, PROCESSING, STOPPED, DONE, FAILED. User actions on the transitions: START/ANALYSE, START, STOP.]

The workflow starts when a user posts a new batch statistical request. In this step the system:

  • creates a new batch statistical request with status CREATED,
  • validates the user's input (not the evalscript),
  • returns the overview of the created request.

The user can then decide to either request an additional analysis of the request or start the processing. When an additional analysis is requested:

  • the status of the request changes to ANALYSING,
  • the evalscript is validated,
  • after the analysis is finished, the status of the request changes to ANALYSIS_DONE.

If the user chooses to start processing directly, the system still runs the analysis first and automatically starts processing once the analysis is done. This is not shown explicitly in the diagram, to keep it simple.

When the user starts the processing:

  • the status of the request changes to PROCESSING (this may take a while),
  • the processing starts,
  • spent processing units are billed periodically.

When the processing finishes, the status of the request changes to DONE.
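The workflow described above can be summarised as a small lookup table. The following is only a sketch in Python, not part of the API: the names `ALLOWED_ACTIONS` and `can_perform` are hypothetical, the table is derived from the text on this page, and the STOPPED to START entry is an assumption inferred from the restart rules.

```python
# Allowed user actions per request status, as read from the workflow text.
# This is an illustrative table, not something the service exposes.
ALLOWED_ACTIONS = {
    "CREATED": {"ANALYSE", "START"},
    "ANALYSING": {"STOP"},
    "ANALYSIS_DONE": {"START", "STOP"},
    "PROCESSING": {"STOP"},
    "STOPPED": {"START"},  # assumption: a stopped request can be restarted
    "DONE": set(),
    "FAILED": set(),
}


def can_perform(status: str, action: str) -> bool:
    """Return True if the user action is allowed in the given status."""
    return action in ALLOWED_ACTIONS.get(status, set())


print(can_perform("CREATED", "ANALYSE"))
print(can_perform("DONE", "START"))
```

A client could run such a check before calling an action, but the service's own response remains the source of truth.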

Stopping the request

A request might be stopped for the following reasons:

  • it is requested by the user (user action),
  • the user is out of processing units (see Processing unit costs below),
  • something is wrong with the processing of the request.

A user may stop the request in the following states: ANALYSING, ANALYSIS_DONE and PROCESSING. However:

  • if the status is ANALYSING, the analysis will complete,
  • if the status is PROCESSING, all features (polygons) that have been processed or are being processed at that moment are charged for,
  • the user is not allowed to restart the request for the next 30 minutes.

The service itself may also stop the request when the processing of many features fails repeatedly. The stoppedStatusReason of such requests will be UNHEALTHY. This can happen if the service is unstable or if something is wrong with the request. In the former case, the request should eventually be restarted by the Sentinel Hub team.

Processing unit costs

To create, analyse or start a request, the user must have at least 1000 processing units available in their account. If the user's available processing units drop below 1000 while a request is being processed, the request is automatically stopped and cannot be restarted for the next 60 minutes. It is therefore highly recommended to start a request with a sufficient reserve.
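The thresholds on this page (1000 processing units, a 30-minute wait after a user stop, a 60-minute wait after running out of units) can be encoded on the client side. The sketch below is illustrative only: the function and constant names are hypothetical, and the API itself enforces these rules regardless of any local check.

```python
from datetime import datetime, timedelta, timezone

MIN_PROCESSING_UNITS = 1000                 # minimum balance stated on this page
USER_STOP_COOLDOWN = timedelta(minutes=30)  # after a user-requested STOP
OUT_OF_PU_COOLDOWN = timedelta(minutes=60)  # after running out of processing units


def has_enough_units(available_units: float) -> bool:
    """Return True if the balance allows creating, analysing or starting a request."""
    return available_units >= MIN_PROCESSING_UNITS


def earliest_restart(stopped_at: datetime, out_of_units: bool) -> datetime:
    """Return the earliest time at which a stopped request may be restarted."""
    cooldown = OUT_OF_PU_COOLDOWN if out_of_units else USER_STOP_COOLDOWN
    return stopped_at + cooldown


stopped = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(earliest_restart(stopped, out_of_units=True))
```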

More information about batch statistical costs is available here.

Automatic deletion of stale data

Stale (inactive) requests will be deleted after a certain period of inactivity, depending on their status:

  • requests with status CREATED are deleted after 7 days of inactivity,
  • requests with status FAILED are deleted after 15 days of inactivity,
  • all other requests are deleted after 30 days of inactivity.

Note that only the requests themselves are deleted; their results (the created statistics) remain under your control in your S3 bucket.
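As a rough illustration of the retention periods listed above, the following Python sketch computes when a request would become eligible for deletion. The names are hypothetical, and how the service measures "inactivity" is not specified here, so the last-activity timestamp is an assumed input.

```python
from datetime import datetime, timedelta, timezone

# Retention periods in days, per status, as listed on this page.
RETENTION_DAYS = {"CREATED": 7, "FAILED": 15}
DEFAULT_RETENTION_DAYS = 30  # all other statuses


def deletion_time(status: str, last_activity: datetime) -> datetime:
    """Return the time after which a stale request may be deleted."""
    days = RETENTION_DAYS.get(status, DEFAULT_RETENTION_DAYS)
    return last_activity + timedelta(days=days)


last_seen = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(deletion_time("CREATED", last_seen))
```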

Input polygons as GeoPackage file

The Batch Statistical API accepts a GeoPackage file containing features (polygons) as an input. The GeoPackage must be stored in your object storage (e.g. AWS S3 bucket) and Sentinel Hub must be able to read from the storage (find more details about this in the bucket access section below). In a batch statistical request, the input GeoPackage is specified by setting the path to the .gpkg file in the input.features.s3 parameter.

All features (polygons) in an input GeoPackage must be in the same CRS, and that CRS must be supported by Sentinel Hub. An example of a GeoPackage file can be downloaded here.
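Because a GeoPackage is an SQLite database, the single-CRS requirement can be checked locally before uploading. The sketch below uses only Python's standard `sqlite3` module and reads the `gpkg_geometry_columns` table and its `srs_id` column, which come from the GeoPackage standard rather than from this page. The function names are hypothetical, and the check does not tell you whether the CRS is supported by Sentinel Hub.

```python
import sqlite3


def declared_srs_ids(conn: sqlite3.Connection) -> set:
    """Return the distinct srs_id values declared for all feature tables."""
    rows = conn.execute(
        "SELECT DISTINCT srs_id FROM gpkg_geometry_columns"
    ).fetchall()
    return {row[0] for row in rows}


def uses_single_crs(conn: sqlite3.Connection) -> bool:
    """Return True if every feature table declares the same CRS."""
    return len(declared_srs_ids(conn)) == 1


# Usage with a local file (the file name is a placeholder):
#   conn = sqlite3.connect("parcels.gpkg")
#   print(declared_srs_ids(conn), uses_single_crs(conn))
#   conn.close()
```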

Evalscript and Batch Statistical API

The specifics described for evalscripts and the Statistical API also apply to the Batch Statistical API.

Evalscripts smaller than 32KB can be provided directly in a batch statistical request under the evalscript parameter. If your evalscript exceeds this limit, you can store it in your S3 bucket and provide a reference to it in the batch statistical request under the evalscriptReference parameter.
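The choice between the two parameters can be sketched as follows. This is a minimal Python illustration with a hypothetical helper name. It assumes that "32KB" means 32 * 1024 bytes of UTF-8 encoded text, and that the script has already been uploaded to the location given in the S3 reference.

```python
# Assumption: the limit is 32 * 1024 bytes, measured on the UTF-8 encoded script.
MAX_INLINE_BYTES = 32 * 1024


def evalscript_fields(script: str, s3_reference: dict) -> dict:
    """Return the request fields for an evalscript, inline or by reference.

    s3_reference is the S3 access object (url plus credentials) pointing to
    a copy of the script that was already uploaded to your bucket.
    """
    if len(script.encode("utf-8")) < MAX_INLINE_BYTES:
        return {"evalscript": script}
    return {"evalscriptReference": {"s3": s3_reference}}


reference = {"url": "s3://<your-bucket>/<path>"}
print(evalscript_fields("//VERSION=3", reference))
```

The returned dictionary can be merged into the rest of the request body.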

Processing results

Outputs of a Batch Statistical API request are JSON files stored in your object storage. Each .json file contains the requested statistics for one feature (polygon) in the provided GeoPackage. You can match the statistics in a JSON file to the corresponding feature (polygon) in the GeoPackage in either of two ways:

  • the id of a feature in the GeoPackage is used as the name of the JSON file (e.g. 1.json, 2.json) and is available in the JSON file as the id property, or
  • a custom column identifier of type string can be added to the GeoPackage, and its value will be available in the JSON file as the identifier property.

The outputs will be stored in the bucket and folder specified by the output.s3.path parameter of the batch statistical request. The outputs will be available in a sub-folder named after the ID of your request (e.g. s3://<my-bucket>/<my-folder>/db7de265-dfd4-4dc0-bc82-74866078a5ce).
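The naming rules above can be turned into two small helpers: one that builds the expected location of a result file, and one that reads the matching key back from a result. This is a sketch with hypothetical function names. The sample JSON strings in the usage lines are made up for illustration, and only the id and identifier properties are taken from this page.

```python
import json


def result_location(output_path: str, request_id: str, feature_id) -> str:
    """Build the expected location of the result file for one feature."""
    return f"{output_path.rstrip('/')}/{request_id}/{feature_id}.json"


def feature_key(result_text: str):
    """Return the custom identifier if present, otherwise the feature id."""
    data = json.loads(result_text)
    return data.get("identifier", data.get("id"))


print(result_location("s3://<my-bucket>/<my-folder>",
                      "db7de265-dfd4-4dc0-bc82-74866078a5ce", 1))
print(feature_key('{"id": 1, "identifier": "parcel-17"}'))
```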

Batch Statistical deployment

Batch deployment | Batch URL end-point
AWS EU (Frankfurt) | https://services.sentinel-hub.com/api/v1/statistics/batch

AWS bucket access

As noted above, the Batch Statistical API uses AWS S3 to:

  • read the GeoPackage file with input features (polygons) from an S3 bucket,
  • read the evalscript from an S3 bucket (this is optional because an evalscript can also be provided directly in a request),
  • write the results of processing to an S3 bucket.

You can use a single bucket for all three purposes or a different bucket for each.

Bucket regions

The buckets to which the results of batch statistical processing are written must be in the same region as the Batch Statistical API deployment. The only available region at the moment is eu-central-1 (Frankfurt).

Bucket policy

The IAM user or IAM role (depending on which of the access methods described below is used) must have permissions to read from and/or write to the corresponding S3 bucket.

The bucket from which the evalscript and/or GeoPackage file are read must have the S3 GetObject permission listed in its policy. The bucket where the results are stored must additionally have the PutObject and DeleteObject permissions in its policy.

Example of a valid bucket policy for the IAM user and a bucket that's used for both reading input data and writing results:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Batch statistical permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<iam-user-arn>"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}
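If you manage several buckets, the policy above can be generated rather than written by hand. The following Python sketch is one possible way to do so. The function name and the `read_only` flag are hypothetical, and the flag simply drops the write permissions for a bucket that only holds the evalscript or the GeoPackage.

```python
import json


def bucket_policy(bucket_name: str, principal_arn: str, read_only: bool = False) -> dict:
    """Build a bucket policy document for batch statistical access."""
    actions = ["s3:GetObject"]
    if not read_only:
        # Buckets that receive results also need write and delete permissions.
        actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Batch statistical permissions",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": actions,
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


print(json.dumps(bucket_policy("<bucket-name>", "<iam-user-arn>"), indent=2))
```

The printed JSON can be pasted into the bucket's policy editor after replacing the placeholders.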

Access your bucket using assume IAM role flow

To let Sentinel Hub access the bucket, the recommended option is to provide the ARN of an IAM role that has access to the bucket:

s3 = {
  "url": "s3://<your-bucket>/<path>",
  "iamRoleARN": "<your-IAM-role-ARN>"
}

When creating the IAM role, you can add an additional layer of security by specifying the externalId parameter. If you do, make sure you use your SH domain account ID as its value; you can find it on the User settings page in the Dashboard.

If your IAM role is shared among several principals and you want to distinguish between their activities by setting roleSessionName in the trust policy of each principal, set its value for the SH principal to sentinelhub.

Example of a trust policy for an IAM role specifying permissions for the SH user:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::614251495211:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-SH-domain-account-id>"
        },
        "StringLike": {
          "sts:RoleSessionName": "sentinelhub"
        }
      }
    }
  ]
}
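Since both conditions in the trust policy are optional, a small builder can make that explicit. The Python sketch below uses a hypothetical function name, takes the principal ARN from the example above, and adds each condition only when a value is supplied.

```python
import json

# Principal taken from the trust policy example on this page.
SH_PRINCIPAL_ARN = "arn:aws:iam::614251495211:root"


def trust_policy(external_id=None, session_name=None) -> dict:
    """Build a trust policy that lets the SH principal assume the role."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": SH_PRINCIPAL_ARN},
        "Action": "sts:AssumeRole",
    }
    condition = {}
    if external_id is not None:
        # Optional extra check: your SH domain account ID.
        condition["StringEquals"] = {"sts:ExternalId": external_id}
    if session_name is not None:
        # Optional: "sentinelhub" when the role is shared among principals.
        condition["StringLike"] = {"sts:RoleSessionName": session_name}
    if condition:
        statement["Condition"] = condition
    return {"Version": "2012-10-17", "Statement": [statement]}


print(json.dumps(trust_policy("<your-SH-domain-account-id>", "sentinelhub"), indent=2))
```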

To give the IAM role access to the bucket, you can either add a permission policy to the IAM role or set a bucket policy similar to the one above, using your IAM role as the principal.

To learn how to create a role for IAM user, check this link.

Access your bucket using accessKey and secretAccessKey

The other option is to provide an accessKey and secretAccessKey pair in your batch statistical request:

Example:

s3 = {
  "url": "s3://<your-bucket>/<path>",
  "accessKey": "<your-bucket-access-key>",
  "secretAccessKey": "<your-bucket-access-key-secret>"
}

The access key and secret must be linked to an IAM user that has permissions to read from and/or write to the corresponding S3 bucket.

The above JSON for accessing the S3 bucket can be used in:

  • input.features.s3 to specify the bucket where the GeoPackage file is available,
  • (optional) evalscriptReference.s3 to specify the bucket where the evalscript .js file is available,
  • output.s3 to specify the bucket where the results will be stored.
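To show where the three S3 objects sit in a request, the Python sketch below assembles only those parts of the body and wraps them in a POST request object without sending it. The function names are hypothetical, and the bearer-token Authorization header is an assumption about authentication that this section does not state. A real request needs further fields that this section does not cover; see the API Reference for the full schema.

```python
import json
import urllib.request

BATCH_STATS_URL = "https://services.sentinel-hub.com/api/v1/statistics/batch"


def s3_block(url: str, credentials: dict) -> dict:
    """Combine an S3 URL with either iamRoleARN or accessKey/secretAccessKey."""
    return {"url": url, **credentials}


def build_s3_sections(features_url, output_url, credentials, evalscript_url=None) -> dict:
    """Build only the S3-related parts of a batch statistical request body."""
    body = {
        "input": {"features": {"s3": s3_block(features_url, credentials)}},
        "output": {"s3": s3_block(output_url, credentials)},
    }
    if evalscript_url is not None:
        body["evalscriptReference"] = {"s3": s3_block(evalscript_url, credentials)}
    return body


def build_post(body: dict, token: str) -> urllib.request.Request:
    """Create (but do not send) the POST request. Bearer auth is an assumption."""
    return urllib.request.Request(
        BATCH_STATS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


creds = {"iamRoleARN": "<your-IAM-role-ARN>"}
body = build_s3_sections("s3://<your-bucket>/<path>", "s3://<your-bucket>/<path>", creds)
print(json.dumps(body, indent=2))
```

Passing the credentials as a dictionary keeps the helper independent of which of the two access methods you use.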

To learn how to configure an access key and access key secret on AWS S3, check this link, specifically, under the Programmatic access section.

Examples

Example of a Batch Statistical Workflow