Data Prepper's s3 source: visibility_duplication_protection (#5665)

* Adds documentation for Data Prepper's s3 source visibility_duplication_protection configuration. Includes the necessary permissions.

Signed-off-by: David Venable <dlv@amazon.com>

* Update _data-prepper/pipelines/configuration/sources/s3.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _data-prepper/pipelines/configuration/sources/s3.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
David Venable 2023-11-27 09:26:54 -08:00 committed by GitHub
parent cb8623a73a
commit 97d14ccfc2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 5 additions and 0 deletions

@@ -32,6 +32,7 @@ In order to use the `s3` source, configure your AWS Identity and Access Manageme
"Sid": "sqs-access",
"Effect": "Allow",
"Action": [
"sqs:ChangeMessageVisibility",
"sqs:DeleteMessage",
"sqs:ReceiveMessage"
],
@@ -49,6 +50,8 @@ In order to use the `s3` source, configure your AWS Identity and Access Manageme
If your S3 objects or Amazon SQS queues do not use [AWS Key Management Service (AWS KMS)](https://aws.amazon.com/kms/), remove the `kms:Decrypt` permission.
If you do not enable `visibility_duplication_protection`, you can remove the `sqs:ChangeMessageVisibility` permission from the SQS queue's access.
## Cross-account S3 access<a name="s3_bucket_ownership"></a>
When Data Prepper fetches data from an S3 bucket, it verifies the ownership of the bucket using the
@@ -109,6 +112,8 @@ Option | Required | Type | Description
`visibility_timeout` | No | Duration | The visibility timeout to apply to messages read from the Amazon SQS queue. This should be set to the amount of time that Data Prepper may take to read all the S3 objects in a batch. Default is `30s`.
`wait_time` | No | Duration | The amount of time to wait for long polling on the Amazon SQS API. Default is `20s`.
`poll_delay` | No | Duration | A delay placed between the reading and processing of a batch of Amazon SQS messages and making a subsequent request. Default is `0s`.
`visibility_duplication_protection` | No | Boolean | If set to `true`, Data Prepper attempts to avoid duplicate processing by extending the visibility timeout of SQS messages. Until the data reaches the sink, Data Prepper will regularly call `ChangeMessageVisibility` to avoid reading the S3 object again. To use this feature, you need to grant permissions to `ChangeMessageVisibility` on the IAM role. Default is `false`.
`visibility_duplicate_protection_timeout` | No | Duration | The maximum total length of time during which Data Prepper keeps extending a message's visibility timeout when `visibility_duplication_protection` is enabled. Default is two hours.
## aws
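The interaction between `visibility_timeout`, `visibility_duplication_protection`, and `visibility_duplicate_protection_timeout` can be illustrated with a small simulation. This is a conceptual sketch only, not Data Prepper's (Java) implementation. It assumes that each `ChangeMessageVisibility` call happens exactly when the current visibility window expires and extends the window by one `visibility_timeout`, never past the protection timeout. The function name `simulate_visibility_protection` is hypothetical.

```python
from typing import List, Tuple


def simulate_visibility_protection(
    visibility_timeout_s: int,
    max_protection_s: int,
    processing_time_s: int,
    enabled: bool = True,
) -> Tuple[List[int], bool]:
    """Model one SQS message whose S3 object takes processing_time_s to reach the sink.

    Hypothetical helper. Returns (extension_times, redelivered):
      extension_times: seconds after receipt at which a ChangeMessageVisibility
                       call is assumed to be made
      redelivered:     True if the message becomes visible on the queue again
                       before processing finishes (a possible duplicate read)
    """
    if not enabled:
        # Without protection, the message reappears once visibility_timeout expires.
        return [], processing_time_s > visibility_timeout_s

    extension_times: List[int] = []
    deadline = visibility_timeout_s  # the message is hidden until this time
    while processing_time_s > deadline:  # still processing when the window closes
        if deadline >= max_protection_s:
            # Protection budget used up: the message becomes visible again.
            return extension_times, True
        extension_times.append(deadline)
        # Assumed behavior: extend by one more visibility_timeout, capped at the maximum.
        deadline = min(deadline + visibility_timeout_s, max_protection_s)
    return extension_times, False


if __name__ == "__main__":
    # Table defaults: visibility_timeout 30s, protection timeout two hours.
    print(simulate_visibility_protection(30, 2 * 60 * 60, 100))  # ([30, 60, 90], False)
    print(simulate_visibility_protection(30, 2 * 60 * 60, 100, enabled=False))  # ([], True)
```

With protection disabled (the default), any S3 object that takes longer than `visibility_timeout` to reach the sink can be read again. Enabling it trades extra `ChangeMessageVisibility` calls (hence the extra IAM permission) for fewer duplicates, bounded by the protection timeout.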