Add DynamoDB source (#5664)

* Add DynamoDB source

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Reconfigure sections

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Add Dynamo DB source.

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update dynamo-db.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update dynamo-db.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update dynamo-db.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update dynamo-db.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Naarcha-AWS 2023-11-29 11:38:08 -06:00 committed by GitHub
parent 5e71e78714
commit fa5bc07a69
1 changed files with 96 additions and 0 deletions


@ -0,0 +1,96 @@
---
layout: default
title: dynamodb
parent: Sources
grand_parent: Pipelines
nav_order: 3
---
# dynamodb
The `dynamodb` source enables change data capture (CDC) on [Amazon DynamoDB](https://aws.amazon.com/dynamodb/) tables. It can receive table events, such as `create`, `update`, or `delete`, using DynamoDB streams and supports initial snapshots using [point-in-time recovery (PITR)](https://aws.amazon.com/dynamodb/pitr/).
The source includes two ingestion options to stream DynamoDB events:
1. A _full initial snapshot_ using [PITR](https://aws.amazon.com/dynamodb/pitr/) captures the current state of the DynamoDB table. This requires that both PITR and the DynamoDB stream option are enabled on your DynamoDB table.
2. Stream events from DynamoDB streams without full initial snapshots. This is useful if you already have a snapshot mechanism within your pipelines. This requires that the DynamoDB stream option is enabled on the DynamoDB table.
## Usage
The following example pipeline specifies DynamoDB as a source. It ingests data from a DynamoDB table named `table-a` through a PITR snapshot. It also sets the `start_position`, which tells the pipeline where to start reading DynamoDB stream events:
```yaml
version: "2"
cdc-pipeline:
  source:
    dynamodb:
      tables:
        - table_arn: "arn:aws:dynamodb:us-west-2:123456789012:table/table-a"
          export:
            s3_bucket: "test-bucket"
            s3_prefix: "myprefix"
          stream:
            start_position: "LATEST" # Read latest data from streams (Default)
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role"
```
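The second ingestion option, streaming without a full initial snapshot, might look like the following minimal sketch. It assumes that omitting the optional `export` block is what skips the PITR snapshot; the table ARN, Region, and role are the same placeholder values used in the preceding example:

```yaml
version: "2"
cdc-pipeline:
  source:
    dynamodb:
      tables:
        - table_arn: "arn:aws:dynamodb:us-west-2:123456789012:table/table-a"
          # No `export` block: assumed to skip the initial PITR snapshot
          stream:
            start_position: "LATEST" # Read latest data from streams
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role"
```

As in the preceding example, the sink is omitted for brevity.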
## Configuration options
The following tables describe the configuration options for the `dynamodb` source.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`aws` | Yes | AWS | The AWS configuration. See [aws](#aws) for more information.
`acknowledgments` | No | Boolean | When `true`, enables the `dynamodb` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks.
`shared_acknowledgement_timeout` | No | Duration | The amount of time that elapses before the data read from a DynamoDB stream expires when used with acknowledgments. Default is 10 minutes.
`s3_data_file_acknowledgment_timeout` | No | Duration | The amount of time that elapses before the data read from a DynamoDB export expires when used with acknowledgments. Default is 5 minutes.
`tables` | Yes | List | The configuration for the DynamoDB table. See [tables](#tables) for more information.
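
To illustrate how the top-level options fit together, the following sketch enables end-to-end acknowledgments and sets both timeouts explicitly to their documented defaults. The ISO 8601 duration strings (`PT10M`, `PT5M`) are an assumption about the accepted `Duration` format, and the ARNs are placeholders:

```yaml
source:
  dynamodb:
    acknowledgments: true
    # Assumed ISO 8601 duration notation; values match the documented defaults
    shared_acknowledgement_timeout: "PT10M"      # Stream data
    s3_data_file_acknowledgment_timeout: "PT5M"  # Export data files
    tables:
      - table_arn: "arn:aws:dynamodb:us-west-2:123456789012:table/table-a"
        stream:
          start_position: "LATEST"
    aws:
      region: "us-west-2"
      sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role"
```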
### aws
Use the following options in the AWS configuration.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon DynamoDB and Amazon Simple Storage Service (Amazon S3). Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`aws_sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes for the source plugin.
### tables
Use the following options with the `tables` configuration.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`table_arn` | Yes | String | The Amazon Resource Name (ARN) of the source DynamoDB table.
`export` | No | Export | Determines how to export DynamoDB events. For more information, see [export](#export-options).
`stream` | No | Stream | Determines how the pipeline reads data from the DynamoDB table. For more information, see [stream](#stream-option).
#### Export options
The following options let you customize the export destination for DynamoDB events.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`s3_bucket` | Yes | String | The destination bucket that stores the exported data files.
`s3_prefix` | No | String | The custom prefix for the S3 bucket.
`s3_sse_kms_key_id` | No | String | An AWS Key Management Service (AWS KMS) key that encrypts the export data files. The `key_id` is the ARN of the KMS key, for example, `arn:aws:kms:us-west-2:123456789012:key/0a4bc22f-bb96-4ad4-80ca-63b12b3ec147`.
`s3_region` | No | String | The Region for the S3 bucket.
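
As a sketch of how these options combine, the following `tables` entry writes encrypted export files to a bucket in an explicitly named Region. The bucket name, prefix, and KMS key ARN are the placeholder values used elsewhere on this page, not real resources:

```yaml
tables:
  - table_arn: "arn:aws:dynamodb:us-west-2:123456789012:table/table-a"
    export:
      s3_bucket: "test-bucket"   # Required: destination for exported data files
      s3_prefix: "myprefix"      # Optional prefix for the S3 bucket
      s3_region: "us-west-2"     # Optional: Region of the S3 bucket
      # Optional: ARN of the KMS key that encrypts the export data files
      s3_sse_kms_key_id: "arn:aws:kms:us-west-2:123456789012:key/0a4bc22f-bb96-4ad4-80ca-63b12b3ec147"
```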
#### Stream option
The following option lets you customize how the pipeline reads events from the DynamoDB table.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`start_position` | No | String | The position from which the source starts reading stream events when the DynamoDB stream option is enabled. `LATEST`, the default, starts reading events from the most recent stream record.