From fa5bc07a69d478f645757dfbb4afdeb483667170 Mon Sep 17 00:00:00 2001 From: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Date: Wed, 29 Nov 2023 11:38:08 -0600 Subject: [PATCH] Add DynamoDB source (#5664) * Add DynamoDB source Signed-off-by: Naarcha-AWS * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Reconfigure sections Signed-off-by: Naarcha-AWS * Add Dynamo DB source. Signed-off-by: Naarcha-AWS * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update dynamo-db.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update dynamo-db.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update dynamo-db.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> * Update dynamo-db.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> --------- Signed-off-by: Naarcha-AWS Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Nathan Bower --- .../configuration/sources/dynamo-db.md | 96 +++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 _data-prepper/pipelines/configuration/sources/dynamo-db.md diff --git a/_data-prepper/pipelines/configuration/sources/dynamo-db.md b/_data-prepper/pipelines/configuration/sources/dynamo-db.md new file mode 100644 index 00000000..597e8351 --- /dev/null +++ b/_data-prepper/pipelines/configuration/sources/dynamo-db.md @@ -0,0 +1,96 @@ +--- +layout: default +title: dynamodb +parent: Sources +grand_parent: Pipelines +nav_order: 3 +--- + +# dynamodb + +The `dynamodb` source enables change data capture (CDC) on [Amazon DynamoDB](https://aws.amazon.com/dynamodb/) tables. It can receive table events, such as `create`, `update`, or `delete`, using DynamoDB streams and supports initial snapshots using [point-in-time recovery (PITR)](https://aws.amazon.com/dynamodb/pitr/). + +The source includes two ingestion options to stream DynamoDB events: + +1. A _full initial snapshot_ using [PITR](https://aws.amazon.com/dynamodb/pitr/) gets an initial snapshot of the current state of the DynamoDB table. This requires the PITR Snapshots and DyanmoDB option enabled on your DynamoDB table. +2. Stream events from DynamoDB streams without full initial snapshots. This is useful if you already have a snapshot mechanism within your pipelines. This requires that the DynamoDB stream option is enabled on the DynamoDB table. + +## Usage + +The following example pipeline specifies DynamoDB as a source. It ingests data from a DyanmoDB table named `table-a` through a PITR snapshot. It also indicates the `start_position`, which tells the pipeline how to read DynamoDB stream events: + +```yaml +version: "2" +cdc-pipeline: + source: + dynamodb: + tables: + - table_arn: "arn:aws:dynamodb:us-west-2:123456789012:table/table-a" + export: + s3_bucket: "test-bucket" + s3_prefix: "myprefix" + stream: + start_position: "LATEST" # Read latest data from streams (Default) + aws: + region: "us-west-2" + sts_role_arn: "arn:aws:iam::123456789012:role/my-iam-role" +``` + +## Configuration options + +The following tables describe the configuration options for the `dynamodb` source. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`aws` | Yes | AWS | The AWS configuration. See [aws](#aws) for more information. +`acknowledgments` | No | Boolean | When `true`, enables `s3` sources to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks. +`shared_acknowledgement_timeout` | No | Duration | The amount of time that elapses before the data read from a DynamoDB stream expires when used with acknowledgements. Default is 10 minutes. +`s3_data_file_acknowledgment_timeout` | No | Duration | The amount of time that elapses before the data read from a DynamoDB export expires when used with acknowledgments. Default is 5 minutes. +`tables` | Yes | List | The configuration for the DynamoDB table. See [tables](#tables) for more information. + +### aws + +Use the following options in the AWS configuration. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). +`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Storage Service (Amazon S3). Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). +`aws_sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes for the sink plugin. + + +### tables + +Use the following options with the `tables` configuration. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`table_arn` | Yes | String | The Amazon Resource Name (ARN) of the source DynamoDB table. +`export` | No | Export | Determines how to export DynamoDB events. For more information, see [export](#export-options). +`stream` | No | Stream | Determines how the pipeline reads data from the DynamoDB table. For more information, see [stream](#stream-option). + +#### Export options + +The following options let you customize the export destination for DynamoDB events. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`s3_bucket` | Yes | String | The destination bucket that stores the exported data files. +`s3_prefix` | No | String | The custom prefix for the S3 bucket. +`s3_sse_kms_key_id` | No | String | An AWS Key Management Service (AWS KMS) key that encrypts the export data files. The `key_id` is the ARN of the KMS key, for example, `arn:aws:kms:us-west-2:123456789012:key/0a4bc22f-bb96-4ad4-80ca-63b12b3ec147`. +`s3_region` | No | String | The Region for the S3 bucket. + +#### Stream option + +The following option lets you customize how the pipeline reads events from the DynamoDB table. + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`start_position` | No | String | The position from where the source starts reading stream events when the DynamoDB stream option is enabled. `LATEST` starts reading events from the most recent stream record. + + + + + + +