Add Serverless configuration options for Data Prepper (#5663)

* Add Serverless configuration options for Data Prepper

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Implement additional technical feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix options

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _data-prepper/pipelines/configuration/sinks/opensearch.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
This commit is contained in:
Naarcha-AWS 2023-11-28 10:10:15 -06:00 committed by GitHub
parent 3ec0aa4228
commit 5877f13f16
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 40 additions and 2 deletions

View File

@ -31,7 +31,7 @@ pipeline:
bulk_size: 4
```
To configure an [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) sink, specify the domain endpoint as the `hosts` option:
To configure an [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) sink, specify the domain endpoint as the `hosts` option, as shown in the following example:
```yaml
pipeline:
@ -77,6 +77,18 @@ number_of_shards | No | Integer | The number of primary shards that an index sho
number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set number_of_replicas to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in Sink configuration or built-in. If this parameter is set, it would override the value in index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
distribution_version | No | String | Indicates whether the sink backend version is Elasticsearch 6 or later. `es6` represents Elasticsearch 6. `default` represents the latest compatible backend version, such as Elasticsearch 7.x, OpenSearch 1.x, or OpenSearch 2.x. Default is `default`.
enable_request_compression | No | Boolean | Whether to enable compression when sending requests to OpenSearch. When `distribution_version` is set to `es6`, default is `false`. For all other distribution versions, default is `true`.
serverless | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
serverless_options | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
### Serverless options
The following options can be used in the `serverless_options` object.
Option | Required | Type | Description
:--- | :--- | :---| :---
network_policy_name | Yes | String | The name of the network policy to create.
collection_name | Yes | String | The name of the Amazon OpenSearch Serverless collection to configure.
vpce_id | Yes | String | The virtual private cloud (VPC) endpoint to which the source connects.
### Configure max_retries

View File

@ -68,6 +68,20 @@ opensearch-source-pipeline:
...
```
## Amazon OpenSearch Serverless
The `opensearch` source can be configured with Amazon OpenSearch Serverless by setting the `serverless` option to `true`, as shown in the following example:
```yaml
- opensearch:
hosts: [ 'https://1234567890abcdefghijkl.us-west-2.aoss.amazonaws.com' ]
aws:
sts_role_arn: 'arn:aws:iam::123456789012:role/my-domain-role'
region: 'us-west-2'
serverless: true
```
## Using metadata
When the `opensource` source constructs Data Prepper events from documents in the cluster, the document index is stored in the EventMetadata with an `opensearch-index` key, and the document_id is stored in the `EventMetadata` with the `opensearch-document_id` as the key. This allows for conditional routing based on the index or `document_id`. The following example pipeline configuration sends events to an `opensearch` sink and uses the same index and `document_id` from the source cluster as in the destination cluster:
@ -106,6 +120,18 @@ Option | Required | Type | Description
`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [indexes](#indices).
`scheduling` | No | Object | The scheduling configuration. For more information, see [Scheduling](#scheduling).
`search_options` | No | Object | A list of search options performed by the source. For more information, see [Search options](#search_options).
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` source is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` source is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
### Serverless options
The following options can be used in the `serverless_options` object.
Option | Required | Type | Description
:--- | :--- | :---| :---
`network_policy_name` | Yes | String | The name of the network policy to create.
`collection_name` | Yes | String | The name of the Amazon OpenSearch Serverless collection to configure.
`vpce_id` | Yes | String | The virtual private cloud (VPC) endpoint to which the source connects.
### Scheduling