Add Serverless configuration options for Data Prepper (#5663)
* Add Serverless configuration options for Data Prepper

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Implement additional technical feedback

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix options

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _data-prepper/pipelines/configuration/sinks/opensearch.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent 3ec0aa4228
commit 5877f13f16
@@ -31,7 +31,7 @@ pipeline:
bulk_size: 4
```

To configure an [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) sink, specify the domain endpoint as the `hosts` option:
To configure an [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) sink, specify the domain endpoint as the `hosts` option, as shown in the following example:

```yaml
pipeline:
@@ -76,7 +76,19 @@ ism_policy_file | No | String | The absolute file path for an ISM (Index State M
number_of_shards | No | Integer | The number of primary shards that an index should have on the destination OpenSearch server. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built in. If this parameter is set, it overrides the value in the index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set `number_of_replicas` to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built in. If this parameter is set, it overrides the value in the index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
distribution_version | No | String | Indicates whether the sink backend version is Elasticsearch 6 or later. `es6` represents Elasticsearch 6. `default` represents the latest compatible backend version, such as Elasticsearch 7.x, OpenSearch 1.x, or OpenSearch 2.x. Default is `default`.
enable_request_compression | No | Boolean | Whether to enable compression when sending requests to OpenSearch. When `distribution_version` is set to `es6`, default is `false`. For all other distribution versions, default is `true`.
enable_request_compression | No | Boolean | Whether to enable compression when sending requests to OpenSearch. When `distribution_version` is set to `es6`, default is `false`. For all other distribution versions, default is `true`.
serverless | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
serverless_options | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).

### Serverless options

The following options can be used in the `serverless_options` object.

Option | Required | Type | Description
:--- | :--- | :---| :---
network_policy_name | Yes | String | The name of the network policy to create.
collection_name | Yes | String | The name of the Amazon OpenSearch Serverless collection to configure.
vpce_id | Yes | String | The virtual private cloud (VPC) endpoint to which the sink connects.
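
As a rough illustration of how the sink options above might be combined, the following sketch points an `opensearch` sink at a serverless collection. It assumes that `serverless` and `serverless_options` are nested under the `aws` block, which the option tables do not state explicitly. The pipeline name, endpoint, role ARN, index, policy name, collection name, and VPC endpoint ID are all hypothetical placeholders:

```yaml
# Hypothetical pipeline name; only the sink section is relevant here.
serverless-sink-pipeline:
  source:
    http:
  sink:
    - opensearch:
        # Placeholder Amazon OpenSearch Serverless collection endpoint.
        hosts: [ 'https://1234567890abcdefghijkl.us-west-2.aoss.amazonaws.com' ]
        index: my-index
        aws:
          sts_role_arn: 'arn:aws:iam::123456789012:role/my-pipeline-role'
          region: 'us-west-2'
          # Assumed placement: both serverless settings nested under `aws`.
          serverless: true
          serverless_options:
            # All three options are marked as required in the table above.
            network_policy_name: 'my-network-policy'
            collection_name: 'my-collection'
            vpce_id: 'vpce-0123456789abcdef0'
```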

### Configure max_retries

@@ -68,6 +68,20 @@ opensearch-source-pipeline:
...
```

## Amazon OpenSearch Serverless

The `opensearch` source can be configured with Amazon OpenSearch Serverless by setting the `serverless` option to `true`, as shown in the following example:

```yaml
- opensearch:
    hosts: [ 'https://1234567890abcdefghijkl.us-west-2.aoss.amazonaws.com' ]
    aws:
      sts_role_arn: 'arn:aws:iam::123456789012:role/my-domain-role'
      region: 'us-west-2'
      serverless: true
```


## Using metadata

When the `opensearch` source constructs Data Prepper events from documents in the cluster, the document index is stored in the `EventMetadata` with an `opensearch-index` key, and the `document_id` is stored in the `EventMetadata` with an `opensearch-document_id` key. This allows for conditional routing based on the index or `document_id`. The following example pipeline configuration sends events to an `opensearch` sink and uses the same index and `document_id` from the source cluster as in the destination cluster:
@@ -106,6 +120,18 @@ Option | Required | Type | Description
`indices` | No | Object | The configuration for filtering which indexes are processed. Defaults to all indexes, including system indexes. For more information, see [indexes](#indices).
`scheduling` | No | Object | The scheduling configuration. For more information, see [Scheduling](#scheduling).
`search_options` | No | Object | A list of search options performed by the source. For more information, see [Search options](#search_options).
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the `opensearch` source reads from an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` source is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).

### Serverless options

The following options can be used in the `serverless_options` object.

Option | Required | Type | Description
:--- | :--- | :---| :---
`network_policy_name` | Yes | String | The name of the network policy to create.
`collection_name` | Yes | String | The name of the Amazon OpenSearch Serverless collection to configure.
`vpce_id` | Yes | String | The virtual private cloud (VPC) endpoint to which the source connects.
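
For the source side, a comparable sketch might look like the following. As with any illustration of these options, the nesting of `serverless` and `serverless_options` under `aws` is an assumption rather than something the table states, and the endpoint, role ARN, and network values are hypothetical placeholders:

```yaml
opensearch-source-pipeline:
  source:
    opensearch:
      # Placeholder Amazon OpenSearch Serverless collection endpoint.
      hosts: [ 'https://1234567890abcdefghijkl.us-west-2.aoss.amazonaws.com' ]
      aws:
        sts_role_arn: 'arn:aws:iam::123456789012:role/my-domain-role'
        region: 'us-west-2'
        # Assumed placement: both serverless settings nested under `aws`.
        serverless: true
        serverless_options:
          network_policy_name: 'my-network-policy'
          collection_name: 'my-collection'
          vpce_id: 'vpce-0123456789abcdef0'
  sink:
    # Writes events to standard output so the source can be checked in isolation.
    - stdout:
```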

### Scheduling
