Add E2E acknowledgements (#3811)

* Add E2E acknowledgements

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix typos

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Remove E from acknowledgments

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* another e

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Fix link

* Adjust S3 source.

* Update _data-prepper/pipelines/configuration/sources/s3.md

Co-authored-by: Nathan Bower <nbower@amazon.com>

* Update _data-prepper/pipelines/configuration/sources/s3.md

* Update _data-prepper/pipelines/configuration/sources/s3.md

Co-authored-by: Nathan Bower <nbower@amazon.com>

* Update _data-prepper/pipelines/pipelines.md

Co-authored-by: Nathan Bower <nbower@amazon.com>

* Update _data-prepper/pipelines/pipelines.md

Co-authored-by: Nathan Bower <nbower@amazon.com>

* Update _data-prepper/pipelines/pipelines.md

---------

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Committed by Naarcha-AWS on 2023-04-20 12:39:24 -05:00 via GitHub (commit 098f6d961a, parent 8de7c3e3d3)
2 changed files with 15 additions and 6 deletions

_data-prepper/pipelines/configuration/sources/s3.md

@@ -1,16 +1,14 @@
 ---
 layout: default
-title: s3
+title: Amazon S3 source
 parent: Sources
 grand_parent: Pipelines
 nav_order: 20
 ---
 
-# s3
+# `s3` source
 
-## Overview
-
-`s3` is a source plugin that reads events from [Amazon Simple Storage Service (S3)](https://aws.amazon.com/s3/) (Amazon S3) objects. The following table describes options you can use to configure the `s3` source.
+The Amazon Simple Storage Service (Amazon S3) source plugin reads events from [S3](https://aws.amazon.com/s3/) objects. The following table describes options you can use to configure the `s3` source.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
@@ -24,6 +22,7 @@ buffer_timeout | No | Duration | The amount of time allowed for for writing even
 records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default value is 100.
 metadata_root_key | No | String | Base key for adding S3 metadata to each Event. The metadata includes the key and bucket for each S3 object. Defaults to `s3/`.
 disable_bucket_ownership_validation | No | Boolean | If `true`, the S3Source will not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the Amazon SQS queue. Defaults to `false`.
+acknowledgments | No | Boolean | If `true`, enables `s3` sources to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks.
 
 ## sqs
@@ -87,4 +86,4 @@ The `s3` processor includes the following metrics.
 * `s3ObjectSizeBytes`: Measures the size of Amazon S3 objects as reported by the Amazon S3 `Content-Length`. For compressed objects, this is the compressed size.
 * `s3ObjectProcessedBytes`: Measures the bytes processed by the `s3` source for a given object. For compressed objects, this is the uncompressed size.
-* `s3ObjectsEvents`: Measures the number of events (sometimes called records) produced by an Amazon S3 object.
+* `s3ObjectsEvents`: Measures the number of events (sometimes called records) produced by an S3 object.
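
For reference, a minimal sketch of a pipeline using the new `acknowledgments` option added above. This is an illustration only; the pipeline name, queue URL, role ARN, and OpenSearch endpoint are placeholder values, and the remaining options follow the `s3` source documentation.

```yaml
acknowledgment-pipeline:
  source:
    s3:
      acknowledgments: true          # new option added in this commit
      notification_type: "sqs"
      codec:
        newline:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events"  # placeholder queue
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper"                 # placeholder role
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]  # placeholder endpoint
        index: "s3-logs"
```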

_data-prepper/pipelines/pipelines.md

@@ -48,6 +48,16 @@ simple-sample-pipeline:
 Starting from Data Prepper 2.0, you can define pipelines across multiple configuration YAML files, where each file contains the configuration for one or more pipelines. This gives you more freedom to organize and chain complex pipeline configurations. For Data Prepper to load your pipeline configuration properly, place your configuration YAML files in the `pipelines` folder under your application's home directory (e.g. `/usr/share/data-prepper`).
 {: .note }
 
+## End-to-end acknowledgments
+
+Data Prepper ensures the durability and reliability of data written from sources and delivered to sinks through end-to-end (E2E) acknowledgments. An E2E acknowledgment begins at the source, which monitors a batch of events sent through the pipeline and waits for a positive acknowledgment once those events are successfully pushed to the sinks. When a pipeline contains multiple sinks, including sinks set as additional Data Prepper pipelines, the E2E acknowledgment is sent when events are received by the final sink in a pipeline chain.
+
+Alternatively, the source receives a negative acknowledgment when an event cannot be delivered to a sink for any reason.
+
+When any component of a pipeline fails and is unable to send an event, the source receives no acknowledgment. In the case of a failure, the pipeline's source times out. This gives you the ability to take any necessary actions to address the source failure, such as rerunning the pipeline or logging the failure.
+
+As of Data Prepper 2.2, only the `s3` source and `opensearch` sink support E2E acknowledgments.
+
 ## Conditional routing
 
 Pipelines also support **conditional routing**, which allows you to route Events to different sinks based on specific conditions. To add conditional routing to a pipeline, specify a list of named routes under the `route` component and add specific routes to sinks under the `routes` property. Any sink with the `routes` property will only accept Events that match at least one of the routing conditions.
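
To illustrate the pipeline-chain case described in the new E2E acknowledgments section above, here is a sketch of two pipelines joined by pipeline connectors, where the positive acknowledgment travels back to the `s3` source only after events reach the final `opensearch` sink. Pipeline names, the queue URL, and the endpoint are placeholder values, and the acknowledgment flow across the connector is as described in the commit text, not verified against the implementation.

```yaml
entry-pipeline:
  source:
    s3:
      acknowledgments: true
      notification_type: "sqs"
      codec:
        newline:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events"  # placeholder queue
      aws:
        region: "us-east-1"
  sink:
    - pipeline:
        name: "exit-pipeline"    # hand-off to the next pipeline; not yet acknowledged

exit-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  sink:
    - opensearch:                # final sink in the chain; acknowledgment originates here
        hosts: ["https://localhost:9200"]  # placeholder endpoint
        index: "s3-logs"
```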
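And a minimal conditional routing sketch, assuming an `http` source and log events that carry a `loglevel` field; the route names, field, and index names are hypothetical. Each sink declares which named routes it accepts under `routes`, so an event is written only to sinks whose conditions it matches.

```yaml
routing-pipeline:
  source:
    http:
  route:
    - info-route: '/loglevel == "INFO"'    # named route with its condition
    - error-route: '/loglevel == "ERROR"'
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]  # placeholder endpoint
        index: "info-logs"
        routes:
          - info-route                     # accepts only events matching info-route
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "error-logs"
        routes:
          - error-route
```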