70 lines
3.1 KiB
Markdown
70 lines
3.1 KiB
Markdown
|
---
|
||
|
layout: default
|
||
|
title: s3
|
||
|
parent: Sinks
|
||
|
grand_parent: Pipelines
|
||
|
nav_order: 55
|
||
|
---
|
||
|
|
||
|
# s3
|
||
|
|
||
|
The `s3` sink sends records to an Amazon Simple Storage Service (Amazon S3) bucket using the S3 client.
|
||
|
|
||
|
## Usage
|
||
|
|
||
|
The following example creates a pipeline configured with an s3 sink. It contains additional options for customizing the event and size thresholds for which the pipeline sends record events and sets the codec type `ndjson`:
|
||
|
|
||
|
```
|
||
|
pipeline:
|
||
|
...
|
||
|
sink:
|
||
|
- s3:
|
||
|
aws:
|
||
|
region: us-east-1
|
||
|
sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
|
||
|
sts_header_overrides:
|
||
|
max_retries: 5
|
||
|
bucket:
|
||
|
name: bucket_name
|
||
|
object_key:
|
||
|
path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/
|
||
|
threshold:
|
||
|
event_count: 2000
|
||
|
maximum_size: 50mb
|
||
|
event_collect_timeout: 15s
|
||
|
codec:
|
||
|
ndjson:
|
||
|
buffer_type: in_memory
|
||
|
```
|
||
|
|
||
|
## Configuration
|
||
|
|
||
|
Use the following options when customizing the `s3` sink.
|
||
|
|
||
|
Option | Required | Type | Description
|
||
|
:--- | :--- | :--- | :---
|
||
|
`bucket` | Yes | String | The object from which the data is retrieved and then stored. The `name` must match the name of your object store.
|
||
|
`region` | No | String | The AWS Region to use when connecting to S3. Defaults to the [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
|
||
|
`sts_role_arn` | No | String | The [AWS Security Token Service](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) (AWS STS) role that the `s3` sink assumes when sending a request to S3. Defaults to the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
|
||
|
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS.
|
||
|
`max_retries` | No | Integer | The maximum number of times a single request should retry when ingesting data to S3. Defaults to `5`.
|
||
|
`object_key` | No | Sets the `path_prefix` and the `file_pattern` of the object store. Defaults to the S3 object `events-%{yyyy-MM-dd'T'hh-mm-ss}` found inside the root directory of the bucket.
|
||
|
|
||
|
## Threshold configuration options
|
||
|
|
||
|
Use the following options to set ingestion thresholds for the `s3` sink.
|
||
|
|
||
|
Option | Required | Type | Description
|
||
|
:--- | :--- | :--- | :---
|
||
|
`event_count` | Yes | Integer | The maximum number of events the S3 bucket can ingest.
|
||
|
`maximum_size` | Yes | String | The maximum count or number of bytes that the S3 bucket can ingest. Defaults to `50mb`.
|
||
|
`event_collect_timeout` | Yes | String | Sets the time period during which events are collected before ingestion. All values are strings that represent duration, either an ISO_8601 notation string, such as `PT20.345S`, or a simple notation, such as `60s` or `1500ms`.
|
||
|
|
||
|
## buffer_type
|
||
|
|
||
|
`buffer_type` is an optional configuration that records stored events temporarily before flushing them into an S3 bucket. Use of one of the following options:
|
||
|
|
||
|
- `local_file`: Flushes the record into a file on your machine.
|
||
|
- `in_memory`: Stores the record in memory.
|
||
|
|
||
|
|