layout | title | parent | grand_parent | nav_order |
---|---|---|---|---|
default | s3 | Sinks | Pipelines | 55 |
s3
The `s3` sink sends records to an Amazon Simple Storage Service (Amazon S3) bucket using the S3 client.
Usage
The following example creates a pipeline configured with an `s3` sink. It includes additional options that set the event count and size thresholds at which the pipeline sends record events, and it sets the codec type to `ndjson`:
pipeline:
  ...
  sink:
    - s3:
        aws:
          region: us-east-1
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
          sts_header_overrides:
        max_retries: 5
        bucket:
          name: bucket_name
        object_key:
          path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/
        threshold:
          event_count: 2000
          maximum_size: 50mb
          event_collect_timeout: 15s
        codec:
          ndjson:
        buffer_type: in_memory
Configuration
Use the following options when customizing the `s3` sink.
Option | Required | Type | Description |
---|---|---|---|
`bucket` | Yes | String | The S3 bucket in which the sink stores the data. The name must match the name of your object store. |
`region` | No | String | The AWS Region to use when connecting to S3. Defaults to the standard SDK behavior for determining the Region. |
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role that the `s3` sink assumes when sending a request to S3. Defaults to the standard SDK behavior for credentials. |
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS. |
`max_retries` | No | Integer | The maximum number of times a single request is retried when writing data to S3. Defaults to `5`. |
`object_key` | No | Object | Sets the `path_prefix` and the `file_pattern` of the object store. Defaults to the S3 object `events-%{yyyy-MM-dd'T'hh-mm-ss}` found inside the root directory of the bucket. |
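As a rough sketch of how the credential-related options above might be combined, the following fragment assumes that `sts_external_id` sits alongside `region` and `sts_role_arn` under the `aws` key, as the usage example suggests. The Region, role ARN, external ID, and bucket name are placeholder values, not values taken from a real deployment:

```yaml
sink:
  - s3:
      aws:
        region: us-west-2                                          # placeholder Region
        sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper  # role assumed by the sink
        sts_external_id: example-external-id                       # hypothetical external ID; nesting under `aws` is an assumption
      max_retries: 3                                               # overrides the default of 5
      bucket:
        name: bucket_name
```

If `region` and `sts_role_arn` are omitted, the sink falls back to the standard SDK behavior described in the table.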
Threshold configuration options
Use the following options to set ingestion thresholds for the `s3` sink.
Option | Required | Type | Description |
---|---|---|---|
`event_count` | Yes | Integer | The maximum number of events that the sink collects before writing them to the S3 bucket. |
`maximum_size` | Yes | String | The maximum number of bytes that the sink collects before writing them to the S3 bucket. Defaults to `50mb`. |
`event_collect_timeout` | Yes | String | The maximum amount of time during which events are collected before they are written to S3. The value is a duration string in either ISO 8601 notation, such as `PT20.345S`, or simple notation, such as `60s` or `1500ms`. |
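The duration formats described above can be illustrated with a short fragment. This is only a sketch with illustrative values; it shows `event_collect_timeout` written in ISO 8601 notation, with the equivalent simple notation in a comment:

```yaml
threshold:
  event_count: 5000            # illustrative event threshold
  maximum_size: 20mb           # illustrative size threshold
  event_collect_timeout: PT2M  # ISO 8601 duration for two minutes; the simple notation would be 120s
```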
buffer_type
`buffer_type` is an optional configuration that determines how records are stored temporarily before they are flushed into an S3 bucket. Use one of the following options:
- `local_file`: Flushes the record into a file on your machine.
- `in_memory`: Stores the record in memory.
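As a minimal sketch, switching the sink from the in-memory buffer to a file-based buffer would presumably only require changing the `buffer_type` value. The placement at the same level as `bucket` is an assumption based on the usage example:

```yaml
sink:
  - s3:
      bucket:
        name: bucket_name
      buffer_type: local_file  # buffer records in a local file instead of in memory
```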