s3

The s3 sink sends records to an Amazon Simple Storage Service (Amazon S3) bucket using the S3 client.

Usage

The following example creates a pipeline configured with an s3 sink. It sets the event count and size thresholds at which the sink sends events to S3 and sets the codec type to ndjson:

pipeline:
  ...
  sink:
    - s3:
        aws:
          region: us-east-1
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
          sts_header_overrides:
        max_retries: 5
        bucket:
          name: bucket_name
          object_key:
            path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/
        threshold:
          event_count: 2000
          maximum_size: 50mb
          event_collect_timeout: 15s
        codec:
          ndjson:
        buffer_type: in_memory

Configuration

Use the following options when customizing the s3 sink.

Option | Required | Type | Description
:--- | :--- | :--- | :---
bucket | Yes | String | The S3 bucket in which the sink stores the data. The name must match the name of your object store.
region | No | String | The AWS Region to use when connecting to S3. Defaults to the standard SDK behavior to determine the Region.
sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role that the s3 sink assumes when sending a request to S3. Defaults to the standard SDK behavior for credentials.
sts_external_id | No | String | The external ID to attach to AssumeRole requests from AWS STS.
max_retries | No | Integer | The maximum number of times a single request retries when sending data to S3. Defaults to 5.
object_key | No | | Sets the path_prefix and the file_pattern of the object store. Defaults to an S3 object named events-%{yyyy-MM-dd'T'hh-mm-ss} in the root directory of the bucket.
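
As a minimal sketch of how these options fit together, the following fragment nests the credential options under aws and the object_key options under bucket, as the usage example does. The role ARN, external ID, bucket name, and prefix are placeholder values, and placing sts_external_id under aws is an assumption based on where the usage example places sts_role_arn:

```yaml
sink:
  - s3:
      aws:
        region: us-east-1
        sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
        # Assumed placement: alongside sts_role_arn under aws.
        sts_external_id: my-external-id
      # Retry each failed request up to 3 times instead of the default 5.
      max_retries: 3
      bucket:
        name: bucket_name
        object_key:
          # Date patterns in %{...} build a dated key prefix.
          path_prefix: logs/%{yyyy}/%{MM}/%{dd}/
      # threshold and codec options omitted for brevity.
```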

Threshold configuration options

Use the following options to set ingestion thresholds for the s3 sink.

Option | Required | Type | Description
:--- | :--- | :--- | :---
event_count | Yes | Integer | The maximum number of events to collect before the sink sends them to the S3 bucket.
maximum_size | Yes | String | The maximum number of bytes to collect before the sink sends the events to the S3 bucket. Defaults to 50mb.
event_collect_timeout | Yes | String | The maximum amount of time during which events are collected before the sink sends them to the S3 bucket. All values are strings that represent a duration, either in ISO 8601 notation, such as PT20.345S, or in simple notation, such as 60s or 1500ms.
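
To illustrate the two duration notations, the following sketch sets a 30-second timeout in ISO 8601 form and shows the simple-notation equivalent in a comment. The specific values are illustrative only:

```yaml
threshold:
  event_count: 1000
  maximum_size: 20mb
  # ISO 8601 notation; the simple-notation equivalent is 30s.
  event_collect_timeout: PT30S
```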

buffer_type

buffer_type is an optional configuration that determines how the sink temporarily stores events before flushing them to an S3 bucket. Use one of the following options:

  • local_file: Stores the record in a file on your machine.
  • in_memory: Stores the record in memory.
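
For example, a sink that buffers events on disk rather than in memory could set buffer_type as in the following sketch. The other sink options are left out here and would still need to be configured:

```yaml
sink:
  - s3:
      # Other options (aws, bucket, threshold, codec) omitted for brevity.
      buffer_type: local_file
```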