The following example creates a pipeline configured with an `s3` sink. It contains additional options for customizing the event and size thresholds at which the pipeline sends record events, and it sets the codec type to `ndjson`:
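In this sketch, the bucket name, Region, role ARN, and threshold values are placeholders; substitute values for your environment:

```yaml
pipeline:
  ...
  sink:
    - s3:
        aws:
          region: us-east-1
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
        bucket: sample-bucket
        object_key:
          path_prefix: logs/%{yyyy}/%{MM}/%{dd}/
        threshold:
          event_count: 10000
          maximum_size: 50mb
          event_collect_timeout: PT57S
        codec:
          ndjson:
        buffer_type: in_memory
```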
Use the following options when customizing the `s3` sink.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`object_key` | No | Object | Sets the `path_prefix` and the `file_pattern` of the object store. Defaults to the S3 object `events-%{yyyy-MM-dd'T'hh-mm-ss}` found inside the root directory of the bucket.
`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon S3. Defaults to `null`, which uses the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin.
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS.
`event_collect_timeout` | Yes | String | Sets the time period during which events are collected before ingestion. All values are strings that represent a duration, either in ISO 8601 notation, such as `PT20.345S`, or in simple notation, such as `60s` or `1500ms`.
`buffer_type` is an optional configuration that determines where events are stored temporarily before they are flushed to the S3 bucket. The default value is `in_memory`. Use one of the following options (a configuration sketch follows this list):

- `in_memory`: Stores the record in memory.
- `local_file`: Flushes the record into a file on your machine.
- `multipart`: Writes using the [S3 multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html). Every 10 MB is written as a part.
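For example, the following sketch writes large objects using the multipart buffer; the remaining sink settings are omitted for brevity:

```yaml
sink:
  - s3:
      # Bucket, object_key, threshold, and codec settings omitted.
      # Each 10 MB of buffered data is uploaded as a separate part.
      buffer_type: multipart
```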
## Object key configuration
Option | Required | Type | Description
:--- | :--- | :--- | :---
`path_prefix` | No | String | The S3 key prefix path to use. Accepts date-time formatting. For example, you can use `%{yyyy}/%{MM}/%{dd}/%{HH}/` to create hourly folders in S3. By default, events are written to the root of the bucket.
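For example, the following sketch creates hourly folders under an illustrative `logs/` prefix:

```yaml
object_key:
  # Objects are written to keys such as logs/2024/01/31/13/<file name>.
  path_prefix: logs/%{yyyy}/%{MM}/%{dd}/%{HH}/
```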
## `codec`
The `codec` determines how the `s3` sink formats data written to each S3 object.
### `avro` codec
The `avro` codec writes an event as an [Apache Avro](https://avro.apache.org/) document.
Because Avro requires a schema, you can either define the schema yourself or let Data Prepper generate one automatically.
In general, you should define your own schema because it will most accurately reflect your needs.
We recommend that you make your Avro fields use a null [union](https://avro.apache.org/docs/current/specification/#unions).
Without the null union, each field must be present or the data will fail to write to the sink.
If you can be certain that each event has a given field, you can make it non-nullable.
When you provide your own Avro schema, that schema defines the final structure of your data.
Therefore, any values inside incoming events that are not mapped in the Avro schema are not included in the final destination.
To avoid confusion between a custom Avro schema and the `include_keys` or `exclude_keys` sink configurations, Data Prepper does not allow the use of `include_keys` or `exclude_keys` with a custom schema.
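For example, the following sketch defines a custom schema in which each field uses a null union, so an event missing either field still writes successfully (the record and field names are illustrative):

```yaml
codec:
  avro:
    schema: >
      {
        "type": "record",
        "name": "Event",
        "fields": [
          { "name": "message", "type": ["null", "string"] },
          { "name": "status_code", "type": ["null", "int"] }
        ]
      }
```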
Option | Required | Type | Description
:--- | :--- | :--- | :---
`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to `true`.
`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event.
### `ndjson` codec
The `ndjson` codec writes each line as a JSON object.
The `ndjson` codec does not take any configuration options.
### `json` codec
The `json` codec writes events in a single large JSON file.
Each event is written into an object within a JSON array.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`key_name` | No | String | The name of the key for the JSON array. By default, this is `events`.
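For example, the following sketch writes the JSON array under an illustrative key named `log_events` instead of the default `events`:

```yaml
codec:
  json:
    key_name: log_events
```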
### `parquet` codec
The `parquet` codec writes events into a Parquet file.
Option | Required | Type | Description
:--- | :--- | :--- | :---
`schema` | Yes | String | The Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration). Not required if `auto_schema` is set to `true`.
`auto_schema` | No | Boolean | When set to `true`, automatically generates the Avro [schema declaration](https://avro.apache.org/docs/current/specification/#schema-declaration) from the first event.
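For example, the following sketch lets Data Prepper generate the schema from the first event rather than declaring one:

```yaml
codec:
  parquet:
    auto_schema: true
```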