Behavioral changes for Data Prepper S3 sink (#4897)
* Updates the Data Prepper documentation for S3 sinks based on recent behavior changes.

Signed-off-by: David Venable <dlv@amazon.com>

* Updates from the PR feedback.

Signed-off-by: David Venable <dlv@amazon.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: David Venable <dlv@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
parent dc21de0f80
commit 64d59b9bb2
@@ -98,10 +98,21 @@ The `avro` codec writes an event as an [Apache Avro](https://avro.apache.org/) d
Because Avro requires a schema, you can either define the schema yourself or let Data Prepper generate one automatically.
In general, you should define your own schema because it will most accurately reflect your needs.
|
We recommend that you make your Avro fields use a null [union](https://avro.apache.org/docs/current/specification/#unions).
Without the null union, each field must be present or the data will fail to write to the sink.
If you can be certain that each event has a given field, you can make it non-nullable.
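
The following codec fragment sketches that recommendation. It defines a custom Avro schema in which two fields use a null union and one field is non-nullable. The fragment assumes that the `avro` codec accepts the schema as an Avro JSON declaration through a `schema` option. The record name, namespace, and field names are hypothetical, and the rest of the S3 sink configuration is omitted.

```yaml
codec:
  avro:
    # Assumption: the Avro JSON schema is passed as a string in `schema`.
    # "timestamp" is non-nullable, so every event must contain it.
    # "message" and "status" use a null union ("null" listed first, default null),
    # so events that lack these fields can still be written.
    schema: >
      {
        "type": "record",
        "namespace": "org.example.logs",
        "name": "LogEvent",
        "fields": [
          { "name": "timestamp", "type": "string" },
          { "name": "message", "type": ["null", "string"], "default": null },
          { "name": "status", "type": ["null", "int"], "default": null }
        ]
      }
```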
|
When you provide your own Avro schema, that schema defines the final structure of your data.
Therefore, any values in incoming events that are not mapped in the Avro schema are not included in the final destination.
To avoid confusion between a custom Avro schema and the `include_keys` or `exclude_keys` sink configurations, Data Prepper does not allow the use of `include_keys` or `exclude_keys` with a custom schema.
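
The following fragment shows the combination that the previous paragraph rules out: a custom Avro `schema` together with `exclude_keys` on the same S3 sink. The `schema` option name, the `debug_info` key, and the field names are hypothetical, and other required S3 settings are omitted.

```yaml
sink:
  - s3:
      # NOT allowed: a custom schema combined with exclude_keys.
      # Data Prepper does not allow include_keys or exclude_keys with a custom schema.
      exclude_keys: ["debug_info"]
      codec:
        avro:
          schema: >
            {
              "type": "record",
              "name": "LogEvent",
              "fields": [
                { "name": "message", "type": ["null", "string"], "default": null }
              ]
            }
```

To fix this configuration, remove `exclude_keys`. Because `debug_info` is not listed in `fields`, it is already left out of the output.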
|
In cases where your data is uniform, you may be able to automatically generate a schema.
Automatically generated schemas are based on the first event received by the codec.
The schema will only contain keys from this event.
Therefore, you must have all keys present in all events in order for the automatically generated schema to produce a working schema.
Automatically generated schemas make all fields nullable.
Use the sink's `include_keys` and `exclude_keys` configurations to control what data is included in the auto-generated schema.
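
The following fragment sketches the auto-generated path. It assumes that the `avro` codec exposes a boolean `auto_schema` option; check the option table for the exact name. The schema is built from the first event, every generated field is nullable, and `include_keys` limits which keys reach the schema. The key names are hypothetical, and other required S3 settings are omitted.

```yaml
sink:
  - s3:
      # Only these keys are kept, so only they can appear in the
      # schema generated from the first event.
      include_keys: ["timestamp", "message", "status"]
      codec:
        avro:
          # Assumption: generate the schema from the first event.
          auto_schema: true
```

The schema reflects only the first event. For this reason, the configuration works reliably only when every event carries all three keys.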
|
Option | Required | Type | Description
@@ -131,14 +142,13 @@ Option | Required | Type | Description
### parquet codec
|
The `parquet` codec writes events into a Parquet file.
When using the Parquet codec, set the `buffer_type` to `in_memory`.
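
The following S3 sink fragment illustrates the `buffer_type` requirement for Parquet. The region, role ARN, and bucket name are placeholders, and the schema is a hypothetical Avro declaration. The `aws`, `bucket`, `buffer_type`, and `codec` keys reflect the S3 sink options as we understand them, so verify them against the option table.

```yaml
sink:
  - s3:
      aws:
        region: us-east-1                                     # placeholder
        sts_role_arn: arn:aws:iam::123456789012:role/Example  # placeholder
      bucket: my-example-bucket                               # placeholder
      # The Parquet codec requires the in-memory buffer.
      buffer_type: in_memory
      codec:
        parquet:
          schema: >
            {
              "type": "record",
              "name": "LogEvent",
              "fields": [
                { "name": "message", "type": ["null", "string"], "default": null }
              ]
            }
```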
|
The Parquet codec writes data using the Avro schema.
Because Parquet requires an Avro schema, you can either define the schema yourself or let Data Prepper generate one automatically.
However, we generally recommend that you define your own schema so that it can best meet your needs.
|
In cases where your data is uniform, you may be able to automatically generate a schema.
Automatically generated schemas are based on the first event received by the codec.
The schema will only contain keys from this event. Therefore, you must have all keys present in all events in order for the automatically generated schema to produce a working schema.
Automatically generated schemas make all fields nullable.
For details on the Avro schema and recommendations, see the [Avro codec](#avro-codec) documentation.
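
The following fragment sketches an auto-generated Parquet schema. It assumes that the `parquet` codec accepts the same boolean `auto_schema` option as the Avro codec. `exclude_keys` appears here because the Avro section recommends the sink's key filters for shaping generated schemas. The `debug_info` key is hypothetical, and other required S3 settings are omitted.

```yaml
sink:
  - s3:
      buffer_type: in_memory   # required for Parquet
      # Drop this key so that it never reaches the generated schema.
      exclude_keys: ["debug_info"]
      codec:
        parquet:
          # Assumption: generate the Avro schema from the first event.
          auto_schema: true
```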
|
Option | Required | Type | Description
@@ -18,5 +18,5 @@ Option | Required | Type | Description
:--- | :--- |:------------| :---
routes | No | String list | A list of routes for which this sink applies. If not provided, this sink receives all events. See [conditional routing]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#conditional-routing) for more information.
tags_target_key | No | String | When specified, includes event tags in the output under the provided key.
include_keys | No | String list | When specified, includes only the keys in this list in the data sent to the sink. Some codecs and sinks do not allow use of this field.
exclude_keys | No | String list | When specified, excludes the keys in this list from the data sent to the sink. Some codecs and sinks do not allow use of this field.
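
The following sketch combines the common options from the table on an S3 sink. The pipeline name `my-pipeline`, the route name `error-logs` and its condition, the `tags` target key, and the `debug_info` key are all hypothetical. The route is defined in the pipeline's `route` section, as described in conditional routing.

```yaml
my-pipeline:
  # source, processors, and other required S3 settings (aws, bucket, codec) omitted
  route:
    # Hypothetical route: events whose status is 500 or higher.
    - error-logs: '/status >= 500'
  sink:
    - s3:
        # Receive only events that match the error-logs route.
        routes: [error-logs]
        # Write the event's tags to the output under the "tags" key.
        tags_target_key: tags
        # Remove this key from the data sent to the sink.
        # Do not combine exclude_keys with a codec that disallows it,
        # such as the avro codec with a custom schema.
        exclude_keys: ["debug_info"]
```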