---
layout: default
title: s3
parent: Sources
grand_parent: Pipelines
nav_order: 20
---
# s3
## Overview
`s3` is a source plugin that reads events from [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/) objects. The following table describes the options you can use to configure the `s3` source.
Option | Required | Type | Description
:--- | :--- | :--- | :---
notification_type | Yes | String | Must be `sqs`.
compression | No | String | The compression algorithm to apply: `none`, `gzip`, or `automatic`. Default value is `none`.
codec | Yes | Codec | The codec to apply. Must be `newline`, `json`, or `csv`.
sqs | Yes | sqs | The [Amazon Simple Queue Service (Amazon SQS)](https://aws.amazon.com/sqs/) configuration. See [sqs](#sqs) for details.
aws | Yes | aws | The AWS configuration. See [aws](#aws) for details.
on_error | No | String | Determines how to handle errors in Amazon SQS. Can be either `retain_messages` or `delete_messages`. If `retain_messages`, then Data Prepper will leave the message in the Amazon SQS queue and try again. This is recommended for dead-letter queues. If `delete_messages`, then Data Prepper will delete failed messages. Default value is `retain_messages`.
buffer_timeout | No | Duration | The amount of time allowed for writing events to the Data Prepper buffer before a timeout occurs. Any events that the `s3` source cannot write to the buffer in this time are discarded. Default value is 10 seconds.
records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default value is 100.
metadata_root_key | No | String | Base key for adding S3 metadata to each Event. The metadata includes the key and bucket for each S3 object. Defaults to `s3/`.
disable_bucket_ownership_validation | No | Boolean | If `true`, the `s3` source will not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the Amazon SQS queue. Defaults to `false`.
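
As a minimal sketch of how the options above fit together, the following pipeline reads gzip-compressed, newline-delimited objects that are announced through an Amazon SQS queue. The pipeline name `s3-log-pipeline`, the queue URL, the account ID, the role ARN, and the `stdout` sink are placeholders chosen for illustration, not values defined on this page. The duration is written in the same short notation (`10s`) that the `sqs` defaults use.

```yaml
# Hypothetical pipeline name; any valid pipeline name works.
s3-log-pipeline:
  source:
    s3:
      notification_type: sqs          # required, must be sqs
      compression: gzip               # none (default), gzip, or automatic
      codec:
        newline:                      # one event per line of each object
      sqs:
        # Placeholder queue URL; replace with your own queue.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events"
      aws:
        region: "us-east-1"
        # Placeholder role ARN; omit to use the default credentials chain.
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-s3"
      on_error: retain_messages       # default; leaves failed messages in the queue
      buffer_timeout: 10s             # default
      records_to_accumulate: 100      # default
      metadata_root_key: "s3/"        # default
  sink:
    - stdout:                         # assumed sink, used here only to show output
```

Only `notification_type`, `codec`, `sqs`, and `aws` are required; the remaining keys are shown with their default values so that they are easy to adjust.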
## sqs
The following options configure how the `s3` source plugin uses Amazon SQS.
Option | Required | Type | Description
:--- | :--- | :--- | :---
queue_url | Yes | String | The URL of the Amazon SQS queue from which messages are received.
maximum_messages | No | Integer | The maximum number of messages to receive from the Amazon SQS queue in any single request. Default value is `10`.
visibility_timeout | No | Duration | The visibility timeout to apply to messages read from the Amazon SQS queue. This should be set to the amount of time that Data Prepper may take to read all the Amazon S3 objects in a batch. Default value is `30s`.
wait_time | No | Duration | The amount of time to wait for long polling on the Amazon SQS API. Default value is `20s`.
poll_delay | No | Duration | A delay to place between reading/processing a batch of Amazon SQS messages and making a subsequent request. Default value is `0s`.
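
The following fragment is a sketch of an `sqs` block with every option set explicitly. It belongs under the `s3` source, and the queue URL is a placeholder. The `visibility_timeout` is raised above its default on the assumption that a batch of objects takes longer than 30 seconds to read; choose a value that matches your own object sizes.

```yaml
sqs:
  # Placeholder queue URL; replace with your own queue.
  queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events"
  maximum_messages: 10       # default; messages fetched per request
  visibility_timeout: 120s   # raised from the 30s default (assumed slow batch)
  wait_time: 20s             # default; long-polling wait
  poll_delay: 0s             # default; no pause between batches
```

If the visibility timeout expires before Data Prepper finishes a batch, Amazon SQS makes the messages visible again and the same objects can be read a second time, so it is safer to err on the high side.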
## aws
The following options configure the AWS Region and credentials that the `s3` source plugin uses.
Option | Required | Type | Description
:--- | :--- | :--- | :---
region | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to null, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
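
As a brief sketch, an `aws` block that pins the Region and assumes a role could look like the following. It belongs under the `s3` source. The Region, account ID, and role name are placeholders.

```yaml
aws:
  region: "us-east-1"
  # Placeholder role ARN; Data Prepper assumes this role through AWS STS
  # for its requests to Amazon SQS and Amazon S3.
  sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-s3"
```

Leaving out `sts_role_arn` falls back to the standard SDK credentials behavior, and leaving out `region` falls back to the standard SDK Region selection.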
## file
The `file` source reads events from a flat file.
Option | Required | Type | Description
:--- | :--- | :--- | :---
path | Yes | String | The path to the input file (e.g. `logs/my-log.log`).
format | No | String | The format of each line in the file. Valid options are `json` or `plain`. Default value is `plain`.
record_type | No | String | The record type to store. Valid options are `string` or `event`. Default value is `string`. If you would like to use the file source for log analytics use cases like grok, set this option to `event`.
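
A minimal sketch of a pipeline that uses the `file` source follows. The pipeline name `file-pipeline` and the `stdout` sink are assumptions made for illustration; the path is the example path from the table above.

```yaml
# Hypothetical pipeline name.
file-pipeline:
  source:
    file:
      path: logs/my-log.log   # example path from the options table
      format: json            # each line is parsed as JSON; default is plain
      record_type: event      # use event for log analytics use cases such as grok
  sink:
    - stdout:                 # assumed sink, used here only to show output
```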
## pipeline
The `pipeline` source reads events from another pipeline.
Option | Required | Type | Description
:--- | :--- | :--- | :---
name | Yes | String | Name of the pipeline to read from.
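
The following sketch chains two pipelines with the `pipeline` source. The pipeline names `entry-pipeline` and `downstream-pipeline` are hypothetical, and the example assumes a matching `pipeline` sink in the upstream pipeline, which is not described on this page.

```yaml
# Hypothetical upstream pipeline that forwards its events.
entry-pipeline:
  source:
    file:
      path: logs/my-log.log
  sink:
    - pipeline:
        name: downstream-pipeline   # assumed pipeline sink naming the target

# Hypothetical downstream pipeline that reads from the upstream one.
downstream-pipeline:
  source:
    pipeline:
      name: entry-pipeline          # name of the pipeline to read from
  sink:
    - stdout:                       # assumed sink, used here only to show output
```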
## Metrics
The `s3` source includes the following metrics.
### Counters
* `s3ObjectsFailed`: The number of Amazon S3 objects that the `s3` source failed to read.
* `s3ObjectsNotFound`: The number of Amazon S3 objects that the `s3` source failed to read due to an Amazon S3 "Not Found" error. These are also counted toward `s3ObjectsFailed`.
* `s3ObjectsAccessDenied`: The number of Amazon S3 objects that the `s3` source failed to read due to an "Access Denied" or "Forbidden" error. These are also counted toward `s3ObjectsFailed`.
* `s3ObjectsSucceeded`: The number of Amazon S3 objects that the `s3` source successfully read.
* `sqsMessagesReceived`: The number of Amazon SQS messages received from the queue by the `s3` source.
* `sqsMessagesDeleted`: The number of Amazon SQS messages deleted from the queue by the `s3` source.
* `sqsMessagesFailed`: The number of Amazon SQS messages that the `s3` source failed to parse.
### Timers
* `s3ObjectReadTimeElapsed`: Measures the amount of time the `s3` source takes to perform a request to GET an S3 object, parse it, and write events to the buffer.
* `sqsMessageDelay`: Measures the amount of time from when Amazon S3 creates an object to when it is fully parsed.
### Distribution summaries
* `s3ObjectSizeBytes`: Measures the size of Amazon S3 objects as reported by the Amazon S3 `Content-Length`. For compressed objects, this is the compressed size.
* `s3ObjectProcessedBytes`: Measures the bytes processed by the `s3` source for a given object. For compressed objects, this is the uncompressed size.
* `s3ObjectsEvents`: Measures the number of events (sometimes called records) produced by an Amazon S3 object.