Added metrics sections to Aggregate processor subpages. (#2730)

* Added metrics section to Aggregate processor page.

Signed-off-by: carolxob <carolxob@amazon.com>

* Added Metrics section to individual Processors pages.

Signed-off-by: carolxob <carolxob@amazon.com>

* Added metrics section for JSON processor.

Signed-off-by: carolxob <carolxob@amazon.com>

* Added metrics sections. Changed Default is to Default value is.

Signed-off-by: carolxob <carolxob@amazon.com>

* Corrected references from AWS S3 to Amazon S3.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor updates to Metrics sections and phrasing.

Signed-off-by: carolxob <carolxob@amazon.com>

* Updated Action link.

Signed-off-by: carolxob <carolxob@amazon.com>

* Updates based on tech review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Updates based on tech review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Tech review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor updates to buffer_size and batch_size default values.

Signed-off-by: carolxob <carolxob@amazon.com>

* Edits to Metrics sections for each processor.

Signed-off-by: carolxob <carolxob@amazon.com>

* Update made based on doc review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor updates to intro text for processor pages. Minor adjustments to other text for clarity.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor edits.

Signed-off-by: carolxob <carolxob@amazon.com>

* Adjustments to phrasing, fixed typos.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor updates to word choice and corrected a typo.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor edit.

Signed-off-by: carolxob <carolxob@amazon.com>

* Made updates based on doc review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Updates to http-source.

Signed-off-by: carolxob <carolxob@amazon.com>

* Added common processors table to affected docs.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor update to one file.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor update based on tech review feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor edits.

Signed-off-by: carolxob <carolxob@amazon.com>

* Major editorial feedback incorporated through key-value.md.

Signed-off-by: carolxob <carolxob@amazon.com>

* Incorporated major editorial feedback up to service-map-stateful.

Signed-off-by: carolxob <carolxob@amazon.com>

* Incorporated major editorial feedback for Processors section.

Signed-off-by: carolxob <carolxob@amazon.com>

* Major editorial updates, specifically to inclusion of text introducing option configuration tables.

Signed-off-by: carolxob <carolxob@amazon.com>

* Major editorial feedback through otel-trace.md incorporated.

Signed-off-by: carolxob <carolxob@amazon.com>

* Major editorial edits incorporated.

Signed-off-by: carolxob <carolxob@amazon.com>

* Technical feedback and editorial feedback incorporated.

Signed-off-by: carolxob <carolxob@amazon.com>

* Incorporated missing editorial feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor adjustments to OpenSearch sink.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor changes to capitalization.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor edits.

Signed-off-by: carolxob <carolxob@amazon.com>

* Made one instance of processor name consistent with other references.

Signed-off-by: carolxob <carolxob@amazon.com>

* Minor update based on editorial feedback.

Signed-off-by: carolxob <carolxob@amazon.com>

---------

Signed-off-by: carolxob <carolxob@amazon.com>
Caroline 2023-02-27 11:40:59 -05:00 committed by GitHub
parent 665989898c
commit ef83f6d7d5
32 changed files with 309 additions and 175 deletions


@ -10,12 +10,13 @@ nav_order: 50
## Overview
The default buffer. Memory-based.
`Bounded blocking` is the default buffer and is memory based. The following table describes the `Bounded blocking` parameters.
Option | Required | Type | Description
:--- | :--- | :--- | :---
buffer_size | No | Integer | The maximum number of records the buffer accepts. Default is `12800`.
batch_size | No | Integer | The maximum number of records the buffer drains after each read. Default is `200`.
buffer_size | No | Integer | The maximum number of records the buffer accepts. Default value is `12800`.
batch_size | No | Integer | The maximum number of records the buffer drains after each read. Default value is `200`.
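
As an illustration of the two options above, the following is a minimal sketch of a pipeline that sets them explicitly to their default values. The pipeline name `sample-pipeline`, the `http` source, and the `stdout` sink are assumptions chosen only to make the fragment complete; they are not part of this change.

```yaml
# Hypothetical pipeline; only the buffer block relates to the table above.
sample-pipeline:
  source:
    http:
  buffer:
    bounded_blocking:
      buffer_size: 12800   # maximum number of records the buffer accepts
      batch_size: 200      # maximum number of records drained per read
  sink:
    - stdout:
```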
<!--- ## Configuration


@ -8,4 +8,4 @@ nav_order: 20
# Buffers
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory-based (better performance) or disk-based (larger).
Buffers store data as it passes through the pipeline. If you implement a custom buffer, it can be memory based, which provides better performance, or disk based, which is larger in size.


@ -10,19 +10,16 @@ nav_order: 45
## Overview
Adds an entry to event. `add_entries` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
The `add_entries` processor adds an entry to the event and is a [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processor. The following table describes the options you can use to configure the `add_entries` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of events to be added. Valid entries are `key`, `value`, and `overwrite_if_key_exists`.
key | N/A | N/A | Key of the new event to be added.
value | N/A | N/A | Value of the new entry to be added. Valid data types are strings, booleans, numbers, null, nested objects, and arrays containing the aforementioned data types.
overwrite_if_key_exists | No | Boolean | If true, the existing value gets overwritten if the key already exists within the event. Default is `false`.
overwrite_if_key_exists | No | Boolean | If true, the existing value is overwritten if the key already exists within the event. Default value is `false`.
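
The options in the table might be combined as in the following sketch, which would sit inside a pipeline definition. The key name `newMessage` and its value are made up for illustration.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - add_entries:
      entries:
        - key: "newMessage"              # hypothetical key to add
          value: 3                       # any supported data type
          overwrite_if_key_exists: true  # replace the value if the key exists
```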
<!--- ## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.---->


@ -10,18 +10,40 @@ nav_order: 45
## Overview
Groups events together based on the keys provided and performs a action on each group.
The `aggregate` processor groups events based on the keys provided and performs an action on each group. The following table describes the options you can use to configure the `aggregate` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
identification_keys | Yes | List | A unordered list by which to group Events. Events with the same values for these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification_key is required. (e.g. `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed for each group. One of the available aggregate actions must be provided or you can create custom aggregate actions. `remove_duplicates` and `put_all` are available actions. For more information, see [creating custom aggregate actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
identification_keys | Yes | List | An unordered list by which to group events. Events with the same values for these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification key is required (for example, `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed for each group. One of the available aggregate actions must be provided or you can create custom aggregate actions. `remove_duplicates` and `put_all` are the available actions. For more information, see [Creating New Aggregate Actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings ("PT20.345S", "PT15M", etc.) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default value is `180s`.
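
A minimal sketch of how these three options could fit together, using the example keys from the table and the `remove_duplicates` action. The fragment is illustrative only and would sit inside a pipeline definition.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - aggregate:
      # Events with the same values for these keys fall into one group.
      identification_keys: ["sourceIp", "destinationIp", "port"]
      action:
        remove_duplicates:
      # Conclude each group automatically after 180 seconds (the default).
      group_duration: "180s"
```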
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `aggregate` processor includes the following custom metrics.
**Counter**
* `actionHandleEventsOut`: The number of events that have been returned from the `handleEvent` call to the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action).
* `actionHandleEventsDropped`: The number of events that have not been returned from the `handleEvent` call to the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action).
* `actionHandleEventsProcessingErrors`: The number of calls made to `handleEvent` for the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action) that resulted in an error.
* `actionConcludeGroupEventsOut`: The number of events that have been returned from the `concludeGroup` call to the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action).
* `actionConcludeGroupEventsDropped`: The number of events that have not been returned from the `concludeGroup` call to the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action).
* `actionConcludeGroupEventsProcessingErrors`: The number of calls made to `concludeGroup` for the configured [action](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#action) that resulted in an error.
**Gauge**
* `currentAggregateGroups`: The current number of groups. This gauge decreases when a group concludes and increases when an event initiates the creation of a new group.


@ -10,19 +10,15 @@ nav_order: 45
## Overview
Copy values within an event. `copy_values` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
The `copy_values` processor copies values within an event and is a [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processor. The following table describes the options you can use to configure the `copy_values` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
entries | Yes | List | List of entries to be copied. Valid values are `from_key`, `to_key`, and `overwrite_if_key_exists`.
entries | Yes | List | The list of entries to be copied. Valid values are `from_key`, `to_key`, and `overwrite_if_key_exists`.
from_key | N/A | N/A | The key of the entry to be copied.
to_key | N/A | N/A | The key of the new entry to be added.
overwrite_if_to_key_exists | No | Boolean | If true, the existing value is overwritten if the key already exists within the event. Default is `false`.
overwrite_if_to_key_exists | No | Boolean | If true, the existing value is overwritten if the key already exists within the event. Default value is `false`.
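
The following sketch shows one way the options above might be used. The key names `message` and `newMessage` are assumptions for illustration.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - copy_values:
      entries:
        - from_key: "message"               # hypothetical existing key
          to_key: "newMessage"              # hypothetical key to create
          overwrite_if_to_key_exists: true  # replace the value if to_key exists
```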
<!---## Configuration
Content will be added to this section.
## Metrics
Content will be added to this section.--->
Content will be added to this section.--->


@ -10,21 +10,33 @@ nav_order: 45
## Overview
Takes in an Event and parses its CSV data into columns.
The `csv` processor parses comma-separated values (CSVs) from the event into columns. The following table describes the options you can use to configure the `csv` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
source | No | String | The field in the Event that will be parsed. Default is `message`.
quote_character | No | String | The character used as a text qualifier for a single column of data. Default is double quote `"`.
delimiter | No | String | The character separating each column. Default is `,`.
delete_header | No | Boolean | If specified, the header on the Event (`column_names_source_key`) deletes after the event is parsed. If theres no header on the event, no actions is taken. Default is true.
column_names_source_key | No | String | The field in the Event that specifies the CSV column names, which will be autodetected. If there must be extra column names, the column names autogenerate according to their index. If `column_names` is also defined, the header in `column_names_source_key` can also be used to generate the event fields. If too few columns are specified in this field, the remaining column names autogenerate. If too many column names are specified in this field, the CSV processor omits the extra column names.
column_names | No | List | User-specified names for the CSV columns. Default is `[column1, column2, ..., columnN]` if there are N columns of data in the CSV record and `column_names_source_key` is not defined. If `column_names_source_key` is defined, the header in `column_names_source_key` generates the Event fields. If too few columns are specified in this field, the remaining column names will autogenerate. If too many column names are specified in this field, CSV processor omits the extra column names.
source | No | String | The field in the event that will be parsed. Default value is `message`.
quote_character | No | String | The character used as a text qualifier for a single column of data. Default value is `"`.
delimiter | No | String | The character separating each column. Default value is `,`.
delete_header | No | Boolean | If specified, the event header (`column_names_source_key`) is deleted after the event is parsed. If there is no event header, no action is taken. Default value is `true`.
column_names_source_key | No | String | The field in the event that specifies the CSV column names, which will be automatically detected. If there need to be extra column names, the column names are automatically generated according to their index. If `column_names` is also defined, the header in `column_names_source_key` can also be used to generate the event fields. If too few columns are specified in this field, the remaining column names are automatically generated. If too many column names are specified in this field, the CSV processor omits the extra column names.
column_names | No | List | User-specified names for the CSV columns. Default value is `[column1, column2, ..., columnN]` if there are N columns of data in the CSV record and `column_names_source_key` is not defined. If `column_names_source_key` is defined, the header in `column_names_source_key` generates the event fields. If too few columns are specified in this field, the remaining column names are automatically generated. If too many column names are specified in this field, the CSV processor omits the extra column names.
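
As a rough illustration of the options above, the following sketch parses the `message` field and names the first two columns. The column names `col1` and `col2` are made up; per the table, any remaining columns would be named automatically.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - csv:
      source: "message"               # field to parse (the default)
      delimiter: ","                  # column separator (the default)
      quote_character: '"'            # text qualifier (the default)
      column_names: ["col1", "col2"]  # hypothetical column names
```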
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `csv` processor includes the following custom metrics.
**Counter**
* `csvInvalidEvents`: The number of invalid events. An exception is thrown when an invalid event is parsed. An unclosed quote usually causes this exception.


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Adds a default timestamp to the event or parses timestamp fields, and converts it to ISO 8601 format, which can be used as event timestamp.
The `date` processor adds a default timestamp to an event, parses timestamp fields, and converts timestamp information to the International Organization for Standardization (ISO) 8601 format. This timestamp information can be used as an event timestamp. The following table describes the options you can use to configure the `date` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
@ -23,8 +23,19 @@ locale | No | String | Locale is used for parsing dates. It's commonly used for
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `date` processor includes the following custom metrics.
* `dateProcessingMatchSuccessCounter`: Returns the number of records that match at least one pattern specified by the `match` configuration option.
* `dateProcessingMatchFailureCounter`: Returns the number of records that did not match any of the patterns specified by the `match` configuration option.


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Delete entries in an event. `delete_entries` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
The `delete_entries` processor deletes entries in an event and is a [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processor. The following table describes the options you can use to configure the `delete_entries` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
@ -18,8 +18,4 @@ with_keys | Yes | List | An array of keys of the entries to be deleted.
<!---## Configuration
Content will be added to this section.
## Metrics
Content will be added to this section.--->
Content will be added to this section.--->


@ -10,17 +10,13 @@ nav_order: 45
## Overview
Drops all the events that are passed into this processor.
The `drop_events` processor drops all the events that are passed into it. The following table describes when events are dropped and how exceptions for dropping events are handled.
Option | Required | Type | Description
:--- | :--- | :--- | :---
drop_when | Yes | String | Accepts a Data Prepper Expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received.
handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so it doesn't get sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
drop_when | Yes | String | Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received.
handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so that it is not sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, and `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
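
A minimal sketch of a conditional drop, assuming events carry a hypothetical `loglevel` field. The expression is only an example of the Data Prepper expression syntax linked in the table.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - drop_events:
      # Drop events whose (hypothetical) loglevel field equals "DEBUG".
      drop_when: '/loglevel == "DEBUG"'
      # Drop the event if evaluating the expression raises an exception.
      handle_failed_events: drop
```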
<!---## Configuration
Content will be added to this section.
## Metrics
Content will be added to this section.--->
Content will be added to this section.--->


@ -10,25 +10,44 @@ nav_order: 45
## Overview
Takes unstructured data and utilizes pattern matching to structure and extract important keys and make data more structured and queryable.
The `Grok` processor takes unstructured data and utilizes pattern matching to structure and extract important keys. The following table describes options you can use with the `Grok` processor to structure your data and make your data easier to query.
Option | Required | Type | Description
:--- | :--- | :--- | :---
match | No | Map | Specifies which keys to match specific patterns against. Default is an empty body.
match | No | Map | Specifies which keys to match specific patterns against. Default value is an empty body.
keep_empty_captures | No | Boolean | Enables preserving `null` captures. Default value is `false`.
named_captures_only | No | Boolean | enables whether to keep only named captures. Default value is `true`.
break_on_match | No | Boolean | Specifies whether to match all patterns or stop once the first successful match is found. Default is `true`.
keys_to_overwrite | No | List | Specifies which existing keys are to be overwritten if there is a capture with the same key value. Default is `[]`.
named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default value is `true`.
break_on_match | No | Boolean | Specifies whether to match all patterns or stop once the first successful match is found. Default value is `true`.
keys_to_overwrite | No | List | Specifies which existing keys will be overwritten if there is a capture with the same key value. Default value is `[]`.
pattern_definitions | No | Map | Allows for custom pattern use inline. Default value is an empty body.
patterns_directories | No | List | Specifies the path of directories that contain custom pattern files. Default value is an empty list.
pattern_files_glob | No | String | Specifies which pattern files to use from the directories specified for `pattern_directories`. Default is `*`.
target_key | No | String | Specifies a parent level key to store all captures. Default value is `null`.
timeout_millis | No | Integer | Maximum amount of time that should take place for the matching. Setting to `0` disables the timeout. Default value is `30,000`.
pattern_files_glob | No | String | Specifies which pattern files to use from the directories specified for `patterns_directories`. Default value is `*`.
target_key | No | String | Specifies a parent-level key used to store all captures. Default value is `null`.
timeout_millis | No | Integer | The maximum amount of time, in milliseconds, during which matching occurs. Setting to `0` disables the timeout. Default value is `30000`.
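
The following sketch shows how a few of these options might be set together. It assumes the incoming event has a `message` field and that the `COMMONAPACHELOG` pattern is available in the default pattern set; both are assumptions for illustration.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - grok:
      match:
        # Match the message key against one pattern.
        message: ["%{COMMONAPACHELOG}"]
      break_on_match: true     # stop after the first successful match
      timeout_millis: 30000    # give up on matching after 30 seconds
```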
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `Grok` processor includes the following custom metrics.
### Counter
* `grokProcessingMismatch`: Records the number of records that did not match any of the patterns specified in the `match` field.
* `grokProcessingMatch`: Records the number of records that matched at least one pattern from the `match` field.
* `grokProcessingErrors`: Records the total number of record processing errors.
* `grokProcessingTimeouts`: Records the total number of records that timed out while matching.
### Timer
* `grokProcessingTime`: The time taken by individual records to match against patterns from `match`. The `avg` metric is the most useful metric for this timer because it provides you with an average value of the time it takes records to match.


@ -10,17 +10,17 @@ nav_order: 45
## Overview
Takes in a field and parses it into key/value pairs.
The `key_value` processor parses a field into key/value pairs. The following table describes the options you can use to configure the `key_value` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
source | No | String | The key in the event that is parsed. Default value is `message`.
destination | No | String | The key where to output the parsed source to. Doing this overwrites the value of the key if it exists. Default value is `parsed_message`
destination | No | String | The destination key for the parsed source output. Outputting the parsed source overwrites the value of the key if it already exists. Default value is `parsed_message`.
field_delimiter_regex | Conditionally | String | A regex specifying the delimiter between key/value pairs. Special regex characters such as `[` and `]` must be escaped using `\\`. This option cannot be defined at the same time as `field_split_characters`.
field_split_characters | Conditionally | String | A string of characters to split between key/value pairs. Special regex characters such as `[` and `]` must be escaped using `\\`. Default value is `&`. This option cannot be defined at the same time as `field_delimiter_regex`.
key_value_delimiter_regex| Conditionally | String | A regex specifying the delimiter between a key and a value. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value. This option cannot be defined at the same time as `value_split_characters`.
value_split_characters | Conditionally | String | A string of characters to split between keys and values. Special regex characters such as `[` and `]` must be escaped using `\\`. Default value is `=`. This option cannot be defined at the same time as `key_value_delimiter_regex`.
non_match_value | No | String | When a key/value cannot be successfully split, the key/value is be placed in the key field and the specified value in the value field. Default value is `null`.
non_match_value | No | String | When a key/value cannot be successfully split, the key/value is placed in the `key` field, and the specified value is placed in the `value` field. Default value is `null`.
prefix | No | String | A prefix given to all keys. Default value is empty string.
delete_key_regex | No | String | A regex used to delete characters from the key. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value.
delete_value_regex | No | String | A regex used to delete characters from the value. Special regex characters such as `[` and `]` must be escaped using `\\`. There is no default value.
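
A minimal sketch using the character-based split options with their default values. With this configuration, a `message` such as `key1=value1&key2=value2` would be expected to produce two pairs under `parsed_message`; the sample input is hypothetical.

```yaml
# Processor fragment; belongs under a pipeline's top-level key.
processor:
  - key_value:
      source: "message"              # key to parse (the default)
      destination: "parsed_message"  # key that receives the parsed pairs
      field_split_characters: "&"    # separates one pair from the next
      value_split_characters: "="    # separates a key from its value
```

Because `field_split_characters` and `field_delimiter_regex` cannot be defined together (and likewise `value_split_characters` and `key_value_delimiter_regex`), the sketch uses only the character-based options.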


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Converts a string to its lowercase counterpart. `lowercase_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
The `lowercase_string` processor converts a string to its lowercase counterpart and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes options for configuring the `lowercase_string` processor to convert strings to a lowercase format.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,14 +10,14 @@ nav_order: 45
## Overview
This processor is a Data Prepper event record type replacement of `otel_trace_raw_prepper` (no longer supported since Data Prepper 2.0). The processor fills in trace group related fields including the following.
The `otel_trace_raw` processor completes trace-group-related fields in all incoming Data Prepper span records by state caching the root span information for each `traceId`. The processor completes the following fields.
* `traceGroup`: root span name
* `endTime`: end time of the entire trace in ISO 8601
* `durationInNanos`: duration of the entire trace in nanoseconds
* `statusCode`: status code for the entire trace in nanoseconds
* `traceGroup`: Root span name
* `endTime`: End time of the entire trace in International Organization for Standardization (ISO) 8601 format
* `durationInNanos`: Duration of the entire trace in nanoseconds
* `statusCode`: Status code for the entire trace
in all incoming Data Prepper span records by state caching the root span info per traceId.
The following table describes the options you can use to configure the `otel_trace_raw` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
@ -25,8 +25,19 @@ trace_flush_interval | No | Integer | Represents the time interval in seconds to
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `otel_trace_raw` processor includes the following custom metrics.
* `traceGroupCacheCount`: The number of trace groups in the trace group cache.
* `spanSetCount`: The number of span sets in the span set collection.


@ -1,20 +1,20 @@
---
layout: default
title: json
title: Parse JSON
parent: Processors
grand_parent: Pipelines
nav_order: 45
---
# json
# Parse JSON
## Overview
Takes in an event and parses its JSON data, including any nested fields.
The `parse_json` processor parses JSON data for an event, including any nested fields. The following table describes several optional parameters you can configure in the `parse_json` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
source | No | String | The field in the `Event` that will be parsed. Default is `message`.
source | No | String | The field in the `Event` that will be parsed. Default value is `message`.
destination | No | String | The destination field of the parsed JSON. Defaults to the root of the `Event`. Cannot be `""`, `/`, or any whitespace-only `String` because these are not valid `Event` fields.
pointer | No | String | A JSON Pointer to the field to be parsed. There is no `pointer` by default, meaning the entire `source` is parsed. The `pointer` can also access JSON array indexes. If the JSON Pointer is invalid, the entire `source` data is parsed into the outgoing `Event`. If the pointed-to key already exists in the `Event` and the `destination` is the root, the pointer uses the entire path of the key.
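To make the interplay of `source`, `destination`, and `pointer` more concrete, the following Python sketch imitates the behavior described in the table. It is a simplified illustration, not the processor's code: the function names are hypothetical, the source is assumed to contain a JSON object, a pointer result written to the root is assumed to be stored under the pointer's last path segment, and the key collision rule from the `pointer` description is not modeled.

```python
import json


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer (RFC 6901); raise an error if it is invalid."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise ValueError("array index expected")
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


def parse_json(event, source="message", destination=None, pointer=None):
    """Return a copy of the event with the parsed JSON added."""
    parsed = json.loads(event[source])
    if pointer is not None:
        try:
            value = resolve_pointer(parsed, pointer)
            if destination is None:
                # Assumption: at the root, use the last path segment as the key.
                parsed = {pointer.rsplit("/", 1)[-1]: value}
            else:
                parsed = value
        except (KeyError, IndexError, ValueError):
            pass  # Invalid pointer: fall back to the entire parsed source.
    result = dict(event)
    if destination is None:
        result.update(parsed)  # Write to the root of the event.
    else:
        result[destination] = parsed
    return result


event = {"message": '{"outer": {"inner": [10, 20]}}'}
at_root = parse_json(event)
pointed = parse_json(event, destination="parsed", pointer="/outer/inner/1")
fallback = parse_json(event, destination="parsed", pointer="/missing")
```

Here `at_root` gains an `outer` key next to `message`, `pointed` stores only the second array element under `parsed`, and `fallback` stores the whole parsed object because the pointer does not resolve.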


@ -8,7 +8,7 @@ nav_order: 25
# Processors
Processors perform some action on your data: filter, transform, enrich, etc.
Processors perform an action on your data, such as filtering, transforming, or enriching.
Prior to Data Prepper 1.3, Processors were named Preppers. Starting in Data Prepper 1.3, the term Prepper is deprecated in favor of Processor. Data Prepper will continue to support the term "Prepper" until 2.0, where it will be removed.
Prior to Data Prepper 1.3, processors were named preppers. Starting in Data Prepper 1.3, the term *prepper* is deprecated in favor of the term *processor*. Data Prepper will continue to support the term *prepper* until version 2.0, in which it will be removed.
{: .note }


@ -10,7 +10,7 @@ nav_order: 44
## Overview
Rename keys in an event. `rename_keys` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
The `rename_keys` processor renames keys in an event and is a [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processor. The following table describes the options you can use to configure the `rename_keys` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -4,7 +4,6 @@ title: routes
parent: Processors
grand_parent: Pipelines
nav_order: 45
---
# Routes


@ -10,16 +10,27 @@ nav_order: 45
## Overview
Uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards.
The `service_map_stateful` processor uses OpenTelemetry data to create a distributed service map for visualization in OpenSearch Dashboards. The following table describes the option you can use to configure the `service_map_stateful` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---
window_duration | No | Integer | Represents the fixed time window in seconds to evaluate service-map relationships. Default is 180.
window_duration | No | Integer | Represents the fixed time window, in seconds, during which service map relationships are evaluated. Default value is 180.
<!---## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section.--->
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
| Metric name | Type | Description |
| ------------- | ---- | -----------|
| `recordsIn` | Counter | Metric representing the ingress of records to a pipeline component. |
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `service_map_stateful` processor includes the following custom metrics.
* `traceGroupCacheCount`: The number of trace groups in the trace group cache.
* `spanSetCount`: The number of span sets in the span set collection.


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Splits a field into an array using a delimiter character. `split_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
The `split_string` processor splits a field into an array using a delimiting character and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the options you can use to configure the `split_string` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Converts string to uppercase or lowercase. Mostly useful as an example if you want to develop your own processor.
The `string_converter` processor converts a string to uppercase or lowercase. You can use it as an example for developing your own processor. The following table describes the option you can use to configure the `string_converter` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Matches a key's value against a regular expression and replaces all matches with a replacement string. `substitute_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
The `substitute_string` processor matches a key's value against a regular expression and replaces all matches with a replacement string. `substitute_string` is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the options you can use to configure the `substitute_string` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Strips whitespace from the beginning and end of a key. `trim_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
The `trim_string` processor removes whitespace from the beginning and end of a key and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the option you can use to configure the `trim_string` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Converts a string to its uppercase counterpart. `uppercase_string` is part of [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processors.
The `uppercase_string` processor converts an entire string to uppercase and is a [mutate string](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-string-processors#mutate-string-processors) processor. The following table describes the option you can use to configure the `uppercase_string` processor.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,7 +10,7 @@ nav_order: 45
## Overview
The file sink creates a flat file output.
You can use the `file` sink to create a flat file output. The following table describes options you can configure for the `file` sink.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -10,27 +10,27 @@ nav_order: 45
## Overview
Sink for an OpenSearch cluster.
You can use the `OpenSearch` sink to send data to an OpenSearch, Amazon OpenSearch Service, or Elasticsearch cluster using the REST client. The following table describes options you can configure for the `OpenSearch` sink.
Option | Required | Type | Description
:--- | :--- | :--- | :---
hosts | Yes | List | List of OpenSearch hosts to write to (e.g. `["https://localhost:9200", "https://remote-cluster:9200"]`).
cert | No | String | Path to the security certificate (e.g. `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
hosts | Yes | List | List of OpenSearch hosts to write to (for example, `["https://localhost:9200", "https://remote-cluster:9200"]`).
cert | No | String | Path to the security certificate (for example, `"config/root-ca.pem"`) if the cluster uses the OpenSearch security plugin.
username | No | String | Username for HTTP basic authentication.
password | No | String | Password for HTTP basic authentication.
aws_sigv4 | No | Boolean | default false. Whether to use IAM signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credential`, etc.).
aws_region | No | String | AWS region (e.g. `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role_arn | No | String | IAM role which the sink plugin assumes to sign request to Amazon OpenSearch Service. If not provided, the plugin uses the default credentials.
socket_timeout | No | Integer | the timeout in milliseconds for waiting for data (or, put differently, a maximum period inactivity between two consecutive data packets). A timeout value of zero is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient would rely on operating system settings for managing socket timeouts.
aws_sigv4 | No | Boolean | Whether to use AWS Identity and Access Management (IAM) signing to connect to an Amazon OpenSearch Service domain. For your access key, secret key, and optional session token, Data Prepper uses the default credential chain (environment variables, Java system properties, `~/.aws/credentials`, and so on). Default value is `false`.
aws_region | No | String | The AWS Region (for example, `"us-east-1"`) for the domain if you are connecting to Amazon OpenSearch Service.
aws_sts_role_arn | No | String | The IAM role that the sink plugin assumes to sign requests to Amazon OpenSearch Service. If this information is not provided, the plugin uses the default credentials.
socket_timeout | No | Integer | The amount of time, in milliseconds, to wait for data to return (that is, the maximum period of inactivity between two consecutive data packets). A timeout value of zero is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient relies on operating system settings for managing socket timeouts.
connect_timeout | No | Integer | The timeout, in milliseconds, used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient relies on operating system settings for managing connection timeouts.
insecure | No | Boolean | Whether to verify SSL certificates. If set to true, CA certificate verification is disabled and insecure HTTP requests are sent instead. Default is `false`.
insecure | No | Boolean | Whether to disable SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is disabled and insecure HTTP requests are sent instead. Default value is `false`.
proxy | No | String | The address of a [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is "&lt;host name or IP&gt;:&lt;port&gt;". Examples: "example.com:8100", "http://example.com:8100", "112.112.112.112:8100". Port number cannot be omitted.
index | Conditionally | String | Name of the export index. Applicable and required only when the `index_type` is `custom`.
index_type | No | String | This index type tells the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, `management-disabled`. Default is `custom`.
template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (e.g. `/your/local/template-file.json`) if `index_type` is `custom`. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (e.g. `"my-field"`) if `index_type` is `custom`.
dlq_file | No | String | The path to your preferred dead letter queue file (e.g. `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
bulk_size | No | Integer (long) | The maximum size (in MiB) of bulk requests to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually. Default is 5.
index_type | No | String | This index type tells the Sink plugin what type of data it is handling. Valid values: `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, `management-disabled`. Default value is `custom`.
template_file | No | String | Path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file (for example, `/your/local/template-file.json`) if `index_type` is `custom`. See [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json) for an example.
document_id_field | No | String | The field from the source data to use for the OpenSearch document ID (for example, `"my-field"`) if `index_type` is `custom`.
dlq_file | No | String | The path to your preferred dead letter queue file (for example, `/your/local/dlq-file`). Data Prepper writes to this file when it fails to index a document on the OpenSearch cluster.
bulk_size | No | Integer (long) | The maximum size (in MiB) of bulk requests sent to the OpenSearch cluster. Values below 0 indicate an unlimited size. If a single document exceeds the maximum bulk request size, Data Prepper sends it individually. Default value is 5.
ism_policy_file | No | String | The absolute file path for an ISM (Index State Management) policy JSON file. This policy file is effective only when there is no built-in policy file for the index type. For example, `custom` index type is currently the only one without a built-in policy file, thus it would use the policy file here if it's provided through this parameter. For more information, see [ISM policies]({{site.url}}{{site.baseurl}}/im-plugin/ism/policies/).
number_of_shards | No | Integer | The number of primary shards that an index should have on the destination OpenSearch server. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built in. If this parameter is set, it overrides the value in the index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
number_of_replicas | No | Integer | The number of replica shards each primary shard should have on the destination OpenSearch server. For example, if you have 4 primary shards and set `number_of_replicas` to 3, the index has 12 replica shards. This parameter is effective only when `template_file` is either explicitly provided in the sink configuration or built in. If this parameter is set, it overrides the value in the index template file. For more information, see [Create index]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/).
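Two rules in the table lend themselves to a quick check: `index` is required only when `index_type` is `custom`, and the total number of replica shards is the number of primary shards multiplied by `number_of_replicas`. The following Python sketch illustrates both. The helper names are hypothetical and the checks are a reading of the table above, not validation code taken from the sink plugin.

```python
# Valid `index_type` values as listed in the options table.
VALID_INDEX_TYPES = {
    "custom",
    "trace-analytics-raw",
    "trace-analytics-service-map",
    "management-disabled",
}


def check_sink_options(options):
    """Return a list of problems found in a dict of sink options."""
    problems = []
    if not options.get("hosts"):
        problems.append("hosts is required")
    index_type = options.get("index_type", "custom")  # documented default
    if index_type not in VALID_INDEX_TYPES:
        problems.append(f"unknown index_type: {index_type}")
    elif index_type == "custom" and not options.get("index"):
        problems.append("index is required when index_type is custom")
    return problems


def total_replica_shards(primary_shards, number_of_replicas):
    """Each primary shard gets `number_of_replicas` replica shards."""
    return primary_shards * number_of_replicas


ok = check_sink_options({"hosts": ["https://localhost:9200"], "index": "my-index"})
missing_index = check_sink_options({"hosts": ["https://localhost:9200"]})
replicas = total_replica_shards(4, 3)
```

With 4 primary shards and `number_of_replicas` set to 3, `replicas` matches the 12 replica shards given in the table.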


@ -10,7 +10,7 @@ nav_order: 45
## Overview
Sink for writing to another pipeline.
You can use the `pipeline` sink to write to another pipeline.
Option | Required | Type | Description
:--- | :--- | :--- | :---


@ -12,6 +12,8 @@ Sinks define where Data Prepper writes your data to.
## General options for all sink types
The following table describes the options you can configure for all sink types.
Option | Required | Type | Description
:--- | :--- | :--- | :---
routes | No | List | List of routes that the sink accepts. If not specified, the sink accepts all upstream events.


@ -10,11 +10,9 @@ nav_order: 45
## Overview
The stdout sink can be used for console output and can be useful for testing. It has no configurable options.
You can use the `stdout` sink for console output and testing. It has no configurable options.
<!--- ## Configuration
Content will be added to this section.
<!---
## Metrics


@ -8,31 +8,50 @@ nav_order: 5
# http_source
This is a source plugin that supports HTTP protocol. Currently ONLY support Json UTF-8 codec for incoming request, e.g. `[{"key1": "value1"}, {"key2": "value2"}]`.
`http_source` is a source plugin that supports HTTP. Currently, `http_source` only supports the JSON UTF-8 codec for incoming requests, such as `[{"key1": "value1"}, {"key2": "value2"}]`. The following table describes options you can use to configure the `http_source` source.
Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port the source is running on. Default is `2021`. Valid options are between `0` and `65535`.
health_check_service | No | Boolean | Enables health check service on `/health` endpoint on the defined port. Default is `false`.
unauthenticated_health_check | No | Boolean | Determines whether or not authentication is required on the health check endpoint. Data Prepper ignores this option if no authentication is defined. Default is `false`.
request_timeout | No | Integer | The request timeout in millis. Default is `10_000`.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
max_pending_requests | No | Integer | The maximum number of allowed tasks in ScheduledThreadPool work queue. Default is `1024`.
authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/org/opensearch/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
ssl | No | Boolean | Enables TLS/SSL. Default is false.
ssl_certificate_file | Conditionally | String | SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is true and `use_acm_certificate_for_ssl` is false.
ssl_key_file | Conditionally | String | SSL key file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is true and `use_acm_certificate_for_ssl` is false.
use_acm_certificate_for_ssl | No | Boolean | Enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is false.
acm_certificate_arn | Conditionally | String | ACM certificate ARN. The ACM certificate takes preference over S3 or a local file system certificate. Required if `use_acm_certificate_for_ssl` is set to true.
port | No | Integer | The port that the source is running on. Default value is `2021`. Valid options are between `0` and `65535`.
health_check_service | No | Boolean | Enables the health check service on the `/health` endpoint on the defined port. Default value is `false`.
unauthenticated_health_check | No | Boolean | Determines whether or not authentication is required on the health check endpoint. Data Prepper ignores this option if no authentication is defined. Default value is `false`.
request_timeout | No | Integer | The request timeout, in milliseconds. Default value is `10000`.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default value is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`.
max_pending_requests | No | Integer | The maximum allowed number of tasks in the `ScheduledThreadPool` work queue. Default value is `1024`.
authentication | No | Object | An authentication configuration. By default, this creates an unauthenticated server for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [ArmeriaHttpAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/ArmeriaHttpAuthenticationProvider.java).
ssl | No | Boolean | Enables TLS/SSL. Default value is false.
ssl_certificate_file | Conditionally | String | SSL certificate chain file path or Amazon Simple Storage Service (Amazon S3) path. Amazon S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to true and `use_acm_certificate_for_ssl` is set to false.
ssl_key_file | Conditionally | String | SSL key file path or Amazon S3 path. Amazon S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to true and `use_acm_certificate_for_ssl` is set to false.
use_acm_certificate_for_ssl | No | Boolean | Enables TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default value is false.
acm_certificate_arn | Conditionally | String | The ACM certificate Amazon Resource Name (ARN). The ACM certificate takes preference over Amazon S3 or a local file system certificate. Required if `use_acm_certificate_for_ssl` is set to true.
acm_private_key_password | No | String | ACM private key password that decrypts the private key. If not provided, Data Prepper generates a random password.
acm_certificate_timeout_millis | No | Integer | Timeout in milliseconds for ACM to get certificates. Default is 120000.
aws_region | Conditionally | String | AWS region to use ACM or S3. Required if `use_acm_certificate_for_ssl` is set to true or `ssl_certificate_file` and `ssl_key_file` is AWS S3 path.
acm_certificate_timeout_millis | No | Integer | The timeout, in milliseconds, for ACM to get certificates. Default value is 120000.
aws_region | Conditionally | String | The AWS Region used by ACM or Amazon S3. Required if `use_acm_certificate_for_ssl` is set to true or if `ssl_certificate_file` and `ssl_key_file` are Amazon S3 paths.
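As a rough illustration of the request format, the following Python sketch builds (but does not send) a request carrying a JSON array of records. It assumes a source listening on `localhost` with the default port `2021`, no TLS, no authentication, and the `/log/ingest` path named in the Metrics section of this page; `build_ingest_request` is a hypothetical helper, not part of Data Prepper.

```python
import json
import urllib.request


def build_ingest_request(records, host="localhost", port=2021, use_tls=False):
    """Build a POST request whose body is a UTF-8 encoded JSON array."""
    scheme = "https" if use_tls else "http"
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=f"{scheme}://{host}:{port}/log/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request([{"key1": "value1"}, {"key2": "value2"}])
# To send the request to a running source, pass it to urllib.request.urlopen().
```

The host and the helper name are placeholders; adjust them, and add TLS or authentication settings, to match your pipeline configuration.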
<!--- ## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section. --->
The `http_source` source includes the following metrics.
### Counters
- `requestsReceived`: Measures the total number of requests received by the `/log/ingest` endpoint.
- `requestsRejected`: Measures the total number of requests rejected (429 response status code) by the HTTP Source plugin.
- `successRequests`: Measures the total number of requests successfully processed (200 response status code) by the HTTP Source plugin.
- `badRequests`: Measures the total number of requests with either an invalid content type or format processed by the HTTP Source plugin (400 response status code).
- `requestTimeouts`: Measures the total number of requests that time out in the HTTP source server (415 response status code).
- `requestsTooLarge`: Measures the total number of requests where the size of the event is larger than the buffer capacity (413 response status code).
- `internalServerError`: Measures the total number of requests processed by the HTTP Source with a custom exception type (500 response status code).
### Timers
- `requestProcessDuration`: Measures the latency of requests processed by the HTTP Source plugin in seconds.
### Distribution summaries
- `payloadSize`: Measures the incoming request payload size in bytes.
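The relationship between response status codes and the counters listed above can be summarized in a small lookup table. The following Python sketch tallies a list of status codes the way the counter descriptions suggest. It is an illustration of the documented mapping only; the `tally` function is hypothetical and does not read metrics from a running Data Prepper instance.

```python
from collections import Counter

# Status code to counter name, as stated in the counter descriptions above.
STATUS_TO_COUNTER = {
    200: "successRequests",
    400: "badRequests",
    413: "requestsTooLarge",
    415: "requestTimeouts",
    429: "requestsRejected",
    500: "internalServerError",
}


def tally(status_codes):
    """Count requests per counter name for a sequence of status codes."""
    counts = Counter()
    for code in status_codes:
        counts["requestsReceived"] += 1  # every request is counted as received
        name = STATUS_TO_COUNTER.get(code)
        if name is not None:
            counts[name] += 1
    return counts


counts = tally([200, 200, 429, 413])
```

A rising `requestsRejected` or `requestsTooLarge` count in such a tally would point to back pressure or oversized events, respectively.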


@ -8,29 +8,35 @@ nav_order: 10
# otel_metrics_source
Source for the OpenTelemetry Collector for collecting metric data.
`otel_metrics_source` is an OpenTelemetry Collector source that collects metric data. The following table describes options you can use to configure the `otel_metrics_source` source.
Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port OTel metrics source is running on. Default is `21891`.
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
unframed_requests | No | Boolean | Enable requests not framed using the gRPC wire protocol.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`.
sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
port | No | Integer | The port that the OpenTelemetry metrics source runs on. Default value is `21891`.
request_timeout | No | Integer | The request timeout, in milliseconds. Default value is `10000`.
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default value is `false`.
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default value is `false`.
unframed_requests | No | Boolean | Enables requests not framed using the gRPC wire protocol.
thread_count | No | Integer | The number of threads to keep in the `ScheduledThreadPool`. Default value is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`.
ssl | No | Boolean | Enables connections to the OpenTelemetry source port over TLS/SSL. Default value is `true`.
sslKeyCertChainFile | Conditionally | String | File-system path or Amazon Simple Storage Service (Amazon S3) path to the security certificate (for example, `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`.
sslKeyFile | Conditionally | String | File-system path or Amazon S3 path to the security key (for example, `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default value is `false`.
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificates take preference over Amazon S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String | Represents the AWS Region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/org/opensearch/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
awsRegion | Conditionally | String | Represents the AWS Region used by ACM or Amazon S3. Required if `useAcmCertForSSL` is set to `true` or if `sslKeyCertChainFile` and `sslKeyFile` are Amazon S3 paths.
authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
<!--- ## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section. --->
The `otel_metrics_source` source includes the following metrics.
### Counters
- `requestTimeouts`: Measures the total number of requests that time out.
- `requestsReceived`: Measures the total number of requests received by the OpenTelemetry metrics source.


@ -11,30 +11,48 @@ nav_order: 15
## Overview
Source for the OpenTelemetry Collector.
The `otel_trace` source is a source for the OpenTelemetry Collector. The following table describes options you can use to configure the `otel_trace` source.
<!--- What does otel_trace_source do? Other plugins include that in the overview section.--->
Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port OTel trace source is running on. Default is `21890`.
request_timeout | No | Integer | The request timeout in milliseconds. Default is `10_000`.
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default is `false`.
unauthenticated_health_check | No | Boolean | Determines whether or not authentication is required on the health check endpoint. Data Prepper ignores this option if no authentication is defined. Default is `false`.
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default is `false`.
port | No | Integer | The port that the `otel_trace` source runs on. Default value is `21890`.
request_timeout | No | Integer | The request timeout, in milliseconds. Default value is `10000`.
health_check_service | No | Boolean | Enables a gRPC health check service under `grpc.health.v1/Health/Check`. Default value is `false`.
unauthenticated_health_check | No | Boolean | Determines whether or not authentication is required on the health check endpoint. Data Prepper ignores this option if no authentication is defined. Default value is `false`.
proto_reflection_service | No | Boolean | Enables a reflection service for Protobuf services (see [gRPC reflection](https://github.com/grpc/grpc/blob/master/doc/server-reflection.md) and [gRPC Server Reflection Tutorial](https://github.com/grpc/grpc-java/blob/master/documentation/server-reflection-tutorial.md) docs). Default value is `false`.
unframed_requests | No | Boolean | Enables requests that are not framed using the gRPC wire protocol.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default is `500`.
thread_count | No | Integer | The number of threads to keep in the ScheduledThreadPool. Default value is `200`.
max_connection_count | No | Integer | The maximum allowed number of open connections. Default value is `500`.
ssl | No | Boolean | Enables connections to the OTel source port over TLS/SSL. Defaults to `true`.
sslKeyCertChainFile | Conditionally | String | File-system path or AWS S3 path to the security certificate (e.g. `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`.
sslKeyFile | Conditionally | String | File-system path or AWS S3 path to the security key (e.g. `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
sslKeyCertChainFile | Conditionally | String | File system path or Amazon Simple Storage Service (Amazon S3) path to the security certificate (for example, `"config/demo-data-prepper.crt"` or `"s3://my-secrets-bucket/demo-data-prepper.crt"`). Required if `ssl` is set to `true`.
sslKeyFile | Conditionally | String | File system path or Amazon S3 path to the security key (for example, `"config/demo-data-prepper.key"` or `"s3://my-secrets-bucket/demo-data-prepper.key"`). Required if `ssl` is set to `true`.
useAcmCertForSSL | No | Boolean | Whether to enable TLS/SSL using a certificate and private key from AWS Certificate Manager (ACM). Default value is `false`.
acmCertificateArn | Conditionally | String | Represents the ACM certificate ARN. ACM certificates take precedence over Amazon S3 or local file system certificates. Required if `useAcmCertForSSL` is set to `true`.
awsRegion | Conditionally | String | Represents the AWS region to use ACM or S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are AWS S3 paths.
authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This parameter uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide customer authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/org/opensearch/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
awsRegion | Conditionally | String | Represents the AWS region used by ACM or Amazon S3. Required if `useAcmCertForSSL` is set to `true` or `sslKeyCertChainFile` and `sslKeyFile` are Amazon S3 paths.
authentication | No | Object | An authentication configuration. By default, an unauthenticated server is created for the pipeline. This parameter uses pluggable authentication for HTTPS. To use basic authentication, define the `http_basic` plugin with a `username` and `password`. To provide custom authentication, use or create a plugin that implements [GrpcAuthenticationProvider](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/armeria-common/src/main/java/com/amazon/dataprepper/armeria/authentication/GrpcAuthenticationProvider.java).
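
The options in the preceding table might be combined as in the following minimal sketch. It is illustrative only: the pipeline name `otel-trace-pipeline`, the plugin key `otel_trace_source` (the text above also calls the plugin `otel_trace`), the credentials, and the `stdout` sink are assumptions, and the certificate paths are the example values from the table.

```yaml
# Hypothetical pipeline name; choose any name for your pipeline.
otel-trace-pipeline:
  source:
    # Assumption: the plugin key is otel_trace_source.
    otel_trace_source:
      port: 21890                # Default value from the table.
      request_timeout: 10000     # In milliseconds.
      health_check_service: true
      ssl: true
      # Required because ssl is true; example paths from the table.
      sslKeyCertChainFile: "config/demo-data-prepper.crt"
      sslKeyFile: "config/demo-data-prepper.key"
      authentication:
        http_basic:
          username: my-user      # Placeholder credentials.
          password: my-password
  sink:
    # Assumption: a console sink used only for illustration.
    - stdout:
```

If `ssl` is set to `false`, the two certificate options can be omitted.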
<!--- ## Configuration
Content will be added to this section.
Content will be added to this section.--->
## Metrics
Content will be added to this section. --->
### Counters
- `requestTimeouts`: Measures the total number of requests that time out.
- `requestsReceived`: Measures the total number of requests received by the `otel_trace` source.
- `successRequests`: Measures the total number of requests successfully processed by the `otel_trace` source plugin.
- `badRequests`: Measures the total number of requests with an invalid format processed by the `otel_trace` source plugin.
- `requestsTooLarge`: Measures the total number of requests whose number of spans exceeds the buffer capacity.
- `internalServerError`: Measures the total number of requests processed by the `otel_trace` source with a custom exception type.
### Timers
- `requestProcessDuration`: Measures the latency of requests processed by the `otel_trace` source plugin in seconds.
### Distribution summaries
- `payloadSize`: Measures the incoming request payload size distribution in bytes.


@ -10,32 +10,32 @@ nav_order: 20
## Overview
This is a source plugin that reads events from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) objects.
`s3` is a source plugin that reads events from [Amazon Simple Storage Service](https://aws.amazon.com/s3/) (Amazon S3) objects. The following table describes options you can use to configure the `s3` source.
Option | Required | Type | Description
:--- | :--- | :--- | :---
notification_type | Yes | String | Must be `sqs`
compression | No | String | The compression algorithm to apply: `none`, `gzip`, or `automatic`. Default is `none`.
codec | Yes | Codec | The codec to apply. Must be `newline`, `json`, or `csv`.
sqs | Yes | sqs | The [Amazon Simple Queue Service](https://aws.amazon.com/sqs/) (Amazon SQS) configuration. See [sqs](#sqs) for details.
notification_type | Yes | String | Must be `sqs`.
compression | No | String | The compression algorithm to apply: `none`, `gzip`, or `automatic`. Default value is `none`.
codec | Yes | Codec | The codec to apply. Must be `newline`, `json`, or `csv`.
sqs | Yes | sqs | The [Amazon Simple Queue Service](https://aws.amazon.com/sqs/) (Amazon SQS) configuration. See [sqs](#sqs) for details.
aws | Yes | aws | The AWS configuration. See [aws](#aws) for details.
on_error | No | String | Determines how to handle errors in Amazon SQS. Can be either `retain_messages` or `delete_messages`. If `retain_messages`, then Data Prepper will leave the message in the SQS queue and try again. This is recommended for dead-letter queues. If `delete_messages`, then Data Prepper will delete failed messages. Default is `retain_messages`.
buffer_timeout | No | Duration | The timeout for writing events to the Data Prepper buffer. Any events that the S3Source cannot write to the buffer in this time will be discarded. Default is 10 seconds.
records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default is 100.
on_error | No | String | Determines how to handle errors in Amazon SQS. Can be either `retain_messages` or `delete_messages`. If `retain_messages`, then Data Prepper will leave the message in the Amazon SQS queue and try again. This is recommended for dead-letter queues. If `delete_messages`, then Data Prepper will delete failed messages. Default value is `retain_messages`.
buffer_timeout | No | Duration | The amount of time allowed for writing events to the Data Prepper buffer before a timeout occurs. Any events that the Amazon S3 source cannot write to the buffer in this time will be discarded. Default value is 10 seconds.
records_to_accumulate | No | Integer | The number of messages that accumulate before writing to the buffer. Default value is 100.
metadata_root_key | No | String | Base key for adding S3 metadata to each Event. The metadata includes the key and bucket for each S3 object. Defaults to `s3/`.
disable_bucket_ownership_validation | No | Boolean | If `true`, the S3Source will not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the SQS queue. Defaults to `false`.
disable_bucket_ownership_validation | No | Boolean | If `true`, the S3Source will not attempt to validate that the bucket is owned by the expected account. The expected account is the same account that owns the Amazon SQS queue. Defaults to `false`.
## sqs
The following parameters allow you to configure usage for Amazon SQS in the S3Source plugin.
The following parameters allow you to configure usage for Amazon SQS in the `s3` source plugin.
Option | Required | Type | Description
:--- | :--- | :--- | :---
queue_url | Yes | String | The URL of the Amazon SQS queue from which messages are received.
maximum_messages | No | Integer | The maximum number of messages to receive from the SQS queue in any single request. Default is `10`.
visibility_timeout | No | Duration | The visibility timeout to apply to messages read from the SQS queue. This should be set to the amount of time that Data Prepper may take to read all the S3 objects in a batch. Default is `30s`.
wait_time | No | Duration | The time to wait for long polling on the SQS API. Default is `20s`.
poll_delay | No | Duration | A delay to place between reading and processing a batch of SQS messages and making a subsequent request. Default is `0s`.
maximum_messages | No | Integer | The maximum number of messages to receive from the Amazon SQS queue in any single request. Default value is `10`.
visibility_timeout | No | Duration | The visibility timeout to apply to messages read from the Amazon SQS queue. This should be set to the amount of time that Data Prepper may take to read all the Amazon S3 objects in a batch. Default value is `30s`.
wait_time | No | Duration | The amount of time to wait for long polling on the Amazon SQS API. Default value is `20s`.
poll_delay | No | Duration | A delay to place between reading/processing a batch of Amazon SQS messages and making a subsequent request. Default value is `0s`.
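
The `s3` and `sqs` options above might fit together as in the following sketch. The pipeline name, queue URL, and `stdout` sink are placeholders. The `region` key under `aws` is an assumption, because the `aws` options are not listed in this part of the page.

```yaml
# Hypothetical pipeline name.
s3-log-pipeline:
  source:
    s3:
      notification_type: sqs     # Must be sqs.
      compression: gzip          # One of none, gzip, or automatic.
      codec:
        newline:                 # One of newline, json, or csv.
      sqs:
        # Placeholder queue URL; replace with your own.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
        maximum_messages: 10
        visibility_timeout: "30s"
        wait_time: "20s"
      aws:
        # Assumption: see the aws section for the supported keys.
        region: "us-east-1"
      on_error: retain_messages
  sink:
    # Assumption: a console sink used only for illustration.
    - stdout:
```

Setting `visibility_timeout` higher than the expected time to read a batch of objects helps prevent a message from becoming visible again while it is still being processed.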
## aws
@ -52,9 +52,9 @@ Source for flat file input.
Option | Required | Type | Description
:--- | :--- | :--- | :---
path | Yes | String | Path to the input file (e.g. `logs/my-log.log`).
format | No | String | Format of each line in the file. Valid options are `json` or `plain`. Default is `plain`.
record_type | No | String | The record type to store. Valid options are `string` or `event`. Default is `string`. If you would like to use the file source for log analytics use cases like grok, set this option to `event`.
path | Yes | String | The path to the input file (for example, `logs/my-log.log`).
format | No | String | The format of each line in the file. Valid options are `json` or `plain`. Default value is `plain`.
record_type | No | String | The record type to store. Valid options are `string` or `event`. Default value is `string`. If you would like to use the file source for log analytics use cases like grok, set this option to `event`.
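
As a rough illustration of the file source options above, the following sketch reads JSON lines as events. The pipeline name, the plugin key `file`, and the `stdout` sink are assumptions. The path is the example value from the table.

```yaml
# Hypothetical pipeline name.
file-log-pipeline:
  source:
    # Assumption: the file source plugin key is file.
    file:
      path: "logs/my-log.log"
      format: json
      record_type: event         # Use event for log analytics use cases.
  sink:
    # Assumption: a console sink used only for illustration.
    - stdout:
```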
## pipeline
@ -64,7 +64,27 @@ Option | Required | Type | Description
:--- | :--- | :--- | :---
name | Yes | String | Name of the pipeline to read from.
## Metrics
## stdin
The `s3` source includes the following metrics.
Source for console input. Can be useful for testing. No options.
### Counters
* `s3ObjectsFailed`: The number of Amazon S3 objects that the `s3` source failed to read.
* `s3ObjectsNotFound`: The number of Amazon S3 objects that the `s3` source failed to read due to an Amazon S3 "Not Found" error. These are also counted toward `s3ObjectsFailed`.
* `s3ObjectsAccessDenied`: The number of Amazon S3 objects that the `s3` source failed to read due to an "Access Denied" or "Forbidden" error. These are also counted toward `s3ObjectsFailed`.
* `s3ObjectsSucceeded`: The number of Amazon S3 objects that the `s3` source successfully read.
* `sqsMessagesReceived`: The number of Amazon SQS messages received from the queue by the `s3` source.
* `sqsMessagesDeleted`: The number of Amazon SQS messages deleted from the queue by the `s3` source.
* `sqsMessagesFailed`: The number of Amazon SQS messages that the `s3` source failed to parse.
### Timers
* `s3ObjectReadTimeElapsed`: Measures the amount of time the `s3` source takes to perform a request to GET an Amazon S3 object, parse it, and write events to the buffer.
* `sqsMessageDelay`: Measures the amount of time from when Amazon S3 creates an object to when it is fully parsed.
### Distribution summaries
* `s3ObjectSizeBytes`: Measures the size of Amazon S3 objects as reported by the Amazon S3 `Content-Length`. For compressed objects, this is the compressed size.
* `s3ObjectProcessedBytes`: Measures the bytes processed by the `s3` source for a given object. For compressed objects, this is the uncompressed size.
* `s3ObjectsEvents`: Measures the number of events (sometimes called records) produced by an Amazon S3 object.