Added documentation for grok_when option (#5912)

* Added documentation for grok_when option

Signed-off-by: Kondaka <krishkdk@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _data-prepper/pipelines/configuration/processors/grok.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update grok.md

Address editorial review feedback.

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update grok.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
kkondaka 2023-12-21 12:39:11 -08:00 committed by GitHub
parent 76276b2eb4
commit e63b67b873
1 changed files with 26 additions and 10 deletions


@ -1,27 +1,27 @@
---
layout: default
title: grok
title: Grok
parent: Processors
grand_parent: Pipelines
nav_order: 54
---
# grok
# Grok
The `Grok` processor takes unstructured data and utilizes pattern matching to structure and extract important keys.
The Grok processor uses pattern matching to structure and extract important keys from unstructured data.
## Configuration
The following table describes options you can use with the `Grok` processor to structure your data and make your data easier to query.
The following table describes options you can use with the Grok processor to structure your data and make your data easier to query.
Option | Required | Type | Description
:--- | :--- | :--- | :---
match | No | Map | Specifies which keys to match specific patterns against. Default value is an empty body.
keep_empty_captures | No | Boolean | Enables preserving `null` captures. Default value is `false`.
named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default value is `true`.
break_on_match | No | Boolean | Specifies whether to match all patterns or stop once the first successful match is found. Default value is `true`.
grok_when | No | String | Specifies under what condition the `Grok` processor should perform matching. Default is no condition.
keep_empty_captures | No | Boolean | Enables the preservation of `null` captures. Default value is `false`.
keys_to_overwrite | No | List | Specifies which existing keys will be overwritten if there is a capture with the same key value. Default value is `[]`.
match | No | Map | Specifies which keys to match specific patterns against. Default value is an empty body.
named_captures_only | No | Boolean | Specifies whether to keep only named captures. Default value is `true`.
pattern_definitions | No | Map | Allows for custom pattern use inline. Default value is an empty body.
patterns_directories | No | List | Specifies the paths of directories that contain custom pattern files. Default value is an empty list.
pattern_files_glob | No | String | Specifies which pattern files to use from the directories specified for `patterns_directories`. Default value is `*`.
@ -32,6 +32,22 @@ timeout_millis | No | Integer | The maximum amount of time during which matching
Content will be added to this section.--->
## Conditional grok
The Grok processor can be configured to run conditionally by using the `grok_when` option. The following is an example Grok processor configuration that uses `grok_when`:
```
processor:
   - grok:
       grok_when: '/type == "ipv4"'
       match:
         message: ['%{IPV4:clientip} %{WORD:request} %{POSINT:bytes}']
   - grok:
       grok_when: '/type == "ipv6"'
       match:
         message: ['%{IPV6:clientip} %{WORD:request} %{POSINT:bytes}']
```
The `grok_when` option can take a conditional expression. This expression is detailed in the [Expression syntax](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/) documentation.
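To make the gating behavior concrete, the following is a minimal Python sketch of the idea. It is not Data Prepper code. The function `conditional_grok`, the predicate `is_ipv4`, and the simplified regular expressions standing in for the `IPV4`, `WORD`, and `POSINT` patterns are all hypothetical and exist only for illustration. The sketch assumes that an event whose condition evaluates to false passes through unchanged and that captured values are kept as strings.

```python
import re

# Simplified stand-ins for the IPV4, WORD, and POSINT grok patterns.
# These are assumptions for illustration, not the real pattern definitions.
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
WORD = r"\w+"
POSINT = r"[1-9][0-9]*"

# Rough equivalent of '%{IPV4:clientip} %{WORD:request} %{POSINT:bytes}'.
IPV4_LINE = re.compile(
    rf"(?P<clientip>{IPV4}) (?P<request>{WORD}) (?P<bytes>{POSINT})"
)


def conditional_grok(event, when, pattern, source_key="message"):
    """Return a copy of event, adding named captures only if when(event) is true."""
    result = dict(event)
    if not when(event):
        # Condition is false: skip matching and leave the event as it was.
        return result
    match = pattern.search(str(event.get(source_key, "")))
    if match:
        # Named captures become new keys; values stay strings.
        result.update(match.groupdict())
    return result


def is_ipv4(event):
    """Hypothetical stand-in for the expression '/type == "ipv4"'."""
    return event.get("type") == "ipv4"


matched = conditional_grok(
    {"type": "ipv4", "message": "10.0.0.1 GET 512"}, is_ipv4, IPV4_LINE
)
skipped = conditional_grok(
    {"type": "ipv6", "message": "10.0.0.1 GET 512"}, is_ipv4, IPV4_LINE
)
```

In this sketch, `matched` gains the `clientip`, `request`, and `bytes` keys, while `skipped` is returned with only its original `type` and `message` keys because its condition is false.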
## Metrics
The following table describes common [Abstract processor](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-api/src/main/java/org/opensearch/dataprepper/model/processor/AbstractProcessor.java) metrics.
@ -42,7 +58,7 @@ The following table describes common [Abstract processor](https://github.com/ope
| `recordsOut` | Counter | Metric representing the egress of records from a pipeline component. |
| `timeElapsed` | Timer | Metric representing the time elapsed during execution of a pipeline component. |
The `Grok` processor includes the following custom metrics.
The Grok processor includes the following custom metrics.
### Counter
@ -53,4 +69,4 @@ The `Grok` processor includes the following custom metrics.
### Timer
* `grokProcessingTime`: The time taken by individual records to match against patterns from `match`. The `avg` metric is the most useful metric for this timer because it provides you with an average value of the time it takes records to match.
* `grokProcessingTime`: The time taken by individual records to match against `match` patterns. The `avg` metric is the most useful metric for this timer because it provides the average time taken to match records.