Add docs for Data Prepper dissect processor (#5159)
* Add docs for dissect processor
Signed-off-by: Hai Yan <oeyh@amazon.com>

* Fix some style issues
Signed-off-by: Hai Yan <oeyh@amazon.com>

* Apply suggestions from code review
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update _data-prepper/pipelines/configuration/processors/dissect.md
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Hai Yan <oeyh@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent a52397e31f
commit 66db871345
@@ -0,0 +1,96 @@
---
layout: default
title: dissect
parent: Processors
grand_parent: Pipelines
nav_order: 52
---

# dissect

The `dissect` processor extracts values from an event and maps them to individual fields based on user-defined `dissect` patterns. The processor is well suited for field extraction from log messages with a known structure.

## Basic usage

To use the `dissect` processor, create the following `pipeline.yaml` file:

```yaml
dissect-pipeline:
  source:
    file:
      path: "/full/path/to/logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - dissect:
        map:
          log: "%{Date} %{Time} %{Log_Type}: %{Message}"
  sink:
    - stdout:
```

Then create a file named `logs_json.log` containing the following JSON data, and set the `path` option in the `file` source of your `pipeline.yaml` file to that file's location:

```
{"log": "07-25-2023 10:00:00 ERROR: error message"}
```

The `dissect` processor will retrieve the fields (`Date`, `Time`, `Log_Type`, and `Message`) from the `log` message based on the pattern `%{Date} %{Time} %{Log_Type}: %{Message}` configured in the pipeline.

After running the pipeline, you should receive the following standard output:

```
{
    "log": "07-25-2023 10:00:00 ERROR: error message",
    "Date": "07-25-2023",
    "Time": "10:00:00",
    "Log_Type": "ERROR",
    "Message": "error message"
}
```

## Configuration

You can configure the `dissect` processor with the following options.

| Option | Required | Type | Description |
| :--- | :--- | :--- | :--- |
| `map` | Yes | Map | Defines the `dissect` patterns for specific keys. For details on how to define fields in the `dissect` pattern, see [Field notations](#field-notations). |
| `target_types` | No | Map | Specifies the data types for extracted fields. Valid options are `integer`, `double`, `string`, and `boolean`. By default, all fields are of the `string` type. |
| `dissect_when` | No | String | Specifies a condition for performing the `dissect` operation using a [Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). If specified, the `dissect` operation will only run when the expression evaluates to true. |
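
As an illustration, the options can be combined in a single processor entry. The following sketch is hypothetical: the `Status_Code` field and the `source` key are assumptions for illustration, not part of the example log above:

```yaml
processor:
  - dissect:
      map:
        # Hypothetical log format that includes a numeric status code.
        log: "%{Date} %{Time} %{Status_Code} %{Message}"
      target_types:
        # Cast the extracted Status_Code field from the default string type to integer.
        Status_Code: "integer"
      # Run the dissect operation only when the (hypothetical) source key equals "application".
      dissect_when: '/source == "application"'
```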

### Field notations

You can define `dissect` patterns with the following field types.

#### Normal field

A field without a suffix or prefix. The field will be directly added to the output event. The format is `%{field_name}`.
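
For example, with the pattern `%{Date} %{Time}`, the log message `"07-25-2023 10:00:00"` will parse into `{"Date": "07-25-2023", "Time": "10:00:00"}`.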

#### Skip field

A field that will not be included in the event. The format is `%{}` or `%{?field_name}`.
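
For example, with the pattern `%{Date} %{?Time} %{Log_Type}`, the log message `"07-25-2023 10:00:00 ERROR"` will parse into `{"Date": "07-25-2023", "Log_Type": "ERROR"}`. The time value is matched but omitted from the event; `%{}` skips a position in the same way without naming it.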

#### Append field

A field that will be combined with other fields. To append multiple values and include the final value in the field, use `+` before the field name in the `dissect` pattern. The format is `%{+field_name}`.

For example, with the pattern `%{+field_name}, %{+field_name}`, the log message `"foo, bar"` will parse into `{"field_name": "foobar"}`.

You can also define the order of concatenation by appending the suffix `/<integer>` to the field name.

For example, with the pattern `"%{+field_name/2}, %{+field_name/1}"`, the log message `"foo, bar"` will parse into `{"field_name": "barfoo"}`.

If no order is specified, values are appended in the order in which the fields appear in the `dissect` pattern.

#### Indirect field

A field that uses the value from another field as its field name. When defining a pattern, prefix the field with `&` to assign the value found in the field as the key in the key-value pair.

For example, with the pattern `"%{?field_name}, %{&field_name}"`, the log message `"foo, bar"` will parse into `{"foo": "bar"}`. In the log message, `foo` is captured from the skip field `%{?field_name}`. `foo` then serves as the key to the value captured from the field `%{&field_name}`.

#### Padded field

A field with padding on the right removed. The `->` operator can be used as a suffix to indicate that white space after this field can be ignored.

For example, with the pattern `%{field1->} %{field2}`, the log message `"firstname   lastname"` will parse into `{"field1": "firstname", "field2": "lastname"}`.

@@ -3,7 +3,7 @@ layout: default
title: drop_events
parent: Processors
grand_parent: Pipelines
-nav_order: 52
+nav_order: 53
---

# drop_events

@@ -3,7 +3,7 @@ layout: default
title: grok
parent: Processors
grand_parent: Pipelines
-nav_order: 53
+nav_order: 54
---

# grok

@@ -3,7 +3,7 @@ layout: default
title: key_value
parent: Processors
grand_parent: Pipelines
-nav_order: 54
+nav_order: 56
---

# key_value

@@ -3,7 +3,7 @@ layout: default
title: list_to_map
parent: Processors
grand_parent: Pipelines
-nav_order: 55
+nav_order: 58
---

# list_to_map