4.0 KiB

Raw Blame History

layout	title	parent	grand_parent	nav_order
default	dissect	Processors	Pipelines	52

dissect

The dissect processor extracts values from an event and maps them to individual fields based on user-defined dissect patterns. The processor is well suited for field extraction from log messages with a known structure.

Basic usage

To use the dissect processor, create the following pipeline.yaml file:

dissect-pipeline:
  source:
    file:
      path: "/full/path/to/logs_json.log"
      record_type: "event"
      format: "json"
  processor:
    - dissect:
        map:
          log: "%{Date} %{Time} %{Log_Type}: %{Message}"
  sink:
    - stdout:

Then create the following file named logs_json.log and replace the path in the file source of your pipeline.yaml file with the path of a file containing the following JSON data:

{"log": "07-25-2023 10:00:00 ERROR: error message"}

The dissect processor will retrieve the fields (Date, Time, Log_Type, and Message) from the log message, based on the pattern %{Date} %{Time} %{Type}: %{Message} configured in the pipeline.

After running the pipeline, you should receive the following standard output:

{
    "log" : "07-25-2023 10:00:00 ERROR: Some error",
    "Date" : "07-25-2023"
    "Time" : "10:00:00"
    "Log_Type" : "ERROR"
    "Message" : "error message"
}

Configuration

You can configure the dissect processor with the following options.

Option	Required	Type	Description
`map`	Yes	Map	Defines the `dissect` patterns for specific keys. For details on how to define fields in the `dissect` pattern, see Field notations.
`target_types`	No	Map	Specifies the data types for extract fields. Valid options are `integer`, `double`, `string`, and `boolean`. By default, all fields are of the `string` type.
`dissect_when`	No	String	Specifies a condition for performing the `dissect` operation using a Data Prepper expression. If specified, the `dissect` operation will only run when the expression evaluates to true.

Field notations

You can define dissect patterns with the following field types.

Normal field

A field without a suffix or prefix. The field will be directly added to the output event. The format is %{field_name}.

Skip field

A field that will not be included in the event. The format is %{} or %{?field_name}.

Append field

A field that will be combined with other fields. To append multiple values and include the final value in the field, use + before the field name in the dissect pattern. The format is %{+field_name}.

For example, with the pattern %{+field_name}, %{+field_name}, log message "foo, bar" will parse into {"field_name": "foobar"}.

You can also define the order of the concatenation with the help of the suffix /<integer>.

For example, with a pattern "%{+field_name/2}, %{+field_name/1}", log message "foo, bar" will parse into {"field_name": "barfoo"}.

If the order is not mentioned, the append operation will occur in the order of the fields specified in the dissect pattern.

Indirect field

A field that uses the value from another field as its field name. When defining a pattern, prefix the field with a & to assign the value found in the field as the key in the key-value pair.

For example, with a pattern "%{?field_name}, %{&field_name}", the log message "foo, bar" will parse into {“foo”: “bar”}. In the log message, foo is captured from the skip field %{?field_name}. foo then serves as the key to the value captured from the field %{&field_name}.

Padded field

A field with the paddings to the right removed. The -> operator can be used as a suffix to indicate that white spaces after this field can be ignored.

For example, with a pattern %{field1->} %{field2}, log message “firstname lastname” will parse into {“field1”: “firstname”, “field2”: “lastname”}.

4.0 KiB Raw Blame History