Merge pull request #16946 from dedemorton/ingest_doc_edit

Improve the ingest documentation.
This commit is contained in:
Martijn van Groningen 2016-03-04 11:49:19 +01:00
commit 116acee1dd
4 changed files with 220 additions and 158 deletions


@ -1,10 +1,10 @@
[[ingest-attachment]]
=== Ingest Attachment Processor Plugin
The ingest attachment plugin lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by
using the Apache text extraction library http://lucene.apache.org/tika/[Tika].
You can use the ingest attachment plugin as a replacement for the mapper attachment plugin.

The source field must be a base64 encoded binary.
@ -16,7 +16,7 @@ The source field must be a base64 encoded binary.
| `source_field` | yes | - | The field to get the base64 encoded field from
| `target_field` | no | attachment | The field that will hold the attachment information
| `indexed_chars` | no | 100000 | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit.
| `fields` | no | all | Properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language`
|======

[source,js]


@ -7,7 +7,7 @@ This processor adds this information by default under the `geoip` field.
The ingest-geoip plugin ships by default with the GeoLite2 City and GeoLite2 Country geoip2 databases from Maxmind, made available
under the CCA-ShareAlike 3.0 license. For more details, see http://dev.maxmind.com/geoip/geoip2/geolite2/

The GeoIP processor can run with other geoip2 databases from Maxmind. The files must be copied into the geoip config directory,
and the `database_file` option should be used to specify the filename of the custom database. The geoip config directory
is located at `$ES_HOME/config/ingest/geoip` and holds the shipped databases too.
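As a hedged sketch, a processor configuration pointing at a custom database might look like the following; the `source_field` option name and the database filename here are illustrative assumptions and may differ in your version:

[source,js]
--------------------------------------------------
{
  "geoip" : {
    "source_field" : "ip",
    "database_file" : "GeoLite2-City.mmdb"
  }
}
--------------------------------------------------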
@ -24,13 +24,13 @@ is located at `$ES_HOME/config/ingest/geoip` and holds the shipped databases too
*Depends on what is available in `database_field`:

* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which fields were configured in `fields`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which fields were configured in `fields`.

Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:

[source,js]
--------------------------------------------------
@ -46,7 +46,7 @@ An example that uses the default city database and adds the geographical informa
}
--------------------------------------------------

Here is an example that uses the default country database and adds the geographical information to the `geo` field based on the `ip` field:

[source,js]
--------------------------------------------------


@ -3,22 +3,25 @@
[partintro]
--
You can use ingest node to pre-process documents before the actual indexing takes place.
This pre-processing happens on an ingest node that intercepts bulk and index requests, applies the
transformations, and then passes the documents back to the index or bulk APIs.

You can enable ingest on any node or even have dedicated ingest nodes. Ingest is enabled by default
on all nodes. To disable ingest on a node, configure the following setting in the `elasticsearch.yml` file:

[source,yaml]
--------------------------------------------------
node.ingest: false
--------------------------------------------------
To pre-process documents before indexing, you <<pipe-line,define a pipeline>> that specifies
a series of <<ingest-processors,processors>>. Each processor transforms the document in some way.
For example, you may have a pipeline that consists of one processor that removes a field from
the document, followed by another processor that renames a field.
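As a sketch of that scenario, such a pipeline might look like the following; the field names and the exact `remove` and `rename` processor option names are illustrative assumptions that may vary by version:

[source,js]
--------------------------------------------------
{
  "description" : "removes one field, then renames another",
  "processors" : [
    {
      "remove" : {
        "field" : "debug_info"
      }
    },
    {
      "rename" : {
        "field" : "hostname",
        "to" : "host"
      }
    }
  ]
}
--------------------------------------------------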
To use a pipeline, you simply specify the `pipeline` parameter on an index or bulk request to
tell the ingest node which pipeline to use. For example:

[source,js]
--------------------------------------------------
@ -29,6 +32,8 @@ PUT /my-index/my-type/my-id?pipeline=my_pipeline_id
--------------------------------------------------
// AUTOSENSE

See <<ingest-apis,Ingest APIs>> for more information about adding, retrieving, and deleting pipelines.
--
include::ingest/ingest-node.asciidoc[]


@ -1,8 +1,10 @@
[[pipe-line]]
== Pipeline Definition
A pipeline is a definition of a series of <<ingest-processors, processors>> that are to be executed
in the same order as they are declared. A pipeline consists of two main fields: a `description`
and a list of `processors`:

[source,js]
--------------------------------------------------
{
@ -12,16 +14,25 @@ executed in the same sequential order as they are declared.
--------------------------------------------------

The `description` is a special field to store a helpful description of
what the pipeline does.

The `processors` parameter defines a list of processors to be executed in
order.
[[ingest-apis]]
== Ingest APIs
The following ingest APIs are available for managing pipelines:

* <<put-pipeline-api>> to add or update a pipeline
* <<get-pipeline-api>> to return a specific pipeline
* <<delete-pipeline-api>> to delete a pipeline
* <<simulate-pipeline-api>> to simulate a call to a pipeline
[[put-pipeline-api]]
=== Put Pipeline API
The put pipeline API adds pipelines and updates existing pipelines in the cluster.
[source,js]
--------------------------------------------------
@ -40,12 +51,13 @@ PUT _ingest/pipeline/my-pipeline-id
--------------------------------------------------
// AUTOSENSE

NOTE: The put pipeline API also instructs all ingest nodes to reload their in-memory representation of pipelines, so that
pipeline changes take effect immediately.

[[get-pipeline-api]]
=== Get Pipeline API
The get pipeline API returns pipelines based on ID. This API always returns a local reference of the pipeline.
[source,js]
--------------------------------------------------
@ -75,13 +87,14 @@ Example response:
}
--------------------------------------------------

For each returned pipeline, the source and the version are returned.
The version is useful for knowing which version of the pipeline the node has.
You can specify multiple IDs to return more than one pipeline. Wildcards are also supported.
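For example, assuming two hypothetical pipelines named `logs-pipeline` and `metrics-pipeline` exist, either of the following requests would return both of them:

[source,js]
--------------------------------------------------
GET _ingest/pipeline/logs-pipeline,metrics-pipeline
GET _ingest/pipeline/*-pipeline
--------------------------------------------------
// AUTOSENSE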
[[delete-pipeline-api]]
=== Delete Pipeline API
The delete pipeline API deletes pipelines by ID.

[source,js]
--------------------------------------------------
@ -89,16 +102,18 @@ DELETE _ingest/pipeline/my-pipeline-id
--------------------------------------------------
// AUTOSENSE

[[simulate-pipeline-api]]
=== Simulate Pipeline API
The simulate pipeline API executes a specific pipeline against
the set of documents provided in the body of the request.

You can either specify an existing pipeline to execute
against the provided documents, or supply a pipeline definition in
the body of the request.

Here is the structure of a simulate request with a pipeline definition provided
in the body of the request:
[source,js]
--------------------------------------------------
@ -115,7 +130,7 @@ POST _ingest/pipeline/_simulate
}
--------------------------------------------------

Here is the structure of a simulate request against an existing pipeline:

[source,js]
--------------------------------------------------
@ -130,7 +145,8 @@ POST _ingest/pipeline/my-pipeline-id/_simulate
--------------------------------------------------

Here is an example of a simulate request with a pipeline defined in the request
and its response:

[source,js]
--------------------------------------------------
@ -170,7 +186,7 @@ POST _ingest/pipeline/_simulate
--------------------------------------------------
// AUTOSENSE

Response:

[source,js]
--------------------------------------------------
@ -216,13 +232,14 @@ response:
}
--------------------------------------------------

[[ingest-verbose-param]]
==== Viewing Verbose Results
You can use the simulate pipeline API to see how each processor affects the ingest document
as it passes through the pipeline. To see the intermediate results of
each processor in the simulate request, you can add the `verbose` parameter
to the request.

Here is an example of a verbose request and its response:

[source,js]
--------------------------------------------------
@ -268,7 +285,7 @@ POST _ingest/pipeline/_simulate?verbose
--------------------------------------------------
// AUTOSENSE

Response:

[source,js]
--------------------------------------------------
@ -364,12 +381,16 @@ response:
}
--------------------------------------------------

[[accessing-data-in-pipelines]]
== Accessing Data in Pipelines
The processors in a pipeline have read and write access to documents that pass through the pipeline.
The processors can access fields in the source of a document and the document's metadata fields.

[float]
[[accessing-source-fields]]
=== Accessing Fields in the Source
Accessing a field in the source is straightforward. You simply refer to fields by
their name. For example:
[source,js]
@ -382,7 +403,7 @@ their name. For example:
}
--------------------------------------------------

On top of this, fields from the source are always accessible via the `_source` prefix:

[source,js]
--------------------------------------------------
@ -394,11 +415,14 @@ On top of this fields from the source are always accessible via the `_source` pr
}
--------------------------------------------------

[float]
[[accessing-metadata-fields]]
=== Accessing Metadata Fields
You can access metadata fields in the same way that you access fields in the source. This
is possible because Elasticsearch doesn't allow fields in the source that have the
same name as metadata fields.

The following example sets the `_id` metadata field of a document to `1`:

[source,js]
--------------------------------------------------
@ -411,15 +435,20 @@ The following example sets the id of a document to `1`:
--------------------------------------------------

The following metadata fields are accessible by a processor: `_index`, `_type`, `_id`, `_routing`, `_parent`,
`_timestamp`, and `_ttl`.

[float]
[[accessing-ingest-metadata]]
=== Accessing Ingest Metadata Fields
Beyond metadata fields and source fields, ingest also adds ingest metadata to the documents that it processes.
These metadata properties are accessible under the `_ingest` key. Currently ingest adds the ingest timestamp
under the `_ingest.timestamp` key of the ingest metadata. The ingest timestamp is the time when Elasticsearch
received the index or bulk request to pre-process the document.

Any processor can add ingest-related metadata during document processing. Ingest metadata is transient
and is lost after a document has been processed by the pipeline. Therefore, ingest metadata won't be indexed.

The following example adds a field with the name `received`. The value is the ingest timestamp:

[source,js]
--------------------------------------------------
@ -431,15 +460,18 @@ The following example adds a field with the name `received` and the value is the
}
--------------------------------------------------

Unlike Elasticsearch metadata fields, the ingest metadata field name `_ingest` can be used as a valid field name
in the source of a document. Use `_source._ingest` to refer to the field in the source document. Otherwise, `_ingest`
will be interpreted as an ingest metadata field.
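For instance, the following hedged sketch uses a hypothetical `remove` processor (the option name is an assumption) to delete a field literally named `_ingest` from the source; without the `_source` prefix, the processor would target the ingest metadata instead:

[source,js]
--------------------------------------------------
{
  "remove" : {
    "field" : "_source._ingest"
  }
}
--------------------------------------------------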
[float]
[[accessing-template-fields]]
=== Accessing Fields and Metafields in Templates
A number of processor settings also support templating. Settings that support templating can have zero or more
template snippets. A template snippet begins with `{{` and ends with `}}`.
Accessing fields and metafields in templates is exactly the same as via regular processor field settings.

The following example adds a field named `field_c`. Its value is a concatenation of
the values of `field_a` and `field_b`.

[source,js]
@ -452,8 +484,8 @@ the values of `field_a` and `field_b`.
}
--------------------------------------------------

The following example uses the value of the `geoip.country_iso_code` field in the source
to set the index that the document will be indexed into:

[source,js]
--------------------------------------------------
@ -466,25 +498,25 @@ to depends on the field in the source with name `geoip.country_iso_code`.
-------------------------------------------------- --------------------------------------------------
[[handling-failure-in-pipelines]]
== Handling Failures in Pipelines
In its simplest use case, a pipeline defines a list of processors that
are executed sequentially, and processing halts at the first exception. This
behavior may not be desirable when failures are expected. For example, you may have logs
that don't match the specified grok expression. Instead of halting execution, you may
want to index such documents into a separate index.

To enable this behavior, you can use the `on_failure` parameter. The `on_failure` parameter
defines a list of processors to be executed immediately following the failed processor.
You can specify this parameter at the pipeline level, as well as at the processor
level. If a processor specifies an `on_failure` configuration, whether
it is empty or not, any exceptions that are thrown by the processor are caught, and the
pipeline continues executing the remaining processors. Because you can define further processors
within the scope of an `on_failure` statement, you can nest failure handling.

The following example defines a pipeline that renames the `foo` field in
the processed document to `bar`. If the document does not contain the `foo` field, the processor
attaches an error message to the document for later analysis within
Elasticsearch.

[source,js]
@ -510,8 +542,8 @@ Elasticsearch.
}
--------------------------------------------------

The following example defines an `on_failure` block on a whole pipeline to change
the index to which failed documents get sent.

[source,js]
--------------------------------------------------
@ -529,15 +561,18 @@ the index for which failed documents get sent.
}
--------------------------------------------------

[float]
[[accessing-error-metadata]]
=== Accessing Error Metadata From Processors Handling Exceptions
You may want to retrieve the actual error message that was thrown
by a failed processor. To do so, you can access metadata fields called
`on_failure_message`, `on_failure_processor_type`, and `on_failure_processor_tag`. These fields are only accessible
from within the context of an `on_failure` block.

Here is an updated version of the example that you
saw earlier. Instead of setting the error message manually, the example leverages the `on_failure_message`
metadata field to provide the error message.

[source,js]
--------------------------------------------------
@ -562,6 +597,7 @@ of manually setting it.
}
--------------------------------------------------

[[ingest-processors]]
== Processors
All processors are defined in the following way within a pipeline definition:
@ -579,15 +615,16 @@ Each processor defines its own configuration parameters, but all processors have
the ability to declare `tag` and `on_failure` fields. These fields are optional.

A `tag` is simply a string identifier of the specific instantiation of a certain
processor in a pipeline. The `tag` field does not affect the processor's behavior,
but is very useful for bookkeeping and tracing errors to specific processors.

See <<handling-failure-in-pipelines>> to learn more about the `on_failure` field and error handling in pipelines.
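As an illustration, the following sketch attaches a `tag` to a single `rename` processor instance; the rename option names (`field`, `to`) are assumptions and may vary by version:

[source,js]
--------------------------------------------------
{
  "rename" : {
    "tag" : "rename-hostname",
    "field" : "hostname",
    "to" : "host"
  }
}
--------------------------------------------------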
[[append-processor]]
=== Append Processor
Appends one or more values to an existing array if the field already exists and it is an array.
Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar.
Creates an array containing the provided values if the field doesn't exist.
Accepts a single value or an array of values.

[[append-options]]
@ -609,14 +646,15 @@ Accepts a single value or an array of values.
}
--------------------------------------------------

[[convert-processor]]
=== Convert Processor
Converts an existing field's value to a different type, such as converting a string to an integer.
If the field value is an array, all members will be converted.

The supported types include: `integer`, `float`, `string`, and `boolean`.

Specifying `boolean` will set the field to `true` if its string value is equal to `true` (ignoring case),
and to `false` if its string value is equal to `false` (ignoring case). Any other value throws an exception.

[[convert-options]]
.Convert Options
@ -637,12 +675,14 @@ false if its string value is equal to `false` (ignore case) and it will throw ex
}
--------------------------------------------------

[[date-processor]]
=== Date Processor
Parses dates from fields, and then uses the date or timestamp as the timestamp for the document.

By default, the date processor adds the parsed date as a new field called `@timestamp`. You can specify a
different field by setting the `target_field` configuration parameter. Multiple date formats are supported
as part of the same date processor definition. They will be used sequentially to attempt parsing the date field,
in the same order they were defined as part of the processor definition.
[[date-options]]
.Date options
@ -651,12 +691,12 @@ sequentially to attempt parsing the date field, in the same order they were defi
| Name | Required | Default | Description
| `match_field` | yes | - | The field to get the date from.
| `target_field` | no | @timestamp | The field that will hold the parsed date.
| `match_formats` | yes | - | An array of the expected date formats. Can be a Joda pattern or one of the following formats: ISO8601, UNIX, UNIX_MS, or TAI64N.
| `timezone` | no | UTC | The timezone to use when parsing the date.
| `locale` | no | ENGLISH | The locale to use when parsing the date, relevant when parsing month names or week days.
|======
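The sequential matching behaviour described above can be sketched in Python. This is an illustration of the semantics only, not the actual implementation, and it uses Python `strptime` directives rather than Joda patterns:

```python
from datetime import datetime

def parse_date(value, formats):
    """Try each configured format in order; the first one that parses wins."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unable to parse date: " + value)

# The first format fails to match, so the second one is used.
timestamp = parse_date("04/03/2016 11:49:19", ["%Y-%m-%d", "%d/%m/%Y %H:%M:%S"])
```

Because the formats are tried in definition order, listing the most common format first avoids needless failed parse attempts.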

Here is an example that adds the parsed date to the `timestamp` field based on the `initial_date` field:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "date" : {
        "match_field" : "initial_date",
        "target_field" : "timestamp",
        "match_formats" : ["dd/MM/yyyy hh:mm:ss"],
        "timezone" : "Europe/Amsterdam"
      }
    }
  ]
}
--------------------------------------------------

[[fail-processor]]
=== Fail Processor
Raises an exception. This is useful for when
you expect a pipeline to fail and want to relay a specific message
to the requester.

[[fail-options]]
.Fail Options
[options="header"]
|======
| Name | Required | Default | Description
| `message` | yes | - | The error message of the exception thrown by the processor
|======

[source,js]
--------------------------------------------------
{
  "fail": {
    "message": "an error message"
  }
}
--------------------------------------------------

[[foreach-processor]]
=== Foreach Processor
Processes elements in an array of unknown length.

All processors can operate on elements inside an array, but if all elements of an array need to
be processed in the same way, defining a processor for each element becomes cumbersome and tricky
because it is likely that the number of elements in an array is unknown. For this reason the `foreach`
processor exists. By specifying the field holding array elements and a list of processors that
define what should happen to each element, array fields can easily be preprocessed.

Processors inside the foreach processor work in a different context, and the only valid top-level
field is `_value`, which holds the array element value. Under this field other fields may exist.

If the `foreach` processor fails to process an element inside the array, and no `on_failure` processor has been specified,
then it aborts the execution and leaves the array unmodified.
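A rough Python model of this abort behaviour (illustrative only, not the actual implementation):

```python
def foreach(document, field, processor):
    """Apply processor to every element; commit the new array only if all succeed."""
    processed = []
    for element in document[field]:
        processed.append(processor(element))  # an exception here aborts the loop
    document[field] = processed  # only reached when every element succeeded

doc = {"values": ["foo", "bar", "baz"]}
try:
    foreach(doc, "values", str.upper)
except Exception:
    pass  # with no on_failure handler, the array is left as it was
```

If any element raises, the assignment back to the document never happens, which is why a failing `foreach` without `on_failure` leaves the array unmodified.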

[[foreach-options]]
.Foreach Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The array field
| `processors` | yes | - | The processors
|======

Assume the following document:

[source,js]
--------------------------------------------------
{
  "values" : ["foo", "bar", "baz"]
}
--------------------------------------------------

When this `foreach` processor operates on this sample document:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "values",
    "processors" : [
      {
        "uppercase" : {
          "field" : "_value"
        }
      }
    ]
  }
}
--------------------------------------------------

Then the document will look like this after preprocessing:

[source,js]
--------------------------------------------------
{
  "values" : ["FOO", "BAR", "BAZ"]
}
--------------------------------------------------

Let's take a look at another example:

[source,js]
--------------------------------------------------
{
  "persons" : [
    {
      "id" : "1",
      "name" : "John Doe"
    },
    {
      "id" : "2",
      "name" : "Jane Doe"
    }
  ]
}
--------------------------------------------------

In this case, the `id` field needs to be removed,
so the following `foreach` processor is used:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "persons",
    "processors" : [
      {
        "remove" : {
          "field" : "_value.id"
        }
      }
    ]
  }
}
--------------------------------------------------

After preprocessing the result is:

[source,js]
--------------------------------------------------
{
  "persons" : [
    {
      "name" : "John Doe"
    },
    {
      "name" : "Jane Doe"
    }
  ]
}
--------------------------------------------------

As for any processor, you can define `on_failure` processors
in processors that are wrapped inside the `foreach` processor.
For example, the `id` field may not exist on all person objects.
Instead of failing the index request, you can use an `on_failure`
block to send the document to the 'failure_index' index for later inspection:

[source,js]
--------------------------------------------------
{
  "foreach" : {
    "field" : "persons",
    "processors" : [
      {
        "remove" : {
          "field" : "_value.id",
          "on_failure" : [
            {
              "set" : {
                "field" : "_index",
                "value" : "failure_index"
              }
            }
          ]
        }
      }
    ]
  }
}
--------------------------------------------------

In this example, if the `remove` processor does fail, then
the array elements that have been processed thus far will
be updated.

[[grok-processor]]
=== Grok Processor
Extracts structured fields out of a single text field within a document. You choose which field to
extract matched fields from, as well as the grok pattern you expect will match. A grok pattern is like a regular
expression that supports aliased expressions that can be reused.

This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format
that is generally written for humans and not computer consumption.

The processor comes packaged with a base set of reusable patterns that are located in `$ES_HOME/config/ingest/grok/patterns`.
Here, you can add your own custom grok pattern files with custom grok expressions to be used by the processor.

If you need help building patterns to match your logs, you will find the <http://grokdebug.herokuapp.com> and
<http://grokconstructor.appspot.com/> applications quite useful!
[[grok-basics]]
==== Grok Basics

Grok sits on top of regular expressions, so any regular expressions are valid in grok as well.
The regular expression library is Oniguruma, and you can see the full supported regexp syntax
https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Onigiruma site].

Grok works by leveraging this regular expression language to allow naming existing patterns and combining them into more
complex patterns that match your fields.

The syntax for reusing a grok pattern comes in three forms: `%{SYNTAX:SEMANTIC}`, `%{SYNTAX}`, `%{SYNTAX:SEMANTIC:TYPE}`.

The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER`
pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match. `NUMBER` and `IP` are both
patterns that are provided within the default patterns set.

The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the
duration of an event, so you could call it simply `duration`. Further, a string `55.3.244.1` might identify
the `client` making a request.

The `TYPE` is the type you wish to cast your named field. `int` and `float` are currently the only types supported for coercion.

For example, you might want to match the following text:

[source,js]
--------------------------------------------------
3.44 55.3.244.1
--------------------------------------------------

You may know that the message in the example is a number followed by an IP address. You can match this text by using the following
Grok expression.

[source,js]
--------------------------------------------------
%{NUMBER:duration} %{IP:client}
--------------------------------------------------
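Conceptually, each `%{SYNTAX:SEMANTIC}` reference expands into a named capture group in the underlying regular expression. Here is a rough sketch of that idea in Python; the `NUMBER` and `IP` expressions below are simplified stand-ins, not the processor's real pattern definitions:

```python
import re

# Simplified stand-ins for the real NUMBER and IP grok patterns.
PATTERNS = {
    "NUMBER": r"\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

def expand_grok(expr):
    """Rewrite %{SYNTAX:SEMANTIC} references into named capture groups."""
    def repl(m):
        syntax, semantic = m.group(1), m.group(2)
        return "(?P<%s>%s)" % (semantic, PATTERNS[syntax])
    return re.sub(r"%\{(\w+):(\w+)\}", repl, expr)

regex = expand_grok("%{NUMBER:duration} %{IP:client}")
match = re.match(regex, "3.44 55.3.244.1")
fields = match.groupdict()  # {'duration': '3.44', 'client': '55.3.244.1'}
```

The named captures are what become new fields in the document; the `SEMANTIC` part supplies the field names.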
[[custom-patterns]]
==== Custom Patterns and Pattern Files

The Grok processor comes pre-packaged with a base set of pattern files. These patterns may not always have
what you are looking for. These pattern files have a very basic format. Each line describes a named pattern with
the following format:

[source,js]
--------------------------------------------------
NAME ' '+ PATTERN '\n'
--------------------------------------------------

You can add new patterns to an existing file, or add your own file in the patterns directory here: `$ES_HOME/config/ingest/grok/patterns`.
Ingest node picks up files in this directory and loads the patterns into the grok processor's known patterns.
These patterns are loaded at startup, so you need to restart your ingest node if you want to update these files.

Here is an example snippet of pattern definitions found in the `grok-patterns` patterns file:

[source,js]
--------------------------------------------------
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
--------------------------------------------------

[[using-grok]]
==== Using the Grok Processor in a Pipeline

[[grok-options]]
.Grok Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to use for grok expression parsing
| `pattern` | yes | - | The grok expression to match and extract named captures with
| `pattern_definitions` | no | - | A map of pattern-name and pattern pairs defining custom patterns to be used by the current processor
|======

Here is an example of using the provided patterns to extract out and name structured fields from a string field in
a document.

[source,js]
--------------------------------------------------
{
  "message": "55.3.244.1 GET /index.html 15824 0.043"
}
--------------------------------------------------

The pattern for this could be:

[source,js]
--------------------------------------------------
%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
--------------------------------------------------

Here is an example pipeline for processing the above document by using Grok:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
      }
    }
  ]
}
--------------------------------------------------

This pipeline will insert these named captures as new fields within the document, like so:

[source,js]
--------------------------------------------------
{
  "message": "55.3.244.1 GET /index.html 15824 0.043",
  "client": "55.3.244.1",
  "method": "GET",
  "request": "/index.html",
  "bytes": "15824",
  "duration": "0.043"
}
--------------------------------------------------

Here is an example of a pipeline specifying custom pattern definitions:

[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "my %{FAVORITE_DOG:dog} is colored %{RGB:color}",
        "pattern_definitions" : {
          "FAVORITE_DOG" : "beagle",
          "RGB" : "RED|GREEN|BLUE"
        }
      }
    }
  ]
}
--------------------------------------------------

[[gsub-processor]]
=== Gsub Processor
Converts a string field by applying a regular expression and a replacement.
If the field is not a string, the processor will throw an exception.

[[gsub-options]]
.Gsub Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to apply the replacement to
| `pattern` | yes | - | The pattern to be replaced
| `replacement` | yes | - | The string to replace the matching patterns with
|======

[source,js]
--------------------------------------------------
{
  "gsub": {
    "field": "field1",
    "pattern": "\.",
    "replacement": "-"
  }
}
--------------------------------------------------
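The replacement behaves like a global regular-expression substitution: every occurrence of the pattern in the field's value is replaced. In Python terms (an illustration of the semantics, not the processor's implementation):

```python
import re

# Gsub-style replacement: all matches of the pattern are replaced, not just the first.
field_value = "2016.03.04"
result = re.sub(r"\.", "-", field_value)  # "2016-03-04"
```

Note that the pattern is a regular expression, so characters like `.` must be escaped to be matched literally.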

[[join-processor]]
=== Join Processor
Joins each element of an array into a single string using a separator character between each element.
Throws an error when the field is not an array.

[[join-options]]
.Join Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The array field to be joined
| `separator` | yes | - | The separator character
|======

[source,js]
--------------------------------------------------
{
  "join": {
    "field": "joined_array_field",
    "separator": "-"
  }
}
--------------------------------------------------

[[lowercase-processor]]
=== Lowercase Processor
Converts a string to its lowercase equivalent.

[[lowercase-options]]
.Lowercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to make lowercase
|======

[source,js]
--------------------------------------------------
{
  "lowercase": {
    "field": "foo"
  }
}
--------------------------------------------------

[[remove-processor]]
=== Remove Processor
Removes an existing field. If the field doesn't exist, an exception will be thrown.

[[remove-options]]
.Remove Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be removed
|======

[source,js]
--------------------------------------------------
{
  "remove": {
    "field": "foo"
  }
}
--------------------------------------------------

[[rename-processor]]
=== Rename Processor
Renames an existing field. If the field doesn't exist or the new name is already used, an exception will be thrown.

[[rename-options]]
.Rename Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be renamed
| `to` | yes | - | The new name of the field
|======

[source,js]
--------------------------------------------------
{
  "rename": {
    "field": "foo",
    "to": "foobar"
  }
}
--------------------------------------------------

[[set-processor]]
=== Set Processor
Sets one field and associates it with the specified value. If the field already exists,
its value will be replaced with the provided one.

[[set-options]]
.Set Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to insert, upsert, or update
| `value` | yes | - | The value to be set for the field
|======

[source,js]
--------------------------------------------------
{
  "set": {
    "field": "field1",
    "value": 582.1
  }
}
--------------------------------------------------

[[split-processor]]
=== Split Processor
Splits a field into an array using a separator character. Only works on string fields.

[[split-options]]
.Split Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to split
| `separator` | yes | - | A regex which matches the separator, for example `,` or `\s+`
|======

[source,js]
--------------------------------------------------
{
  "split": {
    "field": "my_field",
    "separator": ","
  }
}
--------------------------------------------------

[[trim-processor]]
=== Trim Processor
Trims whitespace from field.

NOTE: This only works on leading and trailing whitespace.

[[trim-options]]
.Trim Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The string-valued field to trim whitespace from
|======

[source,js]
--------------------------------------------------
{
  "trim": {
    "field": "foo"
  }
}
--------------------------------------------------

[[uppercase-processor]]
=== Uppercase Processor
Converts a string to its uppercase equivalent.

[[uppercase-options]]
.Uppercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to make uppercase
|======

[source,js]