Merge pull request #16620 from dedemorton/add_ingest_doc

Add ingest docs to the build
DeDe Morton 2016-02-12 07:33:06 -08:00
commit 16e87bbe14
3 changed files with 67 additions and 58 deletions


@@ -41,6 +41,8 @@ include::modules.asciidoc[]
 include::index-modules.asciidoc[]
+include::ingest.asciidoc[]
 include::testing.asciidoc[]
 include::glossary.asciidoc[]


@@ -0,0 +1,34 @@
+[[ingest]]
+= Ingest Node
+[partintro]
+--
+Ingest node can be used to pre-process documents before the actual indexing takes place.
+This pre-processing happens by an ingest node that intercepts bulk and index requests, applies the
+transformations and then passes the documents back to the index or bulk APIs.
+Ingest node is enabled by default. In order to disable ingest the following
+setting should be configured in the elasticsearch.yml file:
+[source,yaml]
+--------------------------------------------------
+node.ingest: false
+--------------------------------------------------
+It is possible to enable ingest on any node or have dedicated ingest nodes.
+In order to pre-process document before indexing the `pipeline` parameter should be used
+on an index or bulk request to tell Ingest what pipeline is going to be used.
+[source,js]
+--------------------------------------------------
+PUT /my-index/my-type/my-id?pipeline=my_pipeline_id
+{
+...
+}
+--------------------------------------------------
+// AUTOSENSE
+--
+include::ingest/ingest-node.asciidoc[]
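The only thing that separates the request in the new intro from a plain index request is its `pipeline` query parameter. Below is a minimal Python sketch that builds that request with the standard library but does not send it. The node address `http://localhost:9200`, the helper name `index_request`, and the `{"message": "hello"}` body (a stand-in for the elided `...`) are all assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node. The request is only constructed, never sent.
BASE_URL = "http://localhost:9200"


def index_request(index, doc_type, doc_id, pipeline, doc):
    """Build a PUT index request that asks ingest to run `pipeline` first (hypothetical helper)."""
    path = "/{}/{}/{}".format(index, doc_type, doc_id)
    query = urllib.parse.urlencode({"pipeline": pipeline})
    return urllib.request.Request(
        BASE_URL + path + "?" + query,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical body standing in for the elided "..." in the docs snippet.
req = index_request("my-index", "my-type", "my-id", "my_pipeline_id", {"message": "hello"})
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/my-index/my-type/my-id?pipeline=my_pipeline_id
```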


@@ -1,33 +1,5 @@
-[[ingest]]
-== Ingest Node
-Ingest node can be used to pre-process documents before the actual indexing takes place.
-This pre-processing happens by an ingest node that intercepts bulk and index requests, applies the
-transformations and then passes the documents back to the index or bulk APIs.
-Ingest node is enabled by default. In order to disable ingest the following
-setting should be configured in the elasticsearch.yml file:
-[source,yaml]
---------------------------------------------------
-node.ingest: false
---------------------------------------------------
-It is possible to enable ingest on any node or have dedicated ingest nodes.
-In order to pre-process document before indexing the `pipeline` parameter should be used
-on an index or bulk request to tell Ingest what pipeline is going to be used.
-[source,js]
---------------------------------------------------
-PUT /my-index/my-type/my-id?pipeline=my_pipeline_id
-{
-...
-}
---------------------------------------------------
-// AUTOSENSE
-=== Pipeline Definition
+[[pipe-line]]
+== Pipeline Definition
 A pipeline is a definition of a series of processors that are to be
 executed in the same sequential order as they are declared.
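Processors run in the order they are declared, so a pipeline behaves like function composition over the document. The minimal Python sketch below assumes each processor is a plain function that takes the document as a dict and returns it. Both processors and the `run_pipeline` helper are hypothetical stand-ins, not Elasticsearch code.

```python
def run_pipeline(doc, processors):
    """Apply each processor to the document, strictly in declaration order."""
    for processor in processors:
        doc = processor(doc)
    return doc


# Hypothetical processors. Order matters because the second one reads
# the field that the first one writes.
def set_greeting(doc):
    doc["greeting"] = "hello"
    return doc


def shout_greeting(doc):
    doc["greeting"] = doc["greeting"].upper()
    return doc


print(run_pipeline({}, [set_greeting, shout_greeting]))  # {'greeting': 'HELLO'}
```

Declaring the two processors in the opposite order fails, because `shout_greeting` would look up a field that does not exist yet.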
@@ -45,7 +17,7 @@ what the pipeline attempts to achieve.
 The `processors` parameter defines a list of processors to be executed in
 order.
-=== Processors
+== Processors
 All processors are defined in the following way within a pipeline definition:
@@ -67,7 +39,7 @@ but is very useful for bookkeeping and tracing errors to specific processors.
 See <<handling-failure-in-pipelines>> to learn more about the `on_failure` field and error handling in pipelines.
-==== Set processor
+=== Set processor
 Sets one field and associates it with the specified value. If the field already exists,
 its value will be replaced with the provided one.
@@ -90,7 +62,7 @@ its value will be replaced with the provided one.
 }
 --------------------------------------------------
-==== Append processor
+=== Append processor
 Appends one or more values to an existing array if the field already exists and it is an array.
 Converts a scalar to an array and appends one or more values to it if the field exists and it is a scalar.
 Creates an array containing the provided values if the fields doesn't exist.
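The append rules cover three cases: an existing array, an existing scalar, and a missing field. The minimal Python sketch below models the document as a dict and accepts either a single value or a list, as the next hunk header notes. The `append` helper is hypothetical and is not the processor's actual implementation.

```python
def append(doc, field, value):
    """Sketch of the three append cases described above (hypothetical helper)."""
    values = value if isinstance(value, list) else [value]  # single value or array
    if field not in doc:
        doc[field] = list(values)            # missing field: create an array
    elif isinstance(doc[field], list):
        doc[field].extend(values)            # existing array: append to it
    else:
        doc[field] = [doc[field]] + values   # scalar: wrap it, then append
    return doc


print(append({}, "tags", "a"))                      # {'tags': ['a']}
print(append({"tags": ["a"]}, "tags", ["b", "c"]))  # {'tags': ['a', 'b', 'c']}
print(append({"tags": "a"}, "tags", "b"))           # {'tags': ['a', 'b']}
```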
@@ -115,7 +87,7 @@ Accepts a single value or an array of values.
 }
 --------------------------------------------------
-==== Remove processor
+=== Remove processor
 Removes an existing field. If the field doesn't exist, an exception will be thrown
 [[remove-options]]
@@ -135,7 +107,7 @@ Removes an existing field. If the field doesn't exist, an exception will be thro
 }
 --------------------------------------------------
-==== Rename processor
+=== Rename processor
 Renames an existing field. If the field doesn't exist, an exception will be thrown. Also, the new field
 name must not exist.
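Rename has two preconditions: the source field must exist, and the target name must be free. The minimal Python sketch below checks both. The source only says "an exception will be thrown", so the helper name, the exception types, and the messages are illustrative assumptions.

```python
def rename(doc, field, new_field):
    """Move `field` to `new_field`; fail if the source is missing or the target is taken."""
    if field not in doc:
        # Illustrative exception type and message.
        raise KeyError("field [{}] doesn't exist".format(field))
    if new_field in doc:
        # Illustrative exception type and message.
        raise ValueError("field [{}] already exists".format(new_field))
    doc[new_field] = doc.pop(field)
    return doc


print(rename({"foo": 1}, "foo", "bar"))  # {'bar': 1}
```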
@@ -159,7 +131,7 @@ name must not exist.
 --------------------------------------------------
-==== Convert processor
+=== Convert processor
 Converts an existing field's value to a different type, like turning a string to an integer.
 If the field value is an array, all members will be converted.
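Below is a minimal Python sketch of the conversion rule, including the member-by-member handling of arrays. It assumes the target type names are `integer`, `float`, `string` and `boolean`. The boolean rule (case-insensitive `true`/`false`, error otherwise) follows the original docs, whose wording is only partly visible in the next hunk header. Python's `int`, `float` and `str` stand in for the real parsers, so edge cases may differ. The helper names are hypothetical.

```python
def _to_bool(value):
    # Only "true"/"false" (any case) are accepted; anything else is an error.
    lowered = str(value).lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError("[{}] is not a boolean value".format(value))


# Assumed type names mapped to Python stand-in converters.
CONVERTERS = {"integer": int, "float": float, "string": str, "boolean": _to_bool}


def convert(doc, field, target_type):
    """Convert a field's value in place; arrays are converted member by member."""
    converter = CONVERTERS[target_type]
    value = doc[field]
    if isinstance(value, list):
        doc[field] = [converter(v) for v in value]
    else:
        doc[field] = converter(value)
    return doc


print(convert({"xs": ["1", "2"]}, "xs", "integer"))  # {'xs': [1, 2]}
```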
@@ -187,7 +159,7 @@ false if its string value is equal to `false` (ignore case) and it will throw ex
 }
 --------------------------------------------------
-==== Gsub processor
+=== Gsub processor
 Converts a string field by applying a regular expression and a replacement.
 If the field is not a string, the processor will throw an exception.
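Below is a minimal Python sketch of gsub: replace every regex match in a string field, and reject anything that is not a string. Python's `re` stands in for the regex engine Elasticsearch uses, so syntax details (for example, group references in the replacement string) may differ. The helper name and exception type are assumptions.

```python
import re


def gsub(doc, field, pattern, replacement):
    """Replace every match of `pattern` in a string field; non-strings are rejected."""
    value = doc[field]
    if not isinstance(value, str):
        raise TypeError("field [{}] is not a string".format(field))
    doc[field] = re.sub(pattern, replacement, value)
    return doc


print(gsub({"field1": "a.b.c"}, "field1", r"\.", "-"))  # {'field1': 'a-b-c'}
```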
@@ -212,7 +184,7 @@ If the field is not a string, the processor will throw an exception.
 }
 --------------------------------------------------
-==== Join processor
+=== Join processor
 Joins each element of an array into a single string using a separator character between each element.
 Throws error when the field is not an array.
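Below is a minimal Python sketch of join: put the separator between the array elements, and raise an error when the field is not an array. Stringifying non-string elements with `str()` is an assumption of this sketch, as is the helper name.

```python
def join(doc, field, separator):
    """Join array elements into one string; reject non-array values."""
    value = doc[field]
    if not isinstance(value, list):
        raise TypeError("field [{}] is not an array".format(field))
    # Assumption: non-string elements are stringified before joining.
    doc[field] = separator.join(str(element) for element in value)
    return doc


print(join({"tags": ["a", "b", "c"]}, "tags", "-"))  # {'tags': 'a-b-c'}
```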
@@ -235,7 +207,7 @@ Throws error when the field is not an array.
 }
 --------------------------------------------------
-==== Split processor
+=== Split processor
 Split a field to an array using a separator character. Only works on string fields.
 [[split-options]]
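Split is the inverse of join and only accepts string fields. In the minimal Python sketch below, the separator is treated as a regular expression and `re.split` stands in for the real splitting logic. Both are assumptions, so edge cases such as trailing empty strings may differ. The helper name is hypothetical.

```python
import re


def split(doc, field, separator):
    """Split a string field into an array; the separator is treated as a regex here."""
    value = doc[field]
    if not isinstance(value, str):
        raise TypeError("field [{}] is not a string".format(field))
    doc[field] = re.split(separator, value)
    return doc


print(split({"line": "a,b,c"}, "line", ","))  # {'line': ['a', 'b', 'c']}
```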
@@ -255,7 +227,7 @@ Split a field to an array using a separator character. Only works on string fiel
 }
 --------------------------------------------------
-==== Lowercase processor
+=== Lowercase processor
 Converts a string to its lowercase equivalent.
 [[lowercase-options]]
@@ -275,7 +247,7 @@ Converts a string to its lowercase equivalent.
 }
 --------------------------------------------------
-==== Uppercase processor
+=== Uppercase processor
 Converts a string to its uppercase equivalent.
 [[uppercase-options]]
@@ -295,7 +267,7 @@ Converts a string to its uppercase equivalent.
 }
 --------------------------------------------------
-==== Trim processor
+=== Trim processor
 Trims whitespace from field. NOTE: this only works on leading and trailing whitespaces.
 [[trim-options]]
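The NOTE is the key detail: only the ends are trimmed. Python's `str.strip` has the same leading/trailing-only behavior, as the minimal sketch below shows. The helper name is hypothetical.

```python
def trim(doc, field):
    """Strip leading and trailing whitespace only; inner whitespace is kept."""
    doc[field] = doc[field].strip()
    return doc


print(trim({"msg": "  hello   world  "}, "msg"))  # {'msg': 'hello   world'}
```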
@@ -315,7 +287,7 @@ Trims whitespace from field. NOTE: this only works on leading and trailing white
 }
 --------------------------------------------------
-==== Grok Processor
+=== Grok Processor
 The Grok Processor extracts structured fields out of a single text field within a document. You choose which field to
 extract matched fields from, as well as the Grok Pattern you expect will match. A Grok Pattern is like a regular
@@ -330,7 +302,7 @@ Here, you can add your own custom grok pattern files with custom grok expression
 If you need help building patterns to match your logs, you will find the <http://grokdebug.herokuapp.com> and
 <http://grokconstructor.appspot.com/> applications quite useful!
-===== Grok Basics
+==== Grok Basics
 Grok sits on top of regular expressions, so any regular expressions are valid in grok as well.
 The regular expression library is Oniguruma, and you can see the full supported regexp syntax
@@ -367,7 +339,7 @@ Grok expression.
 %{NUMBER:duration} %{IP:client}
 --------------------------------------------------
-===== Custom Patterns and Pattern Files
+==== Custom Patterns and Pattern Files
 The Grok Processor comes pre-packaged with a base set of pattern files. These patterns may not always have
 what you are looking for. These pattern files have a very basic format. Each line describes a named pattern with
@@ -393,7 +365,7 @@ SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
 TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
 --------------------------------------------------
-===== Using Grok Processor in a Pipeline
+==== Using Grok Processor in a Pipeline
 [[grok-options]]
 .Grok Options
@@ -417,7 +389,7 @@ a document.
 The pattern for this could be
-[source]
+[source,js]
 --------------------------------------------------
 %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
 --------------------------------------------------
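The pattern above is built from references to named patterns. The minimal Python sketch below shows one way such an expression could expand into an ordinary regular expression. Each `%{NAME:field}` becomes a named capture group and each `%{NAME}` a non-capturing group. Expansion is recursive, because pattern files let definitions such as `TIME` reference `%{HOUR}`.

The sketch rests on several assumptions:
- Python's `re` stands in for Oniguruma.
- The `IP`/`WORD`/`URIPATHPARAM`/`NUMBER` definitions are simplified stand-ins, not the shipped ones.
- The sample log line is illustrative.
- The helper names are hypothetical.

```python
import re

# Simplified, hypothetical stand-ins for the shipped pattern definitions.
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\b\w+\b",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
}

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression, patterns):
    """Rewrite grok references into plain regex, naming groups after the target field."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        body = expand(patterns[name], patterns)  # definitions may reference others
        # Field names must be valid Python group names in this sketch.
        return "(?P<{}>{})".format(field, body) if field else "(?:{})".format(body)
    return GROK_REF.sub(replace, expression)


def grok(text, expression, patterns=PATTERNS):
    """Return the named captures as a dict, or None when the text doesn't match."""
    match = re.match(expand(expression, patterns), text)
    return match.groupdict() if match else None


expression = "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
print(grok("55.3.244.1 GET /index.html 15824 0.043", expression))
```

All captured values come back as strings. Turning `bytes` into a number would be a separate step, such as a convert processor.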
@@ -474,7 +446,7 @@ An example of a pipeline specifying custom pattern definitions:
 }
 --------------------------------------------------
-==== Date processor
+=== Date processor
 The date processor is used for parsing dates from fields, and then using that date or timestamp as the timestamp for that document.
 The date processor adds by default the parsed date as a new field called `@timestamp`, configurable by setting the `target_field`
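The date processor parses the source field and adds the result as `@timestamp`, unless another `target_field` is given. The minimal Python sketch below follows that flow under these assumptions:
- Formats use Python `strptime` syntax rather than the format syntax Elasticsearch expects.
- Several candidate formats are tried in order.
- Dates are treated as UTC.
- The result is stored as an ISO 8601 string.
- The helper name `parse_date` is hypothetical.

```python
from datetime import datetime, timezone


def parse_date(doc, field, formats, target_field="@timestamp"):
    """Parse `field` with the first format that fits and store it in `target_field`."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(doc[field], fmt)
        except ValueError:
            continue  # this format doesn't fit; try the next one
        # Assumption: the parsed date is interpreted as UTC.
        doc[target_field] = parsed.replace(tzinfo=timezone.utc).isoformat()
        return doc
    raise ValueError("unable to parse [{}]".format(doc[field]))


print(parse_date({"when": "2016-02-12 07:33:06"}, "when", ["%d/%m/%Y", "%Y-%m-%d %H:%M:%S"]))
# {'when': '2016-02-12 07:33:06', '@timestamp': '2016-02-12T07:33:06+00:00'}
```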
@@ -512,7 +484,7 @@ An example that adds the parsed date to the `timestamp` field based on the `init
 }
 --------------------------------------------------
-==== Fail processor
+=== Fail processor
 The Fail Processor is used to raise an exception. This is useful for when
 a user expects a pipeline to fail and wishes to relay a specific message
 to the requester.
@@ -534,7 +506,7 @@ to the requester.
 }
 --------------------------------------------------
-==== Foreach processor
+=== Foreach processor
 All processors can operate on elements inside an array, but if all elements of an array need to
 be processed in the same way defining a processor for each element becomes cumbersome and tricky
 because it is likely that the number of elements in an array are unknown. For this reason the `foreach`
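The idea behind `foreach` is one processor applied to every element, whatever the array length. The minimal Python sketch below assumes the element processor is a plain function, and the helper names are hypothetical. Each result is written back as soon as it is computed. This mirrors the note in the next hunk: elements processed before a failure stay updated.

```python
def foreach(doc, field, processor):
    """Run `processor` on every element of the array in `field`, in place."""
    values = doc[field]
    for i, element in enumerate(values):
        # Written back immediately, so a later failure leaves earlier updates in place.
        values[i] = processor(element)
    return doc


print(foreach({"values": ["a", "b"]}, "values", str.upper))  # {'values': ['A', 'B']}
```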
@@ -680,7 +652,7 @@ In this example if the `remove` processor does fail then
 the array elements that have been processed thus far will
 be updated.
-=== Accessing data in pipelines
+== Accessing data in pipelines
 Processors in pipelines have read and write access to documents that pass through the pipeline.
 The fields in the source of a document and its metadata fields are accessible.
@@ -781,7 +753,8 @@ to depends on the field in the source with name `geoip.country_iso_code`.
 }
 --------------------------------------------------
-==== Handling Failure in Pipelines
+[[handling-failure-in-pipelines]]
+=== Handling Failure in Pipelines
 In its simplest case, pipelines describe a list of processors which
 are executed sequentially and processing halts at the first exception. This
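The Python sketch below combines the rule quoted above (sequential execution, halting at the first exception) with a pipeline-level `on_failure` fallback. Two parts are simplifications specific to this sketch. First, `on_failure` is modeled as a single callable that receives the document and the error, whereas in the docs it is a list of processors. Second, the `error` field written by the example handler is hypothetical, as are the helper names.

```python
def run_with_on_failure(doc, processors, on_failure=None):
    """Run processors in order; on the first exception, stop and hand off to on_failure."""
    for processor in processors:
        try:
            doc = processor(doc)
        except Exception as error:
            if on_failure is None:
                raise  # default: processing halts and the error propagates
            return on_failure(doc, error)  # remaining processors are skipped
    return doc


def mark(doc):
    doc["seen"] = True
    return doc


def boom(doc):
    raise ValueError("boom")


# Hypothetical handler that records the error message on the document.
result = run_with_on_failure({}, [mark, boom, mark], on_failure=lambda d, e: dict(d, error=str(e)))
print(result)  # {'seen': True, 'error': 'boom'}
```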
@@ -845,7 +818,7 @@ the index for which failed documents get sent.
 --------------------------------------------------
-===== Accessing Error Metadata From Processors Handling Exceptions
+==== Accessing Error Metadata From Processors Handling Exceptions
 Sometimes you may want to retrieve the actual error message that was thrown
 by a failed processor. To do so you can access metadata fields called
@@ -878,9 +851,9 @@ of manually setting it.
 --------------------------------------------------
-=== Ingest APIs
-==== Put pipeline API
+== Ingest APIs
+=== Put pipeline API
 The put pipeline api adds pipelines and updates existing pipelines in the cluster.
@@ -904,7 +877,7 @@ PUT _ingest/pipeline/my-pipeline-id
 NOTE: The put pipeline api also instructs all ingest nodes to reload their in-memory representation of pipelines, so that
 pipeline changes take immediately in effect.
-==== Get pipeline API
+=== Get pipeline API
 The get pipeline api returns pipelines based on id. This api always returns a local reference of the pipeline.
@@ -940,7 +913,7 @@ For each returned pipeline the source and the version is returned.
 The version is useful for knowing what version of the pipeline the node has.
 Multiple ids can be provided at the same time. Also wildcards are supported.
-==== Delete pipeline API
+=== Delete pipeline API
 The delete pipeline api deletes pipelines by id.
@@ -950,7 +923,7 @@ DELETE _ingest/pipeline/my-pipeline-id
 --------------------------------------------------
 // AUTOSENSE
-==== Simulate pipeline API
+=== Simulate pipeline API
 The simulate pipeline api executes a specific pipeline against
 the set of documents provided in the body of the request.