update ingest docs

- move ingest plugin docs to core reference docs
- move geoip processor docs to plugins/ingest-geoip.asciidoc
- add missing options tables for some processors
- add description of pipeline definition
- add description of processor definitions including common parameters
  like "tag" and "on_failure"
Tal Levy 2016-01-25 12:06:39 -08:00
parent a5a62932b9
commit 894efa3fb6
2 changed files with 212 additions and 68 deletions

@ -0,0 +1,64 @@
[[ingest-geoip]]
== Ingest Geoip Processor Plugin
The GeoIP processor adds information about the geographical location of IP addresses, based on data from the Maxmind databases.
This processor adds this information by default under the `geoip` field.
The ingest plugin ships by default with the GeoLite2 City and GeoLite2 Country geoip2 databases from Maxmind made available
under the CCA-ShareAlike 3.0 license. For more details, see http://dev.maxmind.com/geoip/geoip2/geolite2/
The GeoIP processor can run with other geoip2 databases from Maxmind. The files must be copied into the geoip config directory
and the `database_file` option should be used to specify the filename of the custom database. The geoip config directory
is located at `$ES_HOME/config/ingest/geoip` and holds the shipped databases too.
[[geoip-options]]
.Geoip options
[options="header"]
|======
| Name | Required | Default | Description
| `source_field` | yes | - | The field to get the IP address or hostname from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the Maxmind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename in the geoip config directory. The ingest plugin ships with the GeoLite2-City.mmdb and GeoLite2-Country.mmdb files.
| `fields` | no | [`continent_name`, `country_iso_code`, `region_name`, `city_name`, `location`] <1> | Controls what properties are added to the `target_field` based on the geoip lookup.
|======
<1> Depends on what is available in `database_file`:
* If the GeoLite2 City database is used then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which fields were configured in `fields`.
* If the GeoLite2 Country database is used then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which fields were configured in `fields`.
An example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "geoip" : {
        "source_field" : "ip"
      }
    }
  ]
}
--------------------------------------------------
An example that uses the default country database and adds the geographical information to the `geo` field based on the `ip` field:
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "geoip" : {
        "source_field" : "ip",
        "target_field" : "geo",
        "database_file" : "GeoLite2-Country.mmdb"
      }
    }
  ]
}
--------------------------------------------------
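As a further sketch of the `fields` option from the table above, and assuming the default city database, a configuration along these lines would restrict the lookup result to `country_iso_code` and `city_name` (when found) under the default `geoip` field. The field selection here is only illustrative:
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "geoip" : {
        "source_field" : "ip",
        "fields" : ["country_iso_code", "city_name"]
      }
    }
  ]
}
--------------------------------------------------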

@ -28,12 +28,59 @@ PUT /my-index/my-type/my-id?pipeline=my_pipeline_id
--------------------------------------------------
// AUTOSENSE
=== Pipeline Definition
A pipeline is a definition of a series of processors that are to be
executed in the same sequential order as they are declared.
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [ ... ]
}
--------------------------------------------------
The `description` is a special field to store a helpful description of
what the pipeline attempts to achieve.
The `processors` parameter defines a list of processors to be executed in
order.
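As an illustrative sketch of sequential execution, a pipeline that first sets a field and then uppercases it might look like the following. It relies only on the `set` and `uppercase` processors described in this document; the field name `env` and its value are hypothetical:
[source,js]
--------------------------------------------------
{
  "description" : "set a hypothetical env field, then uppercase it",
  "processors" : [
    {
      "set" : {
        "field" : "env",
        "value" : "production"
      }
    },
    {
      "uppercase" : {
        "field" : "env"
      }
    }
  ]
}
--------------------------------------------------
Because processors run in declaration order, the `uppercase` processor here would operate on the value written by `set`.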
=== Processors
All processors are defined in the following way within a pipeline definition:
[source,js]
--------------------------------------------------
{
  "PROCESSOR_NAME" : {
    ... processor configuration options ...
  }
}
--------------------------------------------------
Each processor defines its own configuration parameters, but all processors have
the ability to declare `tag` and `on_failure` fields. These fields are optional.
A `tag` is a string identifier for a specific instantiation of a
processor in a pipeline. The `tag` field does not affect any processor's behavior,
but is useful for bookkeeping and tracing errors to specific processors.
See <<handling-failure-in-pipelines>> to learn more about the `on_failure` field and error handling in pipelines.
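For illustration, a `set` processor that declares a `tag` alongside its own options could be written as in this sketch. The tag value `set-env`, the field name, and the value are hypothetical:
[source,js]
--------------------------------------------------
{
  "set" : {
    "tag" : "set-env",
    "field" : "env",
    "value" : "production"
  }
}
--------------------------------------------------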
==== Set processor
Sets one field and associates it with the specified value. If the field already exists,
its value will be replaced with the provided one.
[[set-options]]
.Set Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to insert, upsert, or update
| `value` | yes | - | The value to be set for the field
|======
[source,js]
--------------------------------------------------
{
@ -50,6 +97,15 @@ Converts a scalar to an array and appends one or more values to it if the field
Creates an array containing the provided values if the field doesn't exist.
Accepts a single value or an array of values.
[[append-options]]
.Append Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be appended to
| `value` | yes | - | The value to be appended
|======
[source,js]
--------------------------------------------------
{
@ -63,6 +119,14 @@ Accepts a single value or an array of values.
==== Remove processor
Removes an existing field. If the field doesn't exist, an exception will be thrown.
[[remove-options]]
.Remove Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be removed
|======
[source,js]
--------------------------------------------------
{
@ -76,6 +140,15 @@ Removes an existing field. If the field doesn't exist, an exception will be thro
Renames an existing field. If the field doesn't exist, an exception will be thrown. Also, the new field
name must not exist.
[[rename-options]]
.Rename Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to be renamed
| `to` | yes | - | The new name of the field
|======
[source,js]
--------------------------------------------------
{
@ -96,6 +169,15 @@ The supported types include: `integer`, `float`, `string`, and `boolean`.
`boolean` will set the field to true if its string value is equal to `true` (ignore case), to
false if its string value is equal to `false` (ignore case) and it will throw an exception otherwise.
[[convert-options]]
.Convert Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field whose value is to be converted
| `type` | yes | - | The type to convert the existing value to
|======
[source,js]
--------------------------------------------------
{
@ -110,9 +192,15 @@ false if its string value is equal to `false` (ignore case) and it will throw ex
Converts a string field by applying a regular expression and a replacement.
If the field is not a string, the processor will throw an exception.
This configuration takes a `field` for the field name, `pattern` for the
pattern to be replaced, and `replacement` for the string to replace the matching patterns with.
[[gsub-options]]
.Gsub Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to apply the replacement to
| `pattern` | yes | - | The pattern to be replaced
| `replacement` | yes | - | The string to replace the matching patterns with.
|======
[source,js]
--------------------------------------------------
@ -129,6 +217,15 @@ pattern to be replaced, and `replacement` for the string to replace the matching
Joins each element of an array into a single string using a separator character between each element.
Throws an error when the field is not an array.
[[join-options]]
.Join Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field containing the array to be joined
| `separator` | yes | - | The separator character
|======
[source,js]
--------------------------------------------------
{
@ -142,6 +239,14 @@ Throws error when the field is not an array.
==== Split processor
Splits a field into an array using a separator character. Only works on string fields.
[[split-options]]
.Split Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to split
|======
[source,js]
--------------------------------------------------
{
@ -154,6 +259,14 @@ Split a field to an array using a separator character. Only works on string fiel
==== Lowercase processor
Converts a string to its lowercase equivalent.
[[lowercase-options]]
.Lowercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to lowercase
|======
[source,js]
--------------------------------------------------
{
@ -166,6 +279,14 @@ Converts a string to its lowercase equivalent.
==== Uppercase processor
Converts a string to its uppercase equivalent.
[[uppercase-options]]
.Uppercase Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to uppercase
|======
[source,js]
--------------------------------------------------
{
@ -178,6 +299,14 @@ Converts a string to its uppercase equivalent.
==== Trim processor
Trims whitespace from a field. NOTE: This only works on leading and trailing whitespace.
[[trim-options]]
.Trim Options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The string-valued field to trim whitespace from
|======
[source,js]
--------------------------------------------------
{
@ -346,71 +475,6 @@ An example of a pipeline specifying custom pattern definitions:
}
--------------------------------------------------
==== Geoip processor
The GeoIP processor adds information about the geographical location of IP addresses, based on data from the Maxmind databases.
This processor adds this information by default under the `geoip` field.
The ingest plugin ships by default with the GeoLite2 City and GeoLite2 Country geoip2 databases from Maxmind made available
under the CCA-ShareAlike 3.0 license. For more details, see http://dev.maxmind.com/geoip/geoip2/geolite2/
The GeoIP processor can run with other geoip2 databases from Maxmind. The files must be copied into the geoip config directory
and the `database_file` option should be used to specify the filename of the custom database. The geoip config directory
is located at `$ES_HOME/config/ingest/geoip` and holds the shipped databases too.
[[geoip-options]]
.Geoip options
[options="header"]
|======
| Name | Required | Default | Description
| `source_field` | yes | - | The field to get the IP address or hostname from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the Maxmind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename in the geoip config directory. The ingest plugin ships with the GeoLite2-City.mmdb and GeoLite2-Country.mmdb files.
| `fields` | no | [`continent_name`, `country_iso_code`, `region_name`, `city_name`, `location`] <1> | Controls what properties are added to the `target_field` based on the geoip lookup.
|======
<1> Depends on what is available in `database_file`:
* If the GeoLite2 City database is used then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which fields were configured in `fields`.
* If the GeoLite2 Country database is used then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which fields were configured in `fields`.
An example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "geoip" : {
        "source_field" : "ip"
      }
    }
  ]
}
--------------------------------------------------
An example that uses the default country database and adds the geographical information to the `geo` field based on the `ip` field:
[source,js]
--------------------------------------------------
{
  "description" : "...",
  "processors" : [
    {
      "geoip" : {
        "source_field" : "ip",
        "target_field" : "geo",
        "database_file" : "GeoLite2-Country.mmdb"
      }
    }
  ]
}
--------------------------------------------------
==== Date processor
The date processor is used for parsing dates from fields, and then using that date or timestamp as the timestamp for that document.
@ -454,6 +518,14 @@ The Fail Processor is used to raise an exception. This is useful for when
a user expects a pipeline to fail and wishes to relay a specific message
to the requester.
[[fail-options]]
.Fail Options
[options="header"]
|======
| Name | Required | Default | Description
| `message` | yes | - | The error message of the `FailException` thrown by the processor
|======
[source,js]
--------------------------------------------------
{
@ -467,6 +539,14 @@ to the requester.
The DeDot Processor is used to remove dots (".") from field names and
replace them with a specific `separator` string.
[[dedot-options]]
.DeDot Options
[options="header"]
|======
| Name | Required | Default | Description
| `separator` | yes | "_" | The string to replace dots with in all field names
|======
[source,js]
--------------------------------------------------
{