2018-12-22 20:51:18 -05:00
[[user-agent-processor]]
2018-12-22 20:31:07 -05:00
=== User Agent processor
2018-12-22 20:20:53 -05:00
The `user_agent` processor extracts details from the user agent string a browser sends with its web requests.
This processor adds this information by default under the `user_agent` field.
The ingest-user-agent module ships by default with the regexes.yaml made available by uap-java with an Apache 2.0 license. For more details see https://github.com/ua-parser/uap-core.
[[using-ingest-user-agent]]
==== Using the user_agent Processor in a Pipeline
[[ingest-user-agent-options]]
.User-agent options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field containing the user agent string.
| `target_field` | no | user_agent | The field that will be filled with the user agent details.
| `regex_file` | no | - | The name of the file in the `config/ingest-user-agent` directory containing the regular expressions for parsing the user agent string. Both the directory and the file have to be created before starting Elasticsearch. If not specified, ingest-user-agent will use the regexes.yaml from uap-core it ships with (see below).
| `properties` | no | [`name`, `major`, `minor`, `patch`, `build`, `os`, `os_name`, `os_major`, `os_minor`, `device`] | Controls what properties are added to `target_field`.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document
2019-02-13 11:28:01 -05:00
| `ecs` | no | `true` | Whether to return the output in Elastic Common Schema format. NOTE: This setting is deprecated and will be removed in a future version.
2018-12-22 20:20:53 -05:00
|======
Here is an example that adds the user agent details to the `user_agent` field based on the `agent` field:
2019-09-06 11:31:13 -04:00
[source,console]
2018-12-22 20:20:53 -05:00
--------------------------------------------------
PUT _ingest/pipeline/user_agent
{
"description" : "Add user agent information",
"processors" : [
{
"user_agent" : {
"field" : "agent"
}
}
]
}
PUT my_index/_doc/my_id?pipeline=user_agent
{
"agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
GET my_index/_doc/my_id
--------------------------------------------------
Which returns
2019-09-06 16:09:09 -04:00
[source,console-result]
2018-12-22 20:20:53 -05:00
--------------------------------------------------
{
"found": true,
"_index": "my_index",
"_type": "_doc",
"_id": "my_id",
"_version": 1,
"_seq_no": 22,
"_primary_term": 1,
"_source": {
"agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
"user_agent": {
"name": "Chrome",
Add ECS schema for user-agent ingest processor (#37727) (#37984)
* Add ECS schema for user-agent ingest processor (#37727)
This switches the format of the user agent processor to use the schema from [ECS](https://github.com/elastic/ecs).
So rather than something like this:
```
{
"patch" : "3538",
"major" : "70",
"minor" : "0",
"os" : "Mac OS X 10.14.1",
"os_minor" : "14",
"os_major" : "10",
"name" : "Chrome",
"os_name" : "Mac OS X",
"device" : "Other"
}
```
The structure is now like this:
```
{
"name" : "Chrome",
"original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
"os" : {
"name" : "Mac OS X",
"version" : "10.14.1",
"full" : "Mac OS X 10.14.1"
},
"device" : "Other",
"version" : "70.0.3538.102"
}
```
This is now the default for 7.0. The deprecated `ecs` setting in 6.x is not
supported.
Resolves #37329
* Remove `ecs` setting from docs
2019-01-30 13:24:18 -05:00
"original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
2019-10-18 10:14:45 -04:00
"version": "51.0.2704.103",
Add ECS schema for user-agent ingest processor (#37727) (#37984)
* Add ECS schema for user-agent ingest processor (#37727)
This switches the format of the user agent processor to use the schema from [ECS](https://github.com/elastic/ecs).
So rather than something like this:
```
{
"patch" : "3538",
"major" : "70",
"minor" : "0",
"os" : "Mac OS X 10.14.1",
"os_minor" : "14",
"os_major" : "10",
"name" : "Chrome",
"os_name" : "Mac OS X",
"device" : "Other"
}
```
The structure is now like this:
```
{
"name" : "Chrome",
"original" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36",
"os" : {
"name" : "Mac OS X",
"version" : "10.14.1",
"full" : "Mac OS X 10.14.1"
},
"device" : "Other",
"version" : "70.0.3538.102"
}
```
This is now the default for 7.0. The deprecated `ecs` setting in 6.x is not
supported.
Resolves #37329
* Remove `ecs` setting from docs
2019-01-30 13:24:18 -05:00
"os": {
"name": "Mac OS X",
"version": "10.10.5",
"full": "Mac OS X 10.10.5"
},
2019-01-31 14:54:34 -05:00
"device" : {
"name" : "Other"
},
2018-12-22 20:20:53 -05:00
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]
===== Using a custom regex file
To use a custom regex file for parsing the user agents, that file has to be put into the `config/ingest-user-agent` directory and
2019-10-30 11:10:35 -04:00
has to have a `.yml` filename extension. The file has to be present at node startup, any changes to it or any new files added
2018-12-22 20:20:53 -05:00
while the node is running will not have any effect.
In practice, it will make most sense for any custom regex file to be a variant of the default file, either a more recent version
or a customised version.
The default file included in `ingest-user-agent` is the `regexes.yaml` from uap-core: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
2019-09-16 05:29:59 -04:00
[[ingest-user-agent-settings]]
===== Node Settings
The `user_agent` processor supports the following setting:
`ingest.user_agent.cache_size`::
The maximum number of results that should be cached. Defaults to `1000`.
Note that these settings are node settings and apply to all `user_agent` processors, i.e. there is one cache for all defined `user_agent` processors.