Forward port of https://github.com/elastic/elasticsearch/pull/38757

This change reverts the initial 7.0 commits and replaces them with the 6.7 variant that still allows for the ecs flag. This commit differs from the 6.7 variants in that ecs flag will now default to true.

6.7: `ecs` : default `false`
7.x: `ecs` : default `true`
8.0: no option, but behaves as `true`

* Revert "Ingest node - user agent, move device to an object (#38115)"
  This reverts commit 5b008a34aa3c07e37b12b415d3c22a44da491329.
* Revert "Add ECS schema for user-agent ingest processor (#37727) (#37984)"
  This reverts commit cac6b8e06f051d68919faf6081f1c87fa5b6757d.
* cherry-pick 5dfe1935345da3799931fd4a3ebe0b6aa9c17f57
  Add ECS schema for user-agent ingest processor (#37727)
* cherry-pick ec8ddc890a34853ee8db6af66f608b0ad0cd1099
  Ingest node - user agent, move device to an object (#38115) (#38121)
* cherry-pick f63cbdb9b426ba24ee4d987ca767ca05a22f2fbb (with manual merge fixes)
  Dep. check for ECS changes to User Agent processor (#38362)
* make true the default for the ecs option, and update 7.0 references and tests

[[user-agent-processor]]
=== User Agent processor

The `user_agent` processor extracts details from the user agent string a browser sends with its web requests.
This processor adds this information by default under the `user_agent` field.

The ingest-user-agent module ships by default with the regexes.yaml made available by uap-java with an Apache 2.0 license. For more details see https://github.com/ua-parser/uap-core.

[[using-ingest-user-agent]]
==== Using the user_agent Processor in a Pipeline

[[ingest-user-agent-options]]
.User-agent options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field containing the user agent string.
| `target_field` | no | `user_agent` | The field that will be filled with the user agent details.
| `regex_file` | no | - | The name of the file in the `config/ingest-user-agent` directory containing the regular expressions for parsing the user agent string. Both the directory and the file have to be created before starting Elasticsearch. If not specified, ingest-user-agent will use the regexes.yaml from uap-core it ships with (see below).
| `properties` | no | [`name`, `major`, `minor`, `patch`, `build`, `os`, `os_name`, `os_major`, `os_minor`, `device`] | Controls what properties are added to `target_field`.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document.
| `ecs` | no | `true` | Whether to return the output in Elastic Common Schema format. NOTE: This setting is deprecated and will be removed in a future version.
|======

Here is an example that adds the user agent details to the `user_agent` field based on the `agent` field:

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/user_agent
{
  "description" : "Add user agent information",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=user_agent
{
  "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
GET my_index/_doc/my_id
--------------------------------------------------
// CONSOLE

Which returns:

[source,js]
--------------------------------------------------
{
  "found": true,
  "_index": "my_index",
  "_type": "_doc",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
    "user_agent": {
      "name": "Chrome",
      "original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36",
      "version": "51.0.2704",
      "os": {
        "name": "Mac OS X",
        "version": "10.10.5",
        "full": "Mac OS X 10.10.5"
      },
      "device" : {
        "name" : "Other"
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term": 1/"_primary_term" : $body._primary_term/]

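The optional settings from the options table can be combined in a single processor definition. The following is a minimal sketch rather than a tested snippet: the pipeline name `user_agent_custom`, the target field `ua`, and the sample documents are assumptions made for illustration. It restricts the output to three properties, writes them under `ua`, and uses the simulate pipeline API to run the pipeline against one document that has an `agent` field and one that does not, which `ignore_missing` should let through unmodified.

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/user_agent_custom
{
  "description" : "Add selected user agent details under a custom field",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent",
        "target_field" : "ua",
        "properties" : [ "name", "os", "device" ],
        "ignore_missing" : true
      }
    }
  ]
}
POST _ingest/pipeline/user_agent_custom/_simulate
{
  "docs" : [
    {
      "_source" : {
        "agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
      }
    },
    {
      "_source" : {
        "message" : "this document has no agent field"
      }
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE

Simulating first is a convenient way to inspect the processor output before any document is indexed through the pipeline.
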
===== Using a custom regex file
To use a custom regex file for parsing user agents, put the file into the `config/ingest-user-agent` directory and
give it a `.yaml` filename extension. The file has to be present at node startup; changes to it, or new files added
while the node is running, will not have any effect.

In practice, it will make most sense for any custom regex file to be a variant of the default file, either a more recent version
or a customised version.

The default file included in `ingest-user-agent` is the `regexes.yaml` from uap-core: https://github.com/ua-parser/uap-core/blob/master/regexes.yaml
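
A pipeline selects a custom file through the `regex_file` option. As a hedged sketch, assume a file named `custom-regexes.yaml` (a hypothetical name) was placed in the `config/ingest-user-agent` directory before the node started; the pipeline name is likewise an assumption:

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/user_agent_custom_regex
{
  "description" : "Parse user agents with a custom regex file",
  "processors" : [
    {
      "user_agent" : {
        "field" : "agent",
        "regex_file" : "custom-regexes.yaml"
      }
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE

Only the file name is given, not a path, because the option refers to a file inside the `config/ingest-user-agent` directory.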