OpenSearch/docs/plugins/mapper-murmur3.asciidoc

105 lines
2.7 KiB
Plaintext

[[mapper-murmur3]]
=== Mapper Murmur3 Plugin
The mapper-murmur3 plugin provides the ability to compute hash of field values
at index-time and store them in the index. This can sometimes be helpful when
running cardinality aggregations on high-cardinality and large string fields.
[[mapper-murmur3-install]]
[float]
==== Installation
This plugin can be installed using the plugin manager:
[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install mapper-murmur3
----------------------------------------------------------------
The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.
This plugin can be downloaded for <<plugin-management-custom-url,offline install>> from
{plugin_url}/mapper-murmur3/mapper-murmur3-{version}.zip.
[[mapper-murmur3-remove]]
[float]
==== Removal
The plugin can be removed with the following command:
[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove mapper-murmur3
----------------------------------------------------------------
The node must be stopped before removing the plugin.
[[mapper-murmur3-usage]]
==== Using the `murmur3` field
The `murmur3` is typically used within a multi-field, so that both the original
value and its hash are stored in the index:
[source,js]
--------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"my_field": {
"type": "keyword",
"fields": {
"hash": {
"type": "murmur3"
}
}
}
}
}
}
}
--------------------------
// CONSOLE
Such a mapping would allow to refer to `my_field.hash` in order to get hashes
of the values of the `my_field` field. This is only useful in order to run
`cardinality` aggregations:
[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
"my_field": "This is a document"
}
PUT my_index/my_type/2
{
"my_field": "This is another document"
}
GET my_index/_search
{
"aggs": {
"my_field_cardinality": {
"cardinality": {
"field": "my_field.hash" <1>
}
}
}
}
--------------------------
// CONSOLE
<1> Counting unique values on the `my_field.hash` field
Running a `cardinality` aggregation on the `my_field` field directly would
yield the same result, however using `my_field.hash` instead might result in
a speed-up if the field has a high-cardinality. On the other hand, it is
discouraged to use the `murmur3` field on numeric fields and string fields
that are not almost unique as the use of a `murmur3` field is unlikely to
bring significant speed-ups, while increasing the amount of disk space required
to store the index.