[[mapper-murmur3]]
=== Mapper Murmur3 Plugin

The mapper-murmur3 plugin computes hashes of field values at index time and
stores them in the index. This can sometimes be helpful when running
`cardinality` aggregations on high-cardinality fields with large string values.

[[mapper-murmur3-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install mapper-murmur3
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

This plugin can be downloaded for offline install from
{plugin_url}/mapper-murmur3/{version}/mapper-murmur3-{version}.zip[elastic download service].
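
Because the plugin has to be present on every node, it can be worth confirming
the installation once the nodes are back up. One way to do this, assuming the
cluster's REST API is reachable, is the cat plugins API, which lists the
installed plugins per node:

[source,js]
--------------------------
GET _cat/plugins?v
--------------------------
// CONSOLE

Each node in the cluster should appear with a `mapper-murmur3` entry.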

[[mapper-murmur3-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove mapper-murmur3
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[mapper-murmur3-usage]]
==== Using the `murmur3` field

The `murmur3` field is typically used within a multi-field, so that both the
original value and its hash are stored in the index:

[source,js]
--------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "keyword",
          "fields": {
            "hash": {
              "type": "murmur3"
            }
          }
        }
      }
    }
  }
}
--------------------------
// CONSOLE

With this mapping, you can refer to `my_field.hash` to get hashes of the values
of the `my_field` field. This is only useful for running `cardinality`
aggregations:

[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
  "my_field": "This is a document"
}

PUT my_index/my_type/2
{
  "my_field": "This is another document"
}

GET my_index/_search
{
  "aggs": {
    "my_field_cardinality": {
      "cardinality": {
        "field": "my_field.hash" <1>
      }
    }
  }
}
--------------------------
// CONSOLE

<1> Counting unique values on the `my_field.hash` field

Running a `cardinality` aggregation directly on the `my_field` field would
yield the same result. However, using `my_field.hash` instead might result in
a speed-up if the field has a high cardinality. On the other hand, using the
`murmur3` field on numeric fields or on string fields that are not almost
unique is discouraged: it is unlikely to bring significant speed-ups, and it
increases the amount of disk space required to store the index.
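
To check on the example documents that both fields give the same count, the two
aggregations can be sent in a single request. This is only a minimal sketch:
the aggregation names `direct` and `hashed` are made up for illustration, and
`"size": 0` is added merely to leave the search hits out of the response.

[source,js]
--------------------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "direct": {
      "cardinality": {
        "field": "my_field"
      }
    },
    "hashed": {
      "cardinality": {
        "field": "my_field.hash"
      }
    }
  }
}
--------------------------
// CONSOLE

Both aggregations are expected to report the same number of unique values. Any
difference in speed would only become visible on a much larger index with
high-cardinality values.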