[[analyzer]]
=== `analyzer`

The values of <<mapping-index,`analyzed`>> string fields are passed through an
<<analysis,analyzer>> to convert the string into a stream of _tokens_ or
_terms_. For instance, the string `"The quick Brown Foxes."` may, depending
on which analyzer is used, be analyzed to the tokens: `quick`, `brown`,
`fox`. These are the actual terms that are indexed for the field, which makes
it possible to search efficiently for individual words _within_ large bodies of
text.

This analysis process needs to happen not just at index time, but also at
query time: the query string needs to be passed through the same (or a
similar) analyzer so that the terms that it tries to find are in the same
format as those that exist in the index.

Elasticsearch ships with a number of <<analysis-analyzers,pre-defined analyzers>>,
which can be used without further configuration. It also ships with many
<<analysis-charfilters,character filters>>, <<analysis-tokenizers,tokenizers>>,
and <<analysis-tokenfilters,token filters>>, which can be combined to configure
custom analyzers per index.
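
As a minimal sketch of such a combination, the request below defines a custom
analyzer in the index settings. The index name `my_custom_index` and the
analyzer name `my_custom_analyzer` are hypothetical; the `html_strip`
character filter, the `standard` tokenizer, and the `lowercase` and `stop`
token filters are built in.

[source,js]
--------------------------------------------------
PUT my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { <1>
          "type":        "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer":   "standard",
          "filter":      [ "lowercase", "stop" ]
        }
      }
    }
  }
}

GET my_custom_index/_analyze?analyzer=my_custom_analyzer <2>
{
  "text": "<p>The quick Brown Foxes.</p>"
}
--------------------------------------------------
// AUTOSENSE
<1> `my_custom_analyzer` is an illustrative name, not a built-in analyzer.
<2> With this configuration, the request would be expected to return the tokens: [ `quick`, `brown`, `foxes` ]. No stemming is applied because no stemmer token filter is configured.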

Analyzers can be specified per-query, per-field or per-index. At index time,
Elasticsearch will look for an analyzer in this order:

* The `analyzer` defined in the field mapping.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.
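
The index-time order above can be illustrated with the following sketch. The
index name `my_default_index` and the field names `title` and `tag` are
assumptions made for this example only.

[source,js]
--------------------------------------------------
PUT my_default_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { <1>
          "type": "english"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": { <2>
          "type": "string"
        },
        "tag": { <3>
          "type":     "string",
          "analyzer": "whitespace"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> An analyzer named `default` in the index settings, here configured as the `english` analyzer.
<2> The hypothetical `title` field specifies no `analyzer`, so it falls back to the index `default`.
<3> The hypothetical `tag` field specifies its own `analyzer`, which takes precedence over the index `default`.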

At query time, there are a few more layers, checked in this order:

* The `analyzer` defined in a <<full-text-queries,full-text query>>.
* The `search_analyzer` defined in the field mapping.
* The `analyzer` defined in the field mapping.
* An analyzer named `default_search` in the index settings.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.
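
The first three query-time layers can be sketched as follows. This is an
illustration under stated assumptions rather than a recommended setup: the
index name `my_search_index`, the field name `title`, and the
`autocomplete` analyzer and `autocomplete_filter` token filter are all
hypothetical names.

[source,js]
--------------------------------------------------
PUT my_search_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type":     "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type":            "string",
          "analyzer":        "autocomplete", <1>
          "search_analyzer": "standard" <2>
        }
      }
    }
  }
}

GET my_search_index/_search
{
  "query": {
    "match": {
      "title": {
        "query":    "Quick Fo",
        "analyzer": "simple" <3>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `analyzer` in the field mapping is used at index time. It would also be used at query time if no `search_analyzer` were set.
<2> The `search_analyzer` in the field mapping takes precedence over the mapping's `analyzer` at query time.
<3> The `analyzer` given in the query itself takes precedence over both mapping settings for this request only.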

The easiest way to specify an analyzer for a particular field is to define it
in the field mapping, as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { <1>
          "type": "string",
          "fields": {
            "english": { <2>
              "type":     "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

GET my_index/_analyze?field=text <3>
{
  "text": "The quick Brown Foxes."
}

GET my_index/_analyze?field=text.english <4>
{
  "text": "The quick Brown Foxes."
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field uses the default `standard` analyzer.
<2> The `text.english` <<multi-fields,multi-field>> uses the `english` analyzer, which removes stop words and applies stemming.
<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ].
<4> This returns the tokens: [ `quick`, `brown`, `fox` ].

// Docs: Mapping docs completely rewritten for 2.0 (2015-08-06 17:24:29 +02:00)