
[[analyzer]]
=== `analyzer`

[IMPORTANT]
====
Only <<text,`text`>> fields support the `analyzer` mapping parameter.
====

The `analyzer` parameter specifies the <<analyzer-anatomy,analyzer>> used for
<<analysis,text analysis>> when indexing or searching a `text` field.

Unless overridden with the <<search-analyzer,`search_analyzer`>> mapping
parameter, this analyzer is used for both <<analysis-index-search-time,index and
search analysis>>. See <<specify-analyzer>>.
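
As a minimal sketch, the following request sets the `analyzer` parameter on a `text` field. Because no `search_analyzer` is set, the same analyzer is used at index and search time. The index name `my_simple_index` is hypothetical, and the built-in `whitespace` analyzer is used only for illustration. You can substitute any built-in analyzer or a custom analyzer defined in the index settings.

[source,console]
--------------------------------------------------
PUT my_simple_index
{
   "mappings":{
      "properties":{
         "title":{
            "type":"text",
            "analyzer":"whitespace"
         }
      }
   }
}
--------------------------------------------------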

[TIP]
====
We recommend testing analyzers before using them in production. See
<<test-analyzer>>.
====
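
For example, the `_analyze` API returns the tokens an analyzer produces for a sample string. This sketch uses the built-in `whitespace` analyzer, which splits text on whitespace without lowercasing. It should therefore return the tokens `The`, `Quick`, `Brown`, and `Fox`.

[source,console]
--------------------------------------------------
GET _analyze
{
   "analyzer":"whitespace",
   "text":"The Quick Brown Fox"
}
--------------------------------------------------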

[[search-quote-analyzer]]
==== `search_quote_analyzer`

The `search_quote_analyzer` setting specifies an analyzer for phrase queries. This is particularly useful when you want to keep stop words in phrase queries while removing them from other queries.

To disable stop word removal for phrase queries, a field needs three analyzer settings:

1. An `analyzer` setting for indexing all terms including stop words
2. A `search_analyzer` setting for non-phrase queries that will remove stop words
3. A `search_quote_analyzer` setting for phrase queries that will not remove stop words

[source,console]
--------------------------------------------------
PUT my_index
{
   "settings":{
      "analysis":{
         "analyzer":{
            "my_analyzer":{ <1>
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase"
               ]
            },
            "my_stop_analyzer":{ <2>
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "english_stop"
               ]
            }
         },
         "filter":{
            "english_stop":{
               "type":"stop",
               "stopwords":"_english_"
            }
         }
      }
   },
   "mappings":{
      "properties":{
         "title":{
            "type":"text",
            "analyzer":"my_analyzer", <3>
            "search_analyzer":"my_stop_analyzer", <4>
            "search_quote_analyzer":"my_analyzer" <5>
         }
      }
   }
}

PUT my_index/_doc/1
{
   "title":"The Quick Brown Fox"
}

PUT my_index/_doc/2
{
   "title":"A Quick Brown Fox"
}

GET my_index/_search
{
   "query":{
      "query_string":{
         "query":"\"the quick brown fox\"" <6>
      }
   }
}
--------------------------------------------------

<1> `my_analyzer` analyzer, which keeps all terms, including stop words
<2> `my_stop_analyzer` analyzer which removes stop words
<3> `analyzer` setting that points to the `my_analyzer` analyzer which will be used at index time
<4> `search_analyzer` setting that points to the `my_stop_analyzer` analyzer, which removes stop words from non-phrase queries
<5> `search_quote_analyzer` setting that points to the `my_analyzer` analyzer and ensures that stop words are not removed from phrase queries
<6> Because the query is wrapped in quotes, it is detected as a phrase query, so the `search_quote_analyzer` is used and stop words are not
removed from the query. The `my_analyzer` analyzer returns the tokens [`the`, `quick`, `brown`, `fox`], which match only the first document.
Non-phrase queries, by contrast, are analyzed with the `my_stop_analyzer` analyzer, which filters out stop words. A search for either
`The quick brown fox` or `A quick brown fox` therefore returns both documents, since both contain the tokens [`quick`, `brown`, `fox`].
Without the `search_quote_analyzer`, stop words would also be removed from phrase queries, so exact phrase matches would not be possible
and both documents would match.
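
To check the behavior described above, you can run both custom analyzers through the `_analyze` API of `my_index` and then repeat the search without quotes. This sketch assumes the index and documents from the previous example exist and have been refreshed. The first request should return `the`, `quick`, `brown`, and `fox`. The second should return only `quick`, `brown`, and `fox`. The unquoted search is not a phrase query, so it is analyzed with `my_stop_analyzer` and should match both documents.

[source,console]
--------------------------------------------------
GET my_index/_analyze
{
   "analyzer":"my_analyzer",
   "text":"The Quick Brown Fox"
}

GET my_index/_analyze
{
   "analyzer":"my_stop_analyzer",
   "text":"The Quick Brown Fox"
}

GET my_index/_search
{
   "query":{
      "query_string":{
         "query":"the quick brown fox"
      }
   }
}
--------------------------------------------------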