[[analyzer]]
=== `analyzer`
[IMPORTANT]
====
Only <<text,`text`>> fields support the `analyzer` mapping parameter.
====
The `analyzer` parameter specifies the <<analyzer-anatomy,analyzer>> used for
<<analysis,text analysis>> when indexing or searching a `text` field.
Unless overridden with the <<search-analyzer,`search_analyzer`>> mapping
parameter, this analyzer is used for both <<analysis-index-search-time,index and
search analysis>>. See <<specify-analyzer>>.
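
As a minimal sketch, the following request maps a `text` field with an explicit
analyzer. The index name `analyzer-example` and the choice of the built-in
`whitespace` analyzer are assumptions made purely for illustration.

[source,console]
--------------------------------------------------
PUT analyzer-example
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
--------------------------------------------------

Because no `search_analyzer` is set in this sketch, the `whitespace` analyzer
would be used both when indexing `title` and when searching it.
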
[TIP]
====
We recommend testing analyzers before using them in production. See
<<test-analyzer>>.
====
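
For instance, a quick check with the analyze API might look like the following
sketch. The sample text is arbitrary, and `whitespace` stands in for whichever
analyzer you plan to use.

[source,console]
--------------------------------------------------
POST _analyze
{
  "analyzer": "whitespace",
  "text": "The Quick Brown Fox"
}
--------------------------------------------------

The `whitespace` analyzer splits on whitespace only and does not lowercase, so
this request should return the tokens `The`, `Quick`, `Brown`, and `Fox`.
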
[[search-quote-analyzer]]
==== `search_quote_analyzer`
The `search_quote_analyzer` setting lets you specify an analyzer for phrases.
This is particularly useful when you want to disable stop word removal for
phrase queries.

To disable stop word removal for phrases, a field needs three analyzer settings:

1. An `analyzer` setting for indexing all terms, including stop words
2. A `search_analyzer` setting for non-phrase queries that removes stop words
3. A `search_quote_analyzer` setting for phrase queries that does not remove stop words
[source,console]
--------------------------------------------------
PUT my_index
{
  "settings":{
    "analysis":{
      "analyzer":{
        "my_analyzer":{ <1>
          "type":"custom",
          "tokenizer":"standard",
          "filter":[
            "lowercase"
          ]
        },
        "my_stop_analyzer":{ <2>
          "type":"custom",
          "tokenizer":"standard",
          "filter":[
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter":{
        "english_stop":{
          "type":"stop",
          "stopwords":"_english_"
        }
      }
    }
  },
  "mappings":{
    "properties":{
      "title": {
        "type":"text",
        "analyzer":"my_analyzer", <3>
        "search_analyzer":"my_stop_analyzer", <4>
        "search_quote_analyzer":"my_analyzer" <5>
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title":"The Quick Brown Fox"
}

PUT my_index/_doc/2
{
  "title":"A Quick Brown Fox"
}

GET my_index/_search
{
  "query":{
    "query_string":{
      "query":"\"the quick brown fox\"" <6>
    }
  }
}
--------------------------------------------------
<1> `my_analyzer` analyzer, which tokenizes all terms, including stop words.
<2> `my_stop_analyzer` analyzer, which removes stop words.
<3> `analyzer` setting that points to the `my_analyzer` analyzer, which is used at index time.
<4> `search_analyzer` setting that points to the `my_stop_analyzer` analyzer and removes stop words for non-phrase queries.
<5> `search_quote_analyzer` setting that points to the `my_analyzer` analyzer and ensures that stop words are not removed from phrase queries.
<6> Because the query is wrapped in quotes, it is detected as a phrase query, so the `search_quote_analyzer` is applied and the stop words
are not removed from the query. The `my_analyzer` analyzer then returns the tokens [`the`, `quick`, `brown`, `fox`], which match one
of the documents. Non-phrase queries are instead analyzed with the `my_stop_analyzer` analyzer, which filters out stop words, so a search for either
`The quick brown fox` or `A quick brown fox` returns both documents, since both contain the tokens [`quick`, `brown`, `fox`].
Without the `search_quote_analyzer`, exact matching for phrase queries would not be possible, because the stop words would be removed from
phrase queries and both documents would match.
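
To see the contrast described in callout 6, a non-phrase version of the same
search can be sketched as follows. It assumes the `my_index` index and the two
documents from the example above; the query text is an arbitrary illustration.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query":{
    "query_string":{
      "query":"a quick brown fox"
    }
  }
}
--------------------------------------------------

Because the query is not wrapped in quotes, it is analyzed with
`my_stop_analyzer`, so the stop word `a` is dropped and both documents should
match.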