OpenSearch/docs/reference/analysis/index-search-time.asciidoc

[[analysis-index-search-time]]
=== Index and search analysis

Text analysis occurs at two times:

Index time::
When a document is indexed, any <<text,`text`>> field values are analyzed.

Search time::
When running a <<full-text-queries,full-text search>> on a `text` field,
the query string (the text the user is searching for) is analyzed.
+
Search time is also called _query time_.

The analyzer, or set of analysis rules, used at each time is called the _index
analyzer_ or _search analyzer_ respectively.

[[analysis-same-index-search-analyzer]]
==== How the index and search analyzer work together

In most cases, the same analyzer should be used at index and search time. This
ensures the values and query strings for a field are changed into the same form
of tokens. In turn, this ensures the tokens match as expected during a search.

.**Example**
[%collapsible]
====

A document is indexed with the following value in a `text` field:

[source,text]
------
The QUICK brown foxes jumped over the dog!
------

The index analyzer for the field converts the value into tokens and normalizes
them. In this case, each of the tokens represents a word:

[source,text]
------
[ quick, brown, fox, jump, over, dog ]
------

These tokens are then indexed.

Later, a user searches the same `text` field for:

[source,text]
------
"Quick fox"
------

The user expects this search to match the sentence indexed earlier,
`The QUICK brown foxes jumped over the dog!`.

However, the query string does not contain the exact words used in the
document's original text:

* `quick` vs `QUICK`
* `fox` vs `foxes`

To account for this, the query string is analyzed using the same analyzer. This
analyzer produces the following tokens:

[source,text]
------
[ quick, fox ]
------

To execute the search, {es} compares these query string tokens to the tokens
indexed in the `text` field.

[options="header"]
|===
|Token     | Query string | `text` field
|`quick`   | X            | X
|`brown`   |              | X
|`fox`     | X            | X
|`jump`    |              | X
|`over`    |              | X
|`dog`     |              | X
|===

Because the field value are query string were analyzed in the same way, they
created similar tokens. The tokens `quick` and `fox` are exact matches. This
means the search matches the document containing `"The QUICK brown foxes jumped
over the dog!"`, just as the user expects.
====

[[different-analyzers]]
==== When to use a different search analyzer

While less common, it sometimes makes sense to use different analyzers at index
and search time. To enable this, {es} allows you to
<<specify-search-analyzer,specify a separate search analyzer>>.

Generally, a separate search analyzer should only be specified when using the
same form of tokens for field values and query strings would create unexpected
or irrelevant search matches.

[[different-analyzer-ex]]
.*Example*
[%collapsible]
====
{es} is used to create a search engine that matches only words that start with
a provided prefix. For instance, a search for `tr` should return `tram` or
`trope`—but never `taxi` or `bat`.

A document is added to the search engine's index; this document contains one
such word in a `text` field:

[source,text]
------
"Apple"
------

The index analyzer for the field converts the value into tokens and normalizes
them. In this case, each of the tokens represents a potential prefix for
the word:

[source,text]
------
[ a, ap, app, appl, apple]
------

These tokens are then indexed.

Later, a user searches the same `text` field for:

[source,text]
------
"appli"
------

The user expects this search to match only words that start with `appli`,
such as `appliance` or `application`. The search should not match `apple`.

However, if the index analyzer is used to analyze this query string, it would
produce the following tokens:

[source,text]
------
[ a, ap, app, appl, appli ]
------

When {es} compares these query string tokens to the ones indexed for `apple`,
it finds several matches.

[options="header"]
|===
|Token      | `appli`      | `apple`
|`a`        | X            | X
|`ap`       | X            | X
|`app`      | X            | X
|`appl`     | X            | X
|`appli`    |              | X
|===

This means the search would erroneously match `apple`. Not only that, it would
match any word starting with `a`.

To fix this, you can specify a different search analyzer for query strings used
on the `text` field.

In this case, you could specify a search analyzer that produces a single token
rather than a set of prefixes:

[source,text]
------
[ appli ]
------

This query string token would only match tokens for words that start with
`appli`, which better aligns with the user's search expectations.
====
[DOCS] Rewrite analysis intro (#51184) * [DOCS] Rewrite analysis intro. Move index/search analysis content. * Rewrites 'Text analysis' page intro as high-level definition. Adds guidance on when users should configure text analysis * Rewrites and splits index/search analysis content: * Conceptual content -> 'Index and search analysis' under 'Concepts' * Task-based content -> 'Specify an analyzer' under 'Configure...' * Adds detailed examples for when to use the same index/search analyzer and when not. * Adds new example snippets for specifying search analyzers * clarifications * Add toc. Decrement headings. * Reword 'When to configure' section * Remove sentence from tip 2020-01-30 09:19:53 -05:00			`[[analysis-index-search-time]]`
			`=== Index and search analysis`

			`Text analysis occurs at two times:`

			`Index time::`
			When a document is indexed, any <<text,`text`>> field values are analyzed.

			`Search time::`
			When running a <<full-text-queries,full-text search>> on a `text` field,
			`the query string (the text the user is searching for) is analyzed.`
			`+`
			`Search time is also called _query time_.`

			`The analyzer, or set of analysis rules, used at each time is called the _index`
			`analyzer_ or _search analyzer_ respectively.`

			`[[analysis-same-index-search-analyzer]]`
			`==== How the index and search analyzer work together`

			`In most cases, the same analyzer should be used at index and search time. This`
			`ensures the values and query strings for a field are changed into the same form`
			`of tokens. In turn, this ensures the tokens match as expected during a search.`

			`.Example`
			`[%collapsible]`
			`====`

			A document is indexed with the following value in a `text` field:

			`[source,text]`
			`------`
			`The QUICK brown foxes jumped over the dog!`
			`------`

			`The index analyzer for the field converts the value into tokens and normalizes`
			`them. In this case, each of the tokens represents a word:`

			`[source,text]`
			`------`
			`[ quick, brown, fox, jump, over, dog ]`
			`------`

			`These tokens are then indexed.`

			Later, a user searches the same `text` field for:

			`[source,text]`
			`------`
			`"Quick fox"`
			`------`

			`The user expects this search to match the sentence indexed earlier,`
			`The QUICK brown foxes jumped over the dog!`.

			`However, the query string does not contain the exact words used in the`
			`document's original text:`

			* `quick` vs `QUICK`
			* `fox` vs `foxes`

			`To account for this, the query string is analyzed using the same analyzer. This`
			`analyzer produces the following tokens:`

			`[source,text]`
			`------`
			`[ quick, fox ]`
			`------`

[DOCS] Fix typo in index and search analysis docs (#52988) 2020-03-02 07:22:27 -05:00			`To execute the search, {es} compares these query string tokens to the tokens`
[DOCS] Rewrite analysis intro (#51184) * [DOCS] Rewrite analysis intro. Move index/search analysis content. * Rewrites 'Text analysis' page intro as high-level definition. Adds guidance on when users should configure text analysis * Rewrites and splits index/search analysis content: * Conceptual content -> 'Index and search analysis' under 'Concepts' * Task-based content -> 'Specify an analyzer' under 'Configure...' * Adds detailed examples for when to use the same index/search analyzer and when not. * Adds new example snippets for specifying search analyzers * clarifications * Add toc. Decrement headings. * Reword 'When to configure' section * Remove sentence from tip 2020-01-30 09:19:53 -05:00			indexed in the `text` field.

			`[options="header"]`
			`\|===`
			\|Token \| Query string \| `text` field
			\|`quick` \| X \| X
			\|`brown` \| \| X
			\|`fox` \| X \| X
			\|`jump` \| \| X
			\|`over` \| \| X
			\|`dog` \| \| X
			`\|===`

			`Because the field value are query string were analyzed in the same way, they`
			created similar tokens. The tokens `quick` and `fox` are exact matches. This
			means the search matches the document containing `"The QUICK brown foxes jumped
			over the dog!"`, just as the user expects.
			`====`

			`[[different-analyzers]]`
			`==== When to use a different search analyzer`

			`While less common, it sometimes makes sense to use different analyzers at index`
			`and search time. To enable this, {es} allows you to`
			`<<specify-search-analyzer,specify a separate search analyzer>>.`

			`Generally, a separate search analyzer should only be specified when using the`
			`same form of tokens for field values and query strings would create unexpected`
			`or irrelevant search matches.`

			`[[different-analyzer-ex]]`
			`.Example`
			`[%collapsible]`
			`====`
			`{es} is used to create a search engine that matches only words that start with`
			a provided prefix. For instance, a search for `tr` should return `tram` or
			`trope`—but never `taxi` or `bat`.

			`A document is added to the search engine's index; this document contains one`
			such word in a `text` field:

			`[source,text]`
			`------`
			`"Apple"`
			`------`

			`The index analyzer for the field converts the value into tokens and normalizes`
			`them. In this case, each of the tokens represents a potential prefix for`
			`the word:`

			`[source,text]`
			`------`
			`[ a, ap, app, appl, apple]`
			`------`

			`These tokens are then indexed.`

			Later, a user searches the same `text` field for:

			`[source,text]`
			`------`
			`"appli"`
			`------`

			The user expects this search to match only words that start with `appli`,
			such as `appliance` or `application`. The search should not match `apple`.

			`However, if the index analyzer is used to analyze this query string, it would`
			`produce the following tokens:`

			`[source,text]`
			`------`
			`[ a, ap, app, appl, appli ]`
			`------`

			When {es} compares these query string tokens to the ones indexed for `apple`,
			`it finds several matches.`

			`[options="header"]`
			`\|===`
			\|Token \| `appli` \| `apple`
			\|`a` \| X \| X
			\|`ap` \| X \| X
			\|`app` \| X \| X
			\|`appl` \| X \| X
			\|`appli` \| \| X
			`\|===`

			This means the search would erroneously match `apple`. Not only that, it would
			match any word starting with `a`.

			`To fix this, you can specify a different search analyzer for query strings used`
			on the `text` field.

			`In this case, you could specify a search analyzer that produces a single token`
			`rather than a set of prefixes:`

			`[source,text]`
			`------`
			`[ appli ]`
			`------`

			`This query string token would only match tokens for words that start with`
			`appli`, which better aligns with the user's search expectations.
			`====`