[[analysis]]
= Text analysis

:lucene-analysis-docs: https://lucene.apache.org/core/{lucene_version_path}/analyzers-common/org/apache/lucene/analysis
:lucene-stop-word-link: https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/resources/org/apache/lucene/analysis

[partintro]
--

_Text analysis_ is the process of converting unstructured text, like
the body of an email or a product description, into a structured format that's
optimized for search.
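
As a rough illustration of what "a structured format that's optimized for
search" can mean, the following Python sketch splits text into terms,
lowercases them, drops a few stop words, and builds a small inverted index.
It is a simplified model under stated assumptions, not how {es} implements
analysis. The `analyze` and `build_inverted_index` functions and the
`STOP_WORDS` set are hypothetical names chosen for this example.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}


def analyze(text):
    """Return a list of (position, term) pairs for the given text."""
    # Tokenize: keep runs of ASCII letters and digits, discard everything else.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    terms = []
    for position, token in enumerate(tokens):
        term = token.lower()  # normalize case
        if term in STOP_WORDS:
            continue  # drop the stop word; later terms keep their positions
        terms.append((position, term))
    return terms


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for _, term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "The QUICK brown fox",
    2: "A quick reply to the email",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 2]
```

Because both sample documents yield the term `quick`, a lookup for `quick`
finds them regardless of the original capitalization.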

[discrete]
[[when-to-configure-analysis]]
=== When to configure text analysis

{es} performs text analysis when indexing or searching <<text,`text`>> fields.

If your index doesn't contain `text` fields, no further setup is needed; you can
skip the pages in this section.

However, if you use `text` fields or your text searches aren't returning results
as expected, configuring text analysis can often help. You should also look into
analysis configuration if you're using {es} to:

* Build a search engine
* Mine unstructured data
* Fine-tune search for a specific language
* Perform lexicographic or linguistic research

[discrete]
[[analysis-toc]]
=== In this section

* <<analysis-overview>>
* <<analysis-concepts>>
* <<configure-text-analysis>>
* <<analysis-analyzers>>
* <<analysis-tokenizers>>
* <<analysis-tokenfilters>>
* <<analysis-charfilters>>
* <<analysis-normalizers>>

--

include::analysis/overview.asciidoc[]

include::analysis/concepts.asciidoc[]

include::analysis/configure-text-analysis.asciidoc[]

include::analysis/analyzers.asciidoc[]

include::analysis/tokenizers.asciidoc[]

include::analysis/tokenfilters.asciidoc[]

include::analysis/charfilters.asciidoc[]

include::analysis/normalizers.asciidoc[]