[[analysis-analyzers]]
== Built-in analyzer reference

Elasticsearch ships with a wide range of built-in analyzers, which can be used
in any index without further configuration:

<<analysis-standard-analyzer,Standard Analyzer>>::

The `standard` analyzer divides text into terms on word boundaries, as defined
by the Unicode Text Segmentation algorithm. It removes most punctuation,
lowercases terms, and supports removing stop words.

<<analysis-simple-analyzer,Simple Analyzer>>::

The `simple` analyzer divides text into terms whenever it encounters a
character that is not a letter. It lowercases all terms.

<<analysis-whitespace-analyzer,Whitespace Analyzer>>::

The `whitespace` analyzer divides text into terms whenever it encounters any
whitespace character. It does not lowercase terms.

<<analysis-stop-analyzer,Stop Analyzer>>::

The `stop` analyzer is like the `simple` analyzer, but also supports removal
of stop words.

<<analysis-keyword-analyzer,Keyword Analyzer>>::

The `keyword` analyzer is a ``noop'' analyzer that accepts whatever text it is
given and outputs the exact same text as a single term.

<<analysis-pattern-analyzer,Pattern Analyzer>>::

The `pattern` analyzer uses a regular expression to split the text into terms.
It supports lowercasing and stop words.

<<analysis-lang-analyzer,Language Analyzers>>::

Elasticsearch provides many language-specific analyzers, such as `english` or
`french`.

<<analysis-fingerprint-analyzer,Fingerprint Analyzer>>::

The `fingerprint` analyzer is a specialist analyzer that creates a
fingerprint that can be used for duplicate detection.
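
To make the differences between the rule-based analyzers above concrete, the
following Python sketch approximates the splitting rules described for
`whitespace`, `simple`, `stop`, and `keyword`. It is an illustration only, not
Elasticsearch code: the function names and the three-word stop list are
assumptions made for this example, and the real analyzers may behave
differently in edge cases.

```python
from itertools import groupby

# Illustrative stop list; an assumption for this sketch,
# not the list Elasticsearch uses.
STOP_WORDS = {"the", "over", "a"}


def whitespace_analyze(text):
    """Split on runs of whitespace; keep case and punctuation."""
    return text.split()


def simple_analyze(text):
    """Split at every non-letter character and lowercase each term."""
    return [
        "".join(chars).lower()
        for is_letter, chars in groupby(text, key=str.isalpha)
        if is_letter
    ]


def stop_analyze(text, stop_words=STOP_WORDS):
    """Like simple_analyze, but drop terms found in stop_words."""
    return [term for term in simple_analyze(text) if term not in stop_words]


def keyword_analyze(text):
    """Return the whole input unchanged as a single term."""
    return [text]


sample = "The QUICK Brown-Foxes jumped over the dog's bone."
print(whitespace_analyze(sample))
print(simple_analyze(sample))
print(stop_analyze(sample))
print(keyword_analyze(sample))
```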

[float]
=== Custom analyzers

If you do not find an analyzer suitable for your needs, you can create a
<<analysis-custom-analyzer,`custom`>> analyzer that combines the appropriate
<<analysis-charfilters,character filters>>,
<<analysis-tokenizers,tokenizer>>, and <<analysis-tokenfilters,token filters>>.
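
The three-stage structure of a `custom` analyzer can be pictured as a small
pipeline: character filters rewrite the raw text, a single tokenizer splits it
into terms, and token filters modify the resulting terms. The Python sketch
below is a conceptual model under that assumption, not the Elasticsearch API.
`build_analyzer` and the helper functions are hypothetical names, and the
tag-stripping regular expression is only a rough stand-in for a real character
filter.

```python
import re


def strip_tags(text):
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the filtered text on whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: lowercase every term."""
    return [token.lower() for token in tokens]


def build_analyzer(char_filters, tokenizer, token_filters):
    """Compose the stages in order: char filters, tokenizer, token filters."""
    def analyze(text):
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


# Hypothetical combination: one character filter, one tokenizer, one token filter.
analyze = build_analyzer([strip_tags], whitespace_tokenizer, [lowercase])
print(analyze("<b>Quick</b> Brown FOX"))  # ['quick', 'brown', 'fox']
```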


include::analyzers/fingerprint-analyzer.asciidoc[]

include::analyzers/keyword-analyzer.asciidoc[]

include::analyzers/lang-analyzer.asciidoc[]

include::analyzers/pattern-analyzer.asciidoc[]

include::analyzers/simple-analyzer.asciidoc[]

include::analyzers/standard-analyzer.asciidoc[]

include::analyzers/stop-analyzer.asciidoc[]

include::analyzers/whitespace-analyzer.asciidoc[]