[DOCS] Add concepts section to analysis topic (#50801)
This helps the topic better match the structure of our machine learning docs, e.g. https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html

This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts' child page, but I plan to add other concepts, such as 'Index time vs. search time', in later PRs.
This commit is contained in:
parent 1edaf2b101
commit ef26763ca9
@@ -144,7 +144,7 @@ looking for:
 include::analysis/overview.asciidoc[]
 
-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]
 
 include::analysis/testing.asciidoc[]
@@ -1,5 +1,5 @@
 [[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer
 
 An _analyzer_ -- whether built-in or custom -- is just a package which
 contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
 Elasticsearch also exposes the individual building blocks so that they can be
 combined to define new <<analysis-custom-analyzer,`custom`>> analyzers.
 
-[float]
-=== Character filters
+==== Character filters
 
 A _character filter_ receives the original text as a stream of characters and
 can transform the stream by adding, removing, or changing characters. For
@@ -22,8 +21,7 @@ elements like `<b>` from the stream.
 An analyzer may have *zero or more* <<analysis-charfilters,character filters>>,
 which are applied in order.
 
-[float]
-=== Tokenizer
+==== Tokenizer
 
 A _tokenizer_ receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.
 
 An analyzer must have *exactly one* <<analysis-tokenizers,tokenizer>>.
 
-
-[float]
-=== Token filters
+==== Token filters
 
 A _token filter_ receives the token stream and may add, remove, or change
 tokens. For example, a <<analysis-lowercase-tokenfilter,`lowercase`>> token
@@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
 each token.
 
 An analyzer may have *zero or more* <<analysis-tokenfilters,token filters>>,
-which are applied in order.
-
-
-
-
+which are applied in order.
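The anatomy page changed above walks through three building blocks applied in a fixed order: zero or more character filters, exactly one tokenizer, then zero or more token filters. As an aside (not part of the diff), that pipeline can be sketched in plain Python. This is a conceptual illustration only, not Elasticsearch code; the stage names loosely mimic the built-in `html_strip`, `whitespace`, and `lowercase` components.

```python
import re

def html_strip_char_filter(text: str) -> str:
    # Character filter: transforms the raw character stream,
    # here by removing HTML elements like <b>.
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text: str) -> list[str]:
    # Tokenizer: breaks the character stream into individual tokens
    # (usually individual words).
    return text.split()

def lowercase_token_filter(tokens: list[str]) -> list[str]:
    # Token filter: may add, remove, or change tokens.
    return [t.lower() for t in tokens]

def analyze(text: str) -> list[str]:
    # Zero or more char filters -> exactly one tokenizer -> zero or more
    # token filters, each stage applied in order.
    text = html_strip_char_filter(text)
    tokens = whitespace_tokenizer(text)
    return lowercase_token_filter(tokens)

print(analyze("<b>The QUICK Brown</b> Fox"))  # ['the', 'quick', 'brown', 'fox']
```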
@@ -0,0 +1,11 @@
+[[analysis-concepts]]
+== Text analysis concepts
+++++
+<titleabbrev>Concepts</titleabbrev>
+++++
+
+This section explains the fundamental concepts of text analysis in {es}.
+
+* <<analyzer-anatomy>>
+
+include::anatomy.asciidoc[]
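Not part of this diff, but for context: the building blocks the anatomy page describes can be exercised together with Elasticsearch's `_analyze` API. A request along these lines (using the built-in `html_strip` character filter, `standard` tokenizer, and `lowercase` token filter) shows each stage applied in order:

```
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase" ],
  "text": "<b>The QUICK Brown</b> Fox"
}
```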