From ef26763ca9b620d7afbe76874adb1f94f80f9b55 Mon Sep 17 00:00:00 2001
From: James Rodewig
Date: Thu, 16 Jan 2020 13:00:04 -0500
Subject: [PATCH] [DOCS] Add concepts section to analysis topic (#50801)

This helps the topic better match the structure of
our machine learning docs, e.g.
https://www.elastic.co/guide/en/machine-learning/7.5/ml-concepts.html

This PR only includes the 'Anatomy of an analyzer' page as a 'Concepts'
child page, but I plan to add other concepts, such as 'Index time vs.
search time', with later PRs.
---
 docs/reference/analysis.asciidoc          |  2 +-
 docs/reference/analysis/anatomy.asciidoc  | 18 +++++-------------
 docs/reference/analysis/concepts.asciidoc | 11 +++++++++++
 3 files changed, 17 insertions(+), 14 deletions(-)
 create mode 100644 docs/reference/analysis/concepts.asciidoc

diff --git a/docs/reference/analysis.asciidoc b/docs/reference/analysis.asciidoc
index a5625abbd61..17efa15479f 100644
--- a/docs/reference/analysis.asciidoc
+++ b/docs/reference/analysis.asciidoc
@@ -144,7 +144,7 @@ looking for:
 
 include::analysis/overview.asciidoc[]
 
-include::analysis/anatomy.asciidoc[]
+include::analysis/concepts.asciidoc[]
 
 include::analysis/testing.asciidoc[]
 
diff --git a/docs/reference/analysis/anatomy.asciidoc b/docs/reference/analysis/anatomy.asciidoc
index 729e0f7a27e..1db14e787a5 100644
--- a/docs/reference/analysis/anatomy.asciidoc
+++ b/docs/reference/analysis/anatomy.asciidoc
@@ -1,5 +1,5 @@
 [[analyzer-anatomy]]
-== Anatomy of an analyzer
+=== Anatomy of an analyzer
 
 An _analyzer_ -- whether built-in or custom -- is just a package which
 contains three lower-level building blocks: _character filters_,
@@ -10,8 +10,7 @@ blocks into analyzers suitable for different languages and types of text.
 Elasticsearch also exposes the individual building blocks so that they can be
 combined to define new <> analyzers.
 
-[float]
-=== Character filters
+==== Character filters
 
 A _character filter_ receives the original text as a stream of characters and
 can transform the stream by adding, removing, or changing characters. For
@@ -22,8 +21,7 @@ elements like `` from the stream.
 An analyzer may have *zero or more* <>,
 which are applied in order.
 
-[float]
-=== Tokenizer
+==== Tokenizer
 
 A _tokenizer_ receives a stream of characters, breaks it up into individual
 _tokens_ (usually individual words), and outputs a stream of _tokens_. For
@@ -37,9 +35,7 @@ the term represents.
 
 An analyzer must have *exactly one* <>.
 
-
-[float]
-=== Token filters
+==== Token filters
 
 A _token filter_ receives the token stream and may add, remove, or change
 tokens. For example, a <> token
@@ -53,8 +49,4 @@ Token filters are not allowed to change the position or character offsets of
 each token.
 
 An analyzer may have *zero or more* <>,
-which are applied in order.
-
-
-
-
+which are applied in order.
\ No newline at end of file
diff --git a/docs/reference/analysis/concepts.asciidoc b/docs/reference/analysis/concepts.asciidoc
new file mode 100644
index 00000000000..451b23c49d7
--- /dev/null
+++ b/docs/reference/analysis/concepts.asciidoc
@@ -0,0 +1,11 @@
+[[analysis-concepts]]
+== Text analysis concepts
+++++
+Concepts
+++++
+
+This section explains the fundamental concepts of text analysis in {es}.
+
+* <>
+
+include::anatomy.asciidoc[]
\ No newline at end of file
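
Reviewer note (not part of the patch): the anatomy that anatomy.asciidoc describes, namely zero or more character filters, then exactly one tokenizer, then zero or more token filters, each group applied in order, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Elasticsearch's implementation or API: every name below (`Token`, `strip_html`, `whitespace_tokenizer`, `lowercase`, `make_stop_filter`, `analyze`) is hypothetical, and the regex-based HTML stripping is a toy stand-in for a real character filter.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: the term plus its position and offsets."""
    term: str
    position: int
    start_offset: int
    end_offset: int


# Hypothetical type aliases for the three building blocks.
CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[Token]]
TokenFilter = Callable[[List[Token]], List[Token]]


def strip_html(text: str) -> str:
    """Toy character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text: str) -> List[Token]:
    """Toy tokenizer: split on whitespace, recording position and offsets.

    Assumption: offsets refer to the text this function receives, i.e. the
    text after character filtering, which is a simplification.
    """
    return [
        Token(m.group(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


def lowercase(tokens: List[Token]) -> List[Token]:
    """Toy token filter: change each term, keep position and offsets."""
    return [
        Token(t.term.lower(), t.position, t.start_offset, t.end_offset)
        for t in tokens
    ]


def make_stop_filter(stop_words: Set[str]) -> TokenFilter:
    """Build a toy token filter that removes the given stop words."""
    def stop(tokens: List[Token]) -> List[Token]:
        return [t for t in tokens if t.term not in stop_words]
    return stop


def analyze(
    text: str,
    char_filters: Iterable[CharFilter],
    tokenizer: Tokenizer,
    token_filters: Iterable[TokenFilter],
) -> List[Token]:
    """Run the three stages in order: char filters, tokenizer, token filters."""
    for char_filter in char_filters:      # zero or more, applied in order
        text = char_filter(text)
    tokens = tokenizer(text)              # exactly one
    for token_filter in token_filters:    # zero or more, applied in order
        tokens = token_filter(tokens)
    return tokens


result = analyze(
    "<b>The</b> Quick brown fox!",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, make_stop_filter({"the"})],
)
print([t.term for t in result])
```

Note that the toy token filters only change or drop terms; they leave each surviving token's position and offsets untouched, in line with the constraint stated in anatomy.asciidoc.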