From 3c11c7c2614ea0396f372016a5cafbc689b70ee6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Fri, 6 Jul 2018 14:36:58 +0200
Subject: [PATCH] [Docs] Add clarification to analysis example (#31826)

There have been at least two PRs trying to fix the spelling of "lazi"
because it isn't very clear from the example that the english analyzer
will stem each token in the example. This adds a short description of
the analysis process to make this clearer.

Relates to #31797
---
 docs/reference/analysis.asciidoc | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/reference/analysis.asciidoc b/docs/reference/analysis.asciidoc
index 85434720c3e..c5fcce3ad5f 100644
--- a/docs/reference/analysis.asciidoc
+++ b/docs/reference/analysis.asciidoc
@@ -13,15 +13,18 @@ defined per index.
 [float]
 == Index time analysis
 
-For instance at index time, the built-in <> _analyzer_ would
-convert this sentence:
+For instance, at index time the built-in <> _analyzer_
+will first convert the sentence:
 
 [source,text]
 ------
 "The QUICK brown foxes jumped over the lazy dog!"
 ------
 
-into these terms, which would be added to the inverted index.
+into distinct tokens. It will then lowercase each token, remove frequent
+stopwords ("the") and reduce the terms to their word stems (foxes -> fox,
+jumped -> jump, lazy -> lazi). In the end, the following terms will be added
+to the inverted index:
 
 [source,text]
 ------