[DOCS] Fix tokenizer page titles (#58361) (#58598)

Changes the titles for tokenizer pages to sentence case.

Also moves the 'Path hierarchy tokenizer examples' page within the
'Path hierarchy tokenizer' page and adds a related redirect.
James Rodewig 2020-06-26 09:24:41 -04:00 committed by GitHub
parent eaa60b7c54
commit ab29162ab3
18 changed files with 250 additions and 202 deletions


@@ -140,8 +140,6 @@ include::tokenizers/ngram-tokenizer.asciidoc[]
include::tokenizers/pathhierarchy-tokenizer.asciidoc[]
include::tokenizers/pathhierarchy-tokenizer-examples.asciidoc[]
include::tokenizers/pattern-tokenizer.asciidoc[]
include::tokenizers/simplepattern-tokenizer.asciidoc[]


@@ -1,5 +1,8 @@
[[analysis-chargroup-tokenizer]]
=== Char Group Tokenizer
=== Character group tokenizer
++++
<titleabbrev>Character group</titleabbrev>
++++
The `char_group` tokenizer breaks text into terms whenever it encounters a
character which is in a defined set. It is mostly useful for cases where a simple
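As a quick, hedged sketch of the behavior (not part of this commit's diff),
the tokenizer can be defined inline in an `_analyze` request; the
`tokenize_on_chars` values below are illustrative:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": {
    "type": "char_group",
    "tokenize_on_chars": [ "whitespace", "-" ]
  },
  "text": "fast-moving quick brown foxes"
}
--------------------------------------------------

This should produce the terms `[ fast, moving, quick, brown, foxes ]`.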


@@ -1,5 +1,8 @@
[[analysis-classic-tokenizer]]
=== Classic Tokenizer
=== Classic tokenizer
++++
<titleabbrev>Classic</titleabbrev>
++++
The `classic` tokenizer is a grammar based tokenizer that is good for English
language documents. This tokenizer has heuristics for special treatment of
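As an illustrative sketch (not from this commit), the tokenizer can be tried
with the `_analyze` API; the sample sentence is arbitrary:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "classic",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
--------------------------------------------------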


@@ -1,5 +1,8 @@
[[analysis-edgengram-tokenizer]]
=== Edge n-gram tokenizer
++++
<titleabbrev>Edge n-gram</titleabbrev>
++++
The `edge_ngram` tokenizer first breaks text down into words whenever it
encounters one of a list of specified characters, then it emits
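A minimal `_analyze` sketch, with illustrative `min_gram`, `max_gram`, and
`token_chars` settings:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 4,
    "token_chars": [ "letter" ]
  },
  "text": "search"
}
--------------------------------------------------

This should emit the edge n-grams `[ se, sea, sear ]`.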


@@ -1,5 +1,8 @@
[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer
=== Keyword tokenizer
++++
<titleabbrev>Keyword</titleabbrev>
++++
The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
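A minimal sketch using the `_analyze` API (the sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
--------------------------------------------------

This should return the whole input as the single term `New York`.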


@@ -1,5 +1,8 @@
[[analysis-letter-tokenizer]]
=== Letter Tokenizer
=== Letter tokenizer
++++
<titleabbrev>Letter</titleabbrev>
++++
The `letter` tokenizer breaks text into terms whenever it encounters a
character which is not a letter. It does a reasonable job for most European
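An illustrative `_analyze` sketch (not part of this diff):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "letter",
  "text": "You're the 1st!"
}
--------------------------------------------------

This should split on the apostrophe, spaces, and digits, producing
`[ You, re, the, st ]`.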


@@ -1,6 +1,8 @@
[[analysis-lowercase-tokenizer]]
=== Lowercase Tokenizer
=== Lowercase tokenizer
++++
<titleabbrev>Lowercase</titleabbrev>
++++
The `lowercase` tokenizer, like the
<<analysis-letter-tokenizer, `letter` tokenizer>>, breaks text into terms
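A quick sketch via the `_analyze` API (the sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "lowercase",
  "text": "The QUICK Brown FOX"
}
--------------------------------------------------

This should produce `[ the, quick, brown, fox ]`.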


@@ -1,5 +1,8 @@
[[analysis-ngram-tokenizer]]
=== N-gram tokenizer
++++
<titleabbrev>N-gram</titleabbrev>
++++
The `ngram` tokenizer first breaks text down into words whenever it encounters
one of a list of specified characters, then it emits
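An illustrative `_analyze` sketch with assumed settings:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 3,
    "token_chars": [ "letter" ]
  },
  "text": "quick"
}
--------------------------------------------------

This should emit the trigrams `[ qui, uic, ick ]`.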


@@ -1,183 +0,0 @@
[[analysis-pathhierarchy-tokenizer-examples]]
=== Path Hierarchy Tokenizer Examples

A common use case for the `path_hierarchy` tokenizer is filtering results by
file paths. If a file path is indexed along with the data, using the
`path_hierarchy` tokenizer to analyze the path allows filtering the results
by different parts of the file path string.

This example configures an index with two custom analyzers and applies
those analyzers to multifields of the `file_path` text field that will
store filenames. One of the two analyzers uses reverse tokenization.
Some sample documents are then indexed to represent file paths
for photos inside the photo folders of two different users.

[source,console]
--------------------------------------------------
PUT file-path-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "file_path": {
        "type": "text",
        "fields": {
          "tree": {
            "type": "text",
            "analyzer": "custom_path_tree"
          },
          "tree_reversed": {
            "type": "text",
            "analyzer": "custom_path_tree_reversed"
          }
        }
      }
    }
  }
}

POST file-path-test/_doc/1
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_doc/2
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo2.jpg"
}

POST file-path-test/_doc/3
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo3.jpg"
}

POST file-path-test/_doc/4
{
  "file_path": "/User/alice/photos/2017/05/15/my_photo1.jpg"
}

POST file-path-test/_doc/5
{
  "file_path": "/User/bob/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------
// TESTSETUP

A search for a particular file path string against the text field matches all
the example documents, with Bob's documents ranking highest because `bob` is
also one of the terms created by the standard analyzer, which boosts relevance
for his documents.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "match": {
      "file_path": "/User/bob/photos/2017/05"
    }
  }
}
--------------------------------------------------

It's simple to match or filter documents with file paths that exist within a
particular directory using the `file_path.tree` field.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree": "/User/alice/photos/2017/05/16"
    }
  }
}
--------------------------------------------------

With the `reverse` parameter for this tokenizer, it's also possible to match
from the other end of the file path, such as individual file names or a deeply
nested subdirectory. The following example shows a search for all files named
`my_photo1.jpg` in any directory via the `file_path.tree_reversed` field, which
is configured in the mapping to use the `reverse` parameter.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree_reversed": {
        "value": "my_photo1.jpg"
      }
    }
  }
}
--------------------------------------------------

Viewing the tokens generated by the forward and reversed analyzers is
instructive, as it shows the different terms each one creates for the same
file path value.

[source,console]
--------------------------------------------------
POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree_reversed",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------

Filtering on file paths is also useful in combination with other types of
searches, as in this example, which looks for any file paths containing `16`
that must also be in Alice's photo directory.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "file_path": "16" }
      },
      "filter": {
        "term": { "file_path.tree": "/User/alice" }
      }
    }
  }
}
--------------------------------------------------


@@ -1,5 +1,8 @@
[[analysis-pathhierarchy-tokenizer]]
=== Path Hierarchy Tokenizer
=== Path hierarchy tokenizer
++++
<titleabbrev>Path hierarchy</titleabbrev>
++++
The `path_hierarchy` tokenizer takes a hierarchical value like a filesystem
path, splits on the path separator, and emits a term for each component in the
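As a one-request sketch of the behavior (not part of this commit's changes):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/one/two/three"
}
--------------------------------------------------

This should produce `[ /one, /one/two, /one/two/three ]`.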
@@ -167,6 +170,191 @@ If we were to set `reverse` to `true`, it would produce the following:
[ one/two/three/, two/three/, three/ ]
---------------------------
[float]
=== Detailed Examples
See <<analysis-pathhierarchy-tokenizer-examples, detailed examples here>>.
[discrete]
[[analysis-pathhierarchy-tokenizer-detailed-examples]]
=== Detailed examples

A common use case for the `path_hierarchy` tokenizer is filtering results by
file paths. If a file path is indexed along with the data, using the
`path_hierarchy` tokenizer to analyze the path allows filtering the results
by different parts of the file path string.

This example configures an index with two custom analyzers and applies
those analyzers to multifields of the `file_path` text field that will
store filenames. One of the two analyzers uses reverse tokenization.
Some sample documents are then indexed to represent file paths
for photos inside the photo folders of two different users.

[source,console]
--------------------------------------------------
PUT file-path-test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "file_path": {
        "type": "text",
        "fields": {
          "tree": {
            "type": "text",
            "analyzer": "custom_path_tree"
          },
          "tree_reversed": {
            "type": "text",
            "analyzer": "custom_path_tree_reversed"
          }
        }
      }
    }
  }
}

POST file-path-test/_doc/1
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_doc/2
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo2.jpg"
}

POST file-path-test/_doc/3
{
  "file_path": "/User/alice/photos/2017/05/16/my_photo3.jpg"
}

POST file-path-test/_doc/4
{
  "file_path": "/User/alice/photos/2017/05/15/my_photo1.jpg"
}

POST file-path-test/_doc/5
{
  "file_path": "/User/bob/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------

A search for a particular file path string against the text field matches all
the example documents, with Bob's documents ranking highest because `bob` is
also one of the terms created by the standard analyzer, which boosts relevance
for his documents.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "match": {
      "file_path": "/User/bob/photos/2017/05"
    }
  }
}
--------------------------------------------------
// TEST[continued]

It's simple to match or filter documents with file paths that exist within a
particular directory using the `file_path.tree` field.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree": "/User/alice/photos/2017/05/16"
    }
  }
}
--------------------------------------------------
// TEST[continued]

With the `reverse` parameter for this tokenizer, it's also possible to match
from the other end of the file path, such as individual file names or a deeply
nested subdirectory. The following example shows a search for all files named
`my_photo1.jpg` in any directory via the `file_path.tree_reversed` field, which
is configured in the mapping to use the `reverse` parameter.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "term": {
      "file_path.tree_reversed": {
        "value": "my_photo1.jpg"
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

Viewing the tokens generated by the forward and reversed analyzers is
instructive, as it shows the different terms each one creates for the same
file path value.

[source,console]
--------------------------------------------------
POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}

POST file-path-test/_analyze
{
  "analyzer": "custom_path_tree_reversed",
  "text": "/User/alice/photos/2017/05/16/my_photo1.jpg"
}
--------------------------------------------------
// TEST[continued]

Filtering on file paths is also useful in combination with other types of
searches, as in this example, which looks for any file paths containing `16`
that must also be in Alice's photo directory.

[source,console]
--------------------------------------------------
GET file-path-test/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "file_path": "16" }
      },
      "filter": {
        "term": { "file_path.tree": "/User/alice" }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]


@@ -1,5 +1,8 @@
[[analysis-pattern-tokenizer]]
=== Pattern Tokenizer
=== Pattern tokenizer
++++
<titleabbrev>Pattern</titleabbrev>
++++
The `pattern` tokenizer uses a regular expression to either split text into
terms whenever it matches a word separator, or to capture matching text as
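A minimal sketch using `_analyze` and the tokenizer's default `\W+` pattern
(the sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "pattern",
  "text": "comma,separated,values"
}
--------------------------------------------------

This should split on the commas, producing `[ comma, separated, values ]`.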


@@ -1,5 +1,8 @@
[[analysis-simplepattern-tokenizer]]
=== Simple Pattern Tokenizer
=== Simple pattern tokenizer
++++
<titleabbrev>Simple pattern</titleabbrev>
++++
The `simple_pattern` tokenizer uses a regular expression to capture matching
text as terms. The set of regular expression features it supports is more
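An illustrative `_analyze` sketch; the three-digit pattern is an arbitrary
example:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[0123456789]{3}"
  },
  "text": "fd-786-335-514-x"
}
--------------------------------------------------

This should capture only the runs of three digits: `[ 786, 335, 514 ]`.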


@@ -1,5 +1,8 @@
[[analysis-simplepatternsplit-tokenizer]]
=== Simple Pattern Split Tokenizer
=== Simple pattern split tokenizer
++++
<titleabbrev>Simple pattern split</titleabbrev>
++++
The `simple_pattern_split` tokenizer uses a regular expression to split the
input into terms at pattern matches. The set of regular expression features it
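A quick sketch via `_analyze`, splitting on an illustrative underscore
pattern:

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern_split",
    "pattern": "_"
  },
  "text": "an_underscored_phrase"
}
--------------------------------------------------

This should produce `[ an, underscored, phrase ]`.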


@@ -1,5 +1,8 @@
[[analysis-standard-tokenizer]]
=== Standard Tokenizer
=== Standard tokenizer
++++
<titleabbrev>Standard</titleabbrev>
++++
The `standard` tokenizer provides grammar based tokenization (based on the
Unicode Text Segmentation algorithm, as specified in
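A minimal `_analyze` sketch (the sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
--------------------------------------------------

This should split on word boundaries, emitting `Brown` and `Foxes` as
separate terms from the hyphenated pair.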


@@ -1,5 +1,8 @@
[[analysis-thai-tokenizer]]
=== Thai Tokenizer
=== Thai tokenizer
++++
<titleabbrev>Thai</titleabbrev>
++++
The `thai` tokenizer segments Thai text into words, using the Thai
segmentation algorithm included with Java. Text in other languages in general
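An illustrative `_analyze` sketch (the Thai sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "thai",
  "text": "การที่ได้ต้องแสดงว่างานดี"
}
--------------------------------------------------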


@@ -1,5 +1,8 @@
[[analysis-uaxurlemail-tokenizer]]
=== UAX URL Email Tokenizer
=== UAX URL email tokenizer
++++
<titleabbrev>UAX URL email</titleabbrev>
++++
The `uax_url_email` tokenizer is like the <<analysis-standard-tokenizer,`standard` tokenizer>> except that it
recognises URLs and email addresses as single tokens.
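A quick `_analyze` sketch (the sample address is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "uax_url_email",
  "text": "Email me at john.smith@global-international.com"
}
--------------------------------------------------

This should keep the email address intact as a single token.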


@@ -1,5 +1,8 @@
[[analysis-whitespace-tokenizer]]
=== Whitespace Tokenizer
=== Whitespace tokenizer
++++
<titleabbrev>Whitespace</titleabbrev>
++++
The `whitespace` tokenizer breaks text into terms whenever it encounters a
whitespace character.
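A minimal `_analyze` sketch (the sample text is arbitrary):

[source,console]
--------------------------------------------------
POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The quick brown fox."
}
--------------------------------------------------

This should produce `[ The, quick, brown, fox. ]`, trailing period included,
since only whitespace delimits terms.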


@@ -886,6 +886,10 @@ See <<ilm-existing-indices-apply>>.
See <<ilm-existing-indices-reindex>>.
[role="exclude",id="analysis-pathhierarchy-tokenizer-examples"]
=== Path hierarchy tokenizer examples
See <<analysis-pathhierarchy-tokenizer-detailed-examples>>.
////
[role="exclude",id="search-request-body"]