Rename simple pattern tokenizers (#25300)

Changed names to be snake case for consistency

Related to #25159, original issue #23363
Andy Bristol 2017-06-19 13:48:43 -07:00 committed by GitHub
parent 0d6c47fe14
commit 4c5bd57619
5 changed files with 19 additions and 19 deletions

View File

@@ -99,14 +99,14 @@ terms.
 <<analysis-simplepattern-tokenizer,Simple Pattern Tokenizer>>::
-The `simplepattern` tokenizer uses a regular expression to capture matching
+The `simple_pattern` tokenizer uses a regular expression to capture matching
 text as terms. It uses a restricted subset of regular expression features
 and is generally faster than the `pattern` tokenizer.
 <<analysis-simplepatternsplit-tokenizer,Simple Pattern Split Tokenizer>>::
-The `simplepatternsplit` tokenizer uses the same restricted regular expression
-subset as the `simplepattern` tokenizer, but splits the input at matches rather
+The `simple_pattern_split` tokenizer uses the same restricted regular expression
+subset as the `simple_pattern` tokenizer, but splits the input at matches rather
 than returning the matches as terms.
 <<analysis-pathhierarchy-tokenizer,Path Tokenizer>>::
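To make the behavioral difference behind the rename easier to see, here is an illustrative pair of ad-hoc `_analyze` requests (not part of this commit; they assume a node with this change applied, and the pattern and sample text are invented for the example). `simple_pattern` emits the matches themselves as terms, while `simple_pattern_split` splits the input at the matches:

[source,js]
----
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[0123456789]{3}"
  },
  "text": "ab-123-cd-456"
}

POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern_split",
    "pattern": "-"
  },
  "text": "ab-123-cd-456"
}
----

The first request would be expected to return the terms `123` and `456`; the second would split the same input into `ab`, `123`, `cd`, and `456`.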

View File

@@ -3,7 +3,7 @@
 experimental[]
-The `simplepattern` tokenizer uses a regular expression to capture matching
+The `simple_pattern` tokenizer uses a regular expression to capture matching
 text as terms. The set of regular expression features it supports is more
 limited than the <<analysis-pattern-tokenizer,`pattern`>> tokenizer, but the
 tokenization is generally faster.
@@ -11,7 +11,7 @@ tokenization is generally faster.
 This tokenizer does not support splitting the input on a pattern match, unlike
 the <<analysis-pattern-tokenizer,`pattern`>> tokenizer. To split on pattern
 matches using the same restricted regular expression subset, see the
-<<analysis-simplepatternsplit-tokenizer,`simplepatternsplit`>> tokenizer.
+<<analysis-simplepatternsplit-tokenizer,`simple_pattern_split`>> tokenizer.
 This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
 For an explanation of the supported features and syntax, see <<regexp-syntax,Regular Expression Syntax>>.
@@ -22,7 +22,7 @@ tokenizer should always be configured with a non-default pattern.
 [float]
 === Configuration
-The `simplepattern` tokenizer accepts the following parameters:
+The `simple_pattern` tokenizer accepts the following parameters:
 [horizontal]
 `pattern`::
@@ -31,7 +31,7 @@ The `simplepattern` tokenizer accepts the following parameters:
 [float]
 === Example configuration
-This example configures the `simplepattern` tokenizer to produce terms that are
+This example configures the `simple_pattern` tokenizer to produce terms that are
 three-digit numbers
 [source,js]
@@ -47,7 +47,7 @@ PUT my_index
       },
       "tokenizer": {
         "my_tokenizer": {
-          "type": "simplepattern",
+          "type": "simple_pattern",
           "pattern": "[0123456789]{3}"
         }
       }
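As a quick sanity check of the renamed type in the hunk above, an `_analyze` call against the example index can be sketched as follows (assuming the analyzer defined in the settings is named `my_analyzer`, which is not visible in this hunk, and using made-up sample text):

[source,js]
----
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "fd-786-335-514-x"
}
----

With the `[0123456789]{3}` pattern, this would be expected to produce the terms `786`, `335`, and `514`.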

View File

@@ -3,14 +3,14 @@
 experimental[]
-The `simplepatternsplit` tokenizer uses a regular expression to split the
+The `simple_pattern_split` tokenizer uses a regular expression to split the
 input into terms at pattern matches. The set of regular expression features it
 supports is more limited than the <<analysis-pattern-tokenizer,`pattern`>>
 tokenizer, but the tokenization is generally faster.
 This tokenizer does not produce terms from the matches themselves. To produce
 terms from matches using patterns in the same restricted regular expression
-subset, see the <<analysis-simplepattern-tokenizer,`simplepattern`>>
+subset, see the <<analysis-simplepattern-tokenizer,`simple_pattern`>>
 tokenizer.
 This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
@@ -23,7 +23,7 @@ pattern.
 [float]
 === Configuration
-The `simplepatternsplit` tokenizer accepts the following parameters:
+The `simple_pattern_split` tokenizer accepts the following parameters:
 [horizontal]
 `pattern`::
@@ -32,7 +32,7 @@ The `simplepatternsplit` tokenizer accepts the following parameters:
 [float]
 === Example configuration
-This example configures the `simplepatternsplit` tokenizer to split the input
+This example configures the `simple_pattern_split` tokenizer to split the input
 text on underscores.
 [source,js]
@@ -48,7 +48,7 @@ PUT my_index
       },
       "tokenizer": {
         "my_tokenizer": {
-          "type": "simplepatternsplit",
+          "type": "simple_pattern_split",
           "pattern": "_"
         }
       }
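The same kind of hedged sketch for `simple_pattern_split`, again assuming the analyzer wired to `my_tokenizer` is called `my_analyzer` and using sample text invented for illustration:

[source,js]
----
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "an_underscored_phrase"
}
----

Splitting on `_` would be expected to yield the terms `an`, `underscored`, and `phrase`.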

View File

@@ -122,8 +122,8 @@ public class CommonAnalysisPlugin extends Plugin implements AnalysisPlugin {
     @Override
     public Map<String, AnalysisProvider<TokenizerFactory>> getTokenizers() {
         Map<String, AnalysisProvider<TokenizerFactory>> tokenizers = new TreeMap<>();
-        tokenizers.put("simplepattern", SimplePatternTokenizerFactory::new);
-        tokenizers.put("simplepatternsplit", SimplePatternSplitTokenizerFactory::new);
+        tokenizers.put("simple_pattern", SimplePatternTokenizerFactory::new);
+        tokenizers.put("simple_pattern_split", SimplePatternSplitTokenizerFactory::new);
         return tokenizers;
     }

View File

@@ -27,14 +27,14 @@
     - match: { detail.tokenizer.tokens.2.token: od }
 
 ---
-"simplepattern":
+"simple_pattern":
     - do:
         indices.analyze:
           body:
             text: "a6bf fooo ff61"
             explain: true
             tokenizer:
-              type: simplepattern
+              type: simple_pattern
               pattern: "[abcdef0123456789]{4}"
     - length: { detail.tokenizer.tokens: 2 }
     - match: { detail.tokenizer.name: _anonymous_tokenizer }
@@ -42,14 +42,14 @@
     - match: { detail.tokenizer.tokens.1.token: ff61 }
 
 ---
-"simplepatternsplit":
+"simple_pattern_split":
     - do:
         indices.analyze:
          body:
             text: "foo==bar"
             explain: true
             tokenizer:
-              type: simplepatternsplit
+              type: simple_pattern_split
               pattern: ==
     - length: { detail.tokenizer.tokens: 2 }
     - match: { detail.tokenizer.name: _anonymous_tokenizer }
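For readers less familiar with the YAML REST test format, the `simple_pattern_split` test above corresponds roughly to the following ad-hoc request (a reading aid only, not part of the commit):

[source,js]
----
GET _analyze
{
  "text": "foo==bar",
  "explain": true,
  "tokenizer": {
    "type": "simple_pattern_split",
    "pattern": "=="
  }
}
----

The test asserts that exactly two tokens come back from the anonymous tokenizer defined in the request body.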