[[analysis-simplepattern-tokenizer]]
=== Simple pattern tokenizer
++++
<titleabbrev>Simple pattern</titleabbrev>
++++

The `simple_pattern` tokenizer uses a regular expression to capture matching
text as terms. The set of regular expression features it supports is more
limited than the <<analysis-pattern-tokenizer,`pattern`>> tokenizer, but the
tokenization is generally faster.

This tokenizer does not support splitting the input on a pattern match, unlike
the <<analysis-pattern-tokenizer,`pattern`>> tokenizer. To split on pattern
matches using the same restricted regular expression subset, see the
<<analysis-simplepatternsplit-tokenizer,`simple_pattern_split`>> tokenizer.

This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
For an explanation of the supported features and syntax, see <<regexp-syntax,Regular Expression Syntax>>.

The default pattern is the empty string, which produces no terms. This
tokenizer should always be configured with a non-default pattern.

[float]
=== Configuration

The `simple_pattern` tokenizer accepts the following parameters:

[horizontal]
`pattern`::
    {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expression], defaults to the empty string.
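
Because the default pattern produces no terms, it can help to try a candidate
pattern before adding it to index settings. The following request is a minimal
sketch that assumes the `_analyze` API is called with an inline tokenizer
definition, so no index is needed. The pattern `[a-z]+` and the sample text are
illustrative choices, not recommended values. With this pattern, the returned
terms should be the runs of lowercase ASCII letters in the text.

[source,console]
----------------------------
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[a-z]+"
  },
  "text": "fd-786-335-514-x"
}
----------------------------

Only text that matches the pattern is emitted. Everything between matches,
such as the digits and hyphens here, is discarded.
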
[float]
=== Example configuration

This example configures the `simple_pattern` tokenizer to produce terms that are
three-digit numbers:

[source,console]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "simple_pattern",
          "pattern": "[0123456789]{3}"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "fd-786-335-514-x"
}
----------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens" : [
    {
      "token" : "786",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "335",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "514",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "word",
      "position" : 2
    }
  ]
}
----------------------------

/////////////////////

The above example produces these terms:

[source,text]
---------------------------
[ 786, 335, 514 ]
---------------------------
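
For contrast, the same text can be run through the
<<analysis-simplepatternsplit-tokenizer,`simple_pattern_split`>> tokenizer,
which treats pattern matches as separators instead of terms. The request below
is a sketch that assumes an inline tokenizer definition in the `_analyze` API.
The separator pattern `-` is chosen only for illustration.

[source,console]
----------------------------
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern_split",
    "pattern": "-"
  },
  "text": "fd-786-335-514-x"
}
----------------------------

Splitting on the hyphen should keep the leading `fd` and trailing `x` as terms
alongside the three numbers, whereas the `simple_pattern` example above keeps
only the text that matches the three-digit pattern.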