[[analysis-simplepattern-tokenizer]] === Simple Pattern Tokenizer experimental[This functionality is marked as experimental in Lucene] The `simple_pattern` tokenizer uses a regular expression to capture matching text as terms. The set of regular expression features it supports is more limited than the <> tokenizer, but the tokenization is generally faster. This tokenizer does not support splitting the input on a pattern match, unlike the <> tokenizer. To split on pattern matches using the same restricted regular expression subset, see the <> tokenizer. This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions]. For an explanation of the supported features and syntax, see <>. The default pattern is the empty string, which produces no terms. This tokenizer should always be configured with a non-default pattern. [float] === Configuration The `simple_pattern` tokenizer accepts the following parameters: [horizontal] `pattern`:: {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expression], defaults to the empty string. [float] === Example configuration This example configures the `simple_pattern` tokenizer to produce terms that are three-digit numbers [source,js] ---------------------------- PUT my_index { "settings": { "analysis": { "analyzer": { "my_analyzer": { "tokenizer": "my_tokenizer" } }, "tokenizer": { "my_tokenizer": { "type": "simple_pattern", "pattern": "[0123456789]{3}" } } } } } POST my_index/_analyze { "analyzer": "my_analyzer", "text": "fd-786-335-514-x" } ---------------------------- // CONSOLE ///////////////////// [source,js] ---------------------------- { "tokens" : [ { "token" : "786", "start_offset" : 3, "end_offset" : 6, "type" : "word", "position" : 0 }, { "token" : "335", "start_offset" : 7, "end_offset" : 10, "type" : "word", "position" : 1 }, { "token" : "514", "start_offset" : 11, "end_offset" : 14, "type" : "word", "position" : 2 } ] } ---------------------------- // TESTRESPONSE ///////////////////// The above example produces these terms: [source,text] --------------------------- [ 786, 335, 514 ] ---------------------------