[[analysis-custom-analyzer]]
=== Custom Analyzer

When the built-in analyzers do not fulfill your needs, you can create a
`custom` analyzer which uses the appropriate combination of:

* zero or more <>
* a <>
* zero or more <>.

[float]
=== Configuration

The `custom` analyzer accepts the following parameters:

[horizontal]
`tokenizer`:: A built-in or customised <>. (Required)

`char_filter`:: An optional array of built-in or customised <>.

`filter`:: An optional array of built-in or customised <>.

`position_increment_gap`:: When indexing an array of text values, Elasticsearch
inserts a fake "gap" between the last term of one value and the first term of
the next value to ensure that a phrase query doesn't match two terms from
different array elements. Defaults to `100`. See <> for more.

[float]
=== Example configuration

Here is an example that combines the following:

Character Filter::
* <>

Tokenizer::
* <>

Token Filters::
* <>
* <>

[source,js]
--------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this déjà vu?"
}
--------------------------------
// CONSOLE

/////////////////////

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "is",
      "start_offset": 0,
      "end_offset": 2,
      "type": "",
      "position": 0
    },
    {
      "token": "this",
      "start_offset": 3,
      "end_offset": 7,
      "type": "",
      "position": 1
    },
    {
      "token": "deja",
      "start_offset": 11,
      "end_offset": 15,
      "type": "",
      "position": 2
    },
    {
      "token": "vu",
      "start_offset": 16,
      "end_offset": 22,
      "type": "",
      "position": 3
    }
  ]
}
----------------------------
// TESTRESPONSE

/////////////////////


The above example produces the following terms:

[source,text]
---------------------------
[ is, this, deja, vu ]
---------------------------

The previous example used the tokenizer, token filters, and character filter
with their default configurations, but it is also possible to create configured
versions of each and use them in a custom analyzer.

Here is a more complicated example that combines the following:

Character Filter::
* <>, configured to replace `:)` with `_happy_` and `:(` with `_sad_`

Tokenizer::
* <>, configured to split on spaces and the punctuation characters `.`, `,`, `!`, and `?`

Token Filters::
* <>
* <>, configured to use the predefined list of English stop words

Here is the configuration, followed by an `_analyze` request that uses it:

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [
            "emoticons" <1>
          ],
          "tokenizer": "punctuation", <1>
          "filter": [
            "lowercase",
            "english_stop" <1>
          ]
        }
      },
      "tokenizer": {
        "punctuation": { <1>
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": { <1>
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": { <1>
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}
--------------------------------------------------
// CONSOLE

<1> The `emoticons` character filter, `punctuation` tokenizer, and
`english_stop` token filter are custom configurations that are defined in the
same index settings.

/////////////////////

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "i'm",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "_happy_",
      "start_offset": 6,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "person",
      "start_offset": 9,
      "end_offset": 15,
      "type": "word",
      "position": 3
    },
    {
      "token": "you",
      "start_offset": 21,
      "end_offset": 24,
      "type": "word",
      "position": 5
    }
  ]
}
----------------------------
// TESTRESPONSE

/////////////////////


The above example produces the following terms:

[source,text]
---------------------------
[ i'm, _happy_, person, you ]
---------------------------
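
The `position_increment_gap` parameter described under Configuration is set
directly on the `custom` analyzer definition. The following is a minimal
sketch, not a tested recipe: the index name `my_gap_index` and the analyzer name
`gap_analyzer` are hypothetical, and the value `10` was chosen only so that it
differs from the default of `100`. The `_analyze` request passes an array of two
text values so that you can inspect the token positions. It assumes that the
`_analyze` API applies the gap between array elements in the same way that
indexing does, so verify this against the version you are running.

[source,js]
--------------------------------------------------
PUT my_gap_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "gap_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ],
          "position_increment_gap": 10 <1>
        }
      }
    }
  }
}

POST my_gap_index/_analyze
{
  "analyzer": "gap_analyzer",
  "text": [ "John Abraham", "Lincoln Smith" ]
}
--------------------------------------------------

<1> Hypothetical value for illustration. The positions of `lincoln` and
`smith` should start roughly this many positions after `abraham`, rather than
immediately after it.

Because of the gap, a phrase query for `abraham lincoln` against a field
indexed with these two values would not match, since the two terms are far
apart in position. Setting `position_increment_gap` to `0` would place them
next to each other, allowing such a match across array elements.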