[[analysis-custom-analyzer]]
=== Custom Analyzer

An analyzer of type `custom` that allows you to combine a `Tokenizer` with
zero or more `Token Filters`, and zero or more `Char Filters`. The
custom analyzer accepts a logical/registered name of the tokenizer to
use, and lists of logical/registered names of token filters and char
filters.

The name of the custom analyzer must not start with `_`.

The following are settings that can be set for a `custom` analyzer type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.
|`filter` |An optional list of logical / registered names of token
filters.
|`char_filter` |An optional list of logical / registered names of char
filters.
|`position_offset_gap` |An optional number of positions to increment
between each field value of a field using this analyzer.
|=======================================================================

Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myTokenFilter2]
                char_filter : [my_html]
                position_offset_gap: 256
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
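
Once an index has been created with these settings, the analyzer can be
checked with the `_analyze` API. The request below is a minimal sketch
that assumes the settings were applied to an index named `test`; the
exact request format (query-string parameters versus a JSON body) varies
between versions.

[source,js]
--------------------------------------------------
GET /test/_analyze?analyzer=myAnalyzer2&text=this+is+a+test
--------------------------------------------------

The response lists the tokens that remain after the `my_html` char
filter, the `myTokenizer1` tokenizer, and the two token filters have
been applied, which makes it easy to confirm that the configured stop
words are removed.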