OpenSearch/docs/reference/analysis/analyzers/custom-analyzer.asciidoc

60 lines
2.0 KiB
Plaintext

[[analysis-custom-analyzer]]
=== Custom Analyzer
An analyzer of type `custom` that allows to combine a `Tokenizer` with
zero or more `Token Filters`, and zero or more `Char Filters`. The
custom analyzer accepts a logical/registered name of the tokenizer to
use, and a list of logical/registered names of token filters.
The name of the custom analyzer must not start with "_".
The following are settings that can be set for a `custom` analyzer type:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.
|`filter` |An optional list of logical / registered name of token
filters.
|`char_filter` |An optional list of logical / registered name of char
filters.
|`position_increment_gap` |An optional number of positions to increment
between each field value of a field using this analyzer. Defaults to 100.
100 was chosen because it prevents phrase queries with reasonably large
slops (less than 100) from matching terms across field values.
|=======================================================================
Here is an example:
[source,js]
--------------------------------------------------
index :
analysis :
analyzer :
myAnalyzer2 :
type : custom
tokenizer : myTokenizer1
filter : [myTokenFilter1, myTokenFilter2]
char_filter : [my_html]
position_increment_gap: 256
tokenizer :
myTokenizer1 :
type : standard
max_token_length : 900
filter :
myTokenFilter1 :
type : stop
stopwords : [stop1, stop2, stop3, stop4]
myTokenFilter2 :
type : length
min : 0
max : 2000
char_filter :
my_html :
type : html_strip
escaped_tags : [xxx, yyy]
read_ahead : 1024
--------------------------------------------------