[[analysis-custom-analyzer]]
=== Custom Analyzer
An analyzer of type `custom` that allows you to combine a `Tokenizer`
with zero or more `Token Filters`, and zero or more `Char Filters`. The
custom analyzer accepts a logical/registered name of the tokenizer to
use, and a list of logical/registered names of token filters. The name
of the custom analyzer must not start with "_".

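
For instance, the smallest useful `custom` analyzer simply names a
tokenizer and no filters at all (a minimal sketch; the analyzer name
`my_minimal` and the choice of the built-in `whitespace` tokenizer are
illustrative):

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            my_minimal :
                type : custom
                tokenizer : whitespace
--------------------------------------------------
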
The following are settings that can be set for a `custom` analyzer type:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`tokenizer` |The logical / registered name of the tokenizer to use.

|`filter` |An optional list of logical / registered names of token
filters.

|`char_filter` |An optional list of logical / registered names of char
filters.

|`position_offset_gap` |An optional number of positions to increment
between each field value of a field using this analyzer. Defaults to 100.
100 was chosen because it prevents phrase queries with reasonably large
slops (less than 100) from matching terms across field values.
|=======================================================================
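
To see why the gap matters, consider a hedged sketch: a document whose
multi-valued field holds `["John Abraham", "Lincoln Smith"]` (the index
name `test` and field name `names` here are illustrative). Because each
field value starts `position_offset_gap` positions after the previous
one ends, the phrase query below cannot match "Abraham Lincoln" across
the two values:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_search' -d '{
    "query" : {
        "match_phrase" : {
            "names" : "Abraham Lincoln"
        }
    }
}'
--------------------------------------------------
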
Here is an example:

[source,js]
--------------------------------------------------
index :
    analysis :
        analyzer :
            myAnalyzer2 :
                type : custom
                tokenizer : myTokenizer1
                filter : [myTokenFilter1, myTokenFilter2]
                char_filter : [my_html]
                position_offset_gap: 256
        tokenizer :
            myTokenizer1 :
                type : standard
                max_token_length : 900
        filter :
            myTokenFilter1 :
                type : stop
                stopwords : [stop1, stop2, stop3, stop4]
            myTokenFilter2 :
                type : length
                min : 0
                max : 2000
        char_filter :
            my_html :
                type : html_strip
                escaped_tags : [xxx, yyy]
                read_ahead : 1024
--------------------------------------------------
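
Assuming the settings above were applied when creating an index (named
`test` here purely for illustration), the resulting analyzer can be
exercised with the Analyze API. Note that `<xxx>` survives `html_strip`
because it is listed in `escaped_tags`:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze?analyzer=myAnalyzer2&pretty' \
     -d 'some <xxx>sample</xxx> text'
--------------------------------------------------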