[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting are not processed by the
<<analyzer,analyzer>> and are not indexed. This is mainly useful for
<<mapping-index,`not_analyzed`>> string fields, which are typically used for
filtering, aggregations, and sorting. These are structured fields, and it
rarely makes sense to index very long terms in them.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.
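The outcome of the example above can be reproduced with a simple length check. The following is a minimal sketch in JavaScript, not Elasticsearch code: `isIndexed` is a hypothetical helper, and counting code points with `Array.from` is an assumption about what "characters" means here.

```javascript
// Hypothetical helper, not part of Elasticsearch: decides whether a string
// value would be indexed under a given ignore_above limit.
function isIndexed(value, ignoreAbove) {
  // Assumption: count characters (code points), not bytes.
  return Array.from(value).length <= ignoreAbove;
}

const ignoreAbove = 20; // same limit as in the mapping above
const docs = [
  { id: 1, message: "Syntax error" },                           // 12 characters
  { id: 2, message: "Syntax error with some long stacktrace" }, // 38 characters
];

// Both documents are stored, but only values that pass the check
// contribute a term to the aggregation.
const indexedIds = docs
  .filter((doc) => isIndexed(doc.message, ignoreAbove))
  .map((doc) => doc.id);

console.log(indexedIds); // [ 1 ]
```

A value of exactly 20 characters passes the check, because only strings longer than the limit are ignored.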

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
4 bytes.
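The gap between character count and byte count can be checked directly. The following is a minimal sketch in JavaScript, assuming a runtime that provides `TextEncoder` (current browsers and Node.js). The helpers `utf8ByteLength` and `safeIgnoreAbove` are hypothetical names used for illustration, not part of Elasticsearch or Lucene.

```javascript
// Lucene's term byte-length limit, as stated in the text above.
const LUCENE_MAX_TERM_BYTES = 32766;

// Number of bytes a string occupies once encoded as UTF-8.
function utf8ByteLength(value) {
  return new TextEncoder().encode(value).length;
}

// Hypothetical helper: the largest character limit that cannot exceed the
// byte limit, given a worst-case number of bytes per character.
function safeIgnoreAbove(maxBytesPerChar) {
  return Math.floor(LUCENE_MAX_TERM_BYTES / maxBytesPerChar);
}

const samples = [
  "Syntax error",        // ASCII only: 1 byte per character
  "Erreur syst\u00e8me", // one accented letter that takes 2 bytes
  "\u65e5\u672c\u8a9e",  // three CJK characters, 3 bytes each
];

for (const text of samples) {
  const chars = Array.from(text).length; // code points, not bytes
  console.log(`${chars} characters, ${utf8ByteLength(text)} bytes`);
}

// UTF-8 uses at most 4 bytes per code point.
console.log(safeIgnoreAbove(4)); // 8191
```

Text that is known to be pure ASCII needs no such margin, since every character then occupies exactly one byte.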