[[ignore-above]]
=== `ignore_above`
Strings longer than the `ignore_above` setting will not be indexed or stored.
For arrays of strings, `ignore_above` is applied to each array element separately, and elements longer than `ignore_above` are not indexed or stored.

NOTE: All strings and array elements are still present in the `_source` field, if that field is enabled, which is the default in Elasticsearch.
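The per-element rule can be pictured with a short Python sketch. It is only a mental model, not how Elasticsearch implements the setting: the helper `indexed_values` is hypothetical, and Python's `len` counts Unicode code points, which may differ from Elasticsearch's own length check for some characters.

```python
def indexed_values(value, ignore_above):
    """Return the strings that would be kept for indexing under this model.

    Hypothetical helper: it models the rule described above and is not
    Elasticsearch code.
    """
    # Treat a single string like a one-element array.
    values = [value] if isinstance(value, str) else list(value)
    # Each element is checked on its own; elements that are too long are skipped.
    return [v for v in values if len(v) <= ignore_above]


print(indexed_values("Syntax error", 20))
print(indexed_values(["Syntax error", "Syntax error with some long stacktrace"], 20))
```

In this model a long element is dropped without affecting its shorter siblings, which mirrors the array behavior described above. Nothing here touches `_source`, which keeps the original values.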
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/_doc/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> The search returns both documents, but only the first is present in the terms aggregation.

TIP: The `ignore_above` setting can be updated on
existing fields using the <<indices-put-mapping,PUT mapping API>>.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` (rounded down), since a UTF-8 character may
occupy at most 4 bytes.
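The gap between character count and byte count is easy to check. The following Python sketch is illustrative only: the sample strings are chosen here and are not taken from Elasticsearch, and the constant names are hypothetical. It compares the two counts and derives the conservative limit from the byte limit quoted above.

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted above
MAX_UTF8_BYTES_PER_CHAR = 4    # worst case for a single UTF-8 character

# Conservative character limit: integer division rounds down.
safe_ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR

# Sample strings (assumptions for illustration): ASCII, accented Latin,
# Japanese, and an emoji.
samples = ["error", "erreur détectée", "エラー", "😀"]
for text in samples:
    chars = len(text)                       # Unicode code points
    utf8_bytes = len(text.encode("utf-8"))  # bytes after UTF-8 encoding
    print(f"{text!r}: {chars} characters, {utf8_bytes} bytes")

print(safe_ignore_above)
```

For mostly ASCII data the two counts are nearly equal, so a higher `ignore_above` can be safe. The divide-by-four value is the bound that holds even if every character needs the maximum four bytes.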