[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting will not be indexed or stored.
For arrays of strings, `ignore_above` is applied to each array element separately, so only the elements longer than `ignore_above` are skipped.

NOTE: All strings and array elements will still be present in the `_source` field if it is enabled, which is the default in Elasticsearch.

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20 <1>
      }
    }
  }
}

PUT my_index/_doc/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET my_index/_search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------

<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.
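
The per-value and per-element behavior described above can be modeled with a short Python sketch. This is an illustration only, not the actual implementation: the `indexed_terms` function is hypothetical, and it assumes that length is measured the way Python's `len()` measures it (Unicode code points).

```python
def indexed_terms(value, ignore_above):
    """Return the strings that this model would index for a keyword field.

    Hypothetical model only: `value` is a string or a list of strings,
    and length is measured with len(), i.e. in Unicode code points.
    """
    elements = value if isinstance(value, list) else [value]
    # The limit is checked per element, so one long element does not
    # cause shorter elements of the same array to be skipped.
    return [s for s in elements if len(s) <= ignore_above]


short = "Syntax error"                            # 12 characters
long = "Syntax error with some long stacktrace"   # 38 characters

print(indexed_terms(short, 20))          # ['Syntax error']
print(indexed_terms(long, 20))           # []
print(indexed_terms([short, long], 20))  # ['Syntax error']
```

In this model the second document contributes no term for `message`, which matches the terms aggregation containing only the first document.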

TIP: The `ignore_above` setting can be updated on
existing fields using the <<indices-put-mapping,PUT mapping API>>.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` (rounded down), since a UTF-8 character may
occupy at most 4 bytes.
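
The arithmetic behind that limit can be checked with a minimal Python sketch. The constant and function names are hypothetical, chosen for illustration. Python's `len()` counts Unicode code points, so treating it as the "character count" here is an assumption.

```python
LUCENE_MAX_TERM_BYTES = 32766   # term byte-length limit quoted above
MAX_UTF8_BYTES_PER_CHAR = 4     # worst case for a single UTF-8 character


def safe_ignore_above(max_bytes_per_char=MAX_UTF8_BYTES_PER_CHAR):
    """Largest character count that cannot exceed the byte limit."""
    # Integer division rounds down, which keeps the result on the safe side.
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


def utf8_length(s):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


print(safe_ignore_above())                    # 8191

accented = "h\u00e9llo"                       # 'e' with acute accent: 2 bytes
print(len(accented), utf8_length(accented))   # 5 6

emoji = "\U0001F600"                          # outside the BMP: 4 bytes
print(len(emoji), utf8_length(emoji))         # 1 4
```

For text that is known to be pure ASCII, every character is one byte, so the character count and the byte count are the same.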