
[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting are neither indexed nor stored.
For arrays of strings, `ignore_above` is applied to each array element separately: elements longer than `ignore_above` are not indexed or stored, while the remaining elements are indexed as usual.

NOTE: All strings and array elements are still present in the `_source` field, provided that `_source` is enabled, which it is by default.
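To make the per-element rule concrete, here is a minimal Python model of it. The function `indexed_values` is hypothetical and is not part of OpenSearch or any client library; it only mimics the documented behavior: each string, or each array element, longer than `ignore_above` characters is skipped, while the `_source` document itself is left unchanged. It counts characters with Python's `len()`, which counts Unicode code points; the real mapper's notion of a character may differ for characters outside the Basic Multilingual Plane.

```python
from typing import List, Union


def indexed_values(value: Union[str, List[str]], ignore_above: int) -> List[str]:
    """Hypothetical model: return the values a keyword field would index.

    Each array element is checked on its own; strings longer than
    ignore_above characters are dropped, shorter or equal ones are kept.
    """
    values = value if isinstance(value, list) else [value]
    return [v for v in values if len(v) <= ignore_above]


# The _source document is never modified by ignore_above.
source = {"message": ["Syntax error", "Syntax error with some long stacktrace"]}

print(indexed_values(source["message"], 20))  # ['Syntax error']
print(source["message"])  # both elements are still present in _source
```

A string of exactly `ignore_above` characters is still indexed, since only strings _longer_ than the limit are ignored.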

[source,js]
--------------------------------------------------
PUT my_index?include_type_name=true
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/_doc/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document is indexed, but its `message` field is not.
<4> Search returns both documents, but only the first is present in the terms aggregation.
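The outcome of the example above can be reproduced with a small in-memory sketch. This is an illustration only, not how OpenSearch evaluates queries: `docs`, `IGNORE_ABOVE`, and the helper `terms_buckets` are hypothetical names, and the helper counts each document at most once per term, mirroring a terms aggregation's `doc_count`.

```python
from collections import Counter

IGNORE_ABOVE = 20  # matches "ignore_above": 20 in the mapping above

# Hypothetical in-memory copy of the two indexed documents.
docs = {
    "1": {"message": "Syntax error"},
    "2": {"message": "Syntax error with some long stacktrace"},
}


def terms_buckets(docs, field, ignore_above):
    """Hypothetical helper: build terms-aggregation-style buckets."""
    counts = Counter()
    for source in docs.values():
        value = source.get(field)
        values = value if isinstance(value, list) else [value]
        # Only values within the limit were indexed, so only they form buckets.
        kept = {v for v in values if isinstance(v, str) and len(v) <= ignore_above}
        counts.update(kept)  # a set, so each document counts once per term
    return dict(counts)


hits = list(docs)  # a match_all search returns every document
buckets = terms_buckets(docs, "message", IGNORE_ABOVE)
print(hits)     # ['1', '2']
print(buckets)  # {'Syntax error': 1}
```

Both documents are hits because matching does not depend on the ignored value here, while only the short message produces a bucket.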

TIP: The `ignore_above` setting can be updated on
existing fields using the <<indices-put-mapping,PUT mapping API>>.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766` bytes, because values longer than `ignore_above` are never
passed to Lucene as terms.

NOTE: The value for `ignore_above` is a _character count_, but Lucene's limit
is in bytes. If you use UTF-8 text with many non-ASCII characters, you may want
to set the limit to `32766 / 4 = 8191`, since a single UTF-8 encoded character
can occupy up to 4 bytes.
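The arithmetic in the note above can be checked directly in Python by comparing character counts with UTF-8 byte lengths. The constant names and the `utf8_len` helper are illustrative, not OpenSearch settings; the sketch assumes the worst case of 4 UTF-8 bytes per character and counts characters as Python code points.

```python
LUCENE_MAX_TERM_BYTES = 32766   # Lucene's term byte-length limit, from the text above
MAX_UTF8_BYTES_PER_CHAR = 4     # worst case for one code point encoded as UTF-8

safe_limit = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR
print(safe_limit)  # 8191


def utf8_len(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


ascii_text = "a" * safe_limit            # 1 byte per character
emoji_text = "\U0001F600" * safe_limit   # 4 bytes per character

for text in (ascii_text, emoji_text):
    print(len(text), utf8_len(text), utf8_len(text) <= LUCENE_MAX_TERM_BYTES)
# 8191 8191 True
# 8191 32764 True

# One more 4-byte character pushes the term past the limit.
too_long = "\U0001F600" * (safe_limit + 1)
print(utf8_len(too_long))  # 32768
```

With `ignore_above` set to `8191`, even text made entirely of 4-byte characters stays within the limit, whereas a character-based limit of `32766` would let such text reach roughly four times the allowed byte length.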