OpenSearch/docs/reference/mapping/params/ignore-above.asciidoc

[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting are not processed by the
<<analyzer,analyzer>> and are not indexed. This is mainly useful for
<<mapping-index,`not_analyzed`>> string fields, which are typically used for
filtering, aggregations, and sorting. These are structured fields, and it
rarely makes sense to index very long terms in them.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.

TIP: The `ignore_above` setting can have different values for fields of the
same name in the same index. Its value can be updated on existing fields
using the <<indices-put-mapping,PUT mapping API>>.
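
As a minimal sketch of such an update, assuming the `my_index` index, the
`my_type` type, and the `message` field from the example above, and an
arbitrary new limit of `100` chosen purely for illustration:

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "message": {
      "type": "string",
      "index": "not_analyzed",
      "ignore_above": 100 <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The new limit (an illustrative value). The other parameters are repeated unchanged from the original mapping.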


This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 4 = 8191` (rounded down), since a UTF-8 character may
occupy up to 4 bytes.
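
The same division can be adapted when more is known about the data. As a
hedged sketch only: assuming a hypothetical `my_other_index` whose `title`
field only ever holds characters that need at most 2 bytes in UTF-8 (code
points up to U+07FF, which covers most accented Latin, Greek, and Cyrillic
letters), the limit could be set to `32766 / 2 = 16383`. The index, type, and
field names here are invented for illustration.

[source,js]
--------------------------------------------------
PUT my_other_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 16383 <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> 16383 characters at 2 bytes each is 32766 bytes, which stays within Lucene's term byte-length limit. This holds only if the assumption about the field's contents is true.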