From f86e8c33c11a41f3601d86a782960a3036e863bb Mon Sep 17 00:00:00 2001
From: John Roesler <john.roesler@bazaarvoice.com>
Date: Thu, 9 Jul 2015 14:04:20 -0500
Subject: [PATCH] Docfix: ignore_above uses string length, not UTF-8 byte length

ignore_above is used to guard against the Lucene limitation
that a term cannot exceed 32766 bytes.

However, the implementation compares against the character count,
which does not account for characters that have multi-byte UTF-8
encodings.

This commit updates the docs to make this relationship clear.

Closes #11563
---
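Reviewer note (not part of the commit message): the arithmetic behind the
suggested limit can be checked with a small sketch. It assumes that the
"character count" is the Java string length, i.e. the number of UTF-16 code
units; the helper names below are hypothetical and exist only for this
illustration.

```python
# Sketch only: compares a string's length in UTF-16 code units (assumed here
# to be what ignore_above is checked against) with its UTF-8 byte length.

LUCENE_MAX_TERM_BYTES = 32766  # term limit quoted in the commit message


def utf16_units(s: str) -> int:
    """Length in UTF-16 code units (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Length in bytes once encoded as UTF-8 (hypothetical helper)."""
    return len(s.encode("utf-8"))


def conservative_ignore_above(max_bytes_per_unit: int = 3) -> int:
    """Largest count that stays under the byte limit in the worst case."""
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_unit


samples = {
    "ascii": "a",                    # 1 byte in UTF-8
    "two-byte": "\u00e9",            # e with acute accent
    "three-byte": "\u20ac",          # euro sign
    "supplementary": "\U0001F600",   # outside the Basic Multilingual Plane
}

for name, ch in samples.items():
    print(f"{name}: {utf16_units(ch)} UTF-16 unit(s), {utf8_bytes(ch)} UTF-8 byte(s)")

print("conservative ignore_above:", conservative_ignore_above())
```

Under that assumption, a supplementary character costs 4 UTF-8 bytes but also
counts as 2 units, so the worst ratio is still 3 bytes per counted unit. If
the count were taken in code points instead, the worst case would be 4 bytes
per character and a divisor of 4 would be the safer choice.
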
 docs/reference/mapping/types/core-types.asciidoc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/reference/mapping/types/core-types.asciidoc b/docs/reference/mapping/types/core-types.asciidoc
index 945a5c4e708..e1e92846bb7 100644
--- a/docs/reference/mapping/types/core-types.asciidoc
+++ b/docs/reference/mapping/types/core-types.asciidoc
@@ -143,6 +143,12 @@ defaults to `true` or to the parent `object` type setting.
 |`ignore_above` |The analyzer will ignore strings larger than this size.
 Useful for generic `not_analyzed` fields that should ignore long text.
 
+This option also helps guard against Lucene's term byte-length limit of
+`32766`. Note: `ignore_above` is compared against the _character count_, but
+Lucene counts bytes. For UTF-8 text with many non-ASCII characters, consider
+a limit of `32766 / 3 = 10922`, since a character in the Basic Multilingual
+Plane occupies at most 3 bytes in UTF-8.
+
 |`position_offset_gap` |Position increment gap between field instances
 with the same field name. Defaults to 0.
 |=======================================================================