From 32eb5ffa925211030f4e3dd200118297e0a151c5 Mon Sep 17 00:00:00 2001
From: Adrien Grand <jpountz@gmail.com>
Date: Fri, 6 Dec 2013 22:37:04 +0100
Subject: [PATCH] [Docs] Document which encoding should be used in order to
 make sense of the offsets returned by the term vectors API.

Close #4363
---
 docs/reference/search/termvectors.asciidoc | 8 ++++++++
 1 file changed, 8 insertions(+)

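The warning this patch adds matters most to clients whose strings are not indexed by UTF-16 code units. Python is one example: its `str` is indexed by code point. There, slicing the source text directly with the start and end offsets goes wrong as soon as the text contains a character outside the Basic Multilingual Plane. The following is a minimal sketch, assuming the offsets are UTF-16 code-unit positions as the warning states. The helper name `substring_by_utf16_offsets` and the sample text are hypothetical, for illustration only.

```python
def substring_by_utf16_offsets(text: str, start: int, end: int) -> str:
    """Return the part of `text` between UTF-16 code-unit offsets [start, end)."""
    # Python str indexes by code point, so re-encode as UTF-16 little-endian
    # (no BOM), where every code unit is exactly 2 bytes, then slice the bytes.
    units = text.encode("utf-16-le")
    return units[2 * start:2 * end].decode("utf-16-le")


# Hypothetical source text: the emoji is one code point but two UTF-16 code
# units, so "hello" spans UTF-16 offsets 3..8 but code points 2..7.
text = "\U0001F600 hello"
print(substring_by_utf16_offsets(text, 3, 8))  # hello
print(text[3:8])  # ello (naive code-point slicing is shifted by one)
```

Clients whose native strings are already UTF-16, such as Java's `String`, can instead pass the offsets straight to `substring`.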
diff --git a/docs/reference/search/termvectors.asciidoc b/docs/reference/search/termvectors.asciidoc
index edcb2bbf735..a0bdaea967a 100644
--- a/docs/reference/search/termvectors.asciidoc
+++ b/docs/reference/search/termvectors.asciidoc
@@ -41,6 +41,14 @@ If the requested information wasn't stored in the index, it will be
 omitted without further warning. See <<mapping-types,type mapping>>
 for how to configure your index to store term vectors.
 
+[WARNING]
+======
+Start and end offsets assume UTF-16 encoding is being used. If you want to use
+these offsets to retrieve the original text that produced this token, you
+should make sure that the string you are taking a substring of is also encoded
+using UTF-16.
+======
+
 [float]
 ==== Term statistics