[Docs] Document which encoding should be used in order to make sense of the offsets returned by the term vectors API.

Close #4363
Adrien Grand 2013-12-06 22:37:04 +01:00
parent a1d4731137
commit 32eb5ffa92


@@ -41,6 +41,14 @@ If the requested information wasn't stored in the index, it will be
omitted without further warning. See <<mapping-types,type mapping>>
for how to configure your index to store term vectors.
[WARNING]
======
Start and end offsets are expressed in UTF-16 code units. If you want to use
these offsets to extract the original text that produced a token, make sure
that the string you take the substring of is also encoded using UTF-16.
======
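Python 3 indexes strings by code point rather than by UTF-16 code unit, so a client written in it cannot slice with these offsets directly. The sketch below re-encodes the string as UTF-16 before slicing. The helper name `substring_from_utf16_offsets` is hypothetical. The offsets 5 and 18 are also assumed, picked to match what a tokenizer would report for `Elasticsearch` in the sample string; they are not taken from a real term vectors response.

```python
def substring_from_utf16_offsets(text: str, start: int, end: int) -> str:
    """Return the part of `text` between UTF-16 code-unit offsets [start, end).

    Hypothetical helper: re-encodes `text` as UTF-16 (little-endian, no BOM),
    where every code unit is exactly 2 bytes, then slices the bytes.
    """
    encoded = text.encode("utf-16-le")
    # Offsets that split a surrogate pair would raise UnicodeDecodeError here;
    # offsets produced for whole tokens should not do that.
    return encoded[2 * start : 2 * end].decode("utf-16-le")


text = "I \U0001F600 Elasticsearch"  # the emoji is 1 code point but 2 UTF-16 code units

# Assumed offsets for the token "elasticsearch", counted in UTF-16 code units.
start_offset, end_offset = 5, 18

print(text[start_offset:end_offset])                                    # lasticsearch (wrong)
print(substring_from_utf16_offsets(text, start_offset, end_offset))    # Elasticsearch
```

Slicing naively is off by one character because every character outside the Basic Multilingual Plane occupies two UTF-16 code units. In Java the offsets can be passed straight to `String.substring`, since Java strings are already sequences of UTF-16 code units.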
[float]
==== Term statistics