Mirror of https://github.com/honeymoose/OpenSearch.git, synced 2025-02-17 10:25:15 +00:00
Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
Characters outside the Basic Multilingual Plane (BMP) need 4 bytes in UTF-8. These include many emoji and characters from a number of less-common writing systems.
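You can check the claim above with a short Python sketch. It is not part of the commit. The sketch encodes one sample character from each UTF-8 length class and prints how many bytes each one takes. The sample characters and the `utf8_len` helper are illustrative choices.

```python
# Sample characters, one from each UTF-8 length class (chosen for illustration).
SAMPLES = [
    ("\u0041", "U+0041 LATIN CAPITAL LETTER A (ASCII)"),
    ("\u00e9", "U+00E9 LATIN SMALL LETTER E WITH ACUTE"),
    ("\u20ac", "U+20AC EURO SIGN (inside the BMP)"),
    ("\U0001F600", "U+1F600 GRINNING FACE (outside the BMP)"),
]

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def utf8_len(ch: str) -> int:
    """Return the number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ch, name in SAMPLES:
    outside_bmp = ord(ch) > BMP_MAX
    print(f"{name}: {utf8_len(ch)} byte(s), outside BMP: {outside_bmp}")
```

Only the character outside the BMP needs 4 bytes. Every code point inside the BMP fits in at most 3 bytes, so dividing by 3 covers only text that stays within the BMP.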
parent bf557fd886
commit 559fc5a4de
@@ -56,5 +56,5 @@ limit of `32766`.
 
 NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
 bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
-set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
-3 bytes.
+set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
+4 bytes.
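The following Python sketch checks the arithmetic behind the new value. It assumes the worst case, where every character needs 4 bytes, and takes 32766 as the Lucene byte limit named in the note. The helper `max_safe_ignore_above` is hypothetical and is not an OpenSearch or Lucene API.

```python
# Maximum term length in bytes, as stated in the docs note (32766).
LUCENE_MAX_TERM_BYTES = 32766

# Largest number of bytes a single character can occupy in UTF-8.
MAX_UTF8_BYTES_PER_CHAR = 4


def max_safe_ignore_above(byte_limit: int = LUCENE_MAX_TERM_BYTES,
                          bytes_per_char: int = MAX_UTF8_BYTES_PER_CHAR) -> int:
    """Hypothetical helper: the largest character count whose worst-case
    UTF-8 encoding still fits within `byte_limit` bytes."""
    return byte_limit // bytes_per_char


limit = max_safe_ignore_above()
worst_case = "\U0001F600"  # an emoji that takes 4 bytes in UTF-8

# At the new limit, an all-emoji value still fits: 8191 * 4 = 32764 bytes.
print(limit, len((worst_case * limit).encode("utf-8")))

# At the old limit of 10922 characters, the same value needs 10922 * 4 = 43688 bytes.
print(len((worst_case * 10922).encode("utf-8")))
```

With `bytes_per_char=3`, the helper returns the old value of 10922. That value is safe only if no character needs 4 bytes.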