Docs: Warning about the conflict with the Standard Tokenizer

The examples given require a specific tokenizer to work.

Closes: 10645
This commit is contained in:
Benoit Delbosc 2015-04-17 14:59:24 +02:00 committed by Clinton Gormley
parent 60721b2a17
commit 4a94e1f14b
1 changed files with 17 additions and 11 deletions


@@ -78,3 +78,9 @@ Advance settings include:
# see http://en.wikipedia.org/wiki/Zero-width_joiner
\\u200D => ALPHANUM
--------------------------------------------------
NOTE: Using a tokenizer like the `standard` tokenizer may interfere with
the `catenate_*` and `preserve_original` parameters, as the original
string may already have lost punctuation during tokenization. Instead,
you may want to use the `whitespace` tokenizer.
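To make the interaction concrete, here is a minimal sketch of index settings that pair the `whitespace` tokenizer with a `word_delimiter` filter; the analyzer and filter names (`my_analyzer`, `my_word_delimiter`) are illustrative, not from the commit:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["my_word_delimiter"]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "catenate_words": true,
          "preserve_original": true
        }
      }
    }
  }
}
```

With this setup, input such as `wi-fi` reaches the filter as a single token, so `catenate_words` can emit `wifi` and `preserve_original` can keep `wi-fi`. Had the `standard` tokenizer been used, it would already have split the text into `wi` and `fi` and discarded the hyphen, leaving the filter nothing to catenate or preserve.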