Docs: Warning about the conflict with the Standard Tokenizer
The examples given require a specific tokenizer to work. Closes: 10645
parent 60721b2a17
commit 4a94e1f14b
@@ -78,3 +78,9 @@ Advance settings include:
 # see http://en.wikipedia.org/wiki/Zero-width_joiner
 \\u200D => ALPHANUM
 --------------------------------------------------
+
+NOTE: Using a tokenizer like the `standard` tokenizer may interfere with
+the `catenate_*` and `preserve_original` parameters, as the original
+string may already have lost punctuation during tokenization. Instead,
+you may want to use the `whitespace` tokenizer.
+
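As a rough illustration of what the added note recommends, the index settings below pair the `whitespace` tokenizer with a `word_delimiter` filter, so tokens still carry their punctuation when `catenate_words` and `preserve_original` are applied. This is a minimal sketch and not part of the commit: the names `example_analyzer` and `example_word_delimiter` are hypothetical, and the parameter values are chosen only for demonstration.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "example_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["example_word_delimiter"]
        }
      },
      "filter": {
        "example_word_delimiter": {
          "type": "word_delimiter",
          "catenate_words": true,
          "preserve_original": true
        }
      }
    }
  }
}
```

With the `standard` tokenizer in place of `whitespace`, an input such as `wi-fi` would likely reach the filter already split at the hyphen, leaving nothing for `catenate_words` to join or `preserve_original` to keep.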