Commit Graph

4 Commits

Author SHA1 Message Date
Christoph Büscher 3827918417 Add configurable `maxTokenLength` parameter to whitespace tokenizer (#26749)
Other tokenizers like the standard tokenizer allow overriding the default
maximum token length of 255 using the `max_token_length` parameter. This change
makes the parameter available for the whitespace tokenizer as well (a
configuration sketch follows this entry). The currently allowed range is 0 to
StandardTokenizer.MAX_TOKEN_LENGTH_LIMIT, which is 1024 * 1024 = 1048576 characters.

Closes #26643
2017-09-25 17:21:19 +02:00
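
As a rough sketch of the setting described in the commit above (the index,
analyzer, and tokenizer names `my_index`, `my_analyzer`, and `my_tokenizer`,
and the value 10, are illustrative assumptions, not taken from the commit),
the parameter would live in the index analysis settings:

    # names and the value 10 below are illustrative
    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "whitespace",
              "max_token_length": 10
            }
          }
        }
      }
    }

With this setting, the whitespace tokenizer should split tokens longer than
10 characters at 10-character intervals, rather than at the default 255.
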
Shubham Aggarwal e07e4cc4dd Fix incorrect heading for Whitespace Tokenizer (#22883) 2017-01-31 12:51:37 +01:00
Clinton Gormley 5da9e5dcbc Docs: Improved tokenizer docs (#18356)
* Docs: Improved tokenizer docs

Added descriptions and runnable examples

* Addressed Nik's comments

* Added TESTRESPONSEs for all tokenizer examples

* Added TESTRESPONSEs for all analyzer examples too

* Added docs, examples, and TESTRESPONSES for character filters

* Skipping two tests:

One interprets "$1" as a stack variable; the same problem exists with the REST tests

The other fails because the "took" value is always different

* Fixed tests with "took"

* Fixed failing tests and removed preserve_original from fingerprint analyzer
2016-05-19 19:42:23 +02:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00