Currently regexes in Pattern Tokenizer docs are escaped (it seems according to Java rules). I think it is better not to escape them because JSON escaping should be automatic in client libraries, and string escaping depends on a client language used. The default pattern is `\W+`, not `\\W+`.
Closes#6615
Add `irish` analyzer
Add `sorani` analyzer (Kurdish)
Add `classic` tokenizer: specific to english text and tries to recognize hostnames, companies, acronyms, etc.
Add `thai` tokenizer: segments thai text into words.
Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer
Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself
Add `german_normalization` tokenfilter: umlaut/sharp S normalization
Add `hindi_normalization` tokenfilter: accounts for hindi spelling differences
Add `indic_normalization` tokenfilter: accounts for different unicode representations in Indian languages
Add `sorani_normalization` tokenfilter: normalizes kurdish text
Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`
Add support access to default Thai stopword set "_thai_"
Fix some bugs and broken links in documentation.
Closes#5935
For the casual reader, the reference to "term queries" may be glossed over, yielding an unexpected result when using `regexp` queries.
This attempts to make that distinction more prominent.
Closes#6698
If the match query with cutoff_frequency encounters stacked tokens,
like synonyms in the same position, it returns a boolean query instead
of a common terms query. However, if the original operator was set
to "and", it was ignoring that and resetting the operator to "or".
In fact, if operator is "and" then there is little benefit in using
a common terms query as a must query is already
executed efficiently.
The `exists` and `missing` filters need to merge postings lists of all existing
terms, which can be very costly, especially on high-cardinality fields. This
commit indexes the field names of a document under `_field_names` and reuses it
to speed up the `exists` and `missing` filters.
This is only enabled for indices that are created on or after Elasticsearch
1.3.0.
Close#5659
Added the http.jsonp.enable option to configure disabling of JSONP responses, as those
might pose a security risk, and can be disabled if unused.
This also fixes bugs in NettyHttpChannel
* JSONP responses were never setting application/javascript as the content-type
* The content-type and content-length headers were being overwritten even if they were set before
Closes#6164
Percentile Rank Aggregation is the reverse of the Percetiles aggregation. It determines the percentile rank (the proportion of values less than a given value) of the provided array of values.
Closes#6386