[discrete]
[[breaking_70_analysis_changes]]
=== Analysis changes

//NOTE: The notable-breaking-changes tagged regions are re-used in the
//Installation and Upgrade Guide

//tag::notable-breaking-changes[]

// end::notable-breaking-changes[]

[discrete]
[[limit-number-of-tokens-produced-by-analyze]]
==== Limiting the number of tokens produced by _analyze

To safeguard against out-of-memory errors, the number of tokens that can be produced
using the `_analyze` endpoint is now limited to 10000. This default limit can be changed
for a particular index with the index setting `index.analyze.max_token_count`.
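
As a minimal sketch of overriding this limit, the following Python snippet builds a settings body that raises `index.analyze.max_token_count`. The helper name `analyze_limit_settings` and the value 20000 are illustrative assumptions, not part of Elasticsearch.

```python
import json


def analyze_limit_settings(max_token_count: int) -> dict:
    """Build an index settings body that overrides the _analyze token limit.

    Hypothetical helper: only the setting name comes from the text above.
    """
    if max_token_count < 1:
        raise ValueError("max_token_count must be a positive integer")
    return {"settings": {"index.analyze.max_token_count": max_token_count}}


# Example: allow up to 20000 tokens per _analyze request (illustrative value).
body = analyze_limit_settings(20000)
print(json.dumps(body, indent=2))
```

The resulting body could be sent when creating the index. The index name and the HTTP client used to send it are left to the reader.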

[discrete]
==== Limiting the length of an analyzed text during highlighting

Highlighting a text that was indexed without offsets or term vectors
requires analyzing that text in memory, in real time, during the search request.
For large texts, this analysis can take a substantial amount of time and memory.
To protect against this, the maximum number of characters that will be analyzed is now
limited to 1000000. This default limit can be changed
for a particular index with the index setting `index.highlight.max_analyzed_offset`.
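
A client can check field values against this limit before indexing them. The sketch below is one way to do that in Python. It assumes the limit is counted in UTF-16 code units (Java `char` values), and the helper names are hypothetical.

```python
DEFAULT_MAX_ANALYZED_OFFSET = 1_000_000  # default limit stated above


def utf16_length(text: str) -> int:
    """Return the length of text in UTF-16 code units.

    Assumption: the server counts characters the way Java strings do,
    so a character outside the Basic Multilingual Plane counts as two.
    """
    return len(text.encode("utf-16-le")) // 2


def exceeds_highlight_limit(text: str,
                            limit: int = DEFAULT_MAX_ANALYZED_OFFSET) -> bool:
    """Report whether text is longer than the highlighting analysis limit."""
    return utf16_length(text) > limit


# Example: a short field value is well within the default limit.
print(exceeds_highlight_limit("a short field value"))
```

Fields that regularly exceed the limit are candidates for indexing with offsets or term vectors, which avoids the in-memory analysis described above.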

[discrete]
[[delimited-payload-filter-renaming]]
==== `delimited_payload_filter` renaming

The `delimited_payload_filter` token filter was deprecated and renamed to `delimited_payload` in 6.2.
Using the old name in indices created before 7.0 issues deprecation warnings. Using the old
name in new indices created in 7.0 throws an error. Use the new name `delimited_payload`
instead.

[discrete]
[[standard-filter-removed]]
==== `standard` filter has been removed

The `standard` token filter has been removed because it does not change the token stream.

[discrete]
==== Deprecated standard_html_strip analyzer

The `standard_html_strip` analyzer has been deprecated and should be replaced
with a combination of the `standard` tokenizer and the `html_strip` char_filter.
Indices created using this analyzer will still be readable in Elasticsearch 7.0,
but it will not be possible to create new indices using it.
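
The replacement described above can be sketched as a custom analyzer definition. The Python snippet below builds such a settings body. The helper name and the analyzer name `my_html_strip` are hypothetical, and only the `standard` tokenizer and `html_strip` char_filter come from the text.

```python
import json


def html_strip_analyzer_settings(analyzer_name: str) -> dict:
    """Build index settings for a custom analyzer that strips HTML
    and then tokenizes with the standard tokenizer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                    }
                }
            }
        }
    }


# "my_html_strip" is an illustrative analyzer name.
print(json.dumps(html_strip_analyzer_settings("my_html_strip"), indent=2))
```

Token filters such as `lowercase` can be added to the definition through a `filter` list if the index needs them. Whether they are required depends on how the deprecated analyzer was being used.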

[discrete]
[[deprecated-ngram-edgengram-token-filter-cannot-be-used]]
==== The deprecated `nGram` and `edgeNGram` token filters cannot be used on new indices

The `nGram` and `edgeNGram` token filter names were deprecated in an earlier 6.x version.
Indices created using these token filters will still be readable in Elasticsearch 7.0, but indexing
documents using those filter names will issue a deprecation warning. Using the deprecated names on
new indices starting with version 7.0.0 is prohibited and throws an error when indexing
or analyzing documents. Replace the names with `ngram` and `edge_ngram`, respectively.
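
When many index templates still use the old names, the rename can be applied mechanically. The following sketch rewrites an `analysis` settings dictionary in Python. It assumes that custom filters declare their kind in a `type` key and that analyzers list their filters in a `filter` array. The function name is hypothetical.

```python
import copy

# Deprecated filter names mapped to their replacements, as stated above.
RENAMES = {"nGram": "ngram", "edgeNGram": "edge_ngram"}


def rename_deprecated_filters(analysis: dict) -> dict:
    """Return a copy of an analysis settings dict with the deprecated
    token filter names replaced; the input is left unmodified."""
    updated = copy.deepcopy(analysis)
    # Custom filter definitions: {"my_filter": {"type": "nGram", ...}}
    for definition in updated.get("filter", {}).values():
        old_type = definition.get("type")
        if old_type in RENAMES:
            definition["type"] = RENAMES[old_type]
    # Analyzers that reference the filters directly by name.
    for analyzer in updated.get("analyzer", {}).values():
        if "filter" in analyzer:
            analyzer["filter"] = [RENAMES.get(n, n) for n in analyzer["filter"]]
    return updated


old = {"filter": {"my_grams": {"type": "nGram", "min_gram": 2, "max_gram": 3}}}
print(rename_deprecated_filters(old))
```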

[discrete]
==== Limit to the difference between max_size and min_size in NGramTokenFilter and NGramTokenizer

To safeguard against creating too many index terms, the difference between `max_gram` and
`min_gram` in `NGramTokenFilter` and `NGramTokenizer` has been limited to 1. This default
limit can be changed with the index setting `index.max_ngram_diff`. Note that if the limit is
exceeded, an error is thrown only for new indices. For existing pre-7.0 indices, a deprecation
warning is logged.
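
The rule can be expressed as a simple client-side check. The sketch below uses `min_gram` and `max_gram`, the parameter names of the `ngram` tokenizer and token filter. The function itself is a hypothetical helper that mirrors the rule and is not Elasticsearch code.

```python
DEFAULT_MAX_NGRAM_DIFF = 1  # default limit stated above


def check_ngram_diff(min_gram: int, max_gram: int,
                     max_ngram_diff: int = DEFAULT_MAX_NGRAM_DIFF) -> None:
    """Raise ValueError when max_gram - min_gram exceeds the allowed difference."""
    if min_gram > max_gram:
        raise ValueError("min_gram must not be greater than max_gram")
    diff = max_gram - min_gram
    if diff > max_ngram_diff:
        raise ValueError(
            f"difference between max_gram and min_gram is {diff}, "
            f"but index.max_ngram_diff is {max_ngram_diff}"
        )


# Within the default limit: a difference of 1 is accepted.
check_ngram_diff(min_gram=2, max_gram=3)
# A difference of 3 is accepted once index.max_ngram_diff is raised to 3.
check_ngram_diff(min_gram=2, max_gram=5, max_ngram_diff=3)
```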

[discrete]
==== Limit to the difference between max_shingle_size and min_shingle_size in ShingleTokenFilter

To safeguard against creating too many tokens, the difference between `max_shingle_size` and
`min_shingle_size` in `ShingleTokenFilter` has been limited to 3. This default
limit can be changed with the index setting `index.max_shingle_diff`. Note that if the limit is
exceeded, an error is thrown only for new indices. For existing pre-7.0 indices, a deprecation
warning is logged.
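
As an illustration, the Python sketch below builds index settings for a `shingle` token filter and adds `index.max_shingle_diff` only when the requested sizes exceed the default difference of 3. The helper name and the filter name `my_shingles` are hypothetical.

```python
import json

DEFAULT_MAX_SHINGLE_DIFF = 3  # default limit stated above


def shingle_index_settings(filter_name: str, min_size: int, max_size: int) -> dict:
    """Build index settings for a shingle filter, raising
    index.max_shingle_diff only when the sizes require it."""
    if min_size > max_size:
        raise ValueError("min_shingle_size must not exceed max_shingle_size")
    index_settings = {
        "analysis": {
            "filter": {
                filter_name: {
                    "type": "shingle",
                    "min_shingle_size": min_size,
                    "max_shingle_size": max_size,
                }
            }
        }
    }
    diff = max_size - min_size
    if diff > DEFAULT_MAX_SHINGLE_DIFF:
        # The default limit would reject this filter, so override it explicitly.
        index_settings["max_shingle_diff"] = diff
    return {"settings": {"index": index_settings}}


# Sizes 2 to 7 differ by 5, so the setting is included in the output.
print(json.dumps(shingle_index_settings("my_shingles", 2, 7), indent=2))
```

Raising the limit trades protection against token explosion for flexibility, so keeping the override as small as the filter needs is the more conservative choice.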