[discrete]
[[breaking_70_analysis_changes]]
=== Analysis changes

//NOTE: The notable-breaking-changes tagged regions are re-used in the
//Installation and Upgrade Guide
//tag::notable-breaking-changes[]

// end::notable-breaking-changes[]

[discrete]
[[limit-number-of-tokens-produced-by-analyze]]
==== Limiting the number of tokens produced by `_analyze`

To safeguard against out-of-memory errors, the number of tokens that can be
produced using the `_analyze` endpoint has been limited to 10000. This default
limit can be changed for a particular index with the index setting
`index.analyze.max_token_count`, as sketched in the first example at the end
of this section.

[discrete]
==== Limiting the length of an analyzed text during highlighting

Highlighting a text that was indexed without offsets or term vectors requires
that text to be reanalyzed in memory at search time. For large texts this
analysis may take a substantial amount of time and memory. To protect against
this, the maximum number of characters that will be analyzed has been limited
to 1000000. This default limit can be changed for a particular index with the
index setting `index.highlight.max_analyzed_offset` (see the example below).

[discrete]
[[delimited-payload-filter-renaming]]
==== `delimited_payload_filter` renaming

The `delimited_payload_filter` was deprecated and renamed to
`delimited_payload` in 6.2. Using the old name in indices created before 7.0
will issue deprecation warnings. Using the old name in new indices created in
7.0 will throw an error. Use the new name `delimited_payload` instead (see the
example below).

[discrete]
[[standard-filter-removed]]
==== `standard` filter has been removed

The `standard` token filter has been removed because it doesn't change
anything in the token stream.

[discrete]
==== Deprecated `standard_html_strip` analyzer

The `standard_html_strip` analyzer has been deprecated and should be replaced
with a combination of the `standard` tokenizer and the `html_strip`
char_filter (see the example below). Indices created using this analyzer will
still be readable in Elasticsearch 7.0, but it will not be possible to create
new indices that use it.

[discrete]
[[deprecated-ngram-edgengram-token-filter-cannot-be-used]]
==== The deprecated `nGram` and `edgeNGram` token filters cannot be used on new indices

The `nGram` and `edgeNGram` token filter names were deprecated in an earlier
6.x version. Indices created using these token filter names will still be
readable in Elasticsearch 7.0, but indexing documents with them will issue a
deprecation warning. Starting with version 7.0.0, using the deprecated names
on new indices is prohibited and will throw an error when indexing or
analyzing documents. The names should be replaced by `ngram` and `edge_ngram`
respectively (see the example below).

[discrete]
==== Limit to the difference between max_gram and min_gram in NGramTokenFilter and NGramTokenizer

To safeguard against creating too many index terms, the difference between
`max_gram` and `min_gram` in `NGramTokenFilter` and `NGramTokenizer` has been
limited to 1. This default limit can be changed with the index setting
`index.max_ngram_diff`. Note that if the limit is exceeded, an error is thrown
only for new indices; for existing pre-7.0 indices, a deprecation warning is
logged instead.

[discrete]
==== Limit to the difference between max_shingle_size and min_shingle_size in ShingleTokenFilter

To safeguard against creating too many tokens, the difference between
`max_shingle_size` and `min_shingle_size` in `ShingleTokenFilter` has been
limited to 3. This default limit can be changed with the index setting
`index.max_shingle_diff`. Note that if the limit is exceeded, an error is
thrown only for new indices; for existing pre-7.0 indices, a deprecation
warning is logged instead. A sketch of raising both diff settings appears at
the end of the examples below.
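The changes above can be illustrated with a few short example sketches. In
this first one, the index name `tweets` is an assumption; only the setting
`index.analyze.max_token_count` comes from this section. It raises the
`_analyze` token limit for that one index:

[source,console]
----
PUT /tweets
{
  "settings": {
    "index.analyze.max_token_count": 20000
  }
}
----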
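Similarly, a hypothetical `articles` index could raise the highlighting
analysis limit via `index.highlight.max_analyzed_offset`:

[source,console]
----
PUT /articles
{
  "settings": {
    "index.highlight.max_analyzed_offset": 2000000
  }
}
----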
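For the `delimited_payload` rename, a custom analyzer on a new 7.0 index must
reference the new filter name; the index and analyzer names here are made up:

[source,console]
----
PUT /payloads
{
  "settings": {
    "analysis": {
      "analyzer": {
        "payload_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [ "delimited_payload" ]
        }
      }
    }
  }
}
----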
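A replacement for the deprecated `standard_html_strip` analyzer can be
sketched as a custom analyzer combining the `standard` tokenizer with the
`html_strip` char_filter, as recommended above (the index and analyzer names
are assumptions):

[source,console]
----
PUT /html_docs
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_text": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard"
        }
      }
    }
  }
}
----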
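New indices must use the new filter names `ngram` and `edge_ngram`. Here a
hypothetical autocomplete analyzer defines a custom filter of type
`edge_ngram` (formerly `edgeNGram`):

[source,console]
----
PUT /autocomplete
{
  "settings": {
    "analysis": {
      "filter": {
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 2
        }
      },
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_edge_ngram" ]
        }
      }
    }
  }
}
----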
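Finally, a sketch of widening both new diff limits on a hypothetical index.
The ngram tokenizer below spans `min_gram` 2 to `max_gram` 4 (a difference
of 2), which would be rejected under the default `index.max_ngram_diff` of 1,
and the shingle filter spans sizes 2 to 6 (a difference of 4), which exceeds
the default `index.max_shingle_diff` of 3:

[source,console]
----
PUT /ngrams
{
  "settings": {
    "index.max_ngram_diff": 2,
    "index.max_shingle_diff": 4,
    "analysis": {
      "tokenizer": {
        "my_ngram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 4
        }
      },
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 6
        }
      }
    }
  }
}
----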