Adds conceptual docs for token graphs.
These docs cover:
* How a token graph is constructed from a token stream
* How synonyms and multi-position tokens impact token graphs
* How token graphs are used during search
* Why some token filters produce invalid token graphs
Also makes the following supporting changes:
* Adds anchors to the 'Anatomy of an Analyzer' docs for cross-linking
* Adds several SVGs for token graph diagrams
This commit disables the sort optimization added in #51852 for scroll requests.
Scroll queries keep a state per shard so we cannot modify the request on
the first round (submit).
This bug was introduced in non-released versions which is why this pr
is marked as a non-issue.
On clusters with a large number of shards, the shards limits allocation
decider can exhibit poor performance leading to timeouts applying
cluster state updates. This occurs because for every shard, we do a loop
to count the number of shards on the node, and the number of shards for
the index of the shard. This is roughly quadratic in the number of
shards. This loop is not necessary, since we already have a O(1) method
to count the number of non-relocating shards on a node, and with this
commit we add some infrastructure to RoutingNode to make counting the
number of shards per index O(1).
* Adds per context settings:
`script.context.${CONTEXT}.cache_max_size` ~
`script.cache.max_size`
`script.context.${CONTEXT}.cache_expire` ~
`script.cache.expire`
`script.context.${CONTEXT}.max_compilations_rate` ~
`script.max_compilations_rate`
* Context cache is used if:
`script.max_compilations_rate=use-context`. This
value is dynamically updatable, so users can
switch back to the general cache if desired.
* Settings for context caches take the first value
that applies:
1) Context specific settings if set, eg
`script.context.ingest.cache_max_size`
2) Correlated general setting is set to the non-default
value, eg `script.cache.max_size`
3) Context default
The reason for 2's inclusion is to allow an easy
transition for users who've customized their general
cache settings.
Using the general cache settings for the context caches
results in higher effective settings, since they are
multiplied across the number of contexts. So a general
cache max size of 200 will become 200 * # of contexts.
However, this behavior it will avoid users snapping to a
value that is too low for them.
Backport of: #52855
Refs: #50152
Currently we don't send values for the `pre_filter_shard_size` and
`max_concurrent_shard_requests` SearchRequest parameters over http when using
the High Level Rest Client. This change adds these parameters to the
RequestConverters and tests.
Removes the `flat_settings` and `timeout` query parameters from the JSON
spec and asciidoc docs for the put index template API.
These parameters are not supported by the API.
This commit, built on top of #51708, allows to modify shard search requests based on informations collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents.
This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
The highlighting phase for percolator queries currently uses some custom query
traversal logic to find all instances of PercolatorQuery in the query tree for the
current search context. This commit converts things to instead use a QueryVisitor,
which future-proofs us against new wrapper queries or queries from custom
plugins that the percolator module doesn't know about.
Some clients have problems running this test as a numeric key is treated like an array index by default.
We can work around this by renaming the aggregation key to not be a numeric.
Sometimes we want to deprecate and remove a ParseField entirely, without replacement;
for example, the various places where we specify a _type field in 7x. Currently we can
tell users only that a particular field name should not be used, and that another name should
be used in its place. This commit adds the ability to say that a field should not be used at
all.
The introduction of the ExitableDirectoryReader showed increase of
latencies for range queries using pointvalues.
Check for cancellation every 1024 docs instead of every 15 to lower
the impact of the check in query's performance.
Follows: #52822Fixes: #53496
(cherry picked from commit 6b5fc35e4458e60a7ca5822584ec6a60562f2c01)
There is an assertion in ReloadAnalyzersResponse.merge that compares index names
of merged responses that was falsely using object equality instead of
String.equals(). In the past this didn't seem to matter but with changes in the
test setup we started to see failures. Correcting this and also simplifying test
a bit to be able to run it repeatedly if needed.
Closes#53443
It is useful to be able to delay state recovery until enough data nodes have
joined the cluster, since this gives the shard allocator a decent opportunity
to re-use as much existing data as possible. However we also have the option to
delay state recovery until a certain number of master-eligible nodes have
joined, and this is unnecessary: we require a majority of master-eligible nodes
for state recovery, and there is no advantage in waiting for more.
This commit deprecates the unnecessary settings in preparation for their
removal.
Relates #51806
changes the output format of preview regarding deduced mappings and enhances
it to return all the details about auto-index creation. This allows the user
to customize the index creation. Using HLRC you can create a index request
from the output of the response.
backport #53572
* Add REST API for ComponentTemplate CRUD
This adds the Put/Get/DeleteComponentTemplate APIs that allow inserting, retrieving, and removing
ComponentTemplateMetadata into the cluster state metadata.
These APIs are currently only available behind a feature flag system property -
`es.itv2_feature_flag_registered`.
Relates to #53101
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Re-applies the change from #53523 along with test fixes.
closes#53626closes#53624closes#53622closes#53625
Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Jake Landis <jake.landis@elastic.co>
The multiline regex rule used to detect docs code snippets greater than
76 characters in length has considerable cost to. For the high level
rest client project alone, this was taking upwards of 3 minutes. This
commit updates the rule regex pattern to use non-greedy matching when
appropriate which results in a lot fewer backtracks and about a 70%
reduction in execution time on the high level rest client module.
This commit adjusts a `deprecation[...]` message in the docs since such
messages must be on a single line. It also moves this message to the start of
the description of the deprecated setting as is the case with other such
messages.