Removing several occurrences of this typo in the docs and javadocs, seems to be
a common mistake. Corrections turn up once in a while in PRs, better to correct
some of this in one sweep.
* Fix percolator highlight sub fetch phase to not highlight query twice
The PercolatorHighlightSubFetchPhase does not override hitExecute and since it extends HighlightPhase the search hits
are highlighted twice (by the highlight phase and then by the percolator). This does not alter the results, the second highlighting
just overrides the first one but this slow down the request because it duplicates the work.
Today we have all non-plugin mappers in core. I'd like to start moving those
that neither map to json datatypes nor are very frequently used like `date` or
`ip` to a module.
This commit creates a new module called `mappers-extra` and moves the
`scaled_float` and `token_count` mappers to it. I'd like to eventually move
`range` fields there but it's more complicated due to their intimate
relationship with range queries.
Relates #10368
RangeQueryBuilder needs to perform too many `instanceof` checks in order to
check for `date` or `range` fields in order to know what it should do with the
shape relation, time zone and date format.
This commit adds those 3 parameters to the `rangeQuery` factory method so that
those instanceof checks are not necessary anymore.
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.
Also improved the support for multiple percolate queries in a search request.
Security manager policy files contains grants for specific codebases,
where a codebase is a jar file. We use a system property containing the
name of the jar file to resolve the jar file location when parsing the
policy file. However, this means the version of the jars must be
modified when versions of dependencies change. This is particularly
messy for elasticsearch, where we now have a dependency on the rest
client, and need to support both a snapshot version for testing and non
snapshot for release.
This commit adds an alias for the elasticsearch rest client without a
version to be used in policy files. That allows the policy files to not care whether
the rest client is a snapshot or release.
* If in a range query upper is smaller than lower then ignore the range query
* If two empty range extractions are compared don't fail with NoSuchElementException
The `index.percolator.map_unmapped_fields_as_text` is a more better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
The current script service has a script compilation limit for a one
minute window. This is set to a small default value of 15. Instead of
increasing that default value, this commit introduces a new setting
that allows to configure a rate per time unit, so that the script service can deal with bursts better.
The new setting is named `script.max_compilations_rate`,
requires a nonnegative number and a positive time value.
The default is `75/5m`, which is equivalent to the existing 15 per minute.
* Moves deferring code into its own subclass
This change moves the code that deals with deferring collection to a subclass of BucketAggregator called DeferringBucketAggregator. This means that the code in AggregatorBase is simplified and also means that the code for deferring colleciton is in one place and easier to maintain.
* Makes SIngleBucketAggregator an interface
This is so aggregators that extend BucketsAggregator directly and those that extend DeferringBucketAggregator can be a single bucket aggregator
* review comments
* More review comments
* Remove the _all metadata field
This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
At current, we do not feel there is enough of a reason to shade the low
level rest client. It caused problems with commons logging and IDE's
during the brief time it was used. We did not know exactly how many
users will need this, and decided that leaving shading out until we
gather more information is best. Users can still shade the jar
themselves. For information and feeback, see issue #26366.
Closes#26328
This reverts commit 3a20922046.
This reverts commit 2c271f0f22.
This reverts commit 9d10dbea39.
This reverts commit e816ef89a2.
There is a group of five settings relating to raw tcp configurations
(no_delay, buffer sizes, etc) that we have for the http transport. These
currently live in the netty module. As they are unrelated to netty
specifically, this commit moves these settings to the
`HttpTransportSettings` class in core.
When slices is set as auto, there's an additional network call
needed for the reindex tasks to know how to rethrottle. Sometimes
the rethrottle action happens before the reindex task is fully
initialized, so in the test we wait for the task to be ready.
This commit also adds some safeguards to ensure that
cancel and rethrottle operations are handled correctly
Closes#26192
Links to inner classes were using `$` in urls instead of `.`, causing
them to 404.
Also fixes the doc generation code to generate docs into the correct
directory. We moved the docs but never updated the generation code.
Right now it is possible for the `HttpPipeliningHandler` to queue
pipelined responses. On channel close, we do not clear and release these
responses. This commit releases the responses and completes the promise.
Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead).
Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.
This test was too lenient with its randomization of targetFieldName and
resulting in a conflict with the original existing fields. This commit
fixes that.
Closes#26177.
The following token filters were moved: arabic_stem, brazilian_stem, czech_stem, dutch_stem, french_stem, german_stem and russian_stem.
Relates to #23658
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them.
This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.
For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
With this commit we remove the following three previously unused
(and undocumented) Netty 4 related settings:
* transport.netty.max_cumulation_buffer_capacity,
* transport.netty.max_composite_buffer_components and
* http.netty.max_cumulation_buffer_capacity
from Elasticsearch.
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific usec-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize with the low-level
Lucene API.
Raw requests are supported only by the java yaml test runner and were introduced to test docs snippets. Some yaml tests ended up using them (see #23497) which causes failures for other language clients. This commit migrates those yaml tests to Java tests that send requests through the Java low-level REST client, and also moves the ability to send raw requests to a special client that's only available when testing docs snippets.
Closes#25694
* Adds mutate function to various tests
Relates to #25929
* fix test
* implements mutate function for all single bucket aggs
* review comments
* convert getMutateFunction to mutateIInstance
This commit adds the nio transport as an option in place of the mock tcp
transport for tests. Each test will only use one transport type. The
transport type is decided by a random boolean generated inside of the
`ESTestCase` class.
This commit updates the version for master to 7.0.0-alpha1. It also adds
the 6.1 version constant, and fixes many tests, as well as marking some
as awaits fix.
Closes#25893Closes#25870
This commit fixes an issue with the Netty 4 multi-port test that a
transport client can connect. The problem here is that in case the
bottom of the random port range was already bound to (for example, by
another JVM) then then transport client could not connect to the data
node. This is because the transport client was in fact using the bottom
of the port range only. Instead, we simply try all the ports that the
data node might be bound to.
Closes#24441
The following token filters were moved: delimited_payload_filter, keep, keep_types, classic, apostrophe, decimal_digit, fingerprint, min_hash and scandinavian_folding.
Relates to #23658
The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.
The query builder's binary format has now the same bwc guarentees as the xcontent format.
Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equals to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that relies on score comparison.
Fixes#15709Fixes#23628
Extracts ranges from range queries on byte, short, integer, long, half_float, scaled_float, float, double, date and ip fields.
byte, short, integer and date ranges are normalized to Lucene's LongRange.
half_float and float are normalized to Lucene's DoubleRange.
When extracting range queries, the QueryAnalyzer computes the width of the range. This width is used to determine
what range should be preferred in a conjunction query. The QueryAnalyzer prefers the smaller ranges, because these
ranges tend to match with less documents.
Closes#21040
Today we expose `IndexFieldDataService` outside of IndexService to do maintenance
or lookup field data in different ways. Yet, we have a streamlined way to access IndexFieldData
via `QueryShardContext` that should encapsulate all access to it. This also ensures that we control all other functionality like cache clearing etc.
This change also removes the `recycler` option from `ClearIndicesCacheRequest` this option is a no-op and should have been removed long ago.
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.
Closes#25606
This commit removes all external dependencies from the rest client jar
and shades them in an 'org.elasticsearch.client' package within the jar
using shadowJar gradle plugin. All projects that depended on the
existing jar have been converted to using the 'org.elasticsearch.client'
package prefixes to interact with the rest client.
Closes#25208
This change disables the graph analysis on default `shingle` filter.
The pre-configured shingle filter produces shingles of different size.
Graph analysis on such token stream is useless and dangerous as it may create too many paths.
Fixes#25555
This change rewrites search requests on the coordinating node before
we send requests to the individual shards. This will reduce the rewrite load
and object creation for each rewrite on the executing nodes and will fetch
resources only once instead of N times once per shard for queries like `terms`
query with index lookups. (among percolator and geo-shape)
Relates to #25791
Also has updates to ScriptMetaData for allowing the old namespace format to be loaded all the way back through 5.0; however, it will throw an exception if two scripts share the same id but different languages.
The `QueryRewriteContext` used to provide a client object that can
be used to fetch geo-shapes, terms or documents for percolation. Unfortunately
all client calls used to be blocking calls which can have significant impact on the
rewrite phase since it occupies an entire search thread until the resource is
received. In the case that the index the resource is fetched from isn't on the local
node this can have significant impact on query throughput.
Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction
This commit calls the `useSystemProperties` method on the HttpAsyncClientBuilder so that the jvm
system properties are used. The primary reason for doing this is to ensure the builder uses the
system default SSLContext rather than the default instance created by the http client library.
Closes#23231
Today we have duplicated code that is quite complicated to iterate
over rewriteable (`QueryBuilders` mainly) This change introduces a
`Rewriteable` interface that allow to share code to do the rewriting as
well as encapsulation and composition of queries.
We currently use fielddata on the `_id` field which is trappy, especially as we
do it implicitly. This changes the `random_score` function to use doc ids when
no seed is provided and to suggest a field when a seed is provided.
For now the change only emits a deprecation warning when no field is supplied
but this should be replaced by a strict check on 7.0.
Closes#25240
The following token filters were moved: arabic_normalization, german_normalization, hindi_normalization, indic_normalization, persian_normalization, scandinavian_normalization, serbian_normalization, sorani_normalization, cjk_width and cjk_width
Relates to #23658
This change refactors the query_string query to analyze the query text around logical operators of the query string the same way than a match_query/multi_match_query.
It also adds a type parameter that can be used to change the way multi fields query are built the same way than a multi_match query does.
Now that these queries share the same behavior regarding text analysis, some parameters are obsolete and have been deprecated:
split_on_whitespace: This setting is now ignored with a deprecation notice
if it is used explicitely. With this PR The query_string always splits on logical operator.
It simplifies the understanding of the other parameters that can have different meanings
depending on the value of split_on_whitespace.
auto_generate_phrase_queries: This setting is now ignored with a deprecation notice
if it is used explicitely. This setting only makes sense when the parser splits on whitespace.
use_dismax: This setting is now ignored with a deprecation notice
if it is used explicitely. The tie_breaker parameter is sufficient to handle best_fields/most_fields.
Fixes#25574
It was brought up that our current client artifacts have generic names like 'rest' that may cause conflicts with other artifacts.
This commit renames:
- rest -> elasticsearch-rest-client
- sniffer -> elasticsearch-rest-client-sniffer
- rest-high-level -> elasticsearch-rest-high-level-client
A couple of small changes are also preparing the high level client for its first release.
Closes#20248
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.
This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
Requests that execute a stored script will no longer be allowed to specify the lang of the script. This information is stored in the cluster state making only an id necessary to execute against. Putting a stored script will still require a lang.
There is a bug when a call to `BytesReferenceStreamInput` skip is made
on a `BytesReference` that has an initial offset. The offset for the
current slice is added to the current index and then subtracted from the
length. This introduces the possibility of a negative number of bytes to
skip. This happens inside a loop, which leads to an infinte loop.
This commit correctly subtracts the current slice index from the
slice.length. Additionally, the `BytesArrayTests` are modified to test
instances that include an offset.
Currently when we close a channel in Netty4Utils.closeChannels we
block until the closing is complete. This introduces the possibility
that a network selector thread will block while waiting until a
separate network selector thread closes a channel.
For instance: T1 closes channel 1 (which is assigned to a T1 selector).
Channel 1's close listener executes the closing of the node. That
means that T1 now tries to close channel 2. However, channel 2 is
assigned to a selector that is running on T2. T1 now must wait until T2
closes that channel at some point in the future.
This commit addresses this by adding a boolean to closeChannels
indicating if we should block on close. We only set this boolean to true
if we are closing down the server channels at shutdown. This call is
never made from a network thread. When we call the closeChannels method
with that boolean set to false, we do not block on close.
This commit does two things:
- bumps the version from 6.0.0-alpha3 to 6.0.0-beta1
- renames the 6.0.0-alpha3 version constant to 6.0.0-beta1
Relates #25621
Indexing ids in binary form should help with indexing speed since we would
have to compare fewer bytes upon sorting, should help with memory usage of
the live version map since keys will be shorter, and might help with disk
usage depending on how efficient the terms dictionary is at compressing
terms.
Since we can only expect base64 ids in the auto-generated case, this PR tries
to use an encoding that makes the binary id equal to the base64-decoded id in
the majority of cases (253 out of 256). It also specializes numeric ids, since
this seems to be common when content that is stored in Elasticsearch comes
from another database that uses eg. auto-increment ids.
Another option could be to require base64 ids all the time. It would make things
simpler but I'm not sure users would welcome this requirement.
This PR should bring some benefits, but I expect it to be mostly useful when
coupled with something like #24615.
Closes#18154
Transport profiles unfortunately have never been validated. Yet, it's very
easy to make a mistake when configuring profiles which will most likely stay
undetected since we don't validate the settings but allow almost everything
based on the wildcard in `transport.profiles.*`. This change removes the
settings subset based parsing of profiles but rather uses concrete affix settings
for the profiles which makes it easier to fall back to higher level settings since
the fallback settings are present when the profile setting is parsed. Previously, it was
unclear in the code which setting is used ie. if the profiles settings (with removed
prefixes) or the global node setting. There is no distinction anymore since we don't pull
prefix based settings.
* Adds check for negative search request size
This change adds a check to `SearchSourceBuilder` to throw and exception if the size set on it is set to a negative value.
Closes#22530
* fix error in reindex
* update re-index tests
* Addresses review comment
* Fixed tests
* Added random negative size test
* Fixes test
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
This commit makes the use of the global network settings explicit instead
of implicit within NetworkService. It cleans up several places where we fall
back to the global settings while we should have used tcp or http ones.
In addition this change also removes unnecessary settings classes
These settings have not be working for a full major version since they
are not registered. Given that they are simply duplicates we can just remove
them.
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
This removes the remaining usage of `mapping.single_type` from the parent join
module and moves it's bwc test to the mixed cluster tests
Relates to #24961
Relates to #20257