This improves the way similarities are plugged in in order to:
- reject the classic similarity on 7.x indices and emit a deprecation
warning otherwise
- reject unkwown parameters on 7.x indices and emit a deprecation
warning otherwise
Even though this breaks the plugin API, I'd like to backport to 7.x so
that users can get deprecation warnings when they are doing something
that will become unsupported in the future.
Closes#23208Closes#29035
While playing with the percolator I found two bugs:
- Sometimes we set a min_should_match that is greater than the number of
extractions. While this doesn't cause direct trouble, it does when the query
is nested into a boolean query and the boolean query tries to compute the
min_should_match for the entire query based on its own min_should_match and
those of the sub queries. So I changed the code to throw an exception when
min_should_match is greater than the number of extractions.
- Boolean queries claim matches are verified when in fact they shouldn't. This
is due to the fact that boolean queries assume that they are verified if all
sub clauses are verified but things are more complex than that, eg.
conjunctions that are nested in a disjunction or disjunctions that are nested
in a conjunction can generally not be verified without running the query.
This moves the `Nullable` annotation into the elasticsearch-core project, so it
may be used without relying entirely on the server jar. This will allow us to
decouple more pieces to make them smaller.
In addition, there were two different `Nullable` annotations, these have all
been moved to the ES version rather than the inject version.
We have seen exceptions bubble up to the uncaught exception handler. Checking the blocks can
lead for example to IndexNotFoundException when the indices are resolved. In order to make
TransportMasterNodeAction more resilient against such expected exceptions, this code change
wraps the execution of doStart() into a try catch and informs the listener in case of failures.
DiskThresholdDecider currently assumes that the source index of a resize operation (e.g. shrink)
is available, and throws an IndexNotFoundException otherwise, thereby breaking any kind of shard
allocation. This can be quite harmful if the source index is deleted during a shrink, or if the source
index is unavailable during state recovery.
While this behavior has been partly fixed in 6.1 and above (due to #26931), it relies on the order in
which AllocationDeciders are executed (i.e. that ResizeAllocationDecider returns NO, ensuring that
DiskThresholdDecider does not run, something that for example does not hold for the allocation
explain API).
This change adds a more complete fix, and also solves the situation for 5.6.
This commit adds a new fixture that emulates a S3 service in order to
improve the existing integration tests. This is very similar to what has
been made for Google Cloud Storage in #28788, and such tests would
have helped a lot to catch bugs like #22534.
The AmazonS3Fixture is brittle and only implements the very necessary
stuff for the S3 repository to work, but at least it works and can be
adapted for specific tests needs.
Fixes and edge case where DiscountedCumulativeGain can return NaN as
result of the quality metric calculation. This can happen when the
search result set is empty and normalization is used. We should return 0
in this case. Also adding related unit tests to the other two metrics.
* Pass script level params into scripted metric aggs (#28819)
Now params that are passed at the script level and at the aggregation level
are merged and can both be used in the aggregation scripts. If there are
any conflicts, aggregation level params will win. This may be followed
by another change detecting that case and throwing an exception to
disallow such conflicts.
* Disallow duplicate parameter names between scripted agg and script (#28819)
If a scripted metric aggregation has aggregation params and script params
which have the same name, throw an IllegalArgumentException when merging
the parameter lists.
I am not sure why we have this leniency for HTTP max content length, it
has been there since the beginning
(5ac51ee93f) with no explanation of its
source. That said, our philosophy today is different than the philosophy
of the past where Elasticsearch would be quite lenient in its handling
of settings and today we aim for predictability for both users and
us. This commit removes leniency in the parsing of
http.max_content_length.
* Begin moving XContent to a separate lib/artifact
This commit moves a large portion of the XContent code from the `server` project
to the `libs/xcontent` project. For the pieces that have been moved, some
helpers have been duplicated to allow them to be decoupled from ES helper
classes. In addition, `Booleans` and `CheckedFunction` have been moved to the
`elasticsearch-core` project.
This decoupling is a move so that we can eventually make things like the
high-level REST client not rely on the entire ES jar, only the parts it needs.
There are some pieces that are still not decoupled, in particular some of the
XContent tests still remain in the server project, this is because they test a
large portion of the pluggable xcontent pieces through
`XContentElasticsearchException`. They may be decoupled in future work.
Additionally, there may be more piecese that we want to move to the xcontent lib
in the future that are not part of this PR, this is a starting point.
Relates to #28504
* Add test matrix axis files for periodic java testing
* Add properties file defining java versions to use
* We have no openjdk8
* Remove openjdk
Oracle Java and OpenJDK basically only differ in license, so we don't
need to test both.
Fix a couple of minor things in the InternalEngine:
* Rename loadOrGenerateHistoryUUID to reflect that it always generates a UUID
* Move .acquire() call next to the associated try {} block.
Today this part of the documentation just says that Geo queries are not 100%
accurate, but in fact we can be more precise about which kinds of queries see
which kinds of error. This commit clarifies this point.
At time of writing, GeoJSON did not enforce a specific ordering of vertices in
a polygon, but it now does. We occasionally get reports of Elasticsearch
rejecting apparently-valid GeoJSON because of badly oriented polygons, and it's
helpful to be able to point at this bit of the documentation when responding.
Removes a set of assertions in the test framework that verified that
Streamable objects could be serialized and deserialized across different
versions. When this was discussed the consensus was that this approach
has not caught many bugs in a long time and that serialization testing of
objects was best left to their respective unit and integration tests.
This commit also removes a transport interceptor that was used in
ESIntegTestCase tests to make these assertions about objects coming in
or off the wire.
The parsing code for script query currently silently skips by any tokens
it does not know about within its parsing loop. The only token it does
not catch is an array, which means pasing multiple scripts in via an
array will cause the last script to be parsed and one, silently dropping
the others. This commit adds validation that arrays are not seen while
parsing.
We use a latch when sending requests during tests so that we do not hang
forever waiting for replies on those requests. This commit increases the
timeout on that latch to 30 seconds because sometimes 10 seconds is just
not enough.
This commit changes the sysprop for overriding the branch bwc builds use
to be branch specific. There are 3 different bwc branches built, but all
of them currently read the exact same sysprop. For example, with this change
and current branches, you can now specify eg `-Dtests.bwc.refspec.6.x=my_6x`
and it will build only next-minor-snapshot with that branch, while
next-bugfix-snapshot will continue to use 5.6.
Since #29260, unsafe commits must be trimmed before opening an engine.
This makes the engine constructor follow Lucene standard semantics and
use the last commit. However, we haven't fully applied this change in some
tests.
Relates #29260
As follow up to #28245 , this PR removes the logic for selecting the
right start commit from the Engine constructor in favor of explicitly
trimming them in the Store, before the engine is opened. This makes the
constructor in engine follow standard Lucene semantics and use the last
commit.
Relates #28245
Relates #29156
Due to special treatment for the 0xFFFFFF... value in GeoHashUtils'
encodeLatLon method, the hashcode for lat 90, lon 180 is incorrectly
encoded as `"000000000000"` instead of "zzzzzzzzzzzz". This commit
removes the special treatment and fixes the issue.
Closes#22163
When deleting a snapshot, it is not necessary to load and to parse the
global metadata of the snapshot to delete. Now indices are stored in
the snapshot metadata file, we have all the information to resolve the
shards files to delete.
This commit removes the readSnapshotMetaData() method that was used to
load both global and index metadata files. Test coverage should be
enough as SharedClusterSnapshotRestoreIT already contains several
deletion tests.
Related to #28934
The sysprop repos.mavenLocal may be used to add the local .m2 maven
repository for testing snapshots of locally build dependencies.
Unfortunately this has to be checked in two different places (they cannot
be shared, due to buildSrc being built essentially as a separate
project), and the casing of the string sysprop lookups did not align.
This commit fixes BuildPlugin's checking of repos.mavenLocal to use the
correct casing (camelCase, to match the gradle dsl element).
Today we have a few problems with how we handle bad requests:
- handling requests with bad encoding
- handling requests with invalid value for filter_path/pretty/human
- handling requests with a garbage Content-Type header
There are two problems:
- in every case, we give an empty response to the client
- in most cases, we leak the byte buffer backing the request!
These problems are caused by a broader problem: poor handling preparing
the request for handling, or the channel to write to when the response
is ready. This commit addresses these issues by taking a unified
approach to all of them that ensures that:
- we respond to the client with the exception that blew us up
- we do not leak the byte buffer backing the request
We historically removed reading from the transaction log to get consistent
results from _GET calls. There was also the motivation that the read-modify-update
principle we apply should not be hidden from the user. We still agree on the fact
that we should not hide these aspects but the impact on updates is quite significant
especially if the same documents is updated before it's written to disk and made serachable.
This change adds back the ability to read from the transaction log but only for update calls.
Calls to the _GET API will always do a refresh if necessary to return consistent results ie.
if stored fields or DocValues Fields are requested.
Closes#26802
When the `BulkProcessor` is used with the high-level REST client, a scheduler is internally created that allows to schedule tasks. Such scheduler is not exposed to users and needs to be closed once the `BulkProcessor` is closed. There are two ways to close the `BulkProcessor` though, one is the ordinary `close` method and the other one is `awaitClose`. The former closes the scheduler while the latter doesn't, leaving threads lingering.
We previously had a property to specify the location of the REST test
spec files but this was removed in a previous refactoring yet left
behind in the docs. This commit removes the last remaining vestige of
this parameter.
DocumentParser: The checks for Text and Keyword were masked by the
earlier check for String, which they are child classes of. As String
field types are no longer supported, this check can be removed.