Switch from a 32 bit Java hash to a 128 bit Murmur hash for
creating document IDs from by/over/partition field values.
The 32 bit Java hash was not sufficiently unique, and could
produce identical numbers for relatively common combinations
of by/partition field values such as L018/128 and L017/228.
Fixes#50613
The end offset of a tokenizer is supposed to point one past the
end of the input, not to the end character of the input. The
ml_classic tokenizer was erroneously doing the latter.
This test seems to be bogus as it was confusing a nominal execution time with a
delay (i.e. an elapsed time). This commit reworks the test to address this.
Fixes#50650
With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendents along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.
This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.
Fixes#50305
A failure of a recovering shard can race with a new allocation of the shard, and cause the new
allocation to be failed as well. This can result in a shard being marked as initializing in the cluster
state, but not exist on the node anymore.
Closes#50508
* Add aditional logging for ILM history store tests (#50624)
These tests use the same index name, making it hard to read logs when
diagnosing the failures. Additionally more information about the current
state of the index could be retrieved when failing.
This changes these two things in the hope of capturing more data about
why this fails on some CI nodes but not others.
Relates to #50353
This change removes a no longer used method, `fromByte`, in
IndicesOptions. This method was necessary for backwards compatibility
with versions prior to 6.4.0 and was used when talking to those
versions. However, the minimum wire compatibility version has changed
and we no longer use this code.
Backport of #50665
This drops all remaining references to `BaseRestHandler.logger` which
has been deprecated for something like a year now. I replaced all of the
references with locally declared loggers which is so much less spooky
action at a distance to me.
Sharing a random generator may cause test failures as non-threadsafe random generators are periodically utilized in tests (see: https://github.com/elastic/elasticsearch/issues/50651)
This change constructs a calls `Randomness.get()` within the `bulkIndexWithRetry` method so that the returned `Random` object is only used in a single thread. Before, the member variable could have been used between threads, which caused test failures.
* [ML][Inference] lang_ident model (#50292)
This PR contains a java port of Google's CLD3 compact NN model https://github.com/google/cld3
The ported model is formatted to fit within our inference model formatting and stored as a resource in the `:xpack:ml:` plugin and is under basic license.
The model is broken up into two major parts:
- Preprocessing through the custom embedding (based on CLD3's embedding layer)
- Pushing the embedded text through the two layers of fully connected shallow NN.
Main differences between this port and CLD3:
- We take advantage of Java's internal Unicode handling where possible (i.e. codepoints, characters, decoders, etc.)
- We do not trim down input text by removing duplicated tokens
- We do not encode doubles/floats as longs/integers.
* Fix GCS Mock Broken Handling of some Blobs
We were incorrectly handling blobs starting in `\r\n` which broke
tests randomly when blobs started on these.
Relates #49429
This replaces the hand written xcontent parsers for significance
heristics with `ObjectParser` and parsing named xcontent.
As a happy accident, this was the last user of `ParseFieldRegistry` so
this PR entirely removes that class.
Closes#25519
Currently, we use delayed address resolution in the proxy strategy tests
to allow tests to connect to different addresses. Unfortunately, this
has the potential to introduce races as the address is resolved each
connection attempt. The number of connection attempts can vary based on
when connections are opening and closing. This commit modifies the test
be allowing them to specifically control which address is used.
Related to #50618
The docs/reference/redirects.asciidoc file stores a list of relocated or
deleted pages for the Elasticsearch Reference documentation.
This prunes several older redirects that are no longer needed.
This test code fixes a serialization test bug:
https://gradle-enterprise.elastic.co/s/7x2ct6yywkw3o
Rarely stats for the same processor are generated and
the production code then sums up these stats. However
the test code wasn't summing up in that case,
which caused inconsistencies between the actual and expected results.
Closes#50507
If `geo_point fields` are multi-valued, using `geo_centroid` as a
sub-agg to `geohash_grid` could result in centroids outside of bucket
boundaries.
This adds a related warning to the geo_centroid agg docs.
Previously, as long as a deleted version value was kept as a tombstone,
another index or delete operation against the same id would leak that
the doc had existed (through seq_no info) or would allow the operation
if the client forged the seq_no. Fixed to disregard info on deleted docs
when doing seq_no based optimistic concurrency check.
Today the `InternalClusterInfoService` collects information on the sizes of
shards of open indices, but does not consider closed indices. This means that
shards of closed indices are treated as having zero size when they are being
allocated. This commit fixes this, obtaining the sizes of all shards.
Relates #33888
This adds a new cluster privilege `monitor_snapshot` which is a restricted
version of `create_snapshot`, granting the same privileges to view
snapshot and repository info and status but not granting the actual
privilege to create a snapshot.
Co-authored-by: j-bean <anton.shuvaev91@gmail.com>
Some changes had to be made in order to make the test pass due to the removal or types.
Added some more assertions. The failure description in this comment [0] indicates that the rest handler couldn't be found. The test passes now.
I plan to merge this into master and see how CI reacts, if it handles this change well then I will also unmute this test in 7 dot x branch.
Also check watch count after stopping watcher in test teardown and
disabled slm in smoke test watcher qa test.
Relates to #41172
0: https://github.com/elastic/elasticsearch/issues/41172#issuecomment-496993976
* Adds JavaDoc to `AbstractWireTestCase` and
`AbstractWireSerializingTestCase` so it is more obvious you should prefer
the latter if you have a choice
* Moves the `instanceReader` method out of `AbstractWireTestCase` becaue
it is no longer used.
* Marks a bunch of methods final so it is more obvious which classes are
for what.
* Cleans up the side effects of the above.
Follow up to #50550. Cache empty nodes lists (`fetchDynamicNodes` will return an empty list in case of failure)
now that the plugin properly retries requests to AWS EC2 APIs.
This adds support for "collapsed" named object to `ObjectParser`. In
particular, this supports the sort of xcontent that we use to specify
significance heuristics. See #25519 and this example:
```
GET /_search
{
"query" : {
"terms" : {"force" : [ "British Transport Police" ]}
},
"aggregations" : {
"significant_crime_types" : {
"significant_terms" : {
"field" : "crime_type",
"mutual_information" : { <<------- This is the name
"include_negatives": true
}
}
}
}
}
```
I believe there are a couple of things that work this way.
I've held off on moving the actual parsing of the significant heuristics
to this code to keep the review more compact. The moving is pretty
mechanical stuff in the aggs framework.
*Most* of our parsing can be done without passing any extra context into
the parser that isn't already part of the xcontent stream. While I was
looking around at the places that *do* need a context I found a few
places that were declared to need a context but don't actually need it.
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
A geo box with a top value of Double.NEGATIVE_INFINITY will yield an empty
xContent which translates to a null `geoBoundingBox`. This commit marks the
field as `Nullable` and guards against null when retrieving the `topLeft`
and `bottomRight` fields.
Fixes https://github.com/elastic/elasticsearch/issues/50505
(cherry picked from commit 051718f9b1e1ca957229b01e80d7b79d7e727e14)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This marks a couple of constants in the `DecayFunctionBuilder` as final.
They are written in CONSTANT_CASE and used as constants but not final
which is a little confusing and might lead to sneaky bugs.
FutureUtils.get() would unwrap ElasticsearchWrapperExceptions. This
is trappy, since nearly all usages of FutureUtils.get() expected only to
not have to deal with checked exceptions.
In particular, StepListener builds upon ListenableFuture which uses
FutureUtils.get to be informed about the exception passed to onFailure.
This had the bad consequence of masking away any exception that was an
ElasticsearchWrapperException like RemoteTransportException.
Specifically for recovery, this made CircuitBreakerExceptions happening
on the target node look like they originated from the source node.
The only usage that expected that behaviour was AdapterActionFuture.
The unwrap behaviour has been moved to that class.
This adds support for retrying AsyncActionSteps by triggering the async
step after ILM was moved back on the failed step (the async step we'll
be attempting to run after the cluster state reflects ILM being moved
back on the failed step).
This also marks the RolloverStep as retryable and adds an integration
test where the RolloverStep is failing to execute as the rolled over
index already exists to test that the async action RolloverStep is
retried until the rolled over index is deleted.
(cherry picked from commit 8bee5f4cb58a1242cc2ef4bc0317dae6c8be49d3)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Adds a `force` parameter to the delete data frame analytics
request. When `force` is `true`, the action force-stops the
jobs and then proceeds to the deletion. This can be used in
order to delete a non-stopped job with a single request.
Closes#48124
Backport of #50553
Today we log changes to index settings like this:
updating [index.setting.blah] from [A] to [B]
The identity of the index whose settings were updated is conspicuously absent
from this message. This commit addresses this by adding the index name to these
messages.
Fixes#49818.
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.
Closes#49595