Closes#19528
* Port a test Colin wrote for the TDigest library to validate TDigests storing over 2*MAXINT values. This appears to have been fixed in version 3.2 of TDigest, which Elasticsearch has been using for some time, so no changes were necessary to resolve this issue.
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
* Move `createRepository` call out of cluster state tasks
* Now only `RepositoriesService#applyClusterState` manipulates `this.repositories`
* Closes#9488
This commit makes `document`, `update`, `explain`, `termvectors` and `mapping`
typeless APIs work on indices that have a type whose name is not `_doc`.
Unfortunately, this needs to be a bit of a hack since I didn't want calls with
random type names to see documents with the type name that the user had chosen
upon type creation.
The `explain` and `termvectors` do not support being called without a type for
now so the test is just using `_doc` as a type for now, we will need to fix
tests later but this shouldn't require further changes server-side since passing
`_doc` as a type name is what typeless APIs do internally anyway.
Relates #35190
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions
eg. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.
Closes#33826
Today we configure the soft-deletes field iff soft-deletes enabled.
Although this choice was correct, it prevents an engine with
soft-deletes disabled from opening a Lucene index with soft-deletes.
Moreover, this change should not have any side-effect if a Lucene index
does not have any soft-deletes.
Relates #36141
This commit changes how an operation which requires all index shard
operations permits is executed when a primary term update is required:
the operation and the update are combined so that the operation is
executed after the primary term update under the same blocking
operation.
Closes#35850
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).
Given that we check the max buckets limit on each shard when collecting the buckets, and that non final reduction cannot add buckets (see #35921), there is no point in counting and checking the number of buckets as part of non final reduction phases.
Such check is still needed though in the final reduction phases to make sure that the number of returned buckets is not above the allowed threshold.
Relates somehow to #32125 as we will make use of non final reduction phases in CCS alternate execution mode and that increases the chance that this check trips for nothing when reducing aggs in each remote cluster.
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Moved method `canOpenIndex` is only used in tests -> moved to test CP
* Simplify `org.elasticsearch.index.store.Store#renameTempFilesSafe`
* Delete some dead methods
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.
Closes#33764
Empty buckets don't need to be added when performing an incremental reduction step, they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non final reduction.
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
Prior to #35441 `ConnectionManager` had a `Lifecycle` object to support
the ping runnable. After that commit, the connection amanger only needs
the existing `AtomicBoolean` to indicate if it is running.
New that we test with min_doc_count set to 0 as well, we may end up generating a lot more buckets. This commit adjusts the min bound and max bound, as well as the offset for each randomly generated agg instance so that we don't end up hitting the 10.000 max buckets limit.
Relates to #36064
In this test we were randomizing different values but minDocCount was hardcoded to 1. It's important to test other values, especially `0` as it's the default. The test needed some adapting in the way buckets are randomly generated: all aggs need to share the same interval, minDocCount and emptyBucketInfo. Also assertions need to take into account that more (or less) buckets are expected depending on minDocCount.
CompositeBytesReference#slice has two bugs:
- One that makes it fail if the reference is empty and an empty slice is
created, this is #35950 and is fixed by special-casing empty-slices.
- One performance bug that makes it always create a composite slice when
creating a slice that ends on a boundary, this is fixed by computing `limit`
as the index of the sub reference that holds the last element rather than
the next element after the slice.
Closes#35950
org.elasticsearch.rest.RestController#hasContentType checks to see if the
RestHandler supports the `application/x-ndjson` Content-Type. DeprecationRestHandler
is a wrapper around the real RestHandler, and prior to this change
would always return `false` due to the interface's default supportsContentStream().
This prevents API's that use multi-line JSON from properly being deprecated
resulting in an HTTP 406 error.
This change ensures that the DeprecationRestHandler honors the
supportsContentStream() of the wrapped RestHandler.
Relates to #35958
The support for rest_total_hits_as_int has already been merged to 6x
in #35848 so this change adds this new option to master. The plan was
to add this new option as part of #35848 but we've decided to wait a few
days before merging this breaking change so this commit just handles
the new option as a noop exactly like 6x for now. This will allow
users to migrate to this parameter before #35848 is merged.
Relates #33028
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the ConnectionProfile.
The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
The nested agg can defer the collection of children if it is nested
under another aggregation. In such case accessing the score in the children
aggregation throws an error because the scorer has already advanced to the next
parent. This change fixes this error by caching the score of the parent in the
nested aggregation. Children aggregations that work on nested documents will be
able to access the _score. Also note that the _score in this case is always the
parent's score, there is no way to retrieve the score of a nested docs in aggregations.
Closes#35985Closes#34555
A number of tokenfilters can produce multiple tokens at the same position. This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.
This commit ensures that when used to parse synonyms, these tokenfilters either produce
a single version of their input token, or that they throw an error when mappings are
generated. In indexes created in elasticsearch 6.x deprecation warnings are emitted in place
of the error.
* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors
Fixes#34298
The ActiveShardCount is used by cluster state observers to wait for a
given number of shards to be active before returning to the caller. The
current implementation does not work when an index is closed while an
observer is waiting on shards to be active. In this case, a NPE is thrown
and the observer is never notified that the shards won't become active.
This commit fixes the ActiveShardCount.enoughShardsActive() so that it
does not fail when an index is closed, similarly to what is done when an
index is deleted.
In `InternalHistogramTests` we were randomizing different values but `minDocCount` was hardcoded to `1`. It's important to test other values, especially `0` as it's the default. To make this possible, the test needed some adapting in the way buckets are randomly generated: all aggs need to share the same `interval`, `minDocCount` and `emptyBucketInfo`. Also assertions need to take into account that more (or less) buckets are expected depending on `minDocCount`.
This was originated by #35921 and its need to test adding empty buckets as part of the reduce phase.
Also relates to #26856 as one more key comparison needed to use `Double.compare` to properly handle `NaN` values, which was triggered by the increased test coverage.
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
Currently when a Fuzziness instance with custom AUTO distance values gets
written to XContent, the customized lower and upper distance values are ommited
and can consequently not be parsed back. This changes this to write the String
including the optional custom values when writing to XContent and fixes the
tests that should have caught this in the first place, e.g. by adding the custom
low and high distance values to the equality check.
This change removes the deprecated useDisMax() and useAllFields() methods from
the QueryStringQueryBuilder and related tests. The disMax parameter has already
been a no-op since 6.0 and also the useAllFields has been deprecated since 6.0
and there is a direct replacement via defaultField.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
Closes#22919
This PR adds deprecation warnings to the relevant `Rest*Action` classes, plus tests in `Rest*ActionTests`. No updates to REST tests, the Java HLRC, or documentation were necessary, since they didn't make use of types.
MultiSearchRequests issues through `_msearch` now validate all keys
in the metadata section. Previously unknown keys were ignored
while now an exception is thrown.
Closes#35869
This commit removes the dedicated `setSoLinger` method. This simplifies
the `TcpChannel` interface. This method has very little effect as the
SO_LINGER is not set prior to the channels being closed in the abstract
transport test case. We still will set SO_LINGER on the
`MockNioTransport`. However we can do this manually.