Set rest_total_hits_as_int in search requests only on the new cluster,
the old cluster can be on a version older than don't support this
parameter (< 6.6) and total hits is not an object in 6x anyway.
Closes#36291
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
This commit adds settings upgraders for the search.remote.* settings
that can be in the cluster state to automatically upgrade these settings
to cluster.remote.*. Because of the infrastructure that we have here,
these settings can be upgraded when recovering the cluster state, but
also when a user tries to make a dynamic update for these settings.
We need to wait for the job to fully initialize and start before
we can attempt to stop it. If we don't, it's possible for the stop
API to be called before the persistent task is fully loaded and it'll
throw an exception.
Closes#32773
We only upgrade the ID when the state is saved in one of four scenarios:
- when we reach a checkpoint (every 50 pages)
- when we run out of data
- when explicitly stopped
- on failure
The test was relying on the pre-upgrade to finish, save state and then
the post-upgrade to start, hit the end of data and upgrade ID. THEN
get the new doc and apply the new ID.
But I think this is vulnerable to timing issues. If the pre-upgrade
portion shutdown before it saved the state, when restarting we would run
through all the data from the beginning with the old ID, meaning both
docs would still have the old scheme.
This change makes the pre-upgrade wait for the job to go back to STARTED
so that we know it persisted the end point. Post-upgrade, it stops and
restarts the job to ensure the state was persisted and the ID upgraded.
That _should_ rule out the above timing issue.
Closes#32773
Bumping down the version to 6.4 since the backport is complete. Also
adds some missing version checks to the bwc tests to make sure it
only runs on the correct versions
Previously, we were using a simple CRC32 for the IDs of rollup documents.
This is a very poor choice however, since 32bit IDs leads to collisions
between documents very quickly.
This commit moves Rollups over to a 128bit ID. The ID is a concatenation
of all the keys in the document (similar to the rolling CRC before),
hashed with 128bit Murmur3, then base64 encoded. Finally, the job
ID and a delimiter (`$`) are prepended to the ID.
This gurantees that there are 128bits per-job. 128bits should
essentially remove all chances of collisions, and the prepended
job ID means that _if_ there is a collision, it stays "within"
the job.
BWC notes:
We can only upgrade the ID scheme after we know there has been a good
checkpoint during indexing. We don't rely on a STARTED/STOPPED
status since we can't guarantee that resulted from a real checkpoint,
or other state. So we only upgrade the ID after we have reached
a checkpoint state during an active index run, and only after the
checkpoint has been confirmed.
Once a job has been upgraded and checkpointed, the version increments
and the new ID is used in the future. All new jobs use the
new ID from the start
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. This
changes all calls in the `x-pack:qa:full-cluster-restart` project to use
the new versions.
We can leverage the composite agg's new `missing_bucket` feature on
terms groupings. This means the aggregation criteria used in the indexer
will now return null buckets for missing keys.
Because all buckets are now returned (even if a key is null),
we can guarantee correct doc counts with
"combined" jobs (where a job rolls up multiple schemas). This was
previously impossible since composite would ignore documents that
didn't have _all_ the keys, meaning non-overlapping schemas would
cause composite to return no buckets.
Note: date_histo does not use `missing_bucket`, since a timestamp is
always required.
The docs have been adjusted to recommend a single, combined job. It
also makes reference to the previous issue to help users that are upgrading
(rather than just deleting the sections).
Merges the `query-builder-bwc` qa project into the
`full-cluster-restart` qa project, saving a cluster starts on every
build and *many* cluster starts on `./gradlew bwcTests`.
This pull request adds a full cluster restart test for a Rollup job.
The test creates and starts a Rollup job on the cluster and checks
that the job already exists and is correctly started on the upgraded
cluster.
This test allows to test that the persistent task state is correctly
parsed from the cluster state after the upgrade, as the status field
has been renamed to state in #31031.
The test undercovers a ClassCastException that can be thrown in
the RollupIndexer when the timestamp as a very low value that fits
into an integer. When it's the case, the value is parsed back as an
Integer instead of Long object and (long) position.get(rollupFieldName)
fails.
This commit renames IndexLifecycleManager to SecurityIndexManager as it
is not actually a general purpose class, but specific to security. It
also removes indirection in code calling the lifecycle service, instead
calling the security index manager directly.