Enhance ConstantProcessor to properly serialize complex objects
(Intervals) that have their own custom serialization/deserialization
mechanism
Fix#39875
(cherry picked from commit ed8a1f9340673e69a44ea7a89679cadb4762e43d)
* reduce the number of leader indices to be auto followed
* also check the number of follower indices being created
* also check the whether leader indices are marked as auto followed
Relates to #36761
The monitoring bulk API accepts the same format as the bulk API, yet its concept
of types is different from "mapping types" and the deprecation warning is only
emitted as a side-effect of this API reusing the parsing logic of bulk requests.
This commit extracts the parsing logic from `_bulk` into its own class with a
new flag that allows to configure whether usage of `_type` should emit a warning
or not. Support for payloads has been removed for simplicity since they were
unused.
@jakelandis has a separate change that removes this notion of type from the
monitoring bulk API that we are considering bringing to 8.0.
* [ML] refactoring lazy query and agg parsing
* Clean up and addressing PR comments
* removing unnecessary try/catch block
* removing bad call to logger
* removing unused import
* fixing bwc test failure due to serialization and config migrator test
* fixing style issues
* Adjusting DafafeedUpdate class serialization
* Adding todo for refactor in v8
* Making query non-optional so it does not write a boolean byte
The watcher stats implementation tries to look at all queued watches
before preparing the result. We want to cast these to a
WatchExecutionTask to extract the context to prepare the stats for
queued watches. The problem is that not all tasks on the watcher queue
were WatchExecutionTask. This is because a manually executed watch was
not even at all wrapped in a WatchExecutionTask. Moreover, we were using
ExecutorService#submit(Runnable) which would wrap the Runnable in a
FutureTask<?>. This commit addresses this by using a WatchExecutionTask,
and also using ExecutorService#execute(Runnable) so that no wrapping
occurs. This will let us continue with the assumption that all queued
tasks are WatchExecutionTasks.
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
This commit removes the "doc" type from watcher internal indexes.
The template still carries the "_doc" type since that is needed for
the internal representation.
This impacts the .watches, .triggered-watches, and .watch-history indexes.
External consumers do not need any changes since all external calls
go through the _watcher API, and should not interact with the the .index directly.
Relates #38637
This test was more complicated than necessary, where we were capturing
requests to prevent removal of retention leases, so that our forget
follower request could remove the retention leases instead. Instead, a
pause is enough to ensure that the retention leases are not re-added
after we remove them by the forget follower request. This commit
simplifies this test, and should remove some spurious failures.
Relates #39850
This commit changes the type from "doc" to "_doc" for the
.logstash-management template. Since this is an internally
managed template it does not always go through the REST
layer for it's internal representation. The internal
representation requires the default "_doc" type, which for
external templates is added in the REST layer.
Related #38637
When trace logging is enabled we log the computed steps for a policy. This
commit makes sure that the steps that are logged are in the same order they will
be run when the policy executes. This makes it much easier to reason about the
policy if the move-to-step API is ever required in the future.
A TLS handshake requires exchanging multiple messages to initiate a
session. If one side decides to close during the handshake, it is
supposed to send a close_notify alert (similar to closing during
application data exchange). The java SSLEngine engine throws an
exception when this happens. We currently log this at the warn level if
trace logging is not enabled. This level is too high for a valid
scenario. Additionally it happens all the time in tests (quickly closing
and opened transports). This commit changes this to be logged at the
debug level if trace is not enabled. Additionally, it extracts the
transport security exception handling to a common class.
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
Previously all the threads were writing the received tokens to a
HashSet. In cases with many threads, sometimes (1 every ~25 tests)
calling size() on the HashSet returned 2 even though it seemed to
contain only one String and there was no evidence from logging that
threadSecurityClient.refreshToken() ever returned a different
access or refresh token.
This commit changes the test to use a ConcurrentHashMap instead,
checking that we only received one pair of access token/refresh token
eventually. It also adds a check so that we won't take into consideration
tokens that are returned after 30s, hence not in the concurrent refresh
time window.
Due to migration from joda to java.time licence expiration 'full date' format
has to use 4-char pattern (MMMM). Also since jdk9 the date with ROOT
locale will still return abbreviated days and month names.
closes#39136
backport #39681
Fixes several errors of the token retry logic:
* not checking for backoff.hasNext() before calling backoff.next()
* checking for backoff.hasNext() without calling backoff.next()
* not preserving the context on the retry
* calling scheduleWithFixedDelay instead of schedule
This change does the following:
1. Makes the per-node setting xpack.ml.max_open_jobs
into a cluster-wide dynamic setting
2. Changes the job node selection to continue to use the
per-node attributes storing the maximum number of open
jobs if any node in the cluster is older than 7.1, and
use the dynamic cluster-wide setting if all nodes are on
7.1 or later
3. Changes the docs to reflect this
4. Changes the thread pools for native process communication
from fixed size to scaling, to support the dynamic nature
of xpack.ml.max_open_jobs
5. Renames the autodetect thread pool to the job comms
thread pool to make clear that it will be used for other
types of ML jobs (data frame analytics in particular)
Backport of #39320
Today the `GroupedActionListener` accepts a `defaults` parameter but all
callers pass an empty list. Also it is permitted to pass an empty group but
this is trappy because the delegated listener is never be called in that case.
This commit removes the `defaults` parameter and forbids an empty group.
As we are moving to single type indices,
we need to address this change in security-related indexes.
To address this, we are
- updating index templates to use preferred type name `_doc`
- updating the API calls to use preferred type name `_doc`
Upgrade impact:-
In case of an upgrade from 6.x, the security index has type
`doc` and this will keep working as there is a single type and `_doc`
works as an alias to an existing type. The change is handled in the
`SecurityIndexManager` when we load mappings and settings from
the template. Previously, we used to do a `PutIndexTemplateRequest`
with the mapping source JSON with the type name. This has been
modified to remove the type name from the source.
So in the case of an upgrade, the `doc` type is updated
whereas for fresh installs `_doc` is updated. This happens as
backend handles `_doc` as an alias to the existing type name.
An optional step is to `reindex` security index and update the
type to `_doc`.
Since we do not support the security audit log index,
that template has been deleted.
Relates: #38637
This commit renames the retention lease setting
index.soft_deletes.retention.lease so that it is under the namespace
index.soft_deletes.retention_lease. As such, we rename the setting to
index.soft_deletes.retention_lease.period.
Previously, Watcher only attached its listener to indices that started
with the prefix `.watches`, which causes Watcher to silently fail to
schedule newly created Watches if the `.watches` alias is redirected to
an index that does not start with `.watches`.
Watcher now attaches the listener to all indices, so that Watcher can
respond to changes in which index has the `.watches` alias.
Also adjusts the tests to randomly use non-prefixed concrete indices
for .watches and .triggered_watches.
This change adds two new cluster privileges:
* manage_data_frame_transforms
* monitor_data_frame_transforms
And two new built-in roles:
* data_frame_transforms_admin
* data_frame_transforms_user
These permit access to the data frame transform endpoints.
(Index privileges are also required on the source and
destination indices for each data frame transform, but
since these indices are configurable they it is not
appropriate to grant them via built-in roles.)
Today we have no chance to fetch actual segment stats for segments that
are currently unloaded. This is relevant in the case of frozen indices.
This allows to monitor how much memory a frozen index would use if it was
unfrozen.
This is a backport of #39631
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
This change adds support for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
When reading/writing TokensInvalidationResult objects, we take into
consideration that pre 7.1.0 these contained an integer field that carried
the attempt count
Resolves#36872
Previously, the security index could be wrongfully recreated. This might
happen if the index was interpreted as missing, as in the case of a fresh
install, but the index existed and the state did not yet recover.
This fix will return HTTP SERVICE_UNAVAILABLE (503) for requests that
try to write to the security index before the state has not been recovered yet.
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.
Relates #39658Closes#39575
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.
Follows: #39427
`enum` is a single option from a known list of `options`
`list` is an array of unknown values
`flags` are multiple options from a list of known `options`.
We don't support the `flags` type but a `list` with `options` acts as one. This is already the case for other API's taking metric such as `node.stats.json`.
watcher.stats behaves the same as other API's as `metrics` and as such accepts the following `GET _xpack/watcher/stats/queued_watches,current_watches`
(cherry picked from commit 4c00a025b8ac9b397b27c4ae2f799553d6499412)
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles
case where the mapping type is either doc or _doc. This change removes
deprecated typed usages.
This cleans up the Engine implementation by separating the sequence number generation from the
planning step in the engine, to avoid for the planning step to have any side effects. This makes it
easier to see that every sequence number is properly accounted for.
Fix bug in IndexResolver that caused conflicts in multi-field types to
be ignored up (causing the query to fail later on due to mapping
conflicts).
The issue was caused by the multi-field which forced the parent creation
before checking its validity across mappings
Fix#39547
(cherry picked from commit 4e4fe289f90b9b5eae09072d54903701a3128696)
Queries that require counting of all hits (COUNT(*) on implicit
group by), now enable accurate hit tracking.
Fix#37971
(cherry picked from commit 265b637cf6df08986a890b8b5daf012c2b0c1699)
This commit parallelizes some parts of the test
and its remove an unnecessary refresh call.
On my local machine it shaves off about 15 seconds
for a test execution time of ~64s (down from ~80s).
This test is still slow but progress over perfection.
Relates #37339
This is a backport of #38382
This change adds supports for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
- In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
Resolves#36872
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
Backport support for replicating closed indices (#39499)
Before this change, closed indexes were simply not replicated. It was therefore
possible to close an index and then decommission a data node without knowing
that this data node contained shards of the closed index, potentially leading to
data loss. Shards of closed indices were not completely taken into account when
balancing the shards within the cluster, or automatically replicated through shard
copies, and they were not easily movable from node A to node B using APIs like
Cluster Reroute without being fully reopened and closed again.
This commit changes the logic executed when closing an index, so that its shards
are not just removed and forgotten but are instead reinitialized and reallocated on
data nodes using an engine implementation which does not allow searching or
indexing, which has a low memory overhead (compared with searchable/indexable
opened shards) and which allows shards to be recovered from peer or promoted
as primaries when needed.
This new closing logic is built on top of the new Close Index API introduced in
6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before
closing them, and closing an index on a 8.0 cluster will reinitialize the index shards
and therefore impact the cluster health.
Some APIs have been adapted to make them work with closed indices:
- Cluster Health API
- Cluster Reroute API
- Cluster Allocation Explain API
- Recovery API
- Cat Indices
- Cat Shards
- Cat Health
- Cat Recovery
This commit contains all the following changes (most recent first):
* c6c42a1 Adapt NoOpEngineTests after #39006
* 3f9993d Wait for shards to be active after closing indices (#38854)
* 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
* 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
* 71f5c34 Recover closed indices after a full cluster restart (#39249)
* 4db7fd9 Adapt the Recovery API for closed indices (#38421)
* 4fd1bb2 Adapt more tests suites to closed indices (#39186)
* 0519016 Add replica to primary promotion test for closed indices (#39110)
* b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
* c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
* 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
* e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
* cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
* b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
* 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
* e53a9be Fix compilation error in IndexShardIT after merge with master
* cae4155 Relax NoOpEngine constraints (#37413)
* 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
* c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)
Relates to #33888
* SYS COLUMNS will skip UNSUPPORTED field types in ODBC and JDBC, as well.
NESTED and OBJECT types were already skipped in ODBC mode, now they are
skipped in JDBC mode, as well.
(cherry picked from commit 9e0df64b2d36c9069dfa506570468f0522c86417)
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.
Closes: #38501Fixes: #35203
This commit adds a simple integ test that exercises the flow:
* snapshot .security
* delete .security
* restore .security
, checking that the Native Realm works as expected.
Relates #34454
fix a couple of odd behaviors of data frame transforms REST API's:
- check if id from body and id from URL match if both are specified
- do not allow a body for delete
- allow get and stats without specifying an id
Backport of #39325
When ILM is disabled and Watcher is setting up the templates and policies for
the watch history indices, it will now use a template that does not have the
`index.lifecycle.name` setting, so that indices are not created with the
setting.
This also adds tests for the behavior, and changes the cluster state used in
these tests to be real instead of mocked.
Resolves#38805
This change fixes the tests that expect the reload of a
SSLConfiguration to fail. The tests relied on an incorrect assumption
that the reloader only called reload on for an SSLConfiguration if the
key and trust managers were successfully reloaded, but that is not the
case. This change removes the fail call with a wrapped call to the
original method and captures the exception and counts down a latch to
make these tests consistently tested.
Closes#39260
Backport of #39350
Contains the following:
* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
Previously, if a text field had an underlying keyword field
the latter was not used instead of the text leading to wrong
results returned by queries filtering with LIKE/RLIKE.
Fixes: #39442
The ScheduledEvent class has never preserved the time
zone so it makes more sense for it to store the start and
end time using Instant rather than ZonedDateTime.
Closes#38620
* Add "columnar" option for REST requests (but be lenient for non-"plain"
modes) for json, yaml, smile and cbor formats.
* Updated documentation
(cherry picked from commit 5b7e0de237fb514d14a61a347bc669d4b4adbe56)
Currently there are two security tests that specifically target the
netty security transport. This PR moves the client authentication tests
into `AbstractSimpleSecurityTransportTestCase` so that the nio transport
will also be tested.
Additionally the work to build transport configurations is moved out of
the netty transport and tested independently.
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.
In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.
Backport of: #39337
Some small fix for the `x-pack` rest api spec.
* In both `security.enable_user.json` and `security.disable_user.json`
the `username` parameter was `false` instead of `true`
(the documentation is already correct).
* In `security.get_privileges.json` there were missing all the
possible paths since the path parameters are not required.
This fix aligns the document with the rest of the spec,
where all the possible combinations are listed.
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.
This change is a backport of #39252
- Fixes TokenBackwardsCompatibilityIT: Existing tests seemed to made
the assumption that in the oneThirdUpgraded stage the master node
will be on the old version and in the twoThirdsUpgraded stage, the
master node will be one of the upgraded ones. However, there is no
guarantee that the master node in any of the states will or will
not be one of the upgraded ones.
This class now tests:
- That we can generate and consume tokens before we start the
rolling upgrade.
- That we can consume tokens generated in the old cluster during
all the stages of the rolling upgrade.
- That while on a mixed cluster, when/if the master node is
upgraded, we can generate, consume and refresh a token
- That after the rolling upgrade, we can consume a token
generated in an old cluster and can invalidate it so that it
can't be used any more.
- Ensures that during the rolling upgrade, the upgraded nodes have
the same configuration as the old nodes. Specifically that the
file realm we use is explicitly named `file1`. This is needed
because while attempting to refresh a token in a mixed cluster
we might create a token hitting an old node and attempt to refresh
it hitting a new node. If the file realm name is not the same, the
refresh will be seen as being made by a "different" client, and
will, thus, fail.
- Renames the Authentication variable we check while refreshing a
token to be clientAuth in order to make the code more readable.
Some of the above were possibly causing the flakiness of #37379
Today when users upgrade to 7.0, existing indices will automatically
switch to soft-deletes without an opt-out option. With this change,
we only enable soft-deletes by default for new indices.
Relates #36141
This commit is the final piece of the integration of CCR with retention
leases. Namely, we periodically renew retention leases and advance the
retaining sequence number while following.
* Remove Hipchat support from Watcher (#39199)
Hipchat has been shut down and has previously been deprecated in
Watcher (#39160), therefore we should remove support for these actions.
* Add migrate note
With this change, we won't wait for the local checkpoint to advance to
the max_seq_no before starting phase2 of peer-recovery. We also remove
the sequence number range check in peer-recovery. We can safely do these
thanks to Yannick's finding.
The replication group to be used is currently sampled after indexing
into the primary (see `ReplicationOperation` class). This means that
when initiating tracking of a new replica, we have to consider the
following two cases:
- There are operations for which the replication group has not been
sampled yet. As we initiated the new replica as tracking, we know that
those operations will be replicated to the new replica and follow the
typical replication group semantics (e.g. marked as stale when
unavailable).
- There are operations for which the replication group has already been
sampled. These operations will not be sent to the new replica. However,
we know that those operations are already indexed into Lucene and the
translog on the primary, as the sampling is happening after that. This
means that by taking a snapshot of Lucene or the translog, we will be
getting those ops as well. What we cannot guarantee anymore is that all
ops up to `endingSeqNo` are available in the snapshot (i.e. also see
comment in `RecoverySourceHandler` saying `We need to wait for all
operations up to the current max to complete, otherwise we can not
guarantee that all operations in the required range will be available
for replaying from the translog of the source.`). This is not needed,
though, as we can no longer guarantee that max seq no == local
checkpoint.
Relates #39000Closes#38949
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
when dealing with TimeoutException
The `IndexFollowingIT#testDeleteLeaderIndex()`` test failed,
because a NPE was captured as fatal error instead of an IndexNotFoundException.
Closes#39308
There have been intermittent failures where either
LDAP server could not be started or KDC server could
not be started causing failures during test runs.
`KdcNetwork` class from Apache kerby project does not set reuse
address to `true` on the socket so if the port that we found to be free
is in `TIME_WAIT` state it may fail to bind. As this is an internal
class for kerby, I could not find a way to extend.
This commit adds a retry loop for initialization. It will keep
trying in an await busy loop and fail after 10 seconds if not
initialized.
Closes#35982
Finally! This commit should fix the issues with the CCR retention lease
that has been plaguing build failures. The issue here is that we are
trying to prevent the clear session requests from being executed until
after we have been able to validate that retention leases are being
renewed. However, we were only blocking the clear session requests but
not blocking them when they are proxied through another node. This
commit addresses that.
Relates #39268
This commit changes the sort order of shard stats that are collected in
CCR retention lease integration tests. This change is done so that
primaries appear first in sort order.
This test fails rarely but it is flaky in its current form. The problem
here is that we lack a guarantee on the retention leases having been
synced to all shard copies. We need to sleep long enough to ensure that
that occurs, and then we can sample the retention leases, possibly sleep
again (we usually will not have too since the first sleep will have been
long enough to allow a sync and a renewal to happen, if one was going to
happen), and the sample the retention leases for comparison.
Closes#39331