In some cases the retention leases can return null, causing a
`NullPointerException` when waiting for no followers.
This wraps those so that no NPE is thrown.
Here is an example failure:
```
[2019-03-26T09:24:01,368][ERROR][o.e.x.i.IndexLifecycleRunner] [node-0] policy [deletePolicy] for index [ilm-00001] failed on step [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}]. Moving to ERROR step
java.lang.NullPointerException: null
at org.elasticsearch.xpack.core.indexlifecycle.WaitForNoFollowersStep.lambda$evaluateCondition$0(WaitForNoFollowersStep.java:60) ~[?:?]
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267) ~[?:1.8.0_191]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_191]
at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958) ~[?:1.8.0_191]
at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126) ~[?:1.8.0_191]
at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498) ~[?:1.8.0_191]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485) ~[?:1.8.0_191]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_191]
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230) ~[?:1.8.0_191]
at java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196) ~[?:1.8.0_191]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_191]
at java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:449) ~[?:1.8.0_191]
at org.elasticsearch.xpack.core.indexlifecycle.WaitForNoFollowersStep.lambda$evaluateCondition$2(WaitForNoFollowersStep.java:61) ~[?:?]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:68) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onCompletion(TransportBroadcastByNodeAction.java:383) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.onNodeResponse(TransportBroadcastByNodeAction.java:352) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:324) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction$1.handleResponse(TransportBroadcastByNodeAction.java:314) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1095) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1176) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]
...
```
Replaces the vagrant based kerberos fixtures with docker based test fixtures plugin.
The configuration is now entirely static on the docker side and no longer driven by Gradle,
also two different services are being configured since there are two different consumers of the fixture that can run in parallel and require different configurations.
SYS TABLES meta command has been improved to better adhere to the ODBC
spec in particular with regards to the handling of enumerations (and
the differences between '%', null and ''(empty string))
Fix#40348
(cherry picked from commit e3070615000228c283d17ce8d182b44f1450a5d5)
Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with
full indexing and search capabilities allocated. We can save on a lot of heap memory for those
indices by not allocating a mapper service and caching infrastructure (which preallocates a constant
amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed
metricbeat indices (each index with one shard). After this change, the same instance can host 7300
replicated closed metricbeat instances (not that this would be a recommended configuration). Most
of the remaining memory is in the cluster state and the IndexSettings object.
Previously, `getTime(colIdx/colLabel)` and
`getObject(colIdx/colLabel, java.sql.Time.class)` methods were computing
the time from a `ZonedDateTime` by applying day in millis modulo on the epoch millis
of the `ZonedDateTime` object. This is wrong as we need to keep the time
related fields at the timezone of the `ZonedDateTime` object and just
set the date info to the epoch date (01/01/1970).
Additionally fixes a testing issue as the original timezone id is converted
to an offset string when parsing the response from the server.
* Document MATCH and QUERY function predicates.
* Polish the functions pages and add a list of functions to the main Functions & Operators page.
(cherry picked from commit 4cec0ae1b962ec7ea011a290aec72740386eb808)
Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.
Additionally this commit back ports #40237
Remove Tracer from MockTransportService
Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.
* [ML] Add data frame task state object and field
* A new state item is added so that the overall task state can be
accoutned for
* A new FAILED state and reason have been added as well so that failures
can be shown to the user for optional correction
* Addressing PR comments
* adjusting after master merge
* addressing pr comment
* Adjusting auditor usage with failure state
* Refactor, renamed state items to task_state and indexer_state
* Adding todo and removing redundant auditor call
* Address HLRC changes and PR comment
* adjusting hlrc IT test
create and use unique, deterministic document ids based on the grouping values.
This is a pre-requisite for updating documents as well as preventing duplicates after a hard failure during indexing.
This change removes the use of hardcoded port values for the
idp-fixture in favor of the mapped ephemeral ports. This should prevent
failures due to port conflicts in CI.
Ensure that there is at least a 1s delay between the time that state
is persisted by each of the two jobs in the test.
Model snapshot IDs use the current time in epoch seconds to
distinguish themselves, hence snapshots will be overwritten
by another if it occurs in the same 1s window.
Closes#40347
To avoid having to specify each spec by hand (which can miss specs to be
added), the test infrastructure now performs classpath discovery so that
each spec added, is automatically considered.
Relates #40358
(cherry picked from commit d0f60b4425c731509aa8ca765d55f563f866ef90)
* [ML] make source and dest objects in the transform config
* addressing PR comments
* Fixing compilation post merge
* adding comment for Arrays.hashCode
* addressing changes for moving dest to object
* fixing data_frame yml tests
* fixing API test
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and somewhat misleading for
cluster administrators since the internal refreshes are not indicative
of documents being available for search.
In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.
Relates #36712
Previously, `getDate(int columnIdx)/getDate(String columnLabel)` and
were using legacy`java.util.Calendar` instead of the the `java.time.*`
classes to reset to the start of day. This resulted in different results
for certain timestamps and timezones when calling
`getDate(col)` vs`getObject(col, java.sql.Date)`
Now only the methods (that must be implemented due to the JDBC spec)
`getDate(int columnIdx, Calendar cal)/getDate(String columnLabel, Calendar cal)`
are still using the `java.util.Calendar` for those conversion.
The same change was applied to
`getTime(int columnIdx)/getTime(String columnLabel)`
and
`getTimestamp(int columnIdx)/getTimestamp(String columnLabel)`
Fixes: #40289
(cherry picked from commit 44560671f18397e0c58e3647732880fcb73a5034)
In some cases, a request to perform a retention lease action can arrive
on a primary shard before it is active. In this case, the primary shard
would not yet be in primary mode, tripping an assertion in the
replication tracker. Instead, we should not attempt to perform such
actions on an initializing shard. This commit addresses this by not
returning the primary shard in the single shard iterator if the primary
shard is not yet active.
Previously metric aggregations on date fields would return a double
which caused errors when trying to apply scalar functions on top, e.g.:
```
SELECT YEAR(MAX(date)) FROM test
```
Fixes: #40376
(cherry-picked from commit 41d0a038467fbdbbf67fd9bfdf27623451cae63a)
* Refactor RegexMatch to support both LIKE and RLIKE
* Add integration tests for RLIKE
* Polish the rest of tests
(cherry picked from commit 7562d6eeeb77c04794002649fe726f4b3a9a398b)
Upgrade JLine to 3.10.0
Switch to using JLine granular jars instead of the uber-one
Remove Jansi dependency (due to errors in closing streams)
Pin JNA dependency to our own artifact
Fix#40239
(cherry picked from commit 9afa65fa80111f3b68c13373c7b6db13c11dde31)
Extend CAST to support all data types notations (whether SQL or ES
specific)
Fix#40282
(cherry picked from commit eb2ee8a344da946920598839a5db76c8bb9bc3fe)
Add a checkpoint service for data frame transforms, which allows to ask for a checkpoint of the
source. In future these checkpoints will be stored in the internal index to
- detect upstream changes
- updating the data frame without a full re-run
- allow data frame clients to checkpoint themselves
* Rewrite Round and Truncate functions to have a slightly different
approach to handling the optional parameter in the constructor. Until now
the optional parameter was considered 0 if the value was missing and the
constructor was filling in this value. The current solution is to have
the optional parameter as null right until the actual calculation is done.
(cherry picked from commit 3e314f8fa4cb322e67949e80857561ce51268726)
If there's a failover on the follower, then its max_seq_no_of_updates is
bootstrapped from its max_seq_no which might be higher than the
max_seq_no_of_updates of the leader. We need to relax this check.
Relates #40249
This refactoring is in the context of the work related to moving security
tokens to a new index. In that regard, the Token Service has to work with
token documents stored in any of the two indices, albeit only as a transient
situation. I reckoned the added complexity as unmanageable,
hence this refactoring.
This is incomplete, as it fails to address the goal of minimizing .security accesses,
but I have stopped because otherwise it would've become a full blown rewrite
(if not already). I will follow-up with more targeted PRs.
In addition to being a true refactoring, some 400 errors moved to 500. Furthermore,
more stringed validation of various return result, has been implemented, notably the
one of the token document creation.
This commit adjusts the frequency with which CCR renews retention leases
and with which primaries sync retention leases to replicas. This helps
Lucene reclaim soft-deleted documents more aggressively, which we have
found in some use-cases can help improve performance, and either way
will help keep disk space under more control.
* Define a equals method for Like function so that the pattern used
is considered in the equality check. Whenever the functions are resolved
this check should be used.
(cherry picked from commit 4e5d5af58a140573b8ee19d57c7839db7b779e3b)
When creating API keys we check for if API key with
the same key name already exists and fail the request if it does.
The check should have been performed with XPackSecurityUser
instead of the authenticated user. This caused the request to fail
in case of the non-super user trying to create an API key.
This commit fixes by executing search action with SECURITY_ORIGIN
so it can be executed with XPackSecurityUser.
Also fixed the Rest test to avoid using a user with `super_user` role.
Closes#40029
Previously, calling getDate()/getTime()/getTimestamp() and getObject()
with the corresponding java.sql class on a column of SQL DATE type from
the JDBC result set would throw an Exception.
* [Data Frame] Refactor GET Transforms API:
* Add pagination
* comma delimited list expression support GET transforms
* Flag troublesome internal code for future refactor
* Removing `allow_no_transforms` param, ratcheting down pageparam option
* Changing DataFrameFeatureSet#usage to not get all configs
* Intermediate commit
* Writing test for batch data gatherer
* Removing unused import
* removing bad println used for debugging
* Updating BatchedDataIterator comments and query
* addressing pr comments
* disallow null scrollId to cause stackoverflow
Previously, when a trival plain `SELECT` or a trivial `SELECT` with
aggregations has also an `ORDER BY` or a `LIMIT` or both, then the
optimization to convert it to a `LocalRelation` was skipped resulting
in exception thrown. E.g.::
```
SELECT 'foo' FROM test LIMIT 10
```
or
```
SELECT 'foo' FROM test GROUP BY 1 ORDER BY 1
```
Fixes: #40211
When selecting columns of ES type `date` (SQL's DATETIME) the
`FieldHitExtractor` was not using the timezone of the client session
but always resorted to UTC. The same behaviour (UTC only) was
encountered also for grouping keys (`CompositeKeyExtractor`) and
for First/Last functions on dates (`TopHitsAggExtractor`).
Fixes: #40152
Currently, we cannot update index setting index.translog.sync_interval if index is open, because it's
not dynamic which can be updated for closed index only.
Closes#32763
If a replica were first reset due to one primary failover and then
promoted (before resync completes), its MSU would not include changes
since global checkpoint, leading to errors during translog replay.
Fixed by re-initializing MSU before restoring local history.
Some field types are not used for queries which use auto-expansion, in
particular, `binary`, `geo_point`, and `geo_shape`. This was causing the
count returned by the deprecation check and the count returned by the
query-time deprecation warning to be misaligned for indices with fields
of those types, with the count returned by the deprecation check being
larger.
The setup-passwords tool gives cryptic messages in case where custom discovery providers are
used (see #33580). As the URL auto-detection logic should be seen as best effort, this commit
improves the exception message to make it clearer what needs to be done to fix the issue.
Relates #33580
This named writable was never registered, so it means that we could not
read auto-follow patterns that were registered in the cluster
state. This causes them to be lost on restarts, a bad bug. This commit
addresses this by registering this named writable, and we add a basic
CCR restart test to ensure that CCR keeps functioning properly when the
follower is restarted.
The Migration Assistance API has been functionally replaced by the
Deprecation Info API, and the Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped
using it since #25692 (6.0). This change removes that action and related
code in 7.x and 8.0.
Relates #19287
Relates #25692
We were leaking a reference to an AutoFollowCoordinator during
construction, violating safe publication according to the JLS
specification. This commit addresses this by waiting to register
AutoFollowCoordinator with the ClusterApplierService after the
AutoFollowCoordinator is fully constructed. We also remove ourselves as
a listener when stopping.
* Take into consideration aliases that can be used as aggregates
and in the ORDER BY element so that the groupings are re-ordered inside
the composite aggregation according to the ORDER BY ordering.
(cherry picked from commit 110c0b90b9cf2e9344ab3f412cfa8f8cd94ad71f)
For cases where fields can have multi values, allow the behavior to be
customized through a dedicated configuration field.
By default this will be enabled on the drivers so that existing datasets
work instead of throwing an exception.
For regular SQL usage, the behavior is false so that the user is aware
of the underlying data.
Fix#39700
(cherry picked from commit 2b351571961f172fd59290ee079126bbd081ceaf)
When shutting down a node, auto-followers will keep trying to run. This
is happening even as transport services and other components are being
closed. In some cases, this can lead to a stack overflow as we rapidly
try to check the license state of the remote cluster, can not because
the transport service is shutdown, and then immeidately retry
again. This can happen faster than the shutdown, and we die with stack
overflow. This commit adds a stop command to auto-followers so that this
retry loop occurs at most once on shutdown.
This commit enables full-cluster-restart and rolling-upgrade tests
to run with nodes using a JVM in fips approved only node by using
PEM key material instead of a JKS for the transport layer in that
case.
With SUN security provider, a CertificateException is thrown when
attempting to parse a Certificate from a PEM file on disk with
`sun.security.provider.X509Provider#parseX509orPKCS7Cert`
When using the BouncyCastle Security provider (as we do in fips
tests) the parsing happens in
CertificateFactory#engineGenerateCertificates which doesn't throw
an exception but returns an empty list.
In order to have a consistent behavior, this change makes it so
that we throw a CertificateException when attempting to read
a PEM file from disk and failing to do so in either Security
Provider
Resolves: #39580
`SecurityIndexManager` is hardcoded to handle only the `.security`-`.security-7` alias-index pair.
This commit removes the hardcoded bits, so that the `SecurityIndexManager` can be reused
for other indices, such as the planned security tokens index (`.security-tokens-7`).
This change adjusts the LDAP connection timeout for retrieving
attributes while performing the SAML IT to 5 seconds, from 5 ms
that it previously was.
Resolves: #40025
When an auto-follower coordinator times out waiting for the remote
cluster state, we do not log any indication of this. While this is
expected behavior in quiet deployments, it is still useful to see this
information for tracing the behavior of the auto-follow
coordinator. This commit adds a trace log message indicating that the
timeout.
This commit removes the cluster state size field from the cluster state
response, and drops the backwards compatibility layer added in 6.7.0 to
continue to support this field. As calculation of this field was
expensive and had dubious value, we have elected to remove this field.
Since other classes besides intervals can be serialized as part of
the Cursor, the getNamedWritables method should be moved from Intervals
to a more generic class Literals.
Relates to #39973
If multiple jobs are created together and the anomaly
results index does not exist then some of the jobs could
fail to update the mappings of the results index. This
lead them to fail to write their results correctly later.
Although this scenario sounds rare, it is exactly what
happens if the user creates their first jobs using the
Nginx module in the ML UI.
This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.
Fixes#38785
org.elasticsearch.xpack.monitoring.action.MonitoringBulkRequestTests#testAddRequestContent
can still randomly use a defaultType for monitoring. The defaultType
support has been removed as of PR #39888. Prior to its's removal it
would default the type if one is not specified. The _type on the monitoring
bulk end point is currently required, though it is not used as the final index type
(which defaultType would have).
Closes#39980
Previously, JDBC's REST call to the server was always sending UTC
instead of the timezone passed through connection string/properties.
Moreover the conversion to java.sql.Date was problematic as a
calculation on the epoch millis was used to set the time to 00:00:00.000
and the timezone info was lost. This caused the resulting java.sql.Date
object which is always using the JVM's timezone (no matter what timezone
setting is used in the connection string/properties) to be wrongly created.
Fixes: #39915
* [ML] Refactor common utils out of ML plugin to XPack.Core
* implementing GET filters with abstract transport
* removing added rest param
* adjusting how defaults can be supplied
* [Data Frame] Refactor PUT transform such that:
* POST _start creates the task and starts it
* GET transforms queries docs instead of tasks
* POST _stop verifies the stored config exists before trying to stop
the task
* Addressing PR comments
* Refactoring DataFrameFeatureSet#usage, decreasing size returned getTransformConfigurations
* fixing failing usage test
The problem here was that `DatafeedJob` was updating the last end time searched
based on the `now` even though when there are aggregations, the extactor will
only search up to the floor of `now` against the histogram interval.
This commit fixes the issue by using the end time as calculated by the extractor.
It also adds an integration test that uses aggregations. This test would fail
before this fix. Unfortunately the test is slow as we need to wait for the
datafeed to work in real time.
Closes#39842
The change replaces the Vagrant box based fixture with a fixture
based on docker compose and 2 docker images, one for an openldap
server and one for a Shibboleth SAML Identity Provider.
The configuration of both openldap and shibboleth is identical to
the previous one, in order to minimize required changes in the
tests
When following the steps mentioned in upgrade guide
https://www.elastic.co/guide/en/elastic-stack/6.6/upgrading-elastic-stack.html
if we disable the cluster shard allocation but fail to enable it after
upgrading the nodes and plugins, the next step of upgrading internal
indices fails. As we did not check the bulk request response for reindexing,
we delete the old index assuming it has been created. This is fatal
as we cannot recover from this state.
This commit adds a pre-upgrade check to test the cluster shard
allocation setting and fail upgrade if it is disabled. In case there
are search or bulk failures then we remove the read-only block and
fail the upgrade index request.
Closes#39339
The LDAP tests attempt to bind all interfaces,
but if for some reason an interface can't be bound
the tests will stall until the suite times out.
This modifies the tests to be a bit more lenient and allow
some binding to fail so long as at least one succeeds.
This allows the test to continue even in more antagonistic
environments.
Indices with very large numbers of fields (>1024 by default) that do not
have index.query.default_field set will experience query failures in 7.0
for Simple Query String and Multi-Match queries. This deprecation check
issues a warning for indices of that size that do not have
index.query.default_field set.
This also adds a deprecation check for index templates with field counts
that would trigger these query failures as well.
When a query is translated into script terms agg where key has a date
type, it should generate a terms agg with value_type long instead of
date, otherwise the key gets formatted as a string, which confuses
hit extractor.
Fixes#37042
This commit removes the "doc" type from monitoring internal indexes.
The template still carries the "_doc" type since that is needed for
the internal representation.
This change impacts the following templates:
monitoring-alerts.json
monitoring-beats.json
monitoring-es.json
monitoring-kibana.json
monitoring-logstash.json
As part of the required changes, the system_api_version has been
bumped from "6" to "7" and support for version "2" has been dropped.
A new empty pipeline is now introduced for the version "7", and
the formerly empty "6" pipeline will now remove the type and re-direct
the request to the "7" index.
Additionally, to due to a difference in the internal representation
(which requires the inclusion of "_doc" type) and external representation
(which requires the exclusion of any type) a helper method is introduced
to help convert internal to external representation, and used by the
monitoring HTTP template exporter.
Relates #38637
Painless allows ZonedDateTime objects to be passed natively to scripts
which creates problematic translate queries as the ZonedDateTime is
passed as a string instead.
Wrap this with a dedicated method to perform the conversion.
Fix#39877
(cherry picked from commit 4957cad5bda77257d10430ac102e93f5e062148a)
Enhance ConstantProcessor to properly serialize complex objects
(Intervals) that have their own custom serialization/deserialization
mechanism
Fix#39875
(cherry picked from commit ed8a1f9340673e69a44ea7a89679cadb4762e43d)
* reduce the number of leader indices to be auto followed
* also check the number of follower indices being created
* also check the whether leader indices are marked as auto followed
Relates to #36761
The monitoring bulk API accepts the same format as the bulk API, yet its concept
of types is different from "mapping types" and the deprecation warning is only
emitted as a side-effect of this API reusing the parsing logic of bulk requests.
This commit extracts the parsing logic from `_bulk` into its own class with a
new flag that allows to configure whether usage of `_type` should emit a warning
or not. Support for payloads has been removed for simplicity since they were
unused.
@jakelandis has a separate change that removes this notion of type from the
monitoring bulk API that we are considering bringing to 8.0.
* [ML] refactoring lazy query and agg parsing
* Clean up and addressing PR comments
* removing unnecessary try/catch block
* removing bad call to logger
* removing unused import
* fixing bwc test failure due to serialization and config migrator test
* fixing style issues
* Adjusting DafafeedUpdate class serialization
* Adding todo for refactor in v8
* Making query non-optional so it does not write a boolean byte
The watcher stats implementation tries to look at all queued watches
before preparing the result. We want to cast these to a
WatchExecutionTask to extract the context to prepare the stats for
queued watches. The problem is that not all tasks on the watcher queue
were WatchExecutionTask. This is because a manually executed watch was
not even at all wrapped in a WatchExecutionTask. Moreover, we were using
ExecutorService#submit(Runnable) which would wrap the Runnable in a
FutureTask<?>. This commit addresses this by using a WatchExecutionTask,
and also using ExecutorService#execute(Runnable) so that no wrapping
occurs. This will let us continue with the assumption that all queued
tasks are WatchExecutionTasks.
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
This commit removes the "doc" type from watcher internal indexes.
The template still carries the "_doc" type since that is needed for
the internal representation.
This impacts the .watches, .triggered-watches, and .watch-history indexes.
External consumers do not need any changes since all external calls
go through the _watcher API, and should not interact with the the .index directly.
Relates #38637
This test was more complicated than necessary, where we were capturing
requests to prevent removal of retention leases, so that our forget
follower request could remove the retention leases instead. Instead, a
pause is enough to ensure that the retention leases are not re-added
after we remove them by the forget follower request. This commit
simplifies this test, and should remove some spurious failures.
Relates #39850
This commit changes the type from "doc" to "_doc" for the
.logstash-management template. Since this is an internally
managed template it does not always go through the REST
layer for it's internal representation. The internal
representation requires the default "_doc" type, which for
external templates is added in the REST layer.
Related #38637
When trace logging is enabled we log the computed steps for a policy. This
commit makes sure that the steps that are logged are in the same order they will
be run when the policy executes. This makes it much easier to reason about the
policy if the move-to-step API is ever required in the future.
A TLS handshake requires exchanging multiple messages to initiate a
session. If one side decides to close during the handshake, it is
supposed to send a close_notify alert (similar to closing during
application data exchange). The java SSLEngine engine throws an
exception when this happens. We currently log this at the warn level if
trace logging is not enabled. This level is too high for a valid
scenario. Additionally it happens all the time in tests (quickly closing
and opened transports). This commit changes this to be logged at the
debug level if trace is not enabled. Additionally, it extracts the
transport security exception handling to a common class.
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
Previously all the threads were writing the received tokens to a
HashSet. In cases with many threads, sometimes (1 every ~25 tests)
calling size() on the HashSet returned 2 even though it seemed to
contain only one String and there was no evidence from logging that
threadSecurityClient.refreshToken() ever returned a different
access or refresh token.
This commit changes the test to use a ConcurrentHashMap instead,
checking that we only received one pair of access token/refresh token
eventually. It also adds a check so that we won't take into consideration
tokens that are returned after 30s, hence not in the concurrent refresh
time window.
Due to migration from joda to java.time licence expiration 'full date' format
has to use 4-char pattern (MMMM). Also since jdk9 the date with ROOT
locale will still return abbreviated days and month names.
closes#39136
backport #39681
Fixes several errors of the token retry logic:
* not checking for backoff.hasNext() before calling backoff.next()
* checking for backoff.hasNext() without calling backoff.next()
* not preserving the context on the retry
* calling scheduleWithFixedDelay instead of schedule
This change does the following:
1. Makes the per-node setting xpack.ml.max_open_jobs
into a cluster-wide dynamic setting
2. Changes the job node selection to continue to use the
per-node attributes storing the maximum number of open
jobs if any node in the cluster is older than 7.1, and
use the dynamic cluster-wide setting if all nodes are on
7.1 or later
3. Changes the docs to reflect this
4. Changes the thread pools for native process communication
from fixed size to scaling, to support the dynamic nature
of xpack.ml.max_open_jobs
5. Renames the autodetect thread pool to the job comms
thread pool to make clear that it will be used for other
types of ML jobs (data frame analytics in particular)
Backport of #39320
Today the `GroupedActionListener` accepts a `defaults` parameter but all
callers pass an empty list. Also it is permitted to pass an empty group but
this is trappy because the delegated listener is never be called in that case.
This commit removes the `defaults` parameter and forbids an empty group.
As we are moving to single type indices,
we need to address this change in security-related indexes.
To address this, we are
- updating index templates to use preferred type name `_doc`
- updating the API calls to use preferred type name `_doc`
Upgrade impact:-
In case of an upgrade from 6.x, the security index has type
`doc` and this will keep working as there is a single type and `_doc`
works as an alias to an existing type. The change is handled in the
`SecurityIndexManager` when we load mappings and settings from
the template. Previously, we used to do a `PutIndexTemplateRequest`
with the mapping source JSON with the type name. This has been
modified to remove the type name from the source.
So in the case of an upgrade, the `doc` type is updated
whereas for fresh installs `_doc` is updated. This happens as
backend handles `_doc` as an alias to the existing type name.
An optional step is to `reindex` security index and update the
type to `_doc`.
Since we do not support the security audit log index,
that template has been deleted.
Relates: #38637
This commit renames the retention lease setting
index.soft_deletes.retention.lease so that it is under the namespace
index.soft_deletes.retention_lease. As such, we rename the setting to
index.soft_deletes.retention_lease.period.
Previously, Watcher only attached its listener to indices that started
with the prefix `.watches`, which causes Watcher to silently fail to
schedule newly created Watches if the `.watches` alias is redirected to
an index that does not start with `.watches`.
Watcher now attaches the listener to all indices, so that Watcher can
respond to changes in which index has the `.watches` alias.
Also adjusts the tests to randomly use non-prefixed concrete indices
for .watches and .triggered_watches.
This change adds two new cluster privileges:
* manage_data_frame_transforms
* monitor_data_frame_transforms
And two new built-in roles:
* data_frame_transforms_admin
* data_frame_transforms_user
These permit access to the data frame transform endpoints.
(Index privileges are also required on the source and
destination indices for each data frame transform, but
since these indices are configurable they it is not
appropriate to grant them via built-in roles.)
Today we have no chance to fetch actual segment stats for segments that
are currently unloaded. This is relevant in the case of frozen indices.
This allows to monitor how much memory a frozen index would use if it was
unfrozen.
This is a backport of #39631
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
This change adds support for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
When reading/writing TokensInvalidationResult objects, we take into
consideration that pre 7.1.0 these contained an integer field that carried
the attempt count
Resolves#36872
Previously, the security index could be wrongfully recreated. This might
happen if the index was interpreted as missing, as in the case of a fresh
install, but the index existed and the state did not yet recover.
This fix will return HTTP SERVICE_UNAVAILABLE (503) for requests that
try to write to the security index before the state has not been recovered yet.
Investigating how to make DeleteExpiredDataIT faster, it was
revealed that the security audit trail threads were quite hot.
Disabling that seems to be helping quite a bit with making this
test faster. This commit also unmutes the test to see how it goes
with the audit trail disabled.
Relates #39658Closes#39575
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.
Follows: #39427
`enum` is a single option from a known list of `options`
`list` is an array of unknown values
`flags` are multiple options from a list of known `options`.
We don't support the `flags` type but a `list` with `options` acts as one. This is already the case for other API's taking metric such as `node.stats.json`.
watcher.stats behaves the same as other API's as `metrics` and as such accepts the following `GET _xpack/watcher/stats/queued_watches,current_watches`
(cherry picked from commit 4c00a025b8ac9b397b27c4ae2f799553d6499412)
ML has historically used doc as the single mapping type but reindex in 7.x
will change the mapping to _doc. Switching to the typeless APIs handles
case where the mapping type is either doc or _doc. This change removes
deprecated typed usages.
This cleans up the Engine implementation by separating the sequence number generation from the
planning step in the engine, to avoid for the planning step to have any side effects. This makes it
easier to see that every sequence number is properly accounted for.
Fix bug in IndexResolver that caused conflicts in multi-field types to
be ignored up (causing the query to fail later on due to mapping
conflicts).
The issue was caused by the multi-field which forced the parent creation
before checking its validity across mappings
Fix#39547
(cherry picked from commit 4e4fe289f90b9b5eae09072d54903701a3128696)
Queries that require counting of all hits (COUNT(*) on implicit
group by), now enable accurate hit tracking.
Fix#37971
(cherry picked from commit 265b637cf6df08986a890b8b5daf012c2b0c1699)
This commit parallelizes some parts of the test
and its remove an unnecessary refresh call.
On my local machine it shaves off about 15 seconds
for a test execution time of ~64s (down from ~80s).
This test is still slow but progress over perfection.
Relates #37339
This is a backport of #38382
This change adds supports for the concurrent refresh of access
tokens as described in #36872
In short it allows subsequent client requests to refresh the same token that
come within a predefined window of 60 seconds to be handled as duplicates
of the original one and thus receive the same response with the same newly
issued access token and refresh token.
In order to support that, two new fields are added in the token document. One
contains the instant (in epoqueMillis) when a given refresh token is refreshed
and one that contains a pointer to the token document that stores the new
refresh token and access token that was created by the original refresh.
A side effect of this change, that was however also a intended enhancement
for the token service, is that we needed to stop encrypting the string
representation of the UserToken while serializing. ( It was necessary as we
correctly used a new IV for every time we encrypted a token in serialization, so
subsequent serializations of the same exact UserToken would produce
different access token strings)
This change also handles the serialization/deserialization BWC logic:
- In mixed clusters we keep creating tokens in the old format and
consume only old format tokens
- In upgraded clusters, we start creating tokens in the new format but
still remain able to consume old format tokens (that could have been
created during the rolling upgrade and are still valid)
Resolves#36872
Co-authored-by: Jay Modi jaymode@users.noreply.github.com
Backport support for replicating closed indices (#39499)
Before this change, closed indexes were simply not replicated. It was therefore
possible to close an index and then decommission a data node without knowing
that this data node contained shards of the closed index, potentially leading to
data loss. Shards of closed indices were not completely taken into account when
balancing the shards within the cluster, or automatically replicated through shard
copies, and they were not easily movable from node A to node B using APIs like
Cluster Reroute without being fully reopened and closed again.
This commit changes the logic executed when closing an index, so that its shards
are not just removed and forgotten but are instead reinitialized and reallocated on
data nodes using an engine implementation which does not allow searching or
indexing, which has a low memory overhead (compared with searchable/indexable
opened shards) and which allows shards to be recovered from peer or promoted
as primaries when needed.
This new closing logic is built on top of the new Close Index API introduced in
6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before
closing them, and closing an index on a 8.0 cluster will reinitialize the index shards
and therefore impact the cluster health.
Some APIs have been adapted to make them work with closed indices:
- Cluster Health API
- Cluster Reroute API
- Cluster Allocation Explain API
- Recovery API
- Cat Indices
- Cat Shards
- Cat Health
- Cat Recovery
This commit contains all the following changes (most recent first):
* c6c42a1 Adapt NoOpEngineTests after #39006
* 3f9993d Wait for shards to be active after closing indices (#38854)
* 5e7a428 Adapt the Cluster Health API to closed indices (#39364)
* 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767)
* 71f5c34 Recover closed indices after a full cluster restart (#39249)
* 4db7fd9 Adapt the Recovery API for closed indices (#38421)
* 4fd1bb2 Adapt more tests suites to closed indices (#39186)
* 0519016 Add replica to primary promotion test for closed indices (#39110)
* b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631)
* c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955)
* 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
* e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329)
* cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327)
* b9becdd Adapt testPendingTasks() for replicated closed indices (#38326)
* 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024)
* e53a9be Fix compilation error in IndexShardIT after merge with master
* cae4155 Relax NoOpEngine constraints (#37413)
* 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
* c63fd69 [RCI] Add NoOpEngine for closed indices (#33903)
Relates to #33888
* SYS COLUMNS will skip UNSUPPORTED field types in ODBC and JDBC, as well.
NESTED and OBJECT types were already skipped in ODBC mode, now they are
skipped in JDBC mode, as well.
(cherry picked from commit 9e0df64b2d36c9069dfa506570468f0522c86417)
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.
Closes: #38501Fixes: #35203
This commit adds a simple integ test that exercises the flow:
* snapshot .security
* delete .security
* restore .security
, checking that the Native Realm works as expected.
Relates #34454
fix a couple of odd behaviors of data frame transforms REST API's:
- check if id from body and id from URL match if both are specified
- do not allow a body for delete
- allow get and stats without specifying an id
Backport of #39325
When ILM is disabled and Watcher is setting up the templates and policies for
the watch history indices, it will now use a template that does not have the
`index.lifecycle.name` setting, so that indices are not created with the
setting.
This also adds tests for the behavior, and changes the cluster state used in
these tests to be real instead of mocked.
Resolves#38805
This change fixes the tests that expect the reload of a
SSLConfiguration to fail. The tests relied on an incorrect assumption
that the reloader only called reload on for an SSLConfiguration if the
key and trust managers were successfully reloaded, but that is not the
case. This change removes the fail call with a wrapped call to the
original method and captures the exception and counts down a latch to
make these tests consistently tested.
Closes#39260
Backport of #39350
Contains the following:
* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
Previously, if a text field had an underlying keyword field
the latter was not used instead of the text leading to wrong
results returned by queries filtering with LIKE/RLIKE.
Fixes: #39442