This change deletes the SslNullCipherTests from our codebase since it
will have issues with newer JDK versions and it is essentially testing
JDK functionality rather than our own. The upstream JDK issue for
disabling these ciphers by default is
https://bugs.openjdk.java.net/browse/JDK-8212823.
Closes#37403
The test that remote clusters used by ML datafeeds have
a license that allows ML was not accounting for the
possibility that the remote cluster name could be
wildcarded. This change fixes that omission.
Fixes#36228
The SourceOnlySnapshotIT class tests a source only repository
using the following scenario:
starts a master node
starts a data node
creates a source only repository
creates an index with documents
snapshots the index to the source only repository
deletes the index
stops the data node
starts a new data node
restores the index
Thanks to ESIntegTestCase the index is sometimes created using a custom
data path. With such a setting, when a shard is assigned to one of the data
node of the cluster the shard path is resolved using the index custom data
path and the node's lock id by the NodeEnvironment#resolveCustomLocation().
It should work nicely but in SourceOnlySnapshotIT.snashotAndRestore(), b
efore the change in this PR, the last data node was restarted using a different
path.home. At startup time this node was assigned a node lock based on other
locks in the data directory of this temporary path.home which is empty. So it
always got the 0 lock id. And when this new data node is assigned a shard for
the index and resolves it against the index custom data path, it also uses the
node lock id 0 which conflicts with another node of the cluster, resulting in
various errors with the most obvious one being LockObtainFailedException.
This commit removes the temporary home path for the last data node so that it
uses the same path home as other nodes of the cluster and then got assigned
a correct node lock id at startup.
Closes#36330Closes#36276
Adjust FieldExtractor to handle fields which contain `.` in their
name, regardless where they fall in, in the document hierarchy. E.g.:
```
{
"a.b": "Elastic Search"
}
{
"a": {
"b.c": "Elastic Search"
}
}
{
"a.b": {
"c": {
"d.e" : "Elastic Search"
}
}
}
```
Fixes: #37128
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes#29797
This new `hostname` field is meant to be a replacement for its sibling `name` field. See https://github.com/elastic/beats/pull/9943, particularly https://github.com/elastic/beats/pull/9943#discussion_r245932581.
This PR simply adds the new field (`hostname`) to the mapping without removing the old one (`name`), because a user might be running an older-version Beat (without this field rename in it) with a newer-version Monitoring ES cluster (with this PR's change in it).
AFAICT the Monitoring UI isn't currently using the `name` field so no changes are necessary there yet. If it decides to start using the `name` field, it will also want to look at the value of the `hostname` field.
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
Adds another field, named "request.method", to the structured logfile audit.
This field is present for all events associated with a REST request (not a
transport request) and the value is one of GET, POST, PUT, DELETE, OPTIONS,
HEAD, PATCH, TRACE and CONNECT.
Improve error messages by returning the original SQL statement
declaration instead of trying to reproduce it as the casing and
whitespaces are not preserved accurately leading to small
differences.
Close#37161
Since `full` can be common as a field name or part of a field name
(e.g.: `full.name` or `name.full`), it's nice if it's not a reserved
keyword of the grammar so a user can use it without resorting to quotes.
Fixes: #37376
SqlPlugin cannot have more than one public constructor, so for the testing
purposes the `getLicenseState()` should be overriden.
Fixes: #37191
Co-authored-by: Michael Basnight <mbasnight@gmail.com>
When this message was first added the model debug config was
the only thing that could be updated, but now more aspects of
the config can be updated so the message needs to be more
general.
* ML: Updating .ml-state calls to be able to support > 1 index
* Matching bulk delete behavior with dbq
* Adjusting state name
* refreshing indices before search
* fixing line length
* adjusting index expansion options
This is a reinforcement of #37227. It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this. The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.
When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up. In the case
of reassignment of the job persistent task this was
dealt with by the stale status test. But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.
Fixes#36810
This adds a configurable whitelist to the HTTP client in watcher. By
default every URL is allowed to retain BWC. A dynamically configurable
setting named "xpack.http.whitelist" was added that allows to
configure an array of URLs, which can also contain simple regexes.
Closes#29937
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
This change ensures we always countdown the latch in the
SSLConfigurationReloaderTests to prevent the suite from timing out in
case of an exception. Additionally, we also increase the logging of the
resource watcher in case an IOException occurs.
See #36053
Field of types aliases that have dots in name are returned without a
hierarchy by field_caps, as oppose to the mapping api or field with
concrete types, which in turn breaks IndexResolver.
This commit fixes this by creating the backing hierarchy similar to the
mapping api.
Close#37224
Jobs created in version 6.1 or earlier can have a
null model_memory_limit. If these are parsed from
cluster state following a full cluster restart then
we replace the null with 4096mb to make the meaning
explicit. But if such jobs are streamed from an
old node in a mixed version cluster this does not
happen. Therefore we need to account for the
possibility of a null model_memory_limit in the ML
memory tracker.
This commit reorders the realm list for iteration based on the last
successful authentication for the given principal. This is an
optimization to prevent unnecessary iteration over realms if we can
make a smart guess on which realm to try first.
Fail with a 403 when indexing a document directly into a follower index.
In order to test this change, I had to move specific assertions into a dedicated class and
disable assertions for that class in the rest qa module. I think that is the right trade off.
If a running shard follow task needs to be restarted and
the remote connection seeds have changed then
a shard follow task currently fails with a fatal error.
The change creates the remote client lazily and adjusts
the errors a shard follow task should retry.
This issue was found in test failures in the recently added
ccr rolling upgrade tests. The reason why this issue occurs
more frequently in the rolling upgrade test is because ccr
is setup in local mode (so remote connection seed will become stale) and
all nodes are restarted, which forces the shard follow tasks to get
restarted at some point during the test. Note that these tests
cannot be enabled yet, because this change will need to be backported
to 6.x first. (otherwise the issue still occurs on non upgraded nodes)
I also changed the RestartIndexFollowingIT to setup remote cluster
via persistent settings and to also restart the leader cluster. This
way what happens during the ccr rolling upgrade qa tests, also happens
in this test.
Relates to #37231
* Tests: Add ElasticsearchAssertions.awaitLatch method
Some tests are using assertTrue(latch.await(...)) in their code. This
leads to an assertion error without any error message. This adds a
method which has a nicer error message and can be used in tests.
* fix forbidden apis
* fix spaces
* provide overriden `hashCode` and toString methods to account for `DISTINCT`
* change the analyzer for scenarios where `COUNT <field_name>` and `COUNT DISTINCT` have different paths
* defined a new `filter` aggregation encapsulating an `exists` query to filter out null or missing values
* ML: add migrate anomalies assistant
* adjusting failure handling for reindex
* Fixing request and tests
* Adding tests to blacklist
* adjusting test
* test fix: posting data directly to the job instead of relying on datafeed
* adjusting API usage
* adding Todos and adjusting endpoint
* Adding types to reindexRequest
* removing unreliable "live" data test
* adding index refresh to test
* adding index refresh to test
* adding index refresh to yaml test
* fixing bad exists call
* removing todo
* Addressing remove comments
* Adjusting rest endpoint name
* making service have its own logger
* adjusting validity check for newindex names
* fixing typos
* fixing renaming
This commit fixes a race condition in a test introduced by #36900 that
verifies concurrent authentications get a result propagated from the
first thread that attempts to authenticate. Previously, a thread may
be in a state where it had not attempted to authenticate when the first
thread that authenticates finishes the authentication, which would
cause the test to fail as there would be an additional authentication
attempt. This change adds additional latches to ensure all threads have
attempted to authenticate before a result gets returned in the
thread that is performing authentication.
Closing a channel using TLS/SSL requires reading and writing a
CLOSE_NOTIFY message (for pre-1.3 TLS versions). Many implementations do
not actually send the CLOSE_NOTIFY message, which means we are depending
on the TCP close from the other side to ensure channels are closed. In
case there is an issue with this, we need a timeout. This commit adds a
timeout to the channel close process for TLS secured channels.
As part of this change, we need a timer service. We could use the
generic Elasticsearch timeout threadpool. However, it would be nice to
have a local to the nio event loop timer service dedicated to network needs. In
the future this service could support read timeouts, connect timeouts,
request timeouts, etc. This commit adds a basic priority queue backed
service. Since our timeout volume (channel closes) is very low, this
should be fine. However, this can be updated to something more efficient
in the future if needed (timer wheel). Everything being local to the event loop
thread makes the logic simple as no locking or synchronization is necessary.
We already had logic to stop datafeeds running against
jobs that were OPENING, but a job that relocates from
one node to another while OPENED stays OPENED, and this
could cause the datafeed to fail when it sent data to
the OPENED job on its new node before it had a
corresponding autodetect process.
This change extends the check to stop datafeeds running
when their job is OPENING _or_ stale (i.e. has not had
its status reset since relocating to a different node).
Relates #36810
This bug was introduced in #36893 and had the effect that
execution would continue after calling onFailure on the the
listener in checkIfTokenIsValid in the case that the token is
expired. In a case of many consecutive requests this could lead to
the unwelcome side effect of an expired access token producing a
successful authentication response.
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
Today, a setting can declare that its validity depends on the values of other
related settings. However, the validity of a setting is not always checked
against the correct values of its dependent settings because those settings'
correct values may not be available when the validator runs.
This commit separates the validation of a settings updates into two phases,
with separate methods on the `Setting.Validator` interface. In the first phase
the setting's validity is checked in isolation, and in the second phase it is
checked again against the values of its related settings. Most settings only
use the first phase, and only the few settings with dependencies make use of
the second phase.
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.
Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.
Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.
Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.
This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
- timestamps
- expiration
- persistence to and recovery from Lucene
- handoff during primary relocation
- sharing retention leases with replicas
- exposing leases in shard-level statistics
- integration with cross-cluster replication
These will occur individually in follow-up commits.
This pull request changes the Freeze Index and Close Index actions so
that these actions always requires a Task. The task's id is then propagated
from the Freeze action to the Close action, and then to the Verify shard action.
This way it is possible to track which Freeze task initiates the closing of an index,
and which consecutive verifiy shard are executed for the index closing.
As suggested in #36775, this pull request renames the following methods:
ClusterBlocks.hasGlobalBlock(int)
ClusterBlocks.hasGlobalBlock(RestStatus)
ClusterBlocks.hasGlobalBlock(ClusterBlockLevel)
to something that better reflects the property of the ClusterBlock that is searched for:
ClusterBlocks.hasGlobalBlockWithId(int)
ClusterBlocks.hasGlobalBlockWithStatus(RestStatus)
ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
Improve error message returned to the client when an SQL statement
cannot be translated to a ES query DSL. Cases:
1. WHERE clause evaluates to FALSE => No results returned
1. Missing FROM clause => Local execution, e.g.: SELECT SIN(PI())
3. Special SQL command => Only valid of SQL iface, e.g.: SHOW TABLES
Fixes: #37040
Logical operators OR and AND as well as conditional functions
(COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one
of their children is NULL as is the case for most of the functions.
Therefore, their nullable() implementation cannot return true. On
the other hand they cannot return false as if they're wrapped within
an IS NULL or IS NOT NULL expression, the expression will be folded
to false and true respectively leading to wrong results.
Change the signature of nullable() method and add a third value UKNOWN
to handle these cases.
Fixes: #35872
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.
Relates #33028
When a 6.1-6.3 job is opened in a later version
we increase the model memory limit by 30% if it's
below 0.5GB. The migration of jobs from cluster
state to the config index changes the job version,
so we need to also do this uplift as part of that
config migration.
Relates #36961
The unused state remover was never adjusted to account for jobs stored
in the config index. The result was that when triggered it removed
state for all jobs stored in the config index.
This commit fixes the issue.
Closes#37109
Improve parsing to save the source for each token alongside the location
of each Node/Expression for accurate reproducibility of an expression
name and source
Fix#36894
Removes the `xpack.ml.max_model_memory_limit` cluster setting
at the teardown of the `ml_info.yml` tests to ensure the setting
does not trip other tests.
Enhance error message for the case that the 2nd argument of PERCENTILE
and PERCENTILE_RANK is not a foldable, as it doesn't make sense to have
a dynamic value coming from a field.
Fixes: #36903
We run subsequent token invalidation requests and we still want to
trigger the deletion of expired tokens so we need to lower the
deleteInterval parameter significantly. Especially now that the
bwc expiration logic is removed and the invalidation process is
much shorter
Resolves#37063
Changes the feature usage retrieval to use the job manager rather than
directly talking to the cluster state, because jobs can now be either in
cluster state or stored in an index
This is a follow-up of #36702 / #36698
- Removes bwc invalidation logic from the TokenService
- Removes bwc serialization for InvalidateTokenResponse objects as
old nodes in supported mixed clusters during upgrade will be 6.7 and
thus will know of the new format
- Removes the created field from the TokensInvalidationResult and the
InvalidateTokenResponse as it is no longer useful in > 7.0
If we don't explicitly sett the client SSLSocketFactory when
creating an InMemoryDirectoryServer and setting its SSL config, it
will result in using a TrustAllTrustManager(that extends
X509TrustManager) which is not allowed in a FIPS 140 JVM.
Instead, we get the SSLSocketFactory from the existing SSLContext
and pass that to be used.
Resolves#37013
The phrase "missing authentication token" is historic and is based
around the use of "AuthenticationToken" objects inside the Realm code.
However, now that we have a TokenService and token API, this message
would sometimes lead people in the wrong direction and they would try
and generate a "token" for authentication purposes when they would
typically just need a username:password Basic Auth header.
This change replaces the word "token" with "credentials".
In #30509 we changed the way SSL configuration is reloaded when the
content of a file changes. As a consequence of that implementation
change the LDAP realm ceased to pick up changes to CA files (or other
certificate material) if they changed.
This commit repairs the reloading behaviour for LDAP realms, and adds
a test for this functionality.
Resolves: #36923
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
The AutoFollowCoordinator should be resilient to the fact that the follower
index has already been created and in that case it should only update
the auto follow metadata with the fact that the follower index was created.
Relates to #33007
Currently auto follow stats users are unable to see whether an auto follow
error was recent or old. The new timestamp field will help user distinguish
between old and new errors.
Both index following and auto following should be resilient against missing remote connections.
This happens in the case that they get accidentally removed by a user. When this happens
auto following and index following will retry to continue instead of failing with unrecoverable exceptions.
Both the put follow and put auto follow APIs validate whether the
remote cluster connection. The logic added in this change only exists
in case during the lifetime of a follower index or auto follow pattern
the remote connection gets removed. This retry behavior similar how CCR
deals with authorization errors.
Closes#36667Closes#36255
* Added Limitations page
* Made the aggregations page follow the common template for functions
* Modified all tables to have the first row's cells content centered
* Polishing in other various sections
When the script contexts were created in 6, the use of params.ctx was
deprecated. This commit cleans up that code and ensures that params.ctx
is null in both watcher script contexts.
Relates: #34059
Realm settings were changed in #30241 in a non-BWC way.
If you try and start a 7.x node using a 6.x config style, then the
default error messages do not adequately describe the cause of
the problem, or the solution.
This change detects the when realms are using the 6.x style and fails
with a specific error message.
This detection is a best-effort, and will detect issues when the
realms have not been modified to use the 7.x style, but may not detect
situations where the configuration was partially changed.
e.g. We can detect this:
xpack.security.authc:
realms.pki1.type: pki
realms.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
But this (where the "order" has been updated, but the "ssl.*" has not)
will fall back to the standard "unknown setting" check
xpack.security.authc:
realms.pki.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
Closes: #36026
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
It implements the ccr clean_session action with this client.
Allow scripts to correctly reference grouping functions
Fix bug in translation of date/time functions mixed with histograms.
Enhance Verifier to prevent histograms being nested inside other
functions inside GROUP BY (as it implies double grouping)
Extend Histogram docs
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
files restores is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
the testFullPolicy and testMoveToRolloverStep tests
are very important tests, but they sometimes timeout
beyond the default 10sec wait for shrink to occur.
This commit increases one of the assertBusys to
20 seconds
Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored.
This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in auto follow stats. Marking the index as auto follow is important, because otherwise the auto follower will continuously try to auto follow and fail.
Relates to #33007
... MlDistributedFailureIT.testLoseDedicatedMasterNode.
An intermittent failure has been observed in
`MlDistributedFailureIT. testLoseDedicatedMasterNode`.
The test launches a cluster comprised by a dedicated master node
and a data and ML node. It creates a job and datafeed and starts them.
It then shuts down and restarts the master node. Finally, the test asserts
that the two tasks have been reassigned within 10s.
The intermittent failure is due to the assertions that the tasks have been
reassigned failing. Investigating the failure revealed that the `assertBusy`
that performs that assertion times out. Furthermore, it appears that the
job task is not reassigned because the memory tracking info is stale.
Memory tracking info is refreshed asynchronously when a job is attempted
to be reassigned. Tasks are attempted to be reassigned either due to a relevant
cluster state change or periodically. The periodic interval is controlled by a cluster
setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s.
What seems to be happening in this test is that if all cluster state changes after the
master node is restarted come through before the async memory info refresh completes,
then the job might take up to 30s until it is attempted to reassigned. Thus the `assertBusy`
times out.
This commit changes the test to reduce the periodic check that reassigns persistent
tasks to `200ms`. If the above theory is correct, this should eradicate those failures.
Closes#36760
This change cleans up a number of ugly BWC
workarounds in the ML code.
7.0 cannot run in a mixed version cluster with
versions prior to 6.7, so code that deals with
these old versions is no longer required.
Closes#29963
* This change is to account for different system clock implementations
or different Java versions (for Java 8, milliseconds precision is used;
for Java 9+ a system specific clock implementation is used which can
have greater precision than what we need here).
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
When a filter is evaluated to false then it becomes a LocalRelation
with an EmptyExecutable. The LocalRelation in turn, becomes a
LocalExec and the the SkipQueryIfFoldingProjection was wrongly
converting it to a SingletonExecutable. Moreover made a change, so
that the queries without FROM clause, which are supposed to return a
single row, to become a LocalRelation with a SingletonExecutable
instead of EmptyExecutable to avoid mixing up with the ones operating
on a table but with a filter that evaluates to false.
Fixes: #35980
Leaving `index.lifecycle.indexing_complete` in place when removing the
lifecycle policy from an index can cause confusion, as if a new policy
is associated with the policy, rollover will be silently skipped.
Removing that setting when removing the policy from an index makes
associating a new policy with the index more involved, but allows ILM to
fail loudly, rather than silently skipping operations which the user may
assume are being performed.
* Adjust order of checks in WaitForRolloverReadyStep
This allows ILM to error out properly for indices that have a valid
alias, but are not the write index, while still handling
`indexing_complete` on old-style aliases and rollover (that is, those
which only point to a single index at a time with no explicit write
index)
Fixes two minor problems reported after merge of #36731:
1. Name the creation method to make clear it only creates
if necessary
2. Avoid multiple simultaneous in-flight creation requests
* ML having delayed data detection create annotations
* adding upsertAsDoc, audit, and changing user
* changing update to just index the doc with the id set
Lucene 7.6 uses a smaller encoding for LatLonShape. This commit forks the LatLonShape classes to Elasticsearch's local lucene package. These classes will be removed on the release of Lucene 7.6.
The logstash management template was named in such a way as to confuse
users, who misunderstood it to be a template for indices created by
logstash. It is now renamed to more clearly communicate its purpose and
match the format of the other templates for system indices.
The parser used for rollup configs in _meta fields was not able to
handle unrelated data in the meta field. If an unrelated object
was encountered, it would half-consume the JSON object, realize it
wasn'ta rollup config, then stop parsing. This would leave the object
halfway consumed and the parsing framework would throw an exception.
This commit replaces the parsing logic with a set of minimal parsers,
each for the specific component we care about (`_doc`, `_meta`,
`_rollup`) and configured to ignore unknown fields where applicable.
More verbose, but less hacky than before and should be more robust.
Also adds tests (randomized and explicit) to make sure this doesn't
break in the future.
This commit is related to #36127. It adds a CcrRestoreSourceService to
track Engine.IndexCommitRef need for in-process file restores. When a
follower starts restoring a shard through the CcrRepository it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.
Once, the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
* [ML] Job and datafeed mappings with index template (#32719)
Index mappings for the configuration documents
* [ML] Job config document CRUD operations (#32738)
* [ML] Datafeed config CRUD operations (#32854)
* [ML] Change JobManager to work with Job config in index (#33064)
* [ML] Change Datafeed actions to read config from the config index (#33273)
* [ML] Allocate jobs based on JobParams rather than cluster state config (#33994)
* [ML] Return missing job error when .ml-config is does not exist (#34177)
* [ML] Close job in index (#34217)
* [ML] Adjust finalize job action to work with documents (#34226)
* [ML] Job in index: Datafeed node selector (#34218)
* [ML] Job in Index: Stop and preview datafeed (#34605)
* [ML] Delete job document (#34595)
* [ML] Convert job data remover to work with index configs (#34532)
* [ML] Job in index: Get datafeed and job stats from index (#34645)
* [ML] Job in Index: Convert get calendar events to index docs (#34710)
* [ML] Job in index: delete filter action (#34642)
This changes the delete filter action to search
for jobs using the filter to be deleted in the index
rather than the cluster state.
* [ML] Job in Index: Enable integ tests (#34851)
Enables the ml integration tests excluding the rolling upgrade tests and a lot of fixes to
make the tests pass again.
* [ML] Reimplement established model memory (#35500)
This is the 7.0 implementation of a master node service to
keep track of the native process memory requirement of each ML
job with an associated native process.
The new ML memory tracker service works when the whole cluster
is upgraded to at least version 6.6. For mixed version clusters
the old mechanism of established model memory stored on the job
in cluster state was used. This means that the old (and complex)
code to keep established model memory up to date on the job object
has been removed in 7.0.
Forward port of #35263
* [ML] Need to wait for shards to replicate in distributed test (#35541)
Because the cluster was expanded from 1 node to 3 indices would
initially start off with 0 replicas. If the original node was
killed before auto-expansion to 1 replica was complete then
the test would fail because the indices would be unavailable.
* [ML] DelayedDataCheckConfig index mappings (#35646)
* [ML] JIndex: Restore finalize job action (#35939)
* [ML] Replace Version.CURRENT in streaming functions (#36118)
* [ML] Use 'anomaly-detector' in job config doc name (#36254)
* [ML] Job In Index: Migrate config from the clusterstate (#35834)
Migrate ML configuration from clusterstate to index for closed jobs
only once all nodes are v6.6.0 or higher
* [ML] Check groups against job Ids on update (#36317)
* [ML] Adapt to periodic persistent task refresh (#36633)
* [ML] Adapt to periodic persistent task refresh
If https://github.com/elastic/elasticsearch/pull/36069/files is
merged then the approach for reallocating ML persistent tasks
after refreshing job memory requirements can be simplified.
This change begins the simplification process.
* Remove AwaitsFix and implement TODO
* [ML] Default search size for configs
* Fix TooManyJobsIT.testMultipleNodes
Two problems:
1. Stack overflow during async iteration when lots of
jobs on same machine
2. Not effectively setting search size in all cases
* Use execute() instead of submit() in MlMemoryTracker
We don't need a Future to wait for completion
* [ML][TEST] Fix NPE in JobManagerTests
* [ML] JIindex: Limit the size of bulk migrations (#36481)
* [ML] Prevent updates and upgrade tests (#36649)
* [FEATURE][ML] Add cluster setting that enables/disables config migration (#36700)
This commit adds a cluster settings called `xpack.ml.enable_config_migration`.
The setting is `true` by default. When set to `false`, no config migration will
be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able
to be updated normally.
Relates #32905
* [ML] Snapshot ml configs before migrating (#36645)
* [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716)
Relates #32905
* SQL: Fix translation of LIKE/RLIKE keywords (#36672)
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
* Fixing line length for EnvironmentTests and RecoveryTests (#36657)
Relates #34884
* Add back one line removed by mistake regarding java version check and
COMPAT jvm parameter existence
* Do not resolve addresses in remote connection info (#36671)
The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically chaning
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.
* [Painless] Add boxed type to boxed type casts for method/return (#36571)
This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist.
* SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718)
* Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+
* ingest: fix on_failure with Drop processor (#36686)
This commit allows a document to be dropped when a Drop processor
is used in the on_failure fork of the processor chain.
Fixes#36151
* Initialize startup `CcrRepositories` (#36730)
Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
* [TEST] fix float comparison in RandomObjects#getExpectedParsedValue
This commit fixes a test bug introduced with #36597. This caused some
test failure as stored field values comparisons would not work when CBOR
xcontent type was used.
Closes#29080
* [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
* TESTS:Debug Log. IndexStatsIT#testFilterCacheStats
* ingest: support default pipelines + bulk upserts (#36618)
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.
However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.
This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.
fixes#36219
* Fix duplicate phrase in shrink/split error message (#36734)
This commit removes a duplicate "must be a" from the shrink/split error
messages.
* Deprecate types in get_source and exist_source (#36426)
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existance. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
* Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"
This reverts commit 5bc7822562.
* Enhance Invalidate Token API (#35388)
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
* Add raw sort values to SearchSortValues transport serialization (#36617)
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.
This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
* Watcher: Ensure all internal search requests count hits (#36697)
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.
Closes#36177
* [TEST] Ensure shard follow tasks have really stopped.
Relates to #36696
* Ensure MapperService#getAllMetaFields elements order is deterministic (#36739)
MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.
With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyways, this seems like a good change as returning an array would imply
that it's deterministically sorted.
* Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721)
Relates #36148
Relates #10708
* [ML] Mute MlDistributedFailureIT
This change fixes the rollup statistics regarding search times. Search times are
computed from the first query and never updated. This commit adds the missing
calls to the subsequent search.
Fix grammar so that each element inside the list of values for IN
is a valueExpression and not a more generic expression. Introduce a
mapping for context names as some rules in the grammar are exited with
a different rule from the one they entered.This helps so that the decrement
of depth counts in the Parser's CircuitBreakerListener works correctly.
For the list of values for IN, don't count the
PrimaryExpressionContext as this is not visited on exitRule() due to
the peculiarity in our gramamr with the predicate and predicated.
Fixes: #36592
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
The ML UI now provides the ability for users to annotate
time periods with arbitrary text to add insight to what
happened.
This change makes the backend create the index for these
annotations, together with read and write aliases to
make future upgrades possible without adding complexity
to the UI.
It also adds read and write permission to the index for
all ML users (not just admins).
The spec for the index is in
https://github.com/elastic/kibana/pull/26034/files#diff-c5c6ac3dbb0e7c91b6d127aa06121b2cR7
Relates #33376
Relates elastic/kibana#26034
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.
Closes#36177
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.
Relates to #33007
Originates from #35895
This fixes two bugs about watcher notifications:
* registering accounts that had only secure settings was not possible before;
these accounts are very much practical for Slack and PagerDuty integrations.
* removes the limitation that, for an account with both secure and cluster settings,
the admin had to first change/add the secure settings and only then add the
dependent dynamic cluster settings. The reverse order would trigger a
SettingsException for an incomplete account.
The workaround is to lazily instantiate account objects, hoping that when accounts
are instantiated all the required settings are in place. Previously, the approach
was to greedily validate all the account settings by constructing the account objects,
even if they would not ever be used by actions. This made sense in a world where
all the settings were set by a single API. But given that accounts have dependent
settings (that must be used together) that have to be changed using different APIs
(POST _nodes/reload_secure_settings and PUT _cluster/settings), the settings group
would technically be in an invalid state in between the calls.
This fix builds account objects, and validates the settings, when they are
needed by actions.
As the internals have moved to java.time, the usage of TimeZone itself
should be minimized as it creates issues when being converted to ZoneId
Protocol wise the two are mostly identical so consumer should not see
any difference.
Note that terminology wise, inside the docs, the public API and inside
the protocol timeZone will continue to be used as it's more widely
understood as oppose to zoneId which is an implementation detail
specific to the JVM
Fix#36535
When trying to analyse a HAVING condition, we crate a temp Aggregate
Plan which wasn't created correctly (missing the expressions from the
SELECT clause) and as a result the ordinal (1, 2, etc) in the GROUP BY
couldn't be resolved correctly.
Also after successfully analysing the HAVING condition, still the
original plan was returned.
Fixes: #36059
If a primary promotion happens in the test testAddRemoveShardOnLeader, the
max_seq_no_of_updates_or_deletes on a new primary might be higher than the
max_seq_no_of_updates_or_deletes on the replicas or copies of the follower.
Relates #36607
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.
Relates #36148
Relates #10708
For class fields of type collection whose order is not important
and for which duplicates are not permitted we declare them as `Set`s.
Usually the definition is a `HashSet` but in this case `TreeSet` is used
instead to aid testing.
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.
This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.
* WaitForRolloverReadyStepTests#mutateInstance sometimes did not mutate the instance
correctly
* 40_explain_lifecycle#"Test new phase still has phase_time" is not really a necessary
integration test. In addition to this, it is flaky due to the asynchronous nature of
ILM metadata population
Introduce Histogram grouping function for bucketing/grouping data based
on a given range. Both date and numeric histograms are supported using
the appropriate range declaration (numbers vs intervals).
SELECT HISTOGRAM(number, 50) AS h FROM index GROUP BY h
SELECT HISTOGRAM(date, INTERVAL 1 YEAR) AS h FROM index GROUP BY h
In addition add multiply operator for Intervals
Add docs for intervals and histogram
Fix#36509
Add CURRENT_TIMESTAMP as keyword as well function alongside NOW()
These return the current date/time for the given query, computed when
the statement reaches the server. For completeness, CURRENT_TIMESTAMP
also accepts precision as an optional parameter.
Fix#36534
Add missing `formatTemplate()` for conditional functions which
resulted in incomplete painless script. Moreover the specific
return type of Object in the painless signatures resulted in
casting exceptions when conditional functions are used in the
ORDER BY.
Fixes: #36631
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
* Parallel restore from two different snapshots
* Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
* Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
The file structure finder has timeout functionality,
but prior to this change it would not interrupt a
single long-running Grok match attempt.
This commit hooks into the ThreadWatchdog facility
provided by the Grok library to interrupt individual
Grok matches that may be running at the time the
file structure finder timeout expires.
testFailLeaderReplicaShard periodically fails because we concurrently
index to the leader group and close one of its replicas. If a
replication request hits a closing shard, we will fail that shard;
however, failing a shard is supported by the test framework - this makes
the test fail.
Previously, Math.floorMod was used for integers and longs
which has different logic for negative numbers. Also, the
priority of data types check was wrong as if one of the args
is double the evaluation should be with doubles, then for floats,
then longs and finally integers.
Fixes: #36364
This commit adds deprecation warnings when using format specifiers with
joda data formats that will change with java time. It also adds the "8"
prefix which may be used to force the new java time format parsing.
The getters and setters for useDisMax() have been deprecated since at least 6.0,
also there hasn't been any reference to the query parameter in the
documentation. Removing it from the builder and tests and replacing it with
`tieBreaker(1.0f)` where necessary.
The commit changes how indices are closed in the MetaDataIndexStateService.
It now uses a 3 steps process where writes are blocked on indices to be closed,
then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction
added in #36249, and finally indices states are moved to CLOSE and their routing
tables removed.
The closing process also takes care of using the pre-7.0 way to close indices if the
cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices.
Related to #33888
* add read_ilm cluster privilege
Although managing ILM policies is best done using the
"manage" cluster privilege, it is useful to have read-only
views.
* adds `read_ilm` cluster privilege for viewing policies and status
* adds Explain API to the `view_index_metadata` index privilege
* add manage_ilm privileges
This commit add support to engine operations for resolving and verifying the sequence number and
primary term of the last modification to a document before performing an operation. This is
infrastructure to move our (optimistic concurrency control)[http://en.wikipedia.org/wiki/Optimistic_concurrency_control] API to use sequence numbers instead of internal versioning.
Relates #36148
Relates #10708
There are certain BootstrapCheck checks that may need access environment-specific
values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment
via a constructor to bypass the shortcoming in BootstrapContext. This commit
pulls in the node's environment into BootstrapContext.
Another case is found in #36519, where it is useful to check the state of the
data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on
the environment to retrieve references to things like node data paths.
This means that the BootstrapContext will have the same Settings used in the
Environment, which currently differs from the Node's settings.
The error message used when attempting to delete a lifecycle policy that
is in use previously only included one index which was using the policy.
It now includes all indices using that policy.
Includes the following:
* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this
has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
Changed AutofollowCoordinator makes use of the wait_for_metadata_version
feature in cluster state API and removed hard coded poll interval.
Originates from #35895
Relates to #33007
The auto follow coordinator keeps track of the UUIDs of indices that it has followed. The index UUID strings need to be cleaned up in the case that these indices are removed in the remote cluster.
Relates to #33007
This commit moves the MergedDateFormatter to a package private class and
reworks joda DateFormatter instances to use that instead of a single
DateTimeFormatter with multiple parsers. This will allow the java and
joda multi formats to share the same format parsing method in a
followup.
Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`.
When we cleanup the rollup in a cluster containing 6.x nodes we need to
use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't
know about `/_rollup`. In those cases we must ignore the deprecation
warnings that the 7.0 node will return for the end point.
Closes#36044
* Renamed DAY_OF_WEEK and WEEK_OF_YEAR functions to their ISO version and
added the same functions with different functionality.
* Rewritten the datetime functions documentation to follow the format of the other
functions documentation pages.
This commit modifies BigArrays to take a circuit breaker name and
the circuit breaking service. The default instance of BigArrays that
is passed around everywhere always uses the request breaker. At the
network level, we want to be using the inflight request breaker. So this
change will allow that.
Additionally, as this change moves away from a single instance of
BigArrays, the class is modified to not be a Releasable anymore.
Releasing big arrays was always dispatching to the PageCacheRecycler,
so this change makes the PageCacheRecycler the class that needs to be
managed and torn-down.
Finally, this commit closes#31435 be making the serialization of
transport messages use the inflight request breaker. With this change,
we no longer push the global BigArrays instnace to the network level.
1. CCR tests work without any changes
2. `testDanglingIndices` require changes the source code (added TODO).
3. `testIndexDeletionWhenNodeRejoins` because it's using just two
nodes, adding the node to exclusions is needed on restart.
4. `testCorruptTranslogTruncationOfReplica` starts dedicated master
one, because otherwise, the cluster does not form, if nodes are stopped
and one node is started back.
5. `testResolvePath` needs TEST cluster, because all nodes are stopped
at the end of the test and it's not possible to perform checks needed
by SUITE cluster.
6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2
retries snapshot creation as soon as network partition heals. This
results into the race between creating snapshot and test cleanup logic
(deleting index). Zen1 on the
other hand, also schedules retry, but it takes some time after network
partition heals, so cleanup logic executes latter and test passes. The
check that snapshot is eventually created is added to
the end of the test.
Adds a setting that indicates that an index is done indexing, set by ILM
when the Rollover action completes. This indicates that the Rollover
action should be skipped in any future invocations, as long as the index
is no longer the write index for its alias.
This enables 1) an index with a policy that involves the Rollover action
to have the policy removed and switched to another one without use of
the move-to-step API, and 2) integrations with Beats and CCR.
* This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.
- REST API docs
- HLRC docs and doc tests
- Handle REST actions with deprecation warnings
- Changed endpoints in rest-api-spec and relevant file names
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit converts the watcher execution context to use the joda
compat java time objects. It also again removes the joda methods from
the painless whitelist.
* Add non-X-Pack centric rollup endpoints
This commit adds new endpoints for rollup that do not have _xpack in
their path. The purpose for this change is to take these endpoints into
6.x as well so that they can be available in mixed cluster tests too. A
follow-up change will deprecate the use of _xpack in the rollup
endpoints. And finally, in the future, we would remove the _xpack
endpoints.
* Remove import
* Fix typo
Today the `GetDiscoveredNodesAction` waits, possibly indefinitely, to discover
enough nodes to bootstrap the cluster. However it is possible that the cluster
forms before a node has discovered the expected collection of nodes, in which
case the action will wait indefinitely despite the fact that it is no longer
required.
This commit changes the behaviour so that the action fails once a node receives
a cluster state with a nonempty configuration, indicating that the cluster has
been successfully bootstrapped and therefore the `GetDiscoveredNodesAction`
need wait no longer.
Relates #36380 and #36381; reverts 558f4ec278.
Previously, we used a CamelCase to CAMEL_CASE transformation to get the
primary name of a function from its class name which led to some issues
since there are functions that we don't want to be registered this way
(e.g.: IFNULL). To simplify the logic and avoid and "magic"
transformations in the FunctionRegistry a primary name must be provided
explicitely for each function.
The same change is applied for the function resolution (when a function
is used in an SQL statement). There is no CamelCase to CAMEL_CASE
transformation but only upper-casing is applied (fuNcTiOn -> FUNCTION).
This commit creates JodaDateFormatter to replace
FormatDateTimeFormatter. It converts all uses of the old class
to DateFormatter to allow a future change to use JavaDateFormatter
when appropriate.
Renamed the follow qa modules:
`multi-cluster-downgraded-to-basic-license` to `downgraded-to-basic-license`
`multi-cluster-with-non-compliant-license` to `non-compliant-license`
`multi-cluster-with-security` to `security`
Moved the `chain` module into the `multi-cluster` module and
changed the `multi-cluster` to start 3 clusters.
Followup from #36031
This commit makes FormatDateTimeFormatter and DateFormatter apis close
to each other, so that the former can be removed in favor of the latter.
This PR does not change the uses of FormatDateTimeFormatter yet, so that
that future change can be purely mechanical.
This commit gets rid of the 'NONE' and 'INFO' severity levels for
deprecation issues.
'NONE' is unused and does not make much sense as a severity level.
'INFO' can be separated into two categories: Either 1) we can
definitively tell there will be a problem with the cluster/node/index
configuration that can be resolved prior to upgrade, in which case
the issue should be a WARNING, or 2) we can't, because any issues would
be at the application level, for which the user should review the
deprecation logs and/or response headers.
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.
ML jobs and datafeeds wrap collections into their unmodifiable
equivalents in their constructor. However, the copying builder
does not make a copy of some of those collections resulting
in wrapping those again and again. This can eventually result
to stack overflow.
This commit addressed this issue by copying the collections in
question in the copying builder constructor.
Closes#36360
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with Lucene index. However,
we reset LocalCheckpoinTracker in testDedupByPrimaryTerm cause this
assertion to be violated.
This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.
Relates #34474
This test tries to compare the CB stats from an InternalEngine
and a FrozenEngine but is subject to segement merges that might finish
and get committed after we read the breaker stats. This can cause
occational test failures.
Closes#36207
The results iterator is consuming and closing the results stream
once it is done. It seems this should not be the responsibility
of the results iterator. It stops the iterator from being reusable
for different processes where closing the stream is not desirable.
This commit is moving the consuming and closing of the results stream
into the autodetect result processor.
Includes:
LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates
Relates #36286
This is related to #27260. In Elasticsearch all of the messages that we
serialize to write to the network are composed of heap bytes. When you
read or write to a nio socket in java, the heap memory you passed down
must be copied to/from direct memory. The JVM internally does some
buffering of the direct memory, however it is essentially unbounded.
This commit introduces a simple mechanism of buffering and copying the
memory in transport-nio. Each network event loop is given a 64kb
DirectByteBuffer. When we go to read we use this buffer and copy the
data after the read. Additionally, when we go to write, we copy the data
to the direct memory before calling write. 64KB is chosen as this is the
default receive buffer size we use for transport-netty4
(NETTY_RECEIVE_PREDICTOR_SIZE).
Since we only have one buffer per thread, we could afford larger.
However, if we the buffer is large and not all of the data is flushed in
a write call, we will do excess copies. This is something we can
explore in the future.
This commit moves back to use explicit dependsOn for test tasks on
check. Not all tasks extending RandomizedTestingTask should be run by
check directly.
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
We have a few places where we register license state listeners on
transient components (i.e., resources that can be open and closed during
the lifecycle of the server). In one case (the opt-out query cache) we
were never removing the registered listener, effectively a terrible
memory leak. In another case, we were not un-registered the listener
that we registered, since we were not referencing the same instance of
Runnable. This commit does two things:
- introduces a marker interface LicenseStateListener so that it is
easier to identify these listeners in the codebase and avoid classes
that need to register a license state listener from having to
implement Runnable which carries a different semantic meaning than
we want here
- fixes the two places where we are currently leaking license state
listeners
This commit hides ClusterStates that have a STATE_NOT_RECOVERED_BLOCK from
ClusterStateAppliers. This is needed, because some appliers, such as IngestService, rely on
the fact, that cluster states with STATE_NOT_RECOVERED_BLOCK won't contain anything useful.
Once the state is recovered it's fully available for the appliers. This commit also switches many of
the remaining tests that require state persistence/recovery from Zen1 to Zen2.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
This is a follow-up to #36086. It renames the internal repository
actions to be prefixed by "internal". This allows the system user to
execute the actions.
Additionally, this PR stops casting Client to NodeClient. The client we
have is a NodeClient so executing the actions will be local.
and replaced poll interval setting with a hardcoded poll interval.
The hard coded interval will be removed in a follow up change to make
use of cluster state API's wait_for_metatdata_version.
Before the auto following was bootstrapped from thread pool scheduler,
but now auto followers for new remote clusters are bootstrapped when
a new cluster state is published.
Originates from #35895
Relates to #33007
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
The current response format is:
```
{
"pattern1": {
...
},
"pattern2": {
...
}
}
```
The new format is:
```
{
"patterns": [
{
"name": "pattern1",
"pattern": {
...
}
},
{
"name": "pattern2",
"pattern": {
...
}
}
]
}
```
This format is more structured and more friendly for parsing and generating specs.
This is a breaking change, but it is better to do this now while ccr
is still a beta feature than later.
Follow up from #36049
Made credentials mandatory for xpack migrate tool.
Closes#29847.
The x-pack user and roles APIs aren't available unless security is enabled, so the tool should always be called with the -u and -p options specified.
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
This commit makes `document`, `update`, `explain`, `termvectors` and `mapping`
typeless APIs work on indices that have a type whose name is not `_doc`.
Unfortunately, this needs to be a bit of a hack since I didn't want calls with
random type names to see documents with the type name that the user had chosen
upon type creation.
The `explain` and `termvectors` do not support being called without a type for
now so the test is just using `_doc` as a type for now, we will need to fix
tests later but this shouldn't require further changes server-side since passing
`_doc` as a type name is what typeless APIs do internally anyway.
Relates #35190
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions
eg. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.
Closes#33826
AutoFollowCoordinator should take into account that after auto following
an index and while updating that a leader index has been followed, that
the auto follow pattern may have been removed via delete auto follow patterns
api.
Also fixed a bug that when a remote cluster connection has been removed,
the auto follow coordinator does not die when it tries get a remote client for
that cluster.
Closes#35480
The rest interface for remove-policy-from-index API does not support
`_ilm/remove`, it requires that an `{index}` pattern be defined in
the URL path. This fixes the rest-api-spec to reflect the implementation
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
* Add deprecation for /_xpack/monitoring/_bulk in favor of /_monitoring/bulk
* Removed xpack from the rest-api-spec and tests
* Removed xpack from the Action name
* Removed MonitoringRestHandler as an unnecessary abstraction
* Minor corrections to comments
Relates #35958
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
Introduces a debug log message when a bind fails and a trace message
when a bind succeeds.
It may seem strange to only debug a bind failure, but failures of this
nature are relatively common in some realm configurations (e.g. LDAP
realm with multiple user templates, or additional realms configured
after an LDAP realm).
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
This change adds the support for rest_total_hits_as_int
in the watcher search inputs. Setting this parameter in the request
will transform the search response to contain the total hits as
a number (instead of an object).
Note that this parameter is currently a noop since #35849 is not
merged.
Closes#36008
This commit upgrades netty. This will close#35360. Netty started
throwing an IllegalArgumentException if a CompositeByteBuf is
created with < 2 components. Netty4Utils was updated to reflect this
change.
Some tests kill nodes and otherwise it would take 60s by default
for replicas to get allocated and that is longer than we wait
for getting in a green state in tests.
Relates to #35403
The NotificationService (base class for SlackService, HipchatService ...) has both dynamic
cluster settings and SecureSettings and builds the clients (Account) that are used to comm
with external services. This commit fixes an important bug about updating/reloading any
of these settings (both Secure and dynamic cluster). Briefly the bug is due to the fact that
both the secure settings as well as the dynamic node scoped ones can be updated
independently, but when constructing the clients some of the settings might not be visible.
In the current implementation, there is a time between the
ShrinkStep and the ShrinkSetAliasStep that both the source and
target indices will be present with the same aliases. This means
that queries to during this time will query both and return
duplicates. This fixes that scenario by moving the alias inheritance
to the same aliases update request as is done in ShrinkSetAliasStep
This commit improves the efficiency of exact index name matching by
separating exact matches from those that include wildcards or regular
expressions. Internally, exact matching is done using a HashSet instead
of adding the exact matches to the automata. For the wildcard and
regular expression matches, the underlying implementation has not
changed.
The support for rest_total_hits_as_int has already been merged to 6x
in #35848 so this change adds this new option to master. The plan was
to add this new option as part of #35848 but we've decided to wait a few
days before merging this breaking change so this commit just handles
the new option as a noop exactly like 6x for now. This will allow
users to migrate to this parameter before #35848 is merged.
Relates #33028
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the ConnectionProfile.
The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
This commits changes the serialization version from V_7_0_0 to
v_6_6_0 for the authenticate API response now that the work to add
the realm info in the response has been backported to 6.x in
b515ec7c9b9074dfa2f5fd28bac68fd8a482209e
Relates #35648
This change adds an extra check that verifies that all primary shards
have been started of an index that is about to be auto followed.
If not all primary shards have been started for an index
then the next auto follow run will try to follow to auto follow
this index again.
Closes#35480
In #30241 Realm settings were changed, but the Kerberos realm settings
were not registered correctly. This change fixes the registration of
those Kerberos settings.
Also adds a new integration test that ensures every internal realm can
be configured in a test cluster.
Also fixes the QA test for kerberos.
Resolves: #35942
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
When the grouping key of a GROUP BY is a painless script (functions are
involved), the data type of the key was incorrect in certain cases
(Boolean, IP, Date). This resulted in returning wrong data type for this
columns in the query results. E.g.:
```
SELECT COUNT(*), a > 10 AS a FROM t GROUP BY a
```
Fixes: #35662
Move classes under the same package to avoid internal classes being
exposed to the outside. Remove public visibility outside 3 classes:
EsDriver, EsDataSource and EsTypes.
The driver only has one package, namely org.elasticsearch.xpack.sql.jdbc
Use Es prefix for classes to ease name conflict and indicate their
destination
Fix#35437
This change removes the deprecated useDisMax() and useAllFields() methods from
the QueryStringQueryBuilder and related tests. The disMax parameter has already
been a no-op since 6.0 and also the useAllFields has been deprecated since 6.0
and there is a direct replacement via defaultField.
Clients can use the Kerberos V5 security mechanism and when it
used this to establish security context it failed to do so as
Elasticsearch server only accepted Spengo mechanism.
This commit adds support to accept Kerberos V5 credentials
over spnego.
Closes#34763
- Add the authentication realm and lookup realm name and type in the response for the _authenticate API
- The authentication realm is set as the lookup realm too (instead of setting the lookup realm to null or empty ) when no lookup realm is used.
* [Rollup] Add more diagnostic stats to job
To help debug future performance issues, this adds the
min/max/avg/count/total latencies (in milliseconds) for search
and bulk phase. This latency is the total service time including
transfer between nodes, not just the `took` time.
It also adds the count of search/bulk failures encountered during
runtime. This information is also in the log, but a runtime counter
will help expose problems faster
* review cleanup
* Remove dead ParseFields
This changes the exporter code -- most notably the `http` exporter --
to use async operations throughout the resource management and bulk
initialization code (the bulk indexing of monitoring documents was
already async).
As part of this change, this does change one semi-core aspect of the
`HttpResource` class in that it will no longer block all concurrent calls
until the first call completes with
`HttpResource::checkAndPublishIfDirty`.
Now, any parallel attempts to check the resources will be skipped until
the first call completes (success or failure). While this is a technical
change, it has very little practical impact because the existing behavior
was either quick success (then every blocked request processed) or
each request timed out and failed anyway, thus being effectively
skipped (and a burden on the system).
step times were set. The assumption was that these are always set.
Tests passed, which led me to believe this was true. There is a time
when shrunk indices have their step phase/action/step details set,
but with no time information (in the CopyExecutionStateStep).
Explain API fails for these
This commit removes the use of AbstractComponent in xpack where it was
still being extended. It has been replaced with explicit logger
declarations.
See #34488
This commit is related to #32517. It allows an "sni_server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. Prior to this commit, this functionality
was only support for the netty transport. This commit adds this
functionality to the security nio transport.
added validation for complete information of step details.
also changed the rendering of explain responses so null strings are not rendered
Another thing that I changed is the format of the client-side response. I found it difficult to maintain the two subtly-different objects, so I migrated the usage of long for the fields, to Long (just as it is on the server-side).
The trigger engine did always create a new schedule data structure, when
the watcher indexing listener called an add. However the indexing
listener also called add, when the watch status was updated. This means,
that upon a watch status update the watch got retriggered, potentially
waiting a defined interval from the watch status update onwards, instead
of waiting from the last run.
This commit only updates the schedule in the trigger engine, if it
actually has changed, otherwise the existing schedule will not be
touched. This has two results
1. If a watch is updated by an execution, the existing interval will not
be touched (meaning the scheduled time will not move forward).
2. If a watch is updated by a user, but the schedule is not changed, it
will not be reset from the update (for example starting to count from 5
minutes again, if the interval was set to 5 minutes).
Furthermore some minor cleanups were applied, making variables final in
the ctor, preventing double creation of variables.
`SIGN` and `RADIANS` where wrongly overriding `mathFunction()`.
Converted `mathFunction()` to private in `MathFunction` since it
shouldn't be overriden, as it uses the assigned `MathOperation`
to get the funtion name for painless scripts.
Fixes: #35654
Add special verifier rule to check that the arguments of conditional
functions are of the same or compatible types. This way the user gets
a descriptive error message with line number and column indicating
where is the offending argument.
Closes: #35907
This commit adds a test for handling correctly all they possible
`SamlPrepareAuthenticationRequest` parameter combinations that
we might get from Kibana or a custom web application talking to the
SAML APIs.
We can match the correct SAML realm based either on the realm name
or the ACS URL. If both are included in the request then both need to
match the realm configuration.
This generates a synthesized "id" for each incoming request that is
included in the audit logs (file only).
This id can be used to correlate events for the same request (e.g.
authentication success with access granted).
This request.id is specific to the audit logs and is not used for any
other purpose
The request.id is consistent across nodes if a single request requires
execution on multiple nodes (e.g. search acros multiple shards).
When assertions are enabled, a Put User action that have no effect (a
noop update) would trigger an assertion failure and shutdown the node.
This change accepts "noop" as an update result, and adds more
diagnostics to the assertion failure message.
This commit adds back bundling of all deps of the sql jdbc jar. This was
lost in a refactoring of how the shadow plugin is handled for the entire
elasticsearch project.
This removes the option to run a cluster without enforcing the
cluster-wide shard limit, making strict enforcement the default and only
behavior. The limit can still be adjusted as desired using the cluster
settings API.
Add GREATEST(expr1, expr2, ... exprN) and LEAST(expr1, expr2, exprN)
functions which are in the family of CONDITIONAL functions.
Implementation follows PostgreSQL behaviour, so the functions return
`NULL` when all of their arguments evaluate to `NULL`.
Renamed `CoalescePipe` and `CoalesceProcessor` to `ConditionalPipe` and
`ConditionalProcessor` respectively, to be able to reuse them for
`Greatest` and `Least` evaluations. To achieve that `ConditionalOperation`
has been added to differentiate between the functionalities at execution
time.
Closes: #35878
Due to some unresolvable type conflict between the expected definition
in JDBC vs ODBC, the driver mode is now passed over so that certain
command can change their results accordingly (in this case SYS COLUMNS)
Fix#35376
This operator handles nulls in different way than the normal `=`.
If one of the operants is `null` and the other not it returns `false`.
If both operants are `null` it returns `true`. Therefore in contrary to
`=`, which returns `null` if at least one of the operants is `null`, this one
never returns `null` as a result.
Closes: #35871
Code that operates on-top of the engine requires all readers returned to be
unwrapped into ElasticsearchDirectoryReader. The special reader
the FrozenEngine uses wasn't wrapped.
We didn't check that the ExplainLifecycleRequest was constructed with at least
one index before, now that we do we must also make sure the tests
mutateInstance() method used in equals/hashCode checks doesn't accidentally
create an empty index array.
Closes#35822
When there is no persistent tasks metadata we could hit a null pointer
exception when executing a follower stats request. This is because we
inspect the persistent tasks metadata. Yet, if no tasks have been
registered, this is null (as opposed to empty). We need to avoid
de-referencing the persistent tasks metadata in this case. That is what
this commit does, and we add a test for this situation.
By setting the cron to 2017, we ensure it won't trigger. This makes it
easier to test because we know the job will _only_ be in STARTED,
and we can ignore INDEXING states due to transient triggers.
Closes#35779
Today we have a way to atomically persist global MetaData and
IndexMetaData to disk when new ClusterState is received. All other
ClusterState fields are not persisted.
However, there are other parts of ClusterState that should be
persisted, namely:
version
term
lastCommittedConfiguration
lastAcceptedConfiguration
votingTombstones
version is changed frequently, other fields are not. We decided
to group term, lastCommittedConfiguration,
lastAcceptedConfiguration and votingTombstones into
CoordinationMetaData class and make CoordinationMetaData a field
inside MetaData.
MetaData.toXContent and MetaData.fromXContent should take care of
CoordinationMetaData.
version stays as a top level field in ClusterState and will be
persisted as part of Manifest in a follow-up commit.
Also MetaData.isGlobalStateEquals should be extended to include
coordinationMetaData in comparison.
This commit favors exposing getters, such as getTerm directly in
ClusterState to avoid massive code changes.
An example of CoordinationMetaState.toXContent:
{
"term": 1,
"last_committed_config": [
"TiIuBcbBtpuXyDDVHXeD",
"ZIAoVbkjjLPLUuYLaTkw"
],
"last_accepted_config": [
"OwkXbXZNOZPJqccdFHdz",
"LouzsGYwmQzpeQMrboZe",
"fCKGRZdjLTqzXAqPUtGL",
"pLoxshjpJXwDhbgjfYJy",
"SjINLwFIlIEFZCbjrSFo",
"MDkVncJEVyZLJktopWje"
]
}
Move away from performing eager, fail-fast validation of mismatched
mapping to a lazy evaluation based on the fields actually used in the
query. This allows queries to run on the parts of the indices that
"work" which is not just convenient but also a necessity for large
mappings (like logging) where alignment is hard/impossible to achieve.
Fix#35659
Creates the manage_token cluster privilege and adds it to the
kibana_system role. This is required if kibana were to use the token
service for its authenticator process.
Because kibana_system already has manage_saml this effectively
only adds the privilege to create tokens.
Introduce INTERVAL as a DataType
Add INTERVAL to the grammar which supports the standard SQL declaration
(without precision):
> INTERVAL '1 23:45:01.123456789' DAY TO SECOND
but also number for single unit intervals:
> INTERVAL 1 YEAR
as well as the plurals of the units:
> INTERVAL 2 YEARS
Interval are internally supported as just another Literal being backed
by java.time.Period and java.time.Duration
Move JDBC away from JDBCType enum to SQLType interface
Refactor DataType by moving it into server core and adding dedicated (and
much simpler) JDBC driver type
Improve internal JDBC conversion by normalizing on the DataType
Rename JDBC columnInfo to JdbcColumnInfo to differentiate between it and
the SQL ColumnInfo
Fix#29990
This commit removes the parsing code from the PutLicenseResponse server variant, and the toXContent portion from the corresponding client variant.
Relates to #35547
This commit removes the parsing code from the PostStartBasicResponse server variant. It also makes the server response implement StatusToXContent which allows us to save a couple of lines of code in the corredponding REST action.
Relates to #35547
The RestHasPrivilegesAction previously handled its own XContent
generation. This change moves that into HasPrivilegesResponse and
makes the response implement ToXContent.
This allows HasPrivilegesResponseTests to be used to test
compatibility between HLRC and X-Pack internals.
A serialization bug (cluster privs) was also fixed here.
* The port assigned to all loopback interfaces doesn't necessarily have to be the same for ipv4 and ipv6
=> use actual address from profile instead of just port + loopback in test
* Closes#35584
This parameter in the `query_string` query was deprecated in 6.0 and ignored
since then. Its API methods and remaining uses can be removed in the upcoming
major version.
Relates to #35734
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.
Relates to #34352
Currently there is a common NPE in the IndexFollowingIT that does not
indicate the test failing. This is when a cluster state listener is
called and certain index metadata is not yet available.
This commit checks that the metadata is not null before performing the
logic that depends on the metadata.
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
* ML: Removing result_finalization_window && overlapping_buckets
* Reverting bad method deletions
* Setting to current before backport to try and get a green build
* fixing testBuildAutodetectCommand test
* disabling bwc tests for backport
Fix bug in Analyzer that caused it to report unsupported fields only
when declared in projections. The rule has been extended to all field
declarations.
Fix#35673
RolloverStep previously had a name of "attempt_rollover", which was
inconsistent with all other step names due it its use of an underscore
instead of a dash.
This commit switches from using java util's default timezone method to
using joda. The former can cause problems when the string representation
of the timezone is unknown to joda.
closes#35518
Removed extending of AbstractComponent and changed logger usage to
explicit declaration. Abstract classes still have logger
declaration using this.getClass() in order to show implementation class
name in its logs.
See #34488
RolloverAction will now periodically check the rollover conditions using
the Rollover API with the dry_run option as an AsyncWaitStep, then run
the rollover itself by calling the Rollover API with no conditions,
which will always roll over, as an AsyncActionStep. This will resolve
race condition issues in policies using RolloverAction.
Today, the bootstrapping of a Zen2 cluster is driven externally, requiring
something else to wait for discovery to converge and then to inject the initial
configuration. This is hard to use in some situations, such as REST tests.
This change introduces the `ClusterBootstrapService` which brings the bootstrap
retry logic within each node and allows it to be controlled via an (unsafe)
node setting.
* ML: Adding missing datacheck to datafeedjob
* Adding client side and docs
* Making adjustments to validations
* Making values default to on, having more sensible limits
* Intermittent commit, still need to figure out interval
* Adjusting delayed data check interval
* updating docs
* Making parameter Boolean, so it is nullable
* bumping bwc to 7 before backport
* changing to version current
* moving delayed data check config its own object
* Separation of duties for delayed data detection
* fixing checkstyles
* fixing checkstyles
* Adjusting default behavior so that null windows are allowed
* Mentioning the default value
* Fixing comments, syncing up validations
In the event that the target index does not exist when `CopyExecutionStateStep`
executes, this avoids a `NullPointerException` and provides a more helpful error
to the ILM user.
Resolves#35567
Kibana now uses the tasks API to manage automatic reindexing of the
.kibana index during upgrades.
The implementation of the tasks API requires that
1. the user executing the task can create & write to the ".tasks" index
2. the user checking on the status of the task can read (Get) the
relevant document from the ".tasks" index
Response classes in Elasticsearch (and xpack) only need to implement ToXContent, which is needed to print their output put in the REST layer and return the response in json (or others) format. On the other hand, response classes that are added to the high-level REST client, need to do the opposite: parse xcontent and create a new object based on that.
This commit removes the parsing code from the XPackInfoResponse server variant, and the toXContent portion from the corresponding client variant. It also removes a client specific test class that looks redundant now that we have a single test class for both classes.
This pull request replaces some blocks of code that must be run once
and that are currently based on AtomicBoolean by the convient RunOnce
class added in #35489.
The DefaultAuthenticationFailureHandler has a deprecated constructor
that was present to prevent a breaking change to custom realm plugin
authors in 6.x. This commit removes the constructor and its uses.
For some time, the PutUser REST API has supported storing a pre-hashed
password for a user. The change adds validation and tests around that
feature so that it can be documented & officially supported.
It also prevents the request from containing both a "password" and a "password_hash".
This adds a `wait_for_completion` flag which allows the user to block
the Stop API until the task has actually moved to a stopped state,
instead of returning immediately. If the flag is set, a `timeout` parameter
can be specified to determine how long (at max) to block the API
call. If unspecified, the timeout is 30s.
If the timeout is exceeded before the job moves to STOPPED, a
timeout exception is thrown. Note: this is just signifying that the API
call itself timed out. The job will remain in STOPPING and evenutally
flip over to STOPPED in the background.
If the user asks the API to block, we move over the the generic
threadpool so that we don't hold up a networking thread.
This change adds a special caching reader that caches all relevant
values for a range query to rewrite correctly in a can_match phase
without actually opening the underlying directory reader. This
allows frozen indices to be filtered with can_match and in-turn
searched with wildcards in a efficient way since it allows us to
exclude shards that won't match based on their date-ranges without
opening their directory readers.
Relates to #34352
Depends on #34357
The NPE would occur if should_trim_field was overridden to
true and any field value was completely blank. This change
defends against this situation.
Fixes#35462
Before, moving to a failed step would only change the step info
to be that of the failed step. This means two things.
1. Async Steps would never be triggered to execute
2. If there are inherent problems with the action definition that can
be fixed with a policy update, these changes were not being reflected
by the new execution info.
Changes now
1. Async steps are executed after the move to the failed step in cluster state
2. the lifecycle execution info's phase definition is updated from the current
latest policy definition, even though the index isn't moving to a new phase.
Closes#35397.
avoid the assertions that check the log files, because that does not work on Windows.
The rest of the test is still useful and should work on Windows CI.
Currently on Windows CI this qa module fails because there is just one test and
that test si ignored if OS is Windows.
This change adds a high level freeze API that allows to mark an
index as frozen and vice versa. Indices must be closed in order to
become frozen and an open but frozen index must be closed to be
defrosted. This change also adds a index.frozen setting to
mark frozen indices and integrates the frozen engine with the
SearchOperationListener that resets and releases the directory
reader after and before search phases.
Relates to #34352
Depends on #34357
Validate remote cluster license as part of put auto follow pattern api call
in addition of validation that when auto follow coordinator starts auto
following indices in the leader cluster.
Also added qa module that tests what happens to ccr after downgrading to basic license.
Existing active follow indices should remain to follow,
but the auto follow feature should not pickup new leader indices.
Add `IsNull` node in parser to simplify expressions so that `<value> IS NULL` is
no longer translated internally to `NOT(<value> IS NOT NULL)`
Replace `IsNotNullProcessor` with `CheckNullProcessor` to encapsulate both
isNull and isNotNull functionality.
Closes: #34876Fixes: #35171
Many realm tests were written to use separate setting objects for
"global settings" and "realm settings".
Since #30241 there is no distinction between these settings, so these
tests can be cleaned up to use a single Settings object.
Change `nullable()` logic of AND and OR to false since in the Optimizer
we cannot fold to null as we might be handling and expression in the
SELECT clause.
Introduce folding of null for AND and OR expressions in PruneFilter()
since we now know that we are in HAVING or WHERE clause and we
can fold `null` to `false`
Fixes: #35088
Co-authored-by: Costin Leau <costin.leau@gmail.com>
This is related to #34483. It introduces a namespaced setting for
compression that allows users to configure compression on a per remote
cluster basis. The transport.tcp.compress remains as a fallback
setting. If transport.tcp.compress is set to true, then all requests
and responses are compressed. If it is set to false, only requests to
clusters based on the cluster.remote.cluster_name.transport.compress
setting are compressed. However, after this change regardless of any
local settings, responses will be compressed if the request that is
received was compressed.
Today our OS information returned in node stats only returns a
high-level name of the OS (e.g., "Linux"). Yet, for some uses this is
too high-level and knowing at a finer level of granularity the
underlying OS can be useful. This commit extracts the pretty name on
Linux from /etc/os-release. This pretty name usually includes the Linux
vendor and the Linux vendor version number (e.g., Fedora 28).
- Introduces a transport API for bootstrapping a Zen2 cluster
- Introduces a transport API for requesting the set of nodes that a
master-eligible node has discovered and for waiting until this comprises the
expected number of nodes.
- Alters ESIntegTestCase to use these APIs when forming a cluster, rather than
injecting the initial configuration directly.
Adjust list of dynamic index settings that should be replicated
and added a test that verifies whether builtin dynamic index settings
are classified as replicated or non replicated (whitelisted).
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
There is no longer a concept of non-global "realm settings". All realm
settings should be loaded from the node's settings using standard
Setting classes.
This change renames the "globalSettings" field and method to simply be
"settings".
The file realm has not supported custom filenames/locations since at
least 5.0, but this test still tried to configure them.
Remove all configuration of file locations, and cleaned up a few other
warnings and deprecations
Ensure that Watcher is correctly started and stopped between tests for
SmokeTestWatcherWithSecurityIT,
SmokeTestWatcherWithSecurityClientYamlTestSuiteIT,
SmokeTestWatcherTestSuiteIT, WatcherRestIT,
XDocsClientYamlTestSuiteIT, and XPackRestIT
The change here is to throw an `AssertionError` instead of `break;` to
allow the `assertBusy()` to continue to busy wait until the desired
state is reached.
closes#33291, closes#29877, closes#34462, closes#30705, closes#34448
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.
Resolves#35321
* [ILM] Check shard and relocation status in AllocationRoutedStep
This is a follow-up from #35161 where we now check for started and relocating
state in `AllocationRoutedStep`.
Resolves#35258
This change adds a `frozen` engine that allows lazily open a directory reader
on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader
that also allows to release and reset the underlying index readers after any and before
secondary search phases.
Relates to #34352
An auto follow pattern:
* cannot start with `_`
* cannot contain a `,`
* can be encoded in UTF-8
* the length of UTF-8 encoded bytes is no longer than 255 bytes
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
Grammar's identifiers can be completely skipped from counting depths
as they just add another level to the tree and they are always children
of some other expression which gets counted.
Increased maximum depth from 100 to 200. After testing on production
configuration with -Xss1m, depths of at least 250 can be used, so being
conservative we put the limit lower.
Fixes: #35299
The elasticsearch-croneval CLI tool uses local dates to display when
something gets triggered the next time. This is very confusing.
This commit ensures, that UTC and local timezone times will be written
out.
The output looks like this and contains localized dates for each trigger
date as well as for `now`.
Now is [Tue, 28 Aug 2018 17:23:51 +0000] in UTC, local time is [ᏔᎵᏁ, 28 ᎦᎶ 2018 12:23:51 -0500]
Here are the next 10 times this cron expression will trigger:
1. Mon, 2 Jan 2040 11:00:00 +0000
ᏉᏅᎯ, 2 ᎤᏃ 2040 06:00:00 -0500
2. ...
This also removes an old outstanding TODO to use the jopt parsing to
cast the count to an integer instead of doing it ourselves.
In order to start shard follow tasks, the resume follow api already
needs execute N requests to the elected master node.
The pause follow API is also a master node action, which would make
how both APIs execute more consistent.
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
The remove-ilm-from-index API was using the DELETE http method
to signify that something is being removed. Although, metadata
about ILM for the index is being deleted, no entity/resource
is being deleted during this operation. POST is more in line with
what this API is actually doing, it is modifying the metadata for
an index. As part of this change, `remove` is also appended to the path
to be more explicit about its actions.
Error was thrown if leader index had no soft deletes enabled, but it then continued creating the follower index.
The test caught this bug, but very rarely due to timing issue.
Build failure instance:
```
1> [2018-11-05T20:29:38,597][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] before test
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,599][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [[]] to [["127.0.0.1:9300"]]
1> [2018-11-05T20:29:38,609][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s_0] [leader-index] creating index, cause [api], templates [random-soft-deletes-templat
e, one_shard_index_template], shards [2]/[0], mappings []
1> [2018-11-05T20:29:38,628][INFO ][o.e.c.r.a.AllocationService] [node_s_0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[leader-
index][0]] ...]).
1> [2018-11-05T20:29:38,660][INFO ][o.e.x.c.a.TransportPutFollowAction] [node_s_0] [follower-index] creating index, cause [ccr_create_and_follow], shards [2]/[0]
1> [2018-11-05T20:29:38,675][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,676][INFO ][o.e.c.s.ClusterSettings ] [node_s_0] updating [cluster.remote.local.seeds] from [["127.0.0.1:9300"]] to [[]]
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.x.c.LocalIndexFollowingIT] [testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes] [LocalIndexFollowingIT#testDoNotCreateFoll
owerIfLeaderDoesNotHaveSoftDeletes]: cleaning up after test
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [follower-index/TlWlXp0JSVasju2Kr_hksQ] deleting index
1> [2018-11-05T20:29:38,678][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s_0] [leader-index/FQ6EwIWcRAKD8qvOg2eS8g] deleting index
FAILURE 0.23s J0 | LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes <<< FAILURES!
> Throwable #1: java.lang.AssertionError:
> Expected: <false>
> but: was <true>
> at __randomizedtesting.SeedInfo.seed([7A3C89DA3BCA17DD:65C26CBF6FEF0B39]:0)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.elasticsearch.xpack.ccr.LocalIndexFollowingIT.testDoNotCreateFollowerIfLeaderDoesNotHaveSoftDeletes(LocalIndexFollowingIT.java:83)
> at java.lang.Thread.run(Thread.java:748)
```
Build failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.5+intake/46/console
Override `process()` in `BinaryLogicProcessor` which doesn't immediately
return null if left or right argument is null, which is the behaviour of
`process()` of the parent class `BinaryProcessor`.
Also, add more tests for `AND` and `OR` in SELECT clause with literal.
Fixes: #35240
Today if a wildcard, date-math expression or alias expands/resolves
to an index that is search-throttled we still search it. This is likely
not the desired behavior since it can unexpectedly slow down searches
significantly.
This change adds a new indices option that allows `search`, `count`
and `msearch` to ignore throttled indices by default. Users can
force expansion to throttled indices by using `ignore_throttled=true`
on the rest request to expand also to throttled indices.
Relates to #34352
This moves all Realm settings to an Affix definition.
However, because different realm types define different settings
(potentially conflicting settings) this requires that the realm type
become part of the setting key.
Thus, we now need to define realm settings as:
xpack.security.authc.realms:
file.file1:
order: 0
native.native1:
order: 1
- This is a breaking change to realm config
- This is also a breaking change to custom security realms (SecurityExtension)
This is a forward port of some of the changes made in #8445, specifically the change mentioned in https://github.com/elastic/elasticsearch/pull/34023#issuecomment-433212636.
Currently, in master, the `cluster_stats` collector collects _all_ cluster metadata and indexes it into `.monitoring-es-*`. However, per the discussion linked to above, we decided to collect _only_ the `display_name` cluster metadata setting for now. This PR makes this change.
Add NotEquals node in parser to simplify expressions so that <value1> != <value2> is
no longer translated internally to NOT(<value1> = <value2>)
Closes: #35210Fixes: #35233
This adds a new step for checking whether an index is allocated correctly based
on the rules added prior to running the shrink step. It also fixes a bug where
for shrink we are not allowed to have the shards relocating for the shrink step.
This also allows us to simplify AllocationRoutedStep and provide better
feedback in the step info for why either the allocation or the shrink checks
have failed.
Resolves#34938
If the Rollover step would fail due to the next index in sequence
already existing, just skip to the next step instead of going to the
Error step.
This prevents spurious `ResourceAlreadyExistsException`s created by
simultaneous RolloverStep executions from causing ILM to error out
unnecessarily.
This commit removes the Joda time usage from ILM and the HLRC components of ILM.
It also fixes an issue where using the `?human=true` flag could have caused the
parser not to work. These millisecond fields now follow the standard we use
elsewhere in the code, with additional fields added iff the `human` flag is
specified.
This is a breaking change for ILM, but since ILM has not yet been released, no
compatibility shim is needed.
The suite FollowerFailOverIT is failing because some documents are not
replicated to the follower. Maybe the FollowTask is not working as
expected or the background indexers eat all resources while the follower
cluster is trying to reform after a failover; then CI is not fast enough
to replicate all the indexed docs within 60 seconds (sometimes I see 80k
docs on the leader).
This commit limits the number of documents to be indexed into the leader
index by the background threads so that we can eliminate the latter
case. This change also replaces a docCount assertion with a docIds
assertion so we can have more information if these tests fail again.
Relates #33337
Previously, testRunStateChangePolicyWithNextStep asserted that the
ClusterState before and after running the steps were equal. The test
only passed due to a race condition: The latch would be triggered by the
step execution, but the cluster state update thread would continue
running before committing the change to the cluster state. This allowed
the test to read the old cluster state and pass the equality check about
99.99% of the time.
The test now waits for the new cluster state to be committed before
checking that it is _not_ equal to the old cluster state.
Include `null` literals when generating the painless script for `IN` expressions.
Previously, they were skipped, because of an issue that had been fixed with #35108.
Fixes: #35122
Only the response classes of get auto follow pattern, the follow and stats APIs
were moved away from Streamable. The other APIs use `AcknowledgedResponse`
or `BaseTasksResponse` as response class and
moving that class away from Streamable is a bigger change.
ILM's Shrink Action was using a nodes "_name" attribute to
allocate to prepare for the shrink step. Since the name is
configurable by a user and may use the same name for
multiple nodes on one machine, _id is safer since it is guaranteed
to be unique.
closes#35043.
* Adding rollup support for datafeeds
* Fixing tests and adjusting formatting
* minor formatting chagne
* fixing some syntax and removing redundancies
* Refactoring and fixing failing test
* Refactoring, adding paranoid null check
* Moving rollup into the aggregation package
* making AggregationToJsonProcessor package private again
* Addressing test failure
* Fixing validations, chunking
* Addressing failing test
* rolling back RollupJobCaps changes
* Adding comment and cleaning up test
* Addressing review comments and test failures
* Moving builder logic into separate methods
* Addressing PR comments, adding test for rollup permissions
* Fixing test failure
* Adding rollup priv check on datafeed put
* Handling missing index when getting caps
* Fixing unused import
* Watcher: fix metric stats names
The current watcher stats metric names doesn't match the current
documentation. This commit fixes the behavior of `queued_watches`
metric, deprecates `pending_watches` metric and adds `current_watches`
to match the documented behavior. It also fixes the documentation, which
introduced `executing_watches` metric that was never added.
Fixes#34865
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of values provided
by the user.
Move In to the same package as InPipe & InProcessor
Follow up to #34750
Co-authored-by: Costin Leau <costin.leau@gmail.com>
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts
files each time a node starts its transport service. This does mean that a
number of nodes can start and perform their first pinging round without any
unicast hosts which, if the timing is unlucky and a lot of nodes are all
started at the same time, can lead to a split brain as in #35052.
Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider
would always have yielded the existing hosts, so the timing would have to have
been implausibly unlucky. Since #33554, however, it's more likely because the
race occurs between the start of the first round of pinging and the writing of
the unicast hosts file. It is realistic that new nodes will be configured with
the existing nodes from startup, so this change reinstates that behaviour.
Closes#35052.
This commit does a few things
- moves ILM-specifc rest yaml tests into plugin/ilm/qa, and creates special
:plugin:ilm:qa:rest module to test them
- removes the with-security tests of the yaml tests since they are covered in
the rest tests now
- moves ChangePolicyforIndexIT into the qa/multi-node project since that test is
not currently running in main ilm since integTest is disabled
The java yaml test runner supports sending request headers, yet not all clients support headers. This commit makes sure that we enforce adding a skip section with feature "headers" whenever headers are used in a do section as part of a test. That decreases the chance for new tests to break client builds due to the missing skip section.
Closes#34650
The ILM Rollover Step can execute on the incorrect index if the rollover alias
exists on another valid index, but not the one the step is executing against. This
is a problem and is now guarded against
This PR renames the CRUD APIS for ILM
GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy
PUT _ilm/<policy> -> _ilm/policy/<policy>
DELETE _ilm/<policy> -> _ilm/policy/<policy>
closes#34929.
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.
This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
The native roles store previously would issue a search if attempting to
retrieve more than a single role. This is fine when we are attempting
to retrieve all of the roles to list them in the api, but could cause
issues when attempting to find roles for a user. The search is not
prioritized over other search requests, so heavy aggregations/searches
or overloaded nodes could cause roles to be cached as missing if the
search is rejected.
When attempting to load specific roles, we know the document id for the
role that we are trying to load, which allows us to use the multi-get
api for loading these roles. This change makes use of the multi-get api
when attempting to load more than one role by name. This api is also
constrained by a threadpool but the tasks in the GET threadpool should
be quicker than searches.
See #33205
Previously, if ClusterStateActionSteps or ClusterStateWaitSteps threw an
exception executing, the exception would only be caught and logged by
the generic ClusterStateUpdateTask machinery and the index would become
stuck on that step.
Now, exceptions thrown in these steps will be caught and the index will
be moved to the Error step.
Move `x-pack/qa/smoke-test-graph-with-security` to
`x-pack/plugin/graph/qa/security` which should make it easier to run all
of the tests with a single command. It also lines up the directories
more closely with newer projects like cross cluster replication.
Moves `x-pack/qa/sql/*` into `x-pack/plugin/sql/qa` to make it simpler
to run all of the sql tests. This lines up with how newer projects like
cross cluster replication are testing themselves.
This changes the RollupSearch endpoint to proactively resolve index
patterns. If the index pattern(s) match more than one rollup index,
an exception is throw as before. But if the pattern only matches one
rollup index, execution is allowed to continue (unlike before where
it would assume all patterns were for raw data).
This also allows the search endpoint to resolve aliases that point to
a rollup index.
Also tweaks the documentation to make this clear.
Closes#34828
Conflicts during the merge:
1. >=140 chars line length fixed for a lot of project files and warnings
for those files are no longer suppressed
2. Node name is removed from AbstractComponent, it’s no longer taken
from settings, but is explicitly passed as constructor argument and
there were quite a few new classes on zen2 branch that require this
change
3. TransportResponseHandler interface changed (new method added) and
Zen2 makes a lot of subclasses in tests
4. Deprecated way of obtaining logger was changed
Only the follow stats request couldn't be changed to use Writeable serialization,
because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.
Sometimes the cluster forming here will split-brain when it grows up to 5
nodes. This could be a timing issue or could be something going wrong in
discovery, so this asks for more logs. Relates #35052
The Move To Step API now checks to see if the target step is an
AsyncActionStep, and if so, runs it.
Previously, AsyncActionSteps would only be run when they are entered by
executing the previous step, so if an AsyncActionStep was entered via
the Move To Step API, ILM would never touch that index again.
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
Extract data type verification for function arguments to a single place
so that NULL type can be treated as RESOLVED for all functions. Moreover
this enables to have consistent verification error messages for all functions.
Fixes: #34752Fixes: #33469
The file structure finder endpoint can find the NDJSON
(newline-delimited JSON) file format, but called it
`json`. This change renames the `format` for this file
structure to `ndjson`, which is more precise and will
hopefully avoid confusion.
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
SSLTrustRestrictionsTests.testRestrictionsAreReloaded checks that the
SSL trust configuration is automatically updated reapplied if the
underlying "trust_restrictions.yml" file is modified.
Since the default resource watcher frequency is 5seconds, it could
take 10 second to run that test (as it waits for 2 reloaded).
Previously this test set that frequency to a very low value (3ms) so
that the elapsed time for the test would be reduced. However this
caused other problems, including that the resource watcher would
frequently run while the cluster was shutting down and files were
being cleaned up.
This change resets that watch frequency back to its default (5s) and
then manually calls the "notifyNow" method on the resource watcher
whenever the restrictions file is modified, so that the SSL trust
configuration is reloaded at exactly the right time.
Resolves: #34502
Drops the `deprecationLogger` from `AbstractComponent`, moving it to
places where we need it. This saves us from building a bunch of
`DeprecationLogger`s that we don't need.
Relates to #34488
`AbstractComponent` is trouble because its name implies that
*everything* should extend from it. It *is* useful, but maybe too
broadly useful. The things it offers access too, the `Settings` instance
for the entire server and a logger are nice to have around, but not
really needed *everywhere*. The `Settings` instance especially adds a
fair bit of ceremony to testing without any value.
This removes the `nodeName` method from `AbstractComponent` so it is
more clear where we actually need the node name.
In order to remove Streamable from the codebase, Response objects need
to be read using the Writeable.Reader interface which this change
enables. This change enables the use of Writeable.Reader by adding the
`Action#getResponseReader` method. The default implementation simply
uses the existing `newResponse` method and the readFrom method. As
responses are migrated to the Writeable.Reader interface, Action
classes can be updated to throw an UnsupportedOperationException when
`newResponse` is called and override the `getResponseReader` method.
Relates #34389
We currently have two different native processes:
autodetect & normalizer. There are plans for introducing
a new process. All these share many things in common.
This commit refactors the processes to extend an
`AbstractNativeProcess` class that encapsulates those
commonalities with the purpose of reusing the code
for new processes in the future.
Index shard stats for the follower shard are fetched, when a shard follow task is started.
This is needed in order to bootstap the shard follow task with the follower global checkpoint.
Sometimes index shard stats are not available (e.g. during a restart) and
we fail now, while it is very likely that these stats will be available some time later.
This change ensures the `message` field is always included
in the `field_stats` for the semi-structured text log file
file structure. Previously it was not, as it will almost
certainly contain all distinct values. However, for
consistency in the UI it's useful to include it.
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.
Historically, DeprecationLogger was uppercased more frequently than lowercased.
ILM would return a resource-not-found exception when requesting policies
while the IndexLifecycleMetaData is not initialized. The behavior here
should not be as extreme since it is not the user's fault.
This commit changes the behavior so that it succeeds and returns no policies
when no policy names are explicitely specified, otherwise keep the same behavior
of throwing an exception
Previously, for some queries the validation for ORDER BY
fields didn't kick in since a HAVING close or an ORDER BY
with scalar function would add `Filter` and `Project` plans
between the `OrderBy` and the `Aggregate`.
Now the LogicalPlan tree beneath `OrderBy` is traversed and
the ORDER BY fields are properly verified.
Fixes: #34590
This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
With the introduction of _ilm/stop and _ilm/start APIs, the
use cases where one would only target a select group
of indices to start/stop has been reduced. Since there is no
strong use-case for skipping specific indices, it is best to
remove this functionality and only adding if later desired, with the
hopes of keeping things more simple.
through randomization, there is a chance that the mutateInstance
for PolicyStatsTests does not actually mutate the original object.
This PR aims to fix this
* NETWORKING: Add SSL Handler before other Handlers
* The only way to run into the issue in #33998 is for `Netty4MessageChannelHandler`
to be in the pipeline while the SslHandler is not. Adding the SslHandler before any
other handlers should ensure correct ordering here even when we handle upstream events
in our own thread pool
* Ensure that channels that were closed concurrently don't trip the assertion
* Closes#33998
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit.
Renamed existing `max_write_buffer_size` to ``max_write_buffer_count` to indicate that limit is count based.
Closes#34705
Previously, `Mapper` was returning an incorrect plan which resulted in an
```
SQLFeatureNotSupportedException: Found 1 problem(s)
line 1:8: Unexecutable item
```
Queries with a `WHERE` and/or `HAVING` clause which results in NO_MATCH
are now handled correctly and return 0 rows.
Fixes: #34613
Co-authored-by: Costin Leau <costin.leau@gmail.com>
* Adds usage data for ILM
* Adds tests for IndexLifecycleFeatureSetUsage and friends
* Adds tests for IndexLifecycleFeatureSet
* Fixes merge errors
* Add number of indices managed to usage stats
Also adds more tests
* Addresses Review comments
We should not create a follower index and abort a follow request if the
leader does not have soft-deletes. Moreover, we also should not
auto-follow an index if it does not have soft-deletes.
* Removes Set Policy API in favour of setting index.lifecycle.name
directly
* Reinstates matcher that will still be used
* Cleans up code after rebase
* Adds test to check changing policy with ndex settings works
* Fixes TimeseriesLifecycleActionsIT after API removal
* Fixes docs tests
* Fixes case on close where lifecycle service was never created
* Adding stack_monitoring_agent role
* Fixing checkstyle issues
* Adding tests for new role
* Tighten up privileges around index templates
* s/stack_monitoring_user/remote_monitoring_collector/ + remote_monitoring_user
* Fixing checkstyle violation
* Fix test
* Removing unused field
* Adding missed code
* Fixing data type
* Update Integration Test for new builtin user
With this change, we apply the common test config automatically to all
newly created tasks instead of opting in specifically.
For plugin authors using the plugin externally this means that the
configuration will be applied to their RandomizedTestingTasks as well.
The purpose of the task is to simplify setup and make it easier to
change projects that use the `test` task but actually run integration
tests to use a task called `integTest` for clarity, but also because
we may want to configure and run them differently.
E.x. using different levels of concurrency.
Implemented null handling for both the value tested but also for
values inside the list of values tested against.
The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.
Closes: #34582
#33708 introduced a strict deprecation mode that makes a REST request
fail if there is a warning header in the response returned by
Elasticsearch (usually a deprecation message signaling that a feature
or a field has been deprecated).
This change adds the strict deprecation mode into the REST integration
tests, and makes the tests fail if a deprecated feature is used. Also
any test using a deprecated feature has been modified to pass the build.
The YAML integration tests already analyzed HTTP warnings so they do
not use this mode, keeping their "expected vs actual" behavior.
Per #31717 this commit changes the defaults to the following:
Batch size of 5120 ops.
Maximum of 12 concurrent read requests.
Maximum of 9 concurrent write requests.
This is not necessarily our final values but it's good to have these as defaults for the purposes of initial testing.
* Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction`
instead of `HandledAction`, this removes a sync cluster state api call.
* Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and
`TransportResumeFollowAction`.
* Changed `PauseFollowAction.Request` to not use `readFrom()`.
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
Since there's no transition into the "new" phase it wasn't set until the "hot"
phase, so now we initialize it when initializing the policy context.
Resolves#34277
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
Both testFollowIndexAndCloseNode and testFailOverOnFollower failed
because they responded to the FollowTask a TransportService closed
exception which is currently considered as a fatal error. This behavior
is not desirable since a closing node can throw that exception, and we
should retry in that case.
This change adds TransportService closed error to the list of retryable
errors.
Closes#34694
As part of this change the leader index name and leader cluster name are
stored in the CCR metadata in the follow index. The resume follow api
will read that when a resume follow request is executed.
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
Since #34412 and #34474, a follower must have soft-deletes enabled
to work correctly. This change requires soft-deletes on the follower.
Relates #34412
Relates #34474
This fixes a bug about aliases authorization.
That is, a user might see aliases which he is not authorized to see.
This manifests when the user is not authorized to see any aliases
and the `GetAlias` request is empty which normally is a marking
that all aliases are requested. In this case, no aliases should be
returned, but due to this bug, all aliases will have been returned.
Extend querying support on multiple indices from being strictly
identical to being just compatible.
Use FieldCapabilities API (extended through #33803) for mapping merging.
Close#31837#31611
* Changed the resource id of auto follow patterns to be a user defined name
instead of being the leader cluster alias name.
* Fail when an unfollowed leader index matches with two or more auto follow patterns.
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.
The `IN` expression can be used in SELECT, WHERE and HAVING clauses.
Closes: #32955
`CONVERT` works exactly like cast with slightly different syntax:
`CONVERT(<value>, <data_type)` as opposed to `CAST(<value> AS <data_type>)`
Moreover it support format of the MS-SQL data types `SQL_<type>`,
e.g.: `SQL_INTEGER`
Closes: #34513
JDK11 introduced some changes with the SSLEngine. A number of error
messages were changed. Additionally, there were some behavior changes
in regard to how the SSLEngine handles closes during the handshake
process. This commit updates our tests and SSLDriver to support these
changes.
All of the tests in PainlessDomainSplitIT have an awaitsfix, which
causes the build to fail since no tests are run. This adds an empty
test to get the build going again.
Relates #34683
Relates #32966
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
This is the fixed version of #34246, which was reverted.
Relates #33205
We should be consistent here. We were already using the casing "Ccr" and
this is the preferred casing for Java class names. This commit adjusts
the names of some classes that were using the casing "CCR" to be "Ccr".
In some of our X-Pack REST tests we have to wait for pending tasks to
complete. We are now needing this functionality in ESRestTestCase for
the docs tests where we run against X-Pack features. This commit moves
the helper method that we have in X-Pack to ESRestTestCase, and removes
duplicate logic from waiting for rollup tasks to complete.
Since #34288, we might hit deadlock if the FollowTask has more fetchers
than writers. This can happen in the following scenario:
Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has
two fetchers and one writer.
1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0,
num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1
respectively.
2. The second request which fetches seq#1 completes before, and then it
triggers a write request containing only seq#1.
3. The primary of a follower fails after it has replicated seq#1 to
replicas.
4. Since the old primary did not respond, the FollowTask issues another
write request containing seq#1 (resend the previous write request).
5. The new primary has seq#1 already; thus it won't replicate seq#1 to
replicas but will wait for the global checkpoint to advance at least
seq#1.
The problem is that the FollowTask has only one writer and that writer
is waiting for seq#0 which won't be delivered until the writer completed.
This PR proposes to replicate existing operations with the old primary
term (instead of the current term) on the follower. In particular, when
the following primary detects that it has processed an process already,
it will look up the term of an existing operation with the same seq_no
in the Lucene index, then rewrite that operation with the old term
before replicating it to the following replicas. This approach is
wait-free but requires soft-deletes on the follower.
Relates #34288
Today we rely on the LocalCheckpointTracker to ensure no duplicate when
enabling optimization using max_seq_no_of_updates. The problem is that
the LocalCheckpointTracker is not fully reloaded when opening an engine
with an out-of-order index commit. Suppose the starting commit has seq#0
and seq#2, then the current LocalCheckpointTracker would return "false"
when asking if seq#2 was processed before although seq#2 in the commit.
This change scans the existing sequence numbers in the starting commit,
then marks these as completed in the LocalCheckpointTracker to ensure
the consistent state between LocalCheckpointTracker and Lucene commit.
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.
Fix#32079
With this commit we cleanup hand-coded duplicate checks in XContent
parsing. They were necessary previously but since we reconfigured the
underlying parser in #22073 and #22225, these checks are obsolete and
were also ineffective unless an undocumented system property has been
set. As we also remove this escape hatch, we can remove the additional
checks as well.
Closes#22253
Relates #34588
For user/_has_privileges and user/_privileges, handle the case where
there is no user in the security context. This is likely to indicate
that the server is running with a basic license, in which case the
action will be rejected with a non-compliance exception (provided
we don't throw a NPE).
The implementation here is based on the _authenticate API.
Resolves: #34567
This change makes it no longer possible to follow / auto follow without
specifying a leader cluster. If a local index needs to be followed
then `cluster.remote.*.seeds` should point to nodes in the local cluster.
Closes#34258
Having integration tests separated from the unit tests in the qa
directory works much more smoothly with our testing infrastructure,
matches what other plugins do, and tests in a more "real" deployment
scenario by having all plugins installed.
The setting that reduces the disk space requirement
for the forecasting integration tests was accidentally
removed in #31757 when files were moved around. This
change simply adds back the setting that existed before
that.
A constant can now be used outside aggregation only queries.
Don't skip an ES query in case of constants-only selects.
Loosen the binary pipe restriction of being used only in aggregation queries.
Fixes https://github.com/elastic/elasticsearch/issues/31863
Right now, watches fail on runtime, when invalid email addresses are
used.
All those fields can be checked on parsing, if no mustache is used in
any email address template. In that case we can return immediate
feedback, that invalid email addresses should not be specified when
trying to store a watch.
The logfile audit log format is no longer formed by prefix fields followed
by key value fields, it is all formed by key value fields only (JSON format).
Consequently, the following settings, which toggled some of the prefix
fields, have been renamed:
audit.logfile .prefix.emit_node_host_address
audit.logfile .prefix.emit_node_host_name
audit.logfile .prefix.emit_node_name
This API is intended as a companion to the _has_privileges API.
It returns the list of privileges that are held by the current user.
This information is difficult to reason about, and consumers should
avoid making direct security decisions based solely on this data.
For example, each of the following index privileges (as well as many
more) would grant a user access to index a new document into the
"metrics-2018-08-30" index, but clients should not try and deduce
that information from this API.
- "all" on "*"
- "all" on "metrics-*"
- "write" on "metrics-2018-*"
- "write" on "metrics-2018-08-30"
Rather, if a client wished to know if a user had "index" access to
_any_ index, it would be possible to use this API to determine whether
the user has any index privileges, and on which index patterns, and
then feed those index patterns into _has_privileges in order to
determine whether the "index" privilege had been granted.
The result JSON is modelled on the Role API, with a few small changes
to reflect how privileges are modelled when multiple roles are merged
together (multiple DLS queries, multiple FLS grants, multiple global
conditions, etc).
This commit moves the definition of domainSplit into java and exposes it
as a painless whitelist extension. The method also no longer needs
params, and version which ignores params is added and deprecated.
This reverts commit 0b4e8db1d3 as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
This moves the rollup cleanup code for http tests from the high level rest
client into the test framework and then entirely removes the rollup cleanup
code for http tests that lived in x-pack. This is nice because it
consolidates the cleanup into one spot, automatically invokes the cleanup
without the test having to know that it is "about rollup", and should allow
us to run the rollup docs tests.
Part of #34530
The token service has fairly strict validation and there are a range
of reasons why request may be rejected.
The detail is typically returned in the client exception / json body
but the ES admin can only debug that if they have access to detailed
logs from the client.
This commit adds debug & trace logging to the token service so that it
is possible to perform this debugging from the server side if
necessary.
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
Relates #33205
This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the
new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.
We're publishing jdbc into our maven repo as though its artifactId is
`x-pack-sql-jdbc` but the pom listed the artifactId as `jdbc`. This
fixes the pom to line up with where we're publishing the artifact.
Closes#34399
* Rollup adding support for date field metrics (#34185)
* Restricting supported metrics for `date` field rollup
* fixing expected error message for yaml test
* Addressing PR comments
The `AutoFollowTests` needs to restart the clusters between each tests, because
it is using auto follow stats in assertions. Auto follow stats are only reset
by stopping the elected master node.
Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api.
* Renamed AutoFollowTests to AutoFollowIT, because it is an integration test.
Renamed ShardChangesIT to IndexFollowingIT, because shard changes it the name
of an internal api and isn't a good name for an integration test.
* move creation of NodeConfigurationSource to a seperate method
* Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.
When performing an internal reindex, we add a setting marking the source
as read-only. We also check that this index is not already
read-only. This means that when we add the read-only setting, we expect
that it is already not there. This commit adds an assertion before we
increment the settings version validating that this is indeed the case.
This commit introduces settings version to index metadata. This value is
monotonically increasing and is updated on settings updates. This will
be useful in cross-cluster replication so that we can request settings
updates from the leader only when there is a settings update.
xContent ordering is unreliable when derived from map insertions but the parsed objects’ .equals() methods have the sort logic required to prove connections and vertices are correct. Disabled the xContent equivalence checks.
Closes#33686
In 54cb890 a setting for testing only was introduced, that delayed the start up of watcher. With the changes of how is watcher is started/stopped over time, this is not needed anymore.
Security caches the result of role lookups and negative lookups are
cached indefinitely. In the case of transient failures this leads to a
bad experience as the roles could truly exist. The CompositeRolesStore
needs to know if a failure occurred in one of the roles stores in order
to make the appropriate decision as it relates to caching. In order to
provide this information to the CompositeRolesStore, the return type of
methods to retrieve roles has changed to a new class,
RoleRetrievalResult. This class provides the ability to pass back an
exception to the roles store. This exception does not mean that a
request should be failed but instead serves as a signal to the roles
store that missing roles should not be cached and neither should the
combined role if there are missing roles.
As part of this, the negative lookup cache was also changed from an
unbounded cache to a cache with a configurable limit.
Relates #33205
PR #34290 made it impossible to use thread-context values to pass
authentication metadata out of a realm. The SAML realm used this
technique to allow the SamlAuthenticateAction to process the parsed
SAML token, and apply them to the access token that was generated.
This new method adds metadata to the AuthenticationResult itself, and
then the authentication service makes this result available on the
thread context.
Closes: #34332
The ingest pipeline that is produced is very simple. It
contains a grok processor if the format is semi-structured
text, a date processor if the format contains a timestamp,
and a remove processor if required to remove the interim
timestamp field parsed out of semi-structured text.
Eventually the UI should offer the option to customize the
pipeline with additional processors to perform other data
preparation steps before ingesting data to an index.
In ccb9ab5717 we changed how we deal with time
fields to support the `DateTime`-format fields added in 6.0, but dropped
support for pre-6.x `Long`-format fields. This change reinstates this support
for cases where pre-6.x data is made available to ML (e.g. in a mixed-version
CCS setup or after an upgrade).
ListenableFuture may run a listener on the same thread that called the
addListener method or it may execute on another thread after the future
has completed. Whenever the ListenableFuture stores the listener for
execution later, it should preserve the thread context which is what
this change does.
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.
This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).
Relates #31751
Relates #31113
* New OCTET_LENGTH function
* Changed the way the FunctionRegistry stores functions, considering the alphabetic ordering by name
* Added documentation for the RANDOM function
This fixes an issue where an incorrect expected next step is used when checking
to execute `AsyncActionStep`s after a cluster state step.
It fixes this scenario:
- `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or
`ClusterStateActionStep` successfully
- The next step is also a `ClusterStateWaitStep`, so it loops
- The `ClusterStateWaitStep` has a next stepkey (which gets set to the
`nextStepKey` in the code)
- The `ClusterStateWaitStep` fails the condition, meaning that it will have to
wait longer
- The `nextStepKey` is now incorrect though, because we did not advance the
index's step, and it's not `null` (which is another safe value if there is no
step after the `ClusterStateWaitStep`)
This fixes the problem by resetting the nextStepKey to null if the condition is
not met, since we are not going to advance the step metadata in this
case (thereby skipping the `maybeRunAsyncAction` invocation).
This commit also tightens up and enhances much of the ILM logging. A lot of
logging was missing the index name (making it hard to debug in the presence of
multiple indices) and a lot was using the wrong logging level (DEBUG is now
actually readable without being a wall of text).
Resolves#34297
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
Also fixed ShardFollowNodeTaskTests to not return ops when responseSize
is empty. Otherwise ops are returned when no ops are expected to be returned.
Co-authored-by: Jason Tedor <jason@tedor.me>
Unfollow should be allowed / disallowed on a per index level instead of
cluster level.
Also renamed `create_follow_index` index privilege to
`manage_follow_index` privilege and include unfollow and close APIs.
This commit modifies the follow stats API response structure to more
clearly highlight meaning of the higher level fields. In particular,
previously the response had a top-level key for each index. Instead, we
nest the indices under an "indices" field which is now an array. The
values in this array are objects containing two fields: "index" which is
the name of the follower index, and "shards" which is an array where
each value in the array is the follower stats for that shard. That is,
we have gone from:
{
"bar": [
{
"shard_id": 0...
}...
]...
}
to
{
"indices": [
{
"index": "bar",
"shards": [
{
"shard_id": 0...
}...
]
}...
}
In the CCR docs we want to refer to the endpoint that returns following
stats as the follow stats API. This commit renames the internal
implementation of this endpoint to reflect this usage.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
There may be values in the thread context that ought to be preseved
for later use, even if one or more realms perform asynchronous
authentication.
This commit changes the AuthenticationService to wrap the potentially
asynchronous calls in a ContextPreservingActionListener that retains
the original thread context for the authentication.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
Drops the last logging constructor that takes `Settings` because it is
no longer needed.
Watcher goes through a lot of effort to pass `Settings` to `Logger`
constructors and dropping `Settings` from all of those calls allowed us
to remove quite a bit of log-based ceremony from watcher.
The Security plugin authorizes actions on indices. Authorization
happens on a per index/alias basis. Therefore a request with a
Multi Index Expression (containing wildcards) has to be
first evaluated in the authorization layer, before the request is
handled. For authorization purposes, wildcards in expressions will
only be expanded to indices/aliases that are visible by the authenticated
user. However, this "constrained" evaluation has to be compatible with
the expression evaluation that a cluster without the Security plugin
would do. Therefore any change in the evaluation logic
in any of these sites has to be mirrored in the other site.
This commit mirrors the changes in core from #33518 that allowed
for Multi Index Expression in the Get Alias API, loosely speaking.
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
When the cluster.routing.allocation.disk.watermark.flood_stage watermark
is breached, DiskThresholdMonitor marks the indices as read-only. This
failed when x-pack security was present as system user does not have the privilege
for update settings action("indices:admin/settings/update").
This commit adds the required privilege for the system user. Also added missing
debug logs when access is denied to help future debugging.
An assert statement is added to catch any missed privileges required for
system user.
Closes#33119
In SessionFactoryLoadBalancingTests#testRoundRobinWithFailures()
we kill ldap servers randomly and immediately bind to that port
connecting to mock server socket. This is done to avoid someone else
listening to this port. As the creation of mock socket and binding to the
port is immediate, sometimes the earlier socket would be in TIME_WAIT state
thereby having problems with either bind or connect.
This commit sets the SO_REUSEADDR explicitly to true and also sets
the linger on time to 0(as we are not writing any data) so as to
allow re-use of the port and close immediately.
Note: I could not find other places where this might be problematic
but looking at test runs and netstat output I do see lot of sockets
in TIME_WAIT. If we find that this needs to be addressed we can
wrap ServerSocketFactory to set these options and use that with in
memory ldap server configuration during tests.
Closes#32190
This commit upgrades the unboundid ldapsdk to version 4.0.8. The
primary driver for upgrading is a fix that prevents this library from
rewrapping Error instances that would normally bubble up to the
UncaughtExceptionHandler and terminate the JVM. Other notable changes
include some fixes related to connection handling in the library's
connection pool implementation.
Closes#33175
- this adds an integration test that runs through a policy
with all the actions defined.
- adds a test specific to a policy having just a rollover action
- bumps the node count to 4
The `DnRoleMapper` class is used to map distinguished names of groups
and users to role names. This mapper builds in an internal map that
maps from a `com.unboundid.ldap.sdk.DN` to a `Set<String>`. In cases
where a lot of distinct DNs are mapped to roles, this can consume quite
a bit of memory. The majority of the memory is consumed by the DN
object. For example, a 94 character DN that has 9 relative DNs (RDN)
will retain 4KB of memory, whereas the String itself consumes less than
250 bytes.
In order to reduce memory usage, we can map from a normalized DN string
to a List of roles. The normalized string is actually how the DN class
determines equality with another DN and we can drop the overhead of
needing to keep all of the other objects in memory. Additionally the
use of a List provides memory savings as each HashSet is backed by a
HashMap, which consumes a great deal more memory than an appropriately
sized ArrayList. The uniqueness we get from a Set is maintained by
first building a set when parsing the file and then converting to a
list upon completion.
Closes#34237
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
This commit changes the way that step execution flows. Rather than have any step
run when the cluster state changes or the periodic scheduler fires, this now
runs the different types of steps at different times.
`AsyncWaitStep` is run at a periodic manner, ie, every 10 minutes by default
`ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the
cluster state changes.
`AsyncActionStep` is now run only after the cluster state has been transitioned
into a new step. This prevents these non-idempotent steps from running at the
same time. It addition to being run when transitioned into, this is also run
when a node is newly elected master (only if set as the current step) so that
master failover does not fail to run the step.
This also changes the `RolloverStep` from an `AsyncActionStep` to an
`AsyncWaitStep` so that it can run periodically.
Relates to #29823
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid.
No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing.
Closes#33956
Revert "[TESTS] Pin MockWebServer to TLS1.2 (#33127)" (commit
214652d4af) and "Pin TLS1.2 in
SSLConfigurationReloaderTests" (commit
d9f5e4fd2e), which pinned the
MockWebServer used in the SSLConfigurationReloaderTests to TLSv1.2 in
order to prevent failures with JDK 11 related to ssl session
invalidation. We no longer need this pinning as the problematic code
was fixed in #34130.
The `-` and `+` as a number literal prefix are already
parsed by the rule in `valueExpression`. To accommodate
this, there are some code changes that enables the
`ExpressionBuilder` to parse Literal integers and decimals
together with the `-/+` prefix sign (if exists) and validate
them (wrong format, large numbers, etc.).
Follows: #33854
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
Previously, parsing an arithmetic expression with `*` and no spaces,
e.g.: `2*i` threw a parsing exception as the grammar rule for
tableIdentifier was clashing with the rule for arithmetic operator `*`.
This issue comes already in the lexer and the left part of the
expression (in our example `2*`) was recognised as a
TABLE_IDENTIFIER token.
The solution adopted is to allow the `*` wildcard in the table name
only if it's surrounded with double quotes, e.g.: `"my*index"`
Closes: #33957
Remove CamelCase to CAMEL_CASE conversion when resolving
a function. Only convert user input to upper case and then
try to match with aliases or primary names.
Keep the internal conversion FunctionName to FUNCTION__NAME
which provides flexibility when registering functions by their class
name.
Fixes: #34114
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
This change fixes a potential deadlock problem in the unit
test introduced in #34117.
It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
This arose when two commits were pushed at roughly the same time, both
of which compiled successfully against master, but not when taken
together. This commit fixes a reference in one of the commits that was
changed in the other commit.
This commit modifies the CCR stats endpoint for indices to be
/{index}/_ccr/stats. This makes this endpoint consistent with other
index-centric endpoints like indices stats.
The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients.
For the unfollow api to work the index follow needs to be stopped and the index needs to be closed.
Closes#33931
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.
Relates #33656
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Prior to following an index in the follow API, check whether current
user has sufficient privileges in the leader cluster to read and
monitor the leader index.
Also check this in the create and follow API prior to creating the
follow index.
Also introduced READ_CCR cluster privilege that include the minimal
cluster level actions that are required for ccr in the leader cluster.
So a user can follow indices in a cluster, but not use the ccr admin APIs.
Closes#33553
Co-authored-by: Jason Tedor <jason@tedor.me>
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#29989
The SSLService invalidates SSLSessions when there is a change to any of
the underlying key or trust material. However, this invalidation code
did not check for a null SSLSession being returned from the context and
assumed that the context would always return a non-null object. The
return of a null object is possible in all versions, but JDK11 seems to
return them more often due to changes for TLS 1.3. There are a number
of reasons that we get a id of a session but the context returns null
when the session with that id is requested. Some of the reasons for
this are:
* Session was evicted by session cache
* Session has timed out
* Session has been invalidated by another caller
To handle this, the SSLService now checks if the value is null before
calling invalidate on the SSLSession.
Closes#32124
This commit removes the use of ExecutableScript from watcher in favor of
custom script contexts for both watcher condition scripts and transform
scripts.
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#32293
The following changes were made:
* Added ElasticsearchSecurityException. For in the case the current user has insufficient privileges while an index is being followed. Prior to following ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error.
* Added Index block exception. If the leader index gets closed, this exception is returned.
* Added ClusterBlockException service unavailable. In case for example the leader cluster is without elected master.
* Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error.
Closes#33954
Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.
This also addresses a potential issue in caching when the same query contains aggregations but in
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in
the cache.
Due to a bug, that was fixed in #33360 and commit
1de2a925ce the initial adding of a watch
could get lost, thus leaving the watcher stats count as zero despite adding a watch.
Closes#33326
* Renamed CCR APIs
Renamed:
* `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow`
* `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow`
* `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow`
Relates to #33931
always use `IndicesOptions.strictExpand()` for indices options.
The follow index may be closed and we still want to get stats from
shard follow task and the whether the provided index name matches with
follow index name is checked when locating the task itself in the ccr
stats transport action.
`Settings` is no longer required to get a `Logger` and we went to quite
a bit of effort to pass it to the `Logger` getters. This removes the
`Settings` from all of the logger fetches in security and x-pack:core.
Centralize and simplify the script generation between operators and functions which are currently decoupled. As part of this process most predicates (<, <=, etc...) were made ScalarFunction as their purpose and functionality is quite similar (see % and MOD functions).
Renamed ProcessDefinition to Pipe
Add ScriptWeaver as a mixin-in interface for script customization
Add logic for equals/lte/lt
Improve BinaryOperator/expression toString
Minimize duplication across string functions
Close#33975
This commit adds the ability to plug in compilation of custom contexts
in mock script engine. This is needed for testing plugins which add
custom contexts like watcher.
Watcher is using a lot of so called TextTemplate fields in a watch
definition, which can use mustache to insert the watch id for example.
For the user it is non-obvious which field is just a string field or
which field is a text template.
This also means, that for every such field, we currently do a script
compilation, even if the field does not contain any mustache syntax.
This will lead to an increased script cache churn, because those
compiled scripts (that only contain a string), will evict other scripts.
On top of that, this also means that an unneeded compilation has
happened, instead of returning that string immediately.
The usages of mustache templating are in all of the actions (most of the time far
more than one compilation) as well as most of the inputs.
Especially when running a lot of watches in parallel, this will reduce
execution times and help reuse of real scripts.
Security previously hardcoded a default scroll keepalive of 10 seconds,
but in some cases this is not enough time as there can be network
issues or overloading of host machines. After this change, security
will now use the default keepalive timeout, which is controllable using
a setting and the default value is 5 minutes.
In order to optimize the use of the role cache, when the roles.yml file
is reloaded we now calculate the names of removed, changed, and added
roles so that they may be passed to any listeners. This allows a
listener to selectively clear cache for only the roles that have been
modified. The CompositeRolesStore has been adapted to do exactly that
so that we limit the need to reload roles from sources such as the
native roles stores or external role providers.
See #33205
This commits creates a DateMathParser interface, which is already
implemented for both joda and java time. While currently the java time
DateMathParser is not used, this change will allow a followup which will
create a DateMathParser from a DateFormatter, so the caller does not
need to know the internals of the DateFormatter they have.
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
This commit replicates the max_seq_no_of_updates on the leading index
to the primaries of the following index via ShardFollowNodeTask. The
max_seq_of_updates is then transmitted to the replicas of the follower
via replication requests (that's BulkShardOperationsRequest).
Relates #33656
Implement circuit breaker logic in the parser which catches expressions
that can blow up the tree and result in StackOverflowError being thrown.
Co-authored-by: Costin Leau <costin.leau@gmail.com>
ILM now only forces indices to become read only in the case of Shrink
and Force Merge actions, as these are most useful in cases where the
index is no longer being written to.
[ML] Modify thresholds for normalization triggers
The (arbitrary) threshold factors used to judge if scores have
changed significantly enough to trigger a look-back renormalization have
been changed to values that reduce the frequency of such
renormalizations.
Added a clause to treat changes in scores as a 'big change' if it would
result in a change of severity reported in the UI.
Also altered the clause affecting small scores so that a change should
be considered big if scores have changed by at least 1.5.
Relates https://github.com/elastic/machine-learning-qa/issues/263
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication
requests or the translog phase of peer-recovery.
With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.
Relates #33656
Previously the timestamp_formats field in the response
from the find_file_structure endpoint contained Joda
timestamp formats. This change makes that clear by
renaming the field to joda_timestamp_formats, and also
adds a java_timestamp_formats field containing the
equivalent Java time format strings.
Two issues are resolved:
1. The `value_type` should be either long or double in case of numeric.
2. The key label for the aggregate filter (having) was duplicate of an aggr key label.
Fixes: #33520
* TESTS: Stabilize Renegotiation Test
* The second `startHandshake` is not synchronous and a read of only
50ms may fail to trigger it entirely (the failure can be reproduced reliably by setting the socket timeout to `1`)
=> fixed by retrying the read until the handshake finishes (a longer timeout would've worked too,
but retrying seemed more stable)
* Closes#33772
* Make certain ML node settings dynamic (#33565)
* Changing to pull in updating settings and pass to constructor
* adding note about only newly opened jobs getting updated value
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
The source only snapshot drops fully deleted segments before snapshotting
them. In order to compare them we need to drop all fully deleted segments
in the test as well.
Closes#33755
If numWrites is between 2 and 9, we will issue an invalid range because
the from_seq_no is negative. This commit makes sure that numWrites is at
least 10, and adds an explicit test to verify invalid request ranges.
SearchGroupsResolverInMemoryTests was (rarely) fail in a way that
suggests that the server-side delay (100ms) was not enough to trigger
the client-side timeout (5ms).
The server side delay has been increased to try and overcome this.
Resolves: #32913
Previously numeric values in the field_stats created by the
find_file_structure endpoint were always output with a
decimal point. This looked unfriendly and unnatural for
fields that clearly store integer values. This change
converts integer values to type Integer before output in
the file structure field stats.
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).
The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.
Relates #33656
This commit reverts most of #33157 as it introduces another race
condition and breaks a common case of watcher, when the first watch is
added to the system and the index does not exist yet.
This means, that the index will be created, which triggers a reload, but
during this time the put watch operation that triggered this is not yet
indexed, so that both processes finish roughly add the same time and
should not overwrite each other but act complementary.
This commit reverts the logic of cleaning out the ticker engine watches
on start up, as this is done already when the execution is paused -
which also gets paused on the cluster state listener again, as we can be
sure here, that the watches index has not yet been created.
This also adds a new test, that starts a one node cluster and emulates
the case of a non existing watches index and a watch being added, which
should result in proper execution.
Closes#33320
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
The job deletion logic was scattered around a few places:
the transport action, the job manager and the deletion task.
Overloading the task with deletion logic also meant extra
dependencies in the core package which should be unnecessary.
This commit consolidates all this logic into the transport action
and replaces the deletion task with a plain one that needs not be
aware of deletion logic.
* Added TRUNCATE function, modified ROUND to accept two parameters instead of one. Made the second parameter optional for both functions.
* Added documentation for both functions.
Removed rules in the grammar that were superfluous, as they
are already "caught" other rules in the same context.
Also switched to exact ambig detection for debug mode
Fixes: #31885
Using index settings for ILM state is fragile and exposes too much
information that doesn't need to be exposed. Using custom index metadata
is more resilient and allows more controlled access to internal
information.
As part of these changes, moves away from using defaults for ILM-related
values, in favor of using null values to clearly indicate that the value is not
present.
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.
This commit adds two methods to the TokenFilterFactory interface.
* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
`CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
list `Analyzer`. By default it returns `true`.
Fixes#33609
The fix in #33757 introduces some workaround since FilterCodecReader didn't
support unwrapping. This cuts over to a more elegant fix to access the readers
segment infos.
Previously multiple comma separated lists of options where not
recognized correctly which resulted in only the last of them
to be taked into account, e.g.:
For the following query:
SELECT * FROM test WHERE QUERY('search', 'default_field=foo', 'default_operator=and')"
only the `default_operator=and` was finally passed to the ES query.
Fixes: #32602
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
DAYNAME and MONTHNAME functions tests will be skipped if the right JVM parameter (-Djava.locale.providers=COMPAT) is not used in unit testing environment
This moves away from caching a list of steps for a current phase, instead
rebuilding the necessary step from the phase JSON stored in the index's
metadata.
Relates to #29823
Currently a watch execution results in one bulk request, when the
triggered watches are written into the that index, that need to be
executed. However the update of the watch status, the creation of the
watch history entry as well as the deletion of the triggered watches
index are all single document operations.
This can have quite a negative impact, once you are executing a lot of
watches, as each execution results in 4 documents writes, three of them
being single document actions.
This commit switches to a bulk processor instead of a single document
action for writing watch history entries and deleting triggered watch
entries. However the defaults are to run synchronous as before because
the number of concurrent requests is set to 0. This also fixes a bug,
where the deletion of the triggered watch entry was done asynchronously.
However if you have a high number of watches being executed, you can
configure watcher to delete the triggered watches entries as well as
writing the watch history entries via bulk requests.
The triggered watches deletions should still happen in a timely manner,
where as the history entries might actually be bound by size as one
entry can easily have 20kb.
The following settings have been added:
- xpack.watcher.bulk.actions (default 1)
- xpack.watcher.bulk.concurrent_requests (default 0)
- xpack.watcher.bulk.flush_interval (default 1s)
- xpack.watcher.bulk.size (default 1mb)
The drawback of this is of course, that on a node outage you might end
up with watch history entries not being written or watches needing to be
executing again because they have not been deleted from the triggered
watches index. The window of these two cases increases configuring the bulk processor to wait to reach certain thresholds.
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
Relates to #33007
* Implement xpack.monitoring.elasticsearch.collection.enabled setting
* Fixing line lengths
* Updating constructor calls in test
* Removing unused import
* Fixing line lengths in test classes
* Make monitoringService.isElasticsearchCollectionEnabled() return true for tests
* Remove wrong expectation
* Adding unit tests for new flag to be false
* Fixing line wrapping/indentation for better readability
* Adding docs
* Fixing logic in ClusterStatsCollector::shouldCollect
* Rebasing with master and resolving conflicts
* Simplifying implementation by gating scheduling
* Doc fixes / improvements
* Making methods package private
* Fixing wording
* Fixing method access
This PR removes fields that are not actually used by the Monitoring UI. This will greatly simplify the eventual migration to using Metricbeat for monitoring Elasticsearch (see https://github.com/elastic/beats/pull/8260#discussion_r215885868 for more context and discussion around removing these fields from ES collection).
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.
The extra caused by level is not necessary here:
```
"fetch_exceptions": [
{
"from_seq_no": 1,
"retries": 106,
"exception": {
"type": "exception",
"reason": "[index1] IndexNotFoundException[no such index]",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index",
"index_uuid": "_na_",
"index": "index1"
}
}
}
],
```
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.
This commit fixes that, by allowing that a leader index does not yet have
a mapping.
We can't rely on the leaf reader ordinal in a wrapped reader since
it might not correspond to the ordinal in the SegmentInfos for it's
SegmentCommitInfo.
Relates to #32844Closes#33689Closes#33755
This PR integrates the following pieces of machinery in the Coordinator:
- discovery
- pre-voting
- randomised election scheduling
- joining (of a new master)
- publication of cluster state updates
Together, these things are everything needed to form a cluster. We therefore
also add the start of a test suite that allows us to assert higher-level
properties of the interactions between all these pieces of machinery, with as
little fake behaviour as possible. We assert one such property: "a cluster
successfully forms".
This commit adds the Create Rollup Job API to the high level REST
client. It supersedes #32703 and adds dedicated request/response
objects so that it does not depend on server side components.
Related #29827
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
There was a listener that re-runs the policy with the new state when the cluster
state is processed by the `MoveToNextStepUpdateTask`. This removes this listener
as we will execute the policy through the `IndexLifecyleService` cluster state
listener.
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.
This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check
before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
Ensure that the SSLConfigurationReloaderTests can run with JDK 11
by pinning the HttpClient to TLS version to TLS1.2. This is necessary
becase even if the MockWebServer is set to user TLS1.2, we don't
set its enabled protocols, so if it receives a TLS1.3 request (which
is the default behavior for HttpClient in JDK11), it will use TLS1.3
and the original issue will manifest again.
Relates #33127Resolves#32124
Changes the format of log events in the audit logfile.
It also changes the filename suffix from `_access` to `_audit`.
The new entry format is consistent with Elastic Common Schema.
Entries are formatted as JSON with no nested objects and field
names have a dotted syntax. Moreover, log entries themselves
are not spaced by commas and there is exactly one entry per line.
In addition, entry fields are ordered, unlike a typical JSON doc,
such that a human would not strain his eyes over jumbled
fields from one line to the other; the order is defined in the log4j2
properties file.
The implementation utilizes the log4j2's `StringMapMessage`.
This means that the application builds the log event as a map
and the log4j logic (the appender's layout) handle the format
internally. The layout, such as the set of printed fields and their
order, can be changed at runtime without restarting the node.
and if so debug log it and otherwise rethrow.
This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
This change modifies the file structure detection functionality
such that some of the decisions can be overridden with user
supplied values.
The fields that can be overridden are:
- charset
- format
- has_header_row
- column_names
- delimiter
- quote
- should_trim_fields
- grok_pattern
- timestamp_field
- timestamp_format
If an override makes finding the file structure impossible then
the endpoint will return an exception.
We have a Kerberos setting to remove realm part from the user
principal name (remove_realm_name). If this is true then
the realm name is removed to form username but in the process,
the realm name is lost. For scenarios like Kerberos cross-realm
authentication, one could make use of the realm name to determine
role mapping for users coming from different realms.
This commit adds user metadata for kerberos_realm and
kerberos_user_principal_name.
Disable specific Thai and Japanese locales as Certificate expiration
validation fails due to the date parsing of BouncyCastle (that manifests
in a FIPS 140 JVM as this is the only place we use BouncyCastle).
Added the locale switching logic here instead of subclassing
ESTestCase as these are the only tests that fail for these locales and
JVM combination.
Resolves#33081
We have a test dependency on Apache Mina when using SimpleKdcServer
for testing Kerberos. When checking for LDAP backend connectivity,
the code checks for deadlocks which require additional security
permissions accessClassInPackage.sun.reflect. As this is only for
test and we do not want to add security permissions to production,
this commit moves these tests and related classes to
x-pack evil-tests where they can run with security manager disabled.
The plan is to handle the security manager exception in the upstream issue
DIRMINA-1093
and then once the release is available to run these tests with security
manager enabled.
Closes#32739
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.
The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.
Closes#31505
Previously, when an non-pruned cast (casting as a different
data type) got applied on a table column in the `SELECT` clause,
the name of the result column didn't contain the target data type
of the cast, e.g.:
SELECT CAST(MAX(salary) AS DOUBLE) FROM "test_emp"
returned as column name:
CAST(MAX(salary))
instead of:
CAST(MAX(salary) AS DOUBLE)
Closes#33571
* Added more tests for trivial casts that are pruned
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`,
in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only `MockUncasedHostsProvider` is no longer
required.
This change removes `MockUncasedHostProvider` and replaces it with file-based
discovery in tests based on `EsIntegTestCase`.
Follow up to #33617. Relates to #30086.
As with all other per-index Monitoring collectors, the `CcrStatsCollector` should only collect stats for the indices the user wants to monitor. This list is controlled by the `xpack.monitoring.collection.indices` setting and defaults to all indices.
This change addresses some issues regarding thread safety around
updates and method calls on the XPackLicenseState object. There exists
a possibility that there could be a concurrent update to the
XPackLicenseState when there is a scheduled check to see if the license
is expired and a cluster state update. In order to address this, the
update method now has a synchronized block where member variables are
updated. Each method that reads these variables is now also
synchronized.
Along with the above change, there was a consistency issue around
security calls to the license state. The majority of security checks
make two calls to the license state, which could result in incorrect
behavior due to the checks being made against different license states.
The majority of this behavior was introduced for 6.3 with the inclusion
of x-pack in the default distribution. In order to resolve the majority
of these cases, the `isSecurityEnabled` method is no longer public and
the logic is also included in individual methods about security such as
`isAuthAllowed`. There were a few cases where this did not remove
multiple calls on the license state, so a new method has been added
which creates a copy of the current license state that will not change.
Callers can use this copy of the license state to make decisions based
on a consistent view of the license state.
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.
* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
index shards are the same as the recorded history UUIDs.
If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
on each shard changes api call verify whether their current history uuid
is the same as the recorded history uuid.
Relates to #30086
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
This change adds a `_source` only snapshot repository that allows to wrap
any existing repository as a _backend_ to snapshot only the `_source` part
including live docs markers. Snapshots taken with the `source` repository
won't include any indices, doc-values or points. The snapshot will be reduced in size and
functionality such that it requires full re-indexing after it's successfully restored.
The restore process will copy the `_source` data locally starts a special shard and engine
to allow `match_all` scrolls and searches. Any other query, or get call will fail with and unsupported operation exception. The restored index is also marked as read-only.
This feature aims mainly for disaster recovery use-cases where snapshot size is
a concern or where time to restore is less of an issue.
**NOTE**: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.
Relates to #30086
* SQL: Make Literal a NamedExpression
Literal now is a NamedExpression reducing the need for Aliases for
folded expressions leading to simpler optimization rules.
Fix#33523
This change tightens up the meaning of the "input_fields" field
in the file structure finder output. Previously it was permitted
but not calculated for JSON and XML files. Following this change
the field is called "column_names" and is only permitted for
delimited files.
Additionally the way the column names are set for headerless
delimited files is refactored to encapsulate the way they're
named to one line of the code rather than having the same
logic in two places.
Previously, when an arithmetic function got applied on a
table column in the `SELECT` clause, the name of the result
column contained weird characters used internally when
processing the SQL statement e.g.:
SELECT CHAR(emp_no % 10000) FROM "test_emp"
returned:
CHAR((emp_no{f}#14) % 10000))
as the column name instead of:
CHAR((emp_no) % 10000))
Also, fix an issue that causes a ClassCastException to be thrown
when using functions where both arguments are literals.
Closes#31869Closes#33461
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk