`<expression>::<dataType>` is a simplified altenative syntax to
`CAST(<expression> AS <dataType> which exists in PostgreSQL and
provides an improved user experience and possibly more compact
SQL queries.
Fixes: #38717
fix tests to use clock in milliseconds precision in watcher code
make sure the date comparison in string format is using same formatters
some of the code was modified in #38514 possibly because of merge conflicts
closes#38581
Backport #38738
The java time formatter used in the exporter adds a plus sign to the
year, if a year with more than five digits is used. This changes the
creation of those timestamp to only have a date up to 9999.
Closes#38378
The Close Index API has been refactored in 6.7.0 and it now performs
pre-closing sanity checks on shards before an index is closed: the maximum
sequence number must be equals to the global checkpoint. While this is a
strong requirement for regular shards, we identified the need to relax this
check in the case of CCR following shards.
The following shards are not in charge of managing the max sequence
number or global checkpoint, which are pulled from a leader shard. They
also fetch and process batches of operations from the leader in an unordered
way, potentially leaving gaps in the history of ops. If the following shard lags
a lot it's possible that the global checkpoint and max seq number never get
in sync, preventing the following shard to be closed and a new PUT Follow
action to be issued on this shard (which is our recommended way to
resume/restart a CCR following).
This commit allows each Engine implementation to define the specific
verification it must perform before closing the index. In order to allow
following/frozen/closed shards to be closed whatever the max seq number
or global checkpoint are, the FollowingEngine and ReadOnlyEngine do
not perform any check before the index is closed.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Added a constructor accepting `StreamInput` as argument, which allowed to
make most of the instance members final as well as remove the default
constructor.
Removed a test only constructor in favour of invoking the existing
constructor that takes a `SearchRequest` as first argument.
Also removed profile members and related methods as they were all unused.
* Rename integTest to bwcTestSample for bwc test projects
This change renames the `integTest` task to `bwcTestSample` for projects
testing bwc to make it possible to run all the bwc tests that check
would run without running on bwc tests.
This change makes it possible to add a new PR check on backports to make
sure these don't break BWC tests in master.
* Rename task as per PR
The test was relying on toString in ZonedDateTime which is different to
what is formatted by strict_date_time when milliseconds are 0
The method is just delegating to dateFormatter, so that scenario should
be covered there.
closes#38359
Backport #38610
* Enhance parsing of StatusCode in SAML Responses
<Status> elements in a failed response might contain two nested
<StatusCode> elements. We currently only parse the first one in
order to create a message that we attach to the Exception we return
and log. However this is generic and only gives out informarion
about whether the SAML IDP believes it's an error with the
request or if it couldn't handle the request for other reasons. The
encapsulated StatusCode has a more interesting error message that
potentially gives out the actual error as in Invalid nameid policy,
authentication failure etc.
This change ensures that we print that information also, and removes
Message and Details fields from the message when these are not
part of the Status element (which quite often is the case)
When the millisecond part of a timestamp is 0 the toString
representation in java-time is omitting the millisecond part (joda was
not). The Search response is returning timestamps formatted with
WatcherDateTimeUtils, therefore comparisons of strings should be done
with the same formatter
relates #27330
BackPort #38505
Adds the ability to fetch chunks from different files in parallel, configurable using the new `ccr.indices.recovery.max_concurrent_file_chunks` setting, which defaults to 5 in this PR.
The implementation uses the parallel file writer functionality that is also used by peer recoveries.
Improve verifier to disallow grouping over grouping functions (e.g.
HISTOGRAM over HISTOGRAM).
Close#38308
(cherry picked from commit 4e9b1cfd4df38c652bba36b4b4b538ce7c714b6e)
Constant numbers (of any form: integers, decimals, negatives,
scientific) and strings shouldn't increase the depth counters
as they don't contribute to the increment of the stack depth.
Fixes: #38571
* ML: update set_upgrade_mode, add logging
* Attempt to fix datafeed isolation
Also renamed a few methods/variables for clarity and added
some comments
This commit adds the 7.1 version constant to the 7.x branch.
Co-authored-by: Andy Bristol <andy.bristol@elastic.co>
Co-authored-by: Tim Brooks <tim@uncontended.net>
Co-authored-by: Christoph Büscher <cbuescher@posteo.de>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: markharwood <markharwood@gmail.com>
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Alpar Torok <torokalpar@gmail.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: Albert Zaharovits <albert.zaharovits@gmail.com>
- Add resolution to the exact keyword field (if exists) for text fields.
- Add proper verification and error message if underlying keyword
doesn'texist.
- Move check for field attribute in the comparison list to the
`resolveType()` method of `IN`.
Fixes: #38424
In #38333 and #38350 we moved away from the `discovery.zen` settings namespace
since these settings have an effect even though Zen Discovery itself is being
phased out. This change aligns the documentation and the names of related
classes and methods with the newly-introduced naming conventions.
Aliases defined in SELECT (Project or Aggregate) are now resolved in the
following WHERE clause. The Analyzer has been enhanced to identify this
rule and replace the field accordingly.
Close#29983
We have had various reports of problems caused by the maxRetryTimeout
setting in the low-level REST client. Such setting was initially added
in the attempts to not have requests go through retries if the request
already took longer than the provided timeout.
The implementation was problematic though as such timeout would also
expire in the first request attempt (see #31834), would leave the
request executing after expiration causing memory leaks (see #33342),
and would not take into account the http client internal queuing (see #25951).
Given all these issues, it seems that this custom timeout mechanism
gives little benefits while causing a lot of harm. We should rather rely
on connect and socket timeout exposed by the underlying http client
and accept that a request can overall take longer than the configured
timeout, which is the case even with a single retry anyways.
This commit removes the `maxRetryTimeout` setting and all of its usages.
This commit adds an authentication cache for API keys that caches the
hash of an API key with a faster hash. This will enable better
performance when API keys are used for bulk or heavy searching.
I have not been able to reproduce the failing
test scenario locally for #38408 and there are other similar
tests which are running fine in the same test class.
I am re-enabling the test with additional logs so
that we can debug further on what's happening.
I will keep the issue open for now and look out for the builds
to see if there are any related failures.
This is related to #35975. We do not want a slow master to fail a
recovery from remote process due to a slow put mappings call. This
commit increases the master node timeout on this call to 30 mins.
`waitForRollUpJob` is an assertBusy that waits for the rollup job
to appear in the tasks list, and waits for it to be a certain state.
However, there was a null check around the state assertion, which meant
if the job _was_ null, the assertion would be skipped, and the
assertBusy would pass withouot an exception. This could then lead to
downstream assertions to fail because the job was not actually ready,
or in the wrong state.
This changes the test to assert the job is not null, so the assertBusy
operates as intended.
Since introduction of data types that don't have a corresponding type
in ES the `esType` is error-prone when used for `unmappedType()` calls.
Moreover since the renaming of `DATE` to `DATETIME` and the introduction
of an actual date-only `DATE` the `esType` would return `datetime` which
is not a valid type for ES mapping.
Fixes: #38051
For some users, the built in authorization mechanism does not fit their
needs and no feature that we offer would allow them to control the
authorization process to meet their needs. In order to support this,
a concept of an AuthorizationEngine is being introduced, which can be
provided using the security extension mechanism.
An AuthorizationEngine is responsible for making the authorization
decisions about a request. The engine is responsible for knowing how to
authorize and can be backed by whatever mechanism a user wants. The
default mechanism is one backed by roles to provide the authorization
decisions. The AuthorizationEngine will be called by the
AuthorizationService, which handles more of the internal workings that
apply in general to authorization within Elasticsearch.
In order to support external authorization services that would back an
authorization engine, the entire authorization process has become
asynchronous, which also includes all calls to the AuthorizationEngine.
The use of roles also leaked out of the AuthorizationService in our
existing code that is not specifically related to roles so this also
needed to be addressed. RequestInterceptor instances sometimes used a
role to ensure a user was not attempting to escalate their privileges.
Addressing this leakage of roles meant that the RequestInterceptor
execution needed to move within the AuthorizationService and that
AuthorizationEngines needed to support detection of whether a user has
more privileges on a name than another. The second area where roles
leaked to the user is in the handling of a few privilege APIs that
could be used to retrieve the user's privileges or ask if a user has
privileges to perform an action. To remove the leakage of roles from
these actions, the AuthorizationService and AuthorizationEngine gained
methods that enabled an AuthorizationEngine to return the response for
these APIs.
Ultimately this feature is the work included in:
#37785#37495#37328#36245#38137#38219Closes#32435
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:
> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document
We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem.
This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.
Relates to #1078
Currently the snapshot/restore process manually sets the global
checkpoint to the max sequence number from the restored segements. This
does not work for Ccr as this will lead to documents that would be
recovered in the normal followering operation from being recovered.
This commit fixes this issue by setting the initial global checkpoint to
the existing local checkpoint.
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).
This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:
```
PUT index?include_type_name=false
{
"mappings": {
"_doc": {
"properties": { ... }
}
}
}
```
There is a similar issue for both 'put templates' and 'put mappings' requests
as well.
This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
The backport of #38022 introduced types-deprecation warning for get/put template requests
that cause problems on tests master in mixed cluster scenarios. While these warnings are
caught and ignored in regular Rest tests, the get template requests in XPackRestTestHelper
were missed.
Closes#38412
Tests can override assertToXContentEquivalence() in case their xcontent
cannot be directly compared (e.g. due to insertion order in maps
affecting the xcontent ordering). But the `testHlrcFromXContent` test
hardcoded the equivalence test to `true` instead of consulting
`assertToXContentEquivalence()`
Fixes#36034
With this change we no longer support pluggable discovery implementations. No
known implementations of `DiscoveryPlugin` actually override this method, so in
practice this should have no effect on the wider world. However, we were using
this rather extensively in tests to provide the `test-zen` discovery type. We
no longer need a separate discovery type for tests as we no longer need to
customise its behaviour.
Relates #38410
the clock resolution changed from jdk8->jdk10, hence the test is passing
in jdk8 but failing in jdk10. The Watcher's objects are serialised and
deserialised with milliseconds precision, making test to fail in jdk 10
and higher
closes#38400
The test is now expected to be always passing no matter what the random
locale is. This is fixed with using jdk ZoneId.systemDefault() in both
the test and CronEvalTool
closes#35687
If a job cannot be assigned to a node because an index it
requires is unavailable and there are lazy ML nodes then
index unavailable should be reported as the assignment
explanation rather than waiting for a lazy ML node.
Introduced FollowParameters class that put follow, resume follow,
put auto follow pattern requests and follow info response classes reuse.
The FollowParameters class had the fields, getters etc. for the common parameters
that all these APIs have. Also binary and xcontent serialization /
parsing is handled by this class.
The follow, resume follow, put auto follow pattern request classes originally
used optional non primitive fields, so FollowParameters has that too and the follow info api can handle that now too.
Also the followerIndex field can in production only be specified via
the url path. If it is also specified via the request body then
it must have the same value as is specified in the url path. This
option only existed to xcontent testing. However the AbstractSerializingTestCase
base class now also supports createXContextTestInstance() to provide
a different test instance when testing xcontent, so allowing followerIndex
to be specified via the request body is no longer needed.
By moving the followerIndex field from Body to ResumeFollowAction.Request
class and not allowing the followerIndex field to be specified via
the request body the Body class is redundant and can be removed. The
ResumeFollowAction.Request class can then directly use the
FollowParameters class.
For consistency I also removed the ability to specified followerIndex
in the put follow api and the name in put auto follow pattern api via
the request body.
Authn is enabled only if `license_type` is non `basic`, but `basic` is
what the `LicenseService` generates implicitly. This commit explicitly sets
license type to `trial`, which allows for authn, in the `SecuritySettingsSource`
which is the settings configuration parameter for `InternalTestCluster`s.
The real problem, that had created tests failures like #31028 and #32685, is
that the check `licenseState.isAuthAllowed()` can change sporadically. If it were
to return `true` or `false` during the whole test there would be no problem.
The problem manifests when it turns from `true` to `false` right before `Realms.asList()`.
There are other license checks before this one (request filter, token service, etc)
that would not cause a problem if they would suddenly see the check as `false`.
But switching to `false` before `Realms.asList()` makes it appear that no installed
realms could have handled the authn token which is an authentication error, as can
be seen in the failing tests.
Closes#31028#32685
Renames the following settings to remove the mention of `zen` in their names:
- `discovery.zen.hosts_provider` -> `discovery.seed_providers`
- `discovery.zen.ping.unicast.concurrent_connects` -> `discovery.seed_resolver.max_concurrent_resolvers`
- `discovery.zen.ping.unicast.hosts.resolve_timeout` -> `discovery.seed_resolver.timeout`
- `discovery.zen.ping.unicast.hosts` -> `discovery.seed_addresses`
* Adding apm_user
* Fixing SecurityDocumentationIT testGetRoles test
* Adding access to .ml-anomalies-*
* Fixing APM test, we don't have access to the ML state index
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.
This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.
The API keys:-
- by default do not have an expiration but expiration can be
configured where the API keys need to be expired after a
certain amount of time.
- when generated will keep authentication information of the user that
generated them.
- can be defined with a role describing the privileges for accessing
Elasticsearch and will be limited by the role of the user that
generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
before being deleted. The expired API keys remover task handles this.
Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`
The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```
Closes#34383
We mention in our documentation for the token
expiration configuration maximum value is 1 hour
but do not enforce it. This commit adds max limit
to the TOKEN_EXPIRATION setting.
This test should not pass until CCR finishes integrating shard history
retention leases. It currently sometimes passes (which is a bug in the
test), but cannot pass reliably until the linked issue is resolved.
There are two issues regarding the way that we sync mapping from leader
to follower when a ccr restore is completed:
1. The returned mapping from a cluster service might not be up to date
as the mapping of the restored index commit.
2. We should not compare the mapping version of the follower and the
leader. They are not related to one another.
Moreover, I think we should only ensure that once the restore is done,
the mapping on the follower should be at least the mapping of the copied
index commit. We don't have to sync the mapping which is updated after
we have opened a session.
Relates #36879Closes#37887
This is related to #35975. Currently when an index falls behind a leader
it encounters a fatal exception. This commit adds a test for that
scenario. Additionally, it tests that the user can stop following, close
the follower index, and put follow again. After the indexing is
re-bootstrapped, it will recover the documents it lost in normal
following operations.
This commit fixes the pinning of SSLContexts to TLSv1.2 in the
SSLConfigurationReloaderTests. The pinning was added for the initial
creation of clients and webservers but the updated contexts would
default to TLSv1.3, which is known to cause hangs with the
MockWebServer that we use.
Relates #38103Closes#38247
This PR removes the use of document types from the monitoring exporters and template + watches setup code.
It does not remove the notion of types from the monitoring bulk API endpoint "front end" code as that code will eventually just go away in 8.0 and be replaced with Beats as collectors/shippers directly to the monitoring cluster.
At times, we need to check for usage of deprecated settings in settings
which should not be returned by the NodeInfo API. This commit changes
the deprecation info API to run all node checks locally so that these
settings can be checked without exposing them via any externally
accessible API.
This commit introduces a background sync for retention leases. The idea
here is that we do a heavyweight sync when adding a new retention lease,
and then periodically we want to background sync any retention lease
renewals to the replicas. As long as the background sync interval is
significantly lower than the extended lifetime of a retention lease, it
is okay if from time to time a replica misses a sync (it will still have
an older version of the lease that is retaining more data as we assume
that renewals do not decrease the retaining sequence number). There are
two follow-ups that will come after this commit. The first is to address
the fact that we have not adapted the should periodically flush logic to
possibly flush the retention leases. We want to do something like flush
if we have not flushed in the last five minutes and there are renewed
retention leases since the last time that we flushed. An additional
follow-up will remove the syncing of retention leases when a retention
lease expires. Today this sync could be invoked in the background by a
merge operation. Rather, we will move the syncing of retention lease
expiration to be done under the background sync. The background sync
will use the heavyweight sync (write action) if a lease has expired, and
will use the lightweight background sync (replication action) otherwise.
The explanation so far can be invaluable for troubleshooting
as incorrect decisions made early on in the structure analysis
can result in seemingly crazy decisions or timeouts later on.
Relates elastic/kibana#29821
Today the following settings in the `discovery.zen` namespace are still used:
- `discovery.zen.no_master_block`
- `discovery.zen.hosts_provider`
- `discovery.zen.ping.unicast.concurrent_connects`
- `discovery.zen.ping.unicast.hosts.resolve_timeout`
- `discovery.zen.ping.unicast.hosts`
This commit deprecates all other settings in this namespace so that they can be
removed in the next major version.
It would be beneficial to apply some of the request interceptors even
when features are disabled. This change reworks the way we build that
list so that the interceptors we always want to use are constructed
outside of the settings check.
Instead of throwing an exception, use an unresolved attribute to pass
the message to the Verifier.
Additionally improve the parser to save the extended source for the
Aggregate and OrderBy.
Close#38208
The culprit in #38097 is an `IndicesRequest` that has no indices,
but instead of `request.indices()` returning `null` or `String[0]`
it returned `String[] {null}` . This tripped the audit filter.
I have addressed this in two ways:
1. `request.indices()` returning `String[] {null}` is treated as `null`
or `String[0]`, i.e. no indices
2. `null` values among the roles and indices lists, which are
unexpected, will never again stumble the audit filter; `null` values
are treated as special values that will not match any policy,
i.e. their events will always be printed.
Closes#38097
* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.
Adds a Step to the Shrink and Delete actions which prevents those
actions from running on a leader index - all follower indices must first
unfollow the leader index before these actions can run. This prevents
the loss of history before follower indices are ready, which might
otherwise result in the loss of data.
Introduce client-side sorting of groups based on aggregate
functions. To allow this, the Analyzer has been extended to push down
to underlying Aggregate, aggregate function and the Querier has been
extended to identify the case and consume the results in order and sort
them based on the given columns.
The underlying QueryContainer has been slightly modified to allow a view
of the underlying values being extracted as the columns used for sorting
might not be requested by the user.
The PR also adds minor tweaks, mainly related to tree output.
Close#35118
Because concurrent sync requests from a primary to its replicas could be
in flight, it can be the case that an older retention leases collection
arrives and is processed on the replica after a newer retention leases
collection has arrived and been processed. Without a defense, in this
case the replica would overwrite the newer retention leases with the
older retention leases. This commit addresses this issue by introducing
a versioning scheme to retention leases. This versioning scheme is used
to resolve out-of-order processing on the replica. We persist this
version into Lucene and restore it on recovery. The encoding of
retention leases is starting to get a little ugly. We can consider
addressing this in a follow-up.
There was a bug where creating a new policy would start
the ILM service, even if it was stopped. This change ensures
that there is no change to the existing operation mode
This suite still fails one per week sometimes with a worrying assertion.
Sadly we are still unable to find the actual source.
Expected: <SeqNoStats{maxSeqNo=229, localCheckpoint=86, globalCheckpoint=86}>
but: was <SeqNoStats{maxSeqNo=229, localCheckpoint=-1, globalCheckpoint=86}>
This change enables trace log in the suite so we will have a better
picture if this fails again.
Relates #3333
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
Unlike assertBusy, awaitBusy does not retry if the code-block throws an
AssertionError. A refresh in atLeastDocsIndexed can fail because we call
this method while we are closing some node in FollowerFailOverIT.
This PR adds the `monitor/xpack/info` cluster-level privilege to the built-in `monitoring_user` role.
This privilege is required for the Monitoring UI to call the `GET _xpack API` on the Monitoring Cluster. It needs to do this in order to determine the license of the Monitoring Cluster, which further determines whether Cluster Alerts are shown to the user or not.
Resolves#37970.
In 7.x Java timestamp formats are the default timestamp format and
there is no need to prefix them with "8". (The "8" prefix was used
in 6.7 to distinguish Java timestamp formats from Joda timestamp
formats.)
This change removes the "8" prefixes from timestamp formats in the
output of the ML file structure finder.
This commit enables the use of TLSv1.3 with security by enabling us to
properly map `TLSv1.3` in the supported protocols setting to the
algorithm for a SSLContext. Additionally, we also enable TLSv1.3 by
default on JDKs that support it.
An issue was uncovered with the MockWebServer when TLSv1.3 is used that
ultimately winds up in an endless loop when the client does not trust
the server's certificate. Due to this, SSLConfigurationReloaderTests
has been pinned to TLSv1.2.
Closes#32276
In 6.3 trial licenses were changed to default to security
disabled, and ee added some heuristics to detect when security should
be automatically be enabled if `xpack.security.enabled` was not set.
This change removes those heuristics, and requires that security be
explicitly enabled (via the `xpack.security.enabled` setting) for
trial licenses.
Relates: #38009
Currently we use the raw byte array length when calling the IndexInput
read call to determine how many bytes we want to read. However, due to
how BigArrays works, the array length might be longer than the reference
length. This commit fixes the issue and uses the BytesRef length when
calling read. Additionally, it expands the index follow test to index
many more documents. These documents should potentially lead to large
enough segment files to trigger scenarios where this fix matters.
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.
This is a continuation of #28667, #36137 and also fixes#37708.
FIRST and LAST can be used with one argument and work similarly to MIN
and MAX but they are implemented using a Top Hits aggregation and
therefore can also operate on keyword fields. When a second argument is
provided then they return the first/last value of the first arg when its
values are ordered ascending/descending (respectively) by the values of
the second argument. Currently because of the usage of a Top Hits
aggregation FIRST and LAST cannot be used in the HAVING clause of a
GROUP BY query to filter on the results of the aggregation.
Closes: #35639
With #37566 we have introduced the ability to merge multiple search responses into one. That makes it possible to expose a new way of executing cross-cluster search requests, that makes CCS much faster whenever there is network latency between the CCS coordinating node and the remote clusters. The coordinating node can now send a single search request to each remote cluster, which gets reduced by each one of them. from + size results are requested to each cluster, and the reduce phase in each cluster is non final (meaning that buckets are not pruned and pipeline aggs are not executed). The CCS coordinating node performs an additional, final reduction, which produces one search response out of the multiple responses received from the different clusters.
This new execution path will be activated by default for any CCS request unless a scroll is provided or inner hits are requested as part of field collapsing. The search API accepts now a new parameter called ccs_minimize_roundtrips that allows to opt-out of the default behaviour.
Relates to #32125
* Added SSL configuration options tests
Removed the allow.self.signed option from the documentation since we allow
by default self signed certificates as well.
* Added more tests
The existing implementation was slow due to exceptions being thrown if
an accessor did not have a time zone. This implementation queries for
having a timezone, local time and local date and also checks for an
instant preventing to throw an exception and thus speeding up the conversion.
This removes the existing method and create a new one named
DateFormatters.from(TemporalAccessor accessor) to resemble the naming of
the java time ones.
Before this change an epoch millis parser using the toZonedDateTime
method took approximately 50x longer.
Relates #37826
* move watcher to seq# occ
* top level set
* fix parsing and missing setters
* share toXContent for PutResponse and rest end point
* fix redacted password
* fix username reference
* fix deactivate-watch.asciidoc have seq no references
* add seq# + term to activate-watch.asciidoc
* more doc fixes
This commit fixes the test case that ensures only a priority
less then 0 is used with testNonPositivePriority. This also
allows the HLRC to support a value of 0.
Closes#37652
Previously, ShrinkAction would fail if
it was executed on an index that had
the same number of shards as the target
shrunken number.
This PR introduced a new BranchingStep that
is used inside of ShrinkAction to branch which
step to move to next, depending on the
shard values. So no shrink will occur if the
shard count is unchanged.
This commit allows implementors of the `HandledTransportAction` to
specify what thread the action should be executed on. The motivation for
this commit is that certain CCR requests should be performed on the
generic threadpool.
The apache commons http client implementations recently released
versions that solve TLS compatibility issues with the new TLS engine
that supports TLSv1.3 with JDK 11. This change updates our code to
use these versions since JDK 11 is a supported JDK and we should
allow the use of TLSv1.3.
This fixes#38027. Currently we assert that all shards have failed.
However, it is possible that some shards do not have segement files
created yet. The action that we block is fetching these segement files
so it is possible that some shards successfully recover.
This commit changes the assertion to ensure that at least some of the
shards have failed.
Today we pass `discovery.zen.minimum_master_nodes` to nodes started up in
tests, but for 7.x nodes this setting is not required as it has no effect.
This commit removes this setting so that nodes are started with more realistic
configurations, and deprecates it.
Types have been deprecated and this commit removes the documentation
for specifying types in the index action, and search input/transform.
Relates #37594#35190
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.
This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
The certgen, certutil and saml-metadata tools did not correctly return
their exit code to the calling shell.
These commands now explicitly exit with the code that was returned
from the main(args, terminal) method.
This commit fixes a potential race in the IndexFollowingIT. Currently it
is possible that we fetch the task metadata, it is null, and that throws
a null pointer exception. Assertbusy does not catch null pointer
exceptions. This commit assertions that the metadata is not null.
This is related to #35975. It adds a action timeout setting that allows
timeouts to be applied to the individual transport actions that are
used during a ccr recovery.
Restricted indices (currently only .security-6 and .security) are special
internal indices that require setting the `allow_restricted_indices` flag
on every index permission that covers them. If this flag is `false`
(default) the permission will not cover these and actions against them
will not be authorized.
However, the monitoring APIs were the only exception to this rule.
This exception is herein forfeited and index monitoring privileges have to be
granted explicitly, using the `allow_restricted_indices` flag on the permission,
as is the case for any other index privilege.
This commit modifies the put follow index action to use a
CcrRepository when creating a follower index. It routes
the logic through the snapshot/restore process. A
wait_for_active_shards parameter can be used to configure
how long to wait before returning the response.
This change adds a _meta field storing the version in which
the index mappings were last updated to the 3 ML indices
that didn't previously have one:
- .ml-annotations
- .ml-meta
- .ml-notifications
All other ML indices already had such a _meta field.
This field will be useful if we ever need to automatically
update the index mappings during a future upgrade.
Runnables can be submitted to
AutodetectProcessManager.AutodetectWorkerExecutorService
without error after it has been shutdown which can lead
to requests timing out as their handlers are never called
by the terminated executor.
This change throws an EsRejectedExecutionException if a
runnable is submitted after after the shutdown and calls
AbstractRunnable.onRejection on any tasks not run.
Closes#37108
This commit changes the TransportVerifyShardBeforeCloseAction so that it issues a
forced flush, forcing the translog and the Lucene commit to contain the same max seq
number and global checkpoint in the case the Translog contains operations that were
not written in the IndexWriter (like a Delete that touches a non existing doc). This way
the assertion added in #37426 won't trip.
Related to #33888
This commit moves the auditing of job deletion related errors
to the final listener in the job delete action. This ensures
any error that occurs during job deletion is audited.
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.
To populate additional fields node.id and cluster.uuid which are not available at start time,
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one ClusteStateUpdate. Once update is received the nodeId and clusterUUid are set in a static field in a NodeAndClusterIdConverter.
Following fields are expected in JSON log lines: type, tiemstamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
see ESJsonLayout.java for more details and field descriptions
Docker log4j2 configuration is now almost the same as the one use for ES binary.
The only difference is that docker is using console appenders, whereas ES is using file appenders.
relates: #32850
This commit allows JIRA API fields that require a list of key/value
pairs (maps), such as JIRA "components" to use use template snippets
(e.g. {{ctx.payload.foo}}). Prior to this change the templated value
(not the de-referenced value) would be sent via the API and error.
Closes#30068
We inject an Unfollow action before Shrink because the Shrink action
cannot be safely used on a following index, as it may not be fully
caught up with the leader index before the "original" following index is
deleted and replaced with a non-following Shrunken index. The Unfollow
action will verify that 1) the index is marked as "complete", and 2) all
operations up to this point have been replicated from the leader to the
follower before explicitly disconnecting the follower from the leader.
Injecting an Unfollow action before the Rollover action is done mainly
as a convenience: This allow users to use the same lifecycle policy on
both the leader and follower cluster without having to explictly modify
the policy to unfollow the index, while doing what we expect users to
want in most cases.
If the index request is executed before the mapping update is applied on
the IndexShard, the index request will perform a dynamic mapping update.
This mapping update will be timeout (i.e, ProcessClusterEventTimeoutException)
because the latch is not open. This leads to the failure of the index
request and the test. This commit makes sure the mapping is ready
before we execute the index request.
Closes#37807
This commit adds deprecation warnings for index actions
and search actions when executed via watcher. Unit and
integration tests updated accordingly.
relates #35190
* ML: Add MlMetadata.upgrade_mode and API
* Adding tests
* Adding wait conditionals for the upgrade_mode call to return
* Adding tests
* adjusting format and tests
* Adjusting wait conditions for api return and msgs
* adjusting doc tests
* adding upgrade mode tests to black list
We have read and write aliases for the ML results indices. However,
the job still had methods that purported to reliably return the name
of the concrete results index being used by the job. After reindexing
prior to upgrade to 7.x this will be wrong, so the method has been
renamed and the comments made more explicit to say the returned index
name may not be the actual concrete index name for the lifetime of the
job. Additionally, the selection of indices when deleting the job
has been changed so that it works regardless of concrete index names.
All these changes are nice-to-have for 6.7 and 7.0, but will become
critical if we add rolling results indices in the 7.x release stream
as 6.7 and 7.0 nodes may have to operate in a mixed version cluster
that includes a version that can roll results indices.
* Changed `LuceneSnapshot` to throw an `OperationsMissingException` if the requested ops are missing.
* Changed the shard changes api to handle the `OperationsMissingException` and wrap the exception into `ResourceNotFound` exception and include metadata to indicate the requested range can no longer be retrieved.
* Changed `ShardFollowNodeTask` to handle this `ResourceNotFound` exception with the included metdata header.
Relates to #35975
Replace `threadPool().schedule()` / catch
`EsRejectedExecutionException` pattern with direct calls to
`ThreadPool#scheduleUnlessShuttingDown()`.
Closes#36318
This commit introduces the `create_snapshot` cluster privilege and
the `snapshot_user` role.
This role is to be used by "cronable" tools that call the snapshot API
periodically without recurring to the `manage` cluster privilege. The
`create_snapshot` cluster privilege is much more limited compared to
the `manage` privilege.
The `snapshot_user` role grants the privileges to view the metadata of
all indices (including restricted ones, i.e. .security). It obviously grants the
create snapshot privilege but the repository has to be created using another
role. In addition, it grants the privileges to (only) GET repositories and
snapshots, but not create and delete them.
The role does not allow to create repositories. This distinction is important
because snapshotting equates to the `read` index privilege if the user has
control of the snapshot destination, but this is not the case in this instance,
because the role does not grant control over repository configuration.
This commit introduces retention lease syncing from the primary to its
replicas when a new retention lease is added. A follow-up commit will
add a background sync of the retention leases as well so that renewed
retention leases are synced to replicas.
Made the test tolerant to index upgrade being run
in between the old/mixed/upgraded portions. This
can occur because the rolling upgrade tests all
share the same indices.
Fixes#37763
The ML file structure finder has always reported both Joda
and Java time format strings. This change makes the Java time
format strings the ones that are incorporated into mappings
and ingest pipeline definitions.
The BWC syntax of prepending "8" to these formats is used.
This will need to be removed once Java time format strings
become the default in Elasticsearch.
This commit also removes direct imports of Joda classes in the
structure finder unit tests. Instead the core Joda BWC class
is used.
* Exit batch files explictly using ERRORLEVEL
This makes sure the exit code is preserved when calling the batch
files from different contexts other than DOS
Fixes#29582
This also fixes specific error codes being masked by an explict
exit /b 1
causing the useful exitcodes from ExitCodes to be lost.
* fix line breaks for calling cli to match the bash scripts
* indent size of bash files is 2, make sure editorconfig does the same for bat files
* update indenting to match bash files
* update elasticsearch-keystore.bat indenting
* Update elasticsearch-node.bat to exit outside of endlocal
The TransportUnfollowAction updates the index settings but does not
increase the settings version to reflect that change.
This issue has been caught while working on the replication of closed
indices (#33888). The IndexFollowingIT.testUnfollowIndex() started to
fail and this specific assertion tripped. It does not happen on master
branch today because index metadata for closed indices are never
updated in IndexService instances, but this is something that is going
to change with the replication of closed indices.
The unlucky timing can cause this test to fail when the indexing is triggered from `maybeTriggerAsyncJob`. As this is asynchronous, in can finish quicker then the test stepping over to next assertion
The introduced barrier solves the problem
closes#37695
* Remove empty statements
There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.
* Change test, slightly more verbose but less confusing
When upgrading from 5.4 to 5.5 to 6.7 (inclusive) it was
necessary to ensure there was a mapping for type "doc" on
the ML state index before opening a job. This was because
5.4 created a multi-type ML state index.
In version 7.x we can be sure that any such 5.4 index is no
longer in use. It would have had to be reindexed into the
6.x index format prior to the upgrade to version 7.x.
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.
Closes#33028
When an index is frozen, two index settings are updated (index.frozen and
index.search.throttled) but the settings version is left unchanged and does
not reflect the settings update. This commit change the
TransportFreezeIndexAction so that it also increases the settings version
when an index is frozen/unfrozen.
This issue has been caught while working on the replication of closed
indices (#3388) in which index metadata for a closed index are updated
to frozen metadata and this specific assertion tripped.
The default value for ssl.supported_protocols no longer includes TLSv1
as this is an old protocol with known security issues.
Administrators can enable TLSv1.0 support by configuring the
appropriate `ssl.supported_protocols` setting, for example:
xpack.security.http.ssl.supported_protocols: ["TLSv1.2","TLSv1.1","TLSv1"]
Relates: #36021
This deprecates the `xpack.watcher.history.cleaner_service.enabled` setting,
since all newly created `.watch-history` indices in 7.0 will use ILM to manage
their retention.
In 8.0 the setting itself and cleanup actions will be removed.
Resolves#32041
Today, the mapping on the follower is managed and replicated from its
leader index by the ShardFollowTask. Thus, we should prevent users
from modifying the mapping on the follower indices.
Relates #30086
When the arguements of PERCENTILE and PERCENTILE_RANK can be folded,
the `ConstantFolding` rule kicks in and calls the `replaceChildren()`
method on `InnerAggregate` which is created from the aggregation rules
of the `Optimizerz. `InnerAggregate` in turn, cannot implement the method
as the logic of creating a new `InnerAggregate` instance from a list of
`Expression`s resides in the Optimizer. So, instead, `ConstantFolding`
should be applied before any of the aggregations related rules.
Fixes: #37099
* Testing conventions now checks for tests in main
This is the last outstanding feature of the old NamingConventionsTask,
so time to remove it.
* PR review
This change adds a docker compose configuration that's used with
the `elasticsearch.test.fixtures` plugin to start up the image
and check that the TCP ports are up.
We can build on this to add other checks for culster health,
run REST tests, etc.
We can add multiple containers and configurations to the compose
file (e.x. test different env vars) and form clusters.
When we can't map the principal attribute from the configured SAML
attribute in the realm settings, we can't complete the
authentication. We return an error to the user indicating this and
we present them with a list of attributes we did get from the SAML
response to point out that the expected one was not part of that
list. This list will never contain the NameIDs though as they are
not part of the SAMLAttribute list. So we might have a NameID but
just with a different format.
This commit removes the Index Audit Output type, following its deprecation
in 6.7 by 8765a31d4e6770. It also adds the migration notice (settings notice).
In general, the problem with the index audit output is that event indexing
can be slower than the rate with which audit events are generated,
especially during the daily rollovers or the rolling cluster upgrades.
In this situation audit events will be lost which is a terrible failure situation
for an audit system.
Besides of the settings under the `xpack.security.audit.index` namespace, the
`xpack.security.audit.outputs` setting has also been deprecated and will be
removed in 7. Although explicitly configuring the logfile output does not touch
any deprecation bits, this setting is made redundant in 7 so this PR deprecates
it as well.
Relates #29881
The filtering by follower index was completely broken.
Also the wrong persistent tasks were selected, causing the
wrong status to be reported.
Closes#37738
Today we keep the mapping on the follower in sync with the leader's
using the mapping version from changes requests. There are two rare
cases where the mapping on the follower is not synced properly:
1. The returned mapping version (from ClusterService) is outdated than
the actual mapping. This happens because we expose the latest cluster
state in ClusterService after applying it to IndexService.
2. It's possible for the FollowTask to receive an outdated mapping than
the min_required_mapping. In that case, it should fetch the mapping
again; otherwise, the follower won't have the right mapping.
Relates to #31140
In some cases we only have a string collection instead of a string list
that we want to serialize out. We have a convenience method for writing
a list of strings, but no such method for writing a collection of
strings. Yet, a list of strings is a collection of strings, so we can
simply liberalize StreamOutput#writeStringList to be more generous in
the collections that it accepts and write out collections of strings
too. On the other side, we do not have a convenience method for reading
a list of strings. This commit addresses both of these issues.
* Use ILM for Watcher history deletion
This commit adds an index lifecycle policy for the `.watch-history-*` indices.
This policy is automatically used for all new watch history indices.
This does not yet remove the automatic cleanup that the monitoring plugin does
for the .watch-history indices, and it does not touch the
`xpack.watcher.history.cleaner_service.enabled` setting.
Relates to #32041
Some steps, such as steps that delete, close, or freeze an index, may fail due to a currently running snapshot of the index. In those cases, rather than move to the ERROR step, we should retry the step when the snapshot has completed.
This change adds an abstract step (`AsyncRetryDuringSnapshotActionStep`) that certain steps (like the ones I mentioned above) can extend that will automatically handle a situation where a snapshot is taking place. When a `SnapshotInProgressException` is received by the listener wrapper, a `ClusterStateObserver` listener is registered to wait until the snapshot has completed, re-running the ILM action when no snapshot is occurring.
This also adds integration tests for these scenarios (thanks to @talevy in #37552).
Resolves#37541
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.
The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.
Relates #27330
This change moves the update to the results index mappings
from the open job action to the code that starts the
autodetect process.
When a rolling upgrade is performed we need to update the
mappings for already-open jobs that are reassigned from an
old version node to a new version node, but the open job
action is not called in this case.
Closes#37607
While tests migration from Zen1 to Zen2, we've encountered this test.
This test is organized as follows:
Starts the first cluster node.
Starts the second cluster node.
Checks that license is active.
Interesting fact that adding assertLicenseActive(true) between 1
and 2 also makes the test pass.
assertLicenseActive retrieves XPackLicenseState from the nodes
and checks that active flag is set. It's set to true even before
the cluster is initialized.
So this test does not make sense.
This adds a set of helper classes to determine if an agg "has a value".
This is needed because InternalAggs represent "empty" in different
manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc.
A user can pass the Internal agg type to one of these helper methods
and it will report if the agg contains a value or not, which allows the
user to differentiate "empty" from a real `NaN`.
These helpers are best-effort in some cases. For example, several
pipeline aggs share a single return class but use different conventions
to mark "empty", so the helper uses the loosest definition that applies
to all the aggs that use the class.
Sums in particular are unreliable. The InternalSum simply returns 0.0
if the agg is empty (which is correct, no values == sum of zero). But this
also means the helper cannot differentiate from "empty" and `+1 + -1`.
It looks like the output of FileUserPasswdStore.parseFile shouldn't be wrapped
into another map since its output can be null. Doing this wrapping after the null
check (which potentially raises an exception) instead.
Use PEM files for the key/cert for TLS on the http layer of the
node instead of a JKS keystore so that the tests can also run
in a FIPS 140 JVM .
Resolves: #37682
* Add separate CLI Mode
* Use the correct Mode for cursor close requests
* Renamed CliFormatter and have different formatting behavior for CLI and "text" format.
Due to missing stubbing for `NativePrivilegeStore#getPrivileges`
the test `testNegativeLookupsAreCached` failed
when the superuser role name was present in the role names.
This commit adds missing stubbing.
Closes: #37657
Currently we create dedicated network threads for both the http and
transport implementations. Since these these threads should never
perform blocking operations, these threads could be shared. This commit
modifies the nio-transport to have 0 http workers be default. If the
default configs are used, this will cause the http transport to be run
on the transport worker threads. The http worker setting will still exist
in case the user would like to configure dedicated workers. Additionally,
this commmit deletes dedicated acceptor threads. We have never had these
for the netty transport and they can be added back if a need is
determined in the future.
The integ tests currently use the raw zip project name as the
distribution type. This commit simplifies this specification to be
"default" or "oss". Whether zip or tar is used should be an internal
implementation detail of the integ test setup, which can (in the future)
be platform specific.
This grants the capability to grant privileges over certain restricted
indices (.security and .security-6 at the moment).
It also removes the special status of the superuser role.
IndicesPermission.Group is extended by adding the `allow_restricted_indices`
boolean flag. By default the flag is false. When it is toggled, you acknowledge
that the indices under the scope of the permission group can cover the
restricted indices as well. Otherwise, by default, restricted indices are ignored
when granting privileges, thus rendering them hidden for authorization purposes.
This effectively adds a confirmation "check-box" for roles that might grant
privileges to restricted indices.
The "special status" of the superuser role has been removed and coded as
any other role:
```
new RoleDescriptor("superuser",
new String[] { "all" },
new RoleDescriptor.IndicesPrivileges[] {
RoleDescriptor.IndicesPrivileges.builder()
.indices("*")
.privileges("all")
.allowRestrictedIndices(true)
// this ----^
.build() },
new RoleDescriptor.ApplicationResourcePrivileges[] {
RoleDescriptor.ApplicationResourcePrivileges.builder()
.application("*")
.privileges("*")
.resources("*")
.build()
},
null, new String[] { "*" },
MetadataUtils.DEFAULT_RESERVED_METADATA,
Collections.emptyMap());
```
In the context of the Backup .security work, this allows the creation of a
"curator role" that would permit listing (get settings) for all indices
(including the restricted ones). That way the curator role would be able to
ist and snapshot all indices, but not read or restore any of them.
Supersedes #36765
Relates #34454
Removes all sensitive settings (passwords, auth tokens, urls, etc...) for
watcher notifications accounts. These settings were deprecated (and
herein removed) in favor of their secure sibling that is set inside the
elasticsearch keystore. For example:
`xpack.notification.email.account.<id>.smtp.password`
is no longer a valid setting, and it is replaced by
`xpack.notification.email.account.<id>.smtp.secure_password`
The ML subproject of xpack has a cache for the cpp artifact snapshots
which is checked on each build. The cache is outside of the build dir so
that it is not wiped on a typical clean, as the artifacts can be large
and do not change often. This commit adds a cleanCache task which will
wipe the cache dir, as over time the size of the directory can become
bloated.
Currently we add the CcrRestoreSourceService as a index event
listener. However, if ccr is disabled, this service is null and we
attempt to add a null listener throwing an exception. This commit only
adds the listener if ccr is enabled.
This is related to #35975. This commit adds timeout functionality to
the local session on a leader node. When a session is started, a timeout
is scheduled using a repeatable runnable. If the session is not accessed
in between two runs the session is closed. When the sssion is closed,
the repeating task is cancelled.
Additionally, this commit moves session uuid generation to the leader
cluster. And renames the PutCcrRestoreSessionRequest to
StartCcrRestoreSessionRequest to reflect that change.
* Remove obsolete deprecation checks
This also updates the old-indices check to be appropriate for the 7.x
series of releases, and leaves it as the only deprecation check in
place.
* Add toString to DeprecationIssue
* Bring filterChecks across from 6.x
* License headers
This change adds the unfollow action for CCR follower indices.
This is needed for the shrink action in case an index is a follower index.
This will give the follower index the opportunity to fully catch up with
the leader index, pause index following and unfollow the leader index.
After this the shrink action can safely perform the ilm shrink.
The unfollow action needs to be added to the hot phase and acts as
barrier for going to the next phase (warm or delete phases), so that
follower indices are being unfollowed properly before indices are expected
to go in read-only mode. This allows the force merge action to execute
its steps safely.
The unfollow action has three steps:
* `wait-for-indexing-complete` step: waits for the index in question
to get the `index.lifecycle.indexing_complete` setting be set to `true`
* `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks
for the index being handled to report that the leader shard global checkpoint
is equal to the follower shard global checkpoint.
* `pause-follower-index` step: Pauses index following, necessary to unfollow
* `close-follower-index` step: Closes the index, necessary to unfollow
* `unfollow-follower-index` step: Actually unfollows the index using
the CCR Unfollow API
* `open-follower-index` step: Reopens the index now that it is a normal index
* `wait-for-yellow` step: Waits for primary shards to be allocated after
reopening the index to ensure the index is ready for the next step
In the case of the last two steps, if the index in being handled is
a regular index then the steps acts as a no-op.
Relates to #34648
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Gordon Brown <gordon.brown@elastic.co>
This change fixes the setup of the SSL configuration for the test
openldap realm. The configuration was missing the realm identifier so
the SSL settings being used were just the default JDK ones that do not
trust the certificate of the idp fixture.
See #37591
Throws an exception if hit extractor tries to retrieve unsupported
object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws
an exception instead of returning null.
Relates to #37364
* Add ccr follow info api
This api returns all follower indices and per follower index
the provided parameters at put follow / resume follow time and
whether index following is paused or active.
Closes#37127
* iter
* [DOCS] Edits the get follower info API
* [DOCS] Fixes link to remote cluster
* [DOCS] Clarifies descriptions for configured parameters
Commit #37535 removed an internal restore request in favor of the
RestoreSnapshotRequest. Commit #37449 added a new test that used the
internal restore request. This commit modifies the new test to use the
RestoreSnapshotRequest.
This is a continuation of #28667 and has as goal to convert all executors to propagate errors to the
uncaught exception handler. Notable missing ones were the direct executor and the scheduler. This
commit also makes it the property of the executor, not the runnable, to ensure this property. A big
part of this commit also consists of vastly improving the test coverage in this area.
This commit adds a set_priority action to the hot, warm, and cold
phases for an ILM policy. This action sets the `index.priority`
on the managed index to allow different priorities between the
hot, warm, and cold recoveries.
This commit also includes the HLRC and documentation changes.
closes#36905
* SQL: Rename SQL data type DATE to DATETIME
SQL data type DATE has only the date part (e.g.: 2019-01-14)
without any time information. Previously the SQL type DATE was
referring to the ES DATE which contains also the time part along
with TZ information. To conform with SQL data types the data type
`DATE` is renamed to `DATETIME`, since it includes also the time,
as a new runtime SQL `DATE` data type will be introduced down the road,
which only contains the date part and meets the SQL standard.
Closes: #36440
* Address comments
When reporting metadata, several clients have issues with the 'ALIAS'
type. To improve compatibility and be consistent with the ANSI SQL
expectations and because they are similar, aliases targets are now
reported as views.
Close#37422
Currently all proxied actions are denied for the `SystemPrivilege`.
Unfortunately, there are use cases (CCR) where we would like to proxy
actions to a remote node that are normally performed by the
system context. This commit allows the system context to perform
proxy actions if they are actions that the system context is normally
allowed to execute.
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560
relates #34488
This change fixes failures in the SslMultiPortTests where we attempt to
connect to a profile on a port it is listening on but the connection
fails. The failure is due to the profile being bound to multiple
addresses and randomization will pick one of these addresses to
determine the listening port. However, the address we get the port for
may not be the address we are actually connecting to. In order to
resolve this, the test now sets the bind host for profiles to the
loopback address and uses the same address for connecting.
Closes#37481
Migrate ml job and datafeed config of open jobs and update
the parameters of the persistent tasks as they become unallocated
during a rolling upgrade. Block allocation of ml persistent tasks
until the configs are migrated.
Currently when there are no more auto follow patterns for a remote cluster then
the AutoFollower instance for this remote cluster will be removed. If
a new auto follow pattern for this remote cluster gets added quickly enough
after the last delete then there may be two AutoFollower instance running
for this remote cluster instead of one.
Each AutoFollower instance stops automatically after it sees in the
start() method that there are no more auto follow patterns for the
remote cluster it is tracking. However when an auto follow pattern
gets removed and then added back quickly enough then old AutoFollower
may never detect that at some point there were no auto follow patterns
for the remote cluster it is monitoring. The creation and removal of
an AutoFollower instance happens independently in the `updateAutoFollowers()`
as part of a cluster state update.
By adding the `removed` field, an AutoFollower instance will not miss the
fact there were no auto follow patterns at some point in time. The
`updateAutoFollowers()` method now marks an AutoFollower instance as
removed when it sees that there are no more patterns for a remote cluster.
The updateAutoFollowers() method can then safely start a new AutoFollower
instance.
Relates to #36761
This change deletes the SslNullCipherTests from our codebase since it
will have issues with newer JDK versions and it is essentially testing
JDK functionality rather than our own. The upstream JDK issue for
disabling these ciphers by default is
https://bugs.openjdk.java.net/browse/JDK-8212823.
Closes#37403
The test that remote clusters used by ML datafeeds have
a license that allows ML was not accounting for the
possibility that the remote cluster name could be
wildcarded. This change fixes that omission.
Fixes#36228
The SourceOnlySnapshotIT class tests a source only repository
using the following scenario:
starts a master node
starts a data node
creates a source only repository
creates an index with documents
snapshots the index to the source only repository
deletes the index
stops the data node
starts a new data node
restores the index
Thanks to ESIntegTestCase the index is sometimes created using a custom
data path. With such a setting, when a shard is assigned to one of the data
node of the cluster the shard path is resolved using the index custom data
path and the node's lock id by the NodeEnvironment#resolveCustomLocation().
It should work nicely but in SourceOnlySnapshotIT.snashotAndRestore(), b
efore the change in this PR, the last data node was restarted using a different
path.home. At startup time this node was assigned a node lock based on other
locks in the data directory of this temporary path.home which is empty. So it
always got the 0 lock id. And when this new data node is assigned a shard for
the index and resolves it against the index custom data path, it also uses the
node lock id 0 which conflicts with another node of the cluster, resulting in
various errors with the most obvious one being LockObtainFailedException.
This commit removes the temporary home path for the last data node so that it
uses the same path home as other nodes of the cluster and then got assigned
a correct node lock id at startup.
Closes#36330Closes#36276
Adjust FieldExtractor to handle fields which contain `.` in their
name, regardless where they fall in, in the document hierarchy. E.g.:
```
{
"a.b": "Elastic Search"
}
{
"a": {
"b.c": "Elastic Search"
}
}
{
"a.b": {
"c": {
"d.e" : "Elastic Search"
}
}
}
```
Fixes: #37128
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes#29797
This new `hostname` field is meant to be a replacement for its sibling `name` field. See https://github.com/elastic/beats/pull/9943, particularly https://github.com/elastic/beats/pull/9943#discussion_r245932581.
This PR simply adds the new field (`hostname`) to the mapping without removing the old one (`name`), because a user might be running an older-version Beat (without this field rename in it) with a newer-version Monitoring ES cluster (with this PR's change in it).
AFAICT the Monitoring UI isn't currently using the `name` field so no changes are necessary there yet. If it decides to start using the `name` field, it will also want to look at the value of the `hostname` field.
This is related to #35975. It implements a file based restore in the
CcrRepository. The restore transfers files from the leader cluster
to the follower cluster. It does not implement any advanced resiliency
features at the moment. Any request failure will end the restore.
Adds another field, named "request.method", to the structured logfile audit.
This field is present for all events associated with a REST request (not a
transport request) and the value is one of GET, POST, PUT, DELETE, OPTIONS,
HEAD, PATCH, TRACE and CONNECT.
Improve error messages by returning the original SQL statement
declaration instead of trying to reproduce it as the casing and
whitespaces are not preserved accurately leading to small
differences.
Close#37161
Since `full` can be common as a field name or part of a field name
(e.g.: `full.name` or `name.full`), it's nice if it's not a reserved
keyword of the grammar so a user can use it without resorting to quotes.
Fixes: #37376
SqlPlugin cannot have more than one public constructor, so for the testing
purposes the `getLicenseState()` should be overriden.
Fixes: #37191
Co-authored-by: Michael Basnight <mbasnight@gmail.com>
When this message was first added the model debug config was
the only thing that could be updated, but now more aspects of
the config can be updated so the message needs to be more
general.
* ML: Updating .ml-state calls to be able to support > 1 index
* Matching bulk delete behavior with dbq
* Adjusting state name
* refreshing indices before search
* fixing line length
* adjusting index expansion options
This is a reinforcement of #37227. It turns out that
persistent tasks are not made stale if the node they
were running on is restarted and the master node does
not notice this. The main scenario where this happens
is when minimum master nodes is the same as the number
of nodes in the cluster, so the cluster cannot elect a
master node when any node is restarted.
When an ML node restarts we need the datafeeds for any
jobs that were running on that node to not just wait
until the jobs are allocated, but to wait for the
autodetect process of the job to start up. In the case
of reassignment of the job persistent task this was
dealt with by the stale status test. But in the case
where a node restarts but its persistent tasks are not
reassigned we need a deeper test.
Fixes#36810
This adds a configurable whitelist to the HTTP client in watcher. By
default every URL is allowed to retain BWC. A dynamically configurable
setting named "xpack.http.whitelist" was added that allows to
configure an array of URLs, which can also contain simple regexes.
Closes#29937
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
This change ensures we always countdown the latch in the
SSLConfigurationReloaderTests to prevent the suite from timing out in
case of an exception. Additionally, we also increase the logging of the
resource watcher in case an IOException occurs.
See #36053
Field of types aliases that have dots in name are returned without a
hierarchy by field_caps, as oppose to the mapping api or field with
concrete types, which in turn breaks IndexResolver.
This commit fixes this by creating the backing hierarchy similar to the
mapping api.
Close#37224
Jobs created in version 6.1 or earlier can have a
null model_memory_limit. If these are parsed from
cluster state following a full cluster restart then
we replace the null with 4096mb to make the meaning
explicit. But if such jobs are streamed from an
old node in a mixed version cluster this does not
happen. Therefore we need to account for the
possibility of a null model_memory_limit in the ML
memory tracker.
This commit reorders the realm list for iteration based on the last
successful authentication for the given principal. This is an
optimization to prevent unnecessary iteration over realms if we can
make a smart guess on which realm to try first.
Fail with a 403 when indexing a document directly into a follower index.
In order to test this change, I had to move specific assertions into a dedicated class and
disable assertions for that class in the rest qa module. I think that is the right trade off.
If a running shard follow task needs to be restarted and
the remote connection seeds have changed then
a shard follow task currently fails with a fatal error.
The change creates the remote client lazily and adjusts
the errors a shard follow task should retry.
This issue was found in test failures in the recently added
ccr rolling upgrade tests. The reason why this issue occurs
more frequently in the rolling upgrade test is because ccr
is setup in local mode (so remote connection seed will become stale) and
all nodes are restarted, which forces the shard follow tasks to get
restarted at some point during the test. Note that these tests
cannot be enabled yet, because this change will need to be backported
to 6.x first. (otherwise the issue still occurs on non upgraded nodes)
I also changed the RestartIndexFollowingIT to setup remote cluster
via persistent settings and to also restart the leader cluster. This
way what happens during the ccr rolling upgrade qa tests, also happens
in this test.
Relates to #37231
* Tests: Add ElasticsearchAssertions.awaitLatch method
Some tests are using assertTrue(latch.await(...)) in their code. This
leads to an assertion error without any error message. This adds a
method which has a nicer error message and can be used in tests.
* fix forbidden apis
* fix spaces
* provide overriden `hashCode` and toString methods to account for `DISTINCT`
* change the analyzer for scenarios where `COUNT <field_name>` and `COUNT DISTINCT` have different paths
* defined a new `filter` aggregation encapsulating an `exists` query to filter out null or missing values
* ML: add migrate anomalies assistant
* adjusting failure handling for reindex
* Fixing request and tests
* Adding tests to blacklist
* adjusting test
* test fix: posting data directly to the job instead of relying on datafeed
* adjusting API usage
* adding Todos and adjusting endpoint
* Adding types to reindexRequest
* removing unreliable "live" data test
* adding index refresh to test
* adding index refresh to test
* adding index refresh to yaml test
* fixing bad exists call
* removing todo
* Addressing remove comments
* Adjusting rest endpoint name
* making service have its own logger
* adjusting validity check for newindex names
* fixing typos
* fixing renaming
This commit fixes a race condition in a test introduced by #36900 that
verifies concurrent authentications get a result propagated from the
first thread that attempts to authenticate. Previously, a thread may
be in a state where it had not attempted to authenticate when the first
thread that authenticates finishes the authentication, which would
cause the test to fail as there would be an additional authentication
attempt. This change adds additional latches to ensure all threads have
attempted to authenticate before a result gets returned in the
thread that is performing authentication.
Closing a channel using TLS/SSL requires reading and writing a
CLOSE_NOTIFY message (for pre-1.3 TLS versions). Many implementations do
not actually send the CLOSE_NOTIFY message, which means we are depending
on the TCP close from the other side to ensure channels are closed. In
case there is an issue with this, we need a timeout. This commit adds a
timeout to the channel close process for TLS secured channels.
As part of this change, we need a timer service. We could use the
generic Elasticsearch timeout threadpool. However, it would be nice to
have a local to the nio event loop timer service dedicated to network needs. In
the future this service could support read timeouts, connect timeouts,
request timeouts, etc. This commit adds a basic priority queue backed
service. Since our timeout volume (channel closes) is very low, this
should be fine. However, this can be updated to something more efficient
in the future if needed (timer wheel). Everything being local to the event loop
thread makes the logic simple as no locking or synchronization is necessary.
We already had logic to stop datafeeds running against
jobs that were OPENING, but a job that relocates from
one node to another while OPENED stays OPENED, and this
could cause the datafeed to fail when it sent data to
the OPENED job on its new node before it had a
corresponding autodetect process.
This change extends the check to stop datafeeds running
when their job is OPENING _or_ stale (i.e. has not had
its status reset since relocating to a different node).
Relates #36810
This bug was introduced in #36893 and had the effect that
execution would continue after calling onFailure on the the
listener in checkIfTokenIsValid in the case that the token is
expired. In a case of many consecutive requests this could lead to
the unwelcome side effect of an expired access token producing a
successful authentication response.
After #30794, our caching realms limit each principal to a single auth
attempt at a time. This prevents hammering of external servers but can
cause a significant performance hit when requests need to go through a
realm that takes a long time to attempt to authenticate in order to get
to the realm that actually authenticates. In order to address this,
this change will propagate failed results to listeners if they use the
same set of credentials that the authentication attempt used. This does
prevent these stalled requests from retrying the authentication attempt
but the implementation does allow for new requests to retry the
attempt.
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one
This commit implements a straightforward approach to retention lease
expiration. Namely, we inspect which leases are expired when obtaining
the current leases through the replication tracker. At that moment, we
clean the map that persists the retention leases in memory.
Today, a setting can declare that its validity depends on the values of other
related settings. However, the validity of a setting is not always checked
against the correct values of its dependent settings because those settings'
correct values may not be available when the validator runs.
This commit separates the validation of a settings updates into two phases,
with separate methods on the `Setting.Validator` interface. In the first phase
the setting's validity is checked in isolation, and in the second phase it is
checked again against the values of its related settings. Most settings only
use the first phase, and only the few settings with dependencies make use of
the second phase.
This commit is the first in a series which will culminate with
fully-functional shard history retention leases.
Shard history retention leases are aimed at preventing shard history
consumers from having to fallback to expensive file copy operations if
shard history is not available from a certain point. These consumers
include following indices in cross-cluster replication, and local shard
recoveries. A future consumer will be the changes API.
Further, index lifecycle management requires coordinating with some of
these consumers otherwise it could remove the source before all
consumers have finished reading all operations. The notion of shard
history retention leases that we are introducing here will also be used
to address this problem.
Shard history retention leases are a property of the replication group
managed under the authority of the primary. A shard history retention
lease is a combination of an identifier, a retaining sequence number, a
timestamp indicating when the lease was acquired or renewed, and a
string indicating the source of the lease. Being leases they have a
limited lifespan that will expire if not renewed. The idea of these
leases is that all operations above the minimum of all retaining
sequence numbers will be retained during merges (which would otherwise
clear away operations that are soft deleted). These leases will be
periodically persisted to Lucene and restored during recovery, and
broadcast to replicas under certain circumstances.
This commit is merely putting the basics in place. This first commit
only introduces the concept and integrates their use with the soft
delete retention policy. We add some tests to demonstrate the basic
management is correct, and that the soft delete policy is correctly
influenced by the existence of any retention leases. We make no effort
in this commit to implement any of the following:
- timestamps
- expiration
- persistence to and recovery from Lucene
- handoff during primary relocation
- sharing retention leases with replicas
- exposing leases in shard-level statistics
- integration with cross-cluster replication
These will occur individually in follow-up commits.
This pull request changes the Freeze Index and Close Index actions so
that these actions always requires a Task. The task's id is then propagated
from the Freeze action to the Close action, and then to the Verify shard action.
This way it is possible to track which Freeze task initiates the closing of an index,
and which consecutive verifiy shard are executed for the index closing.
As suggested in #36775, this pull request renames the following methods:
ClusterBlocks.hasGlobalBlock(int)
ClusterBlocks.hasGlobalBlock(RestStatus)
ClusterBlocks.hasGlobalBlock(ClusterBlockLevel)
to something that better reflects the property of the ClusterBlock that is searched for:
ClusterBlocks.hasGlobalBlockWithId(int)
ClusterBlocks.hasGlobalBlockWithStatus(RestStatus)
ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
Improve error message returned to the client when an SQL statement
cannot be translated to a ES query DSL. Cases:
1. WHERE clause evaluates to FALSE => No results returned
1. Missing FROM clause => Local execution, e.g.: SELECT SIN(PI())
3. Special SQL command => Only valid of SQL iface, e.g.: SHOW TABLES
Fixes: #37040
Logical operators OR and AND as well as conditional functions
(COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one
of their children is NULL as is the case for most of the functions.
Therefore, their nullable() implementation cannot return true. On
the other hand they cannot return false as if they're wrapped within
an IS NULL or IS NOT NULL expression, the expression will be folded
to false and true respectively leading to wrong results.
Change the signature of nullable() method and add a third value UKNOWN
to handle these cases.
Fixes: #35872
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade off to speed up searches while still being able to know a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set as an integer this option allows to compute a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.
Relates #33028
When a 6.1-6.3 job is opened in a later version
we increase the model memory limit by 30% if it's
below 0.5GB. The migration of jobs from cluster
state to the config index changes the job version,
so we need to also do this uplift as part of that
config migration.
Relates #36961
The unused state remover was never adjusted to account for jobs stored
in the config index. The result was that when triggered it removed
state for all jobs stored in the config index.
This commit fixes the issue.
Closes#37109
Improve parsing to save the source for each token alongside the location
of each Node/Expression for accurate reproducibility of an expression
name and source
Fix#36894
Removes the `xpack.ml.max_model_memory_limit` cluster setting
at the teardown of the `ml_info.yml` tests to ensure the setting
does not trip other tests.
Enhance error message for the case that the 2nd argument of PERCENTILE
and PERCENTILE_RANK is not a foldable, as it doesn't make sense to have
a dynamic value coming from a field.
Fixes: #36903
We run subsequent token invalidation requests and we still want to
trigger the deletion of expired tokens so we need to lower the
deleteInterval parameter significantly. Especially now that the
bwc expiration logic is removed and the invalidation process is
much shorter
Resolves#37063
Changes the feature usage retrieval to use the job manager rather than
directly talking to the cluster state, because jobs can now be either in
cluster state or stored in an index
This is a follow-up of #36702 / #36698
- Removes bwc invalidation logic from the TokenService
- Removes bwc serialization for InvalidateTokenResponse objects as
old nodes in supported mixed clusters during upgrade will be 6.7 and
thus will know of the new format
- Removes the created field from the TokensInvalidationResult and the
InvalidateTokenResponse as it is no longer useful in > 7.0
If we don't explicitly sett the client SSLSocketFactory when
creating an InMemoryDirectoryServer and setting its SSL config, it
will result in using a TrustAllTrustManager(that extends
X509TrustManager) which is not allowed in a FIPS 140 JVM.
Instead, we get the SSLSocketFactory from the existing SSLContext
and pass that to be used.
Resolves#37013
The phrase "missing authentication token" is historic and is based
around the use of "AuthenticationToken" objects inside the Realm code.
However, now that we have a TokenService and token API, this message
would sometimes lead people in the wrong direction and they would try
and generate a "token" for authentication purposes when they would
typically just need a username:password Basic Auth header.
This change replaces the word "token" with "credentials".
In #30509 we changed the way SSL configuration is reloaded when the
content of a file changes. As a consequence of that implementation
change the LDAP realm ceased to pick up changes to CA files (or other
certificate material) if they changed.
This commit repairs the reloading behaviour for LDAP realms, and adds
a test for this functionality.
Resolves: #36923
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
The AutoFollowCoordinator should be resilient to the fact that the follower
index has already been created and in that case it should only update
the auto follow metadata with the fact that the follower index was created.
Relates to #33007
Currently auto follow stats users are unable to see whether an auto follow
error was recent or old. The new timestamp field will help user distinguish
between old and new errors.
Both index following and auto following should be resilient against missing remote connections.
This happens in the case that they get accidentally removed by a user. When this happens
auto following and index following will retry to continue instead of failing with unrecoverable exceptions.
Both the put follow and put auto follow APIs validate whether the
remote cluster connection. The logic added in this change only exists
in case during the lifetime of a follower index or auto follow pattern
the remote connection gets removed. This retry behavior similar how CCR
deals with authorization errors.
Closes#36667Closes#36255
* Added Limitations page
* Made the aggregations page follow the common template for functions
* Modified all tables to have the first row's cells content centered
* Polishing in other various sections
When the script contexts were created in 6, the use of params.ctx was
deprecated. This commit cleans up that code and ensures that params.ctx
is null in both watcher script contexts.
Relates: #34059
Realm settings were changed in #30241 in a non-BWC way.
If you try and start a 7.x node using a 6.x config style, then the
default error messages do not adequately describe the cause of
the problem, or the solution.
This change detects the when realms are using the 6.x style and fails
with a specific error message.
This detection is a best-effort, and will detect issues when the
realms have not been modified to use the 7.x style, but may not detect
situations where the configuration was partially changed.
e.g. We can detect this:
xpack.security.authc:
realms.pki1.type: pki
realms.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
But this (where the "order" has been updated, but the "ssl.*" has not)
will fall back to the standard "unknown setting" check
xpack.security.authc:
realms.pki.pki1.order: 3
realms.pki1.ssl.certificate_authorities: [ "ca.crt" ]
Closes: #36026
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
It implements the ccr clean_session action with this client.
Allow scripts to correctly reference grouping functions
Fix bug in translation of date/time functions mixed with histograms.
Enhance Verifier to prevent histograms being nested inside other
functions inside GROUP BY (as it implies double grouping)
Extend Histogram docs
This is related to #35975. When the shard restore process is complete,
the index mappings need to be updated to ensure that the data in the
files restores is compatible with the follower mappings. This commit
implements a mapping update as the final step in a shard restore.
the testFullPolicy and testMoveToRolloverStep tests
are very important tests, but they sometimes timeout
beyond the default 10sec wait for shrink to occur.
This commit increases one of the assertBusys to
20 seconds
Currently if a leader index with soft deletes disabled is auto followed then this index is silently ignored.
This commit changes this behavior to mark these indices as auto followed and report an error, which is visible in auto follow stats. Marking the index as auto follow is important, because otherwise the auto follower will continuously try to auto follow and fail.
Relates to #33007
... MlDistributedFailureIT.testLoseDedicatedMasterNode.
An intermittent failure has been observed in
`MlDistributedFailureIT. testLoseDedicatedMasterNode`.
The test launches a cluster comprised by a dedicated master node
and a data and ML node. It creates a job and datafeed and starts them.
It then shuts down and restarts the master node. Finally, the test asserts
that the two tasks have been reassigned within 10s.
The intermittent failure is due to the assertions that the tasks have been
reassigned failing. Investigating the failure revealed that the `assertBusy`
that performs that assertion times out. Furthermore, it appears that the
job task is not reassigned because the memory tracking info is stale.
Memory tracking info is refreshed asynchronously when a job is attempted
to be reassigned. Tasks are attempted to be reassigned either due to a relevant
cluster state change or periodically. The periodic interval is controlled by a cluster
setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s.
What seems to be happening in this test is that if all cluster state changes after the
master node is restarted come through before the async memory info refresh completes,
then the job might take up to 30s until it is attempted to reassigned. Thus the `assertBusy`
times out.
This commit changes the test to reduce the periodic check that reassigns persistent
tasks to `200ms`. If the above theory is correct, this should eradicate those failures.
Closes#36760
This change cleans up a number of ugly BWC
workarounds in the ML code.
7.0 cannot run in a mixed version cluster with
versions prior to 6.7, so code that deals with
these old versions is no longer required.
Closes#29963
* This change is to account for different system clock implementations
or different Java versions (for Java 8, milliseconds precision is used;
for Java 9+ a system specific clock implementation is used which can
have greater precision than what we need here).
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
When a filter is evaluated to false then it becomes a LocalRelation
with an EmptyExecutable. The LocalRelation in turn, becomes a
LocalExec and the the SkipQueryIfFoldingProjection was wrongly
converting it to a SingletonExecutable. Moreover made a change, so
that the queries without FROM clause, which are supposed to return a
single row, to become a LocalRelation with a SingletonExecutable
instead of EmptyExecutable to avoid mixing up with the ones operating
on a table but with a filter that evaluates to false.
Fixes: #35980
Explicitly call out the existence of the troubleshooting guide so
that hopefully users can solve common and easy problems with their
initial configuration
Leaving `index.lifecycle.indexing_complete` in place when removing the
lifecycle policy from an index can cause confusion, as if a new policy
is associated with the policy, rollover will be silently skipped.
Removing that setting when removing the policy from an index makes
associating a new policy with the index more involved, but allows ILM to
fail loudly, rather than silently skipping operations which the user may
assume are being performed.
* Adjust order of checks in WaitForRolloverReadyStep
This allows ILM to error out properly for indices that have a valid
alias, but are not the write index, while still handling
`indexing_complete` on old-style aliases and rollover (that is, those
which only point to a single index at a time with no explicit write
index)
Fixes two minor problems reported after merge of #36731:
1. Name the creation method to make clear it only creates
if necessary
2. Avoid multiple simultaneous in-flight creation requests
Commit #36786 updated docs and strings to reference transport.port instead of
transport.tcp.port. However, this breaks backwards compatibility tests
as the tests rely on string configurations and transport.port does not
exist prior to 6.6. This commit reverts the places were we reference
transport.tcp.port for tests. This work will need to be reintroduced in
a backwards compatible way.
* ML having delayed data detection create annotations
* adding upsertAsDoc, audit, and changing user
* changing update to just index the doc with the id set
Lucene 7.6 uses a smaller encoding for LatLonShape. This commit forks the LatLonShape classes to Elasticsearch's local lucene package. These classes will be removed on the release of Lucene 7.6.
The logstash management template was named in such a way as to confuse
users, who misunderstood it to be a template for indices created by
logstash. It is now renamed to more clearly communicate its purpose and
match the format of the other templates for system indices.
The parser used for rollup configs in _meta fields was not able to
handle unrelated data in the meta field. If an unrelated object
was encountered, it would half-consume the JSON object, realize it
wasn'ta rollup config, then stop parsing. This would leave the object
halfway consumed and the parsing framework would throw an exception.
This commit replaces the parsing logic with a set of minimal parsers,
each for the specific component we care about (`_doc`, `_meta`,
`_rollup`) and configured to ignore unknown fields where applicable.
More verbose, but less hacky than before and should be more robust.
Also adds tests (randomized and explicit) to make sure this doesn't
break in the future.
This is related to #36652. In 7.0 we plan to deprecate a number of
settings that make reference to the concept of a tcp transport. We
mostly just have a single transport type now (based on tcp). Settings
should only reference tcp if they are referring to socket options. This
commit updates the settings in the docs. And removes string usages of
the old settings. Additionally it adds a missing remote compress setting
to the docs.
This commit is related to #36127. It adds a CcrRestoreSourceService to
track Engine.IndexCommitRef need for in-process file restores. When a
follower starts restoring a shard through the CcrRepository it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.
Once, the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
* [ML] Job and datafeed mappings with index template (#32719)
Index mappings for the configuration documents
* [ML] Job config document CRUD operations (#32738)
* [ML] Datafeed config CRUD operations (#32854)
* [ML] Change JobManager to work with Job config in index (#33064)
* [ML] Change Datafeed actions to read config from the config index (#33273)
* [ML] Allocate jobs based on JobParams rather than cluster state config (#33994)
* [ML] Return missing job error when .ml-config is does not exist (#34177)
* [ML] Close job in index (#34217)
* [ML] Adjust finalize job action to work with documents (#34226)
* [ML] Job in index: Datafeed node selector (#34218)
* [ML] Job in Index: Stop and preview datafeed (#34605)
* [ML] Delete job document (#34595)
* [ML] Convert job data remover to work with index configs (#34532)
* [ML] Job in index: Get datafeed and job stats from index (#34645)
* [ML] Job in Index: Convert get calendar events to index docs (#34710)
* [ML] Job in index: delete filter action (#34642)
This changes the delete filter action to search
for jobs using the filter to be deleted in the index
rather than the cluster state.
* [ML] Job in Index: Enable integ tests (#34851)
Enables the ml integration tests excluding the rolling upgrade tests and a lot of fixes to
make the tests pass again.
* [ML] Reimplement established model memory (#35500)
This is the 7.0 implementation of a master node service to
keep track of the native process memory requirement of each ML
job with an associated native process.
The new ML memory tracker service works when the whole cluster
is upgraded to at least version 6.6. For mixed version clusters
the old mechanism of established model memory stored on the job
in cluster state was used. This means that the old (and complex)
code to keep established model memory up to date on the job object
has been removed in 7.0.
Forward port of #35263
* [ML] Need to wait for shards to replicate in distributed test (#35541)
Because the cluster was expanded from 1 node to 3 indices would
initially start off with 0 replicas. If the original node was
killed before auto-expansion to 1 replica was complete then
the test would fail because the indices would be unavailable.
* [ML] DelayedDataCheckConfig index mappings (#35646)
* [ML] JIndex: Restore finalize job action (#35939)
* [ML] Replace Version.CURRENT in streaming functions (#36118)
* [ML] Use 'anomaly-detector' in job config doc name (#36254)
* [ML] Job In Index: Migrate config from the clusterstate (#35834)
Migrate ML configuration from clusterstate to index for closed jobs
only once all nodes are v6.6.0 or higher
* [ML] Check groups against job Ids on update (#36317)
* [ML] Adapt to periodic persistent task refresh (#36633)
* [ML] Adapt to periodic persistent task refresh
If https://github.com/elastic/elasticsearch/pull/36069/files is
merged then the approach for reallocating ML persistent tasks
after refreshing job memory requirements can be simplified.
This change begins the simplification process.
* Remove AwaitsFix and implement TODO
* [ML] Default search size for configs
* Fix TooManyJobsIT.testMultipleNodes
Two problems:
1. Stack overflow during async iteration when lots of
jobs on same machine
2. Not effectively setting search size in all cases
* Use execute() instead of submit() in MlMemoryTracker
We don't need a Future to wait for completion
* [ML][TEST] Fix NPE in JobManagerTests
* [ML] JIindex: Limit the size of bulk migrations (#36481)
* [ML] Prevent updates and upgrade tests (#36649)
* [FEATURE][ML] Add cluster setting that enables/disables config migration (#36700)
This commit adds a cluster settings called `xpack.ml.enable_config_migration`.
The setting is `true` by default. When set to `false`, no config migration will
be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able
to be updated normally.
Relates #32905
* [ML] Snapshot ml configs before migrating (#36645)
* [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716)
Relates #32905
* SQL: Fix translation of LIKE/RLIKE keywords (#36672)
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
* Fixing line length for EnvironmentTests and RecoveryTests (#36657)
Relates #34884
* Add back one line removed by mistake regarding java version check and
COMPAT jvm parameter existence
* Do not resolve addresses in remote connection info (#36671)
The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically chaning
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.
* [Painless] Add boxed type to boxed type casts for method/return (#36571)
This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist.
* SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718)
* Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+
* ingest: fix on_failure with Drop processor (#36686)
This commit allows a document to be dropped when a Drop processor
is used in the on_failure fork of the processor chain.
Fixes#36151
* Initialize startup `CcrRepositories` (#36730)
Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
* [TEST] fix float comparison in RandomObjects#getExpectedParsedValue
This commit fixes a test bug introduced with #36597. This caused some
test failure as stored field values comparisons would not work when CBOR
xcontent type was used.
Closes#29080
* [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
* TESTS:Debug Log. IndexStatsIT#testFilterCacheStats
* ingest: support default pipelines + bulk upserts (#36618)
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.
However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.
This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.
fixes#36219
* Fix duplicate phrase in shrink/split error message (#36734)
This commit removes a duplicate "must be a" from the shrink/split error
messages.
* Deprecate types in get_source and exist_source (#36426)
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existance. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
* Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"
This reverts commit 5bc7822562.
* Enhance Invalidate Token API (#35388)
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
* Add raw sort values to SearchSortValues transport serialization (#36617)
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.
This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
* Watcher: Ensure all internal search requests count hits (#36697)
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.
Closes#36177
* [TEST] Ensure shard follow tasks have really stopped.
Relates to #36696
* Ensure MapperService#getAllMetaFields elements order is deterministic (#36739)
MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.
With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyways, this seems like a good change as returning an array would imply
that it's deterministically sorted.
* Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721)
Relates #36148
Relates #10708
* [ML] Mute MlDistributedFailureIT
This change fixes the rollup statistics regarding search times. Search times are
computed from the first query and never updated. This commit adds the missing
calls to the subsequent search.
Fix grammar so that each element inside the list of values for IN
is a valueExpression and not a more generic expression. Introduce a
mapping for context names as some rules in the grammar are exited with
a different rule from the one they entered.This helps so that the decrement
of depth counts in the Parser's CircuitBreakerListener works correctly.
For the list of values for IN, don't count the
PrimaryExpressionContext as this is not visited on exitRule() due to
the peculiarity in our gramamr with the predicate and predicated.
Fixes: #36592
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
The ML UI now provides the ability for users to annotate
time periods with arbitrary text to add insight to what
happened.
This change makes the backend create the index for these
annotations, together with read and write aliases to
make future upgrades possible without adding complexity
to the UI.
It also adds read and write permission to the index for
all ML users (not just admins).
The spec for the index is in
https://github.com/elastic/kibana/pull/26034/files#diff-c5c6ac3dbb0e7c91b6d127aa06121b2cR7
Relates #33376
Relates elastic/kibana#26034
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.
Closes#36177
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
Currently, the CcrRepositoryManger only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit, modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.
Relates to #33007
Originates from #35895
This fixes two bugs about watcher notifications:
* registering accounts that had only secure settings was not possible before;
these accounts are very much practical for Slack and PagerDuty integrations.
* removes the limitation that, for an account with both secure and cluster settings,
the admin had to first change/add the secure settings and only then add the
dependent dynamic cluster settings. The reverse order would trigger a
SettingsException for an incomplete account.
The workaround is to lazily instantiate account objects, hoping that when accounts
are instantiated all the required settings are in place. Previously, the approach
was to greedily validate all the account settings by constructing the account objects,
even if they would not ever be used by actions. This made sense in a world where
all the settings were set by a single API. But given that accounts have dependent
settings (that must be used together) that have to be changed using different APIs
(POST _nodes/reload_secure_settings and PUT _cluster/settings), the settings group
would technically be in an invalid state in between the calls.
This fix builds account objects, and validates the settings, when they are
needed by actions.
As the internals have moved to java.time, the usage of TimeZone itself
should be minimized as it creates issues when being converted to ZoneId
Protocol wise the two are mostly identical so consumer should not see
any difference.
Note that terminology wise, inside the docs, the public API and inside
the protocol timeZone will continue to be used as it's more widely
understood as oppose to zoneId which is an implementation detail
specific to the JVM
Fix#36535
When trying to analyse a HAVING condition, we crate a temp Aggregate
Plan which wasn't created correctly (missing the expressions from the
SELECT clause) and as a result the ordinal (1, 2, etc) in the GROUP BY
couldn't be resolved correctly.
Also after successfully analysing the HAVING condition, still the
original plan was returned.
Fixes: #36059
If a primary promotion happens in the test testAddRemoveShardOnLeader, the
max_seq_no_of_updates_or_deletes on a new primary might be higher than the
max_seq_no_of_updates_or_deletes on the replicas or copies of the follower.
Relates #36607
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.
Relates #36148
Relates #10708
For class fields of type collection whose order is not important
and for which duplicates are not permitted we declare them as `Set`s.
Usually the definition is a `HashSet` but in this case `TreeSet` is used
instead to aid testing.
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.
This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.