This commit hides ClusterStates that have a STATE_NOT_RECOVERED_BLOCK from
ClusterStateAppliers. This is needed, because some appliers, such as IngestService, rely on
the fact, that cluster states with STATE_NOT_RECOVERED_BLOCK won't contain anything useful.
Once the state is recovered it's fully available for the appliers. This commit also switches many of
the remaining tests that require state persistence/recovery from Zen1 to Zen2.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
The current response format is:
```
{
"pattern1": {
...
},
"pattern2": {
...
}
}
```
The new format is:
```
{
"patterns": [
{
"name": "pattern1",
"pattern": {
...
}
},
{
"name": "pattern2",
"pattern": {
...
}
}
]
}
```
This format is more structured and more friendly for parsing and generating specs.
This is a breaking change, but it is better to do this now while ccr
is still a beta feature than later.
Follow up from #36049
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
This commit is part of our plan to deprecate and ultimately remove the
use of _xpack in the REST APIs.
* Add deprecation for /_xpack/monitoring/_bulk in favor of /_monitoring/bulk
* Removed xpack from the rest-api-spec and tests
* Removed xpack from the Action name
* Removed MonitoringRestHandler as an unnecessary abstraction
* Minor corrections to comments
Relates #35958
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
In the current implementation, there is a time between the
ShrinkStep and the ShrinkSetAliasStep that both the source and
target indices will be present with the same aliases. This means
that queries to during this time will query both and return
duplicates. This fixes that scenario by moving the alias inheritance
to the same aliases update request as is done in ShrinkSetAliasStep
This commit improves the efficiency of exact index name matching by
separating exact matches from those that include wildcards or regular
expressions. Internally, exact matching is done using a HashSet instead
of adding the exact matches to the automata. For the wildcard and
regular expression matches, the underlying implementation has not
changed.
This commits changes the serialization version from V_7_0_0 to
v_6_6_0 for the authenticate API response now that the work to add
the realm info in the response has been backported to 6.x in
b515ec7c9b9074dfa2f5fd28bac68fd8a482209e
Relates #35648
In #30241 Realm settings were changed, but the Kerberos realm settings
were not registered correctly. This change fixes the registration of
those Kerberos settings.
Also adds a new integration test that ensures every internal realm can
be configured in a test cluster.
Also fixes the QA test for kerberos.
Resolves: #35942
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
- Add the authentication realm and lookup realm name and type in the response for the _authenticate API
- The authentication realm is set as the lookup realm too (instead of setting the lookup realm to null or empty ) when no lookup realm is used.
* [Rollup] Add more diagnostic stats to job
To help debug future performance issues, this adds the
min/max/avg/count/total latencies (in milliseconds) for search
and bulk phase. This latency is the total service time including
transfer between nodes, not just the `took` time.
It also adds the count of search/bulk failures encountered during
runtime. This information is also in the log, but a runtime counter
will help expose problems faster
* review cleanup
* Remove dead ParseFields
step times were set. The assumption was that these are always set.
Tests passed, which led me to believe this was true. There is a time
when shrunk indices have their step phase/action/step details set,
but with no time information (in the CopyExecutionStateStep).
Explain API fails for these
This commit is related to #32517. It allows an "sni_server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. Prior to this commit, this functionality
was only support for the netty transport. This commit adds this
functionality to the security nio transport.
added validation for complete information of step details.
also changed the rendering of explain responses so null strings are not rendered
Another thing that I changed is the format of the client-side response. I found it difficult to maintain the two subtly-different objects, so I migrated the usage of long for the fields, to Long (just as it is on the server-side).
This generates a synthesized "id" for each incoming request that is
included in the audit logs (file only).
This id can be used to correlate events for the same request (e.g.
authentication success with access granted).
This request.id is specific to the audit logs and is not used for any
other purpose
The request.id is consistent across nodes if a single request requires
execution on multiple nodes (e.g. search acros multiple shards).
When assertions are enabled, a Put User action that have no effect (a
noop update) would trigger an assertion failure and shutdown the node.
This change accepts "noop" as an update result, and adds more
diagnostics to the assertion failure message.
Code that operates on-top of the engine requires all readers returned to be
unwrapped into ElasticsearchDirectoryReader. The special reader
the FrozenEngine uses wasn't wrapped.
We didn't check that the ExplainLifecycleRequest was constructed with at least
one index before, now that we do we must also make sure the tests
mutateInstance() method used in equals/hashCode checks doesn't accidentally
create an empty index array.
Closes#35822
Creates the manage_token cluster privilege and adds it to the
kibana_system role. This is required if kibana were to use the token
service for its authenticator process.
Because kibana_system already has manage_saml this effectively
only adds the privilege to create tokens.
This commit removes the parsing code from the PutLicenseResponse server variant, and the toXContent portion from the corresponding client variant.
Relates to #35547
This commit removes the parsing code from the PostStartBasicResponse server variant. It also makes the server response implement StatusToXContent which allows us to save a couple of lines of code in the corredponding REST action.
Relates to #35547
The RestHasPrivilegesAction previously handled its own XContent
generation. This change moves that into HasPrivilegesResponse and
makes the response implement ToXContent.
This allows HasPrivilegesResponseTests to be used to test
compatibility between HLRC and X-Pack internals.
A serialization bug (cluster privs) was also fixed here.
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.
Relates to #34352
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
* ML: Removing result_finalization_window && overlapping_buckets
* Reverting bad method deletions
* Setting to current before backport to try and get a green build
* fixing testBuildAutodetectCommand test
* disabling bwc tests for backport
RolloverStep previously had a name of "attempt_rollover", which was
inconsistent with all other step names due it its use of an underscore
instead of a dash.
RolloverAction will now periodically check the rollover conditions using
the Rollover API with the dry_run option as an AsyncWaitStep, then run
the rollover itself by calling the Rollover API with no conditions,
which will always roll over, as an AsyncActionStep. This will resolve
race condition issues in policies using RolloverAction.
Today, the bootstrapping of a Zen2 cluster is driven externally, requiring
something else to wait for discovery to converge and then to inject the initial
configuration. This is hard to use in some situations, such as REST tests.
This change introduces the `ClusterBootstrapService` which brings the bootstrap
retry logic within each node and allows it to be controlled via an (unsafe)
node setting.
* ML: Adding missing datacheck to datafeedjob
* Adding client side and docs
* Making adjustments to validations
* Making values default to on, having more sensible limits
* Intermittent commit, still need to figure out interval
* Adjusting delayed data check interval
* updating docs
* Making parameter Boolean, so it is nullable
* bumping bwc to 7 before backport
* changing to version current
* moving delayed data check config its own object
* Separation of duties for delayed data detection
* fixing checkstyles
* fixing checkstyles
* Adjusting default behavior so that null windows are allowed
* Mentioning the default value
* Fixing comments, syncing up validations
In the event that the target index does not exist when `CopyExecutionStateStep`
executes, this avoids a `NullPointerException` and provides a more helpful error
to the ILM user.
Resolves#35567
Kibana now uses the tasks API to manage automatic reindexing of the
.kibana index during upgrades.
The implementation of the tasks API requires that
1. the user executing the task can create & write to the ".tasks" index
2. the user checking on the status of the task can read (Get) the
relevant document from the ".tasks" index
Response classes in Elasticsearch (and xpack) only need to implement ToXContent, which is needed to print their output put in the REST layer and return the response in json (or others) format. On the other hand, response classes that are added to the high-level REST client, need to do the opposite: parse xcontent and create a new object based on that.
This commit removes the parsing code from the XPackInfoResponse server variant, and the toXContent portion from the corresponding client variant. It also removes a client specific test class that looks redundant now that we have a single test class for both classes.
The DefaultAuthenticationFailureHandler has a deprecated constructor
that was present to prevent a breaking change to custom realm plugin
authors in 6.x. This commit removes the constructor and its uses.
For some time, the PutUser REST API has supported storing a pre-hashed
password for a user. The change adds validation and tests around that
feature so that it can be documented & officially supported.
It also prevents the request from containing both a "password" and a "password_hash".
This adds a `wait_for_completion` flag which allows the user to block
the Stop API until the task has actually moved to a stopped state,
instead of returning immediately. If the flag is set, a `timeout` parameter
can be specified to determine how long (at max) to block the API
call. If unspecified, the timeout is 30s.
If the timeout is exceeded before the job moves to STOPPED, a
timeout exception is thrown. Note: this is just signifying that the API
call itself timed out. The job will remain in STOPPING and evenutally
flip over to STOPPED in the background.
If the user asks the API to block, we move over the the generic
threadpool so that we don't hold up a networking thread.
This change adds a special caching reader that caches all relevant
values for a range query to rewrite correctly in a can_match phase
without actually opening the underlying directory reader. This
allows frozen indices to be filtered with can_match and in-turn
searched with wildcards in a efficient way since it allows us to
exclude shards that won't match based on their date-ranges without
opening their directory readers.
Relates to #34352
Depends on #34357
This change adds a high level freeze API that allows to mark an
index as frozen and vice versa. Indices must be closed in order to
become frozen and an open but frozen index must be closed to be
defrosted. This change also adds a index.frozen setting to
mark frozen indices and integrates the frozen engine with the
SearchOperationListener that resets and releases the directory
reader after and before search phases.
Relates to #34352
Depends on #34357
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
There is no longer a concept of non-global "realm settings". All realm
settings should be loaded from the node's settings using standard
Setting classes.
This change renames the "globalSettings" field and method to simply be
"settings".
Since it's still possible to shrink an index when replicas are unassigned, we
should not check that all copies are available when performing the shrink, since
we set the allocation requirement for a single node.
Resolves#35321
* [ILM] Check shard and relocation status in AllocationRoutedStep
This is a follow-up from #35161 where we now check for started and relocating
state in `AllocationRoutedStep`.
Resolves#35258
This change adds a `frozen` engine that allows lazily open a directory reader
on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader
that also allows to release and reset the underlying index readers after any and before
secondary search phases.
Relates to #34352
An auto follow pattern:
* cannot start with `_`
* cannot contain a `,`
* can be encoded in UTF-8
* the length of UTF-8 encoded bytes is no longer than 255 bytes
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
In order to start shard follow tasks, the resume follow api already
needs execute N requests to the elected master node.
The pause follow API is also a master node action, which would make
how both APIs execute more consistent.
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
Today if a wildcard, date-math expression or alias expands/resolves
to an index that is search-throttled we still search it. This is likely
not the desired behavior since it can unexpectedly slow down searches
significantly.
This change adds a new indices option that allows `search`, `count`
and `msearch` to ignore throttled indices by default. Users can
force expansion to throttled indices by using `ignore_throttled=true`
on the rest request to expand also to throttled indices.
Relates to #34352
This moves all Realm settings to an Affix definition.
However, because different realm types define different settings
(potentially conflicting settings) this requires that the realm type
become part of the setting key.
Thus, we now need to define realm settings as:
xpack.security.authc.realms:
file.file1:
order: 0
native.native1:
order: 1
- This is a breaking change to realm config
- This is also a breaking change to custom security realms (SecurityExtension)
This adds a new step for checking whether an index is allocated correctly based
on the rules added prior to running the shrink step. It also fixes a bug where
for shrink we are not allowed to have the shards relocating for the shrink step.
This also allows us to simplify AllocationRoutedStep and provide better
feedback in the step info for why either the allocation or the shrink checks
have failed.
Resolves#34938
If the Rollover step would fail due to the next index in sequence
already existing, just skip to the next step instead of going to the
Error step.
This prevents spurious `ResourceAlreadyExistsException`s created by
simultaneous RolloverStep executions from causing ILM to error out
unnecessarily.
This commit removes the Joda time usage from ILM and the HLRC components of ILM.
It also fixes an issue where using the `?human=true` flag could have caused the
parser not to work. These millisecond fields now follow the standard we use
elsewhere in the code, with additional fields added iff the `human` flag is
specified.
This is a breaking change for ILM, but since ILM has not yet been released, no
compatibility shim is needed.
Only the response classes of get auto follow pattern, the follow and stats APIs
were moved away from Streamable. The other APIs use `AcknowledgedResponse`
or `BaseTasksResponse` as response class and
moving that class away from Streamable is a bigger change.
ILM's Shrink Action was using a nodes "_name" attribute to
allocate to prepare for the shrink step. Since the name is
configurable by a user and may use the same name for
multiple nodes on one machine, _id is safer since it is guaranteed
to be unique.
closes#35043.
* Adding rollup support for datafeeds
* Fixing tests and adjusting formatting
* minor formatting chagne
* fixing some syntax and removing redundancies
* Refactoring and fixing failing test
* Refactoring, adding paranoid null check
* Moving rollup into the aggregation package
* making AggregationToJsonProcessor package private again
* Addressing test failure
* Fixing validations, chunking
* Addressing failing test
* rolling back RollupJobCaps changes
* Adding comment and cleaning up test
* Addressing review comments and test failures
* Moving builder logic into separate methods
* Addressing PR comments, adding test for rollup permissions
* Fixing test failure
* Adding rollup priv check on datafeed put
* Handling missing index when getting caps
* Fixing unused import
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
The ILM Rollover Step can execute on the incorrect index if the rollover alias
exists on another valid index, but not the one the step is executing against. This
is a problem and is now guarded against
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.
This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
Only the follow stats request couldn't be changed to use Writeable serialization,
because that requires changes in `TransportTasksAction` and `BaseTasksRequest` base classes.
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
The file structure finder endpoint can find the NDJSON
(newline-delimited JSON) file format, but called it
`json`. This change renames the `format` for this file
structure to `ndjson`, which is more precise and will
hopefully avoid confusion.
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
We currently have two different native processes:
autodetect & normalizer. There are plans for introducing
a new process. All these share many things in common.
This commit refactors the processes to extend an
`AbstractNativeProcess` class that encapsulates those
commonalities with the purpose of reusing the code
for new processes in the future.
With the introduction of _ilm/stop and _ilm/start APIs, the
use cases where one would only target a select group
of indices to start/stop has been reduced. Since there is no
strong use-case for skipping specific indices, it is best to
remove this functionality and only adding if later desired, with the
hopes of keeping things more simple.
through randomization, there is a chance that the mutateInstance
for PolicyStatsTests does not actually mutate the original object.
This PR aims to fix this
* NETWORKING: Add SSL Handler before other Handlers
* The only way to run into the issue in #33998 is for `Netty4MessageChannelHandler`
to be in the pipeline while the SslHandler is not. Adding the SslHandler before any
other handlers should ensure correct ordering here even when we handle upstream events
in our own thread pool
* Ensure that channels that were closed concurrently don't trip the assertion
* Closes#33998
This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit.
Renamed existing `max_write_buffer_size` to ``max_write_buffer_count` to indicate that limit is count based.
Closes#34705
* Adds usage data for ILM
* Adds tests for IndexLifecycleFeatureSetUsage and friends
* Adds tests for IndexLifecycleFeatureSet
* Fixes merge errors
* Add number of indices managed to usage stats
Also adds more tests
* Addresses Review comments
* Removes Set Policy API in favour of setting index.lifecycle.name
directly
* Reinstates matcher that will still be used
* Cleans up code after rebase
* Adds test to check changing policy with ndex settings works
* Fixes TimeseriesLifecycleActionsIT after API removal
* Fixes docs tests
* Fixes case on close where lifecycle service was never created
* Adding stack_monitoring_agent role
* Fixing checkstyle issues
* Adding tests for new role
* Tighten up privileges around index templates
* s/stack_monitoring_user/remote_monitoring_collector/ + remote_monitoring_user
* Fixing checkstyle violation
* Fix test
* Removing unused field
* Adding missed code
* Fixing data type
* Update Integration Test for new builtin user
* Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction`
instead of `HandledAction`, this removes a sync cluster state api call.
* Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and
`TransportResumeFollowAction`.
* Changed `PauseFollowAction.Request` to not use `readFrom()`.
As part of this change the leader index name and leader cluster name are
stored in the CCR metadata in the follow index. The resume follow api
will read that when a resume follow request is executed.
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
* Changed the resource id of auto follow patterns to be a user defined name
instead of being the leader cluster alias name.
* Fail when an unfollowed leader index matches with two or more auto follow patterns.
In some of our X-Pack REST tests we have to wait for pending tasks to
complete. We are now needing this functionality in ESRestTestCase for
the docs tests where we run against X-Pack features. This commit moves
the helper method that we have in X-Pack to ESRestTestCase, and removes
duplicate logic from waiting for rollup tasks to complete.
This change makes it no longer possible to follow / auto follow without
specifying a leader cluster. If a local index needs to be followed
then `cluster.remote.*.seeds` should point to nodes in the local cluster.
Closes#34258
This API is intended as a companion to the _has_privileges API.
It returns the list of privileges that are held by the current user.
This information is difficult to reason about, and consumers should
avoid making direct security decisions based solely on this data.
For example, each of the following index privileges (as well as many
more) would grant a user access to index a new document into the
"metrics-2018-08-30" index, but clients should not try and deduce
that information from this API.
- "all" on "*"
- "all" on "metrics-*"
- "write" on "metrics-2018-*"
- "write" on "metrics-2018-08-30"
Rather, if a client wished to know if a user had "index" access to
_any_ index, it would be possible to use this API to determine whether
the user has any index privileges, and on which index patterns, and
then feed those index patterns into _has_privileges in order to
determine whether the "index" privilege had been granted.
The result JSON is modelled on the Role API, with a few small changes
to reflect how privileges are modelled when multiple roles are merged
together (multiple DLS queries, multiple FLS grants, multiple global
conditions, etc).
This moves the rollup cleanup code for http tests from the high level rest
client into the test framework and then entirely removes the rollup cleanup
code for http tests that lived in x-pack. This is nice because it
consolidates the cleanup into one spot, automatically invokes the cleanup
without the test having to know that it is "about rollup", and should allow
us to run the rollup docs tests.
Part of #34530
This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the
new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.
* Rollup adding support for date field metrics (#34185)
* Restricting supported metrics for `date` field rollup
* fixing expected error message for yaml test
* Addressing PR comments
The `AutoFollowTests` needs to restart the clusters between each tests, because
it is using auto follow stats in assertions. Auto follow stats are only reset
by stopping the elected master node.
Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api.
* Renamed AutoFollowTests to AutoFollowIT, because it is an integration test.
Renamed ShardChangesIT to IndexFollowingIT, because shard changes it the name
of an internal api and isn't a good name for an integration test.
* move creation of NodeConfigurationSource to a seperate method
* Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.
This commit introduces settings version to index metadata. This value is
monotonically increasing and is updated on settings updates. This will
be useful in cross-cluster replication so that we can request settings
updates from the leader only when there is a settings update.
xContent ordering is unreliable when derived from map insertions but the parsed objects’ .equals() methods have the sort logic required to prove connections and vertices are correct. Disabled the xContent equivalence checks.
Closes#33686
Security caches the result of role lookups and negative lookups are
cached indefinitely. In the case of transient failures this leads to a
bad experience as the roles could truly exist. The CompositeRolesStore
needs to know if a failure occurred in one of the roles stores in order
to make the appropriate decision as it relates to caching. In order to
provide this information to the CompositeRolesStore, the return type of
methods to retrieve roles has changed to a new class,
RoleRetrievalResult. This class provides the ability to pass back an
exception to the roles store. This exception does not mean that a
request should be failed but instead serves as a signal to the roles
store that missing roles should not be cached and neither should the
combined role if there are missing roles.
As part of this, the negative lookup cache was also changed from an
unbounded cache to a cache with a configurable limit.
Relates #33205
PR #34290 made it impossible to use thread-context values to pass
authentication metadata out of a realm. The SAML realm used this
technique to allow the SamlAuthenticateAction to process the parsed
SAML token, and apply them to the access token that was generated.
This new method adds metadata to the AuthenticationResult itself, and
then the authentication service makes this result available on the
thread context.
Closes: #34332
The ingest pipeline that is produced is very simple. It
contains a grok processor if the format is semi-structured
text, a date processor if the format contains a timestamp,
and a remove processor if required to remove the interim
timestamp field parsed out of semi-structured text.
Eventually the UI should offer the option to customize the
pipeline with additional processors to perform other data
preparation steps before ingesting data to an index.
This fixes an issue where an incorrect expected next step is used when checking
to execute `AsyncActionStep`s after a cluster state step.
It fixes this scenario:
- `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or
`ClusterStateActionStep` successfully
- The next step is also a `ClusterStateWaitStep`, so it loops
- The `ClusterStateWaitStep` has a next stepkey (which gets set to the
`nextStepKey` in the code)
- The `ClusterStateWaitStep` fails the condition, meaning that it will have to
wait longer
- The `nextStepKey` is now incorrect though, because we did not advance the
index's step, and it's not `null` (which is another safe value if there is no
step after the `ClusterStateWaitStep`)
This fixes the problem by resetting the nextStepKey to null if the condition is
not met, since we are not going to advance the step metadata in this
case (thereby skipping the `maybeRunAsyncAction` invocation).
This commit also tightens up and enhances much of the ILM logging. A lot of
logging was missing the index name (making it hard to debug in the presence of
multiple indices) and a lot was using the wrong logging level (DEBUG is now
actually readable without being a wall of text).
Resolves#34297
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
Unfollow should be allowed / disallowed on a per index level instead of
cluster level.
Also renamed `create_follow_index` index privilege to
`manage_follow_index` privilege and include unfollow and close APIs.
This commit modifies the follow stats API response structure to more
clearly highlight meaning of the higher level fields. In particular,
previously the response had a top-level key for each index. Instead, we
nest the indices under an "indices" field which is now an array. The
values in this array are objects containing two fields: "index" which is
the name of the follower index, and "shards" which is an array where
each value in the array is the follower stats for that shard. That is,
we have gone from:
{
"bar": [
{
"shard_id": 0...
}...
]...
}
to
{
"indices": [
{
"index": "bar",
"shards": [
{
"shard_id": 0...
}...
]
}...
}
In the CCR docs we want to refer to the endpoint that returns following
stats as the follow stats API. This commit renames the internal
implementation of this endpoint to reflect this usage.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
Drops the last logging constructor that takes `Settings` because it is
no longer needed.
Watcher goes through a lot of effort to pass `Settings` to `Logger`
constructors and dropping `Settings` from all of those calls allowed us
to remove quite a bit of log-based ceremony from watcher.
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
When the cluster.routing.allocation.disk.watermark.flood_stage watermark
is breached, DiskThresholdMonitor marks the indices as read-only. This
failed when x-pack security was present as system user does not have the privilege
for update settings action("indices:admin/settings/update").
This commit adds the required privilege for the system user. Also added missing
debug logs when access is denied to help future debugging.
An assert statement is added to catch any missed privileges required for
system user.
Closes#33119
This commit upgrades the unboundid ldapsdk to version 4.0.8. The
primary driver for upgrading is a fix that prevents this library from
rewrapping Error instances that would normally bubble up to the
UncaughtExceptionHandler and terminate the JVM. Other notable changes
include some fixes related to connection handling in the library's
connection pool implementation.
Closes#33175
This commit changes the way that step execution flows. Rather than have any step
run when the cluster state changes or the periodic scheduler fires, this now
runs the different types of steps at different times.
`AsyncWaitStep` is run at a periodic manner, ie, every 10 minutes by default
`ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the
cluster state changes.
`AsyncActionStep` is now run only after the cluster state has been transitioned
into a new step. This prevents these non-idempotent steps from running at the
same time. It addition to being run when transitioned into, this is also run
when a node is newly elected master (only if set as the current step) so that
master failover does not fail to run the step.
This also changes the `RolloverStep` from an `AsyncActionStep` to an
`AsyncWaitStep` so that it can run periodically.
Relates to #29823
Revert "[TESTS] Pin MockWebServer to TLS1.2 (#33127)" (commit
214652d4af) and "Pin TLS1.2 in
SSLConfigurationReloaderTests" (commit
d9f5e4fd2e), which pinned the
MockWebServer used in the SSLConfigurationReloaderTests to TLSv1.2 in
order to prevent failures with JDK 11 related to ssl session
invalidation. We no longer need this pinning as the problematic code
was fixed in #34130.
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients.
For the unfollow api to work the index follow needs to be stopped and the index needs to be closed.
Closes#33931
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Prior to following an index in the follow API, check whether current
user has sufficient privileges in the leader cluster to read and
monitor the leader index.
Also check this in the create and follow API prior to creating the
follow index.
Also introduced READ_CCR cluster privilege that include the minimal
cluster level actions that are required for ccr in the leader cluster.
So a user can follow indices in a cluster, but not use the ccr admin APIs.
Closes#33553
Co-authored-by: Jason Tedor <jason@tedor.me>
The SSLService invalidates SSLSessions when there is a change to any of
the underlying key or trust material. However, this invalidation code
did not check for a null SSLSession being returned from the context and
assumed that the context would always return a non-null object. The
return of a null object is possible in all versions, but JDK11 seems to
return them more often due to changes for TLS 1.3. There are a number
of reasons that we get a id of a session but the context returns null
when the session with that id is requested. Some of the reasons for
this are:
* Session was evicted by session cache
* Session has timed out
* Session has been invalidated by another caller
To handle this, the SSLService now checks if the value is null before
calling invalidate on the SSLSession.
Closes#32124
Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.
This also addresses a potential issue in caching when the same query contains aggregations but in
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in
the cache.
* Renamed CCR APIs
Renamed:
* `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow`
* `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow`
* `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow`
Relates to #33931
always use `IndicesOptions.strictExpand()` for indices options.
The follow index may be closed and we still want to get stats from
shard follow task and the whether the provided index name matches with
follow index name is checked when locating the task itself in the ccr
stats transport action.
`Settings` is no longer required to get a `Logger` and we went to quite
a bit of effort to pass it to the `Logger` getters. This removes the
`Settings` from all of the logger fetches in security and x-pack:core.
Security previously hardcoded a default scroll keepalive of 10 seconds,
but in some cases this is not enough time as there can be network
issues or overloading of host machines. After this change, security
will now use the default keepalive timeout, which is controllable using
a setting and the default value is 5 minutes.
This commits creates a DateMathParser interface, which is already
implemented for both joda and java time. While currently the java time
DateMathParser is not used, this change will allow a followup which will
create a DateMathParser from a DateFormatter, so the caller does not
need to know the internals of the DateFormatter they have.
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
ILM now only forces indices to become read only in the case of Shrink
and Force Merge actions, as these are most useful in cases where the
index is no longer being written to.
Previously the timestamp_formats field in the response
from the find_file_structure endpoint contained Joda
timestamp formats. This change makes that clear by
renaming the field to joda_timestamp_formats, and also
adds a java_timestamp_formats field containing the
equivalent Java time format strings.
The source only snapshot drops fully deleted segments before snapshotting
them. In order to compare them we need to drop all fully deleted segments
in the test as well.
Closes#33755
Previously numeric values in the field_stats created by the
find_file_structure endpoint were always output with a
decimal point. This looked unfriendly and unnatural for
fields that clearly store integer values. This change
converts integer values to type Integer before output in
the file structure field stats.
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
The job deletion logic was scattered around a few places:
the transport action, the job manager and the deletion task.
Overloading the task with deletion logic also meant extra
dependencies in the core package which should be unnecessary.
This commit consolidates all this logic into the transport action
and replaces the deletion task with a plain one that needs not be
aware of deletion logic.
Using index settings for ILM state is fragile and exposes too much
information that doesn't need to be exposed. Using custom index metadata
is more resilient and allows more controlled access to internal
information.
As part of these changes, moves away from using defaults for ILM-related
values, in favor of using null values to clearly indicate that the value is not
present.
The fix in #33757 introduces some workaround since FilterCodecReader didn't
support unwrapping. This cuts over to a more elegant fix to access the readers
segment infos.
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
This moves away from caching a list of steps for a current phase, instead
rebuilding the necessary step from the phase JSON stored in the index's
metadata.
Relates to #29823
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
Relates to #33007
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.
The extra caused by level is not necessary here:
```
"fetch_exceptions": [
{
"from_seq_no": 1,
"retries": 106,
"exception": {
"type": "exception",
"reason": "[index1] IndexNotFoundException[no such index]",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index",
"index_uuid": "_na_",
"index": "index1"
}
}
}
],
```
We can't rely on the leaf reader ordinal in a wrapped reader since
it might not correspond to the ordinal in the SegmentInfos for it's
SegmentCommitInfo.
Relates to #32844Closes#33689Closes#33755
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.
This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check
before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
Ensure that the SSLConfigurationReloaderTests can run with JDK 11
by pinning the HttpClient to TLS version to TLS1.2. This is necessary
becase even if the MockWebServer is set to user TLS1.2, we don't
set its enabled protocols, so if it receives a TLS1.3 request (which
is the default behavior for HttpClient in JDK11), it will use TLS1.3
and the original issue will manifest again.
Relates #33127Resolves#32124
Changes the format of log events in the audit logfile.
It also changes the filename suffix from `_access` to `_audit`.
The new entry format is consistent with Elastic Common Schema.
Entries are formatted as JSON with no nested objects and field
names have a dotted syntax. Moreover, log entries themselves
are not spaced by commas and there is exactly one entry per line.
In addition, entry fields are ordered, unlike a typical JSON doc,
such that a human would not strain his eyes over jumbled
fields from one line to the other; the order is defined in the log4j2
properties file.
The implementation utilizes the log4j2's `StringMapMessage`.
This means that the application builds the log event as a map
and the log4j logic (the appender's layout) handle the format
internally. The layout, such as the set of printed fields and their
order, can be changed at runtime without restarting the node.