This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
With the introduction of _ilm/stop and _ilm/start APIs, the
use cases where one would only target a select group
of indices to start/stop has been reduced. Since there is no
strong use-case for skipping specific indices, it is best to
remove this functionality and only adding if later desired, with the
hopes of keeping things more simple.
through randomization, there is a chance that the mutateInstance
for PolicyStatsTests does not actually mutate the original object.
This PR aims to fix this
* NETWORKING: Add SSL Handler before other Handlers
* The only way to run into the issue in #33998 is for `Netty4MessageChannelHandler`
to be in the pipeline while the SslHandler is not. Adding the SslHandler before any
other handlers should ensure correct ordering here even when we handle upstream events
in our own thread pool
* Ensure that channels that were closed concurrently don't trip the assertion
* Closes#33998
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
This limit is based on the size in bytes of the operations in the write buffer. If this limit is exceeded then no more read operations will be coordinated until the size in bytes of the write buffer has dropped below the configured write buffer size limit.
Renamed existing `max_write_buffer_size` to ``max_write_buffer_count` to indicate that limit is count based.
Closes#34705
Previously, `Mapper` was returning an incorrect plan which resulted in an
```
SQLFeatureNotSupportedException: Found 1 problem(s)
line 1:8: Unexecutable item
```
Queries with a `WHERE` and/or `HAVING` clause which results in NO_MATCH
are now handled correctly and return 0 rows.
Fixes: #34613
Co-authored-by: Costin Leau <costin.leau@gmail.com>
* Adds usage data for ILM
* Adds tests for IndexLifecycleFeatureSetUsage and friends
* Adds tests for IndexLifecycleFeatureSet
* Fixes merge errors
* Add number of indices managed to usage stats
Also adds more tests
* Addresses Review comments
We should not create a follower index and abort a follow request if the
leader does not have soft-deletes. Moreover, we also should not
auto-follow an index if it does not have soft-deletes.
* Removes Set Policy API in favour of setting index.lifecycle.name
directly
* Reinstates matcher that will still be used
* Cleans up code after rebase
* Adds test to check changing policy with ndex settings works
* Fixes TimeseriesLifecycleActionsIT after API removal
* Fixes docs tests
* Fixes case on close where lifecycle service was never created
* Adding stack_monitoring_agent role
* Fixing checkstyle issues
* Adding tests for new role
* Tighten up privileges around index templates
* s/stack_monitoring_user/remote_monitoring_collector/ + remote_monitoring_user
* Fixing checkstyle violation
* Fix test
* Removing unused field
* Adding missed code
* Fixing data type
* Update Integration Test for new builtin user
With this change, we apply the common test config automatically to all
newly created tasks instead of opting in specifically.
For plugin authors using the plugin externally this means that the
configuration will be applied to their RandomizedTestingTasks as well.
The purpose of the task is to simplify setup and make it easier to
change projects that use the `test` task but actually run integration
tests to use a task called `integTest` for clarity, but also because
we may want to configure and run them differently.
E.x. using different levels of concurrency.
Implemented null handling for both the value tested but also for
values inside the list of values tested against.
The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.
Closes: #34582
#33708 introduced a strict deprecation mode that makes a REST request
fail if there is a warning header in the response returned by
Elasticsearch (usually a deprecation message signaling that a feature
or a field has been deprecated).
This change adds the strict deprecation mode into the REST integration
tests, and makes the tests fail if a deprecated feature is used. Also
any test using a deprecated feature has been modified to pass the build.
The YAML integration tests already analyzed HTTP warnings so they do
not use this mode, keeping their "expected vs actual" behavior.
Per #31717 this commit changes the defaults to the following:
Batch size of 5120 ops.
Maximum of 12 concurrent read requests.
Maximum of 9 concurrent write requests.
This is not necessarily our final values but it's good to have these as defaults for the purposes of initial testing.
The changes introduced in cca1a2a mean that we should
not encrypt the public keys that might be generated by
the key-pair-generator when storing the file, as the code
that would consume them assumes that they are not encrypted
* Change the `TransportPauseFollowAction` to extend from `TransportMasterNodeAction`
instead of `HandledAction`, this removes a sync cluster state api call.
* Introduced `ResponseHandler` that removes duplicated code in `TransportPauseFollowAction` and
`TransportResumeFollowAction`.
* Changed `PauseFollowAction.Request` to not use `readFrom()`.
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
Since there's no transition into the "new" phase it wasn't set until the "hot"
phase, so now we initialize it when initializing the policy context.
Resolves#34277
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
Both testFollowIndexAndCloseNode and testFailOverOnFollower failed
because they responded to the FollowTask a TransportService closed
exception which is currently considered as a fatal error. This behavior
is not desirable since a closing node can throw that exception, and we
should retry in that case.
This change adds TransportService closed error to the list of retryable
errors.
Closes#34694
As part of this change the leader index name and leader cluster name are
stored in the CCR metadata in the follow index. The resume follow api
will read that when a resume follow request is executed.
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
Since #34412 and #34474, a follower must have soft-deletes enabled
to work correctly. This change requires soft-deletes on the follower.
Relates #34412
Relates #34474
This fixes a bug about aliases authorization.
That is, a user might see aliases which he is not authorized to see.
This manifests when the user is not authorized to see any aliases
and the `GetAlias` request is empty which normally is a marking
that all aliases are requested. In this case, no aliases should be
returned, but due to this bug, all aliases will have been returned.
Extend querying support on multiple indices from being strictly
identical to being just compatible.
Use FieldCapabilities API (extended through #33803) for mapping merging.
Close#31837#31611
* Changed the resource id of auto follow patterns to be a user defined name
instead of being the leader cluster alias name.
* Fail when an unfollowed leader index matches with two or more auto follow patterns.
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.
The `IN` expression can be used in SELECT, WHERE and HAVING clauses.
Closes: #32955
`CONVERT` works exactly like cast with slightly different syntax:
`CONVERT(<value>, <data_type)` as opposed to `CAST(<value> AS <data_type>)`
Moreover it support format of the MS-SQL data types `SQL_<type>`,
e.g.: `SQL_INTEGER`
Closes: #34513
JDK11 introduced some changes with the SSLEngine. A number of error
messages were changed. Additionally, there were some behavior changes
in regard to how the SSLEngine handles closes during the handshake
process. This commit updates our tests and SSLDriver to support these
changes.
All of the tests in PainlessDomainSplitIT have an awaitsfix, which
causes the build to fail since no tests are run. This adds an empty
test to get the build going again.
Relates #34683
Relates #32966
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
This is the fixed version of #34246, which was reverted.
Relates #33205
We should be consistent here. We were already using the casing "Ccr" and
this is the preferred casing for Java class names. This commit adjusts
the names of some classes that were using the casing "CCR" to be "Ccr".
In some of our X-Pack REST tests we have to wait for pending tasks to
complete. We are now needing this functionality in ESRestTestCase for
the docs tests where we run against X-Pack features. This commit moves
the helper method that we have in X-Pack to ESRestTestCase, and removes
duplicate logic from waiting for rollup tasks to complete.
Since #34288, we might hit deadlock if the FollowTask has more fetchers
than writers. This can happen in the following scenario:
Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has
two fetchers and one writer.
1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0,
num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1
respectively.
2. The second request which fetches seq#1 completes before, and then it
triggers a write request containing only seq#1.
3. The primary of a follower fails after it has replicated seq#1 to
replicas.
4. Since the old primary did not respond, the FollowTask issues another
write request containing seq#1 (resend the previous write request).
5. The new primary has seq#1 already; thus it won't replicate seq#1 to
replicas but will wait for the global checkpoint to advance at least
seq#1.
The problem is that the FollowTask has only one writer and that writer
is waiting for seq#0 which won't be delivered until the writer completed.
This PR proposes to replicate existing operations with the old primary
term (instead of the current term) on the follower. In particular, when
the following primary detects that it has processed an process already,
it will look up the term of an existing operation with the same seq_no
in the Lucene index, then rewrite that operation with the old term
before replicating it to the following replicas. This approach is
wait-free but requires soft-deletes on the follower.
Relates #34288
Today we rely on the LocalCheckpointTracker to ensure no duplicate when
enabling optimization using max_seq_no_of_updates. The problem is that
the LocalCheckpointTracker is not fully reloaded when opening an engine
with an out-of-order index commit. Suppose the starting commit has seq#0
and seq#2, then the current LocalCheckpointTracker would return "false"
when asking if seq#2 was processed before although seq#2 in the commit.
This change scans the existing sequence numbers in the starting commit,
then marks these as completed in the LocalCheckpointTracker to ensure
the consistent state between LocalCheckpointTracker and Lucene commit.
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.
Fix#32079
With this commit we cleanup hand-coded duplicate checks in XContent
parsing. They were necessary previously but since we reconfigured the
underlying parser in #22073 and #22225, these checks are obsolete and
were also ineffective unless an undocumented system property has been
set. As we also remove this escape hatch, we can remove the additional
checks as well.
Closes#22253
Relates #34588
For user/_has_privileges and user/_privileges, handle the case where
there is no user in the security context. This is likely to indicate
that the server is running with a basic license, in which case the
action will be rejected with a non-compliance exception (provided
we don't throw a NPE).
The implementation here is based on the _authenticate API.
Resolves: #34567
This change makes it no longer possible to follow / auto follow without
specifying a leader cluster. If a local index needs to be followed
then `cluster.remote.*.seeds` should point to nodes in the local cluster.
Closes#34258
Having integration tests separated from the unit tests in the qa
directory works much more smoothly with our testing infrastructure,
matches what other plugins do, and tests in a more "real" deployment
scenario by having all plugins installed.
The setting that reduces the disk space requirement
for the forecasting integration tests was accidentally
removed in #31757 when files were moved around. This
change simply adds back the setting that existed before
that.
A constant can now be used outside aggregation only queries.
Don't skip an ES query in case of constants-only selects.
Loosen the binary pipe restriction of being used only in aggregation queries.
Fixes https://github.com/elastic/elasticsearch/issues/31863
Right now, watches fail on runtime, when invalid email addresses are
used.
All those fields can be checked on parsing, if no mustache is used in
any email address template. In that case we can return immediate
feedback, that invalid email addresses should not be specified when
trying to store a watch.
The logfile audit log format is no longer formed by prefix fields followed
by key value fields, it is all formed by key value fields only (JSON format).
Consequently, the following settings, which toggled some of the prefix
fields, have been renamed:
audit.logfile .prefix.emit_node_host_address
audit.logfile .prefix.emit_node_host_name
audit.logfile .prefix.emit_node_name
This API is intended as a companion to the _has_privileges API.
It returns the list of privileges that are held by the current user.
This information is difficult to reason about, and consumers should
avoid making direct security decisions based solely on this data.
For example, each of the following index privileges (as well as many
more) would grant a user access to index a new document into the
"metrics-2018-08-30" index, but clients should not try and deduce
that information from this API.
- "all" on "*"
- "all" on "metrics-*"
- "write" on "metrics-2018-*"
- "write" on "metrics-2018-08-30"
Rather, if a client wished to know if a user had "index" access to
_any_ index, it would be possible to use this API to determine whether
the user has any index privileges, and on which index patterns, and
then feed those index patterns into _has_privileges in order to
determine whether the "index" privilege had been granted.
The result JSON is modelled on the Role API, with a few small changes
to reflect how privileges are modelled when multiple roles are merged
together (multiple DLS queries, multiple FLS grants, multiple global
conditions, etc).
This commit moves the definition of domainSplit into java and exposes it
as a painless whitelist extension. The method also no longer needs
params, and version which ignores params is added and deprecated.
This reverts commit 0b4e8db1d3 as some
issues have been identified with the changed handling of a primary
shard of the security index not being available.
This moves the rollup cleanup code for http tests from the high level rest
client into the test framework and then entirely removes the rollup cleanup
code for http tests that lived in x-pack. This is nice because it
consolidates the cleanup into one spot, automatically invokes the cleanup
without the test having to know that it is "about rollup", and should allow
us to run the rollup docs tests.
Part of #34530
The token service has fairly strict validation and there are a range
of reasons why request may be rejected.
The detail is typically returned in the client exception / json body
but the ES admin can only debug that if they have access to detailed
logs from the client.
This commit adds debug & trace logging to the token service so that it
is possible to perform this debugging from the server side if
necessary.
The security native stores follow a pattern where
`SecurityIndexManager#prepareIndexIfNeededThenExecute` wraps most calls
made for the security index. The reasoning behind this was to check if
the security index had been upgraded to the latest version in a
consistent manner. However, this has the potential side effect that a
read will trigger the creation of the security index or an updating of
its mappings, which can lead to issues such as failures due to put
mapping requests timing out even though we might have been able to read
from the index and get the data necessary.
This change introduces a new method, `checkIndexVersionThenExecute`,
that provides the consistent checking of the security index to make
sure it has been upgraded. That is the only check that this method
performs prior to running the passed in operation, which removes the
possible triggering of index creation and mapping updates for reads.
Additionally, areas where we do reads now check the availability of the
security index and can short circuit requests. Availability in this
context means that the index exists and all primaries are active.
Relates #33205
This change adds the command RemoveIndexLifecyclePolicy to the HLRC. This uses the
new TimeRequest as a base class for RemoveIndexLifecyclePolicyRequest on the client side.
We're publishing jdbc into our maven repo as though its artifactId is
`x-pack-sql-jdbc` but the pom listed the artifactId as `jdbc`. This
fixes the pom to line up with where we're publishing the artifact.
Closes#34399
* Rollup adding support for date field metrics (#34185)
* Restricting supported metrics for `date` field rollup
* fixing expected error message for yaml test
* Addressing PR comments
The `AutoFollowTests` needs to restart the clusters between each tests, because
it is using auto follow stats in assertions. Auto follow stats are only reset
by stopping the elected master node.
Extracted the `testGetOperationsBasedOnGlobalSequenceId()` test to its own test, because it just tests the shard changes api.
* Renamed AutoFollowTests to AutoFollowIT, because it is an integration test.
Renamed ShardChangesIT to IndexFollowingIT, because shard changes it the name
of an internal api and isn't a good name for an integration test.
* move creation of NodeConfigurationSource to a seperate method
* Fixes issues after merge, moved assertSeqNos() and assertSameDocIdsOnShards() methods from ESIntegTestCase to InternalTestCluster, so that ccr tests can use these methods too.
When performing an internal reindex, we add a setting marking the source
as read-only. We also check that this index is not already
read-only. This means that when we add the read-only setting, we expect
that it is already not there. This commit adds an assertion before we
increment the settings version validating that this is indeed the case.
This commit introduces settings version to index metadata. This value is
monotonically increasing and is updated on settings updates. This will
be useful in cross-cluster replication so that we can request settings
updates from the leader only when there is a settings update.
xContent ordering is unreliable when derived from map insertions but the parsed objects’ .equals() methods have the sort logic required to prove connections and vertices are correct. Disabled the xContent equivalence checks.
Closes#33686
In 54cb890 a setting for testing only was introduced, that delayed the start up of watcher. With the changes of how is watcher is started/stopped over time, this is not needed anymore.
Security caches the result of role lookups and negative lookups are
cached indefinitely. In the case of transient failures this leads to a
bad experience as the roles could truly exist. The CompositeRolesStore
needs to know if a failure occurred in one of the roles stores in order
to make the appropriate decision as it relates to caching. In order to
provide this information to the CompositeRolesStore, the return type of
methods to retrieve roles has changed to a new class,
RoleRetrievalResult. This class provides the ability to pass back an
exception to the roles store. This exception does not mean that a
request should be failed but instead serves as a signal to the roles
store that missing roles should not be cached and neither should the
combined role if there are missing roles.
As part of this, the negative lookup cache was also changed from an
unbounded cache to a cache with a configurable limit.
Relates #33205
PR #34290 made it impossible to use thread-context values to pass
authentication metadata out of a realm. The SAML realm used this
technique to allow the SamlAuthenticateAction to process the parsed
SAML token, and apply them to the access token that was generated.
This new method adds metadata to the AuthenticationResult itself, and
then the authentication service makes this result available on the
thread context.
Closes: #34332
The ingest pipeline that is produced is very simple. It
contains a grok processor if the format is semi-structured
text, a date processor if the format contains a timestamp,
and a remove processor if required to remove the interim
timestamp field parsed out of semi-structured text.
Eventually the UI should offer the option to customize the
pipeline with additional processors to perform other data
preparation steps before ingesting data to an index.
In ccb9ab5717 we changed how we deal with time
fields to support the `DateTime`-format fields added in 6.0, but dropped
support for pre-6.x `Long`-format fields. This change reinstates this support
for cases where pre-6.x data is made available to ML (e.g. in a mixed-version
CCS setup or after an upgrade).
ListenableFuture may run a listener on the same thread that called the
addListener method or it may execute on another thread after the future
has completed. Whenever the ListenableFuture stores the listener for
execution later, it should preserve the thread context which is what
this change does.
Today we rewrite the operations from the leader with the term of the
following primary because the follower should own its history. The
problem is that a newly promoted primary may re-assign its term to
operations which were replicated to replicas before by the previous
primary. If this happens, some operations with the same seq_no may be
assigned different terms. This is not good for the future optimistic
locking using a combination of seqno and term.
This change ensures that the primary of a follower only processes an
operation if that operation was not processed before. The skipped
operations are guaranteed to be delivered to replicas via either
primary-replica resync or peer-recovery. However, the primary must not
acknowledge until the global checkpoint is at least the highest seqno of
all skipped ops (i.e., they all have been processed on every replica).
Relates #31751
Relates #31113
* New OCTET_LENGTH function
* Changed the way the FunctionRegistry stores functions, considering the alphabetic ordering by name
* Added documentation for the RANDOM function
This fixes an issue where an incorrect expected next step is used when checking
to execute `AsyncActionStep`s after a cluster state step.
It fixes this scenario:
- `ExecuteStepsUpdateTask` executes a `ClusterStateWaitStep` or
`ClusterStateActionStep` successfully
- The next step is also a `ClusterStateWaitStep`, so it loops
- The `ClusterStateWaitStep` has a next stepkey (which gets set to the
`nextStepKey` in the code)
- The `ClusterStateWaitStep` fails the condition, meaning that it will have to
wait longer
- The `nextStepKey` is now incorrect though, because we did not advance the
index's step, and it's not `null` (which is another safe value if there is no
step after the `ClusterStateWaitStep`)
This fixes the problem by resetting the nextStepKey to null if the condition is
not met, since we are not going to advance the step metadata in this
case (thereby skipping the `maybeRunAsyncAction` invocation).
This commit also tightens up and enhances much of the ILM logging. A lot of
logging was missing the index name (making it hard to debug in the presence of
multiple indices) and a lot was using the wrong logging level (DEBUG is now
actually readable without being a wall of text).
Resolves#34297
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
Also fixed ShardFollowNodeTaskTests to not return ops when responseSize
is empty. Otherwise ops are returned when no ops are expected to be returned.
Co-authored-by: Jason Tedor <jason@tedor.me>
Unfollow should be allowed / disallowed on a per index level instead of
cluster level.
Also renamed `create_follow_index` index privilege to
`manage_follow_index` privilege and include unfollow and close APIs.
This commit modifies the follow stats API response structure to more
clearly highlight meaning of the higher level fields. In particular,
previously the response had a top-level key for each index. Instead, we
nest the indices under an "indices" field which is now an array. The
values in this array are objects containing two fields: "index" which is
the name of the follower index, and "shards" which is an array where
each value in the array is the follower stats for that shard. That is,
we have gone from:
{
"bar": [
{
"shard_id": 0...
}...
]...
}
to
{
"indices": [
{
"index": "bar",
"shards": [
{
"shard_id": 0...
}...
]
}...
}
In the CCR docs we want to refer to the endpoint that returns following
stats as the follow stats API. This commit renames the internal
implementation of this endpoint to reflect this usage.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
There may be values in the thread context that ought to be preseved
for later use, even if one or more realms perform asynchronous
authentication.
This commit changes the AuthenticationService to wrap the potentially
asynchronous calls in a ContextPreservingActionListener that retains
the original thread context for the authentication.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
Drops the last logging constructor that takes `Settings` because it is
no longer needed.
Watcher goes through a lot of effort to pass `Settings` to `Logger`
constructors and dropping `Settings` from all of those calls allowed us
to remove quite a bit of log-based ceremony from watcher.
The Security plugin authorizes actions on indices. Authorization
happens on a per index/alias basis. Therefore a request with a
Multi Index Expression (containing wildcards) has to be
first evaluated in the authorization layer, before the request is
handled. For authorization purposes, wildcards in expressions will
only be expanded to indices/aliases that are visible by the authenticated
user. However, this "constrained" evaluation has to be compatible with
the expression evaluation that a cluster without the Security plugin
would do. Therefore any change in the evaluation logic
in any of these sites has to be mirrored in the other site.
This commit mirrors the changes in core from #33518 that allowed
for Multi Index Expression in the Get Alias API, loosely speaking.
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
When the cluster.routing.allocation.disk.watermark.flood_stage watermark
is breached, DiskThresholdMonitor marks the indices as read-only. This
failed when x-pack security was present as system user does not have the privilege
for update settings action("indices:admin/settings/update").
This commit adds the required privilege for the system user. Also added missing
debug logs when access is denied to help future debugging.
An assert statement is added to catch any missed privileges required for
system user.
Closes#33119
In SessionFactoryLoadBalancingTests#testRoundRobinWithFailures()
we kill ldap servers randomly and immediately bind to that port
connecting to mock server socket. This is done to avoid someone else
listening to this port. As the creation of mock socket and binding to the
port is immediate, sometimes the earlier socket would be in TIME_WAIT state
thereby having problems with either bind or connect.
This commit sets the SO_REUSEADDR explicitly to true and also sets
the linger on time to 0(as we are not writing any data) so as to
allow re-use of the port and close immediately.
Note: I could not find other places where this might be problematic
but looking at test runs and netstat output I do see lot of sockets
in TIME_WAIT. If we find that this needs to be addressed we can
wrap ServerSocketFactory to set these options and use that with in
memory ldap server configuration during tests.
Closes#32190
This commit upgrades the unboundid ldapsdk to version 4.0.8. The
primary driver for upgrading is a fix that prevents this library from
rewrapping Error instances that would normally bubble up to the
UncaughtExceptionHandler and terminate the JVM. Other notable changes
include some fixes related to connection handling in the library's
connection pool implementation.
Closes#33175
- this adds an integration test that runs through a policy
with all the actions defined.
- adds a test specific to a policy having just a rollover action
- bumps the node count to 4
The `DnRoleMapper` class is used to map distinguished names of groups
and users to role names. This mapper builds in an internal map that
maps from a `com.unboundid.ldap.sdk.DN` to a `Set<String>`. In cases
where a lot of distinct DNs are mapped to roles, this can consume quite
a bit of memory. The majority of the memory is consumed by the DN
object. For example, a 94 character DN that has 9 relative DNs (RDN)
will retain 4KB of memory, whereas the String itself consumes less than
250 bytes.
In order to reduce memory usage, we can map from a normalized DN string
to a List of roles. The normalized string is actually how the DN class
determines equality with another DN and we can drop the overhead of
needing to keep all of the other objects in memory. Additionally the
use of a List provides memory savings as each HashSet is backed by a
HashMap, which consumes a great deal more memory than an appropriately
sized ArrayList. The uniqueness we get from a Set is maintained by
first building a set when parsing the file and then converting to a
list upon completion.
Closes#34237
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
This commit changes the way that step execution flows. Rather than have any step
run when the cluster state changes or the periodic scheduler fires, this now
runs the different types of steps at different times.
`AsyncWaitStep` is run at a periodic manner, ie, every 10 minutes by default
`ClusterStateActionStep` and `ClusterStateWaitStep` are run every time the
cluster state changes.
`AsyncActionStep` is now run only after the cluster state has been transitioned
into a new step. This prevents these non-idempotent steps from running at the
same time. It addition to being run when transitioned into, this is also run
when a node is newly elected master (only if set as the current step) so that
master failover does not fail to run the step.
This also changes the `RolloverStep` from an `AsyncActionStep` to an
`AsyncWaitStep` so that it can run periodically.
Relates to #29823
The follower index shard history UUID will be fetched from the indices stats api when the shard follow task starts and will be provided with the bulk shard operation requests. The bulk shard operations api will fail if the provided history uuid is unequal to the actual history uuid.
No longer record the leader history uuid in shard follow task params, but rather use the leader history UUIDs directly from follower index's custom metadata. The resume follow api will remain to fail if leader index shard history UUIDs are missing.
Closes#33956
Revert "[TESTS] Pin MockWebServer to TLS1.2 (#33127)" (commit
214652d4af) and "Pin TLS1.2 in
SSLConfigurationReloaderTests" (commit
d9f5e4fd2e), which pinned the
MockWebServer used in the SSLConfigurationReloaderTests to TLSv1.2 in
order to prevent failures with JDK 11 related to ssl session
invalidation. We no longer need this pinning as the problematic code
was fixed in #34130.
The `-` and `+` as a number literal prefix are already
parsed by the rule in `valueExpression`. To accommodate
this, there are some code changes that enables the
`ExpressionBuilder` to parse Literal integers and decimals
together with the `-/+` prefix sign (if exists) and validate
them (wrong format, large numbers, etc.).
Follows: #33854
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
Previously, parsing an arithmetic expression with `*` and no spaces,
e.g.: `2*i` threw a parsing exception as the grammar rule for
tableIdentifier was clashing with the rule for arithmetic operator `*`.
This issue comes already in the lexer and the left part of the
expression (in our example `2*`) was recognised as a
TABLE_IDENTIFIER token.
The solution adopted is to allow the `*` wildcard in the table name
only if it's surrounded with double quotes, e.g.: `"my*index"`
Closes: #33957
Remove CamelCase to CAMEL_CASE conversion when resolving
a function. Only convert user input to upper case and then
try to match with aliases or primary names.
Keep the internal conversion FunctionName to FUNCTION__NAME
which provides flexibility when registering functions by their class
name.
Fixes: #34114
We use wrap code in `// tag` and `//end` to include it in our docs. Our
current docs style wraps code snippets in a box that is only wide enough
for 76 characters and adds a horizontal scroll bar for wider snippets
which makes the snippet much harder to read. This adds a checkstyle check
that looks for java code that is included in the docs and is wider than
that 76 characters so all snippets fit into the box. It solves many of
the failures that this catches but suppresses many more. I will clean
those up in a follow up change.
Since #34099, the FollowingEngine will skip an operation which was
already processed before. With that change, it should be okay to unmute
testFollowIndexAndCloseNode.
This change fixes a potential deadlock problem in the unit
test introduced in #34117.
It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
This arose when two commits were pushed at roughly the same time, both
of which compiled successfully against master, but not when taken
together. This commit fixes a reference in one of the commits that was
changed in the other commit.
This commit modifies the CCR stats endpoint for indices to be
/{index}/_ccr/stats. This makes this endpoint consistent with other
index-centric endpoints like indices stats.
The unfollow API changes a follower index into a regular index, so that it will accept write requests from clients.
For the unfollow api to work the index follow needs to be stopped and the index needs to be closed.
Closes#33931
This change introduces the indexing optimization using sequence numbers
in the FollowingEngine. This optimization uses the max_seq_no_updates
which is tracked on the primary of the leader and replicated to replicas
and followers.
Relates #33656
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Prior to following an index in the follow API, check whether current
user has sufficient privileges in the leader cluster to read and
monitor the leader index.
Also check this in the create and follow API prior to creating the
follow index.
Also introduced READ_CCR cluster privilege that include the minimal
cluster level actions that are required for ccr in the leader cluster.
So a user can follow indices in a cluster, but not use the ccr admin APIs.
Closes#33553
Co-authored-by: Jason Tedor <jason@tedor.me>
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#29989
The SSLService invalidates SSLSessions when there is a change to any of
the underlying key or trust material. However, this invalidation code
did not check for a null SSLSession being returned from the context and
assumed that the context would always return a non-null object. The
return of a null object is possible in all versions, but JDK11 seems to
return them more often due to changes for TLS 1.3. There are a number
of reasons that we get a id of a session but the context returns null
when the session with that id is requested. Some of the reasons for
this are:
* Session was evicted by session cache
* Session has timed out
* Session has been invalidated by another caller
To handle this, the SSLService now checks if the value is null before
calling invalidate on the SSLSession.
Closes#32124
This commit removes the use of ExecutableScript from watcher in favor of
custom script contexts for both watcher condition scripts and transform
scripts.
In prior versions of Java, we expected to see a SSLHandshakeException
when starting a handshake with a server that we do not trust. In JDK11,
the exception has changed to a SSLException, which
SSLHandshakeException extends. This is most likely a side effect of the
TLS 1.3 changes in JDK11. This change updates the test to catch the
SSLException instead of the SSLHandshakeException and enables the test
to work on JDK8 through JDK11.
Closes#32293
The following changes were made:
* Added ElasticsearchSecurityException. For in the case the current user has insufficient privileges while an index is being followed. Prior to following ccr checks whether the current user has sufficient privileges and if not the follow api fails with an error.
* Added Index block exception. If the leader index gets closed, this exception is returned.
* Added ClusterBlockException service unavailable. In case for example the leader cluster is without elected master.
* Removed IndexNotFoundException. If the leader / follower index has been deleted, ccr will need to stop the shard follow tasks with an error.
Closes#33954
Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.
This also addresses a potential issue in caching when the same query contains aggregations but in
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in
the cache.
Due to a bug, that was fixed in #33360 and commit
1de2a925ce the initial adding of a watch
could get lost, thus leaving the watcher stats count as zero despite adding a watch.
Closes#33326
* Renamed CCR APIs
Renamed:
* `/{index}/_ccr/create_and_follow` to `/{index}/_ccr/follow`
* `/{index}/_ccr/unfollow` to `/{index}/_ccr/pause_follow`
* `/{index}/_ccr/follow` to `/{index}/_ccr/resume_follow`
Relates to #33931
always use `IndicesOptions.strictExpand()` for indices options.
The follow index may be closed and we still want to get stats from
shard follow task and the whether the provided index name matches with
follow index name is checked when locating the task itself in the ccr
stats transport action.
`Settings` is no longer required to get a `Logger` and we went to quite
a bit of effort to pass it to the `Logger` getters. This removes the
`Settings` from all of the logger fetches in security and x-pack:core.
Centralize and simplify the script generation between operators and functions which are currently decoupled. As part of this process most predicates (<, <=, etc...) were made ScalarFunction as their purpose and functionality is quite similar (see % and MOD functions).
Renamed ProcessDefinition to Pipe
Add ScriptWeaver as a mixin-in interface for script customization
Add logic for equals/lte/lt
Improve BinaryOperator/expression toString
Minimize duplication across string functions
Close#33975
This commit adds the ability to plug in compilation of custom contexts
in mock script engine. This is needed for testing plugins which add
custom contexts like watcher.
Watcher is using a lot of so called TextTemplate fields in a watch
definition, which can use mustache to insert the watch id for example.
For the user it is non-obvious which field is just a string field or
which field is a text template.
This also means, that for every such field, we currently do a script
compilation, even if the field does not contain any mustache syntax.
This will lead to an increased script cache churn, because those
compiled scripts (that only contain a string), will evict other scripts.
On top of that, this also means that an unneeded compilation has
happened, instead of returning that string immediately.
The usages of mustache templating are in all of the actions (most of the time far
more than one compilation) as well as most of the inputs.
Especially when running a lot of watches in parallel, this will reduce
execution times and help reuse of real scripts.
Security previously hardcoded a default scroll keepalive of 10 seconds,
but in some cases this is not enough time as there can be network
issues or overloading of host machines. After this change, security
will now use the default keepalive timeout, which is controllable using
a setting and the default value is 5 minutes.
In order to optimize the use of the role cache, when the roles.yml file
is reloaded we now calculate the names of removed, changed, and added
roles so that they may be passed to any listeners. This allows a
listener to selectively clear cache for only the roles that have been
modified. The CompositeRolesStore has been adapted to do exactly that
so that we limit the need to reload roles from sources such as the
native roles stores or external role providers.
See #33205
This commits creates a DateMathParser interface, which is already
implemented for both joda and java time. While currently the java time
DateMathParser is not used, this change will allow a followup which will
create a DateMathParser from a DateFormatter, so the caller does not
need to know the internals of the DateFormatter they have.
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
This commit replicates the max_seq_no_of_updates on the leading index
to the primaries of the following index via ShardFollowNodeTask. The
max_seq_of_updates is then transmitted to the replicas of the follower
via replication requests (that's BulkShardOperationsRequest).
Relates #33656
Implement circuit breaker logic in the parser which catches expressions
that can blow up the tree and result in StackOverflowError being thrown.
Co-authored-by: Costin Leau <costin.leau@gmail.com>
ILM now only forces indices to become read only in the case of Shrink
and Force Merge actions, as these are most useful in cases where the
index is no longer being written to.
[ML] Modify thresholds for normalization triggers
The (arbitrary) threshold factors used to judge if scores have
changed significantly enough to trigger a look-back renormalization have
been changed to values that reduce the frequency of such
renormalizations.
Added a clause to treat changes in scores as a 'big change' if it would
result in a change of severity reported in the UI.
Also altered the clause affecting small scores so that a change should
be considered big if scores have changed by at least 1.5.
Relates https://github.com/elastic/machine-learning-qa/issues/263
We start tracking max seq_no_of_updates on the primary in #33842. This
commit replicates that value from a primary to its replicas in replication
requests or the translog phase of peer-recovery.
With this change, we guarantee that the value of max seq_no_of_updates
on a replica when any index/delete operation is performed at least the
max_seq_no_of_updates on the primary when that operation was executed.
Relates #33656
Previously the timestamp_formats field in the response
from the find_file_structure endpoint contained Joda
timestamp formats. This change makes that clear by
renaming the field to joda_timestamp_formats, and also
adds a java_timestamp_formats field containing the
equivalent Java time format strings.
Two issues are resolved:
1. The `value_type` should be either long or double in case of numeric.
2. The key label for the aggregate filter (having) was duplicate of an aggr key label.
Fixes: #33520
* Changed the format of the String functions documentation page.
* Adopted the same format for Math functions, but completely changed the examples.
* Added missing documentation for Math functions.
* TESTS: Stabilize Renegotiation Test
* The second `startHandshake` is not synchronous and a read of only
50ms may fail to trigger it entirely (the failure can be reproduced reliably by setting the socket timeout to `1`)
=> fixed by retrying the read until the handshake finishes (a longer timeout would've worked too,
but retrying seemed more stable)
* Closes#33772
* Make certain ML node settings dynamic (#33565)
* Changing to pull in updating settings and pass to constructor
* adding note about only newly opened jobs getting updated value
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
The source only snapshot drops fully deleted segments before snapshotting
them. In order to compare them we need to drop all fully deleted segments
in the test as well.
Closes#33755
If numWrites is between 2 and 9, we will issue an invalid range because
the from_seq_no is negative. This commit makes sure that numWrites is at
least 10, and adds an explicit test to verify invalid request ranges.
SearchGroupsResolverInMemoryTests was (rarely) fail in a way that
suggests that the server-side delay (100ms) was not enough to trigger
the client-side timeout (5ms).
The server side delay has been increased to try and overcome this.
Resolves: #32913
Previously numeric values in the field_stats created by the
find_file_structure endpoint were always output with a
decimal point. This looked unfriendly and unnatural for
fields that clearly store integer values. This change
converts integer values to type Integer before output in
the file structure field stats.
This PR is the first step to use seq_no to optimize indexing operations.
The idea is to track the max seq_no of either update or delete ops on a
primary, and transfer this information to replicas, and replicas use it
to optimize indexing plan for index operations (with assigned seq_no).
The max_seq_no_of_updates on primary is initialized once when a primary
finishes its local recovery or peer recovery in relocation or being
promoted. After that, the max_seq_no_of_updates is only advanced internally
inside an engine when processing update or delete operations.
Relates #33656
This commit reverts most of #33157 as it introduces another race
condition and breaks a common case of watcher, when the first watch is
added to the system and the index does not exist yet.
This means, that the index will be created, which triggers a reload, but
during this time the put watch operation that triggered this is not yet
indexed, so that both processes finish roughly add the same time and
should not overwrite each other but act complementary.
This commit reverts the logic of cleaning out the ticker engine watches
on start up, as this is done already when the execution is paused -
which also gets paused on the cluster state listener again, as we can be
sure here, that the watches index has not yet been created.
This also adds a new test, that starts a one node cluster and emulates
the case of a non existing watches index and a watch being added, which
should result in proper execution.
Closes#33320
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
The job deletion logic was scattered around a few places:
the transport action, the job manager and the deletion task.
Overloading the task with deletion logic also meant extra
dependencies in the core package which should be unnecessary.
This commit consolidates all this logic into the transport action
and replaces the deletion task with a plain one that needs not be
aware of deletion logic.
* Added TRUNCATE function, modified ROUND to accept two parameters instead of one. Made the second parameter optional for both functions.
* Added documentation for both functions.
Removed rules in the grammar that were superfluous, as they
are already "caught" other rules in the same context.
Also switched to exact ambig detection for debug mode
Fixes: #31885
Using index settings for ILM state is fragile and exposes too much
information that doesn't need to be exposed. Using custom index metadata
is more resilient and allows more controlled access to internal
information.
As part of these changes, moves away from using defaults for ILM-related
values, in favor of using null values to clearly indicate that the value is not
present.
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.
This commit adds two methods to the TokenFilterFactory interface.
* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
`CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
list `Analyzer`. By default it returns `true`.
Fixes#33609
The fix in #33757 introduces some workaround since FilterCodecReader didn't
support unwrapping. This cuts over to a more elegant fix to access the readers
segment infos.
Previously multiple comma separated lists of options where not
recognized correctly which resulted in only the last of them
to be taked into account, e.g.:
For the following query:
SELECT * FROM test WHERE QUERY('search', 'default_field=foo', 'default_operator=and')"
only the `default_operator=and` was finally passed to the ES query.
Fixes: #32602
Instead of having one constructor that accepts all arguments, all parameters
should be provided via setters. Only leader and follower index are required
arguments. This makes using this class in tests and transport client easier.
DAYNAME and MONTHNAME functions tests will be skipped if the right JVM parameter (-Djava.locale.providers=COMPAT) is not used in unit testing environment
This moves away from caching a list of steps for a current phase, instead
rebuilding the necessary step from the phase JSON stored in the index's
metadata.
Relates to #29823
Currently a watch execution results in one bulk request, when the
triggered watches are written into the that index, that need to be
executed. However the update of the watch status, the creation of the
watch history entry as well as the deletion of the triggered watches
index are all single document operations.
This can have quite a negative impact, once you are executing a lot of
watches, as each execution results in 4 documents writes, three of them
being single document actions.
This commit switches to a bulk processor instead of a single document
action for writing watch history entries and deleting triggered watch
entries. However the defaults are to run synchronous as before because
the number of concurrent requests is set to 0. This also fixes a bug,
where the deletion of the triggered watch entry was done asynchronously.
However if you have a high number of watches being executed, you can
configure watcher to delete the triggered watches entries as well as
writing the watch history entries via bulk requests.
The triggered watches deletions should still happen in a timely manner,
where as the history entries might actually be bound by size as one
entry can easily have 20kb.
The following settings have been added:
- xpack.watcher.bulk.actions (default 1)
- xpack.watcher.bulk.concurrent_requests (default 0)
- xpack.watcher.bulk.flush_interval (default 1s)
- xpack.watcher.bulk.size (default 1mb)
The drawback of this is of course, that on a node outage you might end
up with watch history entries not being written or watches needing to be
executing again because they have not been deleted from the triggered
watches index. The window of these two cases increases configuring the bulk processor to wait to reach certain thresholds.
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
Relates to #33007
* Implement xpack.monitoring.elasticsearch.collection.enabled setting
* Fixing line lengths
* Updating constructor calls in test
* Removing unused import
* Fixing line lengths in test classes
* Make monitoringService.isElasticsearchCollectionEnabled() return true for tests
* Remove wrong expectation
* Adding unit tests for new flag to be false
* Fixing line wrapping/indentation for better readability
* Adding docs
* Fixing logic in ClusterStatsCollector::shouldCollect
* Rebasing with master and resolving conflicts
* Simplifying implementation by gating scheduling
* Doc fixes / improvements
* Making methods package private
* Fixing wording
* Fixing method access
This PR removes fields that are not actually used by the Monitoring UI. This will greatly simplify the eventual migration to using Metricbeat for monitoring Elasticsearch (see https://github.com/elastic/beats/pull/8260#discussion_r215885868 for more context and discussion around removing these fields from ES collection).
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.
The extra caused by level is not necessary here:
```
"fetch_exceptions": [
{
"from_seq_no": 1,
"retries": 106,
"exception": {
"type": "exception",
"reason": "[index1] IndexNotFoundException[no such index]",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index",
"index_uuid": "_na_",
"index": "index1"
}
}
}
],
```
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.
This commit fixes that, by allowing that a leader index does not yet have
a mapping.
We can't rely on the leaf reader ordinal in a wrapped reader since
it might not correspond to the ordinal in the SegmentInfos for it's
SegmentCommitInfo.
Relates to #32844Closes#33689Closes#33755
This PR integrates the following pieces of machinery in the Coordinator:
- discovery
- pre-voting
- randomised election scheduling
- joining (of a new master)
- publication of cluster state updates
Together, these things are everything needed to form a cluster. We therefore
also add the start of a test suite that allows us to assert higher-level
properties of the interactions between all these pieces of machinery, with as
little fake behaviour as possible. We assert one such property: "a cluster
successfully forms".
This commit adds the Create Rollup Job API to the high level REST
client. It supersedes #32703 and adds dedicated request/response
objects so that it does not depend on server side components.
Related #29827
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
There was a listener that re-runs the policy with the new state when the cluster
state is processed by the `MoveToNextStepUpdateTask`. This removes this listener
as we will execute the policy through the `IndexLifecyleService` cluster state
listener.
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.
This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check
before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
Ensure that the SSLConfigurationReloaderTests can run with JDK 11
by pinning the HttpClient to TLS version to TLS1.2. This is necessary
becase even if the MockWebServer is set to user TLS1.2, we don't
set its enabled protocols, so if it receives a TLS1.3 request (which
is the default behavior for HttpClient in JDK11), it will use TLS1.3
and the original issue will manifest again.
Relates #33127Resolves#32124
Changes the format of log events in the audit logfile.
It also changes the filename suffix from `_access` to `_audit`.
The new entry format is consistent with Elastic Common Schema.
Entries are formatted as JSON with no nested objects and field
names have a dotted syntax. Moreover, log entries themselves
are not spaced by commas and there is exactly one entry per line.
In addition, entry fields are ordered, unlike a typical JSON doc,
such that a human would not strain his eyes over jumbled
fields from one line to the other; the order is defined in the log4j2
properties file.
The implementation utilizes the log4j2's `StringMapMessage`.
This means that the application builds the log event as a map
and the log4j logic (the appender's layout) handle the format
internally. The layout, such as the set of printed fields and their
order, can be changed at runtime without restarting the node.
and if so debug log it and otherwise rethrow.
This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
This change modifies the file structure detection functionality
such that some of the decisions can be overridden with user
supplied values.
The fields that can be overridden are:
- charset
- format
- has_header_row
- column_names
- delimiter
- quote
- should_trim_fields
- grok_pattern
- timestamp_field
- timestamp_format
If an override makes finding the file structure impossible then
the endpoint will return an exception.
We have a Kerberos setting to remove realm part from the user
principal name (remove_realm_name). If this is true then
the realm name is removed to form username but in the process,
the realm name is lost. For scenarios like Kerberos cross-realm
authentication, one could make use of the realm name to determine
role mapping for users coming from different realms.
This commit adds user metadata for kerberos_realm and
kerberos_user_principal_name.
Disable specific Thai and Japanese locales as Certificate expiration
validation fails due to the date parsing of BouncyCastle (that manifests
in a FIPS 140 JVM as this is the only place we use BouncyCastle).
Added the locale switching logic here instead of subclassing
ESTestCase as these are the only tests that fail for these locales and
JVM combination.
Resolves#33081
We have a test dependency on Apache Mina when using SimpleKdcServer
for testing Kerberos. When checking for LDAP backend connectivity,
the code checks for deadlocks which require additional security
permissions accessClassInPackage.sun.reflect. As this is only for
test and we do not want to add security permissions to production,
this commit moves these tests and related classes to
x-pack evil-tests where they can run with security manager disabled.
The plan is to handle the security manager exception in the upstream issue
DIRMINA-1093
and then once the release is available to run these tests with security
manager enabled.
Closes#32739
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.
The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.
Closes#31505
Previously, when an non-pruned cast (casting as a different
data type) got applied on a table column in the `SELECT` clause,
the name of the result column didn't contain the target data type
of the cast, e.g.:
SELECT CAST(MAX(salary) AS DOUBLE) FROM "test_emp"
returned as column name:
CAST(MAX(salary))
instead of:
CAST(MAX(salary) AS DOUBLE)
Closes#33571
* Added more tests for trivial casts that are pruned
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`,
in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only `MockUncasedHostsProvider` is no longer
required.
This change removes `MockUncasedHostProvider` and replaces it with file-based
discovery in tests based on `EsIntegTestCase`.
Follow up to #33617. Relates to #30086.
As with all other per-index Monitoring collectors, the `CcrStatsCollector` should only collect stats for the indices the user wants to monitor. This list is controlled by the `xpack.monitoring.collection.indices` setting and defaults to all indices.
This change addresses some issues regarding thread safety around
updates and method calls on the XPackLicenseState object. There exists
a possibility that there could be a concurrent update to the
XPackLicenseState when there is a scheduled check to see if the license
is expired and a cluster state update. In order to address this, the
update method now has a synchronized block where member variables are
updated. Each method that reads these variables is now also
synchronized.
Along with the above change, there was a consistency issue around
security calls to the license state. The majority of security checks
make two calls to the license state, which could result in incorrect
behavior due to the checks being made against different license states.
The majority of this behavior was introduced for 6.3 with the inclusion
of x-pack in the default distribution. In order to resolve the majority
of these cases, the `isSecurityEnabled` method is no longer public and
the logic is also included in individual methods about security such as
`isAuthAllowed`. There were a few cases where this did not remove
multiple calls on the license state, so a new method has been added
which creates a copy of the current license state that will not change.
Callers can use this copy of the license state to make decisions based
on a consistent view of the license state.
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.
* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
index shards are the same as the recorded history UUIDs.
If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
on each shard changes api call verify whether their current history uuid
is the same as the recorded history uuid.
Relates to #30086
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
This change adds a `_source` only snapshot repository that allows to wrap
any existing repository as a _backend_ to snapshot only the `_source` part
including live docs markers. Snapshots taken with the `source` repository
won't include any indices, doc-values or points. The snapshot will be reduced in size and
functionality such that it requires full re-indexing after it's successfully restored.
The restore process will copy the `_source` data locally starts a special shard and engine
to allow `match_all` scrolls and searches. Any other query, or get call will fail with and unsupported operation exception. The restored index is also marked as read-only.
This feature aims mainly for disaster recovery use-cases where snapshot size is
a concern or where time to restore is less of an issue.
**NOTE**: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.
Relates to #30086
This commit adds settings upgraders for the search.remote.* settings
that can be in the cluster state to automatically upgrade these settings
to cluster.remote.*. Because of the infrastructure that we have here,
these settings can be upgraded when recovering the cluster state, but
also when a user tries to make a dynamic update for these settings.
These logs are incredibly verbose, and it makes chasing normal failures
burdensome. This commit removes the debug logging, which can be
reenabled again if needed.
* SQL: Make Literal a NamedExpression
Literal now is a NamedExpression reducing the need for Aliases for
folded expressions leading to simpler optimization rules.
Fix#33523
This change tightens up the meaning of the "input_fields" field
in the file structure finder output. Previously it was permitted
but not calculated for JSON and XML files. Following this change
the field is called "column_names" and is only permitted for
delimited files.
Additionally the way the column names are set for headerless
delimited files is refactored to encapsulate the way they're
named to one line of the code rather than having the same
logic in two places.
Previously, when an arithmetic function got applied on a
table column in the `SELECT` clause, the name of the result
column contained weird characters used internally when
processing the SQL statement e.g.:
SELECT CHAR(emp_no % 10000) FROM "test_emp"
returned:
CHAR((emp_no{f}#14) % 10000))
as the column name instead of:
CHAR((emp_no) % 10000))
Also, fix an issue that causes a ClassCastException to be thrown
when using functions where both arguments are literals.
Closes#31869Closes#33461
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured
Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.
Closes#33555
* Correctly handle NONE keyword for system keystore
As defined in the PKCS#11 reference guide
https://docs.oracle.com/javase/8/docs/technotes/guides/security/p11guide.html
PKCS#11 tokens can be used as the JSSE keystore and truststore and
the way to indicate this is to set `javax.net.ssl.keyStore` and
`javax.net.ssl.trustStore` to `NONE` (case sensitive).
This commits ensures that we honor this convention and do not
attempt to load the keystore or truststore if the system property is
set to NONE.
* Handle password protected system truststore
When a PKCS#11 token is used as the system truststore, we need to
pass a password when loading it, even if only for reading
certificate entries. This commit ensures that if
`javax.net.ssl.trustStoreType` is set to `PKCS#11` (as it would
when a PKCS#11 token is in use) the password specified in
`javax.net.ssl.trustStorePassword` is passed when attempting to
load the truststore.
Relates #33459
In some cases we want to deprecate a setting, and then automatically
upgrade uses of that setting to a replacement setting. This commit adds
infrastructure for this so that we can upgrade settings when recovering
the cluster state, as well as when such settings are dynamically applied
on cluster update settings requests. This commit only focuses on cluster
settings, index settings can build on this infrastructure in a
follow-up.
When requesting job stats for `_all`, all ES tasks are accepted
resulting to loads of cluster traffic and a memory overhead.
This commit correctly filters out non ML job tasks.
Closes#33515
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
In the multi-cluster-with-non-compliant-license tests, we try to write
out a java.policy to a temporary directory. However, if this temporary
directory does not already exist then writing the java.policy file will
fail. This commit ensures that the temporary directory exists before we
attempt to write the java.policy file.
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
This commit ensures that we bootstrap a new history_uuid when force
allocating a stale primary. A stale primary should never be the source
of an operation-based recovery to another shard which exists before the
forced-allocation.
Closes#26712
Today when checking settings dependencies, we do not check if fallback
settings are present. This means, for example, that if
cluster.remote.*.seeds falls back to search.remote.*.seeds, and
cluster.remote.*.skip_unavailable and search.remote.*.skip_unavailable
depend on cluster.remote.*.seeds, and we have set search.remote.*.seeds
and search.remote.*.skip_unavailable, then validation will fail because
it is expected that cluster.ermote.*.seeds is set here. This commit
addresses this by also checking fallback settings when validating
dependencies. To do this, we adjust the settings exist method to also
check for fallback settings, a case that it was not handling previously.
Adds Request and Reponse classes for accessing lifecycle policies.
Changes existing tests to use these classes where appropriate.
Sets up SPI configuration to allow parsing *Actions from XContent.
Change the logging infrastructure to handle when the node name isn't
available in `elasticsearch.yml`. In that case the node name is not
available until long after logging is configured. The biggest change is
that the node name logging no longer fixed at pattern build time.
Instead it is read from a `SetOnce` on every print. If it is unset it is
printed as `unknown` so we have something that fits in the pattern.
On normal startup we don't log anything until the node name is available
so we never see the `unknown`s.
This endpoint accepts an arbitrary file in the request body and
attempts to determine the structure. If successful it also
proposes mappings that could be used when indexing the file's
contents, and calculates simple statistics for each of the fields
that are useful in the data preparation step prior to configuring
machine learning jobs.
Watcher validates `action.auto_create_index` upon startup. If a user
specifies a pattern that does not contain watcher indices, it raises an
error message to include a list of three indices. However, the indices
are separated by a comma and a space which is not considered in parsing.
With this commit we change the error message string so it does not
contain the additional space thus making it more straightforward to copy
it to the configuration file.
Closes#33369
Relates #33497
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
Some browsers (eg. Firefox) behave differently when presented with
multiple auth schemes in 'WWW-Authenticate' header. The expected
behavior is that browser select the most secure auth-scheme before
trying others, but Firefox selects the first presented auth scheme and
tries the next ones sequentially. As the browser interpretation is
something that we do not control, we can at least present the auth
schemes in most to least secure order as the server's preference.
This commit modifies the code to collect and sort the auth schemes
presented by most to least secure. The priority of the auth schemes is
fixed, the lower number denoting more secure auth-scheme.
The current order of schemes based on the ES supported auth-scheme is
[Negotiate, Bearer,Basic] and when we add future support for
other schemes we will need to update the code. If need be we will make
this configuration customizable in future.
Unit test to verify the WWW-Authenticate header values are sorted by
server preference as more secure to least secure auth schemes.
Tested with Firefox, Chrome, Internet Explorer 11.
Closes#32699
It is useful to keep track of which version of a policy is currently
being executed by a specific index. For management purposes, it would
also be useful to know at which time the latest version was inserted
so that an audit trail is left for reconciling changes happening in ILM.
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.
Relates #32867
Split function section into multiple chapters
Add String functions
Add (small) section on Conversion/Cast functions
Add missing aggregation functions
Enable documentation testing (was disabled by accident). While at it,
fix failing tests
Improve spec tests to allow multi-line queries (useful for docs)
Add ability to ignore a spec test (name should end with -Ignore)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Many files supplied to the upcoming ML data preparation
functionality will not be "log" files. For example,
CSV files are generally not "log" files. Therefore it
makes sense to rename library that determines the
structure of these files.
Although "file structure" could be considered too broad,
as the library currently only works with a few text
formats, in the future it may be extended to work with
more formats.
Auto Following Patterns is a cross cluster replication feature that
keeps track whether in the leader cluster indices are being created with
names that match with a specific pattern and if so automatically let
the follower cluster follow these newly created indices.
This change adds an `AutoFollowCoordinator` component that is only active
on the elected master node. Periodically this component checks the
the cluster state of remote clusters if there new leader indices that
match with configured auto follow patterns that have been defined in
`AutoFollowMetadata` custom metadata.
This change also adds two new APIs to manage auto follow patterns. A put
auto follow pattern api:
```
PUT /_ccr/_autofollow/{{remote_cluster}}
{
"leader_index_pattern": ["logs-*", ...],
"follow_index_pattern": "{{leader_index}}-copy",
"max_concurrent_read_batches": 2
... // other optional parameters
}
```
and delete auto follow pattern api:
```
DELETE /_ccr/_autofollow/{{remote_cluster_alias}}
```
The auto follow patterns are directly tied to the remote cluster aliases
configured in the follow cluster.
Relates to #33007
Co-authored-by: Jason Tedor jason@tedor.me
With features like CCR building on the CCS infrastructure, the settings
prefix search.remote makes less sense as the namespace for these remote
cluster settings than does a more general namespace like
cluster.remote. This commit replaces these settings with cluster.remote
with a fallback to the deprecated settings search.remote.
This removes `PhaseAfterStep` in favor of a new `PhaseCompleteStep`. This step
in only a marker that the `LifecyclePolicyRunner` needs to halt until the time
indicated for entering the next phase.
This also fixes a bug where phase times were encapsulated into the policy
instead of dynamically adjusting to policy changes.
Supersedes #33140, which it replaces
Relates to #29823
This commit is related to #32517. It allows an "server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. This functionality is only implemented for
the netty security transport.
Since policies can be updated independent of execution plans for the current
phase being executed, it would be nice to know what the phase that is executing
looks like in JSON. This PR does just that, while also using that index setting
to recontruct the phase steps to execute (for consistency)
Drops and unused logging constructor, simplifies a rarely used one, and
removes `Settings` from a third. There is now only a single logging ctor
that takes `Settings` and we'll remove that one in a follow up change.
This commit adds a security client to the high level rest client, which
includes an implementation for the put user api. As part of these
changes, a new request and response class have been added that are
specific to the high level rest client. One change here is that the response
was previously wrapped inside a user object. The plan is to remove this
wrapping and this PR adds an unwrapped response outside of the user
object so we can remove the user object later on.
See #29827
Solves all of the xpack line length suppressions and then merges the
remainder of the xpack checkstyle_suppressions.xml file into the core
checkstyle_suppressions.xml file. At this point that just means the
antlr generated files for sql.
It also adds an exclusion to the line length tests for javadocs that
are just a URL. We have one such javadoc and breaking up the line would
make the link difficult to use.
The log structure endpoint will return these in addition to
pure structure information so that it can be used to drive
pre-import data visualizer functionality.
The statistics for every field are count, cardinality
(distinct count) and top hits (most common values). Extra
statistics are calculated if the field is numeric: min, max,
mean and median.
With the introduction of the default distribution, it means that by
default the query cache is wrapped in the security implementation of the
query cache. This cache does not allow caching if the request does not
carry indices permissions. Yet, this will not happen if authorization is
not allowed, which it is not by default. This means that with the
introduction of the default distribution, query caching was disabled by
default! This commit addresses this by checking if authorization is
allowed and if not, delegating to the default indices query
cache. Otherwise, we proceed as before with security. Additionally, we
clear the cache on license state changes.
Extend SHOW TABLES, DESCRIBE and SHOW COLUMNS to support table
identifiers not just SQL LIKE pattern.
This allows both Elasticsearch-style multi-index patterns and SQL LIKE.
To disambiguate between the two (as the " vs ' can be easy to miss),
the grammar now requires LIKE keyword as a prefix for all LIKE-like
patterns.
Also added some docs comparing the two types of patterns.
Fix#33294
This is not changing the behaviour as when the sort field was set
to `influencer_score` the secondary sort would be used and that
was using the `record_score` at the highest priority.
1. The TOMCAT_DATESTAMP format needs to be checked before
TIMESTAMP_ISO8601, otherwise TIMESTAMP_ISO8601 will
match the start of the Tomcat datestamp.
2. Exclude more characters before and after numbers. For
example, in 1.2.3 we don't want to match 1.2 as a float.
The comparator used TimeValue parsing, which meant it couldn't handle
calendar time. This fixes the comparator to handle either (and potentially
mixed). The mixing shouldn't be an issue since the validation code
upstream will prevent it, but was simplest to allow the comparator
to handle both.
In Lucene 8 the statistics for a field (doc_count, sum_doc_count, ...) are
checked and invalid values (v < 0) are rejected. Though for the _field_names
field we hide the statistics of the field if security is enabled since
some terms (field names) may be filtered. However this statistics are never
used, this field is not used for ranking and cannot be used to generate
term vectors. For these reasons this commit restores the original statistics
for the field in order to be compliant with Lucene 8.
This test fails several times due to timeout when asserting the number
of docs on the following and leading indices. This change reduces
the number of docs to index and increases the timeout.
* master:
Mute test watcher usage stats output
[Rollup] Fix FullClusterRestart test
Adjust soft-deletes version after backport into 6.5
completely drop `index.shard.check_on_startup: fix` for 7.0 (#33194)
Fix AwaitsFix issue number
Mute SmokeTestWatcherWithSecurityIT testsi
drop `index.shard.check_on_startup: fix` (#32279)
tracked at
[DOCS] Moves ml folder from x-pack/docs to docs (#33248)
[DOCS] Move rollup APIs to docs (#31450)
[DOCS] Rename X-Pack Commands section (#33005)
TEST: Disable soft-deletes in ParentChildTestCase
Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
Fix pom for build-tools (#33300)
Lazy evaluate java9home (#33301)
SQL: test coverage for JdbcResultSet (#32813)
Work around to be able to generate eclipse projects (#33295)
Highlight that index_phrases only works if no slop is used (#33303)
Different handling for security specific errors in the CLI. Fix for https://github.com/elastic/elasticsearch/issues/33230 (#33255)
[ML] Refactor delimited file structure detection (#33233)
SQL: Support multi-index format as table identifier (#33278)
MINOR: Remove Dead Code from PathTrie (#33280)
Enable forbiddenapis server java9 (#33245)
We need to wait for the job to fully initialize and start before
we can attempt to stop it. If we don't, it's possible for the stop
API to be called before the persistent task is fully loaded and it'll
throw an exception.
Closes#32773
In the previous commit where SmokeTestWatcherWithSecurityIT tests were
muted, I added the incorrect issue numbers. This commit fixes this. The
issue for the tests is #33320.
* Fixes SecurityIntegTestCase so it always adds at least one alias
`SecurityIntegTestCase.createIndicesWithRandomAliases` could randomly
fail because its not gauranteed that the randomness of which aliases to
add to the `IndicesAliasesRequestBuilder` would always select at least
one alias to add. This change fixes the problem by keeping track of
whether we have added an alias to teh request and forcing the last
alias to be added if no other aliases have been added so far.
Closes#30098
Closes #33123e
* Addresses review comments
These response classes did not add any value and in that case just AcknowledgedResponse should be used.
I also changed the formatting of methods to take one line per parameter in
FollowIndexAction.java and UnfollowIndexAction.java files to make
reviewing diffs in the future easier.
* Tests for JdbcResultSet
* Added VARCHAR conversion for different types
* Made error messages consistent: they now contain both the type that fails to be converted and the value itself
1. Use the term "delimited" rather than "separated values"
2. Use a single factory class with arguments to specify the
delimiter and identification constraints
This change makes it easier to add support for other
delimiter characters.
This brings the name in line with everywhere else and means that name
seen on the feature usage and `GET _xpack` APIs will match the plugin
name.
This change also removes `IndexLifcycle.NAME` since this was only used
to name the scheduler job and that can be done using
`XPackField.INDEX_LIFECYCLE` instead
* master:
Integrates soft-deletes into Elasticsearch (#33222)
Revert "Integrates soft-deletes into Elasticsearch (#33222)"
Add support for "authorization_realms" (#33262)
Authorization Realms allow an authenticating realm to delegate the task
of constructing a User object (with name, roles, etc) to one or more
other realms.
E.g. A client could authenticate using PKI, but then delegate to an LDAP
realm. The LDAP realm performs a "lookup" by principal, and then does
regular role-mapping from the discovered user.
This commit includes:
- authorization_realm support in the pki, ldap, saml & kerberos realms
- docs for authorization_realms
- checks that there are no "authorization chains"
(whereby "realm-a" delegates to "realm-b", but "realm-b" delegates to "realm-c")
Authorization realms is a platinum feature.
This PR removes the deprecated `Custom` class in `IndexMetaData`, in favor
of a `Map<String, DiffableStringMap>` that is used to store custom index
metadata. As part of this, there is now no way to set this metadata in a
template or create index request (since it's only set by plugins, or dedicated
REST endpoints).
The `Map<String, DiffableStringMap>` is intended to be a namespaced `Map<String,
String>` (`DiffableStringMap` implements `Map<String, String>`, so the signature
is more like `Map<String, Map<String, String>>`). This is so we can do things
like:
``` java
Map<String, String> ccrMeta = indexMetaData.getCustom("ccr");
```
And then have complete control over the metadata. This also means any
plugin/feature that uses this has to manage its own BWC, as the map is just
serialized as a map. It also means that if metadata is put in the map that isn't
used (for instance, if a plugin were removed), it causes no failures the way
an unregistered `Setting` would.
The reason I use a custom `DiffableStringMap` here rather than a plain
`Map<String, String>` is so the map can be diffed with previous cluster state
updates for serialization.
Supersedes #32683
This commit ensures that when `TriggerService.start()` is called,
we ensure in the trigger engine implementations that current watches are
removed instead of adding to the existing ones in
`TickerScheduleTriggerEngine.start()`
Two additional minor fixes, where the result remains the same but less code gets executed.
1. If the node is not a data node, we forgot to set the status to
STARTING when watcher is being started. This should not be a big issue,
because a non-data node does not spent a lot of time loading as there
are no watches which need loading.
2. If a new cluster state came in during a reload, we had two checks in
place to abort loading the current one. The first one before we load all
the watches of the local node and the second before watcher is starting
with those new watches. Turned out that the first check was not
returning, which meant we always tried to load all the watches, and then
would fail on the second check. This has been fixed here.
Ensure that the SSLConfigurationReloaderTests can run with JDK 11
by pinning the Server TLS version to TLS1.2. This can be revisited
while tackling the effort to full support TLSv1.3 in
https://github.com/elastic/elasticsearch/issues/32276Resolves#32124
Ran for all locales in system to find locales which caused
problems in tests due to incorrect generalized time handling
in simple kdc ldap server.
Closes#33228
We need to limit the search request aggregations to whole multiples
of the configured interval for both histogram and date_histogram.
Otherwise, agg buckets won't overlap with the rolled up buckets
and the results will be incorrect.
For histogram, the validation is very simple: request must be >= the config,
and modulo evenly.
Dates are more tricky.
- If both request and config are fixed dates, we can convert to millis
and treat them just like the histo
- If both are calendar, we make sure the request is >= the config with
a static lookup map that ranks the calendar values relatively. All
calendar units are "singles", so they are evenly divisible already
- We disallow any other combination (one fixed, one calendar, etc)
When a node dies that carries a watcher shard or a shard is relocated to
another node, then watcher needs not only trigger a reload on the node
where the shard relocation happened, but also on other nodes where
copies of this shard, as different watches may need to be loaded.
This commit takes the change of remote nodes into account by not only
storing the local shard allocation ids in the WatcherLifeCycleService,
but storing a list of ShardRoutings based on the local active shards.
This also fixes some tests, which had a wrong assumption. Using
`TestShardRouting.newShardRouting` in our tests for cluster state
creation led to the issue of always creating new allocation ids which
implicitely lead to a reload.
This extracts a super class out of the rollup indexer called the AsyncTwoPhaseIterator.
The implementor of it can define the query, transformation of the response,
indexing and the object to persist the position/state of the indexer.
The stats object used by the indexer to record progress is also now abstract, allowing
the implementation provide custom stats beyond what the indexer provides. It also
allows the implementation to decide how the stats are presented (leaves toXContent()
up to the implementation).
This should allow new projects to reuse the search-then-index persistent task that Rollup
uses, but without the restrictions/baggage of how Rollup has to work internally to
satisfy time-based rollups.
* master:
Painless: Add Bindings (#33042)
Update version after client credentials backport
Fix forbidden apis on FIPS (#33202)
Remote 6.x transport BWC Layer for `_shrink` (#33236)
Test fix - Graph HLRC tests needed another field adding to randomisation exception list
HLRC: Add ML Get Records API (#33085)
[ML] Fix character set finder bug with unencodable charsets (#33234)
TESTS: Fix overly long lines (#33240)
Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
Remove unsupported group_shard_failures parameter (#33208)
Update BucketUtils#suggestShardSideQueueSize signature (#33210)
Parse PEM Key files leniantly (#33173)
INGEST: Add Pipeline Processor (#32473)
Core: Add java time xcontent serializers (#33120)
Consider multi release jars when running third party audit (#33206)
Update MSI documentation (#31950)
HLRC: create base timed request class (#33216)
[DOCS] Fixes command page titles
HLRC: Move ML protocol classes into client ml package (#33203)
Scroll queries asking for rescore are considered invalid (#32918)
Painless: Fix Semicolon Regression (#33212)
ingest: minor - update test to include dissect (#33211)
Switch remaining LLREST usage to new style Requests (#33171)
HLREST: add reindex API (#32679)
This commit changes the serialization version from V_7_0_0_alpha1 to
V_6_5_0 for the create token request and response with a client
credentials grant type. The client credentials work has now been
backported to 6.x.
Relates #33106
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips
Closes#33179
Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.
The fix is to assume that if ICU4J identifies that some bytes
correspond to a character set that cannot be encoded and those
bytes contain zeroes then the data is binary rather than text.
Fixes#33227
Exclude classes meant for newer versions than what we are auditing against, those classes won't be found. There's no reason to exclude JDK classes from newer versions, with this PR, we will not extract them in the first place.
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests that I
could find with `grep`. In this PR I change all requests that I could
find by *removing* the deprecated methods. Since this is a non-trivial
change I do not include actually removing the deprecated requests. I'll
do that in a follow up. But this should be the last set of usage
removals before the actual deprecated method removal. Yay!
* master:
[Rollup] Better error message when trying to set non-rollup index (#32965)
HLRC: Use Optional in validation logic (#33104)
Remove unused User class from protocol (#33137)
ingest: Introduce the dissect processor (#32884)
[Docs] Add link to es-kotlin-wrapper-client (#32618)
[Docs] Remove repeating words (#33087)
Minor spelling and grammar fix (#32931)
Remove support for deprecated params._agg/_aggs for scripted metric aggregations (#32979)
Watcher: Simplify finding next date in cron schedule (#33015)
Run Third party audit with forbidden APIs CLI (part3/3) (#33052)
Fix plugin build test on Windows (#33078)
HLRC+MINOR: Remove Unused Private Method (#33165)
Remove old unused test script files (#32970)
Build analysis-icu client JAR (#33184)
Ensure to generate identical NoOp for the same failure (#33141)
ShardSearchFailure#readFrom to set index and shardId (#33161)
We don't allow the user to configure a rollup index against an
existing index, but the exceptions that we return are not clear about
that. They indicate issues with metadata, instead of stating
the real reason (not allowed to use a non-rollup index to store
rollup data).
This makes the exception better, and adds a bit more testing
This commit removes the unused User class from the protocol project.
This class was originally moved into protocol in preparation for moving
more request and response classes, but given the change in direction
for the HLRC this is no longer needed. Additionally, this change also
changes the package name for the User object in x-pack/plugin/core to
its original name.
This commit makes primary-replica resyncer use Lucene as the source of
history operation instead of translog if soft-deletes is enabled. With
this change, we no longer expose translog snapshot directly in IndexShard.
Relates #29530
These were broken when fetch exceptions were introduced to the status
object but equals and hash code were not updated then. This commit
addresses that.
Today we fetch the mapping from the leader and apply it as a mapping
update whenever the index metadata version on the leader changes. Yet,
the index metadata can change for many reasons other than a mapping
update (e.g., settings updates, adding an alias, or a replica being
promoted to a primary among many other reasons). This commit builds on
the addition of a mapping version to the index metadata to only fetch
mapping updates when the mapping version increases. This reduces the
number of these fetches and application of mappings on the follower to
the bare minimum.
The code introduced in 3fa36807f8 to fix
an issue with crons always returning -1 was not very readable. This
implementation uses streams to improve readability.
The new implementation is functional equivalent with the old, ant based one.
It parses task standard error to get the missing classes and violations in the same way.
I considered re-using ForbiddenApisCliTask but Gradle makes it hard to build inheritance with tasks that have task actions , since the order of the task actions can't be controlled.
This inheritance isn't dully desired either as the third party audit task is much more opinionated and we don't want to expose some of the configuration.
We could probably extract a common base class without any task actions, but probably more trouble than it's worth.
Closes#31715
Changes to the IndexLifecycleService were necessary since relying on
ClusterChangedEvents for a full picture of the cluster state's settings was
a mistake. It is not necessary that these events hold all settings, especially ones
that are set at node start-up.
Changes to main include:
- move poll interval updates to a SettingsUpdateConsumer
- move scheduler start/stop to a localMasterNodeListener
- keep triggerPolicies in clusterChanged
Changes to tests include:
- removal of some low-level state transition checks in the Service that no longer make sense
since the changes are unconditionally specified in the appropriate listeners
- add integration tests for poll-interval updates
- add integration test assertions for verifying scheduler is started up correctly
* master:
Adjust BWC version on mapping version
Token API supports the client_credentials grant (#33106)
Build: forked compiler max memory matches jvmArgs (#33138)
Introduce mapping version to index metadata (#33147)
SQL: Enable aggregations to create a separate bucket for missing values (#32832)
Fix grammar in contributing docs
SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
APM server monitoring (#32515)
Support only string `format` in date, root object & date range (#28117)
[Rollup] Move toBuilders() methods out of rollup config objects (#32585)
Fix forbiddenapis on java 11 (#33116)
Apply publishing to genreate pom (#33094)
Have circuit breaker succeed on unknown mem usage
Do not lose default mapper on metadata updates (#33153)
Fix a mappings update test (#33146)
Reload Secure Settings REST specs & docs (#32990)
Refactor CachingUsernamePassword realm (#32646)
This change adds support for the client credentials grant type to the
token api. The client credentials grant allows for a client to
authenticate with the authorization server and obtain a token to access
as itself. Per RFC 6749, a refresh token should not be included with
the access token and as such a refresh token is not issued when the
client credentials grant is used.
The addition of the client credentials grant will allow users
authenticated with mechanisms such as kerberos or PKI to obtain a token
that can be used for subsequent access.
* Adding new MonitoredSystem for APM server
* Teaching Monitoring template utils about APM server monitoring indices
* Documenting new monitoring index for APM server
* Adding monitoring index template for APM server
* Copy pasta typo
* Removing metrics.libbeat.config section from mapping
* Adding built-in user and role for APM server user
* Actually define the role :)
* Adding missing import
* Removing index template and system ID for apm server
* Shortening line lengths
* Updating expected number of built-in users in integration test
* Removing "system" from role and user names
* Rearranging users to make tests pass
Refactors the logic of authentication and lookup caching in
`CachingUsernamePasswordRealm`. Nothing changed about
the single-inflight-request or positive caching.
* master:
Add proxy support to RemoteClusterConnection (#33062)
TEST: Skip assertSeqNos for closed shards (#33130)
TEST: resync operation on replica should acquire shard permit (#33103)
Switch remaining x-pack tests to new style Requests (#33108)
Switch remaining tests to new style Requests (#33109)
Switch remaining ml tests to new style Requests (#33107)
Build: Line up IDE detection logic
Security index expands to a single replica (#33131)
HLRC: request/response homogeneity and JavaDoc improvements (#33133)
Checkstyle!
[Test] Fix sporadic failure in MembershipActionTests
Revert "Do NOT allow termvectors on nested fields (#32728)"
[Rollup] Move toAggCap() methods out of rollup config objects (#32583)
Fix race condition in scheduler engine test
This adds support for connecting to a remote cluster through
a tcp proxy. A remote cluster can configured with an additional
`search.remote.$clustername.proxy` setting. This proxy will be used
to connect to remote nodes for every node connection established.
We still try to sniff the remote clsuter and connect to nodes directly
through the proxy which has to support some kind of routing to these nodes.
Yet, this routing mechanism requires the handshake request to include some
kind of information where to route to which is not yet implemented. The effort
to use the hostname and an optional node attribute for routing is tracked
in #32517Closes#31840
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. This
changes all calls in the `x-pack/qa/saml-idp-tests` and
`x-pack/qa/security-setup-password-tests` projects to use the new
versions.
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. This
changes all calls in the `x-pack/plugin/ml/qa/native-multi-node-tests`,
`x-pack/plugin/ml/qa/single-node-tests` projects to use the new
versions.
This change removes the use of 0-all for auto expand replicas for the
security index. The use of 0-all causes some unexpected behavior with
certain allocation settings. This change allows us to avoid these with
a default install. If necessary, the number of replicas can be tuned by
the user.
Closes#29933Closes#29712
This commit adds tracking and reporting for fetch exceptions. We track
fetch exceptions per fetch, keeping track of up to the maximum number of
concurrent fetches. With each failing fetch, we associate the from
sequence number with the exception that caused the fetch. We report
these in the CCR stats endpoint, and add some testing for this tracking.
Welp, I broke this. I merged a change to auto-discover the CCR QA tests
by making :x-pack:plugin:ccr:check auto-discover the check tasks in the
qa sub-project. Yet, the check tasks for these sub-projects did not
depend on the necessary test tasks (as we were previously doing this
directly from the ccr build file. This commit fixes this!
This commit addresses a race condition in the scheduler engine test that
a listener that throws an exception does not cause other listeners to be
skipped. The race here is that we were counting down a latch, and then
throwing an exception yet an assertion that expected the exception to
have been thrown already could execute after the latch was counted down
for the final time but before the exception was thrown and acted upon by
the scheduler engine. This commit addresses this by moving the counting
down of the latch to definitely be after the exception was acted upon by
the scheduler engine.
This committ removes the getMetadata() methods from the DateHistoGroupConfig
and HistoGroupConfig objects. This way the configuration objects do not rely on RollupField.formatMetaField() anymore and do not expose a getMetadata()
method that is tighlty coupled to the rollup indexer.
* es/master: (62 commits)
[DOCS] Add docs for Application Privileges (#32635)
Add versions 5.6.12 and 6.4.1
Do NOT allow termvectors on nested fields (#32728)
[Rollup] Return empty response when aggs are missing (#32796)
[TEST] Add some ACL yaml tests for Rollup (#33035)
Move non duplicated actions back into xpack core (#32952)
Test fix - GraphExploreResponseTests should not randomise array elements Closes#33086
Use `addIfAbsent` instead of checking if an element is contained
TESTS: Fix Random Fail in MockTcpTransportTests (#33061)
HLRC: Fix Compile Error From Missing Throws (#33083)
[DOCS] Remove reload password from docs cf. #32889
HLRC: Add ML Get Buckets API (#33056)
Watcher: Improve error messages for CronEvalTool (#32800)
Search: Support of wildcard on docvalue_fields (#32980)
Change query field expansion (#33020)
INGEST: Cleanup Redundant Put Method (#33034)
SQL: skip uppercasing/lowercasing function tests for AZ locales as well (#32910)
Fix the default pom file name (#33063)
Switch ml basic tests to new style Requests (#32483)
Switch some watcher tests to new style Requests (#33044)
...
* Remove canSetPolicy, canUpdatePolicy and canRemovePolicy
Since we now store a pre-compiled list of steps for an index's phase in the
`PolicyStepsRegistry`, we no longer need to worry about updating policies as any
updates won't affect the current phase, and will only be picked up on phase
transitions.
This also removes the tests that test these methods
Relates to #29823
If a search request doesn't contain aggs (or an empty agg object),
we should just retun an empty response. This is how the normal search
API works if you specify zero hits and empty aggs.
The existing behavior throws an exception because it tries to send
an empty msearch.
Closes#32256
This is needed as with recent changes to master (see #32952), protocol
is no longer accessible from core, so these classes need to be
duplicated in both places.