Currently a watch execution results in one bulk request, when the
triggered watches are written into the that index, that need to be
executed. However the update of the watch status, the creation of the
watch history entry as well as the deletion of the triggered watches
index are all single document operations.
This can have quite a negative impact, once you are executing a lot of
watches, as each execution results in 4 documents writes, three of them
being single document actions.
This commit switches to a bulk processor instead of a single document
action for writing watch history entries and deleting triggered watch
entries. However the defaults are to run synchronous as before because
the number of concurrent requests is set to 0. This also fixes a bug,
where the deletion of the triggered watch entry was done asynchronously.
However if you have a high number of watches being executed, you can
configure watcher to delete the triggered watches entries as well as
writing the watch history entries via bulk requests.
The triggered watches deletions should still happen in a timely manner,
where as the history entries might actually be bound by size as one
entry can easily have 20kb.
The following settings have been added:
- xpack.watcher.bulk.actions (default 1)
- xpack.watcher.bulk.concurrent_requests (default 0)
- xpack.watcher.bulk.flush_interval (default 1s)
- xpack.watcher.bulk.size (default 1mb)
The drawback of this is of course, that on a node outage you might end
up with watch history entries not being written or watches needing to be
executing again because they have not been deleted from the triggered
watches index. The window of these two cases increases configuring the bulk processor to wait to reach certain thresholds.
The following stats are being kept track of:
1) The total number of times that auto following a leader index succeed.
2) The total number of times that auto following a leader index failed.
3) The total number of times that fetching a remote cluster state failed.
4) The most recent 256 auto follow failures per auto leader index
(e.g. create_and_follow api call fails) or cluster alias
(e.g. fetching remote cluster state fails).
Each auto follow run now produces a result that is being used to update
the stats being kept track of in AutoFollowCoordinator.
Relates to #33007
* Implement xpack.monitoring.elasticsearch.collection.enabled setting
* Fixing line lengths
* Updating constructor calls in test
* Removing unused import
* Fixing line lengths in test classes
* Make monitoringService.isElasticsearchCollectionEnabled() return true for tests
* Remove wrong expectation
* Adding unit tests for new flag to be false
* Fixing line wrapping/indentation for better readability
* Adding docs
* Fixing logic in ClusterStatsCollector::shouldCollect
* Rebasing with master and resolving conflicts
* Simplifying implementation by gating scheduling
* Doc fixes / improvements
* Making methods package private
* Fixing wording
* Fixing method access
This PR removes fields that are not actually used by the Monitoring UI. This will greatly simplify the eventual migration to using Metricbeat for monitoring Elasticsearch (see https://github.com/elastic/beats/pull/8260#discussion_r215885868 for more context and discussion around removing these fields from ES collection).
* [CCR] Do not unnecessarily wrap fetch exception in a ElasticSearch exception and
properly map fetch_exception.exception field as object.
The extra caused by level is not necessary here:
```
"fetch_exceptions": [
{
"from_seq_no": 1,
"retries": 106,
"exception": {
"type": "exception",
"reason": "[index1] IndexNotFoundException[no such index]",
"caused_by": {
"type": "index_not_found_exception",
"reason": "no such index",
"index_uuid": "_na_",
"index": "index1"
}
}
}
],
```
When a leader index is created, it may not have a mapping yet.
Currently if you follow such an index the shard follow tasks fail with
NoSuchElementException, because they expect a single mapping.
This commit fixes that, by allowing that a leader index does not yet have
a mapping.
We can't rely on the leaf reader ordinal in a wrapped reader since
it might not correspond to the ordinal in the SegmentInfos for it's
SegmentCommitInfo.
Relates to #32844Closes#33689Closes#33755
This commit adds the Create Rollup Job API to the high level REST
client. It supersedes #32703 and adds dedicated request/response
objects so that it does not depend on server side components.
Related #29827
Rather than scheduling pings to the leader index when we are caught up
to the leader, this commit introduces long polling for changes. We will
fire off a request to the leader which if we are already caught up will
enter a poll on the leader side to listen for global checkpoint
changes. These polls will timeout after a default of one minute, but can
also be specified when creating the following task. We use these time
outs as a way to keep statistics up to date, to not exaggerate time
since last fetches, and to avoid pipes being broken.
When executing CCR REST tests it is going to be expected after global
checkpoint polling goes in that shard changes tasks can still be pending
at the end of the test. One way to deal with this is to set a low
timeout on these polls, but then that means we are not executing our
REST tests with our default production settings and instead would be
using an unrealistic low timeout. Alternatively, since we expect these
tasks to be there, we can not count them against the test. That is what
this commit does.
This commit moves these REST tests (possibly temporarily) to a
sub-project of ccr. We do this (again, possibly temporarily) to keep
them within the ccr sub-project yet there are changes within 6.x that
prevent these from being in the top-level project (the cluster formation
tasks are trying to install x-pack-ccr into the
integ-test-zip). Therefore, we isolate these for now until we can
understand why there are differences between 6.x and master.
When developing ccr it is not ideal if tests are in multiple modules.
Even the classes these tests test are in the core module, it is easier
if these tests are in ccr module in order to avoid running the test task
in core module. This results in running many non ccr tests.
This way when developing ccr we can run locally:
./gradlew x-pack:plugin:core:precommit x-pack:plugin:ccr:check
before pushing to PR branches and be confident that the PR build passes,
without running x-pack:plugin:core:check task.
Ensure that the SSLConfigurationReloaderTests can run with JDK 11
by pinning the HttpClient to TLS version to TLS1.2. This is necessary
becase even if the MockWebServer is set to user TLS1.2, we don't
set its enabled protocols, so if it receives a TLS1.3 request (which
is the default behavior for HttpClient in JDK11), it will use TLS1.3
and the original issue will manifest again.
Relates #33127Resolves#32124
Changes the format of log events in the audit logfile.
It also changes the filename suffix from `_access` to `_audit`.
The new entry format is consistent with Elastic Common Schema.
Entries are formatted as JSON with no nested objects and field
names have a dotted syntax. Moreover, log entries themselves
are not spaced by commas and there is exactly one entry per line.
In addition, entry fields are ordered, unlike a typical JSON doc,
such that a human would not strain his eyes over jumbled
fields from one line to the other; the order is defined in the log4j2
properties file.
The implementation utilizes the log4j2's `StringMapMessage`.
This means that the application builds the log event as a map
and the log4j logic (the appender's layout) handle the format
internally. The layout, such as the set of printed fields and their
order, can be changed at runtime without restarting the node.
and if so debug log it and otherwise rethrow.
This should fix a couple of test failures where during test teardown tests
failed due to uncaught exceptions being detected.
This change modifies the file structure detection functionality
such that some of the decisions can be overridden with user
supplied values.
The fields that can be overridden are:
- charset
- format
- has_header_row
- column_names
- delimiter
- quote
- should_trim_fields
- grok_pattern
- timestamp_field
- timestamp_format
If an override makes finding the file structure impossible then
the endpoint will return an exception.
We have a Kerberos setting to remove realm part from the user
principal name (remove_realm_name). If this is true then
the realm name is removed to form username but in the process,
the realm name is lost. For scenarios like Kerberos cross-realm
authentication, one could make use of the realm name to determine
role mapping for users coming from different realms.
This commit adds user metadata for kerberos_realm and
kerberos_user_principal_name.
Disable specific Thai and Japanese locales as Certificate expiration
validation fails due to the date parsing of BouncyCastle (that manifests
in a FIPS 140 JVM as this is the only place we use BouncyCastle).
Added the locale switching logic here instead of subclassing
ESTestCase as these are the only tests that fail for these locales and
JVM combination.
Resolves#33081
We have a test dependency on Apache Mina when using SimpleKdcServer
for testing Kerberos. When checking for LDAP backend connectivity,
the code checks for deadlocks which require additional security
permissions accessClassInPackage.sun.reflect. As this is only for
test and we do not want to add security permissions to production,
this commit moves these tests and related classes to
x-pack evil-tests where they can run with security manager disabled.
The plan is to handle the security manager exception in the upstream issue
DIRMINA-1093
and then once the release is available to run these tests with security
manager enabled.
Closes#32739
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
The follow index api checks if the recorded uuid in the follow index matches
with uuid of the leader index and fails otherwise. This validation will
prevent a follow index from following an incompatible leader index.
The create_and_follow api will automatically add this custom index metadata
when it creates the follow index.
Closes#31505
Previously, when an non-pruned cast (casting as a different
data type) got applied on a table column in the `SELECT` clause,
the name of the result column didn't contain the target data type
of the cast, e.g.:
SELECT CAST(MAX(salary) AS DOUBLE) FROM "test_emp"
returned as column name:
CAST(MAX(salary))
instead of:
CAST(MAX(salary) AS DOUBLE)
Closes#33571
* Added more tests for trivial casts that are pruned
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`,
in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only `MockUncasedHostsProvider` is no longer
required.
This change removes `MockUncasedHostProvider` and replaces it with file-based
discovery in tests based on `EsIntegTestCase`.
Follow up to #33617. Relates to #30086.
As with all other per-index Monitoring collectors, the `CcrStatsCollector` should only collect stats for the indices the user wants to monitor. This list is controlled by the `xpack.monitoring.collection.indices` setting and defaults to all indices.
This change addresses some issues regarding thread safety around
updates and method calls on the XPackLicenseState object. There exists
a possibility that there could be a concurrent update to the
XPackLicenseState when there is a scheduled check to see if the license
is expired and a cluster state update. In order to address this, the
update method now has a synchronized block where member variables are
updated. Each method that reads these variables is now also
synchronized.
Along with the above change, there was a consistency issue around
security calls to the license state. The majority of security checks
make two calls to the license state, which could result in incorrect
behavior due to the checks being made against different license states.
The majority of this behavior was introduced for 6.3 with the inclusion
of x-pack in the default distribution. In order to resolve the majority
of these cases, the `isSecurityEnabled` method is no longer public and
the logic is also included in individual methods about security such as
`isAuthAllowed`. There were a few cases where this did not remove
multiple calls on the license state, so a new method has been added
which creates a copy of the current license state that will not change.
Callers can use this copy of the license state to make decisions based
on a consistent view of the license state.
For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.
* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
index shards are the same as the recorded history UUIDs.
If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
on each shard changes api call verify whether their current history uuid
is the same as the recorded history uuid.
Relates to #30086
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>