If a cluster sending monitoring data is unhealthy and triggers an
alert, then stops sending data the following exception [1] can occur.
This exception stops the current Watch and the behavior is actually
correct in part due to the exception. Simply fixing the exception
introduces some incorrect behavior. Now that the Watch does not
error in the this case, it will result in an incorrectly "resolved"
alert. The fix here is two parts a) fix the exception b) fix the
following incorrect behavior.
a) fixing the exception is as easy as checking the size of the
array before accessing it.
b) fixing the following incorrect behavior is a bit more intrusive
- Note - the UI depends on the success/met state for each condition
to determine an "OK" or "FIRING"
In this scenario, where an unhealthy cluster triggers an alert and
then goes silent, it should keep "FIRING" until it hears back that
the cluster is green. To keep the Watch "FIRING" either the index
action or the email action needs to fire. Since the Watch is neither
a "new" alert or a "resolved" alert, we do not want to keep sending
an email (that would be non-passive too). Without completely changing
the logic of how an alert is resolved allowing the index action to
take place would result in the alert being resolved. Since we can
not keep "FIRING" either the email or index action (since we don't
want to resolve the alert nor re-write the logic for alert resolution),
we will introduce a 3rd action. A logging action that WILL fire when
the cluster is unhealthy. Specifically will fire when there is an
unresolved alert and it can not find the cluster state.
This logging action is logged at debug, so it should be noticed much.
This logging action serves as an 'anchor' for the UI to keep the state
in an a "FIRING" status until the alert is resolved.
This presents a possible scenario where a cluster starts firing,
then goes completely silent forever, the Watch will be "FIRING"
forever. This is an edge case that already exists in some scenarios
and requires manual intervention to remove that Watch.
This changes changes to use a template-like method to populate the
version_created for the default monitoring watches. The version is
set to 7.5 since that is where this is first introduced.
Fixes#43184
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
This change also slightly modifies the stats response,
so that is can easier consumer by monitoring and other
users. (coordinators stats are now in a list instead of
a map and has an additional field for the node id)
Relates to #32789
This commit renames the ILM package from indexlifecycle to ilm. We have
all come to know index lifecycle management as ILM, the APIs and
settings use ilm, and it would be nice of the package did too. This
commit makes that change.
We often start testing with early access versions of new Java
versions and this have caused minor issues in our tests
(i.e. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.
This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of java.version
system property that might include an additional alphanumeric
part after the version numbers
(see [JEP 223[(https://openjdk.java.net/jeps/223)). In short it
handles a version number part, like before, but additionally a
PRE part that matches ([a-zA-Z0-9]+).
It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We know parse java.version that can be consistently
parsed.
Resolves#43141
This commit converts all remaining ActionType response classes to
writeable in xpack core. It also converts a few from server which were
used by xpack core.
relates #34389
This commit converts the request and response classes for broadcast
actions to implement ctors for Writeable.Reader and forces all future
implementations to implement the same.
relates #34389
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.
relates #34389
* Return 0 for negative "free" and "total" memory reported by the OS
We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.
In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.
Resolves#42157
* Fix test passing in invalid memory value
* Fix another test passing in invalid memory value
* Also change mem check in MachineLearning.machineMemoryFromStats
* Add background documentation for why we prevent negative return values
* Clarify comment a bit more
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
This commit fixes the version parsing in various tests. The issue here is that
the parsing was relying on java.version. However, java.version can contain
additional characters such as -ea for early access builds. See JEP 233:
Name Syntax
------------------------------ --------------
java.version $VNUM(\-$PRE)?
java.runtime.version $VSTR
java.vm.version $VSTR
java.specification.version $VNUM
java.vm.specification.version $VNUM
Instead, we want java.specification.version.
This commit updates the default ciphers and TLS protocols that are used
when the runtime JDK supports them. New cipher support has been
introduced in JDK 11 and 12 along with performance fixes for AES GCM.
The ciphers are ordered with PFS ciphers being most preferred, then
AEAD ciphers, and finally those with mainstream hardware support. When
available stronger encryption is preferred for a given cipher.
This is a backport of #41385 and #41808. There are known JDK bugs with
TLSv1.3 that have been fixed in various versions. These are:
1. The JDK's bundled HttpsServer will endless loop under JDK11 and JDK
12.0 (Fixed in 12.0.1) based on the way the Apache HttpClient performs
a close (half close).
2. In all versions of JDK 11 and 12, the HttpsServer will endless loop
when certificates are not trusted or another handshake error occurs. An
email has been sent to the openjdk security-dev list and #38646 is open
to track this.
3. In JDK 11.0.2 and prior there is a race condition with session
resumption that leads to handshake errors when multiple concurrent
handshakes are going on between the same client and server. This bug
does not appear when client authentication is in use. This is
JDK-8213202, which was fixed in 11.0.3 and 12.0.
4. In JDK 11.0.2 and prior there is a bug where resumed TLS sessions do
not retain peer certificate information. This is JDK-8212885.
The way these issues are addressed is that the current java version is
checked and used to determine the supported protocols for tests that
provoke these issues.
The run task is supposed to run elasticsearch with the given plugin or
module. However, for modules, this is most realistic if using the full
distribution. This commit changes the run setup to use the default or
oss as appropriate.
When monitoring exporters are all disabled, which must be done
explicitly, _and_ monitoring collection is enabled, then
any call to `_xpack/monitoring/_bulk` will create a task that
never closes _and_ ES collection will stop happening because
a semaphore is never marked as completed.
This also simplifies the async `ExportBulk` code by removing the
third step (second async step, `close`) entirely because it was
entirely unnecessary by both implementations.
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).
Relates to #40366
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and somewhat misleading for
cluster administrators since the internal refreshes are not indicative
of documents being available for search.
In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.
Relates #36712
org.elasticsearch.xpack.monitoring.action.MonitoringBulkRequestTests#testAddRequestContent
can still randomly use a defaultType for monitoring. The defaultType
support has been removed as of PR #39888. Prior to its's removal it
would default the type if one is not specified. The _type on the monitoring
bulk end point is currently required, though it is not used as the final index type
(which defaultType would have).
Closes#39980
* [ML] Refactor common utils out of ML plugin to XPack.Core
* implementing GET filters with abstract transport
* removing added rest param
* adjusting how defaults can be supplied
This commit removes the "doc" type from monitoring internal indexes.
The template still carries the "_doc" type since that is needed for
the internal representation.
This change impacts the following templates:
monitoring-alerts.json
monitoring-beats.json
monitoring-es.json
monitoring-kibana.json
monitoring-logstash.json
As part of the required changes, the system_api_version has been
bumped from "6" to "7" and support for version "2" has been dropped.
A new empty pipeline is now introduced for the version "7", and
the formerly empty "6" pipeline will now remove the type and re-direct
the request to the "7" index.
Additionally, to due to a difference in the internal representation
(which requires the inclusion of "_doc" type) and external representation
(which requires the exclusion of any type) a helper method is introduced
to help convert internal to external representation, and used by the
monitoring HTTP template exporter.
Relates #38637
The monitoring bulk API accepts the same format as the bulk API, yet its concept
of types is different from "mapping types" and the deprecation warning is only
emitted as a side-effect of this API reusing the parsing logic of bulk requests.
This commit extracts the parsing logic from `_bulk` into its own class with a
new flag that allows to configure whether usage of `_type` should emit a warning
or not. Support for payloads has been removed for simplicity since they were
unused.
@jakelandis has a separate change that removes this notion of type from the
monitoring bulk API that we are considering bringing to 8.0.
Backport of #38818 to `7.x`. Original description:
The HTTP exporter code in the Monitoring plugin makes `GET _template` requests to check for existence of templates. These requests don't need to pass the `include_type_name` query parameter so this PR removes it from the request. This should remove the following deprecation log entries on the Monitoring cluster in 7.0.0 onwards:
```
[types removal] Specifying include_type_name in get index template requests is deprecated.
```
The java time formatter used in the exporter adds a plus sign to the
year, if a year with more than five digits is used. This changes the
creation of those timestamp to only have a date up to 9999.
Closes#38378
The test was relying on toString in ZonedDateTime which is different to
what is formatted by strict_date_time when milliseconds are 0
The method is just delegating to dateFormatter, so that scenario should
be covered there.
closes#38359
Backport #38610
This PR removes the use of document types from the monitoring exporters and template + watches setup code.
It does not remove the notion of types from the monitoring bulk API endpoint "front end" code as that code will eventually just go away in 8.0 and be replaced with Beats as collectors/shippers directly to the monitoring cluster.
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.
This is a continuation of #28667, #36137 and also fixes#37708.
This deprecates the `xpack.watcher.history.cleaner_service.enabled` setting,
since all newly created `.watch-history` indices in 7.0 will use ILM to manage
their retention.
In 8.0 the setting itself and cleanup actions will be removed.
Resolves#32041
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.
The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.
Relates #27330
The AbstracLifecycleComponent used to extend AbstractComponent, so it had to pass settings to the constractor of its supper class.
It no longer extends the AbstractComponent so there is no need for this constructor
There is also no need for AbstracLifecycleComponent subclasses to have Settings in their constructors if they were only passing it over to super constructor.
This is part 1. which will be backported to 6.x with a migration guide/deprecation log.
part 2 will have this constructor removed in 7
relates #35560
relates #34488
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.