Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.
This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.
relates #34389
Renames `_id_copy` to `ml__id_copy` as field names starting with
underscore are deprecated. The new field name `ml__id_copy` was
chosen as an obscure enough field that users won't have in their data.
Otherwise, this field is only intented to be used by df-analytics.
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
If a job is opened and then closed and does nothing in
between then it should not persist any results or state
documents. This change adapts the no-op job test to
assert no results in addition to no state, and to log
any documents that cause this assertion to fail.
Relates elastic/ml-cpp#512
Relates #43680
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.
relates #34389
Since #41817 was merged the ml-cpp zip file for any
given version has been cached indefinitely by Gradle.
This is problematic, particularly in the case of the
master branch where the version 8.0.0-SNAPSHOT will
be in use for more than a year.
This change tells Gradle that the ml-cpp zip file is
a "changing" dependency, and to check whether it has
changed every two hours. Two hours is a compromise
between checking on every build and annoying developers
with slow internet connections and checking rarely
causing bug fixes in the ml-cpp code to take a long
time to propagate through to elasticsearch PRs that
rely on them.
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
betweeen the query and the live docs instead of checking the live docs on every document that match the query.
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exists an error will be
returned.
To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.
This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.
Relates to #29051
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the respones into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.
This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
GatewayIndexStateIT#testRecoverBrokenIndexMetadata replies on the
flushing on shutdown. This behaviour, however, can be randomly disabled
in MockInternalEngine.
Closes#43034
* Deduplicate org.elasticsearch.xpack.core.dataframe.utils.TimeUtils and org.elasticsearch.xpack.core.ml.utils.time.TimeUtils into a common class: org.elasticsearch.xpack.core.common.time.TimeUtils.
* Add unit tests for parseTimeField and parseTimeFieldToInstant methods
Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will
still try to become elected and bring full master nodes into the cluster.
Search requests executed through the SecurityIndexSearcherWrapper throw
an UnsupportedOperationException if they match a sparse role query.
When low level cancellation is activated (which is the default since #42857),
the context index searcher creates a weight that doesn't handle #scorer.
This change fixes this bug and adds a test to ensure that we check this case.
* Use atomic boolean to guard wakeups
* Don't trigger wakeups from the select loops thread itself for registering and closing channels
* Don't needlessly queue writes
Co-authored-by: Tim Brooks <tim@uncontended.net>
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)
* [ML][Data Frame] Add support for allow_no_match parameter in endpoints
Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms
* Update DataFrameTransformDocumentationIT.java
This change introduces a new setting,
xpack.ml.process_connect_timeout, to enable
the timeout for one of the external ML processes
to connect to the ES JVM to be increased.
The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.
Closes#14340
Co-authored-by: David Turner <david.turner@elastic.co>
This commit changes the `role_descriptors` field from required
to optional when creating API key. The default behavior in .NET ES
client is to omit properties with `null` value requiring additional
workarounds. The behavior for the API does not change.
Field names (`id`, `name`) in the invalidate api keys API documentation have been
corrected where they were wrong.
Closes#42053
After two recent changes (#38824 and #33888), the _cat/indices API
no longer report information for active recovering indices and
non-replicated closed indices. It also misreport replicated closed
indices that are potentially not authorized for the user.
This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
information about the indices.
Closes#39933
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.
A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.
The APIs are:
- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}
When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:
1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index
In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:
- POST _ml/data_frame/_evaluate
The error message if the native controller failed to run
(for example due to running Elasticsearch on an unsupported
platform) was not easy to understand. This change removes
pointless detail from the message and adds some hints about
likely causes.
Fixes#42341
Currently nio implements ip filtering at the channel context level. This
is kind of a hack as the application logic should be implemented at the
handler level. This commit moves the ip filtering into a channel
handler. This requires adding an indicator to the channel handler to
show when a channel should be closed.
This commit ensures that ILM's Shrink action will take node versions into
account when choosing which node to allocate to when shrinking an
index. Prior to this change, ILM could pick a node with a lower version
than some shards are already allocated to, which causes the new
allocation to fail as shards can't be relocated onto a node with a lower
version than they are already on.
As part of this, when making the decision about which node to allocate
to prior to Shrink, all shards in the index are considered, rather than
choosing a random shard to consider.
Further, the unit tests for the logic that chooses a node to allocate
shards to pre-shrink has been improved to validate the behavior in more
realistic and varied initial conditions.
This commit replaces usages of Streamable with Writeable for the
AcknowledgedResponse and its subclasses, plus associated actions.
Note that where possible response fields were made final and default
constructors were removed.
This is a large PR, but the change is mostly mechanical.
Relates to #34389
Backport of #43414
* [ML][Data Frame] Add version and create_time to transform config (#43384)
* [ML][Data Frame] Add version and create_time to transform config
* s/transform_version/version s/Date/Instant
* fixing getter/setter for version
* adjusting for backport
After the network disruption a partition is created,
one side of which can form a cluster the other can't.
Ensure requests are sent to a node on the correct side
of the cluster
This replaces the use of char[] in the password length validation
code, with the use of SecureString
Although the use of char[] is not in itself problematic, using a
SecureString encourages callers to think about the lifetime of the
password object and to clear it after use.
Backport of: #42884
Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.
This commit required changing some core classes in the system:
- The LocalCheckpointTracker keeps track now not only of the information whether an operation has
been processed, but also whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once
they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all
shard copies when in primary mode. The computed global checkpoint (which represents the
minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored
in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local
checkpoints on shards only advance when the translog is asynchronously fsynced. This means that
the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is
not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard
verification step is now requires an additional pre-flight step to fsync the translog, so that the main
verify shard step has the most up-to-date global checkpoint at disposition.
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.