2921 Commits

Author SHA1 Message Date
Ryan Ernst
3a2c698ce0
Rename Action to ActionType (#43778)
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.

This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.

relates #34389
2019-06-30 22:00:17 -07:00
Dimitris Athanasiou
8f49d01113
[7.x][ML] Rename df-analytics _id_copy to ml__id_copy (#43754) (#43783)
Renames `_id_copy` to `ml__id_copy` as field names starting with
underscore are deprecated. The new field name `ml__id_copy` was
chosen as an obscure enough field that users won't have in their data.
Otherwise, this field is only intented to be used by df-analytics.
2019-06-30 19:37:00 +03:00
Albert Zaharovits
5e17bc5dcc
Consistent Secure Settings #40416
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
2019-06-29 23:26:17 +03:00
David Roberts
b599c68d23 [ML] Assert that a no-op job creates no results nor state (#43681)
If a job is opened and then closed and does nothing in
between then it should not persist any results or state
documents. This change adapts the no-op job test to
assert no results in addition to no state, and to log
any documents that cause this assertion to fail.

Relates elastic/ml-cpp#512
Relates #43680
2019-06-29 14:57:49 +01:00
Ryan Ernst
28ab77a023
Add StreamableResponseAction to aid in deprecation of Streamable (#43770)
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.

relates #34389
2019-06-28 21:40:00 -07:00
David Roberts
7951c63b91 [ML] Mark ml-cpp dependency as regularly changing (#43760)
Since #41817 was merged the ml-cpp zip file for any
given version has been cached indefinitely by Gradle.
This is problematic, particularly in the case of the
master branch where the version 8.0.0-SNAPSHOT will
be in use for more than a year.

This change tells Gradle that the ml-cpp zip file is
a "changing" dependency, and to check whether it has
changed every two hours. Two hours is a compromise
between checking on every build and annoying developers
with slow internet connections and checking rarely
causing bug fixes in the ml-cpp code to take a long
time to propagate through to elasticsearch PRs that
rely on them.
2019-06-28 21:21:18 +01:00
Benjamin Trent
67a3c656c3
[7.x] [ML][Data Frame] removing format support (#43659) (#43747)
* [ML][Data Frame] removing format support (#43659)

* Fixing conflicts
2019-06-28 10:02:37 -05:00
Jim Ferenczi
7ca69db83f Refactor IndexSearcherWrapper to disallow the wrapping of IndexSearcher (#43645)
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
betweeen the query and the live docs instead of checking the live docs on every document that match the query.
2019-06-28 16:28:02 +02:00
Alpar Torok
d1a4d8866d Add missing dependencies so we can build in parallel (#43672) 2019-06-28 16:41:18 +03:00
Dimitris Athanasiou
86c853a7c2
[7.x][ML] Rename outlier score setting to feature_influence_threshold (#43705) (#43734)
Renames outlier score setting `minimum_score_to_write_feature_influence`
to `feature_influence_threshold`.
2019-06-28 13:28:25 +03:00
Dimitris Athanasiou
cab879118d
[7.x][ML] Support multiple source indices for df-analytics (#43702) (#43731)
This commit adds support for multiple source indices.
In order to deal with multiple indices having different mappings,
it attempts a best-effort approach to merge the mappings assuming
there are no conflicts. In case conflicts exists an error will be
returned.

To allow users creating custom mappings for special use cases,
the destination index is now allowed to exist before the analytics
job runs. In addition, settings are no longer copied except for
the `index.number_of_shards` and `index.number_of_replicas`.
2019-06-28 13:28:03 +03:00
Christoph Büscher
2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Przemysław Witek
94f18da5df
Add version and create_time to data frame analytics config (#43683) (#43712) 2019-06-28 07:37:21 +02:00
Ryan Ernst
5b4089e57e
Remove nodeId from BaseNodeRequest (#43658)
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the respones into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.

This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
2019-06-27 18:45:14 -07:00
Igor Motov
3607876a71 Geo: Makes coordinate validator in libs/geo plugable (#43657)
Moves coordinate validation from Geometry constructors into
parser.

Relates #43644
2019-06-27 19:53:41 -04:00
Nhat Nguyen
ce8771feb7 Do not use MockInternalEngine in GatewayIndexStateIT (#43716)
GatewayIndexStateIT#testRecoverBrokenIndexMetadata replies on the
flushing on shutdown. This behaviour, however, can be randomly disabled
in MockInternalEngine.

Closes #43034
2019-06-27 18:28:04 -04:00
Przemysław Witek
68dbbd8793
Deduplicate two similar TimeUtils classes. (#43697)
* Deduplicate org.elasticsearch.xpack.core.dataframe.utils.TimeUtils and org.elasticsearch.xpack.core.ml.utils.time.TimeUtils into a common class: org.elasticsearch.xpack.core.common.time.TimeUtils.
* Add unit tests for parseTimeField and parseTimeFieldToInstant methods
2019-06-27 18:51:48 +02:00
Yannick Welsch
6744344ef2 Handle situation where only voting-only nodes are bootstrapped (#43628)
Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will
still try to become elected and bring full master nodes into the cluster.
2019-06-27 18:10:15 +02:00
David Roberts
f39619d182 [ML] Don't write timing stats on no-op (#43680)
Similar to elastic/ml-cpp#512, if a job opens and closes and
does nothing in between we shouldn't write timing stats to
the results index.
2019-06-27 16:37:54 +01:00
Jim Ferenczi
329d05f61e Fix UOE on search requests that match a sparse role query (#43668)
Search requests executed through the SecurityIndexSearcherWrapper throw
an UnsupportedOperationException if they match a sparse role query.
When low level cancellation is activated (which is the default since #42857),
the context index searcher creates a weight that doesn't handle #scorer.
This change fixes this bug and adds a test to ensure that we check this case.
2019-06-27 16:56:56 +02:00
Przemysław Witek
ba518722a2
[7.x] [ML] Tag destination index with data frame metadata (#43567) (#43660) 2019-06-27 08:08:39 +02:00
Benjamin Trent
d05593c3ad
[ML][Data Frame] adds tests for continuous DF (#43601) (#43654) 2019-06-26 14:59:19 -05:00
Benjamin Trent
52e26bbc42
[ML][Data Frame] improve pivot nested field validations (#43548) (#43636)
* [ML][Data Frame] improve pivot nested field validations

* addressing pr comments
2019-06-26 13:35:51 -05:00
Armin Braun
c00e305d79
Optimize Selector Wakeups (#43515) (#43650)
* Use atomic boolean to guard wakeups
* Don't trigger wakeups from the select loops thread itself for registering and closing channels
* Don't needlessly queue writes

Co-authored-by:  Tim Brooks <tim@uncontended.net>
2019-06-26 20:00:42 +02:00
David Kyle
e1f761dfc7 [Ml Data Frame] Size the GET stats search by number of Ids requested (#43206)
Set the size of the search request to the number of ids limited by 10,000
2019-06-26 17:01:12 +01:00
Benjamin Trent
c121b00c98
[7.x] [ML][Data Frame] Add support for allow_no_match for endpoints (#43490) (#43637)
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)

* [ML][Data Frame] Add support for allow_no_match parameter in endpoints

Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms

* Update DataFrameTransformDocumentationIT.java
2019-06-26 10:09:56 -05:00
David Roberts
31dc5b7d3a [TEST] Wait for replicas before stopping nodes in ML distributed test (#43622)
If we stop a node before replicas exist then
the test can fail because we lose a whole
index if we stop the node with the primary on.
2019-06-26 11:52:53 +01:00
David Roberts
558e323c89 [ML] Introduce a setting for the process connect timeout (#43234)
This change introduces a new setting,
xpack.ml.process_connect_timeout, to enable
the timeout for one of the external ML processes
to connect to the ES JVM to be increased.

The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
2019-06-26 09:22:04 +01:00
Yannick Welsch
2049f715b3 Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
2019-06-26 08:07:56 +02:00
Yogesh Gaikwad
480453aa24
Make role descriptors optional when creating API keys (#43481) (#43614)
This commit changes the `role_descriptors` field from required
to optional when creating API key. The default behavior in .NET ES
client is to omit properties with `null` value requiring additional
workarounds. The behavior for the API does not change.
Field names (`id`, `name`) in the invalidate api keys API documentation have been
corrected where they were wrong.

Closes #42053
2019-06-26 14:30:51 +10:00
Przemysław Witek
76a750a0a0
Remove unused mapStringsOrdered method (#42513) (#43585) 2019-06-25 20:43:38 +02:00
Tanguy Leroux
0dc1c12f13
Fix indices shown in _cat/indices (#43286)
After two recent changes (#38824 and #33888), the _cat/indices API
no longer report information for active recovering indices and
non-replicated closed indices. It also misreport replicated closed
indices that are potentially not authorized for the user.

This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
 information about the indices.

Closes #39933
2019-06-25 20:02:34 +02:00
Dimitris Athanasiou
126c2fd2d5
[7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
Benjamin Trent
970e157eac
[ML][Data Frame] Adjusting error message (#43455) (#43580)
* Adjusting error message

* Update TransportPutDataFrameTransformAction.java

* Update TransportPutDataFrameTransformAction.java
2019-06-25 10:09:39 -05:00
Przemysław Witek
c702cd7415
[7.x] Implement XContentParser.genericMap and XContentParser.genericMapOrdered methods (#42059) (#43575) 2019-06-25 16:04:54 +02:00
Przemysław Witek
b15e40ffad
Extract TimingStats-related functionality into TimingStatsReporter (#43371) (#43557) 2019-06-25 15:48:39 +02:00
David Roberts
9c285ddbab [ML] Improve message when native controller cannot connect (#43565)
The error message if the native controller failed to run
(for example due to running Elasticsearch on an unsupported
platform) was not easy to understand.  This change removes
pointless detail from the message and adds some hints about
likely causes.

Fixes #42341
2019-06-25 12:06:54 +01:00
Tim Brooks
38516a4dd5
Move nio ip filter rule to be a channel handler (#43507)
Currently nio implements ip filtering at the channel context level. This
is kind of a hack as the application logic should be implemented at the
handler level. This commit moves the ip filtering into a channel
handler. This requires adding an indicator to the channel handler to
show when a channel should be closed.
2019-06-24 10:03:24 -06:00
Gordon Brown
fac7efba9a
[7.x] Account for node versions during allocation in ILM Shrink (#43300)
This commit ensures that ILM's Shrink action will take node versions into
account when choosing which node to allocate to when shrinking an
index. Prior to this change, ILM could pick a node with a lower version
than some shards are already allocated to, which causes the new
allocation to fail as shards can't be relocated onto a node with a lower
version than they are already on.

As part of this, when making the decision about which node to allocate
to prior to Shrink, all shards in the index are considered, rather than
choosing a random shard to consider.

Further, the unit tests for the logic that chooses a node to allocate
shards to pre-shrink has been improved to validate the behavior in more
realistic and varied initial conditions.
2019-06-24 10:02:49 -06:00
Mayya Sharipova
813551e070 Fix eclipse build gradle for vectors project
Closes #43496
2019-06-24 09:22:48 -04:00
Martijn van Groningen
101cf384ba
Replace Streamable w/ Writable in AcknowledgedResponse and subclasses (backport 7.x) (#43525)
This commit replaces usages of Streamable with Writeable for the
AcknowledgedResponse and its subclasses, plus associated actions.

Note that where possible response fields were made final and default
constructors were removed.

This is a large PR, but the change is mostly mechanical.

Relates to #34389
Backport of #43414
2019-06-24 13:47:37 +02:00
Alpar Torok
ea44da6069 Testclusters: conver remaining x-pack (#43335)
Convert x-pack tests
2019-06-24 12:07:42 +03:00
Benjamin Trent
f4b75d6d14
[7.x] [ML][Data Frame] Add version and create_time to transform config (#43384) (#43480)
* [ML][Data Frame] Add version and create_time to transform config (#43384)

* [ML][Data Frame] Add version and create_time to transform config

* s/transform_version/version s/Date/Instant

* fixing getter/setter for version

* adjusting for backport
2019-06-21 09:11:44 -05:00
David Kyle
73221d2265 [ML] Resolve NetworkDisruptionIT (#43441)
After the network disruption a partition is created,
one side of which can form a cluster the other can't.
Ensure requests are sent to a node on the correct side
of the cluster
2019-06-21 10:24:02 +01:00
Simon Willnauer
424ef4f158 SecurityIndexSearcherWrapper doesn't always carry over caches and similarity (#43436)
If DocumentLevelSecurity is enabled SecurityIndexSearcherWrapper doesn't carry
over the cache, cache policy and similarity from the incoming searcher.
2019-06-21 10:19:10 +02:00
Tim Vernum
059eb55108
Use SecureString for password length validation (#43465)
This replaces the use of char[] in the password length validation
code, with the use of SecureString

Although the use of char[] is not in itself problematic, using a
SecureString encourages callers to think about the lifetime of the
password object and to clear it after use.

Backport of: #42884
2019-06-21 17:11:07 +10:00
Armin Braun
21515b9ff1
Fix IpFilteringIntegrationTests (#43019) (#43434)
* Increase timeout to 5s since we saw 500ms+ GC pauses on CI
* closes #40689
2019-06-20 22:31:59 +02:00
Yannick Welsch
7f8e1454ab Advance checkpoints only after persisting ops (#43205)
Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.

This commit required changing some core classes in the system:

- The LocalCheckpointTracker keeps track now not only of the information whether an operation has
been processed, but also whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once
they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all
shard copies when in primary mode. The computed global checkpoint (which represents the
minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored
in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local
checkpoints on shards only advance when the translog is asynchronously fsynced. This means that
the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is
not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard
verification step is now requires an additional pre-flight step to fsync the translog, so that the main
verify shard step has the most up-to-date global checkpoint at disposition.
2019-06-20 11:12:38 +02:00
Andrei Stefan
fe0f9055d8 Fix NPE in case of subsequent scrolled requests for a CSV/TSV formatted response (#43365)
(cherry picked from commit 0ef7bb0f8b07cd0392d37f96ca9360821b19315a)
2019-06-20 11:26:11 +03:00
Jason Tedor
1f1a035def
Remove stale test logging annotations (#43403)
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
2019-06-19 22:58:22 -04:00