Commit Graph

6159 Commits

Author SHA1 Message Date
Benjamin Trent 1b9dc0172a
[ML] adding feature_name and node size validation for tree models (#62096) (#62161)
When a tree model is provided, it is possible that it is a stump.
Meaning, it only has one node with no splits
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.

This commit adds the validation if there is more than 1 node,
that the feature_names in the model are non-empty.

closes #60759
2020-09-09 08:50:25 -04:00
Luca Cavanna b680d3fb29 Async search: don't track fetch failures (#62111)
Fetch failures are currently tracked byy AsyncSearchTask like ordinary shard failures. Though they should be treated differently or they end up causing weird scenarios like total=num_shards and successful=num_shards as the query phase ran fine yet the failed count would reflect the number of shards where fetch failed.

Given that partial results only include aggs for now and are complete even if fetch fails, we can ignore fetch failures in async search, as they will be anyways included in the response. They are in fact either received as a failure when all shards fail during fetch, or as part of the final response when only some shards fail during fetch.
2020-09-09 13:44:07 +02:00
Luca Cavanna ad83261348 Print out search request as part of async search task description (#62057)
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.

With this commit we address that while also making sure that the description highlights that the task is originated from an async search.

Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
2020-09-09 13:44:07 +02:00
Rory Hunter b7fd7cf154
Write deprecation logs to a data stream (#61966)
Backport of #58924.

Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream
as well as to disk.
2020-09-09 12:16:28 +01:00
David Kyle a4fb501a33
[ML] Extra exceptions for renamed fields in mapping tests (#62151)
Add renamed 'maximum_number_trees' fields to exceptions
2020-09-09 12:03:42 +01:00
markharwood 5a48895065
Wildcard field bug fix for prefix + term queries. Wildcard syntax should not be supported. (#62085) (#62154)
Wildcard field bug fix for term and prefix queries.
We now escape any * or ? characters in the search string before delegating to the main wildcardQuery() method.

Closes #62081
2020-09-09 10:46:52 +01:00
Dimitris Athanasiou 6c1700c343
[7.x][ML] Outlier detection mapping for nested feature influence (#62068) (#62150)
Adds mappings for outlier detection results.

Backport of #62068
2020-09-09 12:41:26 +03:00
Yang Wang 146b2e6b1a
[Test] Fix data-stream rest test failure (#62137) (#62144)
By enabling searchable snapshots for release builds.
2020-09-09 17:12:53 +10:00
Dimitris Athanasiou 14547b14fb
[ML] Add changes stats fields as exception in mappings upgrade test (#62121)
Following #61980 we need to exclude the replaced fields in the test
in the 7.x branch.
2020-09-09 10:09:35 +03:00
Yang Wang 18a08c0cf2
Mute MlMappingsUpgradeIT testMappingsUpgrade (#61909) (#62136)
For #61908

Co-authored-by: David Kyle <david.kyle@elastic.co>
2020-09-09 13:23:36 +10:00
Armin Braun ed4984a32e
Remove Redundant Stream Wrapping from Compression (#62017) (#62132)
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so I this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
2020-09-09 03:27:38 +02:00
Costin Leau 0f9532689f EQL: Propagate key constraints through the query (#62073)
Since join keys are common across all queries in a Join/Sequence, any
constraint applied on one query needs to be obeyed but all the other
queries.
This PR enhances the optimizer to propagate such constraints across
all queries so they get pushed down to the actual generated ES queries.

Fix #58937

(cherry picked from commit 4afa5debc199c132c07015bfae17952c40a21e5d)
2020-09-08 18:40:47 +03:00
Benjamin Trent 057bf3f7d5
[ML] setting require_alias to previous value on bulk index retry (#62103) (#62108)
Previous work has been done to prevent automatically creating a concrete index when an alias is desired.

This commit addresses a path where this check was not being done.

relates: #62064
2020-09-08 11:38:32 -04:00
Adam Locke 8f27e9fa28
[DOCS] [7.8] Clarify HTTPS usage for create key API (#60858) (#62098)
* Update create-api-keys.asciidoc

* Adding note to create API keys for https

* Adding note for enabling TLS

* Add specific setting for ssl.enabled

* Incorporating review feedback
2020-09-08 10:23:43 -04:00
Dimitris Athanasiou 41507cff48
[7.x][ML] Update mappings of ml stats index (#61980) (#62091)
- Adds missing mappings for `alpha`, `gamma`, and `lambda`.
- Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`.
- Removes unused `regularization_depth_penalty_multiplier`,
  `regularization_leaf_weight_penalty_multiplier` and
  `regularization_tree_size_penalty_multiplier`.

Backport of #61980
2020-09-08 16:41:57 +03:00
David Roberts b2636678b2 [ML] Add support for date_nanos fields in find_file_structure (#62048)
Now that #61324 is merged it is possible for the find_file_structure
endpoint to suggest using date_nanos fields for timestamps where
the timestamp format provides greater than millisecond accuracy.
2020-09-08 13:05:09 +01:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
David Kyle fb6ee5b36d
[7.x] [ML] Assert mappings match templates in Upgrade tests (#61905)
At the end of the rolling upgrade tests check the mappings of the concrete
.ml and .transform-internal indices match the mappings in the templates.
When the templates change, the tests should prove that the mappings have
been updated in the new cluster.
2020-09-08 12:21:19 +01:00
Przemko Robakowski bb357f6aae
[7.x] Move internal index templates to composable templates (#61457) (#61661)
This change moves watcher, ILM history and SLM history templates to composable templates.
Versions are updated to reflect the switch. Only change to the templates themselves is added `_meta` to mark them as managed
2020-09-08 11:26:06 +02:00
Andrei Stefan 7d5791b6bd
EQL: create the search request with a list of indices (#62005) (#62076)
* The query client uses an array of indices instead of the comma separated
version of the indices names

(cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)
2020-09-08 10:26:59 +03:00
David Kyle a5b24bf44c
Mute ClassificationIT (#62063)
testWithOnlyTrainingRowsAndTrainingPercentIsFifty_DependentVariableIsBoolean
For #60759
2020-09-07 16:10:48 +01:00
Luca Cavanna 168b448a0f
Rename runtime_script field type to runtime (#62034)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime fields implementation (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.

This translates also to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper .

Relates to #59332
2020-09-07 15:07:23 +02:00
Jim Ferenczi fa8e76abb1
Improve reduction of terms aggregations (#61779) (#62028)
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).

Relates #51857
2020-09-07 13:13:20 +02:00
Martijn van Groningen 7e566ddd06
Move data stream yaml tests to xpack plugin module. (#62032)
Backport of #61998 to 7.x branch.

Moving the data stream yaml tests to xpack plugin module has the following benefits:
* The tests are ran both with security enabled (as part of xpack/plugin integTest)
  and disabled (as part of xpack/plugin/data-stream/qa/rest integTest).
* and running the tests in mixed cluster qa environment.
2020-09-07 11:03:32 +02:00
Tanguy Leroux ebbf4df9fd
Adapt SearchableSnapshotsBlobStoreCacheIntegTests to Lucene 8.7.0 (#61989) (#62030)
Elasticsearch now uses #61957 which includes https://issues.apache.org/jira/browse/LUCENE-9456. 
We can remove the corresponding //TODO in SearchableSnapshotsBlobStoreCacheIntegTests.
2020-09-07 10:25:44 +02:00
Luca Cavanna 0c8b438577
Add support for runtime fields (#61776)
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.

We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-09-07 09:14:53 +02:00
Dimitris Athanasiou d37f197efd
[7.x][ML] Allow training_percent to be any positive double up to hundred (#61977) (#61990)
This changes the valid range of `training_percent` for regression and
classification from [1, 100] to (0, 100].

Backport of #61977
2020-09-04 17:34:14 +03:00
Yannick Welsch 6d08b55d4e Simplify searchable snapshot shard allocation (#61911)
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which
to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current
implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
2020-09-04 15:45:00 +02:00
Tanguy Leroux 289b1f4ae7
Reduce locking in prewarming (#61837) (#61967)
During prewarming of a Lucene file a CacheFile is acquired and 
then locked for the duration of the prewarming, ie locked until all 
the part of the file has been downloaded and written to cache on 
disk. The locking (executed with CacheFile#fileLock()) is here to 
prevent the cache file to be evicted while it is prewarming.

But holding the lock may take a while for large files, specially since
 restoring snapshot files now respects the 
indices.recovery.max_bytes_per_sec setting of 40mb (#58658), 
and this can have bad consequences like preventing the CacheFile 
to be evicted, opened or closed. In manual tests this bug slow 
downs various requests like mounting a new searchable snapshot
 index or deleting an existing one that is still prewarming.

This commit reduces the time the lock is held during prewarming so
 that the read lock is only required when actively writing to the CacheFile.
2020-09-04 15:06:50 +02:00
Martijn van Groningen 84af9abd76
Fix skip versions fix xpack data stream yaml tests. (#61981)
Backport of #61926 to 7.x branch.

Relates to #61904
2020-09-04 14:53:38 +02:00
Benjamin Trent cec102a391
[7.x] [ML] adds new n_gram_encoding custom processor (#61578) (#61935)
* [ML] adds new n_gram_encoding custom processor (#61578)

This adds a new `n_gram_encoding` feature processor for analytics and inference.

The focus of this processor is simple ngram encodings that allow:
 - multiple ngrams [1..5]
 - Prefix, infix, suffix
2020-09-04 08:36:50 -04:00
Ignacio Vera 31c026f25c
upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957) (#61974) 2020-09-04 13:46:20 +02:00
Dimitris Athanasiou bdccab7c7a
[7.x][ML] Add incremental id during data frame analytics reindexing (#61943) (#61971)
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.

This commit changes the approach in order to result to a destination
index that allows nested fields.

Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.

The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.

Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.

With these changes data frame analytics jobs do not use scroll at any part.

Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.

Backport of #61943
2020-09-04 13:24:42 +03:00
Tanguy Leroux 10d14ce101
Enable searchable snapshot feature for all test clusters (#61888) (#61965)
This commit reenables the searchable snapshot feature for integration tests 
after #61802 which changed some build plugins.
2020-09-04 11:20:24 +02:00
Tim Vernum cdfb163c7c
Add explicit test for DLS with OIDC metadata (#61955)
When a user authenticates via OpenID Connect we copy information from
the OIDC claims into the user's metadata in a particular format.

This commit adds a test that metadata in that format can be used in a
mustache template for Document Level Security.

Backport of: #60030
2020-09-04 16:21:20 +10:00
Tim Vernum 57efda2865
Add DEBUG logging for undefined role mapping field (#61887)
A role mapping with the following content:

    "rules": { "field": { "userid" : "admin" } }

will never match because `userid` is not a valid field. The correct
field is `username`.

This change adds DEBUG logging when an undefined field is referenced.

The choice to use DEBUG rather than INFO/WARN is that the set of
fields is partially dynamic (e.g. the `metadata.*` fields), so
it may be perfectly reasonable to check a field that is not defined
for that user. For example this rule:

    "rules": { "field": { "metadata.ranking" : "A" } }

would generate a log message for an unranked user, which would
erroneously suggest that such a rule is an error.
This DEBUG logging will assist in diagnosing problems, without
introducing that confusion.

Backport of: #61246

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-09-04 14:19:05 +10:00
Ryan Ernst d6e17170c3
Simplify adding plugins and modules to testclusters (#61886)
There are currently half a dozen ways to add plugins and modules for
test clusters to use. All of them require the calling project to peek
into the plugin or module they want to use to grab its bundlePlugin
task, and then both depend on that task, as well as extract the archive
path the task will produce. This creates cross project dependencies that
are difficult to detect, and if the dependent plugin/module has not yet
been configured, the build will fail because the task does not yet
exist.

This commit makes the plugin and module methods for testclusters
symmetetric, and simply adding a file provider directly, or a project
path that will produce the plugin/module zip. Internally this new
variant uses normal configuration/dependencies across projects to get
the zip artifact. It also has the added benefit of no longer needing the
caller to add to the test task a dependsOn for bundlePlugin task.
2020-09-03 19:37:46 -07:00
Jake Landis ea1e8ad6ea
[7.x] Fix passing params to template or script failed in watcher (#58559) (#61885)
The main changes are:
* Fix custom params are missing when using template or script in watcher's 
  logging action or jira action.
* Add yaml tests to test passing params to template or script successfully.

Relates to #57625

Co-authored-by: bellengao <gbl_long@163.com>
2020-09-03 15:47:51 -05:00
Costin Leau 99ee87e332 EQL: Revert filter pipe (#61907)
The current implementation of the filter pipe is incomplete hence why
it got reverted. Note this is not a complete revert as some of the
improvements of said commit (such as the PostAnalyzer) are useful in
general.

Relates #61805

(cherry picked from commit 7a7eb66f7d39586c3a3bc00dce49e6c47a23b46a)
2020-09-03 22:31:08 +03:00
Martijn van Groningen 3d9c12e2d3
Fix data stream wildcard resolution bug in eql search api.(#61910)
Backport of #61904 to 7.x branch.

The eql search api redirects to the search api. For this reason the eql
search api could work with concrete data stream names. However if security
is enabled and a data stream name snippet with a wildcard was used then
it could not resolve this expressions. This is because the EqlSearchRequest
class didn't overwrite the `includeDataStreams()` method. This pr fixes this,
so that the security layer can properly expand data stream name wildcard
expressions for the eql search api.

This commit also moves the eql data stream test to xpack rest tests,
so that the test runs with security enabled. This is required to reproduce
the bug.

Closes #60828
2020-09-03 16:03:57 +02:00
Tanguy Leroux c90ee32cdc
Mute ClassificationIT.testTooLowConfiguredMemoryStillStarts (#61915)
Relates #61913
2020-09-03 15:52:01 +02:00
Jake Landis dbb78e1c45
[7.x] Correct the query dsl for watching elasticsearch version (#58321) (#61882)
The term query should be looking at the cluster_uuid field in elasticsearch_version_mismatch.json.

Co-authored-by: bellengao <gbl_long@163.com>
2020-09-02 16:58:21 -05:00
Nik Everett c19f67ce30
Support longs in BitArray (backport of #61867) (#61871)
We frequently use `long`s with `BitArray` in aggs and right now we have
to assert that the `long` fits in an `int`. This adds support for `long`
to `BitArray` so we don't need those assertions.
2020-09-02 17:24:31 -04:00
Dimitris Athanasiou ec405978fc
[7.x][ML] Update reindexing task progress before persisting job progress (#61868) (#61875)
This fixes a bug introduced by #61782. In that PR I thought I could
simplify the persistence of progress by using the progress straight
from the stats holder in the task instead of calling the get
stats action. However, I overlooked that it is then possible to
have stale progress for the reindexing task as that is only updated
when the get stats API is called.

In this commit this is fixed by updating reindexing task progress
before persisting the job progress. This seems to be much more
lightweight than calling the get stats request.

Closes #61852

Backport of #61868
2020-09-02 21:44:18 +03:00
Benjamin Trent c22415c241
[7.x] [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846) (#61869)
* [ML] unmute testTooLowConfiguredMemoryStillStarts (#61846)

Native PR addresses this test failure: https://github.com/elastic/ml-cpp/pull/1465


closes https://github.com/elastic/elasticsearch/issues/61704

closes https://github.com/elastic/elasticsearch/issues/61561
2020-09-02 13:23:23 -04:00
Jake Landis f6b3148e5e
[7.x] Convert second 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#61802) (#61856)
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.

This includes the following projects:
security, spatial, stack, transform, vecotrs, voting-only-node, and watcher.

A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately. 

related: #60630
related: #56841
related: #59939
related: #55896
2020-09-02 11:20:55 -05:00
Jake Landis 794aac717d
[7.x] Convert first 1/2 x-pack plugins from integTest to [yaml | java]RestTest or internalClusterTest (#60630) (#61855)
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.

This includes the following projects:
async-search, autoscaling, ccr, enrich, eql, frozen-indicies,
data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml

A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.

A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is).

related: #61802
related: #56841
related: #59939
related: #55896
2020-09-02 11:19:24 -05:00
Dimitris Athanasiou 07ab0beea0
[7.x][ML] Improve handling of exception while starting DFA process (#61838) (#61847)
While starting the data frame analytics process it is possible
to get an exception before the process crash handler is in place.
In addition, right after starting the process, we check the process
is alive to ensure we capture a failed process. However, those exceptions
are unhandled.

This commit catches any exception thrown while starting the process
and sets the task to failed with the root cause error message.

I have also taken the chance to remove some unused parameters
in `NativeAnalyticsProcessFactory`.

Relates #61704

Backport of #61838
2020-09-02 16:32:45 +03:00
Costin Leau e6dc8054a5 EQL: Introduce filter pipe (#61805)
Allow filtering through a pipe, across events and sequences.
Filter pipes are pushed down to base queries.
For now filtering after limit (head/tail) is forbidden as the
semantics are still up for debate.

Fix #59763

(cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)
2020-09-02 15:48:51 +03:00
David Roberts 89599ba0a3
[ML] Update ML mappings upgrade test and extend to config index (#61830)
The ML mappings upgrade test had become useless as it was
checking a field that has been the same since 6.5. This
commit switches to a field that was changed in 7.9.

Additionally, the test only used to check the results index
mappings.  This commit also adds checking for the config
index.

Backport of #61340
2020-09-02 12:23:59 +01:00