Commit Graph

1424 Commits

Author SHA1 Message Date
Tom Veasey 690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-13 17:35:51 +00:00
Nik Everett 9dcd64c110
Preserve metric types in top_metrics (backport of #53288) (#53440)
This changes the `top_metrics` aggregation to return metrics in their
original type. Since it only supports numerics, that means that dates,
longs, and doubles will come back as stored, with their appropriate
formatter applied.
2020-03-12 17:17:09 -04:00
Aleksandr Maus e3a1291adf
EQL: Add more rest client tests (#53422) (#53474) 2020-03-12 09:44:46 -04:00
Benjamin Trent 89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Dimitris Athanasiou 0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of #53300
2020-03-11 12:45:27 +02:00
Hendrik Muhs 696aa4ddaf
[7.x][Transform] add support for script in group_by (#53167) (#53324)
add the possibility to base the group_by on the output of a script.

closes #43152
backport #53167
2020-03-10 11:12:58 +01:00
Christoph Büscher 9e561c2921 Fix AbstractBulkByScrollRequest slices parameter via Rest (#53068)
Currently the AbstractBulkByScrollRequest accepts slice values of 0 via its
`setSlices` method, denoting the "auto" slicing behaviour that is usable by
settting the "slices=auto" parameter on rest requests. When using the High Level
Rest Client, however, we send the 0 value as an integer, which is then rejected
as invalid by `AbstractBulkByScrollRequest#parseSlices`. Instead of making
parsing of the rest request more lenient, this PR opts for changing the
RequestConverter logic in the client to translate 0 values to "auto" on the rest
requests.

Closes #53044
2020-03-06 15:38:04 +01:00
Aleksandr Maus 2dc872f052
EQL: Add HLRC for EQL stats (#53043) (#53148) 2020-03-05 09:20:38 -05:00
Nik Everett 28df7ae5ed
Support multiple metrics in `top_metrics` agg (backport of #52965) (#53163)
This adds support for returning multiple metrics to the `top_metrics`
agg. It looks like:
```
POST /test/_search?filter_path=aggregations
{
  "aggs": {
    "tm": {
      "top_metrics": {
        "metrics": [
          {"field": "v"},
          {"field": "m"}
        ],
        "sort": {"s": "desc"}
      }
    }
  }
}
```
2020-03-05 08:12:01 -05:00
Aleksandr Maus b47bffba24
EQL: consistent naming for event type vs event category (#53073) (#53090)
Related to https://github.com/elastic/elasticsearch/issues/52941
2020-03-04 08:02:38 -05:00
Costin Leau 712e0c05cd EQL: Add implicit ordering on timestamp (#53004)
QL: Move Sort base class from SQL to QL
(cherry picked from commit 798015b7bbd565e9c4222724614baeb432c7c2b3)
2020-03-02 22:41:36 +02:00
Aleksandr Maus 89ed857c79
EQL: Change request parameter query to filter and rule to query (#52971) (#53006)
Related to https://github.com/elastic/elasticsearch/issues/52911
2020-03-02 09:26:23 -05:00
Dimitris Athanasiou 85b4e45093
[7.x]ML] Parse and report memory usage for DF Analytics (#52778) (#52980)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

Backport of #52778 and #52958
2020-02-29 13:03:40 +02:00
Dan Hermann dd44376d27
[7.x] Send the fields param in body instead of URL params (#52948) 2020-02-28 08:57:35 -06:00
Costin Leau a674085903 EQL: Disable field extraction for returned events (#52884)
Return the whole source of matching events

(cherry picked from commit 79ca586ab1d89d645fb58142b82202f14ce5d361)
2020-02-28 13:48:15 +02:00
Nik Everett 1d1956ee93
Add size support to `top_metrics` (backport of #52662) (#52914)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 16:12:52 -05:00
Jason Tedor d568345f1a
Fix roles parsing in client nodes sniffer (#52888)
We mades roles pluggable, but never updated the client to account for
this. This means that when speaking to a modern cluster, application
logs are spammed with warning messages around unrecognized roles. This
commit addresses this by accounting for the fact that roles can extend
beyond master/data/ingest now.
2020-02-27 15:26:49 -05:00
Benjamin Trent 19a6c5d980
[7.x] [ML][Inference] Add support for multi-value leaves to the tree model (#52531) (#52901)
* [ML][Inference] Add support for multi-value leaves to the tree model (#52531)

This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
2020-02-27 14:05:28 -05:00
Benjamin Trent eac38e9847
[ML] Add indices_options to datafeed config and update (#52793) (#52905)
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.

closes https://github.com/elastic/elasticsearch/issues/48056
2020-02-27 13:43:25 -05:00
Josh Devins 68ba571f70
Adds recall@k metric to rank eval API (#52889)
This change adds the recall@k metric and refactors precision@k to match
the new metric.

Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.

See: https://github.com/elastic/elasticsearch/issues/51676
Backports: https://github.com/elastic/elasticsearch/pull/52577
2020-02-27 16:04:24 +01:00
Costin Leau 40bc06f6ad EQL: Hook engine to Elasticsearch (#52828)
Add query execution and return actual results returned from
Elasticsearch inside the tests

(cherry picked from commit 3e039282bf991af87604a6d4f8eada19d5e33842)
2020-02-27 11:22:22 +02:00
Jake Landis 8d311297ca
[7.x] Smarter copying of the rest specs and tests (#52114) (#52798)
* Smarter copying of the rest specs and tests (#52114)

This PR addresses the unnecessary copying of the rest specs and allows
for better semantics for which specs and tests are copied. By default
the rest specs will get copied if the project applies
`elasticsearch.standalone-rest-test` or `esplugin` and the project
has rest tests or you configure the custom extension `restResources`.

This PR also removes the need for dozens of places where the x-pack
specs were copied by supporting copying of the x-pack rest specs too.

The plugin/task introduced here can also copy the rest tests to the
local project through a similar configuration.

The new plugin/task allows a user to minimize the surface area of
which rest specs are copied. Per project can be configured to include
only a subset of the specs (or tests). Configuring a project to only
copy the specs when actually needed should help with build cache hit
rates since we can better define what is actually in use.
However, project level optimizations for build cache hit rates are
not included with this PR.

Also, with this PR you can no longer use the includePackaged flag on
integTest task.

The following items are included in this PR:
* new plugin: `elasticsearch.rest-resources`
* new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy
* new extension 'restResources'
```
restResources {
  restApi {
    includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar
    includeXpack 'baz' //will include x-pack specs that start with baz
  }
  restTests {
    includeCore 'foo', 'bar' //will include the core tests that start with foo and bar
    includeXpack 'baz' //will include the x-pack tests that start with baz
  }
}

```
2020-02-26 08:13:41 -06:00
Sachin Frayne d3c0a2f013 Improve the error message when loading text fielddata. (#52753)
Emphasize keyword over fielddata as the preferred way to use String fields for aggregations or sorting.
2020-02-25 15:45:44 -08:00
Aleksandr Maus a7bdb0b456
EQL: Add integration tests harness to test EQL feature parity with original implementation (#52248) (#52675)
The tests use the original test queries from
https://github.com/endgameinc/eql/blob/master/eql/etc/test_queries.toml
for EQL implementation correctness validation.
The file test_queries_unsupported.toml serves as a "blacklist" for the
queries that we do not support. Currently all of the queries are
blacklisted. Over the time the expectation is to eventually have an
empty "blacklist" when all of the queries are fully supported.

The tests use the original test vector from
https://raw.githubusercontent.com/endgameinc/eql/master/eql/etc/test_data.json.

Only one EQL and the response is stubbed for now to match the expected
output from that query. This part would need some tweaking after EQL is
fully wired.

Related to https://github.com/elastic/elasticsearch/issues/49581
2020-02-24 12:46:59 -05:00
Przemko Robakowski e72cb79476
Add docs for errors in GetAlias API (#51850) (#52716)
Closes #31499

Co-authored-by: Maxim <timonin.maksim@mail.ru>
2020-02-24 18:22:09 +01:00
Igor Motov e5b21a3fc6
Add HLRC for EQL search (#52550)
Adds EQL HLRC client with the search method.

Relates to #51961
2020-02-21 08:44:08 -05:00
Maria Ralli ba8d6d1fb5 Remove Xlint exclusions from gradle files
Backport of #52542.

This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. In particular, it removes the Xlint exclusions from
the following files:

- benchmarks/build.gradle
- client/client-benchmark-noop-api-plugin/build.gradle
- x-pack/qa/rolling-upgrade/build.gradle
- x-pack/qa/third-party/active-directory/build.gradle
- modules/transport-netty4/build.gradle

For the first three files no code adjustments were needed. For
x-pack/qa/third-party/active-directory move the suppression at the code
level. For transport-netty4 replace the variable arguments with
ArrayLists and remove any redundant casts.
2020-02-20 14:12:05 +00:00
David Kyle 7bbe5c8464
[Ml] Validate tree feature index is within range (#52514)
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
2020-02-19 14:41:43 +00:00
Nik Everett 146def8caa
Implement top_metrics agg (#51155) (#52366)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 11:19:11 -05:00
Nik Everett 2dac36de4d
HLRC support for string_stats (#52163) (#52297)
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.

While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
2020-02-12 19:25:05 -05:00
David Roberts 1cefafdd14 [ML] Add new categorization stats to model_size_stats (#52009)
This change adds support for the following new model_size_stats
fields:

- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status

Backport of #51879
2020-02-10 09:10:50 +00:00
Martijn van Groningen 44ea1efd26
Tidy up GetSourceRequest class: (#51916)
* No need to implement ToXContentObject
* Made index and id fields immutable.
2020-02-10 09:42:03 +01:00
Jay Modi 3edadfefd0 RestHandlers declare handled routes (#52123)
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.

This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.

Closes #51622

Co-authored-by: Jason Tedor <jason@tedor.me>

Backport of #51950
2020-02-09 22:48:32 -07:00
Benjamin Trent c6111eb90e
[ML][Inference] adding number_samples to TreeNode (#51937) (#52060)
in preparation for feature importance and split information gain, adding `number_samples` field to `TreeNode` definition.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-07 17:04:58 -05:00
David Kyle 8f10a7c6ca [ML] Make Ensemble feature names optional (#51996)
The featureNames field is requisite in individual models but is not required by the Ensemble.
2020-02-07 10:08:37 +00:00
Jason Tedor 25daf5f1e1
Add autoscaling API skelton (#51564)
The main purpose of this commit is to add a single autoscaling REST
endpoint skeleton, for the purpose of starting to build out the build
and testing infrastructure that will surround it. For example, rather
than commiting a fully-functioning autoscaling API, we introduce here
the skeleton so that we can start wiring up the build and testing
infrastructure, establish security roles/permissions, an so on. This
way, in a forthcoming PR that introduces actual functionality, that PR
will be smaller and have less distractions around that sort of
infrastructure.
2020-02-06 21:55:01 -05:00
Ioannis Kakavas 5092d3098d
Cleanup test user in HLRC test (#49477) (#51942)
SecurityIT.testGetUser creates a user for testing purposes, but did
not delete the user at the end of the test. This could leave the
cluster in an unexpected state for other tests.

This commit:
- Deletes the user at the end of `testGetUser`
- Adds the test-name as metadata to the users that are created in `SecurityIT`
  so that their origin is clear if they do interfere with other tests
- Enables SecurityDocumentationIT.testGetUsers on the expectation that
  the new cleanup step will resolve the unreliability of that test.

Relates: #48440

Co-authored-by: Tim Vernum <tim@adjective.org>
2020-02-06 13:05:09 +02:00
Martijn van Groningen 0610eb51ef
Change HLRC SourceExists to use GetSourceRequest instead of GetRequest (#51789) (#51913)
Originates from #50885

Co-authored-by: Maxim <timonin.maksim@mail.ru>
2020-02-05 13:27:31 +01:00
Julie Tibshirani 38ce428831
Create a class to hold field capabilities for one index. (#51844)
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.

Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
2020-02-04 11:24:57 -08:00
James Rodewig 4ea7297e1e
[DOCS] Change http://elastic.co -> https (#48479) (#51812)
Co-authored-by: Jonathan Budzenski <jon@budzenski.me>
2020-02-03 09:50:11 -05:00
Ryan Ernst 21224caeaf Remove comparison to true for booleans (#51723)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
2020-01-31 16:35:43 -08:00
Lee Hinman b9faa0733d
[7.x] Rename ILM history index enablement setting (#51698) (#51705)
* Rename ILM history index enablement setting

The previous setting was `index.lifecycle.history_index_enabled`, this commit changes it to
`indices.lifecycle.history_index_enabled` to indicate this is not an index-level setting (it's node
level).
2020-01-30 15:27:44 -07:00
Benjamin Trent 1380dd439a
[7.x] [ML][Inference] Fix weighted mode definition (#51648) (#51695)
* [ML][Inference] Fix weighted mode definition (#51648)

Weighted mode inaccurately assumed that the "max value" of the input values would be the maximum class value. This does not make sense. 

Weighted Mode should know how many classes there are. Hence the new parameter `num_classes`. This indicates what the maximum class value to be expected.
2020-01-30 15:33:25 -05:00
Hendrik Muhs b5106aa59d dump audit index to logs for better debugging (#51627)
The audit index is re-created for every testrun and therefore potential useful debug information
gets lost. This change reads out the audit index and logs the results, which makes them available
for debugging CI issues.

relates #51549
2020-01-30 11:14:56 +01:00
Albert Zaharovits f25b6cc2eb
Add new 'maintenance' index privilege #50643
This commit creates a new index privilege named `maintenance`.
The privilege grants the following actions: `refresh`, `flush` (also synced-`flush`),
and `force-merge`. Previously the actions were only under the `manage` privilege
which in some situations was too permissive.

Co-authored-by: Amir H Movahed <arhd83@gmail.com>
2020-01-30 11:59:11 +02:00
Ioannis Kakavas 81e7d926f6
Add HLRC docs for AuthN and TLS (#51355) (#51551)
This commit adds examples in our documentation for

- An HLRC instance authenticating to an elasticsearch cluster using
an elasticsearch token service access token or an API key
- An HLRC instance connecting to an elasticsearch cluster that is
setup for TLS on the HTTP layer when the CA certificate of the
cluster is available either as a PEM file or a keystore
- An HLRC instance connecting to an elasticsearch cluster that
requires client authentication where the client key and certificate
are available in a keystore

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-29 08:14:38 +02:00
Ioannis Kakavas ee202a642f
Enable tests in FIPS 140 in JDK 11 (#49485)
This change changes the way to run our test suites in 
JVMs configured in FIPS 140 approved mode. It does so by:

- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.

- When runtime java is 11 and higher, using BouncyCastle FIPS 
Cryptographic provider and BCJSSE in FIPS mode. These are 
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests

- When runtime java is 8, using BouncyCastle FIPS 
Cryptographic provider and SunJSSE in FIPS mode. 

Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
2020-01-27 11:14:52 +02:00
Benjamin Trent fc994d9ce1
[ML][Inference] Adds validations for model PUT (#51376) (#51409)
Adds validations making sure that

* `input.field_names` is not empty
* `ensemble.trained_models` is not empty
* `tree.feature_names` is not empty

closes https://github.com/elastic/elasticsearch/issues/51354
2020-01-24 09:29:12 -05:00
Benjamin Trent 76660a5a4f
[7.x] [ML][Inference] add tags url param to GET (#51330) (#51404)
* [ML][Inference] add tags url param to GET (#51330)

Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 08:26:58 -05:00
Nhat Nguyen 072203cba8
Clean soft-deletes setting in ccr tests (#51113) (#51372)
We no longer need to explicitly enable soft-deletes in CCR tests.

Relates #50775
Backport of #51113
2020-01-23 16:31:47 -05:00