5350 Commits

Author SHA1 Message Date
Benjamin Trent ea9b8b9d41
[ML] fix setting forecasts to failed method (#57654) (#57656) 2020-06-04 08:54:46 -04:00
Rene Groeschke 751f16858b
Remove duplicate ssl setup in sql/qa projects (#57319) (#57643)
* Remove duplicate ssl setup in sql/qa projects
* Fix enforcement of task instances
* Use static data for cert generation
* Move ssl testing logic into a plugin
* Document test cert creation
2020-06-04 14:53:23 +02:00
Marios Trivyzas 5f8442d1f4
SQL: Improve performance of LTRIM/RTRIM (#57603)
Change the custom implementation that strips leading and trailing whitespace
to substantially improve performance:
```
Benchmark                         Mode  Cnt      Score     Error  Units
StringTrim.testWithStringBuilder  avgt   25  82547.575 ±  66.244  ns/op (existing impl)
StringTrim.testWithSubstring      avgt   25   1398.762 ± 101.152  ns/op (new impl)
StringTrim.testWithJavaStrip      avgt   25   1186.120 ±  10.374  ns/op (for reference)
```
Java's String#stripLeading()/stripTrailing() are not available in all supported JDKs (they were added in Java 11).
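A minimal sketch of the substring approach, assuming "whitespace" means `Character.isWhitespace` (the SQL functions' exact whitespace definition is not stated here). `TrimSketch`, `ltrim` and `rtrim` are hypothetical names. The idea is to scan for the first (or last) non-whitespace index and return a single `substring`, instead of copying characters through a `StringBuilder`:

```java
// Hypothetical helpers illustrating index scanning plus one substring call.
final class TrimSketch {
    /** Strips leading whitespace (as defined by Character.isWhitespace). */
    static String ltrim(String s) {
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    /** Strips trailing whitespace (as defined by Character.isWhitespace). */
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + ltrim("  foo  ") + "]"); // [foo  ]
        System.out.println("[" + rtrim("  foo  ") + "]"); // [  foo]
    }
}
```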

Enhanced LENGTH unit tests and combined a couple of LTRIM/RTRIM integration
tests.

Relates to: #57594
(partially cherry picked from commit ee7868d68733f195dc46926a7eab3d9dd7033ef4)

Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
2020-06-04 13:43:49 +02:00
Igor Motov 8d7f389f3a
Increase search.max_buckets to 65,535 (#57042)
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
2020-06-03 15:35:41 -04:00
Julie Tibshirani e0a15e8dc4
Remove the 'array value parser' marker interface. (#57571) (#57622)
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).

The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
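A simplified sketch of the design change, using hypothetical `*Sketch` classes rather than the real mapper types: an overridable method with a `false` default replaces the marker interface, and because it lives on the field mapper base class, non-field mappers have no way to opt in. The point mapper below is an illustrative assumption:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified types (not the real FieldMapper API).
abstract class FieldMapperSketch {
    /** Replaces the marker interface; the default is "pass me one value at a time". */
    boolean parsesArrayValue() {
        return false;
    }
}

final class KeywordFieldMapperSketch extends FieldMapperSketch {
    // inherits the default: each array element is parsed separately
}

final class PointFieldMapperSketch extends FieldMapperSketch {
    @Override
    boolean parsesArrayValue() {
        return true; // e.g. a point written as [lon, lat] is one value, not two
    }
}

final class ArrayDispatchSketch {
    /** Number of parse calls a mapper would receive for one array value. */
    static int parseCalls(FieldMapperSketch mapper, List<?> arrayValue) {
        return mapper.parsesArrayValue() ? 1 : arrayValue.size();
    }

    public static void main(String[] args) {
        List<Double> lonLat = Arrays.asList(13.4, 52.5);
        System.out.println(parseCalls(new KeywordFieldMapperSketch(), lonLat)); // 2
        System.out.println(parseCalls(new PointFieldMapperSketch(), lonLat));   // 1
    }
}
```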
2020-06-03 11:30:14 -07:00
Marios Trivyzas a674844893
SQL: Implement TRIM function (#57518) (#57593)
Add `TRIM` function which combines the functionality of both
`LTRIM` and `RTRIM` by stripping both leading and trailing
whitespaces.

Refers to #41195

(cherry picked from commit 6c86c919e12f0c4cb5e39d129aa65ab3e274268f)
2020-06-03 15:19:48 +02:00
Ioannis Kakavas 64583f7ec4
Mute EmailSslTests test case in fips (#57576) (#57577)
We test expected TLS failures by catching SSLException, but other
security providers (e.g. BCFIPS) might throw a different exception. In
this case, BCFIPS throws org.bouncycastle.tls.TlsFatalAlert.
2020-06-03 11:23:31 +03:00
Marios Trivyzas 634936e3be
SQL: [Tests] Enable tests which have been fixed (#57526) (#57538)
Enable integration tests for issues that have been fixed
over time.

(cherry picked from commit 117759ee152bcfb0043e5af3a784302ca31f6b8c)
2020-06-02 23:38:33 +02:00
Nik Everett 2a27c411fb
Save memory when geo aggregations are not on top (#57483) (#57551)
Saves memory when the `geotile_grid` and `geohash_grid` aggregations are not at the
top level by using the `LongKeyedBucketOrds` we built in #55873.
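A minimal sketch of the idea behind keying bucket ordinals by owning bucket, using hypothetical names and a plain `HashMap` for readability (the real `LongKeyedBucketOrds` presumably uses more compact primitive structures). One shared map hands out dense ordinals for each `(owningBucketOrd, value)` pair, so a geo grid aggregation under a parent bucketing aggregation does not need a separate hash table per parent bucket:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in; not the Elasticsearch class.
final class BucketOrdsSketch {
    /** Composite key: the parent bucket ordinal plus the long-encoded tile or geohash. */
    private static final class Key {
        final long owningBucketOrd;
        final long value;

        Key(long owningBucketOrd, long value) {
            this.owningBucketOrd = owningBucketOrd;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return owningBucketOrd == other.owningBucketOrd && value == other.value;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(owningBucketOrd) + Long.hashCode(value);
        }
    }

    private final Map<Key, Long> ords = new HashMap<>();

    /** Returns the dense ordinal for the pair, allocating the next one if the pair is new. */
    long add(long owningBucketOrd, long value) {
        // size() is read before the new entry is inserted, so ordinals are 0, 1, 2, ...
        return ords.computeIfAbsent(new Key(owningBucketOrd, value), k -> (long) ords.size());
    }

    int size() {
        return ords.size();
    }

    public static void main(String[] args) {
        BucketOrdsSketch ords = new BucketOrdsSketch();
        System.out.println(ords.add(0, 42)); // 0
        System.out.println(ords.add(1, 42)); // 1: same tile, different parent bucket
        System.out.println(ords.add(0, 42)); // 0: already seen
    }
}
```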
2020-06-02 16:21:50 -04:00
Dan Hermann 97a51272b0
Fix incorrect log warning when exporting monitoring via HTTP without authentication (#57552) 2020-06-02 15:03:55 -05:00
Mark Tozzi e50f514092
IndexFieldData should hold the ValuesSourceType (#57373) (#57532) 2020-06-02 12:16:53 -04:00
Rene Groeschke 8584da40af
Move classes from build scripts to buildSrc (#57197) (#57512)
* Move classes from build scripts to buildSrc

- move Run task
- move duplicate SanEvaluator

* Remove :run workaround

* Some little cleanup on build scripts on the way
2020-06-02 15:33:53 +02:00
Andrei Dan bd188f4a21
[7.x] ILM: add support for rolling over data streams (#57295) (#57515)
As the data stream information is stored in `ClusterState.Metadata`, we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method so that
the steps can identify when a managed index is part of a data stream.

If a managed index is part of a data stream, the rollover target is the data stream
name and the highest-generation index is the write index (i.e. the rolled index).

(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-02 11:55:23 +01:00
Przemysław Witek ea6cfb7c3d
[7.x] Make Annotation a result type (#56342) (#57508) 2020-06-02 11:56:41 +02:00
Tanguy Leroux b4a2cd810a
Use 3rd party task to run integration tests on external service (#56588)
Backport of #56587 for 7.x
2020-06-02 11:26:58 +02:00
Marios Trivyzas 52c555e286
SQL: Make CASTing string to DATETIME more lenient (#57451) (#57509)
Some BI tools (e.g. Tableau) try to cast strings in which the time
part is separated from the date part by a space instead of `T`.
Adjust the type conversion used by CAST to support this.
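A sketch of the leniency using `java.time`, assuming a zone-less `LocalDateTime` for simplicity (the real conversion also has to deal with time zones). The class name and the index-10 separator check are illustrative assumptions:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical sketch, not the SQL type-conversion code.
final class LenientDateTimeSketch {
    private static final DateTimeFormatter SPACE_SEPARATED = new DateTimeFormatterBuilder()
            .append(DateTimeFormatter.ISO_LOCAL_DATE)
            .appendLiteral(' ')
            .append(DateTimeFormatter.ISO_LOCAL_TIME)
            .toFormatter();

    static LocalDateTime parse(String text) {
        // In "yyyy-MM-dd?HH:mm[:ss[.fff]]" the separator sits at index 10.
        if (text.length() > 10 && text.charAt(10) == ' ') {
            return LocalDateTime.parse(text, SPACE_SEPARATED);
        }
        return LocalDateTime.parse(text, DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-06-02T10:54:03"));
        System.out.println(parse("2020-06-02 10:54:03")); // accepted as well
    }
}
```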

(cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)
2020-06-02 10:54:03 +02:00
Marios Trivyzas b8a13de20f
SQL: Implement TOP as an alternative to LIMIT (#57428) (#57507)
Add basic support for `TOP X` as a synonym for `LIMIT X`, as used
by [MS-SQL server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15),
e.g.:

```
SELECT TOP 5 a, b, c FROM test
```

TOP in SQL Server also supports the `PERCENT` and `WITH TIES`
keywords, which this implementation doesn't.

Using both TOP and LIMIT in the same query is not allowed.

Refers to #41195

(cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)
2020-06-02 10:53:42 +02:00
Przemysław Witek ceb4b29b98
Introduce Annotation.event field (#57144) (#57453) 2020-06-01 20:42:25 +02:00
Mark Tozzi 1f500583b1
Clean up Aggregator Supplier Boiler Plate (#57442) (#57452) 2020-06-01 14:21:07 -04:00
Zachary Tong daaf5a3dcc
Fix assertion catching in aggregation supported type test (#56466) (#57382)
At some point, we changed the supported-type test to also catch
assertion errors. This had the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.

This modifies the test to throw after the try-catch block, so that
the test cannot accidentally catch its own failure.

Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.
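A self-contained illustration of the bug and the fix, without JUnit (the local `fail` helper stands in for the test framework's, and all names are hypothetical). When `fail()` throws an `AssertionError` inside a `try` that catches `AssertionError`, the failure is swallowed; recording the outcome and failing after the `try-catch` avoids that while still catching assertion errors raised by the code under test:

```java
// Hypothetical illustration of the pattern, not the aggregation test itself.
final class CatchSelfSketch {
    static void fail(String message) {
        throw new AssertionError(message);
    }

    /** Buggy pattern: returns normally even though the expected exception never happened. */
    static void buggy(Runnable action) {
        try {
            action.run();
            fail("expected an exception");
        } catch (IllegalArgumentException | AssertionError e) {
            // also catches our own fail(), so the "test" passes
        }
    }

    /** Fixed pattern: record what was thrown, then fail outside the try-catch. */
    static void fixed(Runnable action) {
        Throwable thrown = null;
        try {
            action.run();
        } catch (IllegalArgumentException | AssertionError e) {
            // AssertionError is still caught: code may trip an assert before throwing
            thrown = e;
        }
        if (thrown == null) {
            fail("expected an exception");
        }
    }

    public static void main(String[] args) {
        Runnable noop = () -> {};
        buggy(noop); // silently passes
        try {
            fixed(noop);
        } catch (AssertionError e) {
            System.out.println("fixed version reports: " + e.getMessage());
        }
    }
}
```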

Also includes a variety of fixes to other tests which were failing
but being silently smothered.
2020-06-01 12:10:05 -04:00
David Kyle 064093c4d4 Fix compilation after backport of #57278 2020-06-01 12:03:13 +01:00
Przemysław Witek 72ad9a4548
[7.x] Make AnnotationPersister use bulk requests instead of indexing individual documents (#57278) (#57354) 2020-06-01 12:05:09 +02:00
Benjamin Trent 34f1e0b6bb
[7.x] [ML] mark forecasts for force closed/failed jobs as failed (#57143) (#57374)
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)

Forecasts that are still running should be marked as failed/finished in the following scenarios:

- Job is force closed
- Job is re-assigned to another node.

Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.

Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.

relates to https://github.com/elastic/elasticsearch/issues/56419
2020-05-29 14:48:10 -04:00
Benjamin Trent 35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
Benjamin Trent 15aba60c02
[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) (#57359)
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)

This commit lays the groundwork for plugins supplying their own circuit breakers.

It adds a new interface: `CircuitBreakerPlugin`.

This interface provides methods for supplying custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.

With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference to their constructed breaker.
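A minimal sketch of an in-place-updatable child breaker, assuming the two mutable settings are the byte limit and the overhead multiplier. `ChildBreakerSketch` and its methods are hypothetical and are not the Elasticsearch `CircuitBreaker` API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical child breaker: settings are volatile fields updated in place,
// so code holding a reference to the breaker sees new values without replacement.
final class ChildBreakerSketch {
    private final String name;
    private final AtomicLong used = new AtomicLong();
    private volatile long limitBytes;
    private volatile double overhead;

    ChildBreakerSketch(String name, long limitBytes, double overhead) {
        this.name = name;
        this.limitBytes = limitBytes;
        this.overhead = overhead;
    }

    /** Would be called from a dynamic-settings update consumer instead of building a new breaker. */
    void updateSettings(long newLimitBytes, double newOverhead) {
        this.limitBytes = newLimitBytes;
        this.overhead = newOverhead;
    }

    /** Reserves bytes, rolling back and throwing if the overhead-adjusted usage exceeds the limit. */
    void reserve(long bytes, String label) {
        long limit = limitBytes;   // read each volatile once so the check uses a single value of each
        double factor = overhead;
        long newUsed = used.addAndGet(bytes);
        if (newUsed * factor > limit) {
            used.addAndGet(-bytes);
            throw new IllegalStateException(
                    "[" + name + "] data for [" + label + "] would exceed the limit of " + limit + " bytes");
        }
    }

    long getUsed() {
        return used.get();
    }

    public static void main(String[] args) {
        ChildBreakerSketch breaker = new ChildBreakerSketch("plugin_breaker", 100, 1.0);
        breaker.reserve(60, "first");
        breaker.updateSettings(200, 1.0); // same instance, new limit
        breaker.reserve(60, "second");
        System.out.println(breaker.getUsed()); // 120
    }
}
```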
2020-05-29 12:13:46 -04:00
Benjamin Trent c8374dc9f3
[ML] add max_model_memory parameter to forecast request (#57254) (#57355)
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% of the anomaly detection job's configured model memory limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
Marios Trivyzas b2651323fd
SQL: Implement TIME_PARSE function for parsing strings into TIME values (#55223) (#57342)
Implement TIME_PARSE(<time_str>, <pattern_str>) function
which parses a time string into a time object according to the
specified pattern. The allowed patterns are those of
java.time.format.DateTimeFormatter.
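A minimal sketch of pattern-based time parsing with `DateTimeFormatter`. `TimeParseSketch` is hypothetical, and it returns a plain `LocalTime`, whereas the SQL function's actual result type and time zone handling may differ:

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper; only the use of DateTimeFormatter patterns comes from the commit.
final class TimeParseSketch {
    static LocalTime timeParse(String timeStr, String pattern) {
        return LocalTime.parse(timeStr, DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(timeParse("10:20:30.123", "HH:mm:ss.SSS")); // 10:20:30.123
        System.out.println(timeParse("07.05", "HH.mm"));               // 07:05
    }
}
```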

Closes #54963

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com>

(cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)
2020-05-29 15:48:37 +02:00
Dan Hermann 6b0d707671
[7.x] Do not report negative values for swap sizes (#57353) 2020-05-29 08:11:47 -05:00
Martijn van Groningen 04ef39da77
Change cluster info actions to be able to resolve data streams. (#57343)
Backport of #56878 to 7.x branch.

With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.

Relates to #53100
2020-05-29 12:17:53 +02:00
Dimitris Athanasiou 322f953060
[7.x][ML] Anomaly detection jobs should allow missing values for geo fields (#57300) (#57338)
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.

Closes #57299

Backport of #57300
2020-05-29 13:06:16 +03:00
Benjamin Trent 24d605e41e
[ML] fixing GET _ml/inference so size param is respected (#57303) (#57308)
`size` was previously ignored when grabbing full trained model configs. 

closes https://github.com/elastic/elasticsearch/issues/57298
2020-05-28 15:45:26 -04:00
Martijn van Groningen 225ccd1cfa
Ensure template exists when creating data stream (#57275)
Backporting #56888 to 7.x branch.

Limit the creation of data streams to namespaces that have a composable template with a data stream definition.

This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.

Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.

Relates to #53100
2020-05-28 15:08:25 +02:00
Marios Trivyzas fdac9e99fa
SQL: Fix unnecessary evaluation for CASE/IIF (#57159) (#57262)
Previously, when `CASE` and `IIF` were translated to Painless scripts
(used in GROUP BY, HAVING, WHERE), a custom `caseFunction`
registered in `InternalSqlScriptUtils` was used. This function
received an array of arbitrary length:
```[condition1, result1, condition2, result2, ... elseResult]```

Painless has no knowledge of this context and therefore evaluated
all conditions and results before invoking `caseFunction` on them.
As a consequence, erroneous result expressions (e.g. division by 0)
were always evaluated despite the guarding condition.

Replace `caseFunction` with Painless `<cond> ? <res1> : <res2>`
expressions to properly guard the result expressions and evaluate only
the one whose guarding condition evaluates to true (or, failing that,
the elseResult).

As a bonus, this approach improves performance, since it avoids
unnecessary evaluation of both conditions and result expressions.
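The difference between eager argument evaluation and a conditional expression can be shown in plain Java (not Painless). `caseFunction` here is a hypothetical stand-in that receives its arguments already evaluated, like the old `[condition1, result1, ..., elseResult]` array:

```java
// Hypothetical illustration in Java; Painless behaves analogously for method arguments.
final class CaseEvaluationSketch {
    /** Old style: every argument is evaluated before this method runs. */
    static int caseFunction(Object... args) {
        for (int i = 0; i + 1 < args.length; i += 2) {
            if ((Boolean) args[i]) {
                return (Integer) args[i + 1];
            }
        }
        return (Integer) args[args.length - 1]; // elseResult
    }

    static int eager(int divisor) {
        // 10 / divisor is computed before caseFunction is called, even when divisor == 0
        return caseFunction(divisor != 0, 10 / divisor, -1);
    }

    static int ternary(int divisor) {
        // only the branch selected by the condition is evaluated
        return divisor != 0 ? 10 / divisor : -1;
    }

    public static void main(String[] args) {
        System.out.println(ternary(0)); // -1
        try {
            eager(0);
        } catch (ArithmeticException e) {
            System.out.println("eager evaluation failed: " + e.getMessage());
        }
    }
}
```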

Fixes: #49672
(cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)
2020-05-28 11:30:14 +02:00
Tim Vernum 408250dcc4
Fix smtp.ssl.trust setting for watcher email (#57268)
The ssl.trust setting for Watcher provides a list of hostnames that
should be automatically trusted for SSL hostname verification. It was
accidentally broken when we added the full ssl.* settings for email
notifications (see #45272)

This commit corrects this, so the setting is once again respected,
as long as none of the other ssl settings are configured for email
notifications.

Resolves: #52153
Backport of: #56090
2020-05-28 17:34:13 +10:00
Ryan Ernst fdb8573413
Convert remaining compilerJavaHome reference 2020-05-27 17:04:04 -07:00
Ryan Ernst beb1d0c338
Remove compiler java version flag (#57237)
This commit removes the compiler.java setting from the build. It was
originally added when Gradle lagged far behind in supporting the latest JDK,
but it is no longer needed, as we never have to update the supported
compile version before Gradle supports the newer version. Note
that support for changing the runtime version still exists; this only
ensures we use the same JDK to compile as we use to run Gradle.
2020-05-27 16:33:38 -07:00
David Roberts d139a79ef6
[7.x][ML] Fix monitoring if orphaned anomaly detector persistent tasks exist (#57240)
Since #51888, the ML job stats endpoint has returned entries for
jobs that have a persistent task but no job config. Such
orphaned tasks caused monitoring to fail.

This change ignores any such corrupt jobs for monitoring purposes.

Backport of #57235
2020-05-27 22:59:11 +01:00
James Baiera 3b73ce3112
Fix enrich coordinator to reject documents instead of deadlocking (#56247) (#57179)
This PR removes the blocking call that inserts ingest documents into a queue in the
coordinator. It replaces it with a call to offer, and a rejection exception is thrown
in the event that the queue is full. This prevents deadlocks of the write threads
when the queue fills to capacity and there is more than one enrich processor
in a pipeline.
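A sketch of the non-blocking hand-off with `java.util.concurrent`. The class and the exception type are hypothetical stand-ins for the coordinator and its rejection exception:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical coordinator: offer() instead of put(), rejecting when the queue is full.
final class RejectingCoordinatorSketch {
    /** Stand-in for the rejection exception thrown back to the caller. */
    static final class RejectedException extends RuntimeException {
        RejectedException(String message) {
            super(message);
        }
    }

    private final BlockingQueue<String> queue;

    RejectingCoordinatorSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    void submit(String document) {
        // put() would block the calling thread until space is available;
        // offer() returns false immediately when the queue is full.
        if (!queue.offer(document)) {
            throw new RejectedException("rejected [" + document + "]: queue is full");
        }
    }

    int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        RejectingCoordinatorSketch coordinator = new RejectingCoordinatorSketch(1);
        coordinator.submit("doc-1");
        try {
            coordinator.submit("doc-2");
        } catch (RejectedException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this design, a full queue surfaces as an error the caller can handle or retry, instead of a write thread that waits for space indefinitely.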
2020-05-27 15:32:13 -04:00
Lee Hinman c0f732b9f6
[7.x] Rename template V2 classes to ComposableTemplate (#57183) (#57232)
Backports the following commits to 7.x:

    Rename template V2 classes to ComposableTemplate (#57183)
2020-05-27 11:01:59 -06:00
Tal Levy 81060820e9 Fix NormalizerAgg test searcher wrapping (#57171)
The searcher was randomly wrapping its reader as slow, parallel, or filtered.
This was causing casting issues in the normalizer tests. By removing the
wrapping, the problem goes away.

Closes #57164
2020-05-26 13:25:19 -07:00
Benjamin Trent decc6277f9
[ML] allow unran/incomplete forecasts to be deleted for stopped/failed jobs (#57152) (#57172)
If a job is NOT open, its forecasts can be deleted regardless of their state.

This also fixes a bug with expanding forecast IDs: both the wildcard `*` and `_all` are now checked when expanding the IDs.
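A sketch of ID expansion that treats both `_all` and a bare `*` as match-all and supports simple `*` wildcards elsewhere. `ForecastIdExpansionSketch` and its comma-separated input format are assumptions for illustration, not the ML code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper expanding an ID expression against known forecast IDs.
final class ForecastIdExpansionSketch {
    static boolean isMatchAll(String token) {
        return "_all".equals(token) || "*".equals(token);
    }

    static List<String> expand(String expression, List<String> knownIds) {
        Set<String> matched = new LinkedHashSet<>();
        for (String raw : expression.split(",")) {
            String token = raw.trim();
            if (isMatchAll(token)) {
                return new ArrayList<>(knownIds);
            }
            // Quote the literal parts and turn each '*' into ".*".
            String[] parts = token.split("\\*", -1);
            StringBuilder regex = new StringBuilder();
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*");
                }
                regex.append(Pattern.quote(parts[i]));
            }
            Pattern pattern = Pattern.compile(regex.toString());
            for (String id : knownIds) {
                if (pattern.matcher(id).matches()) {
                    matched.add(id);
                }
            }
        }
        return new ArrayList<>(matched);
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("fc-1", "fc-2", "other");
        System.out.println(expand("_all", ids)); // [fc-1, fc-2, other]
        System.out.println(expand("fc-*", ids)); // [fc-1, fc-2]
    }
}
```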

closes https://github.com/elastic/elasticsearch/issues/56419
2020-05-26 15:44:22 -04:00
Bogdan Pintea 74b2c8a770 Change error message for comp against fields (#57126)
Change the error message wording for comparisons against fields in
filtering (s/variables/fields).

(cherry picked from commit d9a1cb50940d0a98fd75b9c0123ca6e1d862f65d)
2020-05-26 17:57:51 +02:00
Bogdan Pintea 0c379e334a SQL: update the JLine dependency to 3.14.1 (#57111)
* Update the JLine dependency to 3.14.1

Update the JLine dependency from 3.10.0 to 3.14.1.

(cherry picked from commit c2d9b74046fa5ddb54604da3afa7887cc38548a1)
2020-05-26 17:56:34 +02:00
markharwood b2bc6071fd
Add regex query support to wildcard field (approach 2) (#55548) (#57141)
Backport of #55548

Adds keyword-field query equivalence to the wildcard field: regex, fuzzy, wildcard and prefix queries are all supported.
All queries use an approximation query backed by an automaton-based verification query.

Closes #54275
2020-05-26 16:55:59 +01:00
markharwood 1d74549d7f
Wildcard field - add support for null field with test (#57047) (#57139)
Backport of #57047
2020-05-26 16:07:49 +01:00
David Kyle 571477d0ad
[7.x] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041) (#57136)
Fix delete_expired_data/nightly maintenance when 
many model snapshots need deleting (#57041)

The queries performed by the expired data removers pull back entire 
documents when only a few fields are required. For ModelSnapshots in 
particular this is a problem as they contain quantiles which may be 
100s of KB and the search size is set to 10,000.

This change makes the search more efficient by only requesting the 
fields needed to work out which expired data should be deleted.
2020-05-26 10:56:42 +01:00
Ioannis Kakavas 1e03de4999
Fix key usage in SamlAuthenticatorTests (#57124) (#57129)
In #51089, where SamlAuthenticatorTests were refactored, we missed
updating one test case, which meant that a single key could be
used for both signing and encryption in the same run. As explained
in #51089, due to FIPS 140 requirements, the BouncyCastle FIPS
provider blocks RSA keys that have been used for signing from
being used for encryption and vice versa.

This commit changes testNoAttributesReturnedWhenTheyCannotBeDecrypted
to always use the specific keys we have added for encryption.
2020-05-26 10:51:47 +03:00
Jim Ferenczi 52443d41cf Stop async search maintenance service on restart (#56982)
This change ensures that we stop the maintenance service on all nodes
when a data node is restarted. This ensures that we don't send
update_by_query requests on the node that is restarted.
This commit also raises the log level to trace for some packages
in order to investigate the failures to acquire a shard lock
after a restart.

Relates #56765
2020-05-26 09:30:33 +02:00
Przemysław Witek ea2012778e
Mute failing test (#57112) (#57113) 2020-05-25 14:06:29 +02:00
Ioannis Kakavas 174af2bb1a
[7.x] Refactor SamlAuthenticatorTests (#51089) (#57105)
- Use opensaml to sign and encrypt responses/assertions/attributes
instead of doing this manually
- Use opensaml to build response and assertion objects instead of
parsing xml strings
- Always use different keys for signing and encryption. Due to FIPS
140 requirements, the BouncyCastle FIPS provider will block
RSA keys that have been used for signing from being used for
encryption and vice versa. This change adds new encryption-specific
keys to be used throughout the tests.
2020-05-25 14:09:42 +03:00
Ioannis Kakavas 6c832fe4e3
Don't run IDP tests in FIPS 140 mode (#57048) (#57098)
We don't support this for now so there is no need to handle all
the test logic/exceptions to run this in FIPS 140 mode.
2020-05-25 14:08:48 +03:00
Armin Braun 9fa60f7367
Add History UUID Index Setting (#56930) (#57104)
Prerequisite for #50278, to be able to uniquely identify index metadata by
its version fields and UUIDs when restoring into closed indices.
2020-05-25 11:26:03 +02:00
Marios Trivyzas b91bae30b1
SQL: [Tests] Move JDBC integration tests to new module (#56872) (#57072)
Move the JDBC functionality integration tests from `:sql:qa` to a separate
module `:sql:qa:jdbc`. This way the tests are isolated from the rest of the
integration tests and depend only on the `:sql:jdbc` module, thus
removing the danger of accidentally pulling in some dependency that may
hide bugs.

Moreover this is a preparation for #56722, so that we can run those tests
between different JDBC and ES node versions and ensure forward
compatibility.

Move the rest of the existing tests into a new `:sql:qa:server` project, so that
`:sql:qa` becomes the parent project for both and all the integration
tests can be run through this parent project.

(cherry picked from commit c09f4a04484b8a43934fe58fbc41bd90b7dbcc76)
2020-05-22 17:49:36 +02:00
Ioannis Kakavas 6c90727166
Fix custom policy in plugins in FIPS 140 (#52046) (#57049)
Our FIPS 140 testing depends on setting the appropriate Java policy
in order to configure the JVM in FIPS mode. Some tests
(discovery-ec2 and ccr qa) also needed to set a custom policy file
to grant a specific permission, which overwrote the FIPS-related
policy and tests would fail. This change ensures that when a
custom policy needs to be set in these tests, the permissions that
are necessary for FIPS are also set.

Resolves: #51685, #52034
2020-05-21 19:26:56 +03:00
Benjamin Trent f00dfb2d5f
[ML] adds WKT support in filestructurefinder (#57014) (#57032)
Field mapping detection is done via grok patterns. 
This commit adds well-known text (WKT) formatted geometry detection.

If everything is a `POINT`, then a `geo_point` mapping is preferred.
Otherwise, if all the fields are WKT geometries, a `geo_shape` mapping is preferred.

This does **NOT** detect other types of formatted geometries (geohash, comma delimited points, etc.)
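A deliberately loose sketch of the mapping decision described above, assuming a simple regular expression for WKT rather than the grok patterns the finder actually uses. `WktMappingSketch` is hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check: a WKT geometry keyword followed by a parenthesised body or EMPTY.
final class WktMappingSketch {
    private static final Pattern WKT = Pattern.compile(
            "\\s*(POINT|LINESTRING|POLYGON|MULTIPOINT|MULTILINESTRING|MULTIPOLYGON|GEOMETRYCOLLECTION)"
                    + "\\s*(\\(.*\\)|EMPTY)\\s*",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern WKT_POINT = Pattern.compile(
            "\\s*POINT\\s*\\(.*\\)\\s*", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** "geo_point" if every sample is a WKT POINT, "geo_shape" if every sample is WKT, else null. */
    static String suggestMapping(List<String> samples) {
        if (samples.isEmpty()) {
            return null;
        }
        boolean allPoints = true;
        for (String sample : samples) {
            if (!WKT.matcher(sample).matches()) {
                return null; // not WKT: leave it to the other detectors
            }
            if (!WKT_POINT.matcher(sample).matches()) {
                allPoints = false;
            }
        }
        return allPoints ? "geo_point" : "geo_shape";
    }

    public static void main(String[] args) {
        System.out.println(suggestMapping(Arrays.asList("POINT (-73.9 40.7)", "POINT (2.35 48.85)")));
        System.out.println(suggestMapping(Arrays.asList("POINT (1 2)", "LINESTRING (0 0, 1 1)")));
    }
}
```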

closes https://github.com/elastic/elasticsearch/issues/56967
2020-05-21 08:22:51 -04:00
markharwood eb8cb31d46
Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024)
Same version as master
2020-05-21 11:28:16 +01:00
Bogdan Pintea ec4a6aa1c6 SQL: JDBC: fix temporary directory locked test errors in Windows (#56917)
* Fix temp dir locked errors

The tests involving a temporary directory (containing the JDBC JAR) fail
on Windows because they can't be deleted, due to still being in use.
This commit forces a premature closing of the JAR file, which mitigates
the failure by giving the JVM more time to collect any open FDs.
(Calling the System.gc() in the tests is another working alternative
fix.)

Stream-based JAR access is taken care of by disabling the use of caches.

(cherry picked from commit 04f97333a015404a68e8f19223f33aadeb396687)
2020-05-20 19:46:57 +02:00
Benjamin Trent ee4ce8ecec
Fix geotile_grid group_by field mapping (#56939) (#56990)
The original implementation used `bbox` as the index mapping type. This would not work, as it would have to be `envelope`. But, given that `envelope` and `polygon` are tessellated in the same way, we chose to use `polygon` as the geo_shape type. This makes it easier to support in other places in the stack (a la Kibana Maps).
2020-05-20 08:22:13 -04:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper and
is called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to it.

Relates to #56814
2020-05-20 09:43:13 +01:00
Marios Trivyzas 644ae49817
SQL: Fix behaviour of COUNT(DISTINCT <literal>) (#56869) (#56932)
Previously `COUNT(DISTINCT <literal>)` was returning the same result
as `COUNT(<literal>)` which is not correct as it should always return 1
if there is at least one matching row (bucket if there is a GROUP BY),
or 0 otherwise.
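The corrected semantics reduce to simple arithmetic per result row. A sketch with hypothetical names, where `rows` is the number of matching rows (per bucket when there is a GROUP BY) and the literal is assumed to be non-null:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the expected results, not the SQL planner code.
final class CountDistinctLiteralSketch {
    /** COUNT(<literal>): a non-null literal is counted once per matching row. */
    static long count(long rows) {
        return rows;
    }

    /** COUNT(DISTINCT <literal>): one distinct value if any row matched, otherwise none. */
    static long countDistinct(long rows) {
        return rows > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        Map<String, Long> rowsPerBucket = new LinkedHashMap<>();
        rowsPerBucket.put("a", 3L);
        rowsPerBucket.put("b", 1L);
        rowsPerBucket.forEach((bucket, rows) -> System.out.println(
                bucket + ": COUNT=" + count(rows) + " COUNT(DISTINCT)=" + countDistinct(rows)));
    }
}
```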

(cherry picked from commit 7f7d7562d43034907f432d39d0d66f490d78f4a8)
2020-05-19 11:19:06 +02:00
Yannick Welsch f296c08021 Increase timeout for assertLongBusy in AutoFollowIT (#56910)
Closes #56891
2020-05-18 16:20:46 +02:00
Benjamin Trent 297f864884
[ML] relax throttling on expired data cleanup (#56711) (#56895)
Throttling nightly cleanup as much as we do has been overly cautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now accepts `requests_per_second` and `timeout`,
so users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
David Kyle 0fac152188 Mute AsyncSearchActionIT (#56897)
For #56765
2020-05-18 13:36:33 +01:00
Ioannis Kakavas bb852ab2e7
Cause is tracked in #49094 (#56887) 2020-05-18 15:03:38 +03:00
David Kyle 52a329fa12 Mute sql.client.VersionTests suite (#56883)
For  #56882
2020-05-18 10:15:30 +01:00
Bogdan Pintea de7dd6154e Fix range of version number generation in test (#56849)
The version number component can't equal or exceed the revision
multiplier.
This fixes the VersionTests unit test.
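A sketch of why the bound matters, assuming a hypothetical packing scheme `(major * M + minor) * M + revision` with multiplier `M = 100` (the test's real constants are not given here). A component equal to `M` spills into the next field and collides with a different version, so random components must be drawn from `[0, M)`:

```java
import java.util.Random;

// Hypothetical packing scheme illustrating the "revision multiplier" bound.
final class VersionIdSketch {
    static final int MULTIPLIER = 100;

    static int pack(int major, int minor, int revision) {
        return (major * MULTIPLIER + minor) * MULTIPLIER + revision;
    }

    /** A random minor/revision component in [0, MULTIPLIER); nextInt's bound is exclusive. */
    static int randomComponent(Random random) {
        return random.nextInt(MULTIPLIER);
    }

    public static void main(String[] args) {
        // A component equal to the multiplier overflows into the next field:
        System.out.println(pack(1, 0, 100) == pack(1, 1, 0)); // true
    }
}
```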

(cherry picked from commit 7d2331a2818ae20024c5c3617cd4433f90e9c098)
2020-05-16 08:59:45 +02:00
Andrei Stefan 4d47d63f55
SQL: implement SUM, MIN, MAX, AVG over literals (#56786) (#56850)
* Adds support for MIN, MAX, AVG, SUM aggregates acting on literals.
SELECT SUM(1) FROM index
and
SELECT SUM(1), AVG(2)
work both on indices and as local execution.

(cherry picked from commit efb72907c0391612c4a2b6256e327060b4167912)
2020-05-16 02:13:55 +03:00
Jake Landis 813609b47c
Ensure that .watcher-history-11* template is installed prior to use (#56734)
WatcherIndexTemplateRegistry as of https://github.com/elastic/elasticsearch/pull/52962 
requires all nodes to be on 7.7.0 before it allows the version 11 index template to be 
installed.

While in a mixed cluster, nothing prevents Watcher from running on the new
host before all of the nodes are on 7.7.0. This results in a
.watcher-history-11* index without the proper mappings. Without the proper
mappings, a single document (for a large watch) can exceed the default 1000-field
limit and cause errors to appear in the logs.

This commit ensures the same logic for writing to the index is applied as for
installing the template. In a mixed cluster, the `10` index template will continue
to be written. Only once all nodes are on 7.7.0+ will the `11` index template
be installed and used.

closes #56732
2020-05-15 16:29:04 -05:00
Dimitris Athanasiou 54d3cc74ec
[7.x][ML] Ensure class is represented when its cardinality is low (#56783) (#56829)
In DF analytics classification, it is possible to use no samples
of a class if its cardinality is too low.

This commit fixes this by ensuring the target sample count can never be zero.
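A sketch of the guard, assuming the target is a fraction of the class's document count rounded down (the commit only states that the result can never be zero). `SampleCountSketch` is hypothetical:

```java
// Hypothetical helper; a class that appears in the data has at least one document.
final class SampleCountSketch {
    static long targetSampleCount(long classDocCount, double samplingFraction) {
        long target = (long) Math.floor(classDocCount * samplingFraction);
        return Math.max(1L, target); // a low-cardinality class still gets one sample
    }

    public static void main(String[] args) {
        System.out.println(targetSampleCount(3, 0.1));    // 1 (0.3 rounds down to 0)
        System.out.println(targetSampleCount(1000, 0.5)); // 500
    }
}
```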

Backport of #56783
2020-05-15 20:52:06 +03:00
Bogdan Pintea 14ad733bd1
SQL: JDBC: fix access to the Manifest for non-entry JAR URLs (#56797) (#56839)
* JDBC: fix access to the Manifest for non-entry JAR

The JDBC driver will attempt to read its version from the Manifest file
embedded into its JAR. The URL pointing to the JAR can be provided in a
few ways.

So far, accessing the Manifest was attempted by getting a URLConnection
out of the URL and then getting an input stream out of this connection.
For file JAR URLs, however, this only works if the URL points to the
driver as a JAR file entry (i.e. <sub-url>!/jdbc-driver.jar!/). If
that's not the case, the JarURLConnection will throw an IOException.

This commit fixes that: in case the URL points to a JAR entry
(jar:file:<path>/jdbc-driver.jar!/), the manifest is read directly with
JarURLConnection#getManifest().
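A runnable sketch of reading the manifest through `JarURLConnection#getManifest()` for a `jar:file:<path>!/` URL. It builds a throwaway JAR first; the `Implementation-Version` attribute and all names are assumptions, not necessarily what the driver reads. Caching is disabled and the `JarFile` is closed explicitly so the file is not held open:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not the JDBC driver's version-detection code.
final class ManifestVersionSketch {
    /** Reads Implementation-Version from the JAR that a "jar:file:<path>!/" URL points at. */
    static String readVersion(URL jarUrl) throws IOException {
        JarURLConnection conn = (JarURLConnection) jarUrl.openConnection();
        conn.setUseCaches(false); // don't leave the JAR open in the JVM-wide cache
        try (JarFile jarFile = conn.getJarFile()) { // opens the JAR; closed when done
            Manifest manifest = conn.getManifest(); // reads META-INF/MANIFEST.MF of that JAR
            return manifest == null ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        }
    }

    /** Builds a temporary JAR that contains only a manifest, for demonstration. */
    static File writeDemoJar(String version) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        File jar = File.createTempFile("jdbc-driver-sketch", ".jar");
        jar.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // the manifest is the only entry
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = writeDemoJar("1.2.3");
        URL url = URI.create("jar:" + jar.toURI() + "!/").toURL();
        System.out.println(readVersion(url)); // 1.2.3
    }
}
```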

(cherry picked from commit 2175b7b01cf5fcf3ab2bb21404a9bd454a8df3f0)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-05-15 19:35:54 +02:00
James Baiera 4809db3ff9
EnrichProcessorFactory should not throw NPE if missing metadata (#55977) (#56793)
In some cases the Enrich processor factory may be called before it is
ready to create processors. While these calls are usually made in error,
the response from the Enrich processor is an NPE which is almost always
an unhelpful error when debugging an issue.
2020-05-15 12:02:13 -04:00
Ioannis Kakavas 239ada1669
Test adjustments for FIPS 140 (#56526)
This change aims to fix our setup in CI so that we can run 7.x in
FIPS 140 mode. The major issue that we have in 7.x and did not
have in master is that we can't use the diagnostic trust manager
in FIPS mode in Java 8 with SunJSSE in FIPS approved mode as it
explicitly disallows the wrapping of X509TrustManager.

Previous attempts like #56427 and #52211 focused on disabling the
setting in all of our tests when creating a Settings object or
on setting fips_mode.enabled accordingly (which implicitly disables
the diagnostic trust manager). The attempts weren't future-proof
though, as nothing would prevent someone from adding new tests without
setting the necessary setting, and requiring this would be very
inconvenient for any other case (see
#56427 (comment) for the full argument).

This change introduces a runtime check in SSLService that overrides
the configuration value of xpack.security.ssl.diagnose.trust and
disables the diagnostic trust manager when we are running in Java 8
and the SunJSSE provider is set in FIPS mode.
2020-05-15 18:10:45 +03:00
Benjamin Trent f71c305090
[7.x] [Transform] add support for terms agg in transforms (#56696) (#56809)
* [Transform] add support for terms agg in transforms (#56696)

This adds support for `terms` and `rare_terms` aggs in transforms. 

The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`

The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
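A sketch of how nested bucket results map to dotted keys like `<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`. `FlattenSketch` is hypothetical and only illustrates the key layout, not how transforms build their documents:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns nested maps into dotted keys.
final class FlattenSketch {
    /** e.g. {error={_doc_count=3}} under prefix "level" becomes {level.error._doc_count=3}. */
    static Map<String, Object> flatten(String prefix, Map<String, ?> nested) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(prefix, nested, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> nested, Map<String, Object> out) {
        for (Map.Entry<String, ?> entry : nested.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) entry.getValue();
                flattenInto(key, child, out);
            } else {
                out.put(key, entry.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> bucket = new LinkedHashMap<>();
        bucket.put("_doc_count", 3L);
        Map<String, Object> buckets = new LinkedHashMap<>();
        buckets.put("error", bucket);
        System.out.println(flatten("level", buckets)); // {level.error._doc_count=3}
    }
}
```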
2020-05-15 08:08:43 -04:00
David Roberts 270a23e422 [TEST] Fix log tail mocking in native process unit tests (#56804)
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.

Fixes #56796
2020-05-15 12:46:37 +01:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
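A simplified sketch of the pattern with hypothetical classes (not the real `Mapper.Builder`). The builder keeps only its self type parameter for fluent chaining, and a subclass narrows `build()`'s return type covariantly, so no second type parameter is needed to get a typed result:

```java
// Hypothetical, simplified mapper and builder types.
abstract class MapperSketch {
    private final String name;

    MapperSketch(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }
}

abstract class BuilderSketch<T extends BuilderSketch<T>> {
    protected String name;
    protected boolean stored;

    @SuppressWarnings("unchecked")
    T name(String name) {
        this.name = name;
        return (T) this; // self type keeps the chain on the concrete builder
    }

    @SuppressWarnings("unchecked")
    T stored(boolean stored) {
        this.stored = stored;
        return (T) this;
    }

    abstract MapperSketch build();
}

final class TextMapperSketch extends MapperSketch {
    final boolean stored;
    final String analyzer;

    TextMapperSketch(String name, boolean stored, String analyzer) {
        super(name);
        this.stored = stored;
        this.analyzer = analyzer;
    }
}

final class TextBuilderSketch extends BuilderSketch<TextBuilderSketch> {
    private String analyzer = "standard";

    TextBuilderSketch analyzer(String analyzer) {
        this.analyzer = analyzer;
        return this;
    }

    @Override
    TextMapperSketch build() { // covariant return type instead of a second generic parameter
        return new TextMapperSketch(name, stored, analyzer);
    }

    public static void main(String[] args) {
        TextMapperSketch mapper = new TextBuilderSketch().name("body").stored(true).analyzer("english").build();
        System.out.println(mapper.name() + " " + mapper.stored + " " + mapper.analyzer);
    }
}
```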
2020-05-15 12:14:49 +01:00
Yang Wang c66e7ecbfe
Fix test failure of file role store auto-reload (#56398) (#56802)
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
2020-05-15 15:10:45 +10:00
Ryan Ernst 9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Tal Levy 5e90ff32f7
Add Normalize Pipeline Aggregation (#56399) (#56792)
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregation supports the following normalizations:

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

The normalization to use is specified in the normalize agg's
`normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```
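
The listed normalizations can be sketched as plain functions over a series of bucket values. The formulas below are common textbook definitions (with the population standard deviation for z-score); the aggregation's exact formulas and its handling of edge cases such as empty series, a zero range, or a zero standard deviation are assumptions here, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class NormalizeSketch {

    // rescale 0-1: (x - min) / (max - min); assumes a non-empty series
    static double[] rescale01(double[] v) {
        double min = Arrays.stream(v).min().getAsDouble();
        double max = Arrays.stream(v).max().getAsDouble();
        return Arrays.stream(v).map(x -> (x - min) / (max - min)).toArray();
    }

    // rescale 0-100: the 0-1 rescaling multiplied by 100
    static double[] rescale0100(double[] v) {
        return Arrays.stream(rescale01(v)).map(x -> x * 100).toArray();
    }

    // percentage of sum: x / sum(x), expressed as a fraction
    static double[] percentOfSum(double[] v) {
        double sum = Arrays.stream(v).sum();
        return Arrays.stream(v).map(x -> x / sum).toArray();
    }

    // mean normalization: (x - mean) / (max - min)
    static double[] meanNormalize(double[] v) {
        double mean = Arrays.stream(v).average().getAsDouble();
        double min = Arrays.stream(v).min().getAsDouble();
        double max = Arrays.stream(v).max().getAsDouble();
        return Arrays.stream(v).map(x -> (x - mean) / (max - min)).toArray();
    }

    // z-score: (x - mean) / stdev, using the population standard deviation
    static double[] zScore(double[] v) {
        double mean = Arrays.stream(v).average().getAsDouble();
        double variance = Arrays.stream(v).map(x -> (x - mean) * (x - mean)).average().getAsDouble();
        double stdev = Math.sqrt(variance);
        return Arrays.stream(v).map(x -> (x - mean) / stdev).toArray();
    }

    // softmax: exp(x) / sum(exp(x)); subtracting the max first avoids overflow
    static double[] softmax(double[] v) {
        double max = Arrays.stream(v).max().getAsDouble();
        double[] exps = Arrays.stream(v).map(x -> Math.exp(x - max)).toArray();
        double sum = Arrays.stream(exps).sum();
        return Arrays.stream(exps).map(e -> e / sum).toArray();
    }
}
```

For example, `rescale01(new double[] {1, 3, 5})` yields `{0.0, 0.5, 1.0}`.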
2020-05-14 17:40:15 -07:00
Mark Vieira 0fd756d511
Enforce strict license distribution requirements (#56642) 2020-05-14 13:57:56 -07:00
Costin Leau 6f4af43405 EQL: Skip execution for filters with empty results (#56718)
Optimize away event queries and joins/sequences that cannot match any
results, without having to query the backend.

(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
2020-05-14 22:38:23 +03:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Dimitris Athanasiou ac5902624c
[7.x][ML] Improve error upon DF analytics mappings conflict (#56700) (#56776)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.

Backport of #56700
2020-05-14 19:16:10 +03:00
Jim Ferenczi fb5e6329b7 Stop/Start async search maintenance service in tests(#56673)
This change ensures that the maintenance service responsible for deleting expired responses is stopped between tests. This is needed because we check that no search contexts are in flight after each test method.

Fixes #55988
2020-05-14 15:13:01 +02:00
David Turner bec6821fe6 AwaitsFix for #56755 2020-05-14 11:46:05 +01:00
Alexander Reelsen 3a263d91f6 Ensure watcher email action message ids are always unique (#56574)
If an email action was used in a foreach loop, message ids could be
duplicated, and the mail server would then reject the duplicates.

This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
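
A minimal sketch of the static-counter idea, assuming a hypothetical id layout of watch id, action id, timestamp and counter (the real message id format is not given here):

```java
import java.util.concurrent.atomic.AtomicLong;

final class EmailMessageIds {

    // One counter shared by every execution of the action (static), so ids stay
    // unique even when a foreach loop sends several emails within the same millisecond.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Hypothetical id layout: watch id, action id, timestamp and counter.
    static String nextMessageId(String watchId, String actionId, long timestampMillis) {
        return watchId + "_" + actionId + "_" + timestampMillis + "_" + COUNTER.incrementAndGet();
    }
}
```

`AtomicLong` keeps the counter safe when several watches execute concurrently.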
2020-05-14 10:36:00 +02:00
Przemysław Witek 98fbd85290
[7.x] Add scope-related fields to Annotation (#56417) (#56681) 2020-05-14 10:23:13 +02:00
Andrei Stefan ddf4e47e86
EQL: fix QueryFolderOkTests (#56714) (#56728)
(cherry picked from commit 8b21ccd0eac3b3d0fbd090152b3dff6ae5217b52)
2020-05-14 10:58:25 +03:00
David Roberts 3051c37f92
[ML] Tail the C++ logging pipe before connecting other pipes (#56701)
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.

This created a problem: if the C++ process logged more messages
between opening the log pipe and opening the last pipe to be
connected than there was space for in the named pipe's buffer,
the C++ process would block. It would then never get as far as
opening the last named pipe, so the JVM would never get as far
as reading from the log pipe, hence a deadlock.

This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.

Backport of #56632
2020-05-14 07:10:30 +01:00
Aleksandr Maus 87a10806ab
EQL: Fix cidrMatch function fails to match when used in scripts (#56246) (#56735)

Addresses https://github.com/elastic/elasticsearch/issues/55709
2020-05-13 22:41:24 -04:00
Nik Everett b98b260048
Merge significant_terms into the terms package (backport of #56699) (#56715)
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
2020-05-13 17:36:21 -04:00
Ross Wolf 61e2cf89b5
EQL: Add number function (#55084)
* EQL: Add number function
* EQL: Fix the locale used for number for deterministic functionality
* EQL: Add more ToNumber tests
* EQL: Add more number ToNumberProcessor unit tests
* EQL: Remove unnecessary overrides, fix processor methods
* EQL: Remove additional unnecessary overrides
* EQL: Lint fixes for ToNumber
* EQL: ToNumber renames from PR feedback
* EQL: Remove NumberFormat locale handling
* EQL: Removed NumberFormat from ToNumber
* EQL: Add number function tests
* EQL: ToNumberProcessorTests formatting
* EQL: Remove newline in ToNumberProcessorTests
* EQL: Add number(..., null) test
* EQL: Create expression.function.scalar.math package
* EQL: Remove painless whitespace for ToNumber.asScript
* EQL: Add Long support
2020-05-13 14:09:06 -06:00
Costin Leau 9f1ecd52eb EQL: Introduce support for sequences (#56300)
Initial support for EQL sequences.
The current algorithm is focused on correctness and does not contain
any optimizations; those are left for the future.

The current implementation uses a state machine approach that moves
in ascending order and runs each query one after the other, computing
sequences as the data comes in.
For each result, the key and its timestamp are extracted and then
used for matching/building a sequence.

(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
2020-05-13 15:42:31 +03:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Marios Trivyzas cbbbd499bf
SQL/EQL: Add support for scalars within LIKE/RLIKE (#56495) (#56674)
- Add support for scalar functions on the field of SQL's LIKE/RLIKE
- Add support for scalar functions on the field of EQL's match/matchLite

Closes: #55058
(cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)
2020-05-13 13:40:24 +02:00
Luca Cavanna 30e9a1b8c7 Improve error handling when decoding async execution ids (#56285)
When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]".

Added test coverage for a couple of these scenarios and also added tests for the equals/hashcode methods.
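
The error-wrapping pattern can be sketched as follows, assuming for illustration that an id is plain URL-safe Base64 of a UTF-8 string (real async execution ids encode more structure, and the class name is hypothetical). The `invalid id: [...]` message format is taken from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AsyncIdDecoder {

    // Decodes a URL-safe Base64 id. Any decoding failure is rethrown with a
    // message naming the offending id instead of the low-level decoder message.
    static String decode(String id) {
        try {
            byte[] bytes = Base64.getUrlDecoder().decode(id);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("invalid id: [" + id + "]", e);
        }
    }
}
```

Keeping the original exception as the cause preserves the decoder detail for logs while the user sees a clear message.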
2020-05-13 12:26:17 +02:00
Marios Trivyzas e781193cf9
SQL: Fix JDBC url pattern in docs and error message (#56612)
The URL pattern in the docs used `*`, which means zero or more, instead
of `?`, which means zero or one. The URL pattern returned in error
messages was not in sync with the one in the docs.

Fixes: #56476
(cherry picked from commit 1a5945c3962cdda21482f4b0b3e0ca508534c2c4)
2020-05-13 12:13:58 +02:00
David Turner c10b4ae15a Support cloning of searchable snapshot indices (#56595)
Today you can convert a searchable snapshot index back into a regular index by
restoring the underlying snapshot, but this is somewhat wasteful if the shards
are already in cache since it copies the whole index from the repository again.

Instead, we can make use of the locally-cached data by using the clone API to
copy the contents of the cache into the layout expected by a regular shard.
This commit marks the searchable snapshot's private index settings as
`NotCopyableOnResize` so that they are removed by resize operations such as
cloning.

Cloning a regular index typically hard-links the underlying files rather than
copying them, but this is tricky to support in the case of a searchable
snapshot so this commit takes the simpler approach of always copying the
underlying files.
2020-05-13 11:05:14 +01:00
Ioannis Kakavas cc119c3853
Expose idp.metadata.http.refresh for SAML realm (#56354) (#56593)
This setting was not returned by SamlRealmSettings#getSettings,
so users could not set it in the realm
configuration.
2020-05-13 11:51:18 +03:00
Jake Landis a010f4f624
[7.x] Watcher dont add watches post index if stopped (#56556) (#56629)
Watcher adds watches to the trigger service on the postIndex action
for the .watches index. This has the (intentional) side effect of also
adding the watches to the stats. The tests rely on these stats for their
assertions. The tests also start and stop Watcher between each test for
a clean slate.

When Watcher executes, it updates the .watches index, and upon this update
it goes through the postIndex method and ends up adding that watch to the
trigger service (and stats). Functionally this is not a problem if Watcher
is stopping or stopped, since Watcher is then also paused and will not execute
the watch. However, with specific timing, and given the tests' expectation of a
clean slate, this can break the test assertions against the stats.

This commit ensures that the postIndex action only adds to the trigger service
if the Watcher state is not stopping or stopped. When started back up, it will
re-read the .watches index.

This commit also un-mutes the tests related to #53177 and #56534
2020-05-12 16:30:27 -05:00
Jake Landis 9c76ee47c4
[7.x] json spec: allow null for documentation url (#55749) (#56625)
This commit allows the JSON schema's documentation.url property to have a null value.
This can be useful for cases where a feature is under development and does not have
documentation published yet.

This commit also adds a documentation.url for two ml resources.
2020-05-12 14:49:02 -05:00
Armin Braun 0a879b95d1
Save Bounds Checks in BytesReference (#56577) (#56621)
Two spots that allow for some optimization:

* We often create a composite reference of just a single item in the
  transport layer => special-cased via a static constructor so that this never happens
  * Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days, so there
  is no point in dealing with all the bounds checks and extra references to sliced buffers;
  we can just use the underlying array directly
2020-05-12 20:33:45 +02:00
Armin Braun c104c9a11b
Fix Missing IgnoredUnavailable Flag in 7.x SLM Retention Task (#56616)
Without the flag, a repository broken by some old 6.x version of ES (one that is
missing some snap-${uuid}.dat blobs) causes the SLM retention task to fail,
since it always errors out.
2020-05-12 18:07:58 +02:00
Marios Trivyzas 4240b97d0e
SQL: [Test] Fix JdbcPreparedStatement date test
Use `ORDER BY` to ensure the order of the rows, since more
than one row is returned in testDate().

Follows: #56492
(cherry picked from commit 0053a1cb515b4db160d7b0bed5cf3f13c1050687)
2020-05-12 17:08:16 +02:00
Martijn van Groningen 0c61bc63e4
Backport: auto create data streams using index templates v2 (#56596)
Backport: #55377

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_stream field that includes a timestamp field.
If this field is provided and an index name matches the template, a data stream
(plus its first backing index) is auto-created.

Relates to #53100
2020-05-12 17:01:15 +02:00
Andrei Stefan f0074e93a0
QL: case sensitive support in EQL (#56404) (#56597)
* QL: case sensitive support in EQL (#56404)
* adds a generic startsWith function to QL
* modifies the existing EQL startsWith function to be case-sensitivity
aware
* improves the existing EQL startsWith function to use a prefix query
when the function is used in a case-sensitive context. The same improvement
is used in SQL's newly added STARTS_WITH function.
* adds case sensitivity to EQL configuration through a case_sensitive
parameter in the eql request, as established in #54411.
The case_sensitive parameter can be specified when running queries
(default is case insensitive)

(cherry picked from commit ee5a09ea840167566e34c28c8225dc38bc6a7ae8)
2020-05-12 16:56:18 +03:00
Hendrik Muhs a9425a0240
[7.x][Transform] fix count when matching exact ids(#56544) (#56582)
Fix the count in get and get stats when explicit ids are given: ids might be
duplicated when configurations are stored in different indices (versions).

fixes #56196
2020-05-12 14:23:13 +02:00
Marios Trivyzas 575cafb8da
SQL: Fix serialization of JDBC prep statement date/time params (#56492) (#56579)
The Date/Time related query params of a JDBC prepared statement
were serialized using `java.util.Date`. The rules for serializing
`java.util.Date` objects, though, reside in
`XContentElasticsearchExtension`, which is not available in the
jdbc jar because this class is in the `server` module. Therefore, a
custom implementation of the `XContentBuilderExtension` interface has been
added to the jdbc module/jar.

Moreover, the SQL `qa` project depended on the `sql-action`
module, which depends on `server`, so the `XContentBuilderExtension`
was available to the integ tests, hiding the real problem.

Previously, when a user set a `java.sql.Time` on the prepared statement,
the DataType used was `DATETIME` instead of `TIME`, which
prevented filtering on a field cast to `TIME`:
```
SELECT * FROM test WHERE date::TIME = ?
```
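
One way such a `TIME` vs `DATETIME` mismatch can arise is that `java.sql.Time` is a subclass of `java.util.Date`, so a generic `Date` check also matches `Time` values. The hypothetical helper below (not the driver's actual code, with other date types omitted) shows why the more specific check has to come first:

```java
final class JdbcParamTypes {

    enum DataType { TIME, DATETIME }

    // java.sql.Time extends java.util.Date, so the more specific type has to be
    // checked first; otherwise a Time parameter is classified as DATETIME.
    static DataType typeOf(java.util.Date value) {
        if (value instanceof java.sql.Time) {
            return DataType.TIME;
        }
        return DataType.DATETIME; // other java.util.Date subclasses omitted in this sketch
    }
}
```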

Fixes: #56084
(cherry picked from commit f8d8e971bd2c85fa4aea44b5b3ba0cdcc950a4ed)
2020-05-12 13:25:02 +02:00
Martijn van Groningen 2e86801f61
Backport: enable searchable snapshots feature flag for xpack rest tests.
Backport of: #56569

A data stream test, which tests data stream resolvability in xpack APIs, failed in release builds.
An invocation of a searchable snapshot API failed because the corresponding feature flag
wasn't enabled for xpack rest tests.

Closes #56531
2020-05-12 12:18:24 +02:00
Ignacio Vera 222ee721ec
Add moving percentiles pipeline aggregation (#55441) (#56575)
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
2020-05-12 11:35:23 +02:00
Marios Trivyzas 5c0f26de1d
SQL: [Docs] Fix example for DATETIME_PARSE (#56409)
When no timezone is specified, the session timezone is used without
conversion; fix the docs test accordingly.

Follows: #56158
(cherry picked from commit 4b79b19ea5c3d17e05cb8130f3c754ac9bfd2382)
2020-05-12 09:23:00 +02:00
Ryan Ernst 902fc546bd
Migrate remaining ESIntegTestCases to internalClusterTest (#56479) (#56563)
This commit migrates the ESIntegTestCase tests in x-pack to the
internalClusterTest source set.
2020-05-11 21:06:04 -07:00
Nick Knize 9b64149ad2
[Geo] Refactor Point Field Mappers (#56060) (#56540)
This commit refactors the following:
  * GeoPointFieldMapper and PointFieldMapper to
    AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
  * .setupFieldType moved up to AbstractGeometryFieldMapper
  * lucene indexing moved up to AbstractGeometryFieldMapper.parse
  * new addStoredFields, addDocValuesFields abstract methods for implementing
    stored field and doc values field indexing in the concrete field mappers

This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
2020-05-11 17:11:36 -05:00
Tim Brooks 760ab726c2
Share netty event loops between transports (#56553)
Currently Elasticsearch creates independent event loop groups for each
transport type (http and internal). This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
2020-05-11 15:43:43 -06:00
Benjamin Trent 1d6b2f074e
[Transform] adds geotile_grid support in group_by (#56514) (#56549)
This adds support for grouping by geo points. This uses the agg [geotile_grid](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geotilegrid-aggregation.html).

I am opting to store the tile results of group_by as a `geo_shape` so that users can query the results. Additionally, the shapes could be visualized and filtered in the kibana maps app.

relates to https://github.com/elastic/elasticsearch/issues/56121
2020-05-11 17:02:40 -04:00
Lee Hinman 1337b35572
Remove prefer_v2_templates query string parameter (#56545)
This commit removes the `prefer_v2_templates` flag and setting. This was a short-lived setting that
allowed specifying whether V1 or V2 templates should be used when an index is created. It has been
removed in favor of V2 templates always having priority.

Relates to #53101
Resolves #56528

This is not a breaking change because this flag was never in a released version.
2020-05-11 14:56:42 -06:00
zhenxianyimeng 8e96e5c936
Use CollectionUtils.isEmpty where appropriate (#55910)
This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.
2020-05-11 09:55:57 -07:00
Armin Braun 3ab6eba6bc
Fix RollupJobTaskTests Leaking Threads on Slowness (#56438) (#56518)
We are ensuring order in the two tests changed by waiting on latches.
The problem is that 3s is a pretty short wait and on CI can randomly be exceeded
by pure chance. If that happened, we wouldn't have visibility into it, since we didn't
assert that the waits actually worked.
=> Fixed by asserting that the waits work and upping the timeout to our standard 10s.
Also, moved to a per-test threadpool to make it simpler to identify which test failed,
should an unexpected task run on a closed client's pool after all.
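
The "assert that the waits work" part can be sketched with the standard `CountDownLatch.await(long, TimeUnit)`, which returns `false` on timeout. The helper name is hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchAssertions {

    // CountDownLatch.await(timeout, unit) returns false on timeout. Ignoring that
    // result lets a slow run continue silently; failing loudly makes it visible.
    static void awaitOrFail(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            if (latch.await(timeout, unit) == false) {
                throw new AssertionError("latch not released within " + timeout + " " + unit);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for latch", e);
        }
    }
}
```

A test would call `awaitOrFail(latch, 10, TimeUnit.SECONDS)` instead of discarding the boolean returned by `await`.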
2020-05-11 17:24:10 +02:00
Jim Ferenczi 02ab9112a9 Fix spurious failures in AsyncSearchIntegTestCase (#56026)
Async search integration tests are subject to random failures when:
  * The test index has more than one replica.
  * The request cache is used.
  * Some shards are empty.
  * The maintenance service starts a garbage collection when a node is closing.

They are also slow because the test index is created/populated on each
test method.

This change refactors these integration tests in order to:
  * Create the index once for the entire test suite.
  * Fix the usage of the request cache and replicas.
  * Ensure that all shards have at least one document.
  * Increase the delay of the maintenance service garbage collection.

Closes #55895
Closes #55988
2020-05-11 15:03:03 +02:00
Martijn van Groningen 9ae09570d8
Allow a number of broadcast transport actions to resolve data streams (#55726) (#56502)
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.

This change allows the following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.

Relates to #53100
2020-05-11 12:48:35 +02:00
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of its
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator` which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
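
The "fairly simplistic data structure" is not shown in this message. A minimal sketch of the idea, assuming a plain `HashMap` keyed by (owning bucket ordinal, term): every distinct pair gets one dense ordinal, so a single aggregator can collect for all parent buckets instead of allocating one aggregator per parent. The class name and the `-1 - existingOrd` return convention are illustrative choices.

```java
import java.util.HashMap;
import java.util.Map;

final class LongKeyedBucketOrdsSketch {

    // Identifies a bucket by the ordinal of its parent bucket plus the long term.
    private static final class Key {
        final long owningBucketOrd;
        final long value;

        Key(long owningBucketOrd, long value) {
            this.owningBucketOrd = owningBucketOrd;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return owningBucketOrd == other.owningBucketOrd && value == other.value;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(owningBucketOrd) + Long.hashCode(value);
        }
    }

    private final Map<Key, Long> ords = new HashMap<>();

    // Returns a new dense ordinal for an unseen (parent, value) pair, or
    // -1 - existingOrd if the pair was already added.
    long add(long owningBucketOrd, long value) {
        Key key = new Key(owningBucketOrd, value);
        Long existing = ords.get(key);
        if (existing != null) {
            return -1 - existing;
        }
        long ord = ords.size();
        ords.put(key, ord);
        return ord;
    }

    long size() {
        return ords.size();
    }
}
```

The same term under two different parents gets two ordinals, which is what lets sub-aggregations be collected per (parent, term) bucket.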
2020-05-08 20:38:53 -04:00
Armin Braun 0a254cf223
Serialize Monitoring Bulk Request Compressed (#56410) (#56442)
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
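
The compress-then-stream idea can be sketched with `java.util.zip`. Deflate is an assumption here (the codec the exporter actually uses is not stated), and the class is hypothetical. The caller would record the uncompressed length before compressing, since it is still needed for the HTTP body length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedBody {

    // Compresses the fully serialized body so only the compressed bytes stay on heap.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(buffer)) {
            deflater.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompresses chunk by chunk while writing to the sink (e.g. an HTTP
    // connection), so the uncompressed body is never held in memory at once.
    static void streamDecompressed(byte[] compressed, OutputStream sink) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                sink.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```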
2020-05-08 23:16:07 +02:00
Dimitris Athanasiou 44ffa388ac
[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423) (#56428)
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.

This commit fixes this by using the default timeout instead.

Backport of #56423
2020-05-08 21:12:11 +03:00
David Roberts 9a3924a641
[ML] Adjust list of platforms that have ML native code (#56426)
Native code is now available for linux-aarch64.

Note that it is _not_ currently supported!
2020-05-08 16:22:45 +01:00
Dimitris Athanasiou c117ae7a6e
[7.x][ML] Force stopping stopped DF analytics should succeed (#56421) (#56424)
Force stopping a DF analytics job whose config exists and that
is stopped should succeed. This was broken by #56360.

Closes #56414

Backport of #56421
2020-05-08 18:04:24 +03:00
Tanguy Leroux 8e9b69bfd7
Use snapshot information to build searchable snapshot store MetadataSnapshot (#56289) (#56403)
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves reading the footer checksum of every file of the shard
using the Store.checksumFromLuceneFile() method.

Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.

In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.

Since searchable snapshots already contain the footer information
of every file in BlobStoreIndexShardSnapshot, they can directly read the
checksum from it and avoid using the cache entirely when creating a
MetadataSnapshot for the operations mentioned above.

This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
2020-05-08 14:16:19 +02:00
Dimitris Athanasiou 60b1c67409
[7.x][ML] Allow stopping DF analytics whose config is missing (#56360) (#56408)
It is possible that the config document for a data frame
analytics job is deleted from the config index. If that is
the case the user is unable to stop a running job because
we attempt to retrieve the config and that will throw.

This commit changes that. When the request is forced,
we do not expand the requested ids based on the existing
configs but from the list of running tasks instead.

Backport of #56360
2020-05-08 13:54:44 +03:00
Dimitris Athanasiou d064eda2b0
[7.x][ML] Ensure phase progress may only increase (#56339) (#56357)
Due to multi-threading it is possible that phase progress
updates written from the C++ process arrive out of order.
We can address this by ensuring that progress may only increase.
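
A minimal sketch of a monotonic progress holder, using the standard `AtomicInteger.accumulateAndGet` with `Math::max` (class and method names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PhaseProgress {

    private final AtomicInteger progressPercent = new AtomicInteger(0);

    // Updates may arrive out of order from the native process; keeping the
    // maximum means a late, stale update can never move progress backwards.
    void update(int reportedPercent) {
        progressPercent.accumulateAndGet(reportedPercent, Math::max);
    }

    int get() {
        return progressPercent.get();
    }
}
```

Using an atomic max avoids a check-then-set race between concurrent updates.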

Closes #56282

Backport of #56339
2020-05-07 19:46:58 +03:00
William Brafford 691044e67b
Add xpack setting deprecations to deprecation API (#56290)
* Add xpack setting deprecations to deprecation API

The deprecated settings showed up in the deprecation log file by
default, but I did not add them to the deprecation API. This commit
fixes that. Now if you use one of the deprecated basic feature
enablement settings, calling _migration/deprecations will inform you of
that fact.

* Remove incorrectly backported settings documents

It seems that I backported these docs to the wrong place in #56061,
in #55980, and in #56167. I hope they're in the right place now.

Co-authored-by: debadair <debadair@elastic.co>
2020-05-07 10:28:17 -04:00
Nik Everett e35919d3b8
Optimize date_histograms across daylight savings time (backport of #55559) (#56334)
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.

Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)

I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster than rounding to individual hours or minutes.

This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
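
The existing fixed-offset rewrite described above (not the new rounding algorithm) can be sketched with `java.time.zone.ZoneRules`: if the zone has no transition between the shard's minimum and maximum dates, a fixed `ZoneOffset` can stand in for it. The class name and the millisecond-bounds signature are assumptions.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

final class FixedZoneRewrite {

    // Returns a fixed offset when tz has no transition within [minMillis, maxMillis],
    // otherwise the original zone.
    static ZoneId rewrite(ZoneId tz, long minMillis, long maxMillis) {
        ZoneRules rules = tz.getRules();
        if (rules.isFixedOffset()) {
            return rules.getOffset(Instant.EPOCH);
        }
        Instant min = Instant.ofEpochMilli(minMillis);
        ZoneOffsetTransition next = rules.nextTransition(min);
        if (next == null || next.getInstant().toEpochMilli() > maxMillis) {
            // No DST change inside the shard's range: one offset is valid throughout.
            return rules.getOffset(min);
        }
        return tz;
    }
}
```

For America/New_York, a shard holding only January 2020 data rewrites to UTC-5, while one spanning the March transition keeps the original zone.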

This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.

Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead grows logarithmically as the number of
transitions increases. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
2020-05-07 09:10:51 -04:00
Tanguy Leroux 6233e32ab3 Fix SearchableSnapshotDirectoryTests.testIndexSearcher() (#56275)
Closes #56233
2020-05-07 11:12:35 +02:00
Tanguy Leroux 65a061e33a Fix SearchableSnapshotDirectoryTests.testClearCache (#56277)
This test sometimes fails when prewarming is enabled because 
it's possible that some files are cached in background while the 
test tries to clear the cache. This commit disables prewarming 
for this test.
2020-05-07 10:59:33 +02:00
Andrei Stefan 980f175222
EQL: simplify equals/not-equals TRUE/FALSE expressions (#56191) (#56306)
* Simplify equals/not-equals TRUE/FALSE expressions, by returning them
as is (TRUE variant) or negating them (FALSE variant)

(cherry picked from commit 17858afbe6da5fa0b3ecfc537cabb337e4baaffe)
2020-05-07 03:02:04 +03:00
Jason Tedor 33669c0420
Upgrade to Jackson 2.10.4 (#56188)
Another Jackson release is available. There are some CVEs addressed,
none of which impact us, but since we can now bump Jackson easily, let
us move along with the train to avoid the false positives from security
scanners.
2020-05-06 17:20:23 -04:00
Przemysław Witek 0cd0ab276e
Introduce Annotation.Builder class and use it to create instances of Annotation class (#56276) (#56286) 2020-05-06 20:47:03 +02:00
Julie Tibshirani e852bb29b7
Simplify signature of FieldMapper#parseCreateField. (#56144)
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.

This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
2020-05-06 11:12:09 -07:00
Dimitris Athanasiou 011e995165
[7.x][ML] Unmute ClssificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange (#56268) (#56287)
Closes #56240
2020-05-06 18:20:46 +03:00
Luca Cavanna 9a9cb68e83 Async Search: correct shards counting (#55758)
Async search allows users to retrieve partial results for a running search. For partial results, the number of successful shards does not include the skipped shards, while the response returned to users should.

Also, we recently had a bug where async search would miss tracking shard failures, which would have been caught if we had assertions in place that verified that whenever we get the last response, the number of failures included in it is the same as the failures that were tracked through the listener notifications.
2020-05-06 12:13:30 +02:00
Tanguy Leroux 07ad742b60 Enable prewarming by default for searchable snapshots (#56201)
Now that searchable snapshot directories respect the repository
rate limitations (#55952), we can enable prewarming by default
for shards.
2020-05-06 10:18:34 +02:00
Tanguy Leroux 131a3911eb Replace BlobContainerWrapper by FilterBlobContainer (#56200)
A FilterBlobContainer class was introduced in #55952; it delegates
its behavior to a given BlobContainer while allowing only the
necessary methods to be overridden.

This commit replaces the existing BlobContainerWrapper class from 
the test framework with the new FilterBlobContainer from core.
2020-05-06 10:05:43 +02:00
Julie Tibshirani bd7a2d2b01 Mute the geogrid agg circuit breaker tests. 2020-05-05 18:09:07 -07:00
Jake Landis a22690c9ca
[7.x] Ensure that the monitoring export exceptions are logged. (#56237) (#56251)
If an exception occurs while flushing a bulk the cause of the exception
can be lost. This commit ensures that cause of the exception is carried
forward and gets logged.
2020-05-05 19:24:26 -05:00
Julie Tibshirani 49de092b38 Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet. 2020-05-05 16:25:36 -07:00
Bogdan Pintea 47250b14a4
SQL: Add BigDecimal support to JDBC (#56015) (#56220)
* SQL: Add BigDecimal support to JDBC (#56015)

* Introduce BigDecimal support to JDBC -- fetching

This commit adds support for the getBigDecimal() methods.

* Allow BigDecimal params in double range

A prepared statement will now accept a BigDecimal parameter as a proxy
for a double, if the conversion is lossless.
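
One possible reading of "lossless", assumed here and not confirmed by the commit, is that the value is exactly representable as a finite double. The helper name is hypothetical:

```java
import java.math.BigDecimal;

final class BigDecimalParam {

    // True when bd converts to a finite double that equals bd exactly.
    static boolean isLosslessDouble(BigDecimal bd) {
        double d = bd.doubleValue();
        if (Double.isInfinite(d)) {
            return false; // out of double range
        }
        // new BigDecimal(double) gives the exact binary value, so compareTo
        // detects any rounding introduced by the conversion.
        return new BigDecimal(d).compareTo(bd) == 0;
    }
}
```

Under this reading, `0.5` is accepted while `0.1` (not exactly representable in binary) is rejected; the driver may well use a looser check.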

(cherry picked from commit e9a873ad7f387682e3472110b1d7c0514bd347c9)

* Fix compilation error

Diamond notation with anonymous inner classes is not available in Java 8.
2020-05-05 23:19:36 +02:00
Bogdan Pintea f159fd8a20
Fix test on incompatible client versions (#56234) (#56241)
The incompatible client version test is changed to:
- iterate over all versions prior to the allowed ones;
- format the exception message exactly as the server does.

The defect stemmed from the fact that clients do not send a
version's qualifier, only major.minor.revision, so the raised
exception message does not contain it, while the test expected it.

(cherry picked from commit 4a81c8f7a1f4573e3be95f346d9fb18772b297ee)
2020-05-05 23:18:29 +02:00
Julie Tibshirani 63062ec7bd Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange. 2020-05-05 13:48:35 -07:00
Dan Hermann 6674f14fb3
[7.x] Get index includes parent data stream for backing indices (#56238) 2020-05-05 15:43:42 -05:00
Benjamin Trent e1c5ca421e
[7.x] [ML] lay ground work for handling >1 result indices (#55892) (#56192)
* [ML] lay ground work for handling >1 result indices (#55892)

This commit removes all but one reference to `getInitialResultsIndexName`. 
This is to support more than one result index for a single job.
2020-05-05 15:54:08 -04:00
Julie Tibshirani 793f265451 Mute SearchableSnapshotDirectoryTests.testIndexSearcher. 2020-05-05 12:29:05 -07:00
Ross Wolf 389082033e
EQL: Add concat function (#55193)
* EQL: Add concat function
* EQL: for loop spacing for concat
* EQL: return unresolved arguments to concat early
* EQL: Add concat integration tests
* EQL: Fix concat query fail test
* EQL: Add class for concat function testing
* EQL: Add concat integration tests
* EQL: Update concat() null behavior
2020-05-05 12:53:34 -06:00
Bogdan Pintea 23c35e32f2
SQL: introduce a query builder for the Rest tests (#55094) (#56221)
* Introduce a query builder for the rest tests

The new BaseRestSqlTestCase.RequestObjectBuilder class is a helper class
to build REST request objects for the tests. It replaces the "manual" string
concatenation previously used to form JSON.

The class mimics SqlQueryRequestBuilder API.

(cherry picked from commit c8363f04c029542c233a758e9286d33c51d9c0c4)
2020-05-05 18:55:41 +02:00
Tal Levy e4f2c3105d
Add geo_shape support for geotile_grid and geohash_grid (#55966) (#56228)
This commit adds aggregation support for the geo_shape field
type to the geo*_grid aggregations.

It introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be reused, so no new implementations of
the grid aggregators are added.
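
To illustrate why a shape, unlike a point, can land in several grid cells, the sketch below counts the Web-Mercator tiles (as used by geotile_grid) that a shape's bounding box overlaps. This bounding-box approximation is an assumption for illustration only; the Tiler added here may test the actual geometry against each tile. Dateline-crossing boxes are ignored.

```java
public class TileRangeSketch {
    static final double MAX_LAT = 85.05112878; // Web-Mercator latitude limit

    // Tile column for a longitude at the given zoom.
    static int tileX(double lon, int zoom) {
        int tiles = 1 << zoom;
        int x = (int) Math.floor((lon + 180.0) / 360.0 * tiles);
        return Math.max(0, Math.min(tiles - 1, x)); // lon = 180 falls in the last column
    }

    // Tile row for a latitude at the given zoom; row 0 is the northern edge.
    static int tileY(double lat, int zoom) {
        int tiles = 1 << zoom;
        double latRad = Math.toRadians(Math.max(-MAX_LAT, Math.min(MAX_LAT, lat)));
        double y = (1.0 - Math.log(Math.tan(latRad) + 1.0 / Math.cos(latRad)) / Math.PI) / 2.0 * tiles;
        return Math.max(0, Math.min(tiles - 1, (int) Math.floor(y)));
    }

    // Number of tiles overlapped by a bounding box at the given zoom.
    static long tilesCoveringBox(double minLon, double maxLon, double minLat, double maxLat, int zoom) {
        long columns = tileX(maxLon, zoom) - tileX(minLon, zoom) + 1;
        long rows = tileY(minLat, zoom) - tileY(maxLat, zoom) + 1;
        return columns * rows;
    }

    public static void main(String[] args) {
        System.out.println(tilesCoveringBox(-180, 180, -85, 85, 1)); // 4
        System.out.println(tilesCoveringBox(10, 20, 10, 20, 1));     // 1
    }
}
```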
2020-05-05 09:54:14 -07:00
Benjamin Trent 641f598364
[Transform] fixes http status code when bad scripts are provided (#56117) (#56219)
Transforms should propagate the search execution exception, if one is returned, when running the test query.

This allows transforms to return a `4xx` when the aggs are malformed but parseable.

Closes https://github.com/elastic/elasticsearch/issues/55994
2020-05-05 12:36:22 -04:00
Bogdan Pintea 0e5632dc3a
SQL: relax version lock between server and clients (#56148) (#56223)
* Relax version lock between ES/SQL and clients

Allow clients older than the server to connect, provided they are at or past
a certain minimum release.
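
A minimal sketch of such a relaxed check follows. The minimum release value and the class name are hypothetical, and the sketch assumes clients newer than the server are still rejected, which the commit message does not state. Versions are compared as major.minor.revision only, since clients do not send a qualifier.

```java
public class ClientVersionCheck {
    // Parses "major.minor.revision" (no qualifier such as "-SNAPSHOT").
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])};
    }

    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // Accepts clients at or past minRelease and not newer than the server (assumed upper bound).
    static boolean isCompatible(String client, String server, String minRelease) {
        return compare(client, minRelease) >= 0 && compare(client, server) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isCompatible("7.7.0", "7.8.0", "7.7.0")); // true
        System.out.println(isCompatible("7.6.2", "7.8.0", "7.7.0")); // false
    }
}
```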

(cherry picked from commit 108f907297542ce649aa7304060aaf0a504eb699)
2020-05-05 18:27:06 +02:00
William Brafford 3499fa917c
Deprecated xpack "enable" settings should be no-ops (#55416) (#56167)
The following settings are now no-ops:

* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled

Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.

We also update documentation to remove references to these settings.
2020-05-05 10:40:49 -04:00
Tanguy Leroux b9636713b1
Searchable Snapshots should respect max_restore_bytes_per_sec (#55952) (#56199)
This commit changes searchable snapshots so that it now respects the 
repository's max_restore_bytes_per_sec setting when it downloads blobs.

Backport of #55952 for 7.x
2020-05-05 15:43:06 +02:00
David Roberts 7aa0daaabd
[7.x][ML] More advanced model snapshot retention options (#56194)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely
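
The retention rules above can be sketched as follows. This is a simplified illustration, not the ML implementation: the `Snapshot` type is hypothetical, days are measured back from a caller-supplied `now` and bucketed by UTC calendar day, and the reference point the real job uses may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapshotRetentionSketch {
    // Hypothetical minimal model snapshot.
    static final class Snapshot {
        final String id;
        final Instant timestamp;
        final boolean retain;

        Snapshot(String id, Instant timestamp, boolean retain) {
            this.id = id;
            this.timestamp = timestamp;
            this.retain = retain;
        }
    }

    static List<String> snapshotsToDelete(List<Snapshot> snapshots, Instant now,
                                          long retentionDays, long dailyRetentionAfterDays) {
        Instant deleteAllBefore = now.minus(Duration.ofDays(retentionDays));
        Instant dailyOnlyBefore = now.minus(Duration.ofDays(dailyRetentionAfterDays));
        List<Snapshot> sorted = new ArrayList<>(snapshots);
        sorted.sort(Comparator.comparing((Snapshot s) -> s.timestamp)); // oldest first
        Set<LocalDate> daysWithKeptSnapshot = new HashSet<>();
        List<String> toDelete = new ArrayList<>();
        for (Snapshot s : sorted) {
            if (s.retain) {
                continue; // retained snapshots are never deleted
            }
            if (s.timestamp.isBefore(deleteAllBefore)) {
                toDelete.add(s.id); // older than model_snapshot_retention_days
            } else if (s.timestamp.isBefore(dailyOnlyBefore)) {
                // Between the two thresholds: keep only the first snapshot of each day.
                LocalDate day = s.timestamp.atZone(ZoneOffset.UTC).toLocalDate();
                if (!daysWithKeptSnapshot.add(day)) {
                    toDelete.add(s.id);
                }
            }
        }
        return toDelete;
    }
}
```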

Backport of #56125
2020-05-05 14:31:58 +01:00
David Turner 40ea0eabd9 Forbid snapshot access on applier thread (#56044)
This commit strengthens the assertion about which threads may access a blob
store to exclude the cluster applier thread, since we no longer need to do so.

Relates #50999
2020-05-05 13:27:55 +01:00
Dimitris Athanasiou 2d7899c83c
[7.x][ML] Adjust DF Analytics process phases (#56107) (#56177)
As of elastic/ml-cpp#1179, the analytics process reports phases
depending on the analysis type. This commit adjusts the phases
of current analyses from `analyzing` to the following:

 - outlier_detection: [`computing_outlier`]
 - regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`]

Backport of #56107
2020-05-05 15:00:07 +03:00
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
Hendrik Muhs e177a38504
[7.x][Transform] add throttling (#56007) (#56184)
Add throttling to transforms. Throttling slows down search requests by
delaying their execution based on a documents-per-second metric.

Fixes #54862
2020-05-05 13:09:02 +02:00
Marios Trivyzas 363e994171
SQL: Fix DATETIME_PARSE behaviour regarding timezones (#56158) (#56182)
Previously, when the timezone was missing from both the datetime string
and the pattern, UTC was used instead of the session-defined timezone.
Moreover, if a timezone was included in the datetime string and the
pattern, that timezone was used. To get consistent behaviour,
the resulting datetime is now always converted to the session-defined
timezone, e.g.:
```
SELECT DATETIME_PARSE('2020-05-04 10:20:30.123 +02:00', 'uuuu-MM-dd HH:mm:ss.SSS VV') AS datetime;
```
with `time_zone` set to `-03:00` will result in
```
2020-05-04T05:20:30.123-03:00
```
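
The two cases can be sketched with `java.time`, whose pattern letters DATETIME_PARSE uses. The class and method names are hypothetical; the sketch only shows the conversion rule described above, not the SQL function's implementation.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class DateTimeParseSketch {
    // Text and pattern carry a zone: parse it, then convert to the session zone.
    static ZonedDateTime parseZoned(String text, String pattern, ZoneId sessionZone) {
        ZonedDateTime parsed = ZonedDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
        return parsed.withZoneSameInstant(sessionZone);
    }

    // Text and pattern carry no zone: interpret the local time in the session zone
    // (before the fix it was interpreted as UTC).
    static ZonedDateTime parseLocal(String text, String pattern, ZoneId sessionZone) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern)).atZone(sessionZone);
    }

    public static void main(String[] args) {
        ZoneId session = ZoneId.of("-03:00");
        // prints 2020-05-04T05:20:30.123-03:00
        System.out.println(parseZoned("2020-05-04 10:20:30.123 +02:00", "uuuu-MM-dd HH:mm:ss.SSS VV", session));
        // prints 2020-05-04T10:20:30.123-03:00
        System.out.println(parseLocal("2020-05-04 10:20:30.123", "uuuu-MM-dd HH:mm:ss.SSS", session));
    }
}
```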

Follows: #54960
(cherry picked from commit 8810ed03a209cc8fe1bad309a81e85b56a39da27)
2020-05-05 12:08:39 +02:00
Tanguy Leroux f717830563
Use workers to warm cache parts (#55793) (#56181)
Today the cache prewarming introduced in #55322 works by
enqueuing all the file parts to warm in the
searchable_snapshots thread pool at once. In order to make this fairer
among concurrent warmings, this commit starts workers that
concurrently poll file parts to warm from a queue, warm the
part and then immediately schedule another warming
execution. This should leave more room for concurrent
shard warming to sneak in and be executed.
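
The worker scheme can be sketched with a plain `ExecutorService`. This is an illustration under stated assumptions, not the searchable snapshots code: parts are just integers, "warming" is a counter increment, and `warmAll` is a hypothetical name. Because each worker resubmits itself after one part, it goes to the back of the pool's FIFO queue, so tasks from other shards can run in between.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PrewarmWorkersSketch {
    // Warms `parts` file parts using `workers` workers on a pool of `threads` threads.
    static int warmAll(int parts, int workers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < parts; i++) {
            queue.add(i);
        }
        CountDownLatch done = new CountDownLatch(parts);
        AtomicInteger warmed = new AtomicInteger();

        class Worker implements Runnable {
            @Override
            public void run() {
                Integer part = queue.poll();
                if (part == null) {
                    return; // queue drained: this worker stops
                }
                warmed.incrementAndGet(); // stand-in for downloading the part into the cache
                // Resubmit before counting down so no task is submitted after shutdown().
                pool.execute(this);
                done.countDown();
            }
        }

        for (int i = 0; i < workers; i++) {
            pool.execute(new Worker());
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown(); // queued workers still run, find the queue empty, and exit
        return warmed.get();
    }

    public static void main(String[] args) {
        System.out.println(warmAll(10, 3, 2)); // 10
    }
}
```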

Relates #55322
2020-05-05 11:48:06 +02:00
Tanguy Leroux 35622747fd
Add Minio tests for searchable snapshots (#56112) (#56179)
This commit adds QA tests for searchable snapshots on MinIO,
similar to those that already exist for S3, GCS and Azure.
2020-05-05 11:40:06 +02:00
Marios Trivyzas cc21468559
SQL: Fix issue with date range queries and timezone (#56115) (#56174)
Previously, the timezone parameter was not passed to the RangeQuery,
and as a result queries that use the ES date math notation (now,
now-1d, now/d, now/h, now+2h, etc.) used the UTC timezone instead of
the one passed through the "timezone"/"time_zone" JDBC/REST params.
As a consequence, dates defined with date math were always interpreted in
UTC, possibly leading to incorrect results for queries like:
```
SELECT * FROM t WHERE date BETWEEN now-1d/d AND now/d
```
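
Why the timezone matters for rounding such as `now/d` can be shown with `java.time`: the start of "today" is a different instant in UTC than in another zone. The fixed instant and the `startOfDay` helper are illustrative assumptions, not the query code.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

public class DateMathTimezoneSketch {
    // Rounds an instant down to the start of its day in the given zone, like now/d.
    static Instant startOfDay(Instant now, ZoneId zone) {
        return now.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-05-05T01:00:00Z");
        System.out.println(startOfDay(now, ZoneOffset.UTC));      // 2020-05-05T00:00:00Z
        System.out.println(startOfDay(now, ZoneId.of("-03:00"))); // 2020-05-04T03:00:00Z
    }
}
```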

Fixes: #56049
(cherry picked from commit 300f010c0b18ed0f10a41d5e1606466ba0a3088f)
2020-05-05 10:54:23 +02:00
Dimitris Athanasiou 6061aa3db4
[7.x][ML] Fix race condition updating reindexing progress (#56135) (#56146)
In #55763 I thought I could remove the flag that marks
reindexing was finished on a data frame analytics task.
However, that exposed a race condition. It is possible that
between updating reindexing progress to 100 because we
have called `DataFrameAnalyticsManager.startAnalytics()` and
a call to the _stats API which updates reindexing progress via the
method `DataFrameAnalyticsTask.updateReindexTaskProgress()` we
end up overwriting the 100 with a lower progress value.

This commit fixes this issue by bringing back
the `isReindexingFinished` flag as it was prior to #55763.
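
A minimal sketch of how such a flag prevents the overwrite, assuming both updates go through the same lock. Only `isReindexingFinished` and `updateReindexTaskProgress` come from the text above; the class and the other method names are hypothetical.

```java
public class ReindexProgressSketch {
    private boolean isReindexingFinished = false;
    private int reindexingProgress = 0;

    // Called once reindexing is done, just before analytics start.
    synchronized void markReindexingFinished() {
        isReindexingFinished = true;
        reindexingProgress = 100;
    }

    // Called e.g. from a stats request with progress read from the reindex task.
    // Once reindexing is finished, a stale lower value can no longer overwrite 100.
    synchronized void updateReindexTaskProgress(int progressFromTask) {
        if (isReindexingFinished == false) {
            reindexingProgress = progressFromTask;
        }
    }

    synchronized int getReindexingProgress() {
        return reindexingProgress;
    }
}
```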

Closes #56128

Backport of #56135
2020-05-05 10:48:42 +03:00
Albert Zaharovits e8763bad41
Let realms gracefully terminate the authN chain (#55623)
AuthN realms are ordered as a chain so that the credentials of a given
user are verified in succession. Upon the first successful verification,
the user is authenticated. Realms do however have the option to cut short
this iterative process, when the credentials don't verify and the user
cannot exist in any other realm. This mechanism is currently used by
the Reserved and the Kerberos realm.

This commit improves the early termination operation by allowing
realms to gracefully terminate authentication, as if the chain had been
tried out completely. Previously, early termination resulted in an
authentication error whose response body differed from the one returned
for a failed authentication where no realm could verify the
credentials successfully.

Reserved users are hence denied authentication in exactly the same
way as other users are when no realm can validate their credentials.
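
The chain semantics can be sketched as below. The `Outcome` values and the `Realm` interface are hypothetical simplifications, not the security plugin's types; the point is that an early termination returns exactly what an exhausted chain returns.

```java
import java.util.List;

public class RealmChainSketch {
    enum Outcome { SUCCESS, CONTINUE, TERMINATE }

    // Hypothetical minimal realm.
    interface Realm {
        Outcome authenticate(String user, String password);
    }

    // Returns the authenticated user, or null if authentication failed.
    // TERMINATE stops the chain early but yields the same result as trying every realm.
    static String authenticate(List<Realm> chain, String user, String password) {
        for (Realm realm : chain) {
            Outcome outcome = realm.authenticate(user, password);
            if (outcome == Outcome.SUCCESS) {
                return user;
            }
            if (outcome == Outcome.TERMINATE) {
                break; // user cannot exist in any other realm
            }
        }
        return null;
    }
}
```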
2020-05-05 10:11:49 +03:00
Martijn van Groningen 2ac32db607
Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151)
Backport of #56034.

Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.

Also alter the indices stats API to support data streams.
The rollover API uses this API; without this change, rolling over a data stream no longer works.

Relates to #53100
2020-05-04 22:38:33 +02:00
Dan Hermann 9892813842
[7.x] Delay warning about missing x-pack (#56142)
* Delay warning about missing x-pack (#54265)

Currently, when monitoring is enabled in a freshly-installed cluster,
the non-master nodes log a warning message indicating that master may
not have x-pack installed. The message is often printed even when the
master does have x-pack installed but takes some time to setup the local
exporter for monitoring. This commit adds the local exporter setting
`wait_master.timeout` which defaults to 30 seconds. The setting
configures the time that the non-master nodes should wait for master to
setup monitoring. After the time elapses, they log a message to the user
about possible missing x-pack installation on master.

The logging of this warning was moved from `resolveBulk()` to
`openBulk()` since `resolveBulk()` is called only on cluster updates and
the message might not be logged until a new cluster update occurs.

Closes #40898
2020-05-04 14:16:18 -05:00
Benjamin Trent 6c26de444d
[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020) (#56126)
If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.

It might lead folks to think there is an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using it, leading to more confusion.

Additionally, pipelines might not be parseable in this method as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).

Closes https://github.com/elastic/elasticsearch/issues/55985
2020-05-04 13:32:01 -04:00
Martijn van Groningen 6d03081560
Add auto create action (#56122)
Backport of #55858 to 7.x branch.

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2), either a regular index or a data stream should be created.
However, if the coordinating node is slow in processing cluster state updates, it may be
unaware of the existence of certain index templates, which can then lead to the
TransportBulkAction creating an index instead of a data stream. Therefore the coordination of
creating an index or data stream should occur on the master node. See #55377

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.

This change adds the following new transport action:

AutoCreateAction: the TransportBulkAction redirects to this action, and this action actually creates the index (instead of the TransportCreateIndexAction). Later, #55377 can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to #53100
2020-05-04 19:10:09 +02:00
Julie Tibshirani 6b5cf1b031 For constant_keyword, make sure exists query handles missing values. (#55757)
It's possible for a constant_keyword to have a 'null' value before any documents
are seen that contain a value for the field. In this case, no documents have a
value for the field, and 'exists' queries should return no documents.
2020-05-04 09:41:52 -07:00
Ross Wolf 6da686c7e0
EQL: Add match function implementation (#55182)
* EQL: Add Match function
* EQL: Add note about character classes
* EQL: QueryFolderFailTests.java
* EQL: Add match() fail tests
* EQL: Add match tests and fix alias
* EQL: Add match verifier failure tests
* EQL: Reorder query folder fail tests
2020-05-04 09:34:20 -06:00
Dimitris Athanasiou 76fa5a2397
[7.x][ML] Improve cleanup for DF Analytics HLRC tests (#56101) (#56109)
Adds the step of stopping all data frame analytics before
deleting them to the cleanup of the corresponding HLRC tests.

Closes #56097

Backport of #56101
2020-05-04 16:08:08 +03:00
Andrei Stefan 5d1bc6c89c
EQL: reject queries that use a nested field or a sub-field of a nested field (#56108)
* Reject queries that act on nested fields or fields with nested field types in their hierarchy (#55721)

(cherry picked from commit 2a024461cd9da821112953d4c6e565ea622c678b)
2020-05-04 15:50:31 +03:00
Przemysław Witek 44f5a8ccd3
Use snapshot's latest result time rather than snapshot's creation time when creating an annotation (#56093) (#56103) 2020-05-04 12:36:12 +02:00
Christos Soulios c65f828cb7
[7.x] Histogram field type support for ValueCount and Avg aggregations (#56099)
Backports #55933 to 7.x

Implements value_count and avg aggregations over Histogram fields as discussed in #53285

- value_count returns the sum of all elements in the counts arrays of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value by its associated element in the counts array
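
For a single histogram given as parallel `values` and `counts` arrays, the two rules above (and the Sum aggregation over histograms) reduce to the arithmetic below. This is a sketch of the formulas only, not the aggregator code; across several documents the per-histogram totals simply add up. Returning `NaN` for an empty avg is an assumption of the sketch.

```java
public class HistogramAggSketch {
    // value_count: the sum of the counts array.
    static long valueCount(long[] counts) {
        long total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }

    // sum: each value multiplied by its count, then added up.
    static double sum(double[] values, long[] counts) {
        double total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i] * counts[i];
        }
        return total;
    }

    // avg: the count-weighted sum divided by the total count.
    static double avg(double[] values, long[] counts) {
        long n = valueCount(counts);
        return n == 0 ? Double.NaN : sum(values, counts) / n;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 4.0};
        long[] counts = {3, 1, 2};
        System.out.println(valueCount(counts));  // 6
        System.out.println(sum(values, counts)); // 13.0
        System.out.println(avg(values, counts)); // 13 / 6
    }
}
```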
2020-05-04 13:23:02 +03:00
Armin Braun 0860d1dc74
Remove Dead Code in SLM Delete Handling (#56081) (#56098)
The delete response is always acknowledged. No need to handle anything else.
2020-05-04 12:22:06 +02:00
Armin Braun e01b999ef0
Add Functionality to Consistently Read RepositoryData For CS Updates (#55773) (#56091)
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
2020-05-04 08:13:14 +02:00
Armin Braun 3a64ecb6bf
Allow Deleting Multiple Snapshots at Once (#55474) (#56083)
* Allow Deleting Multiple Snapshots at Once (#55474)

Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
2020-05-03 20:30:58 +02:00
William Brafford d53c941c41
Make xpack.monitoring.enabled setting a no-op (#55617) (#56061)
* Make xpack.monitoring.enabled setting a no-op

This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved
removing the setting from the setup for integration tests. Monitoring may
introduce some complexity for test setup and teardown, so we should keep an eye
out for turbulence and failures.

* Docs for making deprecated setting a no-op
2020-05-01 16:42:11 -04:00
Andrei Stefan fbba65d8b3
SQL: SubSelect unresolved bugfix (#55956) (#56055)
* Resolve the missing refs only after the aggregate tree is resolved

(cherry picked from commit 10167b1cf2df6b074a1ba0c8e73c261ff9e9d1db)
2020-05-01 07:48:11 +03:00
Ryan Ernst 52b9d8d15e
Convert remaining license methods to isAllowed (#55908) (#55991)
This commit converts the remaining isXXXAllowed methods to use
isAllowed with a Feature value instead. There are a couple of other methods
that are static, as well as some licensed features that check the
license directly, but those will be dealt with in other follow-ups.
2020-04-30 15:52:22 -07:00
Igor Motov d8f9df771d
Expose agg usage in Feature Usage API (#55732) (#56048)
Counts aggregation usage and exposes the counts via the `_nodes/usage` API.

Closes #53746
2020-04-30 12:53:36 -04:00
Przemko Robakowski 797f63e743
[7.x] Emit deprecation warning if multiple v1 templates match with a new index (#55558) (#56038)
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)

* Emit deprecation warning if multiple v1 templates match with a new index

* DEPRECATION_LOGGER rename
2020-04-30 17:36:17 +02:00
Luca Cavanna fc6422ffcc Consolidate DelayableWriteable (#55932)
This commit includes a number of minor improvements around `DelayableWriteable`: javadocs were expanded and reworded, `get` was renamed to `expand` and `DelayableWriteable` no longer implements `Supplier`. Also a couple of methods are now private instead of package private.
2020-04-30 17:16:58 +02:00
Benjamin Trent c36bcb4dd0
[ML] fixing file structure finder multiline merge max for delimited formats (#56023) (#56035)
This commit correctly sets the maxLinesPerRow in the CsvPreference for delimited files given the file structure finder settings.

Previously, it was silently ignored.
2020-04-30 10:51:32 -04:00
Benjamin Trent 04b1f6498b
[ML] using new fixed interval in ml tests (#56021) (#56031)
This commit removes deprecated references to DateHistogram.interval from ml tests
2020-04-30 10:26:39 -04:00
Dimitris Athanasiou 17b904def5
[7.x][ML] Decouple DFA progress testing from analyses phases (#55925) (#56024)
This refactors native integ tests to assert progress without
expecting explicit phases for analyses. We can test those with
yaml tests in a single place.

Backport of #55925
2020-04-30 17:05:47 +03:00
William Brafford 273ff6a105
Make xpack.ilm.enabled setting a no-op (#55592) (#55980)
* Make xpack.ilm.enabled setting a no-op

* Add watcher setting to not use ILM

* Update documentation for no-op setting

* Remove NO_ILM ml index templates

* Remove unneeded setting from test setup

* Inline variable definitions for ML templates

* Use identical parameter names in templates

* New ILM/watcher setting falls back to old setting

* Add fallback unit test for watcher/ilm setting
2020-04-30 09:50:18 -04:00
David Kyle c204353249
[ML] Wait for model loaded and cached in ModelLoadingServiceTests (#56014)
Fixes the test by exposing the method ModelLoadingService::addModelLoadedListener()
so that the test class can be notified when a model is loaded, which happens in
a background thread.
2020-04-30 13:32:07 +01:00
Yang Wang 317d9fb88f
Remove synthetic role names of API keys as they confuse users (#56005) (#56011)
Synthetic role names of API keys add confusion to users. This happens to API responses as well as audit logs. The PR removes them for clarity.
2020-04-30 21:32:55 +10:00
Hendrik Muhs d3bcef2962
[7.x][Transform] implement throttling in indexer (#55011) (#56002)
Implement throttling in the async indexer used by rollup and transform. The added
docs_per_second parameter is used to calculate a delay before the next
search request is sent. With re-throttling it's possible to change the parameter
at runtime. When a running job is stopped, the indexer is guaranteed to stop in
reasonable time despite throttling. This change contains the groundwork, but
does not expose the new functionality.
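
One plausible way to turn docs_per_second into a delay is sketched below: process a batch, compute how long that batch should have taken at the target rate, and wait for the remainder. The formula, the class name, and treating a non-positive rate as "unthrottled" are assumptions, not the indexer's actual code.

```java
import java.time.Duration;

public class ThrottleDelaySketch {
    // Delay before the next search so the average rate stays at docsPerSecond.
    static Duration delayBeforeNextSearch(long docsProcessed, Duration elapsed, double docsPerSecond) {
        if (docsPerSecond <= 0 || docsProcessed <= 0) {
            return Duration.ZERO; // assumed: no throttling configured
        }
        long targetNanos = (long) (docsProcessed / docsPerSecond * 1_000_000_000L);
        long delayNanos = targetNanos - elapsed.toNanos();
        return delayNanos > 0 ? Duration.ofNanos(delayNanos) : Duration.ZERO;
    }

    public static void main(String[] args) {
        // 500 docs at 100 docs/s should take 5s; 2s already elapsed, so wait 3s.
        System.out.println(delayBeforeNextSearch(500, Duration.ofSeconds(2), 100)); // PT3S
    }
}
```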

relates #54862
backport: #55011
2020-04-30 11:20:35 +02:00
Ioannis Kakavas 3c7c9573b4
Fix PemKeyConfigTests (#55577) (#55996)
We were creating PemKeyConfig objects using different private
keys but always using the testnode.crt certificate, which uses an
RSA public key. The PemKeyConfig was built, but we would
then fail to handle SSL connections during the TLS
handshake either way.
This became obvious in FIPS tests, where the consistency
checks that FIPS 140 mandates kicked in and failed early
because the private key was of a different type than the
public key.
2020-04-30 12:05:27 +03:00
Yang Wang 84a2f1adf2
Resolve anonymous roles and deduplicate roles during authentication (#53453) (#55995)
Anonymous roles resolution and user role deduplication are now performed during authentication instead of authorization. The change ensures:

* If anonymous access is enabled, user will be able to see the anonymous roles added in the roles field in the /_security/_authenticate response.
* Any duplicate user roles are removed and will not show in the above authenticate response.
* In any other case, the response is unchanged.

It also introduces a behaviour change: anonymous role resolution now happens on the node that authenticates the request; previously it happened on the node that authorized it. Details can be found at #47195 (comment)
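
Merging anonymous roles and removing duplicates can be sketched with a `LinkedHashSet`, which drops repeats while keeping first-seen order. The method name and signature are hypothetical; this is not the security plugin's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RoleResolutionSketch {
    // Adds the anonymous roles when anonymous access is enabled, then deduplicates.
    static List<String> effectiveRoles(List<String> userRoles, List<String> anonymousRoles, boolean anonymousEnabled) {
        Set<String> roles = new LinkedHashSet<>(userRoles);
        if (anonymousEnabled) {
            roles.addAll(anonymousRoles);
        }
        return new ArrayList<>(roles);
    }

    public static void main(String[] args) {
        System.out.println(effectiveRoles(Arrays.asList("a", "b", "a"), Arrays.asList("anon", "b"), true)); // [a, b, anon]
    }
}
```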
2020-04-30 17:34:14 +10:00
Christos Soulios 43dab77186
[7.x] Modified searchAndReduce() to return empty agg when no docs exist (#55967)
Backports #55826 to 7.x

    Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted.

    Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase()

    Fixes #55824
2020-04-30 00:28:32 +03:00
jimczi 86ee8974d0 Revert "Mute failing tests in AsyncSearchActionIT"
This reverts commit 2fe4801ca1.
2020-04-29 22:22:21 +02:00
Mark Vieira 2fe4801ca1
Mute failing tests in AsyncSearchActionIT 2020-04-29 10:59:10 -07:00
Dimitris Athanasiou c5aa281171
[7.x][ML] Remove error on parsing progress for unknown phase in DFA (#55926) (#55954)
On second thought, this check does not seem to be adding value.
We can test that the phases are as we expect them for each analysis
by adding yaml tests. Those would fail if we introduce new phases
from c++ accidentally or without coordination. This would achieve
the same thing. At the same time we would not have to comment out
this code each time a new phase is introduced. Instead we can just
temporarily mute those yaml tests. Note I will add those tests
right after the imminent new phases are added to the c++ side.

Backport of #55926
2020-04-29 20:11:33 +03:00
Benjamin Trent edd049f9cd
[ML] Allow a certain number of ill-formatted rows when delimited format is specified (#55735) (#55944)
While it is good to not be lenient when attempting to guess the file format, it is frustrating to users when they KNOW it is CSV but there are a few ill-formatted rows in the file (via some entry error, etc.).

This commit allows up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format.

This "allowed bad rows" percentage is only applied when the user has specified delimited formatting options, as the structure finder needs some guidance on what a "bad row" actually means.
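
The tolerance rule can be sketched as below. Defining a "bad" row as one whose field count differs from the first row's is an assumption for illustration; the structure finder's own definition may be richer. The check uses integer arithmetic (`badRows * 10 <= rows`) to avoid floating-point edge cases.

```java
import java.util.List;

public class BadRowToleranceSketch {
    // Accepts the delimited format if bad rows are within tolerance:
    // up to 10% when the user specified delimited options, otherwise none.
    static boolean acceptsDelimitedFormat(List<String[]> rows, boolean userSpecifiedDelimited) {
        if (rows.isEmpty()) {
            return false;
        }
        int expectedFields = rows.get(0).length;
        int badRows = 0;
        for (String[] row : rows) {
            if (row.length != expectedFields) {
                badRows++;
            }
        }
        return userSpecifiedDelimited ? badRows * 10 <= rows.size() : badRows == 0;
    }
}
```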

related to https://github.com/elastic/elasticsearch/issues/38890
2020-04-29 11:15:21 -04:00
Jim Ferenczi 293c81dd59 Fix AsyncSearchActionIT#testTermsAggregation (#55924)
This commit fixes the initialization of total hits
in the async search response.

Relates #55683
Closes #55920
2020-04-29 15:44:10 +02:00
Jake Landis ae4d980c8c
[7.x] json spec - add description for autoscaling (#55748) (#55901) 2020-04-29 08:40:11 -05:00
Andrei Dan 6a0e1e161b
ILM stop step execution if writeIndex is false (#54805) (#55923)
(cherry picked from commit 47a9fd760f7bf2cc6cd778485dc057b6aaf07709)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-29 13:39:37 +01:00
Christos Soulios 02bf0c586a
[7.x] Histogram field type support for Sum aggregation (#55916)
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by its count, as requested in #53285

Backports #55681 to 7.x
2020-04-29 15:06:12 +03:00
David Roberts 6ad497bfda Muting AsyncSearchActionIT.testTermsAggregation
Due to https://github.com/elastic/elasticsearch/issues/55920
2020-04-29 12:34:47 +01:00
Dimitris Athanasiou d9685a0f19
[7.x][ML] Validate at least one feature is available for DF analytics (#55876) (#55914)
We were previously checking that at least one supported field existed
when the _explain API was called. However, for analyses
with required fields (e.g. regression), we were not accounting for the fact that
the dependent variable is not a feature; thus, if the source index
only contains the dependent variable field, there are no features to
train a model on.

This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
`ExtractedFieldsDetector` and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.

For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.
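
The new start-time validation can be sketched as follows: features are the supported fields minus the required (dependent-variable) fields, and starting fails if none remain. The class, methods, and error message are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class FeatureValidationSketch {
    // Supported fields that are not required fields are features.
    static List<String> features(List<String> supportedFields, Set<String> requiredFields) {
        List<String> features = new ArrayList<>();
        for (String field : supportedFields) {
            if (!requiredFields.contains(field)) {
                features.add(field);
            }
        }
        return features;
    }

    // Run on _start (not _explain), so _explain can still show why fields were excluded.
    static void validateOnStart(List<String> supportedFields, Set<String> requiredFields) {
        if (features(supportedFields, requiredFields).isEmpty()) {
            throw new IllegalArgumentException("no features available; required fields " + requiredFields + " are not features");
        }
    }

    public static void main(String[] args) {
        validateOnStart(Arrays.asList("price", "size"), Collections.singleton("price")); // passes
    }
}
```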

Closes #55593

Backport of #55876
2020-04-29 11:39:58 +03:00
David Roberts 61ac09ae21
[ML] Add daily_model_snapshot_retention_after_days to job config (#55891)
This change adds a new setting, daily_model_snapshot_retention_after_days,
to the anomaly detection job config.

Initially this has no effect; the effect will be added in a follow-up PR.
This PR gets the complexities of making changes that interact with BWC
out of the way well before feature freeze.

Backport of #55878
2020-04-29 09:12:53 +01:00
Nik Everett a5d0409a8f
Save memory on aggs in async search (#55683) (#55879)
This replaces a reference to the result of partially reducing
aggregations that async search keeps with a reference to the serialized
form of the result of the partial reduction which we need to keep
anyway.
2020-04-28 16:23:30 -04:00
Larry Gregory 47d252424b
Backport: Deprecate the kibana reserved user (#54967) (#55822) 2020-04-28 10:30:25 -04:00
Christos Soulios fae9ec13dd
Removed ValuesSourceRegistry.registerAny() (#55846)
* Backports #55747 to 7.x
* All ValuesSourceTypes must be registered explicitly
* Removed lambdas in ValuesSourceRegistry
2020-04-28 15:44:42 +03:00
Adrien Grand 58c3bb5ae1
Repurpose `ignore_throttled` to be only about frozen indices. (#55047) (#55852)
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.

Filtering out throttled indices made sense a couple of years ago, but as
we're now improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap that we'd like to avoid with upcoming features that
would use search throttling.

Relates #54058
2020-04-28 14:31:54 +02:00
David Turner 3f2d10d8fc Permit searches to be concurrent to prewarming (#55795)
Today when prewarming a searchable snapshot we use the `SparseFileTracker` to
lock each (part of a) snapshotted blob, blocking any other readers from
accessing this data until the whole part is available.

This commit changes this strategy: instead we optimistically start to download
the blob without any locking, and then lock much smaller ranges after each
individual `read()` call. This may mean that some bytes are downloaded twice,
but reduces the time that other readers may need to wait before the data they
need is available.

As a best-effort optimisation we try to request the smallest possible single
range of missing bytes in the part by first checking how many of the initial
and terminal bytes of the part are already present in cache. In particular if
the part is already fully cached before prewarming then this check means we
skip the part entirely.
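
The best-effort range computation can be sketched as below, with a `boolean[]` per byte standing in for the cache state (the real code tracks byte ranges with `SparseFileTracker`). The method name is hypothetical. It skips the cached prefix and suffix and returns the single range between them, or `null` when the part is fully cached and can be skipped.

```java
public class MissingRangeSketch {
    // Returns {start, end} (end exclusive) of the smallest single range covering
    // every missing byte, or null if the whole part is already cached.
    static int[] rangeToFetch(boolean[] cached) {
        int start = 0;
        while (start < cached.length && cached[start]) {
            start++; // skip the cached prefix
        }
        if (start == cached.length) {
            return null; // fully cached: skip this part
        }
        int end = cached.length;
        while (end > start && cached[end - 1]) {
            end--; // skip the cached suffix
        }
        return new int[] {start, end};
    }
}
```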
2020-04-28 10:44:05 +01:00
Tim Brooks 80662f31a1
Introduce mechanism to stub request handling (#55832)
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
2020-04-27 16:57:15 -06:00
Tal Levy 6ba5148ead
Add geo_shape support for the geo_centroid aggregation (#55602) (#55819)
This commit leverages the new geo_shape doc values
to register a new geo_centroid aggregator that works
on geo_shape fields.
2020-04-27 12:16:10 -07:00
Ioannis Kakavas ca5d677130
Mute-55816 (#55818)
See #55816
2020-04-27 21:26:02 +03:00
Hendrik Muhs 4b93f17b24 [Transform] improve TransformRestTestCase robustness (#55786)
handles/retries temporary SearchPhaseExecutionErrors

fixes #54810
2020-04-27 17:17:53 +02:00
Jake Landis 6f392cf5b9
[7.x] json spec - add description for searchable snapshots (#55746) (#55809) 2020-04-27 10:08:09 -05:00
Mark Tozzi 22a98ec279
Aggregation support for Value Scripts that change types (#54830) (#55752) 2020-04-27 09:57:05 -04:00
Dimitris Athanasiou abab4c4d4f
[7.x][ML] Do not fail DFA task when it's stopped whilst reindexing (#55797) (#55800)
Adding to #55659, we missed another way we could set the task to
failed due to task cancellation. CI revealed that we might also
get a `SearchPhaseExecutionException` whose cause is a
`TaskCancelledException`. That exception is not wrapped so
unwrapping it will not return the underlying `TaskCancelledException`.
Thus to be complete in catching this, we also need to check the
error's cause.
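This fix and the earlier one in #55659 both come down to recognising a cancellation that may or may not be wrapped. A minimal sketch, assuming the check walks the whole cause chain (the actual commits may unwrap differently): `CancellationCheck` is hypothetical, and the local `TaskCancelledException` only stands in for the Elasticsearch class so the snippet compiles on its own.

```java
// Stand-in for Elasticsearch's TaskCancelledException, so the sketch is self-contained.
class TaskCancelledException extends RuntimeException {
    TaskCancelledException(String message) {
        super(message);
    }
}

final class CancellationCheck {
    private static final int MAX_DEPTH = 100; // guards against (rare) cyclic cause chains

    /** True if the error itself, or any exception in its cause chain, is a cancellation. */
    static boolean isCausedByCancellation(Throwable error) {
        Throwable t = error;
        for (int depth = 0; t != null && depth < MAX_DEPTH; depth++) {
            if (t instanceof TaskCancelledException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```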

Closes #55068

Backport of #55797
2020-04-27 16:03:57 +03:00
Dimitris Athanasiou 7f100c1196
[7.x][ML] Allow analytics process define its own progress phases (#55763) (#55791)
This is a continuation from #55580.

Now that we're parsing phase progresses from the analytics process
we change `ProgressTracker` to allow for custom phases between
the `loading_data` and `writing_results` phases. Each `DataFrameAnalysis`
may declare its own phases.

This commit sets things in place for the analytics process to start
reporting different phases per analysis type. However, this is
still preserving existing behaviour as all analyses currently
declare a single `analyzing` phase.
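The phase layout described above can be pictured as an ordered map with the analysis-specific phases slotted between the fixed ones. This is a hypothetical sketch, not the real `ProgressTracker`: `PhaseProgressTracker` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker: analysis-declared phases sit between loading_data and writing_results.
final class PhaseProgressTracker {
    private final Map<String, Integer> percentByPhase = new LinkedHashMap<>(); // keeps phase order

    PhaseProgressTracker(List<String> analysisPhases) {
        percentByPhase.put("loading_data", 0);
        for (String phase : analysisPhases) {
            percentByPhase.put(phase, 0);
        }
        percentByPhase.put("writing_results", 0);
    }

    void updatePhase(String phase, int percent) {
        if (percentByPhase.containsKey(phase) == false) {
            throw new IllegalArgumentException("unknown phase [" + phase + "]");
        }
        percentByPhase.put(phase, percent);
    }

    List<String> phases() {
        return new ArrayList<>(percentByPhase.keySet());
    }

    int progress(String phase) {
        return percentByPhase.getOrDefault(phase, 0);
    }
}
```

Passing a single `analyzing` phase reproduces the existing behaviour the commit preserves.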

Backport of #55763
2020-04-27 13:30:05 +03:00
Ioannis Kakavas d56f25acb4
Validate hashing algorithm in users tool (#55628) (#55734)
This change adds validation when running the users tool so that
if Elasticsearch is expected to run in a JVM that is configured to
be in FIPS 140 mode and the password hashing algorithm is not
compliant, we throw an error.
Users tool uses the configuration from the node and this validation
would also happen upon node startup but users might be added in the
file realm before the node is started and we would have the
opportunity to notify the user of this misconfiguration.
The changes in #55544 make this much less likely to happen in 8
since the default algorithm will be compliant, but this change can
act as a fallback in any case and makes for a better user experience.
2020-04-27 12:23:41 +03:00
Ioannis Kakavas 38b55f06ba
Fix concurrent refresh of tokens (#55114) (#55733)
Our handling for concurrent refresh of access tokens suffered from
a race condition where:

1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.

This commit changes the behavior so that Thread B first tries
to do a Get request for the token document where it expects that
the access token it decrypted is stored (with exponential backoff)
and does not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately.

It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.
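The "Get with exponential backoff" step can be pictured with a small blocking helper. This is a sketch under stated assumptions: `BackoffGet` and `getWithBackoff` are hypothetical names, the `Supplier` stands in for the Get request on the tokens index, and real server code would more likely retry asynchronously than sleep on a thread.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper illustrating "do not respond until the document is visible".
final class BackoffGet {

    /**
     * Calls {@code get} until it returns a value, sleeping initialDelayMillis, then twice that,
     * and so on between attempts. Returns empty if the value never appears.
     */
    static <T> Optional<T> getWithBackoff(Supplier<Optional<T>> get, long initialDelayMillis, int maxAttempts) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = get.get();
            if (result.isPresent()) {
                return result; // document is readable: safe to return the tokens
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                delay *= 2; // exponential backoff
            }
        }
        return Optional.empty();
    }
}
```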

Resolves: #54289
2020-04-27 12:23:17 +03:00
David Roberts 3ba44a5af8
[ML] Adding failed_category_count to model_size_stats (#55761)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Backport of #55716
2020-04-25 10:36:49 +01:00
Aleksandr Maus ad54cca823
EQL: implement math functions: add, divide, module, multiply, subtract (#55137) (#55737)
* EQL: implement math functions: add, divide, module, multiply, subtract
2020-04-24 15:52:27 -04:00
James Rodewig c1b0548db0
[DOCS] Document EQL search REST API (#52384) 2020-04-24 15:36:01 -04:00
Nick Knize b0e8a8a4d1
[Backport] Refactor Spatial Field Mappers (#55696)
This commit refactors all spatial Field Mappers to a common
AbstractGeometryFieldMapper that implements shared parameter functionality
(e.g., ignore_malformed, ignore_z_value) and provides a common framework for
overriding type parsing, and building in xpack. Common shape functionality is
implemented in a new AbstractShapeGeometryFieldMapper that is reused and
overridden in GeoShapeFieldMapper, GeoShapeFieldMapperWithDocValues,
LegacyGeoShapeFieldMapper, and ShapeFieldMapper. This abstraction provides a
reusable foundation for adding new xpack features, such as coordinate reference
system support.
2020-04-24 14:05:16 -05:00
Mark Tozzi 87b4979c24
[7.x] Make ValuesSourceRegistry immutable after initialization #55493 (#55697) 2020-04-24 13:33:38 -04:00
Jason Tedor 22a8b60187
Reduce code duplication in CCR non-compliance tests
This commit removes some code duplication in the CCR non-compliance
tests by refactoring an assertion method so that it can be used in both
tests that are present there.
2020-04-24 13:24:56 -04:00
Tanguy Leroux 41ddbd4188 Allow to prewarm the cache for searchable snapshot shards (#55322)
Relates #50999
2020-04-24 18:03:34 +02:00
Dimitris Athanasiou 210b7f1b76
[7.x][ML] Remove parsing of old progress format in DF Analytics (#55711) (#55720)
Since #55580 we've introduced a new format for parsing progress
from the data frame analytics process. As the process is now
writing out progress in this new way, we can remove the parsing
of the old format.

Backport of #55711
2020-04-24 16:50:56 +03:00
David Turner aa9a2bce37 Avoid accidental contiguous read (#55713)
If we choose to read from two random positions that are 1024 bytes apart then
this counts as a contiguous read for stats purposes, failing this test. This
commit ensures that we always perform a non-contiguous read.
2020-04-24 11:50:31 +01:00
David Turner de30550aea Relax elapsed time stats assertion (#55710)
`SearchableSnapshotDirectoryStatsTests#testCachedBytesReadsAndWrites` asserts
that each write takes one clock tick, but we now permit concurrent reads and
writes so each write might take longer. This commit relaxes the assertion to
match.

Closes #55707
2020-04-24 10:21:08 +01:00
Przemysław Witek c89917c799
Register DFA jobs on putAnalytics rather than via a separate method (#55458) (#55708) 2020-04-24 10:59:32 +02:00
Dimitris Athanasiou b8379872a7
[7.x][ML] Logs error when DFA task is set to failed (#55545) (#55668)
Also unmutes the integ test that stops and restarts
an outlier detection job, in the hope of learning more
about the failure in #55068.

Backport of #55545

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-24 11:06:07 +03:00
Jim Ferenczi 0a6c74b7d3 AsyncSearchMaintenanceService should stop when closing a node (#55651)
This change turns the AsyncSearchMaintenanceService into an
AbstractLifecycleComponent and ensures that the service is
stopped when a node is closing.

Closes #55646
2020-04-24 09:38:40 +02:00
Hendrik Muhs b213209f0c [Rollup] improve stopping tests (#55666)
improve tests related to stopping using a client that answers and can be
synchronized with the test thread in order to test special situations

relates #55011
2020-04-24 08:48:36 +02:00
Jay Modi 30f8c326fe
Test: fix SSLReloadDuringStartupIntegTests (#55637)
This commit fixes reproducible test failures with the
SSLReloadDuringStartupIntegTests on the 7.x branch. The failures only
occur on 7.x due to the existence of the transport client and its usage
in our test infrastructure. This change removes the randomized usage of
transport clients when retrieving a client from a node in the internal
cluster. Transport clients do not support the reloading of files for
TLS configuration changes, so if we build one from the node's settings
and attempt to use it after the files have been changed, the client
will not know about the changes and the TLS connection will fail.

Closes #55524
2020-04-23 21:36:43 -06:00
Ryan Ernst 97c4b64fb1
Add isAllowed license utility (#55424) (#55700)
License state is currently made up of boolean methods that check whether
a particular feature is allowed by the current license state. Each new
feature must copy/paste boilerplate code. While that has gotten easier
with utilities like isAllowedByLicense, this is still more cumbersome
than should be necessary. This commit adds a general purpose isAllowed
method which takes a new Feature enum, where each value of the enum
defines the minimum license mode and whether the license must be active
to be allowed. Only security features are converted in this PR, in order
to keep the commit size relatively small. The rest of the features will
be converted in a followup.
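The general-purpose check described above might look roughly like this. Everything here is hypothetical: the license modes, feature names, and `LicenseStateSketch` are illustrative, not the actual `XPackLicenseState` or its `Feature` enum.

```java
// Hypothetical license modes, ordered from least to most permissive.
enum LicenseMode { BASIC, GOLD, PLATINUM }

// Hypothetical features: each declares its minimum mode and whether the license must be active.
enum Feature {
    AUDITING(LicenseMode.GOLD, true),
    DOCUMENT_LEVEL_SECURITY(LicenseMode.PLATINUM, true),
    BASIC_STATS(LicenseMode.BASIC, false);

    final LicenseMode minimumMode;
    final boolean needsActive;

    Feature(LicenseMode minimumMode, boolean needsActive) {
        this.minimumMode = minimumMode;
        this.needsActive = needsActive;
    }
}

final class LicenseStateSketch {
    private final LicenseMode mode;
    private final boolean active;

    LicenseStateSketch(LicenseMode mode, boolean active) {
        this.mode = mode;
        this.active = active;
    }

    // One generic check replaces a hand-written boolean method per feature.
    boolean isAllowed(Feature feature) {
        if (feature.needsActive && active == false) {
            return false;
        }
        return mode.compareTo(feature.minimumMode) >= 0;
    }
}
```

Adding a feature then means adding one enum constant rather than another method.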
2020-04-23 16:28:28 -07:00
Zachary Tong 715c90bf7d Aggs must specify a `field` or `script` (or both) (#52226)
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user.  This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
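Failing early amounts to a parse-time check. A hypothetical sketch (`ValuesSourceValidation` and its error message are illustrative, not the `VSParserHelper` code):

```java
// Hypothetical parse-time check: reject the request before any aggregation work starts.
final class ValuesSourceValidation {

    static void requireFieldOrScript(String aggName, String field, Object script) {
        if (field == null && script == null) {
            throw new IllegalArgumentException(
                "aggregation [" + aggName + "] requires a [field] or a [script] (or both)");
        }
    }
}
```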
2020-04-23 19:23:41 -04:00
jimczi c857adf603 Fix AsyncSearchTaskTests#testWithFetchFailures
Fix usage of a possible invalid random range [1, 0].

Relates #55688
2020-04-24 00:45:17 +02:00
Jim Ferenczi 31d1727698 Fix (de)serialization of async search failures (#55688)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
2020-04-24 00:44:43 +02:00
Igor Motov 8c7ef2417f
Make AsyncSearchIndexService reusable (#55598)
EQL will require very similar functionality to async search. This PR refactors
AsyncSearchIndexService to make it reusable for EQL.

Supersedes #55119
Relates to #49638
2020-04-23 18:02:17 -04:00
Nick Knize 96a02089c2
Refactor GeoShape DocValues in spatial xpack (#55691)
This commit refactors geo_shape doc values, fielddata, and utility classes from
the single mapper package in x-pack spatial plugin to a package structure that
is consistent with the server module.
2020-04-23 15:32:23 -05:00
David Roberts 46be9959a0
[ML] Audit when unassigned datafeeds are stopped (#55667)
Previously audit messages were indexed when datafeeds that were
assigned to a node were stopped, but not datafeeds that were
unassigned at the time they were stopped.

This change adds auditing for the unassigned case.

Backport of #55656
2020-04-23 20:46:35 +01:00
Dan Hermann dd5c96c2ed
[7.x] Rollover for data streams 2020-04-23 12:04:34 -05:00
Zachary Tong 4f483ac370 Fix half-float range in SupportedTypeTests (#55409)
Also adds a comment to the half-float number field type tests indicating
why 70000 is used instead of 65504
2020-04-23 11:36:37 -04:00
Dimitris Athanasiou 4b11adf074
[7.x][ML] Do not fail DFA task that is stopped during reindexing (#55659) (#55663)
While we were catching `TaskCancelledException` while we wait for
reindexing to complete, we missed the fact that this exception
may be wrapped in a multi-node cluster. This is the reason
we may still fail the task when stop is called while reindexing.

Sometimes we're lucky and the exception is thrown by the same
node that runs the job. Then the exception is not wrapped and
things work fine. But when that is not the case, the exception is
wrapped, we fail to catch it, and we set the task to failed.

The fix is to simply unwrap the exception when we check if it
is a `TaskCancelledException`.

Closes #55068

Backport of #55659
2020-04-23 15:57:01 +03:00
Tanguy Leroux 8669766a81 Reduce contention in CacheFile.fileLock() method (#55662)
The CacheFile.fileLock() method is used to acquire a lock
on a cache file so that the file can't be deleted (or its file
handle closed) during the execution of a read or a write
operation.

Today this lock is obtained by first acquiring the eviction
lock (the write lock of the readwrite lock), then by checking
if the cache file is evicted and the file channel still open,
and finally by obtaining the file lock (the read lock of the
readwrite lock). Acquiring the read lock while the eviction
lock is held ensures that the cache file eviction cannot
start in the meantime. However, because the eviction lock is
exclusive, every read or write operation must briefly take it
in turn, which causes contention.

But eviction start (and termination) also acquires the eviction
lock, and this lock cannot be obtained while a read lock is held
(the write lock of a readwrite lock is exclusive). So if we
instead acquire the read lock first and then check the eviction
flag and file channel existence while holding it, we know that
no eviction can start or finish until the read lock is released.
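The "read lock first, then check" ordering can be sketched with a plain `ReentrantReadWriteLock`. This is a hypothetical stand-in, not the real `CacheFile`: the class, its flags, and `evict()` are illustrative.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cache file; names are illustrative.
final class CacheFileSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final Lock evictionLock = rwLock.writeLock(); // exclusive
    private final Lock readLock = rwLock.readLock();      // shared by concurrent reads/writes
    private volatile boolean evicted = false;
    private volatile boolean channelOpen = true;

    /** Takes the shared lock first, then checks state while holding it. */
    Lock fileLock() {
        readLock.lock();
        boolean success = false;
        try {
            // Eviction needs the exclusive lock, so it cannot start or finish while we hold this one.
            if (evicted || channelOpen == false) {
                throw new IllegalStateException("cache file is evicted");
            }
            success = true;
            return readLock;
        } finally {
            if (success == false) {
                readLock.unlock(); // never leak the lock on failure
            }
        }
    }

    void evict() {
        evictionLock.lock();
        try {
            evicted = true;
            channelOpen = false;
        } finally {
            evictionLock.unlock();
        }
    }

    int heldReadLocks() {
        return rwLock.getReadLockCount();
    }
}
```

Concurrent callers of `fileLock()` now only share the read lock, so they no longer queue behind one another.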
2020-04-23 14:40:27 +02:00
Rory Hunter d66af46724
Always use deprecateAndMaybeLog for deprecation warnings (#55319)
Backport of #55115.

Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
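Supplying a key means deduplication does not have to depend on the rendered message text. A minimal sketch, assuming a message is emitted only the first time its key is seen. This is not Elasticsearch's `DeprecationLogger`; `KeyedDeprecationLogger` and `logOnce` are hypothetical, and the sketch uses `String.format` placeholders rather than Elasticsearch's `{}` style.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical logger: deduplicates by caller-supplied key, not by message text.
final class KeyedDeprecationLogger {
    private final Set<String> seenKeys = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sink;

    KeyedDeprecationLogger(Consumer<String> sink) {
        this.sink = sink;
    }

    void logOnce(String key, String format, Object... args) {
        if (seenKeys.add(key)) { // add() returns false if the key was already present
            sink.accept(String.format(Locale.ROOT, format, args));
        }
    }
}
```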
2020-04-23 09:20:54 +01:00
David Roberts 87f4751eca [ML] Make find_file_structure recognize Kibana CSV report timestamps (#55609)
The Kibana CSV export feature uses a non-standard timestamp format.
This change adds it to the formats the find_file_structure endpoint
recognizes out-of-the-box, to make round-tripping data from Kibana
back to Kibana via CSV files easier.

Fixes #55586
2020-04-23 08:39:07 +01:00
Jake Landis 25ea6a74f0
[7.x] Validate REST specs against schema (#55117) (#55563)
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.

The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.

An ignore parameter has been introduced for the task to allow specific
files to be excluded from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.

Closes #54314
2020-04-22 14:14:03 -05:00
Albert Zaharovits 82ed0ab420
Update the audit logfile list of system users (#55578)
Out of the box "access granted" audit events are not logged
for system users. The list of system users was stale and included
only the _system and _xpack users. This commit expands this list
with _xpack_security and _async_search, effectively reducing the
auditing noise by not logging the audit events of these system
users out of the box.

Closes #37924
2020-04-22 21:59:31 +03:00
Tal Levy c370b83bd7
Fix locale lowercase test issue in GenerateSnapshotNameStepTests (#55597) (#55605)
The testPerformAction test has been failing periodically because
Hamcrest's containsStringIgnoringCase does not lowercase using
the same Locale set in the test infrastructure.

This commit falls back to explicitly lowercasing using the root
locale
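The underlying pitfall can be reproduced with the JDK alone: `String.toLowerCase()` without an argument uses the default locale, while passing `Locale.ROOT` makes the result locale-independent. `LocaleSafeLowercase` is a hypothetical helper name.

```java
import java.util.Locale;

final class LocaleSafeLowercase {
    // Locale.ROOT yields the same result whatever the JVM's default locale is.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

For example, `"TITLE".toLowerCase(Locale.forLanguageTag("tr"))` yields `tıtle` (with a dotless ı), so a case-insensitive match against `title` fails under a randomized Turkish default locale, whereas `LocaleSafeLowercase.lower("TITLE")` always returns `title`.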
2020-04-22 11:29:57 -07:00
Tal Levy f27ce69f0c
[backport] Add geo_bounds aggregation support for geo_shape (#55328) (#55600)
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
2020-04-22 11:29:35 -07:00
Tal Levy 0844455505
Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037) (#55500)
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.

This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
2020-04-22 08:12:54 -07:00
Dimitris Athanasiou 50a5afed15
[7.x][ML] Prepare parsing phase_progress from DFA process (#55580) (#55587)
Data frame analytics process currently reports progress as
an integer `progress_percent`. We parse that and report it
from the _stats API as the progress of the `analyzing` phase.
However, we want to allow the DFA process to report progress
for more than one phase. This commit prepares for this by
parsing `phase_progress` from the process, an object that
contains the `phase` name plus the `progress_percent` for that
phase.

Backport of #55580
2020-04-22 16:38:32 +03:00
Benjamin Trent 7c81cd7833
[ML] explicitly disallow partial results in datafeed extractors (#55537) (#55585)
Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`.

- Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set.
- Code error handling is simplified

closes https://github.com/elastic/elasticsearch/issues/40793
2020-04-22 09:07:44 -04:00
David Roberts 810caf5ffe
[ML] Test that audit message is written when closing unassigned job (#55582)
Issue #55521 suggested that audit messages were not written when
closing an unassigned job.  This is not the case, but we didn't
have a test to prove it.

Backport of #55571
2020-04-22 13:23:43 +01:00
David Roberts 2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
David Roberts da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
David Kyle e99ef3542c Mute ModelLoadingServiceTests::testMaxCachedLimitReached 2020-04-22 11:53:07 +01:00
Tim Vernum 8b566aea47
Fix use of password protected PKCS#8 keys for SSL (#55567)
PEMUtils would incorrectly fill the encryption password with zeros
(the '\0' character) after decrypting a PKCS#8 key.

Since PEMUtils did not take ownership of this password it should not
zero it out because it does not know whether the caller will use that
password array again. This is actually what PEMKeyConfig does - it
uses the key encryption password as the password for the ephemeral
keystore that it creates in order to build a KeyManager.
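The ownership rule is "only wipe buffers you created". The commit itself stops zeroing the caller's array. If a method still wants to clear sensitive data, one option is to work on a private copy, as in this hypothetical sketch (`PasswordOwnership`/`decryptKey` are illustrative, and the decryption is stubbed out).

```java
import java.util.Arrays;

// Hypothetical sketch: wipe only the buffer this method owns, never the caller's array.
final class PasswordOwnership {

    static int decryptKey(char[] callerPassword) {
        char[] working = callerPassword.clone(); // private copy owned by this method
        try {
            return working.length; // stand-in for deriving the key and decrypting the PKCS#8 data
        } finally {
            Arrays.fill(working, '\0'); // safe: nobody else references this copy
        }
    }
}
```

The caller, for example a key config that reuses the same password for an ephemeral keystore, still sees its original characters afterwards.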

Backport of: #55457
2020-04-22 16:38:51 +10:00
Yang Wang 32e46bf552
Fix certutil http for empty password with JDK 11 and lower (#55437) (#55565)
Fix the elasticsearch-certutil http command so that it correctly accepts an empty keystore password with JDK version 11 and lower.
2020-04-22 15:03:10 +10:00
David Kyle 8e8c6b4aee
Fix accounting in ModelLoadingServiceTests (#55307) (#55547)
In the test, after the first load event it is not known which models are cached, as
loading a later one will evict an earlier one and the order is not known.
Each model could have been loaded once or twice, not necessarily exactly twice.
2020-04-21 19:25:06 +01:00
Armin Braun db7eb8e8ff
Remove Redundant CS Update on Snapshot Finalization (#55276) (#55528)
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
2020-04-21 15:33:17 +02:00
David Turner be60d50452 Allow searching of snapshot taken while indexing (#55511)
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.

This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.

Relates #50999
2020-04-21 13:21:38 +01:00
Ignacio Vera e4c65b4388
mute test SSLReloadDuringStartupIntegTests.testReloadDuringStartup (#55525) 2020-04-21 14:13:13 +02:00
Jim Ferenczi 0b3bdfcc3e Fix expiration time in async search response (#55435)
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
2020-04-21 14:04:29 +02:00
Przemysław Witek 59d377462f
Apply default timeout in StopDataFrameAnalyticsAction.Request (#55512) (#55517) 2020-04-21 13:05:48 +02:00
Nhat Nguyen 3cc4e0dd09 Retry follow task when remote connection queue full (#55314)
If more than 100 shard-follow tasks are trying to connect to the remote 
cluster, then some of them will abort with "connect listener queue is 
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
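Widening the retry condition to the JDK base class also covers its subclasses. A hypothetical sketch, assuming Elasticsearch's own rejected-execution exception extends `java.util.concurrent.RejectedExecutionException`; `FollowRetryPolicy` is an illustrative name.

```java
import java.util.concurrent.RejectedExecutionException;

// Hypothetical retry predicate for shard-follow tasks.
final class FollowRetryPolicy {

    static boolean shouldRetry(Throwable e) {
        // Matching the JDK base class also matches every subclass of it,
        // including a plain "connect listener queue is full" rejection.
        return e instanceof RejectedExecutionException;
    }
}
```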
2020-04-20 22:43:05 -04:00
Stuart Tettemer 93a2e9b0f9
Test: MockScoreScript can be cacheable. (#55499)
Backport: 0ed1eb5
2020-04-20 17:09:58 -06:00
Benjamin Trent cabff65aec
[ML] Fixing inference stats race condition (#55163) (#55486)
`updateAndGet` could actually call the internal method more than once on contention.
The JavaDocs say:
`@param updateFunction a side-effect-free function`
So, it could be applying multiple updates on contention, thus having a race condition where stats are double counted.

To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allow fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.

When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.

This should prevent race conditions while allowing high (ish) throughput in the highly contended paths in inference.

I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).
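The locking scheme described above can be sketched as follows. Recorders share the read lock (the `LongAdder`s absorb their contention), and persistence takes the write lock so a reset can never interleave with an increment. `StatsAccumulator` and its counters are hypothetical names, not the actual inference stats classes.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stats holder illustrating LongAdder + ReadWriteLock.
final class StatsAccumulator {
    private final LongAdder inferenceCount = new LongAdder();
    private final LongAdder failureCount = new LongAdder();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Hot path: many threads may hold the read lock at once.
    void recordInference(boolean failed) {
        lock.readLock().lock();
        try {
            inferenceCount.increment();
            if (failed) {
                failureCount.increment();
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // Persistence path: the write lock excludes all recorders while the adders are read and reset.
    long[] snapshotAndReset() {
        lock.writeLock().lock();
        try {
            return new long[] {inferenceCount.sumThenReset(), failureCount.sumThenReset()};
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```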

closes  https://github.com/elastic/elasticsearch/issues/54786
2020-04-20 16:21:18 -04:00
Benjamin Trent 24d41eb695
[ML] partitions model definitions into chunks (#55260) (#55484)
This lays the groundwork in the data layer so that exceptionally large models are partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With a chunk size of `16 * 1024 * 1024`, its compressed string could be partitioned into 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
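The partitioning can be sketched as splitting the compressed definition string into fixed-size pieces, with the document limit enforced up front. This is a hypothetical illustration using the numbers above, not the actual persistence code; `DefinitionChunker` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper using the chunk size and document limit from the commit message.
final class DefinitionChunker {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // characters per definition document
    static final int MAX_DOCS = 100;                // maximum definition documents per model

    static List<String> chunk(String compressed, int chunkSize, int maxDocs) {
        long length = compressed.length();
        // Fail fast; long arithmetic avoids int overflow for very large strings.
        long needed = (length + chunkSize - 1) / chunkSize;
        if (needed > maxDocs) {
            throw new IllegalArgumentException(
                "definition needs " + needed + " documents but the limit is " + maxDocs);
        }
        List<String> chunks = new ArrayList<>((int) needed);
        for (long start = 0; start < length; start += chunkSize) {
            int end = (int) Math.min(length, start + chunkSize);
            chunks.add(compressed.substring((int) start, end));
        }
        return chunks;
    }
}
```

With `CHUNK_SIZE` and `MAX_DOCS`, the largest accepted compressed definition is 100 * 16 MiB = 1,600 MiB.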
2020-04-20 16:08:54 -04:00
Benjamin Trent fa0373a19f
[7.x] [ML] Fix log spam and disable ILM/SLM history for native ML tests (#55475)
* [ML] fix native ML test log spam (#55459)

This adds a dependency on ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.

* removing unnecessary changes

* removing unused imports

* removing unnecessary java setting
2020-04-20 15:41:30 -04:00
Lee Hinman 9eddd2bcc9
[7.x] Add prefer_v2_templates flag and index setting (#55411) (#55476)
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover

These APIs now support a `?prefer_v2_templates=true|false` flag. This flag controls whether index
creation prefers V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.

Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.

This also adds support for sending this parameter to the High Level Rest Client.

Relates to #53101
2020-04-20 12:05:42 -06:00
Armin Braun a0763d958d
Make RepositoryData Less Memory Heavy (#55293) (#55468)
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
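The immutability fix can be illustrated with a copy-on-add value class. It is a hypothetical sketch (`SnapshotIds` is illustrative, not `RepositoryData`) that assumes entries are unique, as the commit does, so a plain list replaces the `LinkedHashSet`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value class: entries assumed unique, so a plain list suffices.
final class SnapshotIds {
    private final List<String> ids;

    SnapshotIds(List<String> ids) {
        // Defensive copy wrapped read-only, so the getter really is immutable.
        this.ids = Collections.unmodifiableList(new ArrayList<>(ids));
    }

    List<String> ids() {
        return ids;
    }

    // Returns a new instance instead of mutating this instance's collection.
    SnapshotIds addSnapshot(String id) {
        List<String> copy = new ArrayList<>(ids.size() + 1);
        copy.addAll(ids);
        copy.add(id);
        return new SnapshotIds(copy);
    }
}
```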
2020-04-20 18:28:06 +02:00
William Brafford 7817948926 Disable monitoring in ML multinode tests (#55461)
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
2020-04-20 10:51:16 -04:00
David Turner 0df329dde7 Use soft deletes for searchable snapshots tests (#55453)
This allows us to perform some dummy indexing including updates/deletes.
2020-04-20 14:37:51 +01:00
Przemysław Witek 7d5f74e964
Fix and unmute testSetUpgradeMode_ExistingTaskGetsUnassigned (#55368) (#55452) 2020-04-20 13:29:29 +02:00
Yannick Welsch b9da307cd1 Add GCS support for searchable snapshots (#55403)
Adds ranged read support for GCS repositories in order to enable searchable snapshot support
for GCS.

As part of this PR, I've extracted some of the test infrastructure to make sure that
GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests cover
similar tests (as I saw those diverging in what they cover).
2020-04-20 13:02:59 +02:00
Jason Tedor 9ecb222bfa
Remove unneeded validation in feature set usage
This validation is not needed, as we have discovered the source of the
serialization error that was leading to some usage instances appearing
to not have a name.
2020-04-18 14:29:59 -04:00
Jason Tedor 23049391be
Upgrade feature aware check usage of ASM to 7.3.1 (#54577)
This commit upgrades the ASM dependency used in the feature aware check
to 7.3.1. This gives support for JDK 14. Additionally, now that Gradle
understands JDK 13, it means we can remove a restriction on running the
feature aware check to JDK 12 and lower.
2020-04-18 10:49:57 -04:00
Jay Modi 405ff0ce27
Handle TLS file updates during startup (#55330)
This change reworks the loading and monitoring of files that are used
for the construction of SSLContexts so that updates to these files are
not lost if the updates occur during startup. Previously, the
SSLService would parse the settings, build the SSLConfiguration
objects, and construct the SSLContexts prior to the
SSLConfigurationReloader starting to monitor these files for changes.
This allowed for a small window where updates to these files may never
be observed until the node restarted.

To remove the potential miss of a change to these files, the code now
parses the settings and builds SSLConfiguration instances prior to the
construction of the SSLService. The files backing the SSLConfiguration
instances are then registered for monitoring and finally the SSLService
is constructed from the previously parsed SSLConfiguration instances. As
the SSLService is not constructed when the code starts monitoring the
files for changes, a CompletableFuture is used to obtain a reference
to the SSLService; this allows for construction of the SSLService to
complete and ensures that we do not miss any file updates during the
construction of the SSLService.
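The role of the future can be sketched generically: file-change callbacks registered before the service exists are chained onto the future and run once it completes, so no update is lost. `DeferredReloader` and its type parameter `S` (standing in for the SSL service) are hypothetical, not the actual `SSLConfigurationReloader`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical reloader: S stands in for the service being constructed.
final class DeferredReloader<S> {
    private final CompletableFuture<S> serviceFuture = new CompletableFuture<>();
    private final BiConsumer<S, String> reloadAction;

    DeferredReloader(BiConsumer<S, String> reloadAction) {
        this.reloadAction = reloadAction;
    }

    // Called by the file watcher; may fire before the service has been constructed.
    void onFileChanged(String path) {
        // Runs immediately if the service is ready, otherwise when serviceReady() completes the future.
        serviceFuture.thenAccept(service -> reloadAction.accept(service, path));
    }

    // Called once construction of the service has finished.
    void serviceReady(S service) {
        serviceFuture.complete(service);
    }
}
```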

While working on this change, the SSLConfigurationReloader was also
refactored to reflect how it is currently used. When the
SSLConfigurationReloader was originally written the files that it
monitored could change during runtime. This is no longer the case as
we stopped the monitoring of files that back dynamic SSLContext
instances. In order to support the ability for items to change during
runtime, the class made use of concurrent data structures. The use of
these concurrent data structures has been removed.

Closes #54867
Backport of #54999
2020-04-17 20:10:33 -06:00
Zachary Tong f46b567563 Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250)
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).

This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class.  Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
2020-04-17 16:39:38 -04:00
Ryan Ernst 66071b2f6e
Remove combo security and license helper from license state (#55366) (#55417)
Security features in the license state currently do a dynamic check on
whether security is enabled. This is because the license level can
change the default security enabled state. This commit splits out the
check on security being enabled, so that the combo method of security
enabled plus license allowed is no longer necessary.
2020-04-17 13:07:02 -07:00
William Brafford 49e30b15a2
Deprecate disabling basic-license features (#54816) (#55405)
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.

This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
2020-04-17 15:04:17 -04:00
Benjamin Trent 4be3663968
[7.x] [ML] fix bugs with prediction field value settings (#55333) (#55394)
* [ML] fix bugs with prediction field value settings (#55333)

This fixes two unreleased bugs:

1. Prediction value type of `number` might show unexpected classes

Analytics created models may have class labels like `1, 5, 10` (or some collection of discrete, whole numbers). These labels are passed to the inference model config in the `classification_labels` field.

When the predicted value format is `numeric` it should attempt to see if the classification labels are provided and are numeric. If so, use those. If not, use the underlying value.

2. When supplying an update overwrite, inference was losing the default prediction field value. This is because it was not copied over in the copy ctor in the ClassificationConfig.Builder class. 
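Fix 1 above can be sketched as follows. The sketch assumes the "underlying value" is the predicted class index and that labels count as numeric only if every label parses as a number. `NumericPrediction` and `predictedValue` are hypothetical names, not the inference code.

```java
import java.util.List;

// Hypothetical helper; "underlying value" is assumed to be the predicted class index.
final class NumericPrediction {

    static Object predictedValue(int classIndex, List<String> labels, boolean numeric) {
        boolean haveLabel = labels != null && classIndex >= 0 && classIndex < labels.size();
        if (numeric == false) {
            return haveLabel ? labels.get(classIndex) : String.valueOf(classIndex);
        }
        if (haveLabel && allNumeric(labels)) {
            return Double.valueOf(labels.get(classIndex)); // e.g. labels 1, 5, 10
        }
        return Double.valueOf(classIndex); // fall back to the underlying value
    }

    private static boolean allNumeric(List<String> labels) {
        for (String label : labels) {
            try {
                Double.parseDouble(label);
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }
}
```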

closes #55332
2020-04-17 14:45:02 -04:00
Jake Landis eb30cf5c89
[7.x] Move Watcher config out of RestResourcesPlugin (#55136) (#55336) 2020-04-17 12:38:01 -05:00
Benjamin Trent 8c581c3388
[ML] fixing and unmuting testHRDSplit test (#55349) (#55393)
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). 

The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. 

closes https://github.com/elastic/elasticsearch/issues/32966
2020-04-17 09:55:52 -04:00
Tanguy Leroux eb52df6652 Mute GraphTests.testTimedoutQueryCrawl (#55397)
Relates #55396
Relates #53913
2020-04-17 15:31:48 +02:00
Benjamin Trent 65e0084120
[ML] do not start stopping tasks on reassignment (#55315) (#55388)
When anomaly detection jobs, datafeeds, and analytics tasks are stopped, they enter an ephemeral state called `STOPPING`.

If the node executing the task fails while this is occurring, they could be stuck in the limbo state of `STOPPING`. It is best to mark the tasks as completed if they get reassigned to a node.
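
A hypothetical sketch of the reassignment rule above; `TaskState`, `ReassignmentPolicy` and `onReassignment` are illustrative names, not the ML plugin's classes:

```java
// Hypothetical sketch; these are not classes from the ML plugin.
enum TaskState { STARTED, STOPPING, STOPPED }

class ReassignmentPolicy {

    /** What to do with a persistent task that has been reassigned to a new node. */
    enum Action { RESTART, MARK_COMPLETED }

    static Action onReassignment(TaskState lastKnownState) {
        // A task that was already STOPPING when its node failed was on its way out,
        // so finish it instead of starting it again on the new node.
        if (lastKnownState == TaskState.STOPPING) {
            return Action.MARK_COMPLETED;
        }
        return Action.RESTART;
    }
}
```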
2020-04-17 08:57:12 -04:00
Costin Leau fc6261967b SQL: Streamline declaration of LeafAggs (#55380)
Avoid repetition of the aggregation builder setup

Relates #55241

(cherry picked from commit 6cfe130e5da4aac11bad64f187fecc411139f5e2)
2020-04-17 15:04:54 +03:00
markharwood 7761b01a33
Remove normalizer support from wildcard field while we decide on an approach for handling case insensitivity (#55294) (#55375)
Closes #55288
2020-04-17 11:43:26 +01:00
Marios Trivyzas f958e9abdc
SQL: Implement scripting inside aggs (#55241) (#55371)
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
GROUP BY as well as with or without a HAVING clause, e.g.:

```
SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5
```

Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as
this is currently not implemented on the Elasticsearch side.

Fixes: #29980
Fixes: #36865
Fixes: #37271

(cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)
2020-04-17 12:41:22 +02:00
Tanguy Leroux 71855fbfe0 Mute testSupportedFieldTypes in HDRPreAggregatedPercentile tests (#55369)
Relates #55360
2020-04-17 10:49:43 +02:00
Martijn van Groningen 417d5f2009
Make data streams in APIs resolvable. (#55337)
Backport from: #54726

The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an API, for both concrete names and wildcard expressions. If data streams cannot be resolved, a 400 error is returned indicating that data streams cannot be used.

In this PR, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.

Whether an API resolves all backing indices of a data stream or only its latest index (the write index) depends on IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true, a data stream resolves to its latest index (for example, in the index API); otherwise it resolves to all of its backing indices (for example, in the search API).
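
The resolution rule above, sketched with a hypothetical `DataStreamSketch` class (not Elasticsearch's `DataStream`); the boolean parameter stands in for `isResolveToWriteIndex()`, and the index names are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a data stream, not Elasticsearch's DataStream class.
class DataStreamSketch {
    final String name;
    final List<String> backingIndices; // oldest first; the last entry is the write index

    DataStreamSketch(String name, List<String> backingIndices) {
        if (backingIndices.isEmpty()) {
            throw new IllegalArgumentException("a data stream needs at least one backing index");
        }
        this.name = name;
        this.backingIndices = Collections.unmodifiableList(new ArrayList<>(backingIndices));
    }

    /** resolveToWriteIndex plays the role of Context.isResolveToWriteIndex(). */
    List<String> resolve(boolean resolveToWriteIndex) {
        if (resolveToWriteIndex) {
            // e.g. the index API: only the latest (write) index receives the document
            return Collections.singletonList(backingIndices.get(backingIndices.size() - 1));
        }
        // e.g. the search API: all backing indices are searched
        return backingIndices;
    }
}
```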

Relates to #53100
2020-04-17 08:33:37 +02:00
Jason Tedor 9a9c1a721c
Add validation to feature set usage name (#55350)
We do not validate that the name is non-null and non-empty. Even though it
never should be null or empty, we had a build failure where it appears that
this somehow happened. We add some validation here so that, if this really is
happening, we get a clearer indication of where it is coming from, and of
course so that the name is validated against the implicit assumptions
that it is not null and not empty.
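
A minimal sketch of this kind of constructor validation, using a hypothetical `FeatureSetUsage` class rather than the actual usage class:

```java
import java.util.Objects;

// Hypothetical class illustrating the validation; not the actual feature set usage code.
class FeatureSetUsage {
    private final String name;

    FeatureSetUsage(String name) {
        // Fail fast with a clear message instead of surfacing a vague error later.
        this.name = Objects.requireNonNull(name, "name must not be null");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
    }

    String name() {
        return name;
    }
}
```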
2020-04-16 18:16:53 -04:00
Mark Tozzi 22c55180c1
[7.x] Backport ValuesSourceRegistry and related work (#54922)
* Add ValuesSource Registry and associated logic (#54281)

* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)

* ValuesSourceRegistry Prototype (#48758)

* Remove generics from ValuesSource related classes (#49606)

* fix percentile aggregation tests (#50712)

* Basic thread safety for ValuesSourceRegistry (#50340)

* Remove target value type from ValuesSourceAggregationBuilder (#49943)

* Cleanup default values source type (#50992)

* CoreValuesSourceType no longer implements Writable (#51276)

* Remove generics & hard coded ValuesSource references from Matrix Stats (#51131)

* Put values source types on fields (#51503)

* Remove VST Any (#51539)

* Rewire terms agg to use new VS registry (#51182)

Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)

* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)

* Wire Percentiles aggregator into new VS framework (#51639)

This required a bit of a refactor to percentiles itself.  Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory.  This doesn't work (or at least, would be
difficult) in the new VS framework.

This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory.  This object
is then used when deciding which kind of aggregator to create

Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.

* Remove generics and target value type from MultiVSAB (#51647)

* fix checkstyle after merge (#52008)

* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)

* Convert RareTerms to new VS registry (#52166)

* Wire up Value Count (#52225)

* Wire up Max & Min aggregations (#52219)

* ValuesSource refactoring: Wire up Sum aggregation (#52571)

* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)

* Soft immutability for VSConfig (#52729)

* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)

Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)

* VS refactoring: Wire up stats aggregation (#52891)

* ValuesSource refactoring: Wire up string_stats aggregation (#52875)

* VS refactoring: Wire up median (MAD) aggregation (#52945)

* fix valuesourcetype issue with constant_keyword field (#53041)

this commit implements `getValuesSourceType` for
the ConstantKeyword field type.

master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.

* ValuesSource refactoring: Wire up Avg aggregation (#52752)

* Wire PercentileRanks aggregator into new VS framework  (#51693)

* Add a VSConfig resolver for aggregations not using the registry (#53038)

* Vs refactor wire up ranges and date ranges (#52918)

* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)

This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry

relates #42949.

* VS refactoring: convert Boxplot to new registry (#53132)

* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)

This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry

relates to the values-source refactoring meta issue #42949.

* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)

This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.

relates to the values-source refactoring meta issue #42949.

* Fix type tests for Missing aggregation (#53501)

* ValuesSource Refactor: move histo VSType into XPack module (#53298)

- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs

* Wire up DateHistogram to the ValuesSourceRegistry (#53484)

* Vs refactor parser cleanup (#53198)

Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>

* First batch of easy fixes

* Remove List.of from ValuesSourceRegistry

Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.

* More compiler fixes

* More compiler fixes

* More compiler fixes

* Precommit is happy and so am I

* Add new Core VSTs to tests

* Disabled supported type test on SigTerms until we can backport its fix

* fix checkstyle

* Fix test failure from semantic merge issue

* Fix some metaData->metadata replacements that got lost

* Fix list of supported types for MinAggregator

* Fix list of supported types for Avg

* remove unused import

Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
2020-04-16 16:54:46 -04:00
Marios Trivyzas 8abdf7c7d3
SQL: Fix ODBC metadata for DATE & TIME data types (#55316) (#55345)
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB
ODBC metadata for the DATE & TIME data types.

Fixes: #41086
(cherry picked from commit c23677cd2955e25bb952c8e7ff8ca3151ee0df98)
2020-04-16 22:41:39 +02:00
Ioannis Kakavas b27f23a80d
Rest spec and documentation (#54664) (#55305)
This change adds the spec for the new REST APIs that we
introduce for the IDP and documentation for each of the APIs. The
documentation pages are intentionally not included in the API
reference so as to minimize unnecessary exposure.

supersedes: #53858
2020-04-16 20:18:05 +03:00
Lee Hinman 8b7bdae6cb
Ensure error handler is called during SLM retention callback failure (#55252) (#55321)
When retrieving the snapshots for a set of repos or deleting a single snapshot, it's possible for
the body of the `ActionListener`'s `onResponse` method to throw an Exception. In this case, the
`errHandler` passed in may not be executed, resulting in the `running` boolean not being reset back
to false.

This commit uses `ActionListener.wrap(...)` instead of creating a new ActionListener, which ensures
that if the `onResponse` fails in any way, the `onFailure` handler is still called.
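
A minimal sketch of why wrapping helps, using a hypothetical `Listener` interface that only mimics the idea behind `ActionListener.wrap(...)` (it is not the Elasticsearch class): an exception thrown by the response handler is routed to the failure handler.

```java
import java.util.function.Consumer;

// Hypothetical, minimal listener; it is not Elasticsearch's ActionListener.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);

    static <T> Listener<T> wrap(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                try {
                    responseHandler.accept(response);
                } catch (Exception e) {
                    // If the response handler throws, route the exception to the failure
                    // handler so cleanup such as resetting a "running" flag still happens.
                    onFailure(e);
                }
            }

            @Override
            public void onFailure(Exception e) {
                failureHandler.accept(e);
            }
        };
    }
}
```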

Resolves #55217
2020-04-16 10:50:15 -06:00
David Turner 7941f4a47e Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 16:27:36 +01:00
Marios Trivyzas 327d268673
SQL: [Test] Add test for a fixed bug for string scalars on aggs (#55304) (#55309)
Added an integration test to validate behaviour of string scalars on top
of aggregate functions. The behaviour was fixed with #49570.

Relates to: #41597

(cherry picked from commit 35f964154850e3f02b6c7f9ca238da98ad83ebb3)
2020-04-16 16:41:54 +02:00
Benjamin Trent 2b68aa3471
muting test for issue 55068 (#55312) 2020-04-16 10:32:12 -04:00
David Kyle 643ecf68b5
Remove InferenceConfigUpdate generic parameter (#55249) (#55301)
Simplify the code by removing the generic type from InferenceConfigUpdate which 
meant wildcard types were used in many places. Instead check the class type is
appropriate where used.
2020-04-16 13:44:53 +01:00
Ioannis Kakavas ac87c10039
[7.x] Fix responses for the token APIs (#54532) (#55278)
This commit fixes our behavior regarding the responses we
return in various cases for the use of token related APIs.
More concretely:

- In the Get Token API with the `refresh` grant, when an invalid
(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with a `400` HTTP status code
and an `error_description` header with the message "could not
refresh the requested token".
Previously we would erroneously return a `401` with a "token
malformed" message.

- In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with `404` and a body that shows that no tokens were invalidated:
   ```
   {
     "invalidated_tokens": 0,
     "previously_invalidated_tokens": 0,
     "error_count": 0
   }
   ```
   The previous behavior would be to erroneously return
a `400` or `401` (depending on the case).

- In the Invalidate Token API, when the tokens index doesn't
exist or is closed, we return `400` because we assume this is
a user issue either because they tried to invalidate a token
when there is no tokens index yet (i.e. no tokens have
been created yet or the tokens index has been deleted) or the
index is closed.

- In the Invalidate Token API, when the tokens index is
unavailable, we return a `503` status code because
we want to signal to the caller of the API that the token they
tried to invalidate was not invalidated and we can't be sure
if it is still valid or not, and that they should try the request
again.
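
The Invalidate Token API cases above, condensed into a hypothetical lookup. The enum and method names are illustrative, and returning `200` when a token is found is an assumption not stated above:

```java
// Hypothetical summary of the status codes listed above; not Elasticsearch code.
enum TokensIndexState { AVAILABLE, MISSING_OR_CLOSED, UNAVAILABLE }

class InvalidateTokenStatus {

    static int statusFor(TokensIndexState indexState, boolean tokenFound) {
        switch (indexState) {
            case MISSING_OR_CLOSED:
                return 400; // treated as a user error
            case UNAVAILABLE:
                return 503; // outcome unknown, so the caller should retry
            case AVAILABLE:
            default:
                // Unknown, malformed or already-deleted tokens: nothing was invalidated.
                return tokenFound ? 200 : 404;
        }
    }
}
```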

Resolves: #53323
2020-04-16 14:05:55 +03:00
David Roberts 8489f8c121
[ML] Add test to prove categorization state written after lookback (#55297)
When a datafeed transitions from lookback to real-time we request
that state is persisted from the autodetect process in the
background.

This PR adds a test to prove that for a categorization job the
state that is persisted includes the categorization state.
Without the fix from elastic/ml-cpp#1137 this test fails.  After
that C++ fix is merged this test should pass.

Backport of #55243
2020-04-16 11:55:18 +01:00
Dimitris Athanasiou 6c9e1fecc5
[7.x][ML] Mark task as completed when DFA job is stopped while reindexing (#55286) (#55290)
After #54650 we catch `TaskCancelledException` when we wait for
reindexing to complete as it may be thrown. However, when that happens
we do not mark the task as completed. This results in the stop request
never returning and the failures we saw in #55068.

Closes #55068

Backport of #55286
2020-04-16 13:08:54 +03:00
David Roberts ac11dd619c
Only ship Linux binaries for the correct architecture (#55280)
Following elastic/ml-cpp#1135 there are now Linux binaries
for both x86_64 and aarch64.  The code that finds the
correct binaries to ship with each distribution was
including both on every Linux distribution.  This change
alters that logic to consider the architecture as well
as the operating system.

Also, there is no need to disable ML on aarch64 now that
we have the native binaries available.  ML is still not
supported on aarch64, but the processes at least run up
and work at a superficial level.

Backport of #55256
2020-04-16 09:45:52 +01:00
David Roberts 5de6ddfef2 Mute ClassificationIT.testSetUpgradeMode_ExistingTaskGetsUnassigned
Due to https://github.com/elastic/elasticsearch/issues/55221
2020-04-16 09:03:46 +01:00
Jay Modi 2d9e3c7794
Start resource watcher service early (#55275)
The ResourceWatcherService enables watching of files for modifications
and deletions. During startup various consumers register the files that
should be watched by this service. There is behavior that might be
unexpected in that the service may not start polling until later in the
startup process due to the use of lifecycle states to control when the
service actually starts the jobs to monitor resources. This change
removes this unexpected behavior so that upon construction the service
has already registered its tasks to poll resources for changes. In
making this modification, the service no longer extends
AbstractLifecycleComponent and instead implements the Closeable
interface so that the polling jobs can be terminated when the service
is no longer required.

Relates #54867
Backport of #54993
2020-04-15 20:45:39 -06:00
Jason Tedor cad1a3b0ad
Fix imports in CCRFeatureSet
This commit fixes some imports that were mixed up during a
backport. Because, backports.
2020-04-15 19:37:25 -04:00
Jason Tedor a18faacf1b
Make feature usage version aware (#55246)
Today we indiscriminately serialize these independent of the version on
the stream, even though the other side might not understand a new
feature set usage that we have added. For example, if we add feature set
usage in 7.7 for EQL, in a mixed cluster context if a request is sent to
an old coordinating node, but the master is a new version, then it would
attempt to serialize the usage information for the new feature back to
the old coordinating node, who will blow up on the unrecognized named
writeable. This commit addresses this by making feature usage version
aware, and only serializing those that the other side would understand.
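
A sketch of version-aware filtering under stated assumptions: each usage entry records the first version that understands it, and versions are plain ints. `UsageEntry` and `UsageSerializer` are hypothetical names, not the x-pack classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical usage entry; versions are encoded as plain ints, e.g. 7_07_00 for 7.7.0.
class UsageEntry {
    final String name;
    final int minimalSupportedVersion;

    UsageEntry(String name, int minimalSupportedVersion) {
        this.name = name;
        this.minimalSupportedVersion = minimalSupportedVersion;
    }
}

class UsageSerializer {

    /** Keeps only the entries that a node on streamVersion can deserialize. */
    static List<UsageEntry> forStream(List<UsageEntry> all, int streamVersion) {
        List<UsageEntry> result = new ArrayList<>();
        for (UsageEntry entry : all) {
            if (streamVersion >= entry.minimalSupportedVersion) {
                result.add(entry);
            }
        }
        return result;
    }
}
```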
2020-04-15 19:24:47 -04:00
William Brafford 2ba3be9db6
Remove deprecated third-party methods from tests (#55255) (#55269)
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
2020-04-15 17:54:47 -04:00
Ryan Ernst 29b70733ae
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our forbidden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:27:53 -07:00
Henning Andersen b3eb57a094 CCR: Test follow on top of closed index (#54956)
Added testing of following on top of a closed index.
This could for instance be the old leader index in
cases where leader and follower clusters have been
swapped.
2020-04-15 20:13:32 +02:00
Mark Vieira 5d4bc8aea6
Mute ModelLoadingServiceTests.testMaxCachedLimitReached 2020-04-15 10:25:51 -07:00
Ignacio Vera a677b63daa
Upgrade to lucene 8.5.1 release (#55229) (#55235)
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
2020-04-15 17:35:42 +02:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

This adds a new field `prediction_field_type`, which indicates the desired type. Options are: `string` (DEFAULT), `number`, and `boolean` (where close_to(1.0) == true, false otherwise).

Analytics provides the default `prediction_field_type` when the model is created from the process.
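
A hypothetical sketch of the three `prediction_field_type` options. The enum, the `transform` method and the 1e-9 tolerance for "close to 1.0" are assumptions, and the `STRING` branch simply formats the number:

```java
// Hypothetical enum illustrating the options above; not the inference config code.
enum PredictionFieldType {
    STRING, NUMBER, BOOLEAN;

    /** Converts a raw numeric prediction into the value written to the results document. */
    Object transform(double value) {
        switch (this) {
            case NUMBER:
                return value;
            case BOOLEAN:
                // "close to 1.0" is read as true; the tolerance is an assumption.
                return Math.abs(value - 1.0) < 1e-9;
            case STRING:
            default:
                return String.valueOf(value);
        }
    }
}
```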
2020-04-15 09:45:22 -04:00
Armin Braun 2f91e2aab7
Fix Race in Snapshot Abort (#54873) (#55233)
We can be a little more efficient when aborting a snapshot. Since we know the new repository
data after finalizing the aborted snapshot, we can pass it down to the snapshot completion listeners.
This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.
2020-04-15 15:42:15 +02:00
Przemysław Witek b5fe565c89
Add log line that will help debug item failures during multi search request (#55220) (#55227) 2020-04-15 15:01:17 +02:00
Hendrik Muhs 9ec9866acb [Transform] simplify TransformConfigUpdate (#55224)
removes the unnecessary ToXContent method in TransformConfigUpdate
2020-04-15 13:22:50 +02:00
Yannick Welsch d1123281b1 Use unlimited cache size by default (#55218)
Sets the default cache size for searchable snapshots to unlimited, which, for testing purposes,
is a better default than the 1GB that we currently have.
2020-04-15 12:20:51 +02:00
David Kyle bdf0eab78d
[7.x] Fix non-deterministic behaviour in ModelLoadingServiceTests (#55008) (#55213) 2020-04-15 11:09:12 +01:00
Ioannis Kakavas 0f51934bcf
[7.x] Add support for more named curves (#55179) (#55211)
We implicitly only supported the prime256v1 (aka secp256r1)
curve for the EC keys we read as PEM files to be used in any
SSL Context. We would not fail when trying to read a key
pair using a different curve but we would silently assume
that it was using `secp256r1` which would lead to strange
TLS handshake issues if the curve was actually another one.

This commit fixes that behavior in that it
supports parsing EC keys that use any of the named curves
defined in rfc5915 and rfc5480 making no assumptions about
whether the security provider in use supports them (JDK8 and
higher support all the curves defined in rfc5480).
2020-04-15 12:33:40 +03:00
Dimitris Athanasiou 4000138105
[7.x][ML] Add debug logging for outlier detection stop and restart integ test (#55169) (#55202)
To understand the failures in #55068

Backport of #55169
2020-04-15 10:40:38 +03:00
Lee Hinman 36f6e542a2 Ignore ILM indices in the TerminalPolicyStep (#55184)
Prior to the change in #51631, indices were moved to the `TerminalPolicyStep` when their ILM actions
had completed. Once we switched ILM to stop in the last policy configured, these steps became
inaccessible from the policy's perspective. This meant that indices upgraded from ES prior to 7.7.0
could see the following error spammed in their logs every 10 minutes (by default) for every index in
this state:

```
[2020-04-14T15:52:23,764][ERROR][o.e.x.i.IndexLifecycleRunner] [midgar] current step [{"phase":"completed","action":"completed","name":"completed"}] for index [foo] with policy [full] is not recognized
```

This changes the runner to ignore these steps, which is what is desired anyway since the index is
already in the terminal phase.
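
A sketch of the runner's new check, assuming a step key is the phase/action/name triple shown in the log line. `StepKey`, `LifecycleRunnerSketch` and `Outcome` are hypothetical stand-ins, not the `IndexLifecycleRunner` code:

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a step key (phase/action/name).
final class StepKey {
    final String phase;
    final String action;
    final String name;

    StepKey(String phase, String action, String name) {
        this.phase = phase;
        this.action = action;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StepKey)) return false;
        StepKey other = (StepKey) o;
        return phase.equals(other.phase) && action.equals(other.action) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(phase, action, name);
    }
}

class LifecycleRunnerSketch {
    static final StepKey TERMINAL_STEP = new StepKey("completed", "completed", "completed");

    enum Outcome { SKIP, EXECUTE, UNRECOGNIZED }

    static Outcome run(StepKey current, Set<StepKey> policySteps) {
        if (TERMINAL_STEP.equals(current)) {
            // Already finished: nothing to do, so no "not recognized" error is logged.
            return Outcome.SKIP;
        }
        return policySteps.contains(current) ? Outcome.EXECUTE : Outcome.UNRECOGNIZED;
    }
}
```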
2020-04-14 16:45:03 -06:00
Igor Motov 1754e50cbd
[7.x] Add analytics plugin usage stats to _xpack/usage (#54911) (#55162)
Adds analytics plugin usage stats to _xpack/usage.

Closes #54847
2020-04-14 17:03:14 -04:00
Mark Vieira ce85063653
[7.x] Re-add origin url information to publish POM files (#55173) 2020-04-14 13:24:15 -07:00
Albert Zaharovits 7f35b927d1
Preserve parent task id for ml transform (#55124)
This change ensures that internal client requests spawned by the
transform persistent task executor and that use the end user security
credentials, have the parent task id assigned. The objective here is
to permit auditing (as well as tracking for debugging purposes) of all
the end-user requests executed on its behalf by persistent tasks.
Because transform tasks already implement graceful shutdown of the
child tasks, this change does not interfere with that by opting out of
the persistent task cancellation of child tasks.

Relates #55046 #54943 #52314
Closes #54957
2020-04-14 18:43:47 +03:00
Andrei Dan d918ef0da9
[Tests] Enable searchable_snapshots for non-snapshot builds (#55151) (#55157)
Fixes https://github.com/elastic/elasticsearch/issues/55050

(cherry picked from commit 13391ceff1cbf6db69706c5f46127b6ff8850a1f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-14 16:13:39 +01:00
Albert Zaharovits 5998486ce8
Refactor AuditTrail for TransportRequests instead of TransportMessage (#55141)
This commit refactors the `AuditTrail` to use the `TransportRequest` as a parameter
for all its audit methods, instead of the current `TransportMessage` super class.

The goal is to gain access to the `TransportRequest#parentTaskId` member,
so that it can be audited. The `parentTaskId` is used internally when spawning tasks
that handle transport requests; in this way tasks across nodes are related by the
same parent task.

Relates #52314
2020-04-14 16:53:59 +03:00
Yannick Welsch a610513ec7 Provide repository-level stats for searchable snapshots (#55051)
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boilerplate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
2020-04-14 14:34:08 +02:00
Przemysław Witek d5bb574e1e
[7.x] Unassign DFA tasks in SetUpgradeModeAction (#54523) (#55143) 2020-04-14 14:09:02 +02:00
Igor Motov 8a669dc9b7
EQL: Add cascading search cancellation (#54843)
EQL search cancellation now propagates cancellation to underlying search
operations.

Relates to #49638
2020-04-14 08:06:02 -04:00
David Turner 87e8367ece Fix testCreateAndRestoreSearchableSnapshot (#55147)
Fixes a couple of related failures in SearchableSnapshotsIntegTests.

Firstly, we were not correctly accounting for the case where the cache was so
small that some/all files were read directly; fixed this by only asserting that
the cache is definitely used if the corresponding node has a cache that's large
enough to hold the whole index.

Secondly, we were not permitting shards to be completely empty, which might be
the case (rarely) if there were not many documents indexed and the distribution
of IDs was a bit unlucky; fixed this by asserting that we get stats for at
least one file for the whole index, rather than for each shard separately.

Closes #55126
2020-04-14 11:54:46 +01:00
Ioannis Kakavas 70cc1d57fb Mute failing test (#54734) 2020-04-14 10:18:33 +01:00
Mark Vieira cb58725164 Mute InferenceIngestIT.testPipelineIngest 2020-04-14 09:27:56 +01:00
William Brafford 52bebec51f
NodeInfo response should use a collection rather than fields (#54460) (#55132)
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate nodes stats for the cluster stats endpoint.

In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."

I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
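
A sketch of the typed lookup described above (a typesafe heterogeneous container). `ReportingService`, `OsInfo`, `JvmInfo` and `NodeInfoSketch` below are simplified stand-ins, not the Elasticsearch types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not the Elasticsearch types of the same names.
interface ReportingService {
    interface Info {}
}

class OsInfo implements ReportingService.Info {
    final int availableProcessors;

    OsInfo(int availableProcessors) {
        this.availableProcessors = availableProcessors;
    }
}

class JvmInfo implements ReportingService.Info {
    final String version;

    JvmInfo(String version) {
        this.version = version;
    }
}

class NodeInfoSketch {
    // Keyed by the info class, so each kind of block is stored once.
    private final Map<Class<? extends ReportingService.Info>, ReportingService.Info> infoMap = new HashMap<>();

    <T extends ReportingService.Info> void addInfo(Class<T> clazz, T info) {
        infoMap.put(clazz, info);
    }

    /** Returns the block of the requested type, or null if absent; Class.cast spares callers a cast. */
    <T extends ReportingService.Info> T getInfo(Class<T> clazz) {
        return clazz.cast(infoMap.get(clazz));
    }
}
```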
2020-04-13 17:18:39 -04:00
Ryan Ernst ae14d1661e
Replace license check isAuthAllowed with isSecurityEnabled (#54547) (#55082)
The isAuthAllowed() method for license checking is used by code that
wants to ensure security is both enabled and available. The enabled
state is dynamic and provided by isSecurityEnabled(). But since security
is available with all license types, a check on the license level is
not necessary. Thus, this change replaces isAuthAllowed() with calling
isSecurityEnabled().
2020-04-13 12:26:39 -07:00
Benjamin Trent d32f6fed1d
[ML] inference only persist if there are stats (#54752) (#55121)
We needlessly send documents to be persisted. If there are no stats added, then we should not attempt to persist them.

Also, this PR fixes the race condition that caused issue:  https://github.com/elastic/elasticsearch/issues/54786
2020-04-13 14:03:05 -04:00
Igor Motov 51c6f69e02
[7.x] Add support for filters to T-Test aggregation (#54980) (#55066)
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
2020-04-13 12:28:58 -04:00
Jake Landis a2fafa6af4
[7.x] Lazy test cluster module and plugins (#54852) (#55087)
This change converts the module and plugin parameters
for testClusters to be lazy, meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
2020-04-13 10:53:35 -05:00
Igor Motov 6861295706
Further improve InternalTTestTests (#55081)
A small follow-up to #54910. Now that we can generate a consistent set of
internal aggs to reduce, we no longer need to keep agg parameters as class
variables.

Related to #54910
2020-04-13 10:26:23 -04:00
Benjamin Trent c5c7ee9d73
[7.x] [ML] Start gathering and storing inference stats (#53429) (#54738)
* [ML] Start gathering and storing inference stats (#53429)

This PR enables stats on inference to be gathered and stored in the `.ml-stats-*` indices.

Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user.

`.ml-stats-*` is ILM managed (when possible), so the underlying index could change at any point. This means that a stats document that is read and later updated will actually be a new doc in a new index. This complicates matters: keeping track of the current seq_no and primary_term is nearly impossible because we don't know the latest index name.

We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).
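
A hypothetical sketch of summing the per-node documents when stats are returned. The document fields (`inferenceCount`, `failureCount`) are illustrative assumptions, not the `.ml-stats-*` mapping:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical running stats document: one per (node, model_id) pair.
class InferenceStatsDoc {
    final String nodeId;
    final String modelId;
    final long inferenceCount;
    final long failureCount;

    InferenceStatsDoc(String nodeId, String modelId, long inferenceCount, long failureCount) {
        this.nodeId = nodeId;
        this.modelId = modelId;
        this.inferenceCount = inferenceCount;
        this.failureCount = failureCount;
    }
}

class InferenceStatsSummer {

    /** Returns {inferenceCount, failureCount} per model_id, summed across nodes. */
    static Map<String, long[]> sumByModel(List<InferenceStatsDoc> docs) {
        Map<String, long[]> totals = new TreeMap<>();
        for (InferenceStatsDoc doc : docs) {
            long[] total = totals.computeIfAbsent(doc.modelId, k -> new long[2]);
            total[0] += doc.inferenceCount;
            total[1] += doc.failureCount;
        }
        return totals;
    }
}
```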
2020-04-13 08:15:46 -04:00
Andrei Dan c0406f78b7
ILM add cluster update timeout on step retry (#54878) (#55022)
This commit adds a timeout when moving ILM back on to a failed step. In
case the master is struggling with processing the cluster update requests,
these will expire (as we'll send them again anyway on the next ILM
loop run).

ILM more descriptive source messages for cluster updates

Use the configured ILM step master timeout setting

(cherry picked from commit ff6c5ed16616eadfcddd9c95317d370f0d126583)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-11 10:13:31 +01:00
Andrei Dan b8df265b42
[7.x] ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909) (#55018)
* ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909)

This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to
temporarily stop ILM if a cluster is overwhelmed, but a `NORMAL`
priority can see the "stop ILM update" not make it up the tasks queue.

On the same note, we're keeping the `start ILM` cluster update priority
to `NORMAL` on purpose such that we only start `ILM` if the cluster can
handle it.

(cherry picked from commit d67df3a7cd2a8619c2c9efac4dde3ba83271f2fa)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-11 10:12:35 +01:00
Albert Zaharovits f22004a262
Preserve parent task id for data frame analytics (#55046)
This change makes sure that all internal client requests spawned by the
data frame analytics persistent task executor and that use the end user
security credentials, have the parent task id assigned. The objective here
is to permit auditing (as well as tracking for debugging purposes) of all
the end-user requests executed on its behalf by persistent tasks.
Because data frame analytics tasks already implement graceful shutdown
of child tasks, this change does not interfere with it: it opts out of
the persistent task cancellation of child tasks.

Relates #54943 #52314
2020-04-10 22:27:21 +03:00
Mark Vieira 5d4ddf9146
Fixes for IntelliJ IDEA 2020.1 support (#55077) 2020-04-10 11:57:48 -07:00
Nik Everett c00811f3a3
Make some agg tests easier to read (#54954) (#55079)
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialization tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.

I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
2020-04-10 14:15:30 -04:00
Luca Cavanna 93c39ad4e7 Async search: create internal index only before storing initial response (#54619)
We currently create the .async-search index if necessary before performing any action (index, update or delete). In fact, this is needed only before storing the initial response. The other operations are updates or deletes, which would not find the document to update or delete anyway, even if the missing index were created. This also caused `testCancellation` failures, as we were trying to delete the document twice from the .async-search index: once from `TransportDeleteAsyncSearchAction` and once as a consequence of the search task being completed. The latter may run after the test has completed but before the cluster is shut down, causing problems for the after-test checks, for instance if it happens after all the indices have been cleaned up. It is fine to try to delete a response that is no longer found, but not if that call also triggers an index creation.

With this commit we remove all the calls to createIndexIfNecessary from the update/delete operations and leave a single call in storeInitialResponse, which is where the index is expected to be created.

Closes #54180
2020-04-10 18:24:05 +02:00
Ross Wolf 96a903b17f
EQL: Add string function (#54470)
* EQL: Add string() function
* EQL: Reorder queryfolder_tests
* EQL: Add test queries
* EQL: Fix InternalEqlScriptUtils.string and test case
* EQL: Fix testStringFunctionWithText error message
* EQL: Flatten ToStringFunctionPipe.equals
* EQL: Reorder painless whitelist
* EQL: Address feedback and remove string(null) handling
* EQL: Move string(pid) test over
* EQL: Rename source -> value
2020-04-10 09:48:29 -06:00
Przemysław Witek 17101d86d9
[7.x] Do not execute ML CRUD actions when upgrade mode is enabled (#54437) (#55049) 2020-04-10 16:07:11 +02:00
Dimitrios Liappis b062535e27
Mute testSearchableSnapshotAction in TimeSeriesLifecycleActions tests (#55055)
Backport of #55052
Details in #55050
2020-04-10 16:03:09 +03:00
Jason Tedor a370668fcc
Clean up even more instances of "metaData"
We recently cleaned up the use of the word "metadata" across the
codebase. Additional uses have since trickled in, likely from
in-progress work. This commit cleans up these last few remaining
instances.

Relates #54519
2020-04-10 08:52:37 -04:00
Jason Tedor 9eeae59a83
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the code and in the docs to adjust to the current terminology.
2020-04-10 08:48:27 -04:00
Costin Leau a7e4f79e8f EQL: Deprecate lenient sequence declaration (#55032)
Deprecate alternative sequence parameter declaration (with then by)
Disallow lack of time units inside maxspan

Fix #55023
Relate #54680

(cherry picked from commit 201adafba9def1de4bf843760defb9def3394f63)
2020-04-10 10:30:07 +03:00
Marios Trivyzas bf0cadb602
SQL: Implement DATETIME_PARSE function for parsing strings (#54960) (#55035)
Implement DATETIME_PARSE(<datetime_str>, <pattern_str>) function,
which parses a datetime string into a datetime object according to the
specified pattern. The allowed patterns are those of
java.time.format.DateTimeFormatter.
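To show what "the patterns of java.time.format.DateTimeFormatter" means in practice, here is a minimal Java sketch of the parsing step only. `DateTimeParseSketch` is a hypothetical helper; it assumes the input carries both a date and a time and ignores time zones, which the SQL function may handle differently.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateTimeParseSketch {
    // Parses `value` with a DateTimeFormatter pattern such as "dd/MM/uuuu HH:mm:ss".
    // Throws java.time.format.DateTimeParseException if the value does not match the pattern.
    static LocalDateTime parse(String value, String pattern) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
    }
}
```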

Relates to #53714

(cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)
2020-04-10 01:16:29 +02:00
Nhat Nguyen c9f8fb2dd0 Clear recent errors when auto-follow successfully (#54997)
Today, we do not clear the recent errors in AutoFollowCoordinator when
we successfully auto-follow indices. This can confuse operators.
2020-04-09 14:35:16 -04:00
Albert Zaharovits f55a361b64
Preserve Task Id for ML Datafeed (#54943)
This change preserves the task id for internal requests for the `StartDatafeedPersistentTask`.

Task ids are a way to express a relationship between related internal requests.
In this particular case, the task ids are used for debugging and (soon) security auditing,
but not for task cancellation, because there is already a graceful-shutdown of child
internal requests (given a task id) in place.
2020-04-09 13:22:29 +03:00
Hendrik Muhs 223fbb2ae7 [Transform] fix sporadic test failure due to unavailable notif… (#54939)
move no initializing shards check before dumping audit messages

fixes #54810
2020-04-09 08:04:42 +02:00
Andrei Stefan 85f129a50a
EQL: indexOf function implementation (#54543) (#54989)
(cherry picked from commit a4b1d6e52d9ba22d541dd86d69861b1efee83604)
2020-04-09 02:41:01 +03:00
Mark Vieira 1552f2fa3e
Enable searchable snapshots for release tests (#54987) 2020-04-08 14:41:03 -07:00
Mark Vieira 0fa8a14bcb
Mute SamlServiceProviderDocumentTests.testStreamRoundTripWithAllFields 2020-04-08 12:56:36 -07:00
Jay Modi 3600c9862f
Reintroduce system index APIs for Kibana (#54935)
This change reintroduces the system index APIs for Kibana without the
changes made for marking what system indices could be accessed using
these APIs. In essence, this is a partial revert of #53912. The changes
for marking what system indices should be allowed access will be
handled in a separate change.

The APIs introduced here are wrapped versions of the existing REST
endpoints. A new setting is also introduced since the Kibana system
indices' names are allowed to be changed by a user in case multiple
instances of Kibana use the same instance of Elasticsearch.

Relates #52385
Backport of #54858
2020-04-08 09:08:49 -06:00
Bogdan Pintea 8d6d7b88d8
SQL: drop BASE TABLE type in favour for just TABLE (#54836) (#54951)
* Drop BASE TABLE type in favour for just TABLE

This commit drops the table type 'BASE TABLE' and replaces all
occurrences with just 'TABLE', since this type is more widely used and
friendlier to client applications that query for certain table types
in their discovery mode.

The 'TABLE' type is also explicitly mentioned by the JDBC and ODBC
standards, and although other data source-specific types are permitted,
older apps will not work well with them.

* Refactor table type constants out of IndexType

Move SQL_TABLE/_ALIAS out of IndexType, so that they can also be used in
that Enum definition.

(cherry picked from commit 70241b52697ac2cf71004040042123c1ec050299)
2020-04-08 16:02:12 +02:00
Marios Trivyzas 6afd60b082
SQL: Implement DATETIME_FORMAT function for date/time formatting (#54832) (#54942)
Implement DATETIME_FORMAT(<date/datetime/time>, ) function
which formats a timestamp according to the specified format. The
allowed patterns are those of java.time.format.DateTimeFormatter.
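The formatting direction can be sketched the same way. `DateTimeFormatSketch` is a hypothetical helper that only shows how a DateTimeFormatter pattern renders a timestamp; it is not the SQL function's implementation and ignores time zones.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateTimeFormatSketch {
    // Renders `timestamp` using a DateTimeFormatter pattern such as "uuuu-MM-dd HH:mm".
    static String format(LocalDateTime timestamp, String pattern) {
        return timestamp.format(DateTimeFormatter.ofPattern(pattern));
    }
}
```

For example, formatting `LocalDateTime.of(2020, 4, 8, 13, 45, 47)` with `"uuuu-MM-dd HH:mm"` yields `2020-04-08 13:45`.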

Related to #53714

(cherry picked from commit 72be0b54a9299e87e785469cdc9aafac2a48c046)
2020-04-08 13:45:47 +02:00
David Turner 0d2195191d Allocate searchable snapshots with the balancer (#54889)
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to #54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.

Relates #50999
2020-04-08 10:02:42 +01:00
Ryan Ernst 37795d259a
Remove guava from transitive compile classpath (#54309) (#54695)
Guava was removed from Elasticsearch many years ago, but remnants of it
remain due to transitive dependencies. When a dependency pulls guava
into the compile classpath, devs can inadvertently begin using methods
from guava without realizing it. This commit moves guava to a runtime
dependency in the modules where it is needed.

Note that one special case is the html sanitizer in watcher. The third
party dep uses guava in the PolicyFactory class signature. However, only
calling a method on the PolicyFactory actually causes the class to be
loaded; a reference alone does not make compilation look at the class
implementation. There we use a MethodHandle to invoke the relevant
method at runtime, where guava will continue to exist.
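A minimal sketch of the general late-binding technique with `java.lang.invoke`. `LateBoundCall` is a hypothetical helper and `java.lang.String` stands in for the third-party class; the watcher code may obtain its handle differently, so treat this only as an illustration of invoking a method without a compile-time call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class LateBoundCall {
    // Looks up `className` by name at runtime and resolves its public no-arg instance
    // method `methodName` returning String. The caller never names the class at compile time.
    static MethodHandle resolve(String className, String methodName) {
        try {
            Class<?> target = Class.forName(className);
            return MethodHandles.publicLookup()
                    .findVirtual(target, methodName, MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot resolve " + className + "#" + methodName, e);
        }
    }

    // invoke() adapts the Object receiver to the handle's declared receiver type at call time.
    static String call(MethodHandle handle, Object receiver) {
        try {
            return (String) handle.invoke(receiver);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the target is only resolved when `resolve` runs, the dependency has to be present on the runtime classpath but not on the compile classpath.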
2020-04-07 23:20:17 -07:00
Aleksandr Maus d02f774cb6
EQL: implement cidrMatch function (#54186) (#54928)
Related to https://github.com/elastic/elasticsearch/issues/54132
2020-04-07 22:07:28 -04:00
Tal Levy 254d1e3543
[7.x] Create new `geo` module and migrate geo_shape registration (#53562) (#54924)
This commit introduces a new `geo` module that is intended
to contain all the geo-spatial-specific features in server.

As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.

Co-authored-by: Nicholas Knize <nknize@gmail.com>
2020-04-07 16:30:58 -07:00
Aleksandr Maus de381271f1
EQL: implement stringContains function (#54380) (#54923) 2020-04-07 17:55:13 -04:00
Nik Everett ce7ae4a7d1
Remove pipeline aggs from agg result tree (backport of #54716) (#54920)
This removes pipeline aggregators from the aggregation result tree
except for a single field used for backwards compatibility with pre-7.8
versions of Elasticsearch. That field isn't populated unless we are
serializing to pre-7.8 Elasticsearch. So, good news! We no longer build
pipeline aggregators on the data node. Most of the time.
2020-04-07 17:22:23 -04:00
Nik Everett 100f7258c7
Improve agg reduce tests (#54910) (#54914)
This allows subclasses of `InternalAggregationTestCase` to make a `List`
of values to reduce so that it can make values that are realistic
*together*. The first use of this is with `InternalTTest` which uses it
to make results that don't cause their `sum` field to wrap. It'd likely
be useful for a ton of other aggs but just one for now.
2020-04-07 17:22:04 -04:00
Aleksandr Maus 868798e4db
EQL: implement between function (#54277) (#54913) 2020-04-07 16:52:30 -04:00
Costin Leau 8b1e87cb61 EQL: Change query folding spec from new lines to ; (#54882)
The usage of blank lines as separators between tests can be tricky to
deal with in case of merges, where such lines can be added by accident.
Furthermore, counting non-consecutive lines is unintuitive.
The tests have been aligned to use ; at the end of the query and
exceptions so that the presence or absence of empty lines is irrelevant.
The parsing of the spec now performs validation so that invalid or
incomplete specs do not cause unexpected exceptions.
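A minimal sketch of `;`-terminated spec parsing, assuming a simplified format in which each entry is a single query that may span several lines and ends with `;`. `SemicolonSpecParser` is hypothetical and does not model the expected-result part of the real spec files.

```java
import java.util.ArrayList;
import java.util.List;

final class SemicolonSpecParser {
    // Splits spec text into entries terminated by ';'. Blank lines are ignored, an empty
    // entry is rejected, and a trailing entry without its ';' is rejected instead of dropped.
    static List<String> parse(String spec) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : spec.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank lines carry no meaning in this format
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(trimmed);
            if (trimmed.endsWith(";")) {
                String entry = current.substring(0, current.length() - 1).trim();
                if (entry.isEmpty()) {
                    throw new IllegalArgumentException("empty entry");
                }
                entries.add(entry);
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("incomplete entry: " + current);
        }
        return entries;
    }
}
```

With this design an accidentally merged blank line changes nothing, while a missing terminator fails loudly at parse time.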

(cherry picked from commit 192ad88d3a51e1e1f1f82830526518720ec88217)
2020-04-07 21:57:06 +03:00
Tanguy Leroux b8d2b952b8 Only one of azure key or token can be specified in 3rd party tests (#54876)
#54803 introduces more QA tests for the Azure storage service, but
they fail the build if either the key or the token is missing. They should
instead work like the repository-azure:qa tests.
2020-04-07 19:36:48 +02:00
Nik Everett faa687c0ae Fix InternalTTestTests
`testReduceRandom` was bumping up against the serialization that I added
in #54776. This makes it use random values that reduce in ways that
don't cause the randomized serialization to fail.
2020-04-07 11:51:54 -04:00
Larry Gregory 8c8baa10f4
[Backport] Add reserved_ml_user and reserved_ml_admin kibana p… (#54837)
* add reserved_ml_user and reserved_ml_admin kibana privileges

* address feedback, update dataframe roles

* fix checkstyle failure
2020-04-07 11:42:11 -04:00
Dimitris Athanasiou 9b4ac60b53
[7.x][ML] Cancel reindex task from correct thread context (#54874) (#54898)
When a data frame analytics job is stopped, if the reindexing
task was still in progress we cancel it. Cancelling it should
be done from the same context as when we executed the reindexing
task. That means from a thread context with ML origin.

Backport of #54874
2020-04-07 18:11:58 +03:00
Andrei Dan bbc57828c4
ILM fix retry delete action test (#54809) (#54895)
Asserting on the failed_step field from the explain API can produce flakiness
because the ILM state is moved back and forth between the (failing) step and
the ERROR step (the workflow is: retry, fail, move to the ERROR step,
move back to the (failing) step, retry, fail, etc.) and the failed_step
information is only available whilst in the ERROR state.

Unmute other tests, as they were collateral failures: a read-only index
could not be deleted in the wipeCluster phase and caused these failures.

(cherry picked from commit 99a6d57aeb3cf11abc38b514f38a96bb1612e357)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-07 15:55:56 +01:00
Nik Everett 3c56e0de42
Fix scripted metric in ccs (backport of #54776) (#54888)
`scripted_metric` did not work with cross cluster search because it
assumed that you'd never perform a partial reduction, serialize the
results, and then perform a final reduction. That
serialized-after-partial-reduction step was broken.

This is also required to support #54758.
2020-04-07 10:43:00 -04:00
Ignacio Vera 076c199484
Add new point field. (#53804) (#54879)
This commit adds a new point field that is able to index arbitrary pairs of values (x/y)
in the cartesian space. It only supports filtering using shape queries at the moment.
2020-04-07 15:28:50 +02:00
Tanguy Leroux 4d36917e52
Merge feature/searchable-snapshots branch into 7.x (#54803) (#54825)
This is a backport of #54803 for 7.x.

This pull request cherry picks the squashed commit from #54803 with the additional commits:

    6f50c92 which adjusts master code to 7.x
    a114549 to mute a failing ILM test (#54818)
    48cbca1 and 50186b2 that clean up and fix the previous test
    aae12bb that adds a missing feature flag (#54861)
    6f330e3 that adds missing serialization bits (#54864)
    bf72c02 that adjusts the version in YAML tests
    a51955f that adds some plumbing for the transport client used in integration tests

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-07 13:28:53 +02:00
David Roberts 8f2ddaee1a [TEST] Allow kb or mb for data frame analytics memory estimate (#54869)
This change is to support the switch from kb to mb being made in
https://github.com/elastic/ml-cpp/pull/1126
2020-04-07 11:28:29 +01:00
David Roberts df4ae79b41
[TEST] Unmute CategorizationIT.testNumMatchesAndCategoryPreference (#54868)
Should work again now that https://github.com/elastic/ml-cpp/issues/1121
is resolved.

Backport of #54768
2020-04-07 11:04:31 +01:00
Armin Braun 1039cae2cc
Fix Repository Consistency TODOs from SLM Tests (#54767) (#54860)
These TODOs don't apply any longer with the repository generation
now being tracked consistently so we can remove the workarounds.
2020-04-07 09:27:50 +02:00
Jim Ferenczi c7ff67ddef Preserve final response headers in asynchronous search (#54349)
This change adds the response headers of the original search request
in the stored response in order to be able to restore them when retrieving a result
from the async-search index. It also ensures that response headers are preserved for
users that retrieve a final response on a running search task.
Partial responses may eventually return response headers too, but this change only ensures
that they are present when the response is final.

Relates #33936
2020-04-07 08:37:03 +02:00
Jim Ferenczi d57a047ab7 Fix transport serialization of AsyncSearchUser (#54761)
This change ensures that the AsyncSearchUser is correctly (de)serialized when
an action executed by this user is sent to a remote node internally (via transport client).
2020-04-07 08:25:58 +02:00
Costin Leau 99846f47b7 QL: Introduce infrastructure for surrogate functions (#54795)
Some functions act as shortcuts for more verbose declarations (sometimes
with certain constraints). This PR removes the boilerplate around
declaring such functions and adds a dedicated rule for the optimizer
to perform the actual substitution.
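The surrogate idea can be sketched with a toy expression tree. Every type here (`Expr`, `SurrogateFunction`, `IsNotNull`, `ReplaceSurrogateFunction`, and so on) is hypothetical and much simpler than the QL classes; the point is only that a shortcut declares its expansion once and a single generic rule performs the substitution.

```java
import java.util.function.Function;

// Hypothetical, minimal expression tree; not the QL classes.
interface Expr {
    // Applies `rule` to the children first, then to the rebuilt node (bottom-up).
    Expr transformUp(Function<Expr, Expr> rule);
}

// A shortcut function that declares the expression it stands for.
interface SurrogateFunction extends Expr {
    Expr substitute();
}

final class Field implements Expr {
    private final String name;
    Field(String name) { this.name = name; }
    public Expr transformUp(Function<Expr, Expr> rule) { return rule.apply(this); }
    @Override public String toString() { return name; }
}

final class IsNull implements Expr {
    private final Expr child;
    IsNull(Expr child) { this.child = child; }
    public Expr transformUp(Function<Expr, Expr> rule) { return rule.apply(new IsNull(child.transformUp(rule))); }
    @Override public String toString() { return "IS_NULL(" + child + ")"; }
}

final class Not implements Expr {
    private final Expr child;
    Not(Expr child) { this.child = child; }
    public Expr transformUp(Function<Expr, Expr> rule) { return rule.apply(new Not(child.transformUp(rule))); }
    @Override public String toString() { return "NOT(" + child + ")"; }
}

// Illustrative shortcut: IS_NOT_NULL(x) is declared once as NOT(IS_NULL(x)).
final class IsNotNull implements SurrogateFunction {
    private final Expr child;
    IsNotNull(Expr child) { this.child = child; }
    public Expr substitute() { return new Not(new IsNull(child)); }
    public Expr transformUp(Function<Expr, Expr> rule) { return rule.apply(new IsNotNull(child.transformUp(rule))); }
    @Override public String toString() { return "IS_NOT_NULL(" + child + ")"; }
}

// One generic optimizer rule replaces every surrogate with its declared expansion.
final class ReplaceSurrogateFunction {
    static Expr apply(Expr plan) {
        return plan.transformUp(e -> e instanceof SurrogateFunction ? ((SurrogateFunction) e).substitute() : e);
    }
}
```

Adding another shortcut then only requires implementing `substitute()`; the rule itself does not change.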

Fix #54334

(cherry picked from commit 3231d01b0c583deb89252fafe84db48878da3246)
2020-04-07 00:46:50 +03:00
Costin Leau 36121117f0 EQL: Sequence/Join parsing and model (#54227)
Add parsing and (logical) domain model for sequence and join

(cherry picked from commit 9e9632d41a39877256c68634ab18e441f4b67fe8)
2020-04-06 23:15:35 +03:00
Igor Motov 1aa87cd4a9
EQL: Make EQL search task cancellable (#54598)
First step towards async search execution. At the moment we don't try to cancel
the underlying search requests, and just check if the task is canceled before
performing network operations (such as field caps and search)

Relates to #49638
2020-04-06 13:38:03 -04:00
Igor Motov 2794572a35
[7.x] Add Student's t-test aggregation support (#54469) (#54737)
Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.
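For readers unfamiliar with the paired variant, the statistic it is based on fits in a few lines of Java. This is a textbook sketch (`PairedTTest` is a hypothetical name), not the aggregation's implementation; it computes only the t statistic, not a p-value, and does not cover the unpaired variants.

```java
final class PairedTTest {
    // Paired two-sample t statistic: mean(d) / (s_d / sqrt(n)), where d_i = b_i - a_i
    // and s_d is the sample standard deviation of the differences (n - 1 denominator).
    static double statistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equally sized samples with at least 2 pairs");
        }
        int n = a.length;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += b[i] - a[i];
        }
        double mean = sum / n;
        double squares = 0;
        for (int i = 0; i < n; i++) {
            double dev = (b[i] - a[i]) - mean;
            squares += dev * dev;
        }
        double sd = Math.sqrt(squares / (n - 1));
        return mean / (sd / Math.sqrt(n));
    }
}
```

For the pairs (1, 2), (2, 4), (3, 5), (4, 7) the differences are 1, 2, 2, 3, giving t = 2 / (sqrt(2/3) / 2), roughly 4.899.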

Relates to #53692
2020-04-06 11:36:47 -04:00
Dimitris Athanasiou 0049e9467b
[7.x][ML] Fix node serialization on GET df-analytics stats without id (#54808) (#54812)
Previously, the id of the `GetDataFrameAnalyticsStatsAction.Request`
could be `null` which caused NPE on serialization as `writeString`
is used (it doesn't accept null values).

This commit ensures the id is never null.

Closes #54807

Backport of #54808
2020-04-06 18:13:16 +03:00
David Kyle 03bc368c14
Wait for ML templates after creating a new cluster in TooManyJobsIT (#54801) 2020-04-06 13:45:56 +01:00
Dimitris Athanasiou ed4ef78330
[7.x][ML] Increase open job wait time in MlDistributedFailureIT (#54792) (#54798)
It seems the 20-second timeout is occasionally not enough.
We still get sporadic failures where the logs reveal the job
wasn't opened within 20 seconds. I'm increasing the wait time
to 30 seconds.

Closes #54448

Backport of #54792
2020-04-06 14:51:46 +03:00
Tim Vernum 30b01fe00d
Resolve SSO roles by pattern (#54777)
This changes a SamlServiceProvider to have a function that maps
from an "action-name" to a set of role-names instead of a Map that does
so.

The on-disk representation of this mapping is a set of Java Regexp
Patterns, for which the first matching group is the role name.

For example "sso:(\w+)" would map any action that started with "sso:"
to the corresponding role name (e.g. "sso:superuser" -> "superuser").
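A minimal sketch of pattern-based role resolution with `java.util.regex`. `ActionRoleMapper` is a hypothetical name, and the sketch assumes that a pattern must match the whole action name (`Matcher.matches()`) and that every matching pattern contributes its first capturing group as a role.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ActionRoleMapper {
    private final List<Pattern> patterns;

    ActionRoleMapper(List<Pattern> patterns) {
        this.patterns = new ArrayList<>(patterns);
    }

    // Returns the role names derived from `action`: for every pattern that matches the
    // whole action name, its first capturing group is taken as a role name.
    Set<String> rolesFor(String action) {
        Set<String> roles = new LinkedHashSet<>();
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(action);
            if (matcher.matches() && matcher.groupCount() >= 1 && matcher.group(1) != null) {
                roles.add(matcher.group(1));
            }
        }
        return roles;
    }
}
```

With the pattern `sso:(\w+)`, the action `sso:superuser` resolves to the role `superuser`, and an action such as `kibana:viewer` resolves to no roles.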

Backport of: #54440
2020-04-06 14:10:30 +10:00
Jason Tedor b939b47b77
Add wire tests for get autoscaling decision objects
This commit adds wire serializing tests for the get autoscaling decision
request and response objects.
2020-04-05 21:34:36 -04:00
Jason Tedor 8f520f0a9c
Add wire tests to delete autoscaling policy request
This commit adds some wire serializing tests for delete autoscaling
policy requests.
2020-04-05 21:34:35 -04:00
Jason Tedor 98c4165348
Add wire tests for put autoscaling policy request
This commit adds some wire serializing tests for put autoscaling policy
requests.
2020-04-05 21:34:34 -04:00
Tim Vernum cf442aae38
Resolve SERVICE_UNAVAILABLE in IdP IntegTest (#54700)
The SamlIdentityProviderTests IntegTests would sometimes encounter a
service unavailable exception when registering a new service provider.

This change ensures that there is a data node, and that the cluster
state is recovered before registering providers.

Backport of: #54622
2020-04-06 11:23:08 +10:00
Jason Tedor b2cd858f29
Return 404s when autoscaling policies do not exist (#54774)
This commit updates the autoscaling get and delete policy APIs to return
404s when the named policy does not exist.
2020-04-05 21:05:11 -04:00
Jason Tedor 184c038f59
Add get autoscaling policy API (#54762)
This commit adds the get autoscaling policy API.
2020-04-04 18:04:25 -04:00
Nhat Nguyen 73d24203e7 Handle no such remote cluster exception in ccr (#53415)
A remote client can throw a NoSuchRemoteClusterException while fetching
the cluster state from the leader cluster. We also need to handle that
exception when retrying to add a retention lease to the leader shard.

Closes #53225
2020-04-04 13:55:06 -04:00
Jason Tedor 2a94672c32
Separate autoscaling REST test cases
A couple of the autoscaling REST tests combine multiple tests into a
single REST test. This commit separates them into individual tests.
2020-04-04 10:21:21 -04:00
Jason Tedor d5a195ab3d
Rename the policies in put autoscaling REST tests
The autoscaling REST tests use policies named "hot" in their test
cases. Instead, this commit changes the name of these policies to
"my_autoscaling_policy".
2020-04-04 10:14:11 -04:00
Jason Tedor 79c72cd398
Migrate common autoscaling test code to base class
This commit moves some code repeated in a few autoscaling tests related
to writeable and x-content registries to the autoscaling tests base
class.
2020-04-04 09:57:20 -04:00
Jason Tedor dd99e6d951
Simplify name of delete autoscaling policy handler
The name here is unnecessarily long, containing the word "action" when
it does not need to. This commit simplifies the name.
2020-04-03 21:46:48 -04:00
Ross Wolf 022f829d84
EQL: Add wildcard function (#54020)
* EQL: Add wildcard function
* EQL: Cleanup Wildcard.getArguments
* EQL: Cleanup Wildcard and rearrange methods
* EQL: Wildcard newline lint
* EQL: Make StringUtils function final
* EQL: Make Wildcard.asLikes return ScalarFunction
* QL: Restore BinaryLogic.java
* EQL: Add Wildcard PR feedback
* EQL: Add Wildcard verification tests
* EQL: Switch wildcard to isFoldable test
* EQL: Change wildcard test to numeric field
* EQL: Remove Wildcard.get_arguments
2020-04-03 10:15:43 -06:00
Ioannis Kakavas 8e255337f8
Fix SamlServiceProviderDocumentTests (#54718) (#54723)
Don't assume byte-for-byte equality, because internal structures
do not guarantee order.
2020-04-03 18:46:36 +03:00
Christoph Büscher 8c9ac14a98
Rename field name constants in AbstractBuilderTestCase (#53234)
Some field name constants were not updated when we moved from "string" to "text"
and "keyword" fields. Renaming them makes it easier and faster to know which
field type is used in tests subclassing this base test case.
2020-04-03 17:28:22 +02:00
David Roberts 470aa9a5f1 [TEST] Mute CategorizationIT.testNumMatchesAndCategoryPreference (#54717)
The test results are affected by the off-by-one error that is
fixed by https://github.com/elastic/ml-cpp/pull/1122

This test can be unmuted once that fix is merged and has been
built into ml-cpp snapshots.
2020-04-03 14:40:47 +01:00
Dimitris Athanasiou e8c0351fd8
[7.x][ML] Allow force stopping failed and stopping DF analytics (#54650) (#54712)
Force stopping a failed job used to work but it
now puts the job in `stopping` state and hangs.
In addition, force stopping a `stopping` job is
not handled.

This commit addresses those issues with force
stopping data frame analytics. It aligns the
approach with the one followed for anomaly detection
jobs.

Backport of #54650
2020-04-03 16:08:06 +03:00
Maria Ralli aa697346c4 Remove Xlint exclusions from gradle files (part 2)
Backport of #54576.

This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. Remove the Xlint exclusions from the following files:

- x-pack/plugin/rollup/build.gradle
- x-pack/plugin/monitoring/build.gradle
- x-pack/qa/rolling-upgrade-basic/build.gradle

Add type parameters to parameterized types. Add wildcard-type parameters
or bounded wildcard-type parameters. Suppress `unchecked` and `rawtypes`
warnings at method level.
2020-04-03 12:15:42 +01:00
Christoph Büscher 9f22c0d37c Fix Eclipse compile problem in ModelLoadingService (#54670)
Current Eclipse 4.14.0 cannot deal with the direct lambda notation, so this
changes it to an explicit one.
2020-04-03 11:56:30 +02:00
Julie Tibshirani 5fb7602227
Disallow changing 'enabled' on the root mapper. (#54681)
In #33933 we disallowed changing the `enabled` parameter in object mappings.
However, the fix didn't cover the root object mapper. This PR adjusts the change
to also include the root mapper and clarifies the error message.
2020-04-02 15:28:48 -07:00
Benjamin Trent 6e73f67f3b
[ML] unmute categorization test for native backport (#54679) 2020-04-02 17:08:19 -04:00
Benjamin Trent 7fe38935f6
[ML] add training_percent to analytics process params (#54605) (#54678)
This adds the training_percent parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations.

See native side pr: elastic/ml-cpp#1111
2020-04-02 17:08:06 -04:00
Nik Everett 54ea4f4f50 Begin to drop pipeline aggs from the result tree (backport of #54311) (#54659)
Removes pipeline aggregations from the aggregation result tree as they
are no longer used. This stops us from building the pipeline aggregators
at all on data nodes except for backwards compatibility serialization.
This will save a tiny bit of space in the aggregation tree which is
lovely, but the biggest benefit is that it is a step towards simplifying
pipeline aggregators.

This only does about half of the work to remove the pipeline aggs from
the tree. Removing all of it would, well, double the size of the change
and make it harder to review.
2020-04-02 16:45:12 -04:00
Benjamin Trent 4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
Jason Tedor 2113c1ffb6
Fix autoscaling internal cluster release tests
This commit addresses an issue with the autoscaling feature flag not
being registered in release builds of the internal cluster tests. It
fixes this by enabling the required system property, but only in
release builds.
2020-04-02 11:48:02 -04:00
Benjamin Trent 65233383f6
[7.x] [ML] prefer secondary authorization header for data[feed|frame] authz (#54121) (#54645)
* [ML] prefer secondary authorization header for data[feed|frame] authz (#54121)

Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds.

Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided).

closes https://github.com/elastic/elasticsearch/issues/53801

* fixing for backport
2020-04-02 11:20:25 -04:00
Zachary Tong 20d67720aa
Refactor Percentiles/Ranks aggregation builders and factories (#51887) (#54537)
- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder logic into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo

The unified config method carries both the method and the algo-specific
settings. This provides a mechanism to reject settings that apply
to the wrong algorithm.  For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.
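The rejection mechanism can be sketched as follows. `PercentilesConfig`, its nested `TDigest`/`Hdr` types and `PercentilesBuilderSketch` are hypothetical names, not the real builder API, and the default of TDigest with a compression of 100 is an assumption made for the sketch.

```java
import java.util.Objects;

// Hypothetical unified config: each subtype carries only the settings of its own algorithm.
abstract class PercentilesConfig {
    static final class TDigest extends PercentilesConfig {
        final double compression;
        TDigest(double compression) { this.compression = compression; }
    }

    static final class Hdr extends PercentilesConfig {
        final int numberOfSignificantValueDigits;
        Hdr(int digits) { this.numberOfSignificantValueDigits = digits; }
    }
}

final class PercentilesBuilderSketch {
    // Assumed default for this sketch: TDigest with a compression of 100.
    private PercentilesConfig config = new PercentilesConfig.TDigest(100.0);

    // Preferred entry point: the algorithm and its settings travel together.
    PercentilesBuilderSketch percentilesConfig(PercentilesConfig config) {
        this.config = Objects.requireNonNull(config);
        return this;
    }

    // Legacy-style setter kept for compatibility; rejected if it does not fit the current algorithm.
    PercentilesBuilderSketch compression(double compression) {
        if (!(config instanceof PercentilesConfig.TDigest)) {
            throw new IllegalArgumentException("[compression] only applies to the TDigest algorithm");
        }
        this.config = new PercentilesConfig.TDigest(compression);
        return this;
    }

    PercentilesConfig config() { return config; }
}
```

Because the HDR config has no compression field at all, the only way to set compression on an HDR builder is the legacy setter, and that path now fails fast.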

Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
2020-04-02 10:39:41 -04:00
Jason Tedor 7467cc04ec
Remove toXContent from autoscaling request classes (#54643)
These methods are not needed, we were only following a pattern in the
rest of the codebase, but it's legacy from the HLRC sharing
request/response objects with the server.
2020-04-02 10:30:20 -04:00
David Roberts 4b4800e096
[ML] Take more care that normalize processes use unique named pipes (#54641)
When one of ML's normalize processes fails to connect to the JVM
quickly enough and another normalize process for the same job
starts shortly afterwards it is possible that their named pipes
can get mixed up.

This change avoids the risk of that by adding an incrementing
counter value into the named pipe names used for normalize
processes.
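A minimal sketch of the counter idea. `NormalizePipeNames` is a hypothetical helper and the name layout is illustrative only; it is not the actual pipe naming scheme used by the ML plugin.

```java
import java.util.concurrent.atomic.AtomicLong;

final class NormalizePipeNames {
    // Shared, thread-safe counter: every call gets a value no earlier call has seen.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Builds a named pipe name that stays unique even when two normalize processes for the
    // same job are started in quick succession.
    static String pipeName(String jobId, String stream) {
        return "normalize_" + jobId + "_" + COUNTER.incrementAndGet() + "_" + stream;
    }
}
```

Two processes for `job-1` started back to back therefore get distinct names such as `normalize_job-1_1_input` and `normalize_job-1_2_input`, so a slow process cannot connect to its successor's pipe.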

Backport of #54636
2020-04-02 14:25:31 +01:00
Benjamin Trent eb31be0e71
[7.x] [ML] add num_matches and preferred_to_categories to category definition objects (#54214) (#54639)
* [ML] add num_matches and preferred_to_categories to category definition objects (#54214)

This adds two new fields to category definitions.

- `num_matches` indicating how many documents have been seen by this category
- `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized.

These fields are only guaranteed to be up to date after a `_flush` or `_close`

native change: https://github.com/elastic/ml-cpp/pull/1062

* adjusting for backport
2020-04-02 09:09:19 -04:00
Jason Tedor 54ecb009bb
Add delete autoscaling policy API (#54601)
This commit adds an API for deleting autoscaling policies.
2020-04-02 09:05:12 -04:00
Martijn Laarman 0ed20cc349 Rename cat.transform => cat.transforms (#54438)
* Rename cat.transform => cat.transforms

To match the url.

We typically prefer singular url nouns but _cat tends to use plural and
this API does in fact use `/_cat/transforms`

* also rename the api in the spec and tests

(cherry picked from commit c495d220ac8fedba7f70f82387cd6d6a672b8b14)
2020-04-02 09:40:51 +02:00
Tim Vernum c40ec6a577
Turn on trace logging for failing test (#54623)
SamlIdentityProviderTests is failing with 409 conflicts that have not
been reproducible outside of CI.
This change turns on additional logging in this test to determine why
these conflicts occur.

Relates: #54423
Backport of: #54475
2020-04-02 16:15:12 +11:00
Russ Cam 2978024375 Update rest API specs (#54252)
This commit updates the rest API specs to validate against a
JSON schema for the specifications. Most updates are to add
a description, whilst others fix typos and unify conventions
e.g. deprecations, descriptions, urls starting with /. The schema
conforms to draft-07 JSON schema.

(cherry picked from commit da37e01d32f9764c3937736ef0c7d3ab40af9a77)
2020-04-02 10:53:32 +10:00
Russ Cam a2f59a2744 Add hidden value to expand_wildcards params (#54551)
This commit adds the hidden enum value to
all expand_wildcards params

(cherry picked from commit 581b8cdabe11444105edb62226b439ba4c7e908a)
2020-04-02 09:01:20 +10:00
William Brafford 958e9d1b78
Refactor nodes stats request builders to match requests (#54363) (#54604)
* Refactor nodes stats request builders to match requests (#54363)

* Remove hard-coded setters from NodesInfoRequestBuilder

* Remove hard-coded setters from NodesStatsRequest

* Use static imports to reduce clutter

* Remove uses of old info APIs
2020-04-01 17:03:04 -04:00
Jason Tedor fd729a6509
Fix the name of an autoscaling policy test
The test name says it is testing the put autoscaling decision API, but
that is not right, since no such API exists (nor will exist). This
commit corrects the name of this test to reflect the fact that the test
is about the put autoscaling policy API.
2020-04-01 16:36:47 -04:00
Mayya Sharipova bf4857d9e0
Search hit refactoring (#41656) (#54584)
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: #38373
Relates to issue #24422

Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>
2020-04-01 15:19:00 -04:00
Jason Tedor 8ed1a6cdb6
Use List.of convenience methods in 7.x autoscaling
We do not have access to JDK 9 collection convenience methods in 7.x
because we are compatible with JDK 8 there. Yet, we have recently added
a substitute for these convenience methods that even delegate to the
right places when running on JDK 9, to make backporting easier. This
commit utilizes these new methods in the autoscaling codebase.
2020-04-01 12:17:56 -04:00
Ioannis Kakavas c9ffa379ba
[7.x] Add end to end QA authentication test (#54215) (#54567)
Use the same ES cluster as both an SP and an IDP and perform
IDP initiated and SP initiated SSO. The REST client plays the role
of both the Cloud UI and Kibana in these flows.

Backport of #54215

* fix compilation issues
2020-04-01 18:35:21 +03:00
Jason Tedor a039f45604
Fix autoscaling metadata not adding X-Pack mix-in
This commit addresses an issue with the autoscaling metadata not
implementing a required interface, used in the feature aware checks.
2020-04-01 08:42:01 -04:00
Jason Tedor f670ae0bc8
Introduce autoscaling policies (#54473)
This commit is the first in a series of commits that introduces
autoscaling policies, and APIs for working with them. For now, we
introduce the basic infrastructure, and a single API for putting an
autoscaling policy. We will follow in rapid succession with APIs for
getting, and deleting autoscaling policies.
2020-04-01 08:12:26 -04:00
Przemysław Witek 1fe2705826
Skip daily maintenance activity if upgrade mode is enabled (#54565) (#54571) 2020-04-01 13:29:34 +02:00
Ioannis Kakavas 1cff6897f3
Add error message in JSON response (#54389) (#54562)
When the SAML authentication is not successful, we return a SAML
Response with a status that indicates a failure. This commit adds
an error message in the REST API response along with the SAML
Response XML string so that the caller of the API can identify
that this is an unsuccessful response without needing to parse the
XML.
2020-04-01 13:02:52 +03:00
Luca Cavanna d75571ff0f [TEST] rename AsyncSearchActionTests to IT and move it out of unit tests (#54520)
`AsyncSearchActionTests` currently fails quite often. This has been the case since the introduction of `RestSubmitAsyncSearchActionTests`, which indirectly manipulates the channels being tracked in `RestCancellableNodeClient`. Channels are left in the map after `RestSubmitAsyncSearchActionTests` runs, and `AsyncSearchActionTests` later checks that there are no channels in the map, which makes each of its test methods fail. This is particularly hard to reproduce, as the order in which tests are run appears to be platform dependent.

The test cluster assertion that there are no channels in the map only makes sense in the context of internal cluster tests; in the unit test JVM it can collide with unit tests that register HTTP channels as part of their testing.

This can be solved by renaming `AsyncSearchActionTests` to `AsyncSearchActionIT`. This way it is not run as part of the unit tests but in a separate JVM, where the number of channels is `0` and the assumption holds, because no manual manipulation of the channels is expected there.

Relates to #54180
2020-04-01 11:23:27 +02:00
Ioannis Kakavas 74eeecf91b
Fix testGenerateAndSignMetadata in FIPS mode (#54115) (#54387)
The BC provider throws a different error message on signature
validation failure.
2020-04-01 12:04:20 +03:00
Jason Tedor 63e5f2b765
Rename META_DATA to METADATA
This is a follow up to a previous commit that renamed MetaData to
Metadata in all of the places. In that commit in master, we renamed
META_DATA to METADATA, but lost this on the backport. This commit
addresses that.
2020-03-31 17:30:51 -04:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it: for
example, we use METADATA where META_DATA would be consistent with
MetaData (although METADATA is correct when considered in the context
of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Zachary Tong c9db2de41d
[7.x] Comprehensively test supported/unsupported field type:agg combinations (#54451)
* Comprehensively test supported/unsupported field type:agg combinations (#52493)

This adds a test to AggregatorTestCase that allows us to programmatically
verify that an aggregator supports or does not support a particular
field type.  It fetches the list of registered field type parsers,
creates a MappedFieldType from the parser and then attempts to run
a basic agg against the field.

A supplied list of supported VSTypes is then compared against the
output (success or exception), and the test succeeds or fails accordingly.

Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com>
* Skip fields that are not aggregatable

* Use newIndexSearcher() to avoid incompatible readers (#52723)

Lucene's `newSearcher()` can generate readers like ParallelCompositeReader,
which we can't use. We need to use our helper `newIndexSearcher` instead.
2020-03-31 14:35:03 -04:00
David Roberts b8f06df53f
[ML] Fix bug, add tests, improve estimates for estimate_model_memory (#54508)
This PR:

1. Fixes the bug where a cardinality estimate of zero could cause
   a 500 status
2. Adds tests for that scenario and a few others
3. Adds sensible estimates for the cases that were previously TODO

Backport of #54462
2020-03-31 17:59:38 +01:00
David Kyle 9150e77269
[7.x] Remove unused environment from anomaly detector classes (#54399) (#54456) 2020-03-31 16:55:37 +01:00
Dimitris Athanasiou e4230c533c
[7.x][ML] Move DFA MemoryUsage to stats.common pkg (#54492) (#54512)
This belongs in stats.common

Backport of #54492
2020-03-31 18:36:05 +03:00
Andrei Stefan 977302e46c
EQL: startsWith and endsWith functions implementation (#54504)
* EQL: startsWith function implementation (#54400)

(cherry picked from commit 666719fcfc40f6fc0535609577791369123320ab)

* EQL: endsWith function implementation (#54442)

(cherry picked from commit 554a4c8ef04b67eed107d29b57185e9af25d9d4f)
2020-03-31 18:06:03 +03:00
Dimitris Athanasiou 6d96ca9bc8
[7.x][ML] Reenable classification and regression integ tests (#54489) (#54494)
Relates #54401

Backport of #54489
2020-03-31 17:50:08 +03:00
Andrei Stefan 364ea0a3c0
EQL: Length function implementation (#54209) (#54490)
(cherry picked from commit 18493467e55e014be2c9e0ebdf734e9d7fc4beaa)
2020-03-31 16:49:18 +03:00
Ioannis Kakavas 349293da6d
Mute failing test (#54446) (#54487)
see #54445
2020-03-31 15:56:10 +03:00
Tim Vernum a0853628cd
Add wildcard service providers to IdP (#54477)
This adds the ability for the IdP to define wildcard service
providers in a JSON file within the ES node's config directory.

If a request is made for a service provider that has not been
registered, then the set of wildcard services is consulted. If the
SP entity-id and ACS match one of the wildcard patterns, then a
dynamic service provider is defined from the associated mustache
template.

Backport of: #54148
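
The lookup order above (registered providers first, then wildcards that must match both the entity-id and the ACS) can be sketched as follows. This is a minimal illustration under stated assumptions: the class, the regex-based patterns and the `{{entity_id}}` placeholder are hypothetical stand-ins for the JSON file and mustache template, not the IdP's actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch: regexes stand in for the wildcard definitions, and a plain
// {{entity_id}} substitution stands in for rendering the mustache template.
final class WildcardServiceSketch {
    private final Pattern entityIdPattern;
    private final Pattern acsPattern;
    private final String nameTemplate;

    WildcardServiceSketch(String entityIdRegex, String acsRegex, String nameTemplate) {
        this.entityIdPattern = Pattern.compile(entityIdRegex);
        this.acsPattern = Pattern.compile(acsRegex);
        this.nameTemplate = nameTemplate;
    }

    // Returns the name of the resolved service provider, if any.
    static Optional<String> resolve(Map<String, String> registeredNamesByEntityId,
                                    List<WildcardServiceSketch> wildcards,
                                    String entityId, String acs) {
        // Registered service providers win; wildcards are only a fallback.
        String registered = registeredNamesByEntityId.get(entityId);
        if (registered != null) {
            return Optional.of(registered);
        }
        for (WildcardServiceSketch w : wildcards) {
            // Both the entity-id and the ACS must match the same wildcard definition.
            if (w.entityIdPattern.matcher(entityId).matches() && w.acsPattern.matcher(acs).matches()) {
                return Optional.of(w.nameTemplate.replace("{{entity_id}}", entityId));
            }
        }
        return Optional.empty();
    }
}
```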
2020-03-31 16:53:13 +11:00
Jason Tedor 5d760051a9
Clarify autoscaling feature flag registration (#54427)
This commit clarifies the autoscaling feature flag registration system
property. The intention is that this system property is:
 - unset in snapshot builds
 - unset, true, or false in release builds
 - in release builds, unset behaves the same as false
 - therefore, we only register the enabled flag if the build is a
   snapshot build, or the build is a release build and the system
   property is set to true

This commit clarifies that intention, and removes a confusing situation
where the AUTOSCALING_FEATURE_FLAG_REGISTERED field would be set to
false in a snapshot build, even though we were going to register the
setting.
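
The registration rules listed above reduce to a small decision function. The sketch below is hypothetical (class and method names are illustrative, not from the Elasticsearch source), and rejecting a property set in a snapshot build is an assumption layered on "unset in snapshot builds".

```java
// Hypothetical helper mirroring the rules listed above; names are illustrative.
final class AutoscalingFlagSketch {

    /**
     * @param isSnapshot    whether this is a snapshot build
     * @param propertyValue raw value of the feature flag system property, or null if unset
     * @return whether the autoscaling enabled setting should be registered
     */
    static boolean shouldRegisterEnabledSetting(boolean isSnapshot, String propertyValue) {
        if (isSnapshot) {
            // Assumption: the property must stay unset in snapshot builds.
            if (propertyValue != null) {
                throw new IllegalArgumentException("feature flag must not be set in snapshot builds");
            }
            return true;
        }
        // Release builds: unset behaves the same as false.
        if (propertyValue == null || propertyValue.equals("false")) {
            return false;
        }
        if (propertyValue.equals("true")) {
            return true;
        }
        throw new IllegalArgumentException("expected true or false but was [" + propertyValue + "]");
    }
}
```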
2020-03-30 21:37:25 -04:00
Ross Wolf d11e977b1f
EQL: Use In from QL (#53244)
* EQL: Use In from QL
* EQL: Add more In tests
* EQL: Test In duplicates
* EQL: Add test for In mixed types
* EQL: Copy In translation to QL
* SQL: Use InComparisons from QL
* EQL: Remove boost checks from QueryFolderOkTests
* QL: Add TranslatorHandler.convert
2020-03-30 15:19:23 -06:00
Dimitris Athanasiou b4b54efa73
[7.x][ML] Hyperparameter names should match config (#54401) (#54435)
Java side of elastic/ml-cpp#1096

Backport of #54401
2020-03-30 23:32:40 +03:00
Ryan Ernst c9421594bf
Remove allowTrial flag in license checking (#54293)
The allowTrial flag is always true, since trial licenses act as though
everything is licensed. This commit removes the allowTrial flag in
license checking helper methods.
2020-03-30 12:22:38 -07:00
Nik Everett e58ad9fed3
Clean up how pipeline aggs check for multi-bucket (backport of #54161) (#54379)
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on aggregations that produce multiple
buckets. This adds support for those pipeline aggregations to
`geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`.

Support for these aggregations was missing because we used a marker
interface, `MultiBucketAggregationBuilder`, to mark compatible aggs, and
it was fairly easy to forget to implement the interface.

This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality`, which makes you return `NONE`,
`ONE`, or `MANY`. The `*_bucket` pipeline aggregations can check for
`MANY`. At this point `ONE` and `NONE` amount to about the same thing,
but I suspect that'll be a useful distinction when validating bucket sorts.

Closes #53215
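
Why an abstract method is harder to forget than a marker interface can be shown with a simplified model. All class names below are hypothetical; only the `NONE`/`ONE`/`MANY` values and the `bucketCardinality` method name come from the message above.

```java
// Hypothetical, simplified model of the change; not the Elasticsearch classes.
enum BucketCardinality { NONE, ONE, MANY }

abstract class AggBuilderSketch {
    final String name;

    AggBuilderSketch(String name) { this.name = name; }

    // Every concrete builder must answer; forgetting it is a compile error,
    // unlike forgetting to implement a marker interface.
    abstract BucketCardinality bucketCardinality();
}

final class TermsLikeBuilder extends AggBuilderSketch {
    TermsLikeBuilder(String name) { super(name); }

    @Override
    BucketCardinality bucketCardinality() { return BucketCardinality.MANY; }
}

final class SumLikeBuilder extends AggBuilderSketch {
    SumLikeBuilder(String name) { super(name); }

    @Override
    BucketCardinality bucketCardinality() { return BucketCardinality.NONE; }
}

final class BucketsPathValidatorSketch {
    // A *_bucket pipeline aggregation only accepts a multi-bucket parent.
    static void validate(AggBuilderSketch parent) {
        if (parent.bucketCardinality() != BucketCardinality.MANY) {
            throw new IllegalArgumentException(
                "buckets_path must reference a multi-bucket aggregation, but [" + parent.name + "] is not");
        }
    }
}
```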
2020-03-30 10:44:55 -04:00
Jason Tedor 39b3010578
Add node local storage deprecation check (#54383)
The node.local_storage setting has been deprecated and will be removed
in 8.0.0. This commit adds a deprecation check to 7.x.
2020-03-30 10:23:43 -04:00
Christoph Büscher 67b9b68c66 [Docs] Add HLRC Async Search API documentation (#54353)
Adds documentation and a corresponding test case containing typical API usage
for the Async Search API to the High Level Rest Client.
2020-03-30 15:37:22 +02:00
Przemysław Witek 3c604da7f6
[7.x] Create an annotation when a model snapshot is stored (#53783) (#54405) 2020-03-30 15:17:08 +02:00
Benjamin Trent 374e76d7cd
[Transform] fixing naming in HLRC and _cat to match API content (#54300) (#54408)
Fixing the naming of the HLRC values to match the ToXContent field names (i.e. the field names returned from an API call).

Also fixes the names in the _cat API as well.

closes #53946
2020-03-30 08:57:02 -04:00
Martijn van Groningen 4b4fbc160d
Refactor AliasOrIndex abstraction. (#54394)
Backport of #53982

In order to prepare the `AliasOrIndex` abstraction for the introduction of data streams,
the abstraction needs to be made more flexible, because currently it really can be only
an alias or an index.

* Renamed `AliasOrIndex` to `IndexAbstraction`.
* Introduced an `IndexAbstraction.Type` enum to indicate what an `IndexAbstraction` instance is.
* Replaced the `isAlias()` method that returns a boolean with the `getType()` method that returns the new Type enum.
* Moved `getWriteIndex()` up from the `IndexAbstraction.Alias` to the `IndexAbstraction` interface.
* Moved `getAliasName()` up from the `IndexAbstraction.Alias` to the `IndexAbstraction` interface and renamed it to `getName()`.
* Removed unnecessary casting to `IndexAbstraction.Alias` by just checking the `getType()` method.

Relates to #53100
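
The resulting shape (a `Type` enum, `getName()` and `getWriteIndex()` on the interface, no casts at call sites) can be sketched roughly as below. This is a simplified, hypothetical model: index names are plain Strings rather than metadata objects, and the enum constants are illustrative.

```java
import java.util.List;

// Hypothetical, simplified shape of the refactored abstraction; types are simplified.
interface IndexAbstractionSketch {

    enum Type { CONCRETE_INDEX, ALIAS }

    Type getType();

    String getName();

    // Name of the index that writes go to, or null if there is none.
    String getWriteIndex();

    final class Index implements IndexAbstractionSketch {
        private final String name;

        Index(String name) { this.name = name; }

        public Type getType() { return Type.CONCRETE_INDEX; }
        public String getName() { return name; }
        public String getWriteIndex() { return name; } // a concrete index is its own write target
    }

    final class Alias implements IndexAbstractionSketch {
        private final String name;
        private final List<String> indices;
        private final String writeIndex;

        Alias(String name, List<String> indices, String writeIndex) {
            this.name = name;
            this.indices = indices;
            this.writeIndex = writeIndex;
        }

        public Type getType() { return Type.ALIAS; }
        public String getName() { return name; }
        public String getWriteIndex() { return writeIndex; }
        List<String> getIndices() { return indices; }
    }
}

final class WriteTargetSketch {
    // Because getWriteIndex() lives on the interface, callers need no cast to Alias.
    static String resolveWriteIndex(IndexAbstractionSketch abstraction) {
        String target = abstraction.getWriteIndex();
        if (target == null) {
            throw new IllegalArgumentException(
                "no write index for " + abstraction.getType() + " [" + abstraction.getName() + "]");
        }
        return target;
    }
}
```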
2020-03-30 10:12:16 +02:00
Jason Tedor d2aced810d
Add assertion for get autoscaling decision API test
This commit adds a match assertion to the get autoscaling decision REST
test.
2020-03-29 14:36:38 -04:00
Jason Tedor 512a318b4b
Do not stash environment in security (#54372)
Today the security plugin stashes a copy of the environment in its
constructor, and uses the stashed copy to construct its components even
though it is provided with an environment to create these
components. What is more, the environment it creates in its constructor
is not fully initialized, as it does not have the final copy of the
settings, but the environment passed in while creating components
does. This commit removes that stashed copy of the environment.
2020-03-28 12:47:16 -04:00
Jason Tedor cf68ac8a2c
Do not stash environment in machine learning (#54371)
Today the machine learning plugin stashes a copy of the environment in
its constructor, and uses the stashed copy to construct its components
even though it is provided with an environment to create these
components. What is more, the environment it creates in its constructor
is not fully initialized, as it does not have the final copy of the
settings, but the environment passed in while creating components
does. This commit removes that stashed copy of the environment.
2020-03-28 12:46:16 -04:00
Tim Brooks 2ccddbfa88
Move transport decoding and aggregation to server (#54360)
Currently all of our transport protocol decoding and aggregation occurs
in the individual transport modules. This means that each implementation
(test, netty, nio) must implement this logic. Additionally, it means
that the entire message has been read from the network before the server
package receives it.

This commit creates a pipeline in server which can be passed arbitrary
bytes to handle. Internally, the pipeline will decode, decompress, and
aggregate the messages. Additionally, this allows us to run many
megabytes of data through the pipeline in tests to ensure that the
logic works.

This work will enable future work:

* Circuit breaking or backoff logic based on message type and byte
in the content aggregator.
* Sharing bytes with the application layer using the ref counted
releasable network bytes.
* Improved network monitoring based specifically on channels.

Finally, this fixes the bug where we do not circuit break on the correct
message size when compression is enabled.
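
The aggregation step (accept arbitrary byte chunks, emit only complete messages) can be illustrated with a toy frame aggregator. It is a sketch under a loud assumption: it uses a 4-byte big-endian length prefix, which is NOT the Elasticsearch transport wire format, and it omits decoding headers and decompression.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Hypothetical sketch: frames are a 4-byte big-endian length followed by the payload.
final class FrameAggregator {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final Consumer<byte[]> onMessage;

    FrameAggregator(Consumer<byte[]> onMessage) { this.onMessage = onMessage; }

    // Accepts arbitrary chunks, exactly as the network layer happens to deliver them.
    void handleBytes(byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        byte[] buf = pending.toByteArray();
        int offset = 0;
        while (buf.length - offset >= 4) {
            int length = ByteBuffer.wrap(buf, offset, 4).getInt();
            if (length < 0) {
                throw new IllegalStateException("negative frame length " + length);
            }
            if (buf.length - offset - 4 < length) {
                break; // frame not complete yet; wait for more bytes
            }
            byte[] message = new byte[length];
            System.arraycopy(buf, offset + 4, message, 0, length);
            onMessage.accept(message);
            offset += 4 + length;
        }
        // Keep only the unconsumed tail for the next call.
        pending.reset();
        pending.write(buf, offset, buf.length - offset);
    }
}
```

Feeding the bytes one at a time or all at once yields the same messages, which is the property the message above wants to exercise in tests.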
2020-03-27 14:13:10 -06:00
Stuart Tettemer 1630de4a42
Scripting: stats per context in nodes stats (#54008) (#54357)
Adds script cache stats to `_nodes/stats`.
If using the general cache:
```
      "script_cache": {
        "sum": {
          "compilations": 12,
          "cache_evictions": 9,
          "compilation_limit_triggered": 5
        }
      }

```
If using context caches:
```
      "script_cache": {
        "sum": {
          "compilations": 13,
          "cache_evictions": 9,
          "compilation_limit_triggered": 5
        },
        "contexts": [
          {
            "context": "aggregation_selector",
            "compilations": 8,
            "cache_evictions": 6,
            "compilation_limit_triggered": 3
          },
          {
            "context": "aggs",
            "compilations": 5,
            "cache_evictions": 3,
            "compilation_limit_triggered": 2
          }
        ]
      }
```
Backport of: 32f46f2
Refs: #50152
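
In the context-cache example, the `sum` numbers equal the field-by-field totals of the listed contexts (8 + 5 = 13, 6 + 3 = 9, 3 + 2 = 5). Assuming that is how `sum` is built when context caches are used, it can be modelled as below; the class is hypothetical and its fields simply mirror the JSON keys.

```java
import java.util.List;

// Hypothetical model of one entry in the JSON above; fields mirror the JSON keys.
final class ScriptCacheEntrySketch {
    final String context;
    final long compilations;
    final long cacheEvictions;
    final long compilationLimitTriggered;

    ScriptCacheEntrySketch(String context, long compilations, long cacheEvictions, long compilationLimitTriggered) {
        this.context = context;
        this.compilations = compilations;
        this.cacheEvictions = cacheEvictions;
        this.compilationLimitTriggered = compilationLimitTriggered;
    }

    // Builds the "sum" entry as the field-by-field total of the per-context entries.
    static ScriptCacheEntrySketch sum(List<ScriptCacheEntrySketch> contexts) {
        long compilations = 0;
        long evictions = 0;
        long limitTriggered = 0;
        for (ScriptCacheEntrySketch s : contexts) {
            compilations += s.compilations;
            evictions += s.cacheEvictions;
            limitTriggered += s.compilationLimitTriggered;
        }
        return new ScriptCacheEntrySketch("sum", compilations, evictions, limitTriggered);
    }
}
```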
2020-03-27 12:26:00 -06:00
Lee Hinman f2cc2b1127
[7.x] Add REST APIs for IndexTemplateV2Metadata CRUD (#54039) (#54347)
* Add REST APIs for IndexTemplateV2Metadata CRUD (#54039)

* Add REST APIs for IndexTemplateV2Metadata CRUD

This commit adds the get/put/delete APIs for interacting with the new v2 versions of index
templates.

These APIs are behind the existing `es.itv2_feature_flag_registered` system property feature flag.

Relates to #53101

* Add exceptions for HLRC tests

* Add skips for 7.x versions

* Use index_template instead of template_v2 in action names

* Add test for MetaDataIndexTemplateService.addIndexTemplateV2

* Move removal to static method and add test

* Add unit tests for request classes (implement hashCode & equals)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

* Fix compilation

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-03-27 10:47:22 -06:00
Christoph Büscher 0d17295601 [Docs] Minor fix for SubmitAsyncSearchRequest.keepOnCompletion javadoc (#54325)
The semantics and the default value for this parameter have changed, adapting
the javadoc accordingly.
2020-03-27 16:02:03 +01:00
Przemysław Witek 2eb079b67f
Add version guards around ML hidden indices settings (#54322) 2020-03-27 14:50:57 +01:00
Ioannis Kakavas 5983f6aceb
Mute testSpInitiatedSsoFailsForMalformedRequest (#54328) (#54339)
see #54285
2020-03-27 15:46:08 +02:00
Yannick Welsch 8126ad0ab1 Increase timeout on testUpdateAnalysisLeaderIndexSettings
Closes #54204
2020-03-27 13:41:47 +01:00
Przemysław Witek d40afc7871
[7.x] Do not fail Evaluate API when the actual and predicted fields' types differ (#54255) (#54319) 2020-03-27 10:05:19 +01:00
Jason Tedor c547fabb2b
Put CCR tasks on (data && remote cluster clients) (#54146)
Today we assign CCR persistent tasks to nodes with the data role. It
could be that the data node is not capable of connecting to remote
clusters, in which case the task will fail since it cannot connect to
the remote cluster with the leader shard. Instead, we need to assign
such tasks to nodes that are capable of connecting to remote
clusters. This commit addresses this by enabling such persistent tasks
to only be assigned to nodes that have the data role, and also have the
remote cluster client role.
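
The new assignment rule is a conjunction of two roles. A hypothetical filter is sketched below; `data` and `remote_cluster_client` are the 7.x role names, but the helper is illustrative and is not the CCR persistent task executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical assignment filter; only the role names come from 7.x.
final class CcrAssignmentSketch {

    // A node must both hold data and be able to connect to remote clusters.
    static boolean canRunFollowTask(Set<String> roles) {
        return roles.contains("data") && roles.contains("remote_cluster_client");
    }

    // Returns the ids of eligible nodes, in the map's iteration order.
    static List<String> eligibleNodes(Map<String, Set<String>> rolesByNodeId) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : rolesByNodeId.entrySet()) {
            if (canRunFollowTask(e.getValue())) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }
}
```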
2020-03-26 23:50:16 -04:00
Hendrik Muhs 4ecf9904d5 [Transform] Transform optimize date histogram (#54068)
Optimize transform for group_by on date_histogram by injecting an additional range query. This limits the number of search and index requests and avoids unnecessary updates. Only recent buckets get re-written.

fixes #54254
2020-03-26 21:39:50 +01:00
Gordon Brown 0d30b48613
Disallow negative TimeValues (#53913)
This commit causes negative TimeValues, other than -1 which is sometimes used as
a sentinel value, to be rejected during parsing.

Also introduces a hack to allow ILM to load policies which were written to the
cluster state with a negative min_age, treating those values as 0, which should
match the behavior of prior versions.
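
Both behaviours (strict rejection of negative values other than the `-1` sentinel, and lenient reading of previously stored negative `min_age` values as 0) can be sketched with a toy parser. It is hypothetical: it handles only the `ms`, `s`, `m`, `h` and `d` units and is not the real `TimeValue` parser.

```java
// Hypothetical sketch; supports only a subset of time units.
final class TimeValueSketch {

    // Strict parsing: negative values are rejected, except the -1 sentinel.
    static long parseMillis(String value) {
        String v = value.trim();
        if (v.equals("-1")) {
            return -1L; // sentinel, still accepted
        }
        long millis = parseRaw(v);
        if (millis < 0) {
            throw new IllegalArgumentException("failed to parse [" + value + "], negative values are not supported");
        }
        return millis;
    }

    // Lenient parsing for previously stored ILM min_age values: negatives become 0.
    static long parseStoredMinAgeMillis(String value) {
        return Math.max(0L, parseRaw(value.trim()));
    }

    private static long parseRaw(String v) {
        long factor;
        int suffixLength = 1;
        if (v.endsWith("ms")) {
            factor = 1L;
            suffixLength = 2; // check "ms" before "s" and "m"
        } else if (v.endsWith("s")) {
            factor = 1_000L;
        } else if (v.endsWith("m")) {
            factor = 60_000L;
        } else if (v.endsWith("h")) {
            factor = 3_600_000L;
        } else if (v.endsWith("d")) {
            factor = 86_400_000L;
        } else {
            throw new IllegalArgumentException("missing or unknown unit in [" + v + "]");
        }
        return Long.parseLong(v.substring(0, v.length() - suffixLength)) * factor;
    }
}
```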
2020-03-26 13:30:35 -06:00
William Brafford 14204f8381
Use set-based interface for NodesStatsRequest (#53637) (#54141)
The NodesStatsRequest class uses a set of strings for its internal
serialization. This commit updates the class's interface so that we
no longer use hard-coded getters and setters, but rather
methods that add strings directly. For example, the old way of
adding "os" metrics to a request would be to call request.os(true).
The new way of doing this is to call request.addMetric("os").

For the time being, the canonical list of metrics is an enum in
NodesStatsRequest. This will eventually be replaced with something
pluggable.
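
The switch from `request.os(true)` to `request.addMetric("os")` amounts to a set of strings validated against an enum. The sketch below is hypothetical: the class name and the three-metric subset are illustrative, not the real enum in NodesStatsRequest.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical set-based request; the metric list is an illustrative subset.
final class StatsRequestSketch {

    enum Metric {
        OS("os"), PROCESS("process"), JVM("jvm");

        final String metricName;

        Metric(String metricName) { this.metricName = metricName; }

        static boolean isValid(String name) {
            for (Metric m : values()) {
                if (m.metricName.equals(name)) {
                    return true;
                }
            }
            return false;
        }
    }

    private final Set<String> requestedMetrics = new TreeSet<>();

    // Replaces per-metric setters such as os(true) with one string-based method.
    StatsRequestSketch addMetric(String metric) {
        if (!Metric.isValid(metric)) {
            throw new IllegalArgumentException("unknown metric [" + metric + "]");
        }
        requestedMetrics.add(metric);
        return this;
    }

    Set<String> requestedMetrics() {
        return Collections.unmodifiableSet(requestedMetrics);
    }
}
```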
2020-03-26 14:41:49 -04:00
Dimitris Athanasiou 13368aae37
[7.x][ML] DF Analytics should always display operational stats (#54210) (#54290)
This commit populates the _stats API response with sensible "empty"
`data_counts` and `memory_usage` objects when the job itself
has not started reporting them.

Backport of #54210
2020-03-26 20:03:14 +02:00
Christoph Büscher da404bbce2
HLRC: Don't send defaults for SubmitAsyncSearchRequest (#54200) (#54266)
Currently we set the defaults for ccsMinimizeRoundtrips, preFilterShardSize and
requestCache on the HLRC SubmitAsyncSearchRequest in the constructor. This is no
longer needed since we now only send the parameters along with the rest request
that are supported (omitting e.g. ccsMinimizeRoundtrips) and the correct
defaults are set on the client side. This change removes setting and sending
these defaults where possible, leaving only the overwrite of batchedReduceSize
with a default value of 5, since the default used in the vanilla SearchRequest
is 512. However, we don't need to send this value along as a request parameter
if it's the default, since the correct one will be set on the receiving end if no
value is specified.
Also adding tests for RestSubmitAsyncSearchAction that check the correct
defaults are set when parameters are missing on the server side.

Backport of #54200
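
The "don't send defaults" idea means a parameter is added to the request only when it is set and differs from the server-side default. The sketch below is hypothetical: `batched_reduce_size` and `request_cache` are real search parameter names, but the helper is illustrative, and the value 5 is the async search default named above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side parameter builder; not the HLRC request converter.
final class SubmitParamsSketch {
    // Async search default for batched_reduce_size, per the message above.
    static final int DEFAULT_BATCHED_REDUCE_SIZE = 5;

    static Map<String, String> toParams(Integer batchedReduceSize, Boolean requestCache) {
        Map<String, String> params = new LinkedHashMap<>();
        // Omit the value when unset or equal to the default; the server applies it anyway.
        if (batchedReduceSize != null && batchedReduceSize != DEFAULT_BATCHED_REDUCE_SIZE) {
            params.put("batched_reduce_size", Integer.toString(batchedReduceSize));
        }
        // Only send request_cache when the caller set it explicitly.
        if (requestCache != null) {
            params.put("request_cache", requestCache.toString());
        }
        return params;
    }
}
```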
2020-03-26 19:01:17 +01:00
David Turner fc92bf4208 assertBusy in XPackRestIT#awaitCallApi (#54264)
Retries in this method were lost in #45794. This commit reinstates them.
2020-03-26 16:16:05 +00:00
Dimitris Athanasiou cc981fa377
[7.x][ML] Get ML filters size should default to 100 (#54207) (#54278)
When get filters is called without setting the `size`
parameter, only up to 10 filters are returned. However,
100 filters should be returned. This commit fixes this
and adds an integ test to guard it.

It seems this was accidentally broken in #39976.

Closes #54206

Backport of #54207
2020-03-26 17:51:43 +02:00
David Turner f48e8f31b9 AwaitsFix for #54180 2020-03-26 15:35:36 +00:00
David Turner ad3c96e250 AwaitsFix for #54093 2020-03-26 13:24:33 +00:00
David Turner 53e2fec93d AwaitsFix for #53612 2020-03-26 10:41:37 +00:00
Yannick Welsch 1ba6783780 Schedule commands in current thread context (#54187)
Changes ThreadPool's schedule method to run the scheduled task in the context of the thread
that scheduled the task.

This is the more sensible default for this method, and eliminates a range of bugs where the
current thread context is mistakenly dropped.

Closes #17143
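
The general technique (capture the caller's context when scheduling, restore it around the task, then put back the worker's own context) can be sketched with a plain `ThreadLocal`. This is a hypothetical stand-in for Elasticsearch's `ThreadContext`, not its implementation.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a single ThreadLocal value stands in for the thread context.
final class ContextSketch {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Captures the caller's context now and runs the task under it later,
    // restoring whatever context the executing thread had before.
    static Runnable preserveContext(Runnable task) {
        final String captured = CONTEXT.get();
        return () -> {
            String previous = CONTEXT.get();
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                CONTEXT.set(previous);
            }
        };
    }

    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, Runnable task, long delayMillis) {
        return scheduler.schedule(preserveContext(task), delayMillis, TimeUnit.MILLISECONDS);
    }
}
```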
2020-03-26 10:07:59 +01:00
Luca Cavanna ff269160af Async search: rename REST parameters (#54198)
This commit renames wait_for_completion to wait_for_completion_timeout in submit async search and get async search.
It also renames clean_on_completion to keep_on_completion and inverts its behaviour.

Closes #54069
2020-03-26 09:40:50 +01:00
Yang Wang 1afd510721
Check authentication type using enum instead of string (#54145) (#54246)
Avoid string comparison when we can use safer enums.
This refactor is a follow up for #52178.

Resolves: #52511
2020-03-26 15:45:10 +11:00
Tim Vernum 1fc518c25e
Improve stability of SamlServiceProviderIndexTests (#54241)
This test assumed cluster events would be processed quickly, which is
not always true.

Backport of: #54166
2020-03-26 13:07:42 +10:00
Ryan Ernst 5a5d6e9ef2
Invert license security disabled helper method (#54043) (#54239)
Xpack license state contains a helper method to determine whether
security is disabled due to license level defaults. Most code needs to
know whether security is enabled, not disabled, but this method exists
so that security being explicitly disabled can be distinguished from
the license level defaulting to disabled. However, in the case that security
is explicitly disabled, the handlers in question are never registered,
so any code that reaches this check already knows security was not
explicitly disabled, and thus we can share a single method to know
whether security is enabled.
2020-03-25 19:20:10 -07:00
Benjamin Trent 6d68cf809c
[Transform] Remove node.attr.transform.remote_connect and use new remote cluster client node role (#54217) (#54224)
With the addition of a formal role for nodes indicating remote cluster connection, the transform specific attribute `node.attr.transform.remote_connect` is no longer necessary.

closes https://github.com/elastic/elasticsearch/issues/54179
2020-03-25 16:29:02 -04:00
Nik Everett 8f40f1435a
Save a little space in agg tree (backport of #53730) (#54213)
This drops the "top level" pipeline aggregators from the aggregation
result tree, which should save a little memory and a few serialization
bytes. Perhaps more importantly, this provides a mechanism by which we
can remove *all* pipelines from the aggregation result tree. This will
save quite a bit of space when pipelines are deep in the tree.

Sadly, doing this isn't simple because of backwards compatibility. Nodes
before 7.7.0 *need* those pipelines. We provide them by passing
a `Supplier<PipelineTree>` into the root of the aggregation tree that we
only call if we need to serialize to a version before 7.7.0.

This solution works for cross cluster search because we always reduce
the aggregations in each remote cluster and then forward them back to
the coordinating node. It's quite possible that the coordinating node
needs the pipeline (say it is version 7.1.0) and the gateway node in the
remote cluster doesn't (version 7.7.0). In that case the data nodes
won't send the pipeline aggregations back to the gateway node.
Critically, the gateway node *will* send the pipeline aggregations back
to the coordinating node. This is all managed with that
`Supplier<PipelineTree>`, but *how* it is managed is a bit tricky.
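
The core of the `Supplier<PipelineTree>` trick is that the pipelines are only built when serializing to a node that needs them. The sketch below is hypothetical: versions are plain ints (70700 meaning 7.7.0) and Strings stand in for the aggregation and pipeline trees.

```java
import java.util.function.Supplier;

// Hypothetical sketch of lazily supplying pipeline aggregators only for old nodes.
final class LazyPipelineSketch {
    static final int V_7_7_0 = 70700;

    private final String aggregations;
    private final Supplier<String> pipelineTree; // only invoked when an old node needs it

    LazyPipelineSketch(String aggregations, Supplier<String> pipelineTree) {
        this.aggregations = aggregations;
        this.pipelineTree = pipelineTree;
    }

    String serializeFor(int targetVersion) {
        if (targetVersion >= V_7_7_0) {
            // 7.7.0+ nodes do not need the pipelines, so they are never built.
            return aggregations;
        }
        return aggregations + "|pipelines=" + pipelineTree.get();
    }
}
```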
2020-03-25 15:51:16 -04:00
Jason Tedor d14f170093
Add cluster.remote.connect to deprecation info API (#54142)
This setting was recently deprecated in favor of
node.remote_cluster_client. This commit adds this setting to the
deprecation info API.
2020-03-25 15:11:59 -04:00
Hendrik Muhs cb0ecafdd8 [Transform] fix transform failure case for percentiles and spa… (#54202)
Index null if percentiles could not be calculated due to sparse data.

fixes #54201
2020-03-25 19:28:51 +01:00
Martijn Laarman 077bf52acc transform.cat should live in the cat namespace. (#54196)
* transform.cat should live in the cat namespace.

This is similar to the ML cat APIs, which also live in the `cat` namespace.

Clients treat the `cat` namespace differently than other APIs (return
types, content types). Keeping transform.cat outside of it would introduce an exception to this rule.

* rename the specification file as well

(cherry picked from commit 0a98904b1a73a30bbaebc32bd16a238c8d03c329)
2020-03-25 18:16:01 +01:00
Mark Vieira 7728ccd920
Enforce consistent compile options across all projects (#54120)
(cherry picked from commit ddd068a7e92dc140774598664efdc15155ab05c2)
2020-03-25 08:24:21 -07:00
Dimitris Athanasiou ba09a778dc
[7.x][ML] Unmute classification cardinality integ test (#54165) (#54173)
Adjusts test to work for new cardinality limit.

Backport of #54165
2020-03-25 15:00:34 +02:00
Benjamin Trent ef05a4f416
[ML] relaxing parameters on stratified split test (#54127) (#54168)
Relaxing the error rate a bit on two of the tests.
Ran 1000s of times locally and never had a failure after these changes. 

closes https://github.com/elastic/elasticsearch/issues/54122
2020-03-25 08:06:15 -04:00
Tanguy Leroux 3a3930c7ec
Mute TooManyJobsIT.testCloseFailedJob on 7.x (#54163)
Relates #54162
2020-03-25 12:44:41 +01:00
Tanguy Leroux 4a2db4651e
Mute ReadActionsTests (#54153)
Relates #53340
2020-03-25 10:35:58 +01:00
Jason Tedor 381d7586e4
Introduce formal role for remote cluster client (#54138)
This commit introduces a formal role for identifying nodes that are
capable of making connections to remote clusters.

Relates #53924
2020-03-24 21:59:43 -04:00