Commit Graph

6309 Commits

Author SHA1 Message Date
Benjamin Trent e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Nhat Nguyen 8bea6b3711 Increase keep alive of point in time in async search tests (#62593)
Async search tests can take more than one minute due to the excessive trace logs. 
And the point in time in the tests can be expired the midway.

Closes #62451
2020-09-18 08:01:57 -04:00
Martijn van Groningen 4c949b0869
Adjust allowed warnings in data stream yaml test. (#62610) 2020-09-18 12:44:57 +02:00
Marios Trivyzas b072de4ce0
EQL: Disallow chained comparisons (#62567) (#62601)
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4 which can lead to confusing
results. Since such expressions don't make so much change for EQL
filters we disallow them in the parser to prevent unexpected results
from their bad usage.

Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax. (counter example would be MySQL which interprets them as we did
before with leftmost priority).

Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
2020-09-18 10:48:14 +02:00
Costin Leau 81f2f84177 EQL: Allow requests with size 0 (#62537)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.

Fix #62494

(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
2020-09-18 11:24:39 +03:00
Martijn van Groningen 5190b0961d
adjust skip reason 2020-09-18 10:10:00 +02:00
Adrien Grand 4de8579455
Upgrade to lucene-8.7.0-snapshot-830bd186a8d. (#62596) 2020-09-18 09:51:34 +02:00
Martijn van Groningen c83d8ce78e
Adjust skip version data stream test (#62597)
after #62527 was backported.
2020-09-18 09:28:16 +02:00
David Turner 7324ee1044 Remove unused upgrade actions (#62552)
These actions were almost completely removed in #40075 but a couple of
classes were left in place. This commit completes their removal.
2020-09-18 08:16:13 +01:00
Ignacio Vera 6a3d731be1
Only call reduce on a single InternalAggregation when needed (#62525) (#62594)
Adds a new abstract method in InternalAggregation that flags the framework if it needs to reduce on a single InternalAggregation.
2020-09-18 08:43:58 +02:00
Jake Landis 5b7246157f
[7.x] Fix projects that failed to build within Intellij (#62258) (#62408)
This commit address some build failures from the perspective of Intellij.
These changes include:
* changing an order of a dependency definition that seems to can cause Intellij build to fail.
* introduction of an abstract class out of the test source set (seems to be an issue sharing 
  classes cross projects with non-standard source sets. 
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
2020-09-17 17:45:12 -05:00
William Brafford b764f8977e
Copy Key Certs for javaRestTest (#62584) 2020-09-17 17:45:42 -04:00
Tim Vernum ab427534f7 [DOCS] Add warning about derived API keys to docs (#62351) 2020-09-17 13:46:25 -07:00
Dimitris Athanasiou 7118ff7976
[7.x][ML] Remove model snapshot legacy doc ids (#62434) (#62569)
Removes methods that were no longer used regarding version 5.4 doc ids of ModelState.

Also adds clean up of 5.4 model state and quantile docs in the daily maintenance.

Backport of #62434
2020-09-17 23:43:28 +03:00
Lee Hinman 9bb7ce0b22
[7.x] Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338) (#62557)
Backports the following commits to 7.x:

    Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338)
2020-09-17 13:29:23 -06:00
William Brafford 5a0dca2491
Deprecate xpack.eql.enabled setting and make it a no-op (#61375) (#62491)
* Deprecate xpack.eql.enabled and make it a no-op
* Remove uses of xpack.eql.enabled
2020-09-17 14:17:27 -04:00
Martijn van Groningen 5f643433c6
Prohibit the usage of create index api in namespaces managed by data stream templates (#62574)
Backport of #62527 to 7.x branch.

This commit adds validation that prohibits the creation of regular indices
in the namespace of templates with data streams enabled.

It shouldn't be possible to create ordinary indices when the name of the index
matches with a composable index template that enables data streams. Auto creation
has logic that creates data streams instead of regular indices. However validation
logic for the create index api was missing.
2020-09-17 20:10:42 +02:00
Jim Ferenczi df93b31b15
Faster sequential access for stored fields (#62509) (#62573)
Faster sequential access for stored fields

Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
So we make no effort to keep state in order to not decompress the same data
multiple times because two documents might be in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the document to retrieve might be
completely sequential:

Scrolls or normal search sorted by document id.
Queries on Runtime fields that extract from _source.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows to leverage the merge instances of stored fields readers that
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.

The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:

|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
Closes #62024
2020-09-17 19:58:18 +02:00
Andrei Dan 3753682877
Fix AllocationRoutedStep equals and hashcode (#62548) (#62559)
(cherry picked from commit 79039e16305c7fb71ee012e693219a0d2b77e97b)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-17 17:40:21 +01:00
Alan Woodward 5421a743a7 Move SearchLookup into FetchContext (#62549)
FetchSubPhase#getProcessor currently takes a SearchLookup parameter. This
however is only needed by a couple of subphases, and will almost certainly change in
future as we want to simplify how fetch phases retrieve values for individual hits.

To future-proof against further signature changes, this commit moves the SearchLookup
reference into FetchContext instead.
2020-09-17 17:39:02 +01:00
Dimitris Athanasiou f5c28e2054
[7.x][ML] Do not start data frame analytics when too many docs are analyzed (#62547) (#62558)
The data frame structure in c++ has a limit on 2^32 documents. This commit
adds a check that the number of documents involved in the analysis are
less than that and fails to start otherwise. That saves the cost of
reindexing when it is unnecessary.

Backport of #62547
2020-09-17 19:06:38 +03:00
Mark Vieira 7d36393b09
Disable composePull task on idp-fixture project due to error (#62510) 2020-09-17 08:55:47 -07:00
Nik Everett 4d272a2a00 Runtime fields: fix a test name (#62498)
This fixes the name of a test method so we actually run it. I broke it a
few commits ago without realizing it.
2020-09-17 11:17:44 -04:00
Lee Hinman 3081b3827b
[7.x] Add host.ip and observer.ip fields to the synthetics-*-* mappings (#62412) (#62553)
We need to ensure these are mapped as 'ip' instead of a keyword, even if they do end up not being
used.

Relates to #62193
2020-09-17 09:01:53 -06:00
Lee Hinman a636d106bf
[7.x] Remove data_frozen node role (tier) and frozen ILM phase (#62403) (#62465)
Backports the following commits to 7.x:

    Remove data_frozen node role (tier) and frozen ILM phase (#62403)
2020-09-17 08:58:07 -06:00
Andrei Dan fe1194d58f
[7.x] ILM migrate data between tiers (#61377) (#62536)
This adds ILM support for automatically migrating the managed
indices between data tiers.

This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).

(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-17 15:08:31 +01:00
Adam Locke db9dd9f7e1
[DOCS] Updating CCR setup to be more tutorial focused (#62256) (#62499)
* Applying some initial changes.

* Updating intro and screenshots.

* Removing unnecessary links, streamlining content, and adding GIF.

* Adding what's next section.

* Removing what's next.

* Minor edits.

* Apply suggestions from code review

Co-authored-by: debadair <debadair@elastic.co>

* Incorporating review feedback.

* Moving CCR user privileges to another page, plus more edits.

* Apply suggestions from code review

Co-authored-by: debadair <debadair@elastic.co>

* Incorporating more review feedback.

* Adding TESTSETUP to fix build errors.

* Update docs/reference/ccr/getting-started.asciidoc

Co-authored-by: debadair <debadair@elastic.co>

* Swapping GIF for mp4 hosted on web team CMS.

* Removing GIF in favor of mp4.

Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-09-17 09:50:21 -04:00
Luca Cavanna 3cf559bf9c Minor cleanup of runtime fields classes (#62531)
This commit addresses some compiler warnings in the runtime fields classes
2020-09-17 15:47:05 +02:00
David Kyle 417ce9396d
[ML] Add datafeed run time fields integration test (#62535) (#62538) 2020-09-17 13:41:07 +01:00
Fernando Briano d3bdff6bbf
Adds quotes to timestamp values in runtime_fields/40_date YAML test (#62526) 2020-09-17 12:09:10 +01:00
Christoph Büscher aba86d7d29
Fix condition in ILM step that cannot be met (#62377) (#62528)
ReplaceDataStreamBackingIndexStep#performAction seems to perform an equality
check on an original Index and the write indexes names, but because this
compares an Index instance to a String, the condition can never be met. This PR
changes this comparison.
2020-09-17 12:38:05 +02:00
Tanguy Leroux a39ddb8a0f
Remove ShardClusterSnapshotRestoreIT and fold test in DataStreamsSnapshotsIT (#62470) (#62523)
ShardClusterSnapshotRestoreIT is confusing as we already have a
very complete SharedClusterSnapshotRestoreIT test suite. This
commit removes ShardClusterSnapshotRestoreIT and folds its
unique test in DataStreamsSnapshotsIT.
2020-09-17 12:37:17 +02:00
Tim Vernum fe3bf86620
Fix RestrictedTrustManagerTests on Zulu8 (#62436)
Since #61857 we test using BCJSSE (Bouncy Castle SSL) when running on
Zulu8 because Azul have backported SSL changes from Java11 into their
Java8 JRE which prevents us from using Sun JSSE in FIPS mode.

BCJSSE uses different exception messages than Sun JSSE, so we needed
to update
RestrictedTrustManagerTests.testThatDelegateTrustManagerIsRespected
to reflect the fact that sometimes we might be receive BCJSSE error
messages on a Java8 JVM

Resolves: #62281
2020-09-17 20:00:42 +10:00
Marios Trivyzas abce04888f
EQL: Forbid usage of ['] for string literals (#62458) (#62496)
The usage of single quotes to wrap a string literal is forbidden
and an error encouraging the user to user double quotes is returned.

Tests are properly adjusted.

Relates to #61659

(cherry picked from commit 8be400b77370bf4cf68c89f492c2d235f3cce43c)
2020-09-17 11:29:09 +02:00
Dimitris Athanasiou d091c12e0c
[7.x] Generalize AsyncTwoPhaseIndexer first phase (#61739) (#62482)
Current implementations of the indexer are using aggregations.
Thus each search step executes a search action. However,
we can generalize that to allow for any action that returns a `SearchResponse`.
This commit abstracts the search phase from the search action.

Backport of #61739
2020-09-17 11:57:22 +03:00
Adrien Grand 9a8225bbc1
Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450) (#62476)
This new snapshot contains the following JIRAs that we're interested in:
 - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
 - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
2020-09-17 10:28:20 +02:00
Luca Cavanna 26388fe22e
Runtime fields: rename fielddata and mapped field type classes (#62483)
With this commit we rename all of the fielddata, doc_values and mapped field type classes for runtime fields to not start with the Script prefix but rather their runtime type (e.g. Boolean) and only then Script
2020-09-17 09:14:30 +02:00
Ignacio Vera 2d3ca9c155
Introduce a sparse HyperLogLogPlusPlus class for cloning and serializing low cardinality buckets (#62480) (#62520)
Reduces the memory footprint of an HLL++ structure that uses Linear counting when cloning or deserialising the data structure.
2020-09-17 08:54:50 +02:00
Costin Leau ceaf96061c EQL: Fetch sequence documents using Point-In-Time (#62469)
To preserve the PIT semantics, the retrieval of results has moved from
using multi-get to using an idsQuery.

(cherry picked from commit 1c2362fcf2be62ce568b3772924abce7331ef23c)
2020-09-17 00:12:19 +03:00
Luca Cavanna 1e352fdb7f
Runtime fields: rename script classes (#62448)
With this commit we rename the script classes used for each mapped field type used for runtime fields. The new naming is a shorter version of the previous one: from e.g. BooleanScriptFieldScrip to BooleanScript . We also move such classes to the existing mapper package.
2020-09-16 18:00:06 +02:00
James Rodewig 6394629b99
[DOCS] Document `toJSON` function for role query (#62257) (#62462) 2020-09-16 10:38:56 -04:00
Christoph Büscher f8634e5bea Muting SimpleSecurityNetty4ServerTransportTests 2020-09-16 15:14:08 +02:00
Benjamin Trent 341eeae6e7
[ML] fixes testWatchdog test verifying matcher is interrupted on timeout (#62391) (#62447)
Constructing the timout checker FIRST and THEN registering the watcher allows the test to have a race condition.

The timeout value could be reached BEFORE the matcher is added. To prevent the matcher never being interrupted, a new timedOut value is added to the watcher thread entry. Then when a new matcher is registered, if the thread was previously timedout, we interrupt the matcher immediately.

closes #48861
2020-09-16 09:13:22 -04:00
Lyudmila Fokina 167172a057
Update authc failure headers on license change (#61734) (#62442)
Backport of #61734
2020-09-16 14:37:03 +02:00
Benjamin Trent 8d89a28126
[ML] unmuting test for testTooManyPartitions memory check on windows (#62393) (#62405)
This commit unmutes the windows check for testTooManyPartitions test.

The assertion has since changed to include a soft_limit check.

This coupled with changes over the past years means the test should be enabled again.

related to: #32033
2020-09-16 07:03:10 -04:00
Christoph Büscher 6a016fb755 Muting LogstashSystemIndexIT.testPipelineCRUD 2020-09-16 11:04:41 +02:00
Hendrik Muhs 8566e9e3e7 [Transform] Make pivot validation sub-agg aware (#62381)
With the addition of sub aggregations like filter, the validation could fail if 2 sub aggs use the
same output name. This change makes validation sub-agg aware.

fixes #57814
2020-09-16 07:55:58 +02:00
Yang Wang a11dfbe031
Oidc additional client auth types (#58708) (#62289)
The OpenID Connect specification defines a number of ways for a
client (RP) to authenticate itself to the OP when accessing the
Token Endpoint. We currently only support `client_secret_basic`.

This change introduces support for 2 additional authentication
methods, namely `client_secret_post` (where the client credentials
are passed in the body of the POST request to the OP) and
`client_secret_jwt` where the client constructs a JWT and signs
it using the the client secret as a key.

Support for the above, and especially `client_secret_jwt` in our
integration tests meant that the OP we use ( Connect2id server )
should be able to validate the JWT that we send it from the RP.
Since we run the OP in docker and it listens on an ephemeral port
we would have no way of knowing the port so that we can configure
the ES running via the testcluster to know the "correct" Token
Endpoint, and even if we did, this would not be the Token Endpoint
URL that the OP would think it listens on. To alleviate this, we
run an ES single node cluster in docker, alongside the OP so that
we can configured it with the correct hostname and port within
the docker network.

Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
2020-09-16 14:29:09 +10:00
Nik Everett 24a24d050a
Implement fields fetch for runtime fields (backport of #61995) (#62416)
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332
2020-09-15 20:24:10 -04:00
Nik Everett e5ad3a41f1
Check for runtime field loops in queries (backport of #61927) (#62421)
We were checking for loops in queries before, but we had an "off by one"
error where we wouldn't notice the "top level" runtime field when
detecting a loop. So the error message would be wrong.

I also caught a few bugs with query generation caused by missing
`@Override` annotations and fixed a few of them. There is a bug with
`regexp` queries with match options that I'm not fixing in this PR but
will get to later.

Relates to #59332
2020-09-15 17:24:19 -04:00
Costin Leau b2e85d5639 SQL: Do not resolve self-referencing aliases (#62382)
Prevent the analyzer for trying to resolve aliases on expressions that
reference themselves (or fields within themselves) as that causes
infinite recursion.

Fix #62296

(cherry picked from commit 021d27815b03e92e02859bc9c0c8eec78f30c72e)
2020-09-15 20:53:28 +03:00
Armin Braun 9ac4ee9c44
Increase Flaky Timeout in testIlmHistoryIndexCanRollover (#62353) (#62402)
This busy assert easily takes about 5s on a very fast work station
so the default of 10s is not sufficient here at all.
2020-09-15 19:50:45 +02:00
Nik Everett 771a8893a6
Add more debugging information for cardinality agg (#62317) (#62397)
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
   out if we're using the optimization for segment ordinals. It adds a few
   more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
   non-trivial for some aggregations, like cardinality.
2020-09-15 13:21:11 -04:00
William Brafford af64e46065
Add logstash system index APIs (#53350) (#62347)
We want Logstash indices to be system indices, but the logstash
service will still need to be able to manage its indices. This PR
adds special system index APIs to the logstash plugin so that
logstash can manage its pipelines without direct access to the
underlying indices.

* Add logstash module with dedicated logstash APIs
* merge with x-pack plugin
* add system index access allowance
* Break out serialization tests into distinct classes
* Log failures for partial multiget failure
* Move LogstashSystemIndexIT to javaRestTest task

Co-authored-by: William Brafford <william.brafford@elastic.co>

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
2020-09-15 12:42:14 -04:00
Adrien Grand 6db8afefc2
Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376)
Upgrade to a new Lucene snapshot that (at least partially) addresses the
indexing rate regression when index sorting is enabled.

Backport of #62334.
2020-09-15 17:48:07 +02:00
Fernando Briano 7dd073c243
Wraps timestamp values in quotes in runtime fields YAML tests. (#62155) 2020-09-15 15:24:57 +01:00
Albert Zaharovits aeed1c05b0
Ensure authz operation overrides transient authz headers (#61621)
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.

Co-authored-by: Tim Vernum tim@adjective.org
2020-09-15 16:37:38 +03:00
Armin Braun 76f56c1264
Add Missing NamedWritable Registration for ExecuteEnrichPolicyStatus (#62364) (#62374)
This was missing and caused nodes to drop out of the cluster on serialization failures
when ever one tried to get an enrich policy task by name.
The test in here is a little dirty but I figured it would be nice to have an actual reproducer
for the issue and I couldn't find any infrastructure to nicely time the tasks so I put this on
top of existing test infra.
2020-09-15 15:24:15 +02:00
Costin Leau 03d2395183 EQL: Use Point In Time inside sequences (#62276)
Use the newly introduced PIT API to have a consistent view of the data
while doing sequence matching, which involves multiple calls, aka
repeatable reads and thus avoid race conditions or any in-flight updates
on the data.

(cherry picked from commit daa72fc3c71fd36afb55278021ff6bbc591ef148)
2020-09-15 15:40:03 +03:00
Martijn van Groningen 3ed60df59d
Re-enable resolve index multi cluster test (#62365)
Backport of #62361 to 7.x branch.

This test was fine and shouldn't have been muted.
The test case class should have preserved data streams as part of #62205

Closes #62210
2020-09-15 13:26:52 +02:00
David Kyle 717259a049 Revert "[ML] Add debug logging of notification messages to upgrade test (#62342)"
This reverts commit c50899dd8f.
2020-09-15 11:07:23 +01:00
David Kyle c50899dd8f
[ML] Add debug logging of notification messages to upgrade test (#62342)
For #61908
2020-09-15 08:24:13 +01:00
Lee Hinman 6b2af30a62
[7.x] Add "synthetics-*-*" templates for synthetics fleet data (#62193) (#62346)
* Add "synthetics-*-*" templates for synthetics fleet data

For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.

This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
 #56709 and #57629.

Resolves #61665
2020-09-14 17:14:34 -06:00
Julie Tibshirani 4a19bdb2ea
Support the 'fields' option in inner_hits and top_hits. (#62337)
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation

Addresses #61949.
2020-09-14 11:51:45 -07:00
David Roberts 3d5c13f559
[ML] Add an assertion on annotations mappings to upgrade test (#62331)
The annotations index is not covered by the comparison between
mappings and templates, as it does not use an index template.

This commit adds an assertion on annotations index mappings
that will fail if the mappings are not upgraded as expected.

Backport of #62325
2020-09-14 18:46:35 +01:00
David Roberts e4275f3749 [ML] Use utility thread pool for memory estimation (#62314)
The job comms thread pool is intended for the long-running job
processes that do anomaly detection or data frame analytics and
count towards job count and memory limits.

This commit moves the short-lived memory estimation processes
to the ML utility thread pool.

Although this doesn't matter in most cases, at the limits of
scale it could mean that memory estimations would get in the way
of starting jobs, or would queue up for an excessive period of
time while waiting for jobs to finish.
2020-09-14 16:47:12 +01:00
Lee Hinman bf9651c635
[7.x] Add "content" tier as new "data_content" role (#62247) (#62322)
Similar to the work in #60994 where we introduced the `data_hot`, `data_warm`, etc node roles. This
introduces a new `data_content` node role to be used for the Content tier.

Currently this tier is not used anywhere, but subsequent work will use this tier.

Relates to #60848
2020-09-14 09:42:57 -06:00
Benjamin Trent 13c193a9fc
[Enrich] add logging for when there are search/bulk failures on _execute (#62313) (#62320)
When calling `_execute` there is a chance that there will be bulk indexing failures 
or search failures. 

These will result in the call failing overall. But, no information is provided for troubleshooting the failure.

This commit adds logging to indicate the number of failures, and new debug level logging so that
failure details can be determined if necessary.

closes https://github.com/elastic/elasticsearch/issues/60491
2020-09-14 11:20:13 -04:00
Armin Braun 95766da345
Save Some Allocations when Working with ClusterState (#62060) (#62303)
Just a number of obvious spots where we were allocating
duplicate empty structures or otherwise inefficient that I
found while investigating snapshot cluster state update performance.
2020-09-14 15:09:54 +02:00
Tanguy Leroux 9e38dd0254
Deprecate Repository Stats API (#62297) (#62308)
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.

This API is now superseded by the Repositories Metering API.
2020-09-14 14:57:38 +02:00
David Roberts d8288526d9 [ML] Add null checks for C++ log handler (#62238)
It has been observed that if the normalizer process fails
to connect to the JVM then this causes a null pointer
exception as the JVM tries to close the native process
object.  The accessors and close methods of the native
process class that access the C++ log handler should not
assume that it connected correctly.
2020-09-14 11:28:26 +01:00
Martijn van Groningen c88f4174ec
Fix resolve index data streams yaml test. (#62221)
Closes #62190
2020-09-14 08:43:58 +02:00
Nhat Nguyen 7779c1f703 Ensure to release async search iterator in tests
We need to close an async search response iterator to release
the related point in time if the test uses pit.
2020-09-12 12:04:10 -04:00
Martijn van Groningen 1bb094a27b
Return 404 when deleting a non existing data stream (#62224)
Backport of #62059 to 7.x branch.

Return a 404 http status code when attempting to delete a non existing data stream.
However only return a 404 when targeting a data stream without any wildcards.

Closes #62022
2020-09-11 15:36:05 +02:00
Nhat Nguyen b118697368
Adjust BWC rest version for point in time (#62264)
Relates #61872
2020-09-11 08:54:11 -04:00
Luca Cavanna b5e1e652c1 Remove unused import 2020-09-11 10:19:01 +02:00
Luca Cavanna 3d3a1b4bc2 Tweak OpenPointInTimeRequest createTask
This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before point in time APIs were backported.
2020-09-11 10:06:35 +02:00
Nhat Nguyen aafb2cb812 Support point in time cross cluster search (#61827)
This commit integrates point in time into cross cluster search.

Relates #61062
Closes #61790
2020-09-10 19:25:48 -04:00
Nhat Nguyen 808c8689ac Always include the matching node when resolving point in time (#61658)
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.

Closes #61627
2020-09-10 19:25:48 -04:00
Nhat Nguyen 035f0638f4 Support point in time in async_search (#61560)
This commit integrates point in time into async search and
ensures that it works correctly with security enabled.

Relates #61062
2020-09-10 19:25:48 -04:00
Nhat Nguyen 2eb1e8bc84 Make keep alive of point in time optional in search (#62184)
A search request should not be required to extend the keep_alive of a point in time. 
This change makes that parameter optional.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 4d528e91a1 Ensure validation of the reader context is executed first (#61831)
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.

Relates #61446

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2020-09-10 19:25:48 -04:00
Nhat Nguyen 3d69b5c41e Introduce point in time APIs in x-pack basic (#61062)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
2020-09-10 19:25:47 -04:00
Nhat Nguyen 87c889f9c9 CCR should retry on CircuitBreakingException (#62013)
CCR shard follow task can hit CircuitBreakingException on the leader 
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
2020-09-10 17:23:47 -04:00
Nik Everett ac23380560 Fix some query methods in runtime fields
We were missing a few `@Override` annotations in runtime fields which
let us drift from the methods we were supposed to override. Oops. This
adds them and links the methods.
2020-09-10 17:06:05 -04:00
Luca Cavanna 39e59d6edf Share more query execution code for runtime fields (#62229)
For runtime fields we have written quite some lucene queries that work against runtime values that are the result of the execution of the different script contexts that runtime fields support.

The all (but one) share the same main logic: use a two phase iterator, iterate over all documents, and decide whether the current doc matches or not based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.
2020-09-10 20:27:49 +02:00
Luca Cavanna cd9774d8cb Runtime fields: rename emitValue function to emit (#62191)
We decided to shorten the emitValue function to emit, given that emit is self-explanatory.

Relates to #59332
2020-09-10 20:27:49 +02:00
Adam Locke a8e3796d96
[DOCS] Adding link from SAML docs to ADFS blog. (#62006) (#62232) 2020-09-10 11:15:15 -04:00
Tanguy Leroux 42f5d38d9b
Remove REST APIs documentation for experimental Searchable Snapshot APIs (#62217) (#62231)
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats

These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
2020-09-10 16:51:28 +02:00
Ignacio Vera c8981ea93d
upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213) (#62222) 2020-09-10 16:23:18 +02:00
Martijn van Groningen 81b89fe3ba
Change yaml test suite testcase to automatically delete all data streams after each yaml test (#62214)
Backporting #62205 to 7.x branch.

This is similar to what happens for indices. Initially we decided to let each test cleanup the
data streams it created.

The reason behind this was that client yaml test runners would need to be modified to do this too and
because data steams were new, we waited with that and let each test cleanup the data stream it created.
However we sometimes have very hard to debug test failures, because many tests fail because another test
failed mid way and didn't clean up the data streams it created. Given that and data streams exist in
the code base for a while now, we should automatically delete all data streams after each yaml test.

Relates to #62190

* preserve data streams for rolling upgrade yaml tests
2020-09-10 15:10:57 +02:00
Martijn van Groningen f87fc67592
Mute resolve index data stream tests. (#62211)
Relates to #62190 and #62210
2020-09-10 13:13:50 +02:00
Hendrik Muhs ab259d37a7 mute mapping upgrade check for the .ml-notifications, as a global AwaitsFix
would mute everything I allow myself to use a comment
2020-09-10 12:24:57 +02:00
Andrei Stefan cce6da7d52
EQL: add the wildcard field type to the IT tests (#62166) (#62200)
* Add wildcard field type as an option for randomized testing of IT queries

(cherry picked from commit 87b14c409c180c4d53c3c61a30bd69f1b81a2823)
2020-09-10 12:36:36 +03:00
Yannick Welsch e3feafc1e9 Enable searchable snapshots in release builds (#62201)
Enables searchable snapshot functionality not only in snapshot, but also release builds.
2020-09-10 11:20:12 +02:00
David Roberts 969a1c558b [ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.

This is a small building block towards fixing #55616
2020-09-10 09:33:42 +01:00
Jake Landis c6c6596623
[7.x] Configure internalClusterTest for snapshot_feature_enabled flag (#62185) (#62189)
closes #62039
2020-09-09 15:41:43 -05:00
Jake Landis d8dad9ab2c
[7.x] Remove integTest task from PluginBuildPlugin (#61879) (#62135)
This commit removes `integTest` task from all es-plugins.  
Most relevant projects have been converted to use yamlRestTest, javaRestTest, 
or internalClusterTest in prior PRs. 

A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest 
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)

related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
2020-09-09 14:25:41 -05:00
Nik Everett 6d2cab9437
Stop runtime script from emitting too many values (#61938) (#62186)
This prevent `keyword` valued runtime scripts from emitting too many
values or values that take up too much space. Without this you can put
allocate a ton of memory with the script by sticking it into a tight
loop. Painless has some protections against this but:
1. I don't want to rely on them out of sheer paranoia
2. They don't really kick in when the script uses callbacks like we do
   anyway.

Relates to #59332
2020-09-09 14:47:24 -04:00
Benjamin Trent e181e24d48
[ML] only persist progress if it has changed (#62123) (#62180)
* [ML] only persist progress if it has changed

We already search for the previously stored progress document.

For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and current progress
If the progress has changed, persistence continues as normal.
2020-09-09 12:04:09 -04:00