Commit Graph

5324 Commits

Author SHA1 Message Date
David Roberts 2c04685b81
[ML] Ensure config index mappings are up-to-date before updating configs (#58938)
We already had code to ensure the config index mappings were
up-to-date before creating a new config.  However, it's also
possible that an update to a config could add the latest
settings that require the latest mappings to index correctly.

This change checks that the latest config index mappings are
in place in the 3 update actions in the same way as the checks
are done in the 3 put actions.

Backport of #58916
2020-07-02 18:55:19 +01:00
Dan Hermann c988afdc15
Data stream support for migrations deprecations info API 2020-07-02 11:16:22 -05:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Tanguy Leroux 6aa669c8bb Fix SearchableSnapshotDirectoryStatsTests (#58912)
Similar to #58847 but in a different test. The failure never
reproduced locally but occurs from time to time on CI.
2020-07-02 16:39:26 +02:00
Dan Hermann b78bfa01f6
[7.x] Data stream support for graph explore API 2020-07-02 08:19:03 -05:00
David Kyle d6643bfc7f Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902)"
The test was fixed in #58847

This reverts commit bb96c910a5.
2020-07-02 13:21:05 +01:00
David Kyle bb96c910a5 Mute FsSearchableSnapshotsIT testClearCache (#58902)
For #58901
2020-07-02 12:58:28 +01:00
Costin Leau 965f77fa44 EQL: Introduce sequence internal paging (#58859)
Refactor sequence matching classes in order to decouple querying from
results consumption (and matching).
Rename some classes to better convey their intent.

Introduce internal pagination of sequence algorithm, that is getting the
data in slices and, if needed, moving forward in order to find more
matches until either the dataset is consumed or the number of results
desired is found.

(cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)
2020-07-02 13:44:21 +03:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce the number of round-trips to the security index when building application privilege descriptors.

Privilege retrieval in NativePrivilegeStore is changed to always fetch all privilege documents for a given application. The caching is applied in all places, including the "get privilege" and "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Benjamin Trent c64e283dbf
[7.x] [ML] handles compressed model stream from native process (#58009) (#58836)
* [ML] handles compressed model stream from native process (#58009)

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

1. ModelSizeInfo which contains model size information 
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.


Native side change: https://github.com/elastic/ml-cpp/pull/1349
2020-07-01 15:14:31 -04:00
Mark Vieira 1fcaec7dfc
Ignore test seed used in test system properties (#58789) 2020-07-01 11:52:22 -07:00
Nhat Nguyen f63cbad629
Ensure CCR partial reads never overuse buffer (#58620)
When the documents are large, a follower can receive a partial response 
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.

Backport of #58620
2020-07-01 13:23:28 -04:00
Tanguy Leroux ec4843f4df Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847)
Since #58728 part of searchable snapshot shard files are written in cache 
in an asynchronous manner in a dedicated thread pool. It means that even 
if a search query is successful and returns, there are still more bytes to 
write in the cached files on disk.

On CI this can be slow; if we want to check that the cached_bytes_written 
has changed we need to check multiple times to give some time for the 
cached data to be effectively written.
2020-07-01 18:01:00 +02:00
Benjamin Trent c768467155
Muting flakey test (#58855) (#58856) 2020-07-01 11:54:43 -04:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.
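
A minimal sketch of the split described above, under stated assumptions: `LicenseStateSketch`, `lastUsedTime` and the injected clock are hypothetical names for illustration and are not the real license state class. Recording usage only when the feature is allowed is also an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the license state; not the actual Elasticsearch class.
final class LicenseStateSketch {
    private final Set<String> allowedFeatures;
    private final LongSupplier clock;
    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    LicenseStateSketch(Set<String> allowedFeatures, LongSupplier clock) {
        this.allowedFeatures = allowedFeatures;
        this.clock = clock;
    }

    // Telemetry-style query: only reports whether the feature is allowed.
    boolean isAllowed(String feature) {
        return allowedFeatures.contains(feature);
    }

    // Usage-style check: same answer, but also records the last usage time.
    // Assumption: usage is recorded only when the feature is allowed.
    boolean checkFeature(String feature) {
        boolean allowed = isAllowed(feature);
        if (allowed) {
            lastUsed.put(feature, clock.getAsLong());
        }
        return allowed;
    }

    // Returns null when the feature has never been used.
    Long lastUsedTime(String feature) {
        return lastUsed.get(feature);
    }
}
```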

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Tanguy Leroux d35e8f45da
Allow read operations to be executed without waiting for full range to be written in cache (#58728) (#58829)
This commit changes CacheFile and CachedBlobContainerIndexInput so that
 the read operations made by these classes are now progressively executed 
and do not wait for full range to be written in cache. It relies on the change 
introduced in #58477 and it is the last change extracted from #58164.

Relates #58164
2020-07-01 15:38:17 +02:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Andrei Stefan b904a60275
EQL: Add case handling to stringContains (#58762) (#58813)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 1a58776d3aa563beb364b067a1db46497122306f)
2020-07-01 13:51:45 +03:00
Andrei Stefan 470bcee5bf
EQL: Integrate TOML tests for function folding (#58748) (#58812)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit e9b1fa58cf8d510a4b4afb14f66b0d5f9c603ebb)
2020-07-01 13:50:54 +03:00
Przemysław Witek 2638809cba
Mute failing test DataFrameAnalyticsConfigProviderIT.testUpdate_UpdateCannotBeAppliedWhenTaskIsRunning (#58821) 2020-07-01 12:28:23 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb` (which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
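
The difference between the two estimates can be illustrated with a small arithmetic sketch. The class and method names are hypothetical, and deriving the remaining growth as "expected final size minus current store size" is an assumption about how the measure might be computed from the ongoing recovery.

```java
// Hypothetical illustration of the two ways to account for an incoming shard.
final class DiskReservationSketch {

    // Assumption: remaining growth is the expected final size minus what is already on disk.
    static long remainingGrowth(long expectedFinalSize, long currentStoreSize) {
        return Math.max(0L, expectedFinalSize - currentStoreSize);
    }

    // Conservative estimate: subtract the whole shard size from the free space.
    static long freeAfterConservative(long freeBytes, long expectedFinalSize) {
        return freeBytes - expectedFinalSize;
    }

    // More accurate estimate: subtract only the bytes the store is still expected to grow by.
    static long freeAfterRemaining(long freeBytes, long expectedFinalSize, long currentStoreSize) {
        return freeBytes - remainingGrowth(expectedFinalSize, currentStoreSize);
    }
}
```

For example, with 100 units free and a 40-unit shard that has already written 36 units, the conservative estimate leaves 60 units while the recovery-aware estimate leaves 96.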

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
David Kyle 27d52d4d23
Remove the Model interface (#58754) (#58803)
The Model interface was implemented by just one class and did not 
contribute to making the code more understandable
2020-07-01 09:57:02 +01:00
Dario Gieselaar 417f7062c5
[7.x] Add read privileges for annotations for apm_user (#58530) (#58781)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-01 09:04:57 +02:00
Yang Wang 3d49e62960
Support handling LogoutResponse from SAML idP (#56316) (#58792)
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.

The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.

This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
2020-07-01 16:47:27 +10:00
Lee Hinman 74a78b3a7b
Mute AzureSearchableSnapshotsIT (#58775)
Relates to #58260
2020-06-30 13:30:51 -06:00
Dan Hermann 22806c943d
Data stream support for ILM remove policy API (#58595) (#58770) 2020-06-30 14:03:19 -05:00
Benjamin Trent a2331bc9d4
[Transform] fix bug in supporting boolean values in pivot (#58741) (#58760)
Since the underlying composite aggs support boolean mapped values for terms, transforms should also support them

closes #58697
2020-06-30 13:47:58 -04:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
Julie Tibshirani ab65a57d70
Merge mappings for composable index templates (#58709)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
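
The merge principles above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `MappingMergeSketch` and its methods are hypothetical, the rule used to recognise a field definition (a map whose `type` is a string other than `object` or `nested`) is an assumption, and the checks that reject structural changes are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: recursive merge of mapping sections represented as nested maps.
final class MappingMergeSketch {

    // Assumption: scalars, and maps with a string "type" other than object/nested, are leaves.
    static boolean isLeaf(Object value) {
        if (!(value instanceof Map)) {
            return true;
        }
        Object type = ((Map<?, ?>) value).get("type");
        return type instanceof String && !"object".equals(type) && !"nested".equals(type);
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> update) {
        Map<String, Object> merged = new HashMap<>(base);
        for (Map.Entry<String, Object> entry : update.entrySet()) {
            Object existing = merged.get(entry.getKey());
            Object incoming = entry.getValue();
            if (existing == null || isLeaf(existing) || isLeaf(incoming)) {
                // Leaf components overwrite whatever was there before.
                merged.put(entry.getKey(), incoming);
            } else {
                // Sections such as "properties" or object mappings are merged recursively.
                merged.put(entry.getKey(),
                    merge((Map<String, Object>) existing, (Map<String, Object>) incoming));
            }
        }
        return merged;
    }
}
```

Applying `merge` once per component template, then for the index template, then for the index request would follow the order listed above.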

Relates to #53101.
2020-06-30 08:01:37 -07:00
David Roberts d9e0e0bf95
[ML] Pass through the stop-on-warn setting for categorization jobs (#58738)
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.

Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356

Backport of #58632
2020-06-30 15:17:04 +01:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have a built jar file
  - jar file is now an explicit input of the task and gradle will ensure it's properly built

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Benjamin Trent def5550df3
[ML] fix ml inference stats tests (#58690) (#58729) 2020-06-30 07:53:33 -04:00
Przemyslaw Gomulka 3923a10165
Exclude SystemV timezones from randomZone method (#58549) (#58655)
RandomZone test method returns a ZoneId from the set of ids supported by
java. The only difference between joda and java supported timezones are
SystemV* timezones.
These should be excluded from randomZone method as they would break
testing. They also do not bring much confidence when used in testing as
I suspect they are rarely used.
That exclude should be removed for simplification once joda support is removed.
2020-06-30 12:45:53 +02:00
Andrei Stefan 7b80ea7218
Fix release tests (#58713) (#58725)
(cherry picked from commit 7816c100612168bf46595c4813fe374bca2e7259)
2020-06-30 13:42:32 +03:00
Tanguy Leroux 4e03633a66
Differentiate base paths for searchable snapshots QA tests (#58664) (#58714)
This commit adds the BuildParams.testSeed to the repository base paths used 
in searchable snapshots QA tests. For S3 and GCS the test seed is added for 
coherency sake with other integration tests while it's required for Azure as 
Azure 3rd party tests are executed on CI simultaneously for regular and 
SAS token accounts.

Closes #58260
2020-06-30 10:18:33 +02:00
Tim Vernum dcc5a06dec
Display enterprise license as platinum in /_xpack (#58217)
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.

However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.

This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.
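
As a rough sketch of the compatibility rule described above (the class and method names are hypothetical; only the `accept_enterprise` parameter and the license type names come from the commit message):

```java
// Hypothetical helper showing the enterprise-as-platinum compatibility rule.
final class LicenseTypeSketch {

    // acceptEnterprise mirrors the accept_enterprise request parameter.
    static String displayedType(String actualType, boolean acceptEnterprise) {
        if ("enterprise".equals(actualType) && !acceptEnterprise) {
            return "platinum"; // old clients only understand platinum
        }
        return actualType;
    }
}
```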

This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
2020-06-30 16:42:28 +10:00
Costin Leau 3a546f1f51 EQL: Introduce support for sequence maxspan (#58635)
EQL sequences can now specify a maximum time allowed for their span
(computed between the first and the last matching event).
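
A minimal sketch of the span check, assuming millisecond timestamps and an inclusive bound (both are assumptions; `MaxSpanSketch` is a hypothetical name, not a class from the EQL module):

```java
import java.time.Duration;

// Hypothetical check of whether a candidate sequence fits within its maximum span.
final class MaxSpanSketch {

    // The span is computed between the first and the last matching event.
    static boolean withinMaxSpan(long firstEventMillis, long lastEventMillis, Duration maxSpan) {
        return lastEventMillis - firstEventMillis <= maxSpan.toMillis();
    }
}
```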

(cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)
2020-06-29 21:31:00 +03:00
Igor Motov 773f3574a9
Removes debug logging from RestEqlCancellationIT (#58676)
The test didn't fail since the fix in #58493. So, it's time to remove debug
logging and close the issue.

Closes #58270
2020-06-29 13:15:01 -04:00
Andrei Stefan 3cb8f54f28
EQL: case sensitivity aware integration testing (#58624) (#58672)
* EQL: case sensitivity aware integration testing (#58624)

* Add DataLoader
* Rewrite case sensitivity settings:
NULL -> run both case sensitive and insensitive tests
TRUE -> run case sensitive test only
FALSE -> run case insensitive test only
* Rename test_queries_supported
* Add more toml tests from the Python client
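
The three-valued case sensitivity setting listed above could be interpreted along these lines. This is a sketch with hypothetical names, not the test framework code:

```java
import java.util.List;

// Hypothetical mapping from the tri-state setting to the test modes to run.
final class CaseSensitivitySketch {

    // null  -> run both case sensitive and case insensitive tests
    // true  -> run the case sensitive test only
    // false -> run the case insensitive test only
    static List<Boolean> modesToRun(Boolean caseSensitive) {
        if (caseSensitive == null) {
            return List.of(true, false);
        }
        return List.of(caseSensitive);
    }
}
```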

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)
2020-06-29 18:40:07 +03:00
Tanguy Leroux 73adcf4d44
SparseFileTracker.Gap should keep a reference to the corresponding Range (#58587) (#58665)
SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill,
it does not need to resolve the range each time onSuccess/onProgress/onFailure are 
called.

Relates #58477
2020-06-29 15:24:19 +02:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
Costin Leau 3c81b91474 EQL: Add Head/Tail pipe support (#58536)
Introduce pipe support, in particular head and tail
(which can also be chained).
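
Conceptually, head keeps the first n results and tail keeps the last n, and chaining applies them in order. A sketch over plain lists, with hypothetical names and assuming n is not negative:

```java
import java.util.List;

// Hypothetical list-based illustration of head and tail pipes.
final class PipeSketch {

    // Keeps the first n elements (or all of them if there are fewer than n).
    static <T> List<T> head(List<T> events, int n) {
        return events.subList(0, Math.min(n, events.size()));
    }

    // Keeps the last n elements (or all of them if there are fewer than n).
    static <T> List<T> tail(List<T> events, int n) {
        return events.subList(Math.max(0, events.size() - n), events.size());
    }
}
```

Chaining `tail(head(events, 4), 2)` would keep the third and fourth events of the input.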

(cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea)
(cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)
2020-06-27 09:49:14 +03:00
Benjamin Trent 7a202b149e
Muting analytics tests (#58617) (#58618) 2020-06-26 16:50:59 -04:00
Tanguy Leroux 775fb5d4cf
Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.

This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
 SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.
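
The idea of listeners that fire as soon as progress reaches a given value can be sketched as below. This is a simplified illustration, not ProgressListenableActionFuture itself: the class and method names are hypothetical, failure handling is omitted, and listeners are invoked while holding the lock for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical progress tracker that notifies listeners once their threshold is reached.
final class ProgressFutureSketch {

    private static final class Waiter {
        final long threshold;
        final LongConsumer listener;

        Waiter(long threshold, LongConsumer listener) {
            this.threshold = threshold;
            this.listener = listener;
        }
    }

    private final List<Waiter> waiters = new ArrayList<>();
    private long progress;

    // Registers a listener; it runs immediately if the threshold has already been reached.
    synchronized void addListener(long threshold, LongConsumer listener) {
        if (progress >= threshold) {
            listener.accept(progress);
        } else {
            waiters.add(new Waiter(threshold, listener));
        }
    }

    // Records new progress and fires every waiting listener whose threshold is now covered.
    synchronized void onProgress(long newProgress) {
        progress = Math.max(progress, newProgress);
        Iterator<Waiter> it = waiters.iterator();
        while (it.hasNext()) {
            Waiter waiter = it.next();
            if (progress >= waiter.threshold) {
                it.remove();
                waiter.listener.accept(progress);
            }
        }
    }
}
```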

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 18:26:20 +02:00
James Baiera 89243857ce
Update precommit to filter out project dependencies (#58189) (#58572)
If a project is pulling in an external org.elasticsearch dependency, the dependency
report generation would require a license file for the dependency to be present. 
This would break precommit because a license was present that it did not feel was
warranted. This un-reverts the update to the dependenciesInfo task, as well as the 
JNA license addition.
2020-06-25 16:33:25 -04:00
Lee Hinman f732003370
[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563) (#58569)
In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams. In a rare case this can
cause:

```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
	at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
	at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```

This happens only when certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.
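
A minimal sketch of the clamping technique, assuming the limit is derived as "number of snapshots minus the minimum required" (the class, method and parameter names are hypothetical and do not reproduce SnapshotRetentionConfiguration):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of clamping a stream limit so it can never be negative.
final class RetentionLimitSketch {

    // oldestFirst: snapshot names sorted from oldest to newest.
    // Stream.limit throws IllegalArgumentException for a negative argument,
    // so the computed value is clamped to zero before use.
    static List<String> deletionCandidates(List<String> oldestFirst, int minimumRequired) {
        long limit = Math.max(0, oldestFirst.size() - minimumRequired);
        return oldestFirst.stream().limit(limit).collect(Collectors.toList());
    }
}
```

With fewer snapshots than the minimum, the unclamped value would be negative; with the clamp the result is simply an empty list.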

Resolves #58515
2020-06-25 14:16:34 -06:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

if they want to be explicit. This is also challenging in cases where a
user wants to configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
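
The "exactly the roles in the list" semantics can be illustrated with a small sketch. The names are hypothetical and the legacy 'node.*' settings are not modelled here:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical interpretation of an explicitly configured node.roles list.
final class NodeRolesSketch {

    // With node.roles configured, the node has exactly the listed roles and no others.
    static Set<String> rolesFrom(List<String> nodeRolesSetting) {
        return new TreeSet<>(nodeRolesSetting);
    }

    // node.roles: [master]
    static boolean isMasterOnly(Set<String> roles) {
        return roles.equals(Set.of("master"));
    }

    // node.roles: []
    static boolean isCoordinatingOnly(Set<String> roles) {
        return roles.isEmpty();
    }
}
```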

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-06-25 14:11:57 -04:00
Benjamin Trent c7ba79bc19
[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537) (#58553)
* [ML] make waiting for renormalization optional for internally flushing job (#58537)

When flushing, datafeeds only need the guarantee that the latest bucket has been handled.

But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.

This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.

closes #58395
2020-06-25 12:26:52 -04:00
Nik Everett 03e6d1b535
Add Variable Width Histogram Aggregation (backport of #42035) (#58440)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more detail in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.
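
The reduce-time merging and the overlap heuristic can be sketched as follows. This is a simplified one-dimensional illustration with hypothetical names, not the aggregation's implementation: it assumes every bucket has a positive document count and uses a document-count-weighted centroid when two buckets merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of merging the closest buckets until a target count is reached.
final class BucketMergeSketch {

    static final class Bucket {
        final double centroid;
        final double min;
        final double max;
        final long docCount;

        Bucket(double centroid, double min, double max, long docCount) {
            this.centroid = centroid;
            this.min = min;
            this.max = max;
            this.docCount = docCount;
        }

        // Assumption: the merged centroid is the document-count-weighted mean.
        Bucket mergeWith(Bucket other) {
            long total = docCount + other.docCount;
            double merged = (centroid * docCount + other.centroid * other.docCount) / total;
            return new Bucket(merged, Math.min(min, other.min), Math.max(max, other.max), total);
        }
    }

    static List<Bucket> reduce(List<Bucket> fromAllShards, int targetBuckets) {
        List<Bucket> buckets = new ArrayList<>(fromAllShards);
        buckets.sort(Comparator.comparingDouble(b -> b.centroid));
        while (buckets.size() > targetBuckets && buckets.size() > 1) {
            // In sorted order the closest pair of centroids is always adjacent.
            int best = 0;
            double bestGap = Double.POSITIVE_INFINITY;
            for (int i = 0; i + 1 < buckets.size(); i++) {
                double gap = buckets.get(i + 1).centroid - buckets.get(i).centroid;
                if (gap < bestGap) {
                    bestGap = gap;
                    best = i;
                }
            }
            Bucket merged = buckets.get(best).mergeWith(buckets.get(best + 1));
            buckets.remove(best + 1);
            buckets.set(best, merged);
        }
        return buckets;
    }

    // Key of the bucket with the larger centroid: its min, or the midpoint when the buckets overlap.
    static double keyOfLarger(Bucket smaller, Bucket larger) {
        return larger.min < smaller.max ? (larger.min + smaller.max) / 2 : larger.min;
    }
}
```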

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-06-25 11:40:47 -04:00
Nik Everett 71adade73a
Return clear error message if aggregation type is invalid (#58255) (#58365)
The main changes are:

1. Catch the `NamedObjectNotFoundException` when parsing aggregation
   type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().

Closes #58146.

Co-authored-by: bellengao <gbl_long@163.com>
2020-06-25 11:08:25 -04:00
Dimitris Athanasiou c3dfafe0b4
[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541) (#58550)
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.
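
One way to express "empty string means missing" for a numeric feature is sketched below. The names are hypothetical and this is not the inference code; non-numeric, non-empty strings still fail with NumberFormatException in this sketch.

```java
import java.util.OptionalDouble;

// Hypothetical conversion of a raw source value into an optional numeric feature.
final class FeatureValueSketch {

    static OptionalDouble numericFeature(Object raw) {
        if (raw == null) {
            return OptionalDouble.empty();
        }
        if (raw instanceof Number) {
            return OptionalDouble.of(((Number) raw).doubleValue());
        }
        String text = raw.toString();
        if (text.isEmpty()) {
            // An empty string in a numeric field is treated as a missing value.
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(text));
    }
}
```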

Backport of #58541
2020-06-25 18:07:29 +03:00
Dimitris Athanasiou 5af7071db0
[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546)
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.

Backport of #58538
2020-06-25 18:03:43 +03:00
Benjamin Trent add8ff1ad3
[ML] assume data streams are enabled in data stream tests (#58502) (#58508) 2020-06-24 14:14:48 -04:00
Chris Roberson d5899d1765
[Monitoring] APM mapping update (#46244) (#58498)
* Add acm mapping to APM for beats

* Add root mapping for APM

* Add sourcemap mapping to APM

* Fix missing properties

* Fix a second missing properties

* Add request property to acm

* Remove root and sourcemap per review

Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-24 13:26:30 -04:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
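
A sketch of the "custom value with a default" accessor, using a hypothetical stand-in for the cluster state (the class name and the `custom` signature are assumptions made for illustration):

```java
import java.util.Map;

// Hypothetical holder of named custom values, standing in for the cluster state.
final class ClusterStateSketch {
    private final Map<String, Object> customs;

    ClusterStateSketch(Map<String, Object> customs) {
        this.customs = customs;
    }

    // Returns the custom registered under the given type, or the default when none is present,
    // so callers no longer need a null check.
    @SuppressWarnings("unchecked")
    <T> T custom(String type, T defaultValue) {
        return (T) customs.getOrDefault(type, defaultValue);
    }
}
```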
2020-06-24 17:24:44 +02:00
Martijn van Groningen f4fad9c65a
Re-enable data streams yaml tests in bwc mode (#58500)
Backport of #58403 to 7.x branch.
2020-06-24 16:59:51 +02:00
Andrei Stefan 69f73d948b
EQL: code cleanup and further tests (#58458) (#58497)
Add FunctionPipe tests to all functions. Cleanup functions code.

(cherry picked from commit 0f83d5799841fe99d8aeaf46e50dd11aa6bf8a57)
2020-06-24 17:38:56 +03:00
Przemysław Witek 551b8bcd73
[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484) (#58490) 2020-06-24 15:52:45 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
Luca Cavanna dbbf2772d8 Mute newly added ml data streams tests (#58492)
Relates to #58491
2020-06-24 15:11:40 +02:00
markharwood d5ac3bb87f
Field capabilities - make `keyword` a family of field types (#58315) (#58483)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
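
The "family type defaults to the field type" idea can be sketched with a default method. The interface and class names here are hypothetical and are not the actual `MappedFieldType` hierarchy:

```java
// Hypothetical field type abstraction with a family name that defaults to the type name.
interface FieldTypeSketch {
    String typeName();

    // By default a field type is its own family.
    default String familyTypeName() {
        return typeName();
    }
}

// A wildcard field reports itself as part of the keyword family.
final class WildcardFieldTypeSketch implements FieldTypeSketch {
    @Override
    public String typeName() {
        return "wildcard";
    }

    @Override
    public String familyTypeName() {
        return "keyword";
    }
}

// A type without an override keeps its own name as the family.
final class LongFieldTypeSketch implements FieldTypeSketch {
    @Override
    public String typeName() {
        return "long";
    }
}
```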

Relates to #53175
2020-06-24 12:32:14 +01:00
Alan Woodward d251a482e9 Move MappedFieldType.similarity() to TextSearchInfo (#58439)
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.

It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
2020-06-24 10:00:32 +01:00
Jim Ferenczi fcd8a432d9 Submit _async search task should cancel children on cancellation (#58332)
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
2020-06-24 09:10:26 +02:00
Przemysław Witek 4e4ca6ac25
Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447) (#58459) 2020-06-23 22:18:39 +02:00
Benjamin Trent a9b868b7a9
[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280) (#58455)
This commit allows data streams to be a valid source for analytics and transforms.

Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.

For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
2020-06-23 14:40:35 -04:00
Benjamin Trent 0cc84d3caf
[ML] wait for yellow state for stats index in tests (#58436) (#58456)
GET inference stats now reads from the .ml-stats index.

Our tests should wait for yellow state before attempting to query the index for stat information.
2020-06-23 13:32:24 -04:00
Dimitris Athanasiou f67fee387b
[7.x][ML] Make regression training set predictable in size (#58331) (#58453)
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.

This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.
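
As an illustration only (this is not the splitter class touched by this
commit), a streaming splitter can hit an exact training set size when the
total document count is known up front: each document is picked with
probability (training slots left) / (documents left). The class name, the
constructor arguments and the seed below are all hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of a streaming splitter with an exact training set size. */
final class FixedSizeSplitter {
    private final Random random;
    private long remainingDocs;      // documents not yet seen
    private long remainingTraining;  // training slots still to fill

    FixedSizeSplitter(long totalDocs, double trainingFraction, long seed) {
        this.random = new Random(seed);
        this.remainingDocs = totalDocs;
        this.remainingTraining = (long) (totalDocs * trainingFraction);
    }

    /** Returns true if the next document should go into the training set. */
    boolean isTraining() {
        if (remainingDocs <= 0) {
            throw new IllegalStateException("more documents than declared");
        }
        // Pick with probability remainingTraining / remainingDocs. Once the
        // two counters are equal every remaining document is picked, and once
        // remainingTraining reaches zero none are, so the size is exact.
        boolean picked = random.nextDouble() * remainingDocs < remainingTraining;
        remainingDocs--;
        if (picked) {
            remainingTraining--;
        }
        return picked;
    }
}
```

Unlike an independent coin flip per document, the number of training rows
here is fixed by the constructor arguments, which is what makes an upper
bound easy to state.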

Backport of #58331
2020-06-23 19:49:03 +03:00
Marios Trivyzas e7c40d973e
SQL: Relax parsing of date/time escaped literals (#58336) (#58450)
Improve the usability of the MS-SQL server/ODBC escaped
date/time/timestamp literals, by allowing timezone/offset ids
in the parsed string, e.g.:
```
{ts '2000-01-01T11:11:11Z'}
```

Closes: #58262
(cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)
2020-06-23 18:05:54 +02:00
David Roberts 0d6bfd0ac3
[7.x][ML] Fix wire serialization for flush acknowledgements (#58443)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null.  This works because the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant.  But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.

This change corrects the discrepancies.  Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.
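
A minimal sketch of that translation, assuming the time travels as epoch
milliseconds (the class and method names are hypothetical, not the ones
in this change):

```java
import java.time.Instant;

/** Hypothetical helper: null inside the class, 0 on the wire. */
final class LastFinalizedBucketTime {
    private LastFinalizedBucketTime() {}

    /** 0 from the C++ process or an old node means "not known or not relevant". */
    static Instant fromWire(long epochMillis) {
        return epochMillis == 0 ? null : Instant.ofEpochMilli(epochMillis);
    }

    /** null is written as 0 so that old nodes keep working. */
    static long toWire(Instant time) {
        return time == null ? 0L : time.toEpochMilli();
    }
}
```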

Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.

Backport of #58413
2020-06-23 16:42:06 +01:00
Mark Tozzi 52806a8f89
Small VS config cleanup (#58294) (#58442) 2020-06-23 10:53:06 -04:00
Benjamin Trent 61142a3005
[ML] only log if forecasts are set to failed (#58421) (#58437)
This adjusts the logging level for setting forecasts to failed to WARN. And it will only log if 1 or more forecasts were adjusted to failed.
2020-06-23 10:24:03 -04:00
Alan Woodward 8ebd341710
Add text search information to MappedFieldType (#58230) (#58432)
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.

This allows us to remove the MapperService.getLuceneFieldType() shim method.
2020-06-23 14:37:26 +01:00
Alan Woodward 519d1278e2
Make FieldTypeLookup immutable (#58162) (#58411)
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
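
The general pattern, sketched with stand-in types (field types are plain
strings here and both class names are hypothetical): readers take a
reference to an immutable lookup, and an update swaps in a freshly built
one instead of mutating the old one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable lookup: field name -> type name. */
final class ImmutableLookup {
    private final Map<String, String> fieldTypes;

    ImmutableLookup(Map<String, String> fieldTypes) {
        // Defensive copy, then wrap so nobody can modify it afterwards.
        this.fieldTypes = Collections.unmodifiableMap(new HashMap<>(fieldTypes));
    }

    String get(String field) {
        return fieldTypes.get(field);
    }
}

/** Hypothetical owner that rebuilds the lookup on every mapping update. */
final class MappingHolder {
    private volatile ImmutableLookup lookup = new ImmutableLookup(Collections.emptyMap());

    void update(Map<String, String> fullMapping) {
        lookup = new ImmutableLookup(fullMapping); // rebuild, never mutate
    }

    ImmutableLookup lookup() {
        return lookup;
    }
}
```

A reader that grabbed the old lookup keeps a consistent view even while an
update is in flight.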
2020-06-23 10:51:32 +01:00
David Roberts f97b37190b [ML] Add a new annotation type for categorization status changes (#58394)
Adds a new value to the "event" enum of ML annotations, namely
"categorization_status_change".

This will allow users to see when categorization was found to
be performing poorly.  Once per-partition categorization is
available, it will allow users to see when categorization is
performing poorly for a specific partition.

It does not make sense to reuse the "model_change" event that
annotations already have, because categorizer state is separate
to model state ("model" state is really anomaly detector state),
and is not reverted by the revert model snapshot API.
Therefore annotations related to categorization need to be
treated differently to annotations related to anomaly detection.
2020-06-23 09:16:27 +01:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source directly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into a mapping.
If the timestamp field path consisted of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00
Costin Leau 765f1b5775 SQL: Fix bug in resolving aliases against filters (#58399)
When aliasing with the same name over non-existing fields, the analyzer gets stuck in a loop trying to resolve the alias over and over, leading to a stack overflow. This PR breaks the cycle by checking the relationship between the alias and the child it tries to replace, as an alias should never replace its child.

Fix #57270
Close #57417
Co-authored-by: Hailei <zhh5919@163.com>

(cherry picked from commit 46786ff2e1ed5951006ff4bdd2b6ac6a1ebcf17b)
2020-06-22 16:05:42 +03:00
Przemko Robakowski a44dad9fbb
[7.x] Add support for snapshot and restore to data streams (#57675) (#58371)
* Add support for snapshot and restore to data streams (#57675)

This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.

Closes #57127

Relates to #53100

* version fix

* compilation fix

* compilation fix

* remove unused changes

* compilation fix

* test fix
2020-06-19 22:41:51 +02:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important when comparing against the inference_count. If the cache hit miss count is near the inference_count, it indicates that the cache is overburdened or inappropriately configured.
2020-06-19 09:46:51 -04:00
Stuart Tettemer 20abba8433
Scripting: Deprecate general cache settings (#55753) (#58283)
Backport: ef543b0
2020-06-18 11:54:23 -06:00
Jim Ferenczi 1c1a6d4ec8 Handle failures with no explicit cause in async search (#58319)
This commit fixes an AOOBE in the handling of fatal
failures in _async_search. If the underlying cause is not found,
this change uses the root failure.

Closes #58311
2020-06-18 18:57:58 +02:00
Przemysław Witek 9dd3d5aa48
[7.x] Delete auto-generated annotations when model snapshot is reverted (#58240) (#58335) 2020-06-18 17:59:52 +02:00
Jason Tedor be08268562
Allow follower indices to override leader settings (#58103)
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follow calls, or via auto-follow patterns. Note that not
all settings can be overridden (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follower index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, they must be implicitly copied from the leader index, not
explicitly set by the user.
2020-06-18 11:56:06 -04:00
Alan Woodward 4b8cf2af6a
Add serialization test for FieldMappers when include_defaults=true (#58235) (#58328)
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.

Fixes #58188
2020-06-18 15:46:04 +01:00
Marios Trivyzas 50b391e91b
SQL: [Docs] Fix TIME_PARSE documentation (#58182) (#58317)
TIME_PARSE works correctly if both date and time parts are specified,
and a TIME object (that contains only time) is returned.

Adjust docs and add a unit test that validates the behavior.

Follows: #55223
(cherry picked from commit 9d6b679a5da88f3c131b9bdba49aa92c6c272abe)
2020-06-18 16:09:13 +02:00
Alan Woodward ca2d12d039 Remove Settings parameter from FieldMapper base class (#58237)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
2020-06-18 12:53:54 +01:00
Tanguy Leroux f3b6e41f02 Do not wrap CacheFile reentrant r/w locks with ReleasableLock (#58244)
Today the read/write locks used internally by CacheFile object are 
wrapped into a ReleasableLock. This is not strictly required and also 
prevents usage of the tryLock() methods which we would like to use 
for early releasing of read operations (#58164).
2020-06-18 11:01:53 +02:00
Andrei Dan caa5d3abe0
ILM actions check the managed index is not a DS write index (#58239) (#58295)
This changes the actions that would attempt to make the managed index read only to
check if the managed index is the write index of a data stream before proceeding.
The updated actions are shrink, readonly, freeze and forcemerge.

(cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-18 07:45:11 +01:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api across the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Lee Hinman d272646a55 Fix name of template in allowed warning for DS YML test (#58273)
The warning was present, but had the incorrect template name, leading to a test failure.
2020-06-17 11:23:04 -06:00
David Roberts 3f8d16304c
Add ML admin permissions to the kibana_system role (#58172)
As part of the "ML in Spaces" project, access to the ML UI in
Kibana is migrating to being controlled by Kibana privileges.
The ML UI will check whether the logged-in user has permission
to do something ML-related using Kibana privileges, and if they
do will call the relevant ML Elasticsearch API using the Kibana
system user.  In order for this to work the kibana_system role
needs to have administrative access to ML.

Backport of #58061
2020-06-17 17:03:32 +01:00
Benjamin Trent 2de242f80e
[ML] rename EnsembleSizeInfo#inputFieldNameLengths to this.featureNameLengths (#58241) (#58253) 2020-06-17 10:08:55 -04:00
Benjamin Trent 69338b03d7
[ML] expand data_streams when assigning datafeed to node (#58175) (#58242) 2020-06-17 08:34:34 -04:00
Ignacio Vera 2d3d7ab387
mute CentroidCalculatorTests#testPolygonAsPoint (#58249) (#58250) 2020-06-17 14:32:13 +02:00
Jason Tedor b78b3edeea
Upgrade to JNA 5.5.0 (#58183)
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
2020-06-17 07:35:08 -04:00
Dimitris Athanasiou 36dbf08d47
[7.x][ML] Improve stability of stratified splitter tests (#58180) (#58224)
The main improvement here is that the total expected
count of training rows in the test is calculated as the
sum of the training fraction times the cardinality of each
class (instead of the training fraction times the total doc count).

Also relaxes slightly the error bound on the uniformity test from 0.12
to 0.13.

Closes #54122

Backport of #58180
2020-06-17 12:40:21 +03:00
Andrei Dan e17c51151b
[7.x] ILM: don't take snapshot of a data stream's write index (#58159) (#58222)
We don't allow converting a data stream's writeable index into a searchable
snapshot. We are currently preventing swapping a data stream's write index
with the restored index.

This adds another step that will not proceed with the searchable snapshot action
until the managed index is not the write index of a data stream anymore.

(cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-17 09:45:16 +01:00
Ignacio Vera 7080ba5b05
Check for degenerated lines when calculating the centroid (#58216) 2020-06-17 09:34:49 +02:00
Przemysław Witek b22e91cefc
[7.x] Delete auto-generated annotations when job is deleted. (#58169) (#58219) 2020-06-17 09:17:20 +02:00
Stuart Tettemer 01795d1925
Revert "Scripting: Deprecate general cache settings (#55753)" (#58201)
This reverts commit 88e8b34fc2.
2020-06-16 14:58:18 -06:00
Stuart Tettemer 88e8b34fc2
Scripting: Deprecate general cache settings (#55753)
Backport: ef543b0
2020-06-16 13:06:59 -06:00
Benjamin Trent 081da09c72
Allow GET <pattern>/_rollup/data to expand data streams (#58173) (#58177) 2020-06-16 14:01:54 -04:00
Benjamin Trent 3309817d18
[ML] fixing tree inference ctor to allow target_type to be optional (#58132) (#58165)
The tree trained model object will set its target_type to be regression by default.

This updates the inference object to behave the same way.
2020-06-16 13:29:11 -04:00
Benjamin Trent 6c03d97419
Mute TimeSeriesDataStreamsIT.testSearchableSnapshotAction (#58127) (#58181)
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-16 12:40:38 -04:00
Alan Woodward 12a3f6dfca
MappedFieldType should not extend FieldType (#58160)
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to #56814
2020-06-16 16:56:43 +01:00
Dan Hermann 7079a3b09f
[7.x] Prohibit freezing the write index of a data stream (#58168) 2020-06-16 09:37:32 -05:00
Yannick Welsch 1e235a7f55 Fix off-by-one on CCR lease (#58158)
The leases issued by CCR keep one extra operation around on the leader shards. This is not
harmful to the leader cluster, but means that there's potentially one delete that can't be
cleaned up.
2020-06-16 14:04:58 +02:00
David Turner 423697f414 Default to zero replicas for searchable snapshots (#57802)
Today a mounted searchable snapshot defaults to having the same replica
configuration as the index that was snapshotted. This commit changes this
behaviour so that we default to zero replicas on these indices, but allow the
user to override this in the mount request.

Relates #50999
2020-06-16 10:12:23 +01:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
markharwood 03dd73dc0d
Fix for wildcard fields that returned ByteRefs not Strings to scripts. (#58060) (#58109)
This needs some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.

Closes #58044
2020-06-15 14:52:56 +01:00
Alejandro Fernández Haro 3d0c8da66d Add monitor and view_index_metadata to the built-in `kibana_system` role (#57755)
Allows the kibana user to collect data telemetry in a background
task by giving the kibana_system built-in role the view_index_metadata
and monitor privileges over all indices (*).
2020-06-15 14:40:27 +03:00
Shaunak Kashyap 5e2faad783 Add ILM policy PUT and GET for remote_monitoring_agent built-in role (#57963)
Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
2020-06-15 14:35:30 +03:00
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Jason Tedor dcf4131f00
Revert "Add JNA license to SQL CLI dependency licenses"
This reverts commit 076b32d4f3.
2020-06-12 17:04:39 -04:00
Dan Hermann 17f3318732
[7.x] Resolve index API (#58037) 2020-06-12 15:41:32 -05:00
Jason Tedor 076b32d4f3
Add JNA license to SQL CLI dependency licenses
Previously we excluded requiring licenses for dependencies with the
group name org.elasticsearch under the assumption that these use the
top-level Elasticsearch license. This is not always correct, for
example, for the org.elasticsearch:jna dependency as this is merely a
wrapper around the upstream JNA project, and that is the license that we
should be including. A recent change modified this check from using the
group name to checking only if the dependency is a project
dependency. This exposed the use of JNA in SQL CLI to this check, but
the license for it was not added. This commit addresses this by adding
the license.

Relates #58015
2020-06-12 16:38:23 -04:00
Benjamin Trent 79c784932f
[ML] allow feature_names to be optional in ensemble inference model (#58059) (#58067)
This has `EnsembleInferenceModel` not parse feature_names from the XContent.

Instead, it will rely on `rewriteFeatureIndices` to be called ahead of time.

Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.
2020-06-12 16:33:54 -04:00
Ignacio Vera c518670f83
Fix Geo grid aggregation circuit breaker tests (#58028) (#58042)
This commit makes sure we create index with only one shard.
2020-06-12 15:39:27 +02:00
Martijn van Groningen 01d8bb8cfa
Enforce valid field mapping exists for timestamp_field in templates. (#58036)
Backport of #57741 to 7.x branch.

Relates to #53100
2020-06-12 15:24:42 +02:00
David Roberts 93b693527a
[7.x][ML] Add categorizer stats ML result type (#58001)
This type of result will store stats about how well categorization
is performing.  When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.

This PR is a minimal implementation to allow the C++ side changes to
be made.  More Java side changes related to per-partition
categorization will be in followup PRs.  However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats.  Like forecast request stats the
categorizer stats can be read directly from the job's results alias.

Backport of #57978
2020-06-12 12:08:07 +01:00
markharwood 2da8e57f59
Search - add range query support to wildcard field (#57881) (#57988)
Backport to add range query support to wildcard field

Closes #57816
2020-06-12 11:30:54 +01:00
David Kyle 39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Mark Tozzi 36f551bdb4
Make ValuesSourceConfig behave like a config object (#57762) (#58012) 2020-06-11 17:23:55 -04:00
Benjamin Trent 2881995a45
[ML] adding new inference model size estimate handling from native process (#57930) (#57999)
Adds support for reading in `model_size_info` objects.

These objects contain numeric values indicating the model definition size and complexity.

Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-11 15:59:23 -04:00
Alan Woodward 16e230dcb8 Update to lucene snapshot e7c625430ed (#57981)
Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.
2020-06-11 14:51:53 +01:00
David Roberts 54d4f2a623 [ML] Refresh annotations index on job flush and close (#57979)
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
2020-06-11 12:29:04 +01:00
David Kyle b87b147704
Add models for search to ModelLoadingService (#57592) (#57919)
ModelLoadingService only caches models if they are referenced by an 
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest 
pipeline is deleted the model it references should not be evicted if 
it is used in search.
2020-06-11 10:48:37 +01:00
David Kyle 2905a2f623
Use Search After job iterators (#57875) (#57923)
Search after is a better choice for the delete expired data iterators
where processing takes a long time because, unlike scroll, a context does
not have to be kept alive. Also changes the delete expired data endpoint
to 404 if the job is unknown.
2020-06-11 10:06:18 +01:00
Costin Leau ff0ea62cb8 EQL: Fix casing for tiebreaker field (#57943)
Use tiebreaker instead of tieBreaker

(cherry picked from commit 3c774948a5d5e10fac267cb9a54f5d0559a00c1d)
2020-06-11 00:10:19 +03:00
Albert Zaharovits c57ccd99f7
Just log 401 stacktraces (#55774)
Ensure stacktraces of 401 errors for unauthenticated users are logged
but not returned in the response body.
2020-06-10 20:39:32 +03:00
Valeriy Khakhutskyy c0f368bbf3
[7.x][ML] Adjust assertion for job case memory usage estimates (#57929)
Since we change the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-or-equal-than assertion.

Backport of #57882
2020-06-10 15:17:16 +02:00
Aleksandr Maus ec60335496
EQL: implement case sensitivity for indexOf and endsWith string functions (#57707) (#57908)
* EQL: implement case sensitivity for indexOf and endsWith string functions
2020-06-10 08:55:49 -04:00
Andrei Dan 9f280621ba
[7.x] ILM add data stream support to searchable snapshot action (#57873) (#57916)
(cherry picked from commit 34856a90532c6c62a53817bb395399c8a8c17c0f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-10 10:16:57 +01:00
Yannick Welsch 80f221e920
Use clean thread context for transport and applier service (#57792) (#57914)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows removing a hack from
TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-10 10:30:28 +02:00
Hendrik Muhs 95bd7b63b0 [Transform] fix page size return in cat transform, add dps (#57871)
fixes the page size reported after moving page size to settings (#56007) and
adds documents per second (throttling) to the output.

fixes #56498
2020-06-10 08:10:25 +02:00
Yang Wang 72a6441a88
Revert "Resolve anonymous roles and deduplicate roles during authentication (#53453) (#55995)" (#57858)
This reverts commit 84a2f1adf2.
2020-06-10 10:42:52 +10:00
Jake Landis a370d5eead
[7.x] Ensure Joni warnings are logged at debug (#57302) (#57897)
When Joni, the regex engine that powers grok, emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
against that pattern. For ingest node that means for every document
that is run through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warnings indicate a possible issue with
the regular expression. Upon creation, the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation, which
is a much lower frequency than every document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 17:06:29 -05:00
Yannick Welsch 9eec819c5b Revert "Use clean thread context for transport and applier service (#57792)"
This reverts commit 259be236cf.
2020-06-09 22:24:54 +02:00
Costin Leau 439205d1ea EQL: Introduce tie breaker support (#57787)
Allow a field inside the data to be used as a tie breaker for events
that have the same timestamp.
The field is optional by default.
If used, the tie-breaker always requires a non-null value since it is
used inside `search_after` which requires a non-null value.
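
The ordering this gives can be sketched as follows. The `Event` class is
a hypothetical stand-in and the tie-breaker is assumed to be numeric
purely for the example; events are compared by timestamp first, by
tie-breaker second, and a null tie-breaker is rejected up front.

```java
import java.util.Comparator;
import java.util.Objects;

/** Hypothetical event with a mandatory (non-null) tie-breaker value. */
final class Event {
    final long timestamp;
    final Long tiebreaker;
    final String name;

    Event(long timestamp, Long tiebreaker, String name) {
        this.timestamp = timestamp;
        // Mirrors the rule above: once a tie-breaker is used it must have a value.
        this.tiebreaker = Objects.requireNonNull(tiebreaker, "tiebreaker must not be null");
        this.name = name;
    }

    /** Timestamp first; the tie-breaker decides between equal timestamps. */
    static final Comparator<Event> ORDER =
        Comparator.comparingLong((Event e) -> e.timestamp)
            .thenComparingLong(e -> e.tiebreaker);
}
```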

Fix #56824

(cherry picked from commit e5719ecb474b32730d93afdbb6834a32b0b2df8b)
2020-06-09 22:50:19 +03:00
Andrei Dan 3945712c72
[7.x] ILM add data stream support to the Shrink action (#57616) (#57884)
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.

(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-09 19:45:22 +01:00
Nik Everett 44a79d1739
Deprecate Rounding#round (#57845) (#57893)
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val);
```

because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and still be faster
than calling the deprecated method over and over and over again.

Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.

Relates to #56124
2020-06-09 14:30:56 -04:00
Dan Hermann b501b282f8
Change default backing index naming scheme 2020-06-09 09:31:34 -05:00
Yannick Welsch 259be236cf Use clean thread context for transport and applier service (#57792)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows removing a hack from
TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-09 12:32:28 +02:00
Andrei Stefan 3cc8166946
SQL: handle MIN and MAX functions on dates in Painless scripts (#57605) (#57863)
* Convert to date/datetime the result of numeric aggregations (min, max)
in Painless scripts

(cherry picked from commit f1de99e2a6fbf3806c4f2b6b809738aa8faa2d75)
2020-06-09 10:09:01 +03:00
Benjamin Trent d5522c2747
[ML] add new circuit breaker for inference model caching (#57731) (#57830)
This adds new plugin level circuit breaker for the ML plugin.

`model_inference` is the circuit breaker qualified name.

Right now it simply adds to the breaker when the model is loaded (and possibly breaking) and removing from the breaker when the model is unloaded.
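
The add-on-load / remove-on-unload accounting can be sketched like this.
It is a simplified stand-in, not the Elasticsearch circuit breaker API;
the class name, method names and the exception type are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical memory breaker: reserve on model load, release on unload. */
final class SimpleBreaker {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    SimpleBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Called when a model is loaded; throws if the limit would be exceeded. */
    void addOrBreak(long bytes) {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes); // undo the reservation before failing
            throw new IllegalStateException(
                "model_inference breaker tripped: " + newUsed + " > " + limitBytes);
        }
    }

    /** Called when a model is unloaded. */
    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}
```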
2020-06-08 16:02:48 -04:00
Przemysław Witek 7a1300a09e
[7.x] Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) (#57815) 2020-06-08 17:41:12 +02:00
Mayya Sharipova 70e63a365a
Refactor how to determine if a field is metafield (#57378) (#57771)
Previously, to determine whether a field is a meta-field, the static
MapperService method isMetadataField was used. This method relied on an
outdated static list of meta-fields.

This PR instead changes this method to an instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
2020-06-08 09:16:18 -04:00
Andrei Dan 1b84e93d83
[7.x] DataStream creation validation allows for prefixed indices (#57750) (#57799)
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.

This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
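
A sketch of that check under the naming scheme described above
(`<data stream>-%06d`). The helper below is hypothetical and only models
the rule: an existing index conflicts when it matches the pattern with a
counter greater than the generation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of the backing index conflict rule. */
final class BackingIndexConflicts {
    private BackingIndexConflicts() {}

    /**
     * True if existingIndex looks like a future backing index of dataStream,
     * i.e. "<dataStream>-NNNNNN" with a counter greater than generation.
     */
    static boolean conflicts(String dataStream, long generation, String existingIndex) {
        // %06d pads to six digits but can grow, hence {6,}.
        Pattern pattern = Pattern.compile("^" + Pattern.quote(dataStream) + "-(\\d{6,})$");
        Matcher matcher = pattern.matcher(existingIndex);
        return matcher.matches() && Long.parseLong(matcher.group(1)) > generation;
    }
}
```

For a data stream `foo` of generation 4, `foo-000006` is a conflict, while
`foo-000001` and the prefixed `shrink-foo-000001` are not.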

(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-08 13:31:52 +01:00
David Kyle 08d1286de7
[7.x] Delete expired data by job (#57337) (#57796)
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-08 13:00:23 +01:00
Luca Cavanna 7a06a13d99 Add description to submit and get async search, as well as cancel tasks (#57745)
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.

Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
2020-06-08 11:17:29 +02:00
Luca Cavanna 06ef3042c1 Specify reason whenever async search gets cancelled (#57761)
This makes it possible to trace where the cancel tasks request came from, given that it may be triggered for multiple reasons.
2020-06-08 10:25:31 +02:00
David Roberts 1d64d55a86
[7.x][ML] Add per-partition categorization option (#57723)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.

Backport of #57683
2020-06-06 08:15:17 +01:00
Benjamin Trent 9666a895f7
[ML] inference performance optimizations and refactor (#57674) (#57753)
This is a major refactor of the underlying inference logic.

The main change is that we now separate the model configuration from
the inference interfaces.

This has the following benefits:
 - we can store extra things with the model that are not
   necessary for inference (i.e. treenode split information gain)
 - we can optimize inference separate from model serialization and storage.
 - The user is oblivious to the optimizations (other than seeing the benefits).

A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.

This new class satisfies a new interface that is ONLY for inference.

The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
  (improves inference + feature importance throughput)
- Only storing what we need for inference + feature importance on heap
2020-06-05 14:20:58 -04:00
Jake Landis 459ab9a0b2
[7.x] Ensure type exists for all monitoring configuration (#57399) (#57704)
#47711 and #47246 helped to validate that monitoring settings are
rejected at the time they are set. Otherwise an invalid
monitoring setting can find its way into the cluster state and result
in an exception thrown [1] on cluster state application (thereby
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.

All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
a type setting, an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is a bit outside the scope of what this PR is
trying to address.

This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there is an exporter with a type
defined. The type is used as a dependent setting, such that it must
exist to set the value. This ensures that when any monitoring setting
changes, it can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.

Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.

[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
2020-06-05 10:47:11 -05:00
Dimitris Athanasiou f49a14ce6f
[7.x][ML] Fix race condition when force stopping DF analytics job (#57680) (#57717)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.

Backport of #57680
2020-06-05 17:50:01 +03:00
Tanguy Leroux 0e57528d5d Remove more //NORELEASE (#57517)
We agreed on removing the following //NORELEASE tags.
2020-06-05 15:34:06 +02:00
Hendrik Muhs 61c496d320 [Transform] use old roles only together with old endpoints (#57710)
avoids a CI failure when new endpoints are used together with old roles and warnings are asserted.
2020-06-05 10:08:05 +02:00
Hendrik Muhs e91b975878 [Transform] mark old data frame transform roles deprecated (#57655)
mark old data frame transform roles deprecated

fixes #50087
2020-06-05 09:20:35 +02:00
Hendrik Muhs c1c8817eae
[7.x][Transform] improve update API (#57685)
rewrite config on update if either version is outdated, credentials change,
the update changes the config or deprecated settings are found. Deprecated
settings get migrated to the new format. The upgrade can be easily extended to
do any necessary re-writes.

fixes #56499
backport #57648
2020-06-05 08:48:47 +02:00
Jake Landis f4a3d969ad
[7.x] Ensure default watches are updated for rolling upgrades. (#57185) (#57563)
For a rolling/mixed cluster upgrade (add new version to existing cluster
then shutdown old instances), the watches that ship by default
with monitoring may not get properly updated to the new version.
Monitoring watches can only get published if the internal state is
marked as dirty. If a node is not master, the state will also get marked
as clean (i.e. not dirty).

For a mixed cluster upgrade, it is possible for the new node to be
added, not as master, and for its internal state to be marked as clean so
that no more attempts are made to publish the watches. This
happens on all new nodes. Once the old nodes are decommissioned,
one of the new version nodes in the cluster gets promoted to master.
However, that new master node (without intervention like restarting
the node or removing/adding exporters) will never attempt to re-publish,
since the internal state was already marked as clean.

This commit adds a cluster state listener to mark the resource dirty
when a node is promoted to master. This will allow the new resource
to be published without any intervention.
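
A minimal sketch of the idea, for illustration only. The class, the
`onClusterStateChanged` callback and the publish counter are hypothetical
stand-ins and do not reflect the monitoring classes touched by this commit.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class PublishableResourceSketch {
    private final AtomicBoolean dirty = new AtomicBoolean(true);
    private boolean wasMaster = false;
    private int publishCount = 0;

    // Hypothetical cluster state listener: re-mark the resource dirty when
    // this node transitions from non-master to master.
    void onClusterStateChanged(boolean isMaster) {
        if (isMaster && wasMaster == false) {
            dirty.set(true);
        }
        wasMaster = isMaster;
    }

    // Publishes only if the resource is dirty. A non-master node marks the
    // resource clean without publishing, which is the behaviour that left
    // newly promoted masters stuck before the listener existed.
    void checkAndPublish(boolean isMaster) {
        if (dirty.compareAndSet(true, false) == false) {
            return; // already clean, nothing to do
        }
        if (isMaster) {
            publishCount++;
        }
    }

    int publishCount() {
        return publishCount;
    }
}
```

Without the listener call, a node that was first checked as non-master and is
later promoted never publishes; with it, the next check publishes once.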
2020-06-04 16:44:36 -05:00
William Brafford dfb6def3da Revert "Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383)"
This reverts commit 7a67fb2d04.
2020-06-04 16:25:05 -04:00
Ioannis Kakavas 8afd55ebe6
Disable testing conventions for idp in fips (#57663) (#57676)
Needed since we disable both integTest and test tasks. This should have
been part of #57048 but we missed it.
2020-06-04 20:51:38 +03:00
Ioannis Kakavas af9f9d7f03
[7.x] Add http proxy support for OIDC realm (#57039) (#57584)
This change introduces support for using an http proxy for egress
communication of the OpenID Connect realm.
2020-06-04 20:51:00 +03:00
William Brafford 7a67fb2d04
Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383)
In #55592 and #55416, we deprecated the settings for enabling and disabling
basic license features and turned those settings into no-ops. Since doing so,
we've had feedback that this change may not give users enough time to cleanly
switch from non-ILM index management tools to ILM. If two index managers
operate simultaneously, results could be strange and difficult to
reconstruct. We don't know of any cases where SLM will cause a problem, but we
are restoring that setting as well, to be on the safe side.

This PR is not a strict commit reversion. First, we are keeping the new
xpack.watcher.use_ilm_index_management setting, introduced when
xpack.ilm.enabled was made a no-op, so that users can begin migrating to using
it. Second, the SLM setting was modified in the same commit as a group of other
settings, so I have taken just the changes relating to SLM.
2020-06-04 13:38:22 -04:00
Mark Vieira 9b0f5a1589
Include vendored code notices in distribution notice files (#57017) (#57569)
(cherry picked from commit 627ef279fd29f8af63303bcaafd641aef0ffc586)
2020-06-04 10:34:24 -07:00
Przemysław Witek 6b5f49d097
[7.x] Introduce ModelPlotConfig. annotations_enabled setting (#57539) (#57641) 2020-06-04 15:15:35 +02:00
Benjamin Trent ea9b8b9d41
[ML] fix setting forecasts to failed method (#57654) (#57656) 2020-06-04 08:54:46 -04:00
Rene Groeschke 751f16858b
Remove duplicate ssl setup in sql/qa projects (#57319) (#57643)
* Remove duplicate ssl setup in sql/qa projects
* Fix enforcement of task instances
* Use static data for cert generation
* Move ssl testing logic into a plugin
* Document test cert creation
2020-06-04 14:53:23 +02:00
Marios Trivyzas 5f8442d1f4
SQL: Improve performances of LTRIM/RTRIM (#57603)
Change the custom implementation for stripping leading and trailing
whitespace to substantially improve performance:
```
Benchmark                         Mode  Cnt      Score     Error  Units
StringTrim.testWithStringBuilder  avgt   25  82547.575 ±  66.244  ns/op (existing impl)
StringTrim.testWithSubstring      avgt   25   1398.762 ± 101.152  ns/op (new impl)
StringTrim.testWithJavaStrip      avgt   25   1186.120 ±  10.374  ns/op (for reference)
```
Java's String stripLeading()/stripTrailing() are not available in all supported JDKs.
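
A substring-based trim in the spirit of the `testWithSubstring` benchmark
might look like the sketch below. The class and method names are hypothetical,
and using `Character.isWhitespace` as the whitespace definition is an
assumption rather than something stated in this commit.

```java
final class TrimSketch {

    // Strip leading whitespace by scanning for the first non-whitespace
    // character and taking a single substring.
    static String ltrim(String s) {
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    // Strip trailing whitespace by scanning backwards from the end.
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }
}
```

Both methods scan the string once and allocate at most one new string, which
is the likely source of the gain over a StringBuilder-based approach.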

Enhance LENGTH unit tests and combine a couple of LTRIM/RTRIM integ
tests.

Relates to: #57594
(partially cherry picked from commit ee7868d68733f195dc46926a7eab3d9dd7033ef4)

Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
2020-06-04 13:43:49 +02:00
Igor Motov 8d7f389f3a
Increase search.max_buckets to 65,535 (#57042)
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
2020-06-03 15:35:41 -04:00
Julie Tibshirani e0a15e8dc4
Remove the 'array value parser' marker interface. (#57571) (#57622)
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).

The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
2020-06-03 11:30:14 -07:00
Marios Trivyzas a674844893
SQL: Implement TRIM function (#57518) (#57593)
Add `TRIM` function which combines the functionality of both
`LTRIM` and `RTRIM` by stripping both leading and trailing
whitespaces.

Refers to #41195

(cherry picked from commit 6c86c919e12f0c4cb5e39d129aa65ab3e274268f)
2020-06-03 15:19:48 +02:00
Ioannis Kakavas 64583f7ec4
Mute EmailSslTests test case in fips (#57576) (#57577)
We test expected TLS failures by catching SSLException, but other
security providers (e.g. BCFIPS) might throw a different one. In
this case, BCFIPS throws org.bouncycastle.tls.TlsFatalAlert.
2020-06-03 11:23:31 +03:00
Marios Trivyzas 634936e3be
SQL: [Tests] Enable tests which have been fixed (#57526) (#57538)
Enable integration tests for issues that have been fixed
over time.

(cherry picked from commit 117759ee152bcfb0043e5af3a784302ca31f6b8c)
2020-06-02 23:38:33 +02:00
Nik Everett 2a27c411fb
Save memory when geo aggregations are not on top (#57483) (#57551)
Saves memory when the `geotile_grid` and `geohash_grid` are not on the
top level by using the `LongKeyedBucketOrds` we built in #55873.
2020-06-02 16:21:50 -04:00
Dan Hermann 97a51272b0
Fix incorrect log warning when exporting monitoring via HTTP without authentication (#57552) 2020-06-02 15:03:55 -05:00
Mark Tozzi e50f514092
IndexFieldData should hold the ValuesSourceType (#57373) (#57532) 2020-06-02 12:16:53 -04:00
Rene Groeschke 8584da40af
Move classes from build scripts to buildSrc (#57197) (#57512)
* Move classes from build scripts to buildSrc

- move Run task
- move duplicate SanEvaluator

* Remove :run workaround

* Some little cleanup on build scripts on the way
2020-06-02 15:33:53 +02:00
Andrei Dan bd188f4a21
[7.x] ILM: add support for rolling over data streams (#57295) (#57515)
As the datastream information is stored in the `ClusterState.Metadata` we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for
the steps to be able to identify when a managed index is part of a DataStream.

If a managed index is part of a DataStream the rollover target is the DataStream
name and the highest generation index is the write index (ie. the rolled index).

(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-02 11:55:23 +01:00
Przemysław Witek ea6cfb7c3d
[7.x] Make Annotation a result type (#56342) (#57508) 2020-06-02 11:56:41 +02:00
Tanguy Leroux b4a2cd810a
Use 3rd party task to run integration tests on external service (#56588)
Backport of #56587 for 7.x
2020-06-02 11:26:58 +02:00
Marios Trivyzas 52c555e286
SQL: Make CASTing string to DATETIME more lenient (#57451) (#57509)
Some BI tools (e.g. Tableau) would try to cast strings where the time
part is separated from the date part with a whitespace instead of `T`.
Adjust type conversion used by CAST to support this.
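
As a rough sketch of the leniency described above (not the conversion code in
this commit): the hypothetical helper below accepts a single space in place of
`T` and assumes a four-digit year and no time zone suffix.

```java
import java.time.LocalDateTime;

final class LenientDateTimeSketch {

    // Accepts both "2020-06-02T10:54:03" and "2020-06-02 10:54:03" by
    // normalizing a space at the date/time boundary to 'T' before parsing.
    static LocalDateTime parse(String s) {
        if (s.length() > 10 && s.charAt(10) == ' ') {
            s = s.substring(0, 10) + 'T' + s.substring(11);
        }
        // LocalDateTime.parse uses DateTimeFormatter.ISO_LOCAL_DATE_TIME.
        return LocalDateTime.parse(s);
    }
}
```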

(cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)
2020-06-02 10:54:03 +02:00
Marios Trivyzas b8a13de20f
SQL: Implement TOP as an alternative to LIMIT (#57428) (#57507)
Add basic support for `TOP X` as a synonym to LIMIT X which is used
by [MS-SQL server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15),
e.g.:

```
SELECT TOP 5 a, b, c FROM test
```

TOP in SQL Server also supports the `PERCENT` and `WITH TIES`
keywords which this implementation doesn't.

Don't allow usage of both TOP and LIMIT in the same query.

Refers to #41195

(cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)
2020-06-02 10:53:42 +02:00
Przemysław Witek ceb4b29b98
Introduce Annotation.event field (#57144) (#57453) 2020-06-01 20:42:25 +02:00
Mark Tozzi 1f500583b1
Clean up Aggregator Supplier Boiler Plate (#57442) (#57452) 2020-06-01 14:21:07 -04:00
Zachary Tong daaf5a3dcc
Fix assertion catching in aggregation supported type test (#56466) (#57382)
At some point, we changed the supported-type test to also catch
assertion errors.  This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.

This modifies the test to throw at the end of the try-catch
block to prevent it from accidentally catching itself.

Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.

Also includes a variety of fixes to other tests which were failing
but being silently smothered.
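
The pitfall can be reproduced outside the test framework with the sketch
below. The names are hypothetical and `throw new AssertionError(...)` stands
in for JUnit's `fail()`.

```java
final class SelfCatchingTestSketch {

    // Buggy shape: the AssertionError raised when the action unexpectedly
    // succeeds is thrown inside the try block, so the catch swallows it.
    static void buggyExpectFailure(Runnable action) {
        try {
            action.run();
            throw new AssertionError("expected a failure");
        } catch (IllegalArgumentException | AssertionError e) {
            // intended for failures raised by action.run(), but it also
            // catches the AssertionError thrown just above
        }
    }

    // Fixed shape: record whether the action failed and throw after the
    // try-catch block, where nothing can catch it by accident.
    static void fixedExpectFailure(Runnable action) {
        boolean failed = false;
        try {
            action.run();
        } catch (IllegalArgumentException | AssertionError e) {
            failed = true;
        }
        if (failed == false) {
            throw new AssertionError("expected a failure");
        }
    }
}
```

Calling `buggyExpectFailure(() -> {})` returns normally even though the action
never failed, while `fixedExpectFailure(() -> {})` throws as intended.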
2020-06-01 12:10:05 -04:00
David Kyle 064093c4d4 Fix compilation after backport of #57278 2020-06-01 12:03:13 +01:00
Przemysław Witek 72ad9a4548
[7.x] Make AnnotationPersister use bulk requests instead of indexing individual documents (#57278) (#57354) 2020-06-01 12:05:09 +02:00
Benjamin Trent 34f1e0b6bb
[7.x] [ML] mark forecasts for force closed/failed jobs as failed (#57143) (#57374)
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)

forecasts that are still running should be marked as failed/finished in the following scenarios:

- Job is force closed
- Job is re-assigned to another node.

Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.

Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.

relates to https://github.com/elastic/elasticsearch/issues/56419
2020-05-29 14:48:10 -04:00
Benjamin Trent 35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
Benjamin Trent 15aba60c02
[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) (#57359)
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)

This commit lays the ground work for plugins supplying their own circuit breakers.

It adds a new interface: `CircuitBreakerPlugin`.

This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.

With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
2020-05-29 12:13:46 -04:00
Benjamin Trent c8374dc9f3
[ML] add max_model_memory parameter to forecast request (#57254) (#57355)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% of the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
Marios Trivyzas b2651323fd
SQL: Implement TIME_PARSE function for parsing strings into TIME values (#55223) (#57342)
Implement TIME_PARSE(<time_str>, <pattern_str>) function
which parses a time string according to the specified
pattern into a time object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
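
For reference, the underlying java.time behaviour can be sketched as below.
The helper is hypothetical; the SQL function's actual return type and time
zone handling are not shown, and `Locale.ROOT` is an assumption.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class TimeParseSketch {

    // Parses a time string with a DateTimeFormatter pattern, e.g.
    // timeParse("10:20:30", "HH:mm:ss").
    static LocalTime timeParse(String timeStr, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        return LocalTime.parse(timeStr, formatter);
    }
}
```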

Closes #54963

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com>

(cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)
2020-05-29 15:48:37 +02:00
Dan Hermann 6b0d707671
[7.x] Do not report negative values for swap sizes (#57353) 2020-05-29 08:11:47 -05:00
Martijn van Groningen 04ef39da77
Change cluster info actions to be able to resolve data streams. (#57343)
Backport of #56878 to 7.x branch.

With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.

Relates to #53100
2020-05-29 12:17:53 +02:00
Dimitris Athanasiou 322f953060
[7.x][ML] Anomaly detection jobs should allow missing values for geo fields (#57300) (#57338)
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.

Closes #57299

Backport of #57300
2020-05-29 13:06:16 +03:00
Benjamin Trent 24d605e41e
[ML] fixing GET _ml/inference so size param is respected (#57303) (#57308)
`size` was previously ignored when grabbing full trained model configs. 

closes https://github.com/elastic/elasticsearch/issues/57298
2020-05-28 15:45:26 -04:00
Martijn van Groningen 225ccd1cfa
Ensure template exists when creating data stream (#57275)
Backporting #56888 to 7.x branch.

Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.

This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.

Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.

Relates to #53100
2020-05-28 15:08:25 +02:00
Marios Trivyzas fdac9e99fa
SQL: Fix unnecessary evaluation for CASE/IIF (#57159) (#57262)
Previously, when `CASE` and `IIF` were translated to painless scripts
(used in GROUP BY, HAVING, WHERE), a custom `caseFunction`
registered in the `InternalSqlScriptUtils` was used. This function
received an array of arbitrary length:
```[condition1, result1, condition2, result2, ... elseResult]```

Painless doesn't know of the context and therefore evaluates
all conditions and results before invoking the `caseFunction` on them.
As a consequence, erroneous result expressions (e.g. division by 0)
were always evaluated regardless of the guarding condition.

Replace the `caseFunction` with painless `<cond> ? <res1> : <res2>`
expressions to properly guard the result expressions and only evaluate
the one for which its guarding condition evaluates to true (or of course
the elseResult).

As a bonus, this approach includes performance benefits since we avoid
unnecessary evaluations of both conditions and result expressions.
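
The difference can be illustrated in plain Java, where method arguments are
also evaluated before the call. This is a sketch with hypothetical names, not
the painless script or the `InternalSqlScriptUtils` code.

```java
final class CaseEvaluationSketch {

    // Array-based variant: by the time this runs, the caller has already
    // evaluated every condition and every result expression.
    static Object caseFunction(Object... conditionsAndResults) {
        for (int i = 0; i + 1 < conditionsAndResults.length; i += 2) {
            if (Boolean.TRUE.equals(conditionsAndResults[i])) {
                return conditionsAndResults[i + 1];
            }
        }
        // the last element is the else result
        return conditionsAndResults[conditionsAndResults.length - 1];
    }

    // a / b is evaluated eagerly, so this throws ArithmeticException when
    // b == 0 even though the condition is meant to guard the division.
    static int eager(int a, int b) {
        return (Integer) caseFunction(b != 0, a / b, -1);
    }

    // The ternary operator only evaluates the branch that is selected.
    static int guarded(int a, int b) {
        return b != 0 ? a / b : -1;
    }
}
```

`guarded(10, 0)` returns the else result, while `eager(10, 0)` fails with a
division by zero before `caseFunction` is even entered.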

Fixes: #49672
(cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)
2020-05-28 11:30:14 +02:00
Tim Vernum 408250dcc4
Fix smtp.ssl.trust setting for watcher email (#57268)
The ssl.trust setting for Watcher provides a list of hostnames that
should be automatically trusted for SSL hostname verification. It was
accidentally broken when we added the full ssl.* settings for email
notifications (see #45272)

This commit corrects this, so the setting is once again respected,
as long as none of the other ssl settings are configured for email
notifications.

Resolves: #52153
Backport of: #56090
2020-05-28 17:34:13 +10:00
Ryan Ernst fdb8573413
Convert remaining compilerJavaHome reference 2020-05-27 17:04:04 -07:00
Ryan Ernst beb1d0c338
Remove compiler java version flag (#57237)
This commit removes the compiler.java setting from the build. It was
originally added when Gradle was far behind support for the latest jdk,
but is no longer applicable as we don't have any need to update the
supported compile version before gradle supports the newer version. Note
that the runtime version changing support still exists here, this only
ensures we use the same jdk to compile as we use to run gradle.
2020-05-27 16:33:38 -07:00
David Roberts d139a79ef6
[7.x][ML] Fix monitoring if orphaned anomaly detector persistent tasks exist (#57240)
Since #51888 the ML job stats endpoint has returned entries for
jobs that have a persistent task but not job config. Such
orphaned tasks caused monitoring to fail.

This change ignores any such corrupt jobs for monitoring purposes.

Backport of #57235
2020-05-27 22:59:11 +01:00
James Baiera 3b73ce3112
Fix enrich coordinator to reject documents instead of deadlocking (#56247) (#57179)
This PR removes the blocking call to insert ingest documents into a queue in the
coordinator. It replaces it with an offer call which will throw a rejection exception
in the event that the queue is full. This prevents deadlocks of the write threads
when the queue fills to capacity and there is more than one enrich processor
in a pipeline.
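
The put-versus-offer difference can be sketched with a plain JDK queue. The
class below is hypothetical, and the JDK's `RejectedExecutionException` stands
in for whatever rejection exception the coordinator actually throws.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;

final class BoundedCoordinatorQueueSketch<T> {
    private final BlockingQueue<T> queue;

    BoundedCoordinatorQueueSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // offer() returns false immediately when the queue is full, whereas
    // put() would block the calling (write) thread until space frees up.
    void enqueue(T item) {
        if (queue.offer(item) == false) {
            throw new RejectedExecutionException("coordinator queue is full, rejecting document");
        }
    }

    int size() {
        return queue.size();
    }
}
```

Rejecting pushes the back-pressure decision to the caller instead of parking
write threads that may themselves be needed to drain the queue.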
2020-05-27 15:32:13 -04:00
Lee Hinman c0f732b9f6
[7.x] Rename template V2 classes to ComposableTemplate (#57183) (#57232)
Backports the following commits to 7.x:

    Rename template V2 classes to ComposableTemplate (#57183)
2020-05-27 11:01:59 -06:00
Tal Levy 81060820e9 Fix NormalizerAgg test searcher wrapping (#57171)
The searcher was randomly wrapping its reader as slow, parallel, or filtered.
This was causing casting issues in the normalizer tests. By removing the
wrapping, the problem goes away.

Closes #57164
2020-05-26 13:25:19 -07:00
Benjamin Trent decc6277f9
[ML] allow unran/incomplete forecasts to be deleted for stopped/failed jobs (#57152) (#57172)
If a job is NOT opened, forecasts should be able to be deleted, no matter their state.

This also fixes a bug with expanding forecast IDs. We should check for wildcard `*` and `_all` when expanding the ids

closes https://github.com/elastic/elasticsearch/issues/56419
2020-05-26 15:44:22 -04:00
Bogdan Pintea 74b2c8a770 Change error message for comp against fields (#57126)
Change the error message wording for comparisons against fields in
filtering (s/variables/fields).

(cherry picked from commit d9a1cb50940d0a98fd75b9c0123ca6e1d862f65d)
2020-05-26 17:57:51 +02:00
Bogdan Pintea 0c379e334a SQL: update the JLine dependency to 3.14.1 (#57111)
* Update the JLine dependency to 3.14.1

Update the JLine dependency from 3.10.0 to 3.14.1.

(cherry picked from commit c2d9b74046fa5ddb54604da3afa7887cc38548a1)
2020-05-26 17:56:34 +02:00
markharwood b2bc6071fd
Add regex query support to wildcard field (approach 2) (#55548) (#57141)
Backport of #55548

Adds equivalence for keyword field to the wildcard field. Regex, fuzzy, wildcard and prefix queries are all supported.
All queries use an approximation query backed by an automaton-based verification query.

Closes #54275
2020-05-26 16:55:59 +01:00
markharwood 1d74549d7f
Wildcard field - add support for null field with test (#57047) (#57139)
Backport of #57047
2020-05-26 16:07:49 +01:00
David Kyle 571477d0ad
[7.x] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041) (#57136)
Fix delete_expired_data/nightly maintenance when 
many model snapshots need deleting (#57041)

The queries performed by the expired data removers pull back entire 
documents when only a few fields are required. For ModelSnapshots in 
particular this is a problem as they contain quantiles which may be 
100s of KB and the search size is set to 10,000.

This change makes the search more efficient by only requesting the 
fields needed to work out which expired data should be deleted.
2020-05-26 10:56:42 +01:00
Ioannis Kakavas 1e03de4999
Fix key usage in SamlAuthenticatorTests (#57124) (#57129)
In #51089, where SamlAuthenticatorTests were refactored, we missed
updating one test case, which meant that a single key would be
used both for signing and encryption in the same run. As explained
in #51089, and due to FIPS 140 requirements, BouncyCastle FIPS
provider will block RSA keys that have been used for signing from
being used for encryption and vice versa.

This commit changes testNoAttributesReturnedWhenTheyCannotBeDecrypted
to always use the specific keys we have added for encryption.
2020-05-26 10:51:47 +03:00
Jim Ferenczi 52443d41cf Stop async search maintenance service on restart (#56982)
This change ensures that we stop the maintenance service on all nodes
when a data node is restarted. This ensures that we don't send
update_by_query requests on the node that is restarted.
This commit also raises the log level to trace for some packages
in order to investigate the failures to acquire a shard lock
after a restart.

Relates #56765
2020-05-26 09:30:33 +02:00
Przemysław Witek ea2012778e
Mute failing test (#57112) (#57113) 2020-05-25 14:06:29 +02:00
Ioannis Kakavas 174af2bb1a
[7.x] Refactor SamlAuthenticatorTests (#51089) (#57105)
- Use opensaml to sign and encrypt responses/assertions/attributes
instead of doing this manually
- Use opensaml to build response and assertion objects instead of
parsing xml strings
- Always use different keys for signing and encryption. Due to FIPS
140 requirements, BouncyCastle FIPS provider will block
RSA keys that have been used for signing from being used for
encryption and vice versa. This change adds new encryption specific
 keys to be used throughout the tests.
2020-05-25 14:09:42 +03:00
Ioannis Kakavas 6c832fe4e3
Don't run IDP tests in FIPS 140 mode (#57048) (#57098)
We don't support this for now so there is no need to handle all
the test logic/exceptions to run this in FIPS 140 mode.
2020-05-25 14:08:48 +03:00
Armin Braun 9fa60f7367
Add History UUID Index Setting (#56930) (#57104)
Prerequisite for #50278 to be able to uniquely identify index metadata by
its version fields and UUIDs when restoring into closed indices.
2020-05-25 11:26:03 +02:00
Marios Trivyzas b91bae30b1
SQL: [Tests] Move JDBC integration tests to new module (#56872) (#57072)
Move the JDBC functionality integration tests from `:sql:qa` to a separate
module `:sql:qa:jdbc`. This way the tests are isolated from the rest of the
integration tests and they depend only on the `:sql:jdbc` module, thus
removing the danger of accidentally pulling in some dependency that may
hide bugs.

Moreover this is a preparation for #56722, so that we can run those tests
between different JDBC and ES node versions and ensure forward
compatibility.

Move the rest of existing tests inside a new `:sql:qa:server` project, so that
the `:sql:qa` becomes the parent project for both and one can run all the integration
tests by using this parent project.

(cherry picked from commit c09f4a04484b8a43934fe58fbc41bd90b7dbcc76)
2020-05-22 17:49:36 +02:00
Ioannis Kakavas 6c90727166
Fix custom policy in plugins in FIPS 140 (#52046) (#57049)
Our FIPS 140 testing depends on setting the appropriate java policy
in order to configure the JVM in FIPS mode. Some tests
(discovery-ec2 and ccr qa) also needed to set a custom policy file
to grant a specific permission, which overwrote the FIPS related
policy and tests would fail. This change ensures that when a
custom policy needs to be set in these tests, the permissions that
are necessary for FIPS are also set.

Resolves: #51685, #52034
2020-05-21 19:26:56 +03:00
Benjamin Trent f00dfb2d5f
[ML] adds WKT support in filestructurefinder (#57014) (#57032)
Field mapping detection is done via grok patterns. 
This commit adds well-known text (WKT) formatted geometry detection.

If everything is a `POINT`, then a `geo_point` mapping is preferred. 
Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred.

This does **NOT** detect other types of formatted geometries (geohash, comma delimited points, etc.)

closes https://github.com/elastic/elasticsearch/issues/56967
2020-05-21 08:22:51 -04:00
markharwood eb8cb31d46
Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024)
Same version as master
2020-05-21 11:28:16 +01:00
Bogdan Pintea ec4a6aa1c6 SQL: JDBC: fix temporary directory locked test errors in Windows (#56917)
* Fix temp dir locked errors

The tests involving a temporary directory (containing the JDBC JAR) fail
on Windows because they can't be deleted, due to still being in use.
This commit forces a premature closing of the JAR file, which mitigates
the failure by giving the JVM more time to collect any open FDs.
(Calling the System.gc() in the tests is another working alternative
fix.)

The stream-based JAR access is taken care of by disabling cache usage.

(cherry picked from commit 04f97333a015404a68e8f19223f33aadeb396687)
2020-05-20 19:46:57 +02:00
Benjamin Trent ee4ce8ecec
Fix geotile_grid group_by field mapping (#56939) (#56990)
The original implementation utilized `bbox` as the index mapping type. This would not work as it would have to be `envelope`. But, given that `envelope` and `polygon` are tessellated in the same way, we choose to use `polygon` as the geo_shape type. This makes it easier to support in other places in the stack (a la Kibana Maps).
2020-05-20 08:22:13 -04:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.

Relates to #56814
2020-05-20 09:43:13 +01:00
Marios Trivyzas 644ae49817
SQL: Fix behaviour of COUNT(DISTINCT <literal>) (#56869) (#56932)
Previously `COUNT(DISTINCT <literal>)` was returning the same result
as `COUNT(<literal>)` which is not correct as it should always return 1
if there is at least one matching row (bucket if there is a GROUP BY),
or 0 otherwise.
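
The corrected semantics can be illustrated with a small sketch. It assumes a non-null literal, and `count_literal` is a hypothetical helper that only emulates the behaviour on an in-memory list of matching rows; it is not how the SQL plugin evaluates the aggregate.

```python
from typing import Iterable


def count_literal(rows: Iterable[object], distinct: bool = False) -> int:
    """Emulate COUNT(<literal>) and COUNT(DISTINCT <literal>) over matching rows."""
    n = sum(1 for _ in rows)
    if distinct:
        # A non-null literal has exactly one distinct value, provided any row matched.
        return 1 if n > 0 else 0
    return n
```

With a GROUP BY, the same function would be applied to the rows of each bucket separately.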

(cherry picked from commit 7f7d7562d43034907f432d39d0d66f490d78f4a8)
2020-05-19 11:19:06 +02:00
Yannick Welsch f296c08021 Increase timeout for assertLongBusy in AutoFollowIT (#56910)
Closes #56891
2020-05-18 16:20:46 +02:00
Benjamin Trent 297f864884
[ML] relax throttling on expired data cleanup (#56711) (#56895)
Throttling nightly cleanup as much as we do has been overly cautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
David Kyle 0fac152188 Mute AsyncSearchActionIT (#56897)
For #56765
2020-05-18 13:36:33 +01:00
Ioannis Kakavas bb852ab2e7
Cause is tracked in #49094 (#56887) 2020-05-18 15:03:38 +03:00
David Kyle 52a329fa12 Mute sql.client.VersionTests suite (#56883)
For  #56882
2020-05-18 10:15:30 +01:00
Bogdan Pintea de7dd6154e Fix range of version number generation in test (#56849)
The version number component can't equal or exceed the revision
multiplier.
This fixes the VersionTests unit test.

(cherry picked from commit 7d2331a2818ae20024c5c3617cd4433f90e9c098)
2020-05-16 08:59:45 +02:00
Andrei Stefan 4d47d63f55
SQL: implement SUM, MIN, MAX, AVG over literals (#56786) (#56850)
* Adds support for MIN, MAX, AVG, SUM aggregates acting on literals.
SELECT SUM(1) FROM index
and
SELECT SUM(1), AVG(2)
work both on indices and as local execution.

(cherry picked from commit efb72907c0391612c4a2b6256e327060b4167912)
2020-05-16 02:13:55 +03:00
Jake Landis 813609b47c
Ensure that .watcher-history-11* template is installed prior to use (#56734)
WatcherIndexTemplateRegistry as of https://github.com/elastic/elasticsearch/pull/52962 
requires all nodes to be on 7.7.0 before it allows the version 11 index template to be 
installed.

While in a mixed cluster, nothing prevents Watcher from running on the new
host before all of the nodes are on 7.7.0. This will result in the
.watcher-history-11* index being created without the proper mappings. Without the proper
mapping a single document (for a large watch) can exceed the default 1000 field
limit and cause errors to show in the logs.

This commit ensures the same logic for writing to the index is applied as for
installing the template. In a mixed cluster, the `10` index template will continue
to be written. Only once all of nodes are on 7.7.0+ will the `11` index template
be installed and used.
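
The selection rule described above can be sketched as follows. This is a minimal illustration; `history_template_version` and the tuple representation of node versions are assumptions made for the example, not Watcher's actual code.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]

# The first version on which every node must be before template 11 is used.
MIN_VERSION_FOR_V11: Version = (7, 7, 0)


def history_template_version(node_versions: Iterable[Version]) -> int:
    """Pick the watcher history template version for a possibly mixed cluster."""
    # The same check gates both installing the template and writing to the index.
    if all(v >= MIN_VERSION_FOR_V11 for v in node_versions):
        return 11
    return 10
```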

closes #56732
2020-05-15 16:29:04 -05:00
Dimitris Athanasiou 54d3cc74ec
[7.x][ML] Ensure class is represented when its cardinality is low (#56783) (#56829)
In DF analytics classification, it is possible to use no samples
of a class if its cardinality is too low.

This commit fixes this by ensuring the target sample count can never be zero.
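
As a rough sketch of the idea, assuming the per-class target is derived from a sampling fraction (the actual computation is not described in this message, and `target_sample_counts` is a hypothetical name):

```python
import math
from typing import Dict


def target_sample_counts(class_counts: Dict[str, int], fraction: float) -> Dict[str, int]:
    """Per-class sample targets for a sampling fraction, clamped so none is zero."""
    return {
        # Without max(1, ...), a rare class could round down to zero samples.
        label: max(1, math.floor(count * fraction))
        for label, count in class_counts.items()
        if count > 0
    }
```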

Backport of #56783
2020-05-15 20:52:06 +03:00
Bogdan Pintea 14ad733bd1
SQL: JDBC: fix access to the Manifest for non-entry JAR URLs (#56797) (#56839)
* JDBC: fix access to the Manifest for non-entry JAR

The JDBC driver will attempt to read its version from the Manifest file
embedded into its JAR. The URL pointing to the JAR can be provided in a
few ways.

So far, accessing the Manifest was attempted by getting a URLConnection
out of the URL and then getting an input stream out of this connection.
For file JAR URLs, this only works however if the URL points to the
driver as a JAR file entry (i.e. <sub-url>!/jdbc-driver.jar!/). If
that's not the case, the JarURLConnection will throw an IOException.

This commit fixes that: in case the URL points to a JAR entry
(jar:file:<path>/jdbc-driver.jar!/), the manifest is read directly with
JarURLConnection#getManifest().

(cherry picked from commit 2175b7b01cf5fcf3ab2bb21404a9bd454a8df3f0)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-05-15 19:35:54 +02:00
James Baiera 4809db3ff9
EnrichProcessorFactory should not throw NPE if missing metadata (#55977) (#56793)
In some cases the Enrich processor factory may be called before it is
ready to create processors. While these calls are usually made in error,
the response from the Enrich processor is an NPE which is almost always
an unhelpful error when debugging an issue.
2020-05-15 12:02:13 -04:00
Ioannis Kakavas 239ada1669
Test adjustments for FIPS 140 (#56526)
This change aims to fix our setup in CI so that we can run 7.x in
FIPS 140 mode. The major issue that we have in 7.x and did not
have in master is that we can't use the diagnostic trust manager
in FIPS mode in Java 8 with SunJSSE in FIPS approved mode as it
explicitly disallows the wrapping of X509TrustManager.

Previous attempts like #56427 and #52211 focused on disabling the
setting in all of our tests when creating a Settings object or
on setting fips_mode.enabled accordingly (which implicitly disables
the diagnostic trust manager). The attempts weren't future proof
though, as nothing would prevent someone from adding new tests without
setting the necessary setting, and forcing this would be very
inconvenient for any other case (see
#56427 (comment) for the full argumentation).

This change introduces a runtime check in SSLService that overrides
the configuration value of xpack.security.ssl.diagnose.trust and
disables the diagnostic trust manager when we are running in Java 8
and the SunJSSE provider is set in FIPS mode.
2020-05-15 18:10:45 +03:00
Benjamin Trent f71c305090
[7.x] [Transform] add support for terms agg in transforms (#56696) (#56809)
* [Transform] add support for terms agg in transforms (#56696)

This adds support for `terms` and `rare_terms` aggs in transforms. 

The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`

The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
2020-05-15 08:08:43 -04:00
David Roberts 270a23e422 [TEST] Fix log tail mocking in native process unit tests (#56804)
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.

Fixes #56796
2020-05-15 12:46:37 +01:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
2020-05-15 12:14:49 +01:00
Yang Wang c66e7ecbfe
Fix test failure of file role store auto-reload (#56398) (#56802)
Ensure assertion is only performed when we can be sure that the desired changes are picked up by the file watcher.
2020-05-15 15:10:45 +10:00
Ryan Ernst 9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Tal Levy 5e90ff32f7
Add Normalize Pipeline Aggregation (#56399) (#56792)
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.

The aggregation supports the following normalizations:

- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization

The normalization to use is specified in the normalize agg's
`normalizer` field.

For example:

```
{
  "normalize": {
    "buckets_path": <>,
    "normalizer": "percent"
  }
}
```
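
For orientation, the listed normalizations can be written out as plain functions over a series of bucket values. This is a sketch using the standard textbook definitions; the function names are hypothetical, and the use of the population standard deviation for z-score is an assumption. Inputs where all values are equal (or sum to zero) would divide by zero and are not handled here.

```python
import math
from typing import List


def rescale_0_1(xs: List[float]) -> List[float]:
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def rescale_0_100(xs: List[float]) -> List[float]:
    return [100.0 * v for v in rescale_0_1(xs)]


def percent_of_sum(xs: List[float]) -> List[float]:
    total = sum(xs)
    return [x / total for x in xs]


def mean_normalization(xs: List[float]) -> List[float]:
    mean = sum(xs) / len(xs)
    spread = max(xs) - min(xs)
    return [(x - mean) / spread for x in xs]


def z_score(xs: List[float]) -> List[float]:
    mean = sum(xs) / len(xs)
    # Population standard deviation (divide by n); an assumption of this sketch.
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]


def softmax(xs: List[float]) -> List[float]:
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```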
2020-05-14 17:40:15 -07:00
Mark Vieira 0fd756d511
Enforce strict license distribution requirements (#56642) 2020-05-14 13:57:56 -07:00
Costin Leau 6f4af43405 EQL: Skip execution for filters with empty results (#56718)
Optimize away events queries and joins/sequence that cannot match any
results without having to query the backend.

(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
2020-05-14 22:38:23 +03:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Dimitris Athanasiou ac5902624c
[7.x][ML] Improve error upon DF analytics mappings conflict (#56700) (#56776)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.

Backport of #56700
2020-05-14 19:16:10 +03:00
Jim Ferenczi fb5e6329b7 Stop/Start async search maintenance service in tests (#56673)
This change ensures that the maintenance service that is responsible for deleting the expired responses is stopped between each test. This is needed since we check that no search contexts are in flight after each test method.

Fixes #55988
2020-05-14 15:13:01 +02:00
David Turner bec6821fe6 AwaitsFix for #56755 2020-05-14 11:46:05 +01:00
Alexander Reelsen 3a263d91f6 Ensure watcher email action message ids are always unique (#56574)
If an email action is used in a foreach loop, message ids could have
been duplicated, which then get rejected by the mail server.

This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
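
The counter approach can be sketched like this. The id format and the `next_message_id` helper are hypothetical; only the idea of a shared, monotonically increasing counter comes from the message above.

```python
import itertools
import threading

# Process-wide counter shared by all email actions (the "static counter").
_counter = itertools.count()
_lock = threading.Lock()


def next_message_id(watch_id: str, action_id: str, timestamp_ms: int) -> str:
    """Build a message id that stays unique even within one foreach iteration burst."""
    with _lock:
        seq = next(_counter)
    # Without the sequence number, two emails sent by the same action in the
    # same millisecond would share an id.
    return f"{watch_id}_{action_id}_{timestamp_ms}_{seq}"
```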
2020-05-14 10:36:00 +02:00
Przemysław Witek 98fbd85290
[7.x] Add scope-related fields to Annotation (#56417) (#56681) 2020-05-14 10:23:13 +02:00
Andrei Stefan ddf4e47e86
EQL: fix QueryFolderOkTests (#56714) (#56728)
(cherry picked from commit 8b21ccd0eac3b3d0fbd090152b3dff6ae5217b52)
2020-05-14 10:58:25 +03:00
David Roberts 3051c37f92
[ML] Tail the C++ logging pipe before connecting other pipes (#56701)
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.

This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block.  This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.

This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.

Backport of #56632
2020-05-14 07:10:30 +01:00
Aleksandr Maus 87a10806ab
EQL: Fix cidrMatch function fails to match when used in scripts (#56246) (#56735)
EQL: Fix cidrMatch function fails to match when used in scripts (#56246)

Addresses https://github.com/elastic/elasticsearch/issues/55709
2020-05-13 22:41:24 -04:00
Nik Everett b98b260048
Merge significant_terms into the terms package (backport of #56699) (#56715)
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.

Precondition for the terms work in #56487
2020-05-13 17:36:21 -04:00
Ross Wolf 61e2cf89b5
EQL: Add number function (#55084)
* EQL: Add number function
* EQL: Fix the locale used for number for deterministic functionality
* EQL: Add more ToNumber tests
* EQL: Add more number ToNumberProcessor unit tests
* EQL: Remove unnecessary overrides, fix processor methods
* EQL: Remove additional unnecessary overrides
* EQL: Lint fixes for ToNumber
* EQL: ToNumber renames from PR feedback
* EQL: Remove NumberFormat locale handling
* EQL: Removed NumberFormat from ToNumber
* EQL: Add number function tests
* EQL: ToNumberProcessorTests formatting
* EQL: Remove newline in ToNumberProcessorTests
* EQL: Add number(..., null) test
* EQL: Create expression.function.scalar.math package
* EQL: Remove painless whitespace for ToNumber.asScript
* EQL: Add Long support
2020-05-13 14:09:06 -06:00
Costin Leau 9f1ecd52eb EQL: Introduce support for sequences (#56300)
Initial support for EQL sequences
The current algorithm is focused on correctness and does not contain
any optimization which is left for the future.

The current implementation uses a state machine approach which moves
ascending and runs each query one after the other working on computing
sequences as the data comes in.
For each result, the key and its timestamp are being extracted which are
then used for matching/building a sequence.

(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
2020-05-13 15:42:31 +03:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Marios Trivyzas cbbbd499bf
SQL/EQL: Add support for scalars within LIKE/RLIKE (#56495) (#56674)
- Add support for scalar functions on the field of SQL's LIKE/RLIKE
- Add support for scalar functions on the field of EQL's match/matchLite

Closes: #55058
(cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)
2020-05-13 13:40:24 +02:00
Luca Cavanna 30e9a1b8c7 Improve error handling when decoding async execution ids (#56285)
When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]".
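
A minimal sketch of the pattern, assuming a URL-safe Base64 encoded id (the real id layout is not described here, and `decode_async_id` is a hypothetical name): low-level decoding errors are caught and replaced by a message that names the offending id.

```python
import base64
import binascii


def decode_async_id(encoded: str) -> bytes:
    """Decode an id, turning low-level decoding failures into a readable error."""
    try:
        return base64.urlsafe_b64decode(encoded)
    except (binascii.Error, ValueError) as e:
        # Report the id itself instead of a cryptic byte-level message.
        raise ValueError(f"invalid id: [{encoded}]") from e
```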

Added test coverage for a couple of these scenarios and also added tests for equals/hashcode methods.
2020-05-13 12:26:17 +02:00
Marios Trivyzas e781193cf9
SQL: Fix JDBC url pattern in docs and error message (#56612)
The docs pattern url was using `*` which means zero or many instead
of `?` which means zero or one. The pattern url returned in error
messages was not in sync with the one in the docs.
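
The difference between the two quantifiers can be seen with a heavily simplified, hypothetical URL pattern (not the driver's real one); only the quantifier on the scheme group differs between the two regular expressions.

```python
import re

# Hypothetical, simplified patterns for illustration only.
ZERO_OR_MANY = re.compile(r"jdbc:es://(https?://)*[\w.-]+(:\d+)?/?")
ZERO_OR_ONE = re.compile(r"jdbc:es://(https?://)?[\w.-]+(:\d+)?/?")


def accepts(pattern: "re.Pattern[str]", url: str) -> bool:
    """True if the whole URL matches the pattern."""
    return pattern.fullmatch(url) is not None
```

With `*`, a URL that repeats the scheme, such as `jdbc:es://http://http://localhost:9200`, is accepted; with `?` it is rejected, while `jdbc:es://localhost:9200` is accepted by both.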

Fixes: #56476
(cherry picked from commit 1a5945c3962cdda21482f4b0b3e0ca508534c2c4)
2020-05-13 12:13:58 +02:00
David Turner c10b4ae15a Support cloning of searchable snapshot indices (#56595)
Today you can convert a searchable snapshot index back into a regular index by
restoring the underlying snapshot, but this is somewhat wasteful if the shards
are already in cache since it copies the whole index from the repository again.

Instead, we can make use of the locally-cached data by using the clone API to
copy the contents of the cache into the layout expected by a regular shard.
This commit marks the searchable snapshot's private index settings as
`NotCopyableOnResize` so that they are removed by resize operations such as
cloning.

Cloning a regular index typically hard-links the underlying files rather than
copying them, but this is tricky to support in the case of a searchable
snapshot so this commit takes the simpler approach of always copying the
underlying files.
2020-05-13 11:05:14 +01:00
Ioannis Kakavas cc119c3853
Expose idp.metadata.http.refresh for SAML realm (#56354) (#56593)
This setting was not returned in the SamlRealmSettings#getSettings
so it was not possible for users to set this in the realm config
in our configuration.
2020-05-13 11:51:18 +03:00
Jake Landis a010f4f624
[7.x] Watcher dont add watches post index if stopped (#56556) (#56629)
Watcher adds watches to the trigger service on the postIndex action
for the .watches index. This has the (intentional) side effect of also
adding the watches to the stats. The tests rely on these stats for their
assertions. The tests also start and stop Watcher between each test for
a clean slate.

When Watcher executes it updates the .watches index and upon this update
it will go through the postIndex method and end up adding that watch to the
trigger service (and stats). Functionally this is not a problem if Watcher
is stopping or stopped, since Watcher is also paused and will not execute
the watch. However, specific timing combined with the expectation of a clean slate
can cause issues with the test assertions against the stats.

This commit ensures that the postIndex action only adds to the trigger service
if the Watcher state is not stopping or stopped. When started back up it will
re-read index .watches.

This commit also un-mutes the tests related to #53177 and #56534
2020-05-12 16:30:27 -05:00
Jake Landis 9c76ee47c4
[7.x] json spec: allow null for documentation url (#55749) (#56625)
This commit allows the JSON schema's documentation.url property to have a null value.
This can be useful for cases where a feature is under development, and does not have
documentation published yet.

This commit also adds a documentation.url for two ml resources.
2020-05-12 14:49:02 -05:00
Armin Braun 0a879b95d1
Save Bounds Checks in BytesReference (#56577) (#56621)
Two spots that allow for some optimization:

* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
   * Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
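
The first optimization, a factory that never builds a composite for zero or one parts, can be sketched as follows. `CompositeBytes` and `bytes_of` are hypothetical names used for illustration and do not mirror the real class layout.

```python
from typing import Tuple, Union


class CompositeBytes:
    """Several byte chunks viewed as one logical sequence (two or more parts)."""

    def __init__(self, parts: Tuple[bytes, ...]) -> None:
        self.parts = parts

    def __len__(self) -> int:
        return sum(len(p) for p in self.parts)

    def to_bytes(self) -> bytes:
        return b"".join(self.parts)


def bytes_of(*parts: bytes) -> Union[bytes, CompositeBytes]:
    """Factory that only creates a composite when there is something to compose."""
    if len(parts) == 0:
        return b""          # no empty composite
    if len(parts) == 1:
        return parts[0]     # a single part is returned as-is, with no wrapper
    return CompositeBytes(parts)
```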
2020-05-12 20:33:45 +02:00
Armin Braun c104c9a11b
Fix Missing IgnoredUnavailable Flag in 7.x SLM Retention Task (#56616)
Without the flag we run into the situation where a broken repository (broken by some old 6.x
version of ES, leaving it without some snap-${uuid}.dat blobs) fails to run the SLM retention task
since it always errors out.
2020-05-12 18:07:58 +02:00
Marios Trivyzas 4240b97d0e
SQL: [Test] Fix JdbcPreparedStatement date test
Use `ORDER BY` to ensure the order of the rows, since more
than one row is returned in testDate().

Follows: #56492
(cherry picked from commit 0053a1cb515b4db160d7b0bed5cf3f13c1050687)
2020-05-12 17:08:16 +02:00
Martijn van Groningen 0c61bc63e4
Backport: auto create data streams using index templates v2 (#56596)
Backport: #55377

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_stream field that includes a timestamp field;
if it is provided and an index name matches that template, then a data stream
(plus first backing index) is auto created.

Relates to #53100
2020-05-12 17:01:15 +02:00
Andrei Stefan f0074e93a0
QL: case sensitive support in EQL (#56404) (#56597)
* QL: case sensitive support in EQL (#56404)
* adds a generic startsWith function to QL
* modifies the existent EQL startsWith function to be case sensitive
aware
* improves the existent EQL startsWith function to use a prefix query
when the function is used in a case sensitive context. Same improvement
is used in SQL's newly added STARTS_WITH function.
* adds case sensitivity to EQL configuration through a case_sensitive
parameter in the eql request, as established in #54411.
The case_sensitive parameter can be specified when running queries
(default is case insensitive)

(cherry picked from commit ee5a09ea840167566e34c28c8225dc38bc6a7ae8)
2020-05-12 16:56:18 +03:00
Hendrik Muhs a9425a0240
[7.x][Transform] fix count when matching exact ids (#56544) (#56582)
Fix the count in get and get stats when explicit ids are given and ids might be
duplicated because configurations are stored in different index versions.

fixes #56196
2020-05-12 14:23:13 +02:00
Marios Trivyzas 575cafb8da
SQL: Fix serialization of JDBC prep statement date/time params (#56492) (#56579)
The Date/Time related query params of a JDBC prepared statement
are serialized using java.util.Date. The rules for serializing
`java.util.Date` objects though reside in
`XContentElasticsearchExtension` which is not available in the
jdbc jar as this class is in `server` module. Therefore, a
custom extension of the `XContentBuilderExtension` iface has been
added to the jdbc module/jar.

Moreover the sql's `qa` project had as dependency the `sql-action`
module which depends on `server` so the `XContentBuilderExtension`
was available for the integ tests hiding the real problem.

Previously, when a user was setting a `java.sql.Time` to the prepStmt,
the DataType used was `DATETIME` instead of `TIME` and therefore
prevented from filtering with a `TIME` casted field:
```
SELECT * FROM test WHERE date::TIME = ?
```

Fixes: #56084
(cherry picked from commit f8d8e971bd2c85fa4aea44b5b3ba0cdcc950a4ed)
2020-05-12 13:25:02 +02:00
Martijn van Groningen 2e86801f61
Backport: enable searchable snapshots feature flag for xpack rest tests.
Backport of: #56569

A data stream test, which tests data stream resolvability in xpack apis failed in release builds.
An invocation of a searchable snapshot api failed, because the corresponding feature flag
wasn't enabled for xpack rest tests.

Closes #56531
2020-05-12 12:18:24 +02:00
Ignacio Vera 222ee721ec
Add moving percentiles pipeline aggregation (#55441) (#56575)
Similar to what the moving function aggregation does, except merging windows of percentile
sketches together instead of cumulatively merging final metrics
2020-05-12 11:35:23 +02:00
Marios Trivyzas 5c0f26de1d
SQL: [Docs] Fix example for DATETIME_PARSE (#56409)
When no timezone is specified the session timezone is used without
conversion, fix the docs test accordingly.

Follows: #56158
(cherry picked from commit 4b79b19ea5c3d17e05cb8130f3c754ac9bfd2382)
2020-05-12 09:23:00 +02:00
Ryan Ernst 902fc546bd
Migrate remaining ESIntegTestCases to internalClusterTest (#56479) (#56563)
This commit migrates the ESIntegTestCase tests in x-pack to the
internalClusterTest source set.
2020-05-11 21:06:04 -07:00
Nick Knize 9b64149ad2
[Geo] Refactor Point Field Mappers (#56060) (#56540)
This commit refactors the following:
  * GeoPointFieldMapper and PointFieldMapper to
    AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
  * .setupFieldType moved up to AbstractGeometryFieldMapper
  * lucene indexing moved up to AbstractGeometryFieldMapper.parse
  * new addStoredFields, addDocValuesFields abstract methods for implementing
    stored field and doc values field indexing in the concrete field mappers

This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
2020-05-11 17:11:36 -05:00
Tim Brooks 760ab726c2
Share netty event loops between transports (#56553)
Currently Elasticsearch creates independent event loop groups for each
transport type (http and internal). This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
2020-05-11 15:43:43 -06:00
Benjamin Trent 1d6b2f074e
[Transform] adds geotile_grid support in group_by (#56514) (#56549)
This adds support for grouping by geo points. This uses the agg [geotile_grid](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geotilegrid-aggregation.html).

I am opting to store the tile results of group_by as a `geo_shape` so that users can query the results. Additionally, the shapes could be visualized and filtered in the kibana maps app.

relates to https://github.com/elastic/elasticsearch/issues/56121
2020-05-11 17:02:40 -04:00
Lee Hinman 1337b35572
Remove prefer_v2_templates query string parameter (#56545)
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.

Relates to #53101
Resolves #56528

This is not a breaking change because this flag was never in a released version.
2020-05-11 14:56:42 -06:00
zhenxianyimeng 8e96e5c936
Use CollectionUtils.isEmpty where appropriate (#55910)
This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.
2020-05-11 09:55:57 -07:00
Armin Braun 3ab6eba6bc
Fix RollupJobTaskTests Leaking Threads on Slowness (#56438) (#56518)
We are ensuring order in the two tests changed by waiting on latches.
The problem is, that 3s is a pretty short wait and on CI can randomly be exceeded
by pure chance. If that happened we wouldn't have visibility on it since we didn't
assert that the waits actually worked.
=> Fixed by asserting that the waits work and upping the timeout to our standard 10s
Also, moved to a per-test threadpool to make it simpler to identify which test failed,
should an unexpected task run on a closed client's pool after all.
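
The "assert that the wait worked" idea can be sketched with Python threads, where `threading.Event.wait` plays the role of the latch (an analogy for illustration, not the test's actual code). The wait returns a boolean, and ignoring it is exactly what hides a timeout.

```python
import threading
from typing import List


def run_in_order() -> List[str]:
    """Run two tasks on separate threads, forcing 'first' before 'second'."""
    order: List[str] = []
    first_done = threading.Event()

    def first() -> None:
        order.append("first")
        first_done.set()

    def second() -> None:
        # Check the result of the wait: a silent timeout would let the test
        # continue in the wrong order without any visible failure.
        assert first_done.wait(timeout=10), "timed out waiting for first task"
        order.append("second")

    t2 = threading.Thread(target=second)
    t1 = threading.Thread(target=first)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return order
```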
2020-05-11 17:24:10 +02:00
Jim Ferenczi 02ab9112a9 Fix spurious failures in AsyncSearchIntegTestCase (#56026)
Async search integration tests are subject to random failures when:
  * The test index has more than one replica.
  * The request cache is used.
  * Some shards are empty.
  * The maintenance service starts a garbage collection when node is closing.

They are also slow because the test index is created/populated on each
test method.

This change refactors these integration tests in order to:
  * Create the index once for the entire test suite.
  * Fix the usage of the request cache and replicas.
  * Ensure that all shards have at least one document.
  * Increase the delay of the maintenance service garbage collection.

Closes #55895
Closes #55988
2020-05-11 15:03:03 +02:00
Martijn van Groningen 9ae09570d8
Allow a number of broadcast transport actions to resolve data streams (#55726) (#56502)
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.

This change allows the following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats, 
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.

Relates to #53100
2020-05-11 12:48:35 +02:00
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of its
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator` which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
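
Such a structure can be pictured as a map from (owning bucket ordinal, long key) pairs to dense ordinals. The sketch below is a simplified illustration with a hypothetical `BucketOrds` class; it uses a plain dictionary and makes no claim about the real implementation's memory layout.

```python
from typing import Dict, Tuple


class BucketOrds:
    """Assigns dense ordinals to (owning_bucket_ord, value) pairs."""

    def __init__(self) -> None:
        self._ords: Dict[Tuple[int, int], int] = {}

    def add(self, owning_bucket_ord: int, value: int) -> int:
        """Return the ordinal for the pair, allocating the next one if it is new."""
        key = (owning_bucket_ord, value)
        # len() is evaluated before insertion, so a new key gets the next ordinal.
        return self._ords.setdefault(key, len(self._ords))

    def size(self) -> int:
        return len(self._ords)
```

A single aggregator can then serve every parent bucket: the same long key seen under two different parents gets two different ordinals, so sub-aggregation state stays separate.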

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
2020-05-08 20:38:53 -04:00
Armin Braun 0a254cf223
Serialize Monitoring Bulk Request Compressed (#56410) (#56442)
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
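
The approach can be sketched with `zlib`: keep the body compressed in memory together with its uncompressed length, then decompress chunk by chunk while writing to the connection. The function names are hypothetical and the sketch ignores the HTTP client entirely; it also does not bound the size of each decompressed chunk.

```python
import zlib
from typing import Iterator, Tuple


def serialize_compressed(payload: bytes) -> Tuple[bytes, int]:
    """Return the compressed body plus the uncompressed length needed for the request."""
    return zlib.compress(payload), len(payload)


def stream_decompressed(compressed: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the original bytes piece by piece, without materializing them all at once."""
    decomp = zlib.decompressobj()
    view = memoryview(compressed)
    for start in range(0, len(view), chunk_size):
        out = decomp.decompress(view[start:start + chunk_size])
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
```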
2020-05-08 23:16:07 +02:00
Dimitris Athanasiou 44ffa388ac
[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423) (#56428)
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.

This commit fixes this by using the default timeout instead.

Backport of #56423
2020-05-08 21:12:11 +03:00
David Roberts 9a3924a641
[ML] Adjust list of platforms that have ML native code (#56426)
Native code is now available for linux-aarch64.

Note that it is _not_ currently supported!
2020-05-08 16:22:45 +01:00
Dimitris Athanasiou c117ae7a6e
[7.x][ML] Force stopping stopped DF analytics should succeed (#56421) (#56424)
Force stopping a DF analytics job whose config exists and that
is stopped should succeed. This was broken by #56360.

Closes #56414

Backport of #56421
2020-05-08 18:04:24 +03:00
Tanguy Leroux 8e9b69bfd7
Use snapshot information to build searchable snapshot store MetadataSnapshot (#56289) (#56403)
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.

Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.

In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.

Since searchable snapshots already contain the footer information
of every file in BlobStoreIndexShardSnapshot, they can directly read the
checksum from it and avoid using the cache at all to create a
MetadataSnapshot for the operations mentioned above.

This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
2020-05-08 14:16:19 +02:00
Dimitris Athanasiou 60b1c67409
[7.x][ML] Allow stopping DF analytics whose config is missing (#56360) (#56408)
It is possible that the config document for a data frame
analytics job is deleted from the config index. If that is
the case the user is unable to stop a running job because
we attempt to retrieve the config and that will throw.

This commit changes that. When the request is forced,
we do not expand the requested ids based on the existing
configs but from the list of running tasks instead.

Backport of #56360
2020-05-08 13:54:44 +03:00