Commit Graph

4781 Commits

Author SHA1 Message Date
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator` which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
2020-05-08 20:38:53 -04:00
Armin Braun 0a254cf223
Serialize Monitoring Bulk Request Compressed (#56410) (#56442)
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
2020-05-08 23:16:07 +02:00
Dimitris Athanasiou 44ffa388ac
[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423) (#56428)
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.

This commit fixes this by using the default timeout instead.

Backport of #56423
2020-05-08 21:12:11 +03:00
David Roberts 9a3924a641
[ML] Adjust list of platforms that have ML native code (#56426)
Native code is now available for linux-aarch64.

Note that it is _not_ currently supported!
2020-05-08 16:22:45 +01:00
Dimitris Athanasiou c117ae7a6e
[7.x][ML] Force stopping stopped DF analytics should succeed (#56421) (#56424)
Force stopping a DF analytics job whose config exists and that
is stopped should succeed. This was broken by #56360.

Closes #56414

Backport of #56421
2020-05-08 18:04:24 +03:00
Tanguy Leroux 8e9b69bfd7
Use snapshot information to build searchable snapshot store MetadataSnapshot (#56289) (#56403)
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.

Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.

In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.

Since searchable snapshots already contains the footer information
of every file in BlobStoreIndexShardSnapshot it can directly read the
checksum from it and avoid to use the cache at all to create a
MetadataSnapshot for the operations mentioned above.

This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
2020-05-08 14:16:19 +02:00
Dimitris Athanasiou 60b1c67409
[7.x][ML] Allow stopping DF analytics whose config is missing (#56360) (#56408)
It is possible that the config document for a data frame
analytics job is deleted from the config index. If that is
the case the user is unable to stop a running job because
we attempt to retrieve the config and that will throw.

This commit changes that. When the request is forced,
we do not expand the requested ids based on the existing
configs but from the list of running tasks instead.

Backport of #56360
2020-05-08 13:54:44 +03:00
Dimitris Athanasiou d064eda2b0
[7.x][ML] Ensure phase progress may only increase (#56339) (#56357)
Due to multi-threading it is possible that phase progress
updates written from the c++ process arrive reordered.
We can address this by ensuring that progress may only increase.

Closes #56282

Backport of #56339
2020-05-07 19:46:58 +03:00
William Brafford 691044e67b
Add xpack setting deprecations to deprecation API (#56290)
* Add xpack setting deprecations to deprecation API

The deprecated settings showed up in the deprecation log file by
default, but I did not add them to the deprecation API. This commit
fixes that. Now if you use one of the deprecated basic feature
enablement settings, calling _monitoring/deprecations will inform you of
that fact.

* Remove incorrectly backported settings documents

It seems that I backported these docs to the wrong place in #56061,
in #55980, and in #56167. I hope they're in the right place now.

Co-authored-by: debadair <debadair@elastic.co>
2020-05-07 10:28:17 -04:00
Nik Everett e35919d3b8
Optimize date_histograms across daylight savings time (backport of #55559) (#56334)
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.

Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)

I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.

This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.

This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.

Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
2020-05-07 09:10:51 -04:00
Tanguy Leroux 6233e32ab3 Fix SearchableSnapshotDirectoryTests.testIndexSearcher() (#56275)
Closes #56233
2020-05-07 11:12:35 +02:00
Tanguy Leroux 65a061e33a Fix SearchableSnapshotDirectoryTests.testClearCache (#56277)
This test sometimes fails when prewarming is enabled because 
it's possible that some files are cached in background while the 
test tries to clear the cache. This commit disables prewarming 
for this test.
2020-05-07 10:59:33 +02:00
Andrei Stefan 980f175222
EQL: simplify equals/not-equals TRUE/FALSE expressions (#56191) (#56306)
* Simplify equals/not-equals TRUE/FALSE expressions, by returning them
as is (TRUE variant) or negating them (FALSE variant)

(cherry picked from commit 17858afbe6da5fa0b3ecfc537cabb337e4baaffe)
2020-05-07 03:02:04 +03:00
Jason Tedor 33669c0420
Upgrade to Jackson 2.10.4 (#56188)
Another Jackson release is available. There are some CVEs addressed,
none of which impact us, but since we can now bump Jackson easily, let
us move along with the train to avoid the false positives from security
scanners.
2020-05-06 17:20:23 -04:00
Przemysław Witek 0cd0ab276e
Introduce Annotation.Builder class and use it to create instances of Annotation class (#56276) (#56286) 2020-05-06 20:47:03 +02:00
Julie Tibshirani e852bb29b7
Simplify signature of FieldMapper#parseCreateField. (#56144)
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.

This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
2020-05-06 11:12:09 -07:00
Dimitris Athanasiou 011e995165
[7.x][ML] Unmute ClssificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange (#56268) (#56287)
Closes #56240
2020-05-06 18:20:46 +03:00
Luca Cavanna 9a9cb68e83 Async Search: correct shards counting (#55758)
Async search allows users to retrieve partial results for a running search. For partial results, the number of successful shards does not include the skipped shards, while the response returned to users should.

Also, we recently had a bug where async search would miss tracking shard failures, which would have been caught if we had assertions in place that verified that whenever we get the last response, the number of failures included in it is the same as the failures that were tracked through the listener notifications.
2020-05-06 12:13:30 +02:00
Tanguy Leroux 07ad742b60 Enable prewarming by default for searchable snapshots (#56201)
Now searchable snapshots directories respect the repository 
rate limitations (#55952) we can enable prewarming by default 
for shards.
2020-05-06 10:18:34 +02:00
Tanguy Leroux 131a3911eb Replace BlobContainerWrapper by FilterBlobContainer (#56200)
A FilterBlobContainer class was introduced in #55952 and it delegates
 its behavior to a given BlobContainer while allowing to override 
only necessary methods.

This commit replaces the existing BlobContainerWrapper class from 
the test framework with the new FilterBlobContainer from core.
2020-05-06 10:05:43 +02:00
Julie Tibshirani bd7a2d2b01 Mute the geogrid agg circuit breaker tests. 2020-05-05 18:09:07 -07:00
Jake Landis a22690c9ca
[7.x] Ensure that the monitoring export exceptions are logged. (#56237) (#56251)
If an exception occurs while flushing a bulk the cause of the exception
can be lost. This commit ensures that cause of the exception is carried
forward and gets logged.
2020-05-05 19:24:26 -05:00
Julie Tibshirani 49de092b38 Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet. 2020-05-05 16:25:36 -07:00
Bogdan Pintea 47250b14a4
SQL: Add BigDecimal support to JDBC (#56015) (#56220)
* SQL: Add BigDecimal support to JDBC (#56015)

* Introduce BigDecimal support to JDBC -- fetching

This commit adds support for the getBigDecimal() methods.

* Allow BigDecimal params in double range

A prepared statement will now accept a BigDecimal parameter as a proxy
for a double, if the conversion is lossless.

(cherry picked from commit e9a873ad7f387682e3472110b1d7c0514bd347c9)

* Fix compilation error

Dimond notation with anonymous inner classes not avail in Java8.
2020-05-05 23:19:36 +02:00
Bogdan Pintea f159fd8a20
Fix test on incompatible client versions (#56234) (#56241)
The incomatible client version test is changed to:
- iterate on all versions prior to the allowed one_s;
- format the exception message just as the server does it.

The defect stemed from the fact that the clients will not send a
version's qualifier, but just major.minor.revision, so the raised
error/exception_message won't contain it, while the test expected it.

(cherry picked from commit 4a81c8f7a1f4573e3be95f346d9fb18772b297ee)
2020-05-05 23:18:29 +02:00
Julie Tibshirani 63062ec7bd Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange. 2020-05-05 13:48:35 -07:00
Dan Hermann 6674f14fb3
[7.x] Get index includes parent data stream for backing indices (#56238) 2020-05-05 15:43:42 -05:00
Benjamin Trent e1c5ca421e
[7.x] [ML] lay ground work for handling >1 result indices (#55892) (#56192)
* [ML] lay ground work for handling >1 result indices (#55892)

This commit removes all but one reference to `getInitialResultsIndexName`. 
This is to support more than one result index for a single job.
2020-05-05 15:54:08 -04:00
Julie Tibshirani 793f265451 Mute SearchableSnapshotDirectoryTests.testIndexSearcher. 2020-05-05 12:29:05 -07:00
Ross Wolf 389082033e
EQL: Add concat function (#55193)
* EQL: Add concat function
* EQL: for loop spacing for concat
* EQL: return unresolved arguments to concat early
* EQL: Add concat integration tests
* EQL: Fix concat query fail test
* EQL: Add class for concat function testing
* EQL: Add concat integration tests
* EQL: Update concat() null behavior
2020-05-05 12:53:34 -06:00
Bogdan Pintea 23c35e32f2
SQL: introduce a query builder for the Rest tests (#55094) (#56221)
* Introduce a query builder for the rest tests

The new BaseRestSqlTestCase.RequestObjectBuilder class is a helper class
to build REST request objects for the tests. Consequently, "manual" string
concatenation to form JSON is done away with.

The class mimics SqlQueryRequestBuilder API.

(cherry picked from commit c8363f04c029542c233a758e9286d33c51d9c0c4)
2020-05-05 18:55:41 +02:00
Tal Levy e4f2c3105d
Add geo_shape support for geotile_grid and geohash_grid (#55966) (#56228)
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.

it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
2020-05-05 09:54:14 -07:00
Benjamin Trent 641f598364
[Transform] fixes http status code when bad scripts are provided (#56117) (#56219)
Transforms should propagate up the search execution exception if one is returned when it does the test query. 

this allows transforms to return a `4xx` when the aggs are malformed but parseable. 

closes https://github.com/elastic/elasticsearch/issues/55994
2020-05-05 12:36:22 -04:00
Bogdan Pintea 0e5632dc3a
SQL: relax version lock between server and clients (#56148) (#56223)
* Relax version lock between ES/SQL and clients

Allow older-than-server clients to connect, if these are past or on a
certain min release.

(cherry picked from commit 108f907297542ce649aa7304060aaf0a504eb699)
2020-05-05 18:27:06 +02:00
William Brafford 3499fa917c
Deprecated xpack "enable" settings should be no-ops (#55416) (#56167)
The following settings are now no-ops:

* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled

Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.

We also update documentation to remove references to these settings.
2020-05-05 10:40:49 -04:00
Tanguy Leroux b9636713b1
Searchable Snapshots should respect max_restore_bytes_per_sec (#55952) (#56199)
This commit changes searchable snapshots so that it now respects the 
repository's max_restore_bytes_per_sec setting when it downloads blobs.

Backport of #55952 for 7.x
2020-05-05 15:43:06 +02:00
David Roberts 7aa0daaabd
[7.x][ML] More advanced model snapshot retention options (#56194)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of #56125
2020-05-05 14:31:58 +01:00
David Turner 40ea0eabd9 Forbid snapshot access on applier thread (#56044)
This commit strengthens the assertion about which threads may access a blob
store to exclude the cluster applier thread, since we no longer need to do so.

Relates #50999
2020-05-05 13:27:55 +01:00
Dimitris Athanasiou 2d7899c83c
[7.x][ML] Adjust DF Analytics process phases (#56107) (#56177)
As of elastic/ml-cpp#1179, the analytics process reports phases
depending on the analysis type. This commit adjusts the phases
of current analyses from `analyzing` to the following:

 - outlier_detection: [`computing_outlier`]
 - regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`]

Backport of #56107
2020-05-05 15:00:07 +03:00
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
Hendrik Muhs e177a38504
[7.x][Transform] add throttling (#56007) (#56184)
add throttling to transform, throttling will slow down search requests by
delaying the execution based on a documents per second metric.

fixes #54862
2020-05-05 13:09:02 +02:00
Marios Trivyzas 363e994171
SQL: Fix DATETIME_PARSE behaviour regarding timezones (#56158) (#56182)
Previously, when the timezone was missing from the datetime string
and the pattern, UTC was used, instead of the session defined timezone.
Moreover, if a timezone was included in the datetime string and the
pattern then this timezone was used. To have a consistent behaviour
the resulting datetime will always be converted to the session defined
timezone, e.g.:
```
SELECT DATETIME_PARSE('2020-05-04 10:20:30.123 +02:00', 'HH:mm:ss dd/MM/uuuu VV') AS datetime;
```
with `time_zone` set to `-03:00` will result in
```
2020-05-04T05:20:40.123-03:00
```

Follows: #54960
(cherry picked from commit 8810ed03a209cc8fe1bad309a81e85b56a39da27)
2020-05-05 12:08:39 +02:00
Tanguy Leroux f717830563
Use workers to warm cache parts (#55793) (#56181)
Today the cache prewarming introduced in #55322 works by 
enqueuing altogether the files parts to warm in the 
searchable_snapshots thread pool. In order to make this fairer
 among concurrent warmings, this commit starts workers that 
concurrently polls file parts to warm from a queue, warms the 
part and then immediately schedule another warming 
execution. This should leave more room for concurrent 
shard warming to sneak in and be executed.

Relates #55322
2020-05-05 11:48:06 +02:00
Tanguy Leroux 35622747fd
Add Minio tests for searchable snapshots (#56112) (#56179)
This commit adds QA tests for searchable snapshot on MinIO,
similarly to what already exist for S3, GCS and Azure.
2020-05-05 11:40:06 +02:00
Marios Trivyzas cc21468559
SQL: Fix issue with date range queries and timezone (#56115) (#56174)
Previously, the timezone parameter was not passed to the RangeQuery
and as a results queries that use the ES date math notation (now,
now-1d, now/d, now/h, now+2h, etc.) were using the UTC timezone and
not the one passed through the "timezone"/"time_zone" JDBC/REST params.
As a consequence, the date math defined dates were always considered in
UTC and possibly led to incorrect results for queries like:
```
SELECT * FROM t WHERE date BETWEEN now-1d/d AND now/d
```

Fixes: #56049
(cherry picked from commit 300f010c0b18ed0f10a41d5e1606466ba0a3088f)
2020-05-05 10:54:23 +02:00
Dimitris Athanasiou 6061aa3db4
[7.x][ML] Fix race condition updating reindexing progress (#56135) (#56146)
In #55763 I thought I could remove the flag that marks
reindexing was finished on a data frame analytics task.
However, that exposed a race condition. It is possible that
between updating reindexing progress to 100 because we
have called `DataFrameAnalyticsManager.startAnalytics()` and
a call to the _stats API which updates reindexing progress via the
method `DataFrameAnalyticsTask.updateReindexTaskProgress()` we
end up overwriting the 100 with a lower progress value.

This commit fixes this issue by bringing back the help of
a `isReindexingFinished` flag as it was prior to #55763.

Closes #56128

Backport of #56135
2020-05-05 10:48:42 +03:00
Albert Zaharovits e8763bad41
Let realms gracefully terminate the authN chain (#55623)
AuthN realms are ordered as a chain so that the credentials of a given
user are verified in succession. Upon the first successful verification,
the user is authenticated. Realms do however have the option to cut short
this iterative process, when the credentials don't verify and the user
cannot exist in any other realm. This mechanism is currently used by
the Reserved and the Kerberos realm.

This commit improves the early termination operation by allowing
realms to gracefully terminate authentication, as if the chain has been
tried out completely. Previously, early termination resulted in an
authentication error which varies the response body compared
to the failed authentication outcome where no realm could verify the
credentials successfully.

Reserved users are hence denied authentication in exactly the same
way as other users are when no realm can validate their credentials.
2020-05-05 10:11:49 +03:00
Martijn van Groningen 2ac32db607
Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151)
Backport of #56034.

Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.

Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.

Relates to #53100
2020-05-04 22:38:33 +02:00
Dan Hermann 9892813842
[7.x] Delay warning about missing x-pack (#56142)
* Delay warning about missing x-pack (#54265)

Currently, when monitoring is enabled in a freshly-installed cluster,
the non-master nodes log a warning message indicating that master may
not have x-pack installed. The message is often printed even when the
master does have x-pack installed but takes some time to setup the local
exporter for monitoring. This commit adds the local exporter setting
`wait_master.timeout` which defaults to 30 seconds. The setting
configures the time that the non-master nodes should wait for master to
setup monitoring. After the time elapses, they log a message to the user
about possible missing x-pack installation on master.

The logging of this warning was moved from `resolveBulk()` to
`openBulk()` since `resolveBulk()` is called only on cluster updates and
the message might not be logged until a new cluster update occurs.

Closes #40898
2020-05-04 14:16:18 -05:00
Benjamin Trent 6c26de444d
[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020) (#56126)
If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.

It might lead folks to think there an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using the inference processor. Leading to more confusion.

Additionally, pipelines might not be parseable in this method as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).

closes https://github.com/elastic/elasticsearch/issues/55985
2020-05-04 13:32:01 -04:00