Commit Graph

50105 Commits

Author SHA1 Message Date
Marios Trivyzas daab242c75
SQL: Fix ORDER BY on aggregates and GROUPed BY fields (#51894)
Previously, in the in-memory sorting module
`LocalAggregationSorterListener` only the aggregate functions were used
(grabbed by the `sortingColumns`). As a consequence, if the ORDER BY
was also using columns of the GROUP BY clause, (especially in the case
of higher priority - before the aggregate functions) wrong results were
produced. E.g.:
```
SELECT gender, MAX(salary) AS max FROM test_emp
GROUP BY gender
ORDER BY gender, max
```

Add all columns of the ORDER BY to the `sortingColumns` so that the
`LocalAggregationSorterListener` can use the correct comparators in
the underlying PriorityQueue used to implement the in-memory sorting.
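
A minimal sketch of the idea, not the actual `LocalAggregationSorterListener` code: the comparator behind the queue is built from every ORDER BY column, in priority order. The `Row` type, the class name, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class OrderBySketch {
    // Hypothetical result row: one GROUP BY column and one aggregate value.
    static final class Row {
        final String gender;
        final int max;

        Row(String gender, int max) {
            this.gender = gender;
            this.max = max;
        }
    }

    // Compare on every ORDER BY column: gender first, then the aggregate.
    static Comparator<Row> orderByGenderThenMax() {
        return Comparator.comparing((Row r) -> r.gender).thenComparingInt(r -> r.max);
    }

    // Drain a PriorityQueue that uses the combined comparator.
    static List<Row> sort(List<Row> rows) {
        PriorityQueue<Row> queue = new PriorityQueue<>(orderByGenderThenMax());
        queue.addAll(rows);
        List<Row> sorted = new ArrayList<>();
        while (!queue.isEmpty()) {
            sorted.add(queue.poll());
        }
        return sorted;
    }
}
```

With the rows `("M", 3)` and `("F", 5)`, a comparator on `max` alone would put `M` first, while the combined comparator puts `F` first, as `ORDER BY gender, max` requires.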

Fixes: #50355
(cherry picked from commit be680af11c823292c2d115bff01658f7b75abd76)
2020-02-12 09:38:47 +01:00
Andrei Stefan 74e7777cbb Hook in the optimizer rules (#52172)
(cherry picked from commit 1f90d8cc56052fbf2af604e72f9f5ca73f5e75d5)
2020-02-12 09:32:34 +02:00
Andrei Stefan a21e2b211a Extract common optimizer tests (#52169)
(cherry picked from commit e5ad72bc22e9ec0686ab582195f0032efcb880bf)
2020-02-12 09:32:33 +02:00
Hendrik Muhs edaf6d1f79
[Transform] maintain a list of unsupported aggregations in transforms (#52190) (#52222)
add a list of unsupported aggs in transforms and create a test that fails if a new aggregation is
added. Limitation: works only if a new agg is added to either the core or a known plugin
(Analytics, MatrixAggregation).
2020-02-12 07:48:04 +01:00
Lisa Cawley dd14210689 [DOCS] Clarifies machine learning built-in roles (#51504) 2020-02-11 18:28:53 -08:00
Jason Tedor 79e5e809b6
Add unit tests for reading JVM options files (#52176)
This commit adds some unit tests to cover the reading of JVM options
files.
2020-02-11 21:02:34 -05:00
Benjamin Trent 2a968f4f2b
[ML] job results provider refactoring (#52012) (#52238)
During a bug hunt, I caught a handful of things (unrelated to the bug) that could be potential issues:

1. Needlessly wrapping in exception handling (minor cleanup)
2. Potential of notifying listeners of a failure multiple times + even trying to notify of a success after a failure notification
2020-02-11 17:54:44 -05:00
Mark Vieira 28c56da754
Don't track absolute path as test input to improve cacheability (#52235) 2020-02-11 13:32:59 -08:00
Gordon Brown d48ce12920
Convert ILM and SLM histories into hidden indices (#51456)
Modifies SLM's and ILM's history indices to be hidden indices for added
protection against accidental querying and deletion, and improves
IndexTemplateRegistry to handle upgrading index templates.

Also modifies the REST test cleanup to delete hidden indices.
2020-02-11 14:18:55 -07:00
Jason Tedor bb2e04bc16
Use absolute path for temporary directory in tests (#52228)
We explicitly set the path for the temporary directory to use in test
tasks, but today this path is a relative path, relative to the current
working directory of the test task. The fact that we are using a
relative path here appears to be legacy, simply leftover from the days
of the Maven build. An absolute path is preferred here, since it's
explicit and we do not have to rely on everyone resolving the path
properly relative to the working directory.
2020-02-11 15:17:45 -05:00
Jason Tedor 6ed3311443
Ensure test temporary directory exists (#52227)
Today we set the test temporary directory explicitly by controlling
java.io.tmpdir. Yet, we do not guarantee this directory exists, instead
relying on a test base class (LuceneTestCase) to create this directory
when it initializes. However, some of our tests do not rely on our test
framework, and thus do not have access to LuceneTestCase, instead
relying on RandomizedRunner directly. We should not be relying on the
temporary directory being implicitly created, instead guaranteeing that
it exists before test execution starts. This commit does that by
creating the test temporary directory before the test task executes (via
a doFirst).
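
The commit does this in the Gradle build; the following is only a plain-Java sketch of the same guarantee, under the assumption that creating the directory up front is all that is needed. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TestTmpDir {
    // Resolve the configured temporary directory to an absolute path and
    // create it (including missing parents) before any test code runs.
    static Path ensureExists(String tmpDir) {
        Path path = Paths.get(tmpDir).toAbsolutePath();
        try {
            // createDirectories is a no-op if the directory already exists.
            return Files.createDirectories(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```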
2020-02-11 14:53:16 -05:00
Zachary Tong 0372d6d239 Allow ObjectParsers to specify required sets of fields (#49661)
ConstructingObjectParser can be used to specify required fields,
but it is still difficult to configure "sets" of fields where only
one of the set is required (requiring hand-rolled logic in each
ConstructingObjectParser, or adding special validation methods
to objects that are called after building the object).

This commit adds a new method on ObjectParser which allows
the parsers to register required sets.  E.g. ["foo", "bar"] can be
registered, which means "foo", "bar" or both must be configured
by the user otherwise an exception is thrown.

This pattern crops up in many places in our parsers; a good example is
the aggregation "field" and "script" fields.  One or both must be
configured on all aggregations, omitting both should result in an exception.
This was previously handled far downstream resulting in an aggregation
exception, when it should be a parse exception.
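
As a rough illustration of the "required set" idea (not the `ObjectParser` API itself), the sketch below registers sets of field names and fails if none of the fields in a set was seen. The class name and the methods `requireOneOf` and `validate` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RequiredFieldSets {
    private final List<Set<String>> requiredSets = new ArrayList<>();

    // Register a set of fields of which at least one must be present.
    void requireOneOf(String... fields) {
        requiredSets.add(new LinkedHashSet<>(Arrays.asList(fields)));
    }

    // Called once parsing is done, with the names of the fields that were seen.
    void validate(Set<String> seenFields) {
        for (Set<String> required : requiredSets) {
            boolean anyPresent = required.stream().anyMatch(seenFields::contains);
            if (!anyPresent) {
                throw new IllegalArgumentException(
                    "Required one of fields " + required + ", but none were specified.");
            }
        }
    }
}
```

Registering `requireOneOf("field", "script")` accepts input that contains either field or both, and rejects input that contains neither at parse time.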
2020-02-11 13:03:33 -05:00
Nik Everett 86d5211c05
Make sorting by an agg results a real abstraction (#52007) (#52212)
This removes a bunch of `instanceof`s in favor of two new methods on
`InternalAggregation`. The default implementations of these methods just
throw exceptions explaining that you can't sort on this aggregation.
They are overridden by all of the classes that used to have `instanceof`
checks against them.

I doubt this is really any faster in practice. The real benefit here is
that it is a little more obvious *that* you can sort by the results of
an aggregation and it should be *much* more obvious where to look at
*how* aggregations sort themselves.

There are still a bunch more `instanceof`s left in `AggregationPath`
but those will wait for a followup change.
2020-02-11 12:58:40 -05:00
Albert Zaharovits cc1fce96ba
Add a new async search security origin (#52141)
This commit adds a new security origin, and an associated reserved user
and role, named `_async_search`, which can be used by internal clients to
manage the `.async-search-*` restricted index namespace.
2020-02-11 19:58:06 +02:00
James Rodewig d68a4ec82e
[7.x] Permit EQL feature flag in release builds (#52201) (#52214)
7.x backport of #52201

Provides a path to register the EQL feature flag in release builds.
This enables EQL in release builds so that release docs tests pass.

Release docs tests do not have infrastructure in place to only register
snippets from included portions of the docs, they instead include all
docs snippets.

Since EQL can not be enabled in release builds, this meant that the EQL
snippets fail in the release docs tests.

This adds the ability to enable EQL in the release docs tests. This
system property will be removed when EQL is ready for release.
2020-02-11 11:49:49 -05:00
Hendrik Muhs 098380e483 Percentiles aggregation validation checks for range (#51871)
Disallow specifying a percentile outside the range [0,100]. This also fixes a problem in transform by failing
validation if an invalid percentile configuration is used.
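
A minimal sketch of such a range check, assuming the percents arrive as a `double[]`; the class and method names are hypothetical and the real validation may differ.

```java
final class PercentileValidation {
    // Reject any percent outside [0,100]. The negated form also rejects NaN,
    // because every comparison with NaN is false.
    static double[] validate(double[] percents) {
        for (double p : percents) {
            if (!(p >= 0.0 && p <= 100.0)) {
                throw new IllegalArgumentException(
                    "percent must be in the range [0,100], got [" + p + "]");
            }
        }
        return percents;
    }
}
```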
2020-02-11 17:25:39 +01:00
James Rodewig 6fe8f1649b [DOCS] Include docs on permanently unreleased branches only (#51743)
Adds the ability to display docs on permanently unreleased branches,
such as `master` and `7.x`.

Also updates how the autoscaling and EQL docs are included.
Currently, these feature-flag docs would display on any unreleased
branches that contain the changes, such as 7.7.
2020-02-11 11:24:13 -05:00
David Roberts d1d9c40e71 [ML] Switch poor categorization audit warning to use status field (#52195)
In #51146 a rudimentary check for poor categorization was added to
7.6.

This change replaces that warning based on a Java-side check with
a new one based on the categorization_status field that the ML C++
sets.  categorization_status was added in 7.7 and above by #51879,
so this new warning based on more advanced conditions will also be
in 7.7 and above.

Closes #50749
2020-02-11 15:33:27 +00:00
David Roberts 473468d763 [ML] Better error when persistent task assignment disabled (#52014)
Changes the misleading error message when attempting to open
a job while the "cluster.persistent_tasks.allocation.enable"
setting is set to "none" to a clearer message that names the
setting.

Closes #51956
2020-02-11 15:23:21 +00:00
Zachary Tong 87854573e4 Add version constant for 7.6.1 2020-02-11 09:44:43 -05:00
Igor Motov 667e1a5225
Add Boxplot Aggregation (#52174)
Adds a `boxplot` aggregation that calculates min, max, median and the first
and the third quartiles of the given data set.
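
For orientation only, here is an exact, in-memory sketch of the five values a box plot reports. It assumes linear interpolation between sorted values, which is not necessarily how the aggregation computes its (possibly approximate) quartiles; all names are hypothetical.

```java
import java.util.Arrays;

final class BoxplotSketch {
    // Quantile by linear interpolation between the two nearest sorted values.
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double weight = pos - lower;
        return sorted[lower] * (1 - weight) + sorted[upper] * weight;
    }

    // Returns {min, q1, median, q3, max} for a non-empty data set.
    static double[] boxplot(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return new double[] {
            sorted[0],
            quantile(sorted, 0.25),
            quantile(sorted, 0.5),
            quantile(sorted, 0.75),
            sorted[sorted.length - 1]
        };
    }
}
```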

Closes #33112
2020-02-11 09:38:17 -05:00
Marios Trivyzas 204d086266 SQL: Fix issue with timezone when paginating (#52101)
Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
requests were no longer using the client timezone specified in the initial
SQL query. As a consequence, even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.

Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.

Fixes: #51258
(cherry picked from commit 8f7afbdeb9295999b48a6c36db5b31cbe0cee432)
2020-02-11 15:27:56 +01:00
David Turner 00b9098250 Ignore timeouts with single-node discovery (#52159)
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.

In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
2020-02-11 14:15:01 +00:00
David Roberts 4c88996cd7 [DOCS] Correct important note for xpack.transform.enabled (#52194)
Because transforms get assigned to an arbitrary data node it
is important that the transforms plugin is enabled on every
data node.
2020-02-11 13:02:10 +00:00
Yang Wang 16ba59e9d1
Expose more authentication info to ingest pipeline (#51305) (#52119)
The changes add more granularity for identifying the data ingestion user.
The ingest pipeline can now be configured to record authentication realm and
type. It can also record API key name and ID when one is in use.
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)

Resolves: #49106
2020-02-11 23:05:01 +11:00
David Kyle 343ced42be Mute LoggingOutputStreamTests.testMaxBuffer (#52193)
Relates to https://github.com/elastic/elasticsearch/issues/51838
2020-02-11 11:46:17 +00:00
Tim Vernum b0b1b13311
Extract class to store Authentication in context (#52183)
This change extracts the code that previously existed in the
"Authentication" class that was responsible for reading and writing
authentication objects to/from the ThreadContext.

This is needed to support multiple authentication objects under
separate keys.

This refactoring highlighted that there were a large number of places
where we extracted the Authentication/User objects from the thread
context, in a variety of ways. These have been consolidated to rely on
the SecurityContext object.

Backport of: #52032
2020-02-11 20:59:06 +11:00
Dimitris Athanasiou 6086fadf00
[7.x][ML] Prepare to hold additional stats in DF Analytics task (#52134) (#52187)
Refactors `DataFrameAnalyticsTask` to hold a `StatsHolder` object.
That just has a `ProgressTracker` for now but this is paving the
way to add additional stats like memory usage, analysis stats, etc.

Backport #52134
2020-02-11 11:18:45 +02:00
Martijn van Groningen c14e4666df
Wait for watcher to be started prior to rolling upgrade tests. (#52186)
Backport: #52139

In the rolling upgrade tests, watcher is manually executed,
in rare scenarios this happens before watcher is started,
causing the manual execution to fail.

Relates to #33185
2020-02-11 09:39:20 +01:00
Dimitris Athanasiou cbebc26f50
[7.x][ML] Retry persisting DF Analytics results (#52048) (#52160)
Employs `ResultsPersisterService` from `DataFrameRowsJoiner` in order
to add retries when a data frame analytics job is persisting the results
to the destination data frame.

Backport of #52048
2020-02-11 09:55:00 +02:00
Andrei Stefan 2f1631d9d0
Telemetry data initial implementation (#51715) (#52175)
(cherry picked from commit f1d1cceacaacf226fcd2459f34689843b822fe4b)
2020-02-11 09:15:47 +02:00
Lisa Cawley c4525f8cca
[DOCS] Adds ml-cpp PRs to release notes (#52158)
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
2020-02-10 18:06:01 -08:00
Jason Tedor 91d0996e08
Remove unnecessary method in JvmOptionsParser (#52173)
Back when the distribution launchers were compiled to target JDK 7, we
did not have access to the String#join method to space-delimit JVM
options. Since the launchers now target the same minimum JDK as
Elasticsearch itself, we now have access to this method and can replace
the use of spaceDelimitJvmOptions with String#join. This commit does
that.
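
The replacement amounts to a single call to `String.join`. The sketch below shows it in isolation; the wrapper class and method are hypothetical.

```java
import java.util.List;

final class JvmOptionsJoin {
    // Space-delimit the JVM options using the JDK's own String.join.
    static String spaceDelimit(List<String> jvmOptions) {
        return String.join(" ", jvmOptions);
    }
}
```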
2020-02-10 20:22:02 -05:00
Gordon Brown 350288ddf8
Check dot-index rules after template application (#52087)
Previously, the check of the dot-index rules (namely, that indices with
dot-prefixed names should be either hidden indices or system indices) was done
*before* template application, and so only checked for the `index.hidden`
setting in the request, ignoring if that setting was set via a template.

This commit moves that check to a different method, which is applied
after templates have been resolved and applied to the index settings.
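
A simplified sketch of why the ordering matters, assuming settings are plain string maps and that request settings override template settings. The class, the method names, and the `isSystemIndex` flag are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DotIndexCheck {
    // Merge template settings with request settings; the request wins on conflict
    // (an assumption of this sketch).
    static Map<String, String> resolveSettings(Map<String, String> template, Map<String, String> request) {
        Map<String, String> merged = new HashMap<>(template);
        merged.putAll(request);
        return merged;
    }

    // Run the check against the *resolved* settings, so that an index.hidden
    // value supplied by a template is taken into account.
    static boolean violatesDotIndexRule(String indexName, Map<String, String> resolved, boolean isSystemIndex) {
        boolean hidden = Boolean.parseBoolean(resolved.getOrDefault("index.hidden", "false"));
        return indexName.startsWith(".") && !hidden && !isSystemIndex;
    }
}
```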
2020-02-10 17:01:59 -07:00
Jason Tedor a99b311e2f
Refactor JvmOptionsParser for testability (#52102)
This commit prepares the JvmOptionsParser to be more unit testable by
refactoring the class to have some input that it pulls from external
sources passed in as arguments. We do not change any functionality in
this commit, nor add any unit tests, we are only preparing the way.
2020-02-10 18:40:00 -05:00
Ryan Ernst 5a72b23716
Migrate SysV init tests from bats to java packaging (#51077) (#51498)
This commit converts the sysv init tests from bats tests into the java
packaging tests. Since it is the last oss specific test, the bats oss
test task is also removed.

relates #46005
2020-02-10 17:41:33 -05:00
Mark Vieira 42610c6d74
Only pull docker images for fixture projects (#52157) 2020-02-10 13:50:31 -08:00
Ryan Ernst 88cf8ac0a8 Fix windows empty line in logging capture (#52162)
This commit fixes another edge case in handling windows newlines in our
capture of stdout/stderr to log4j. The case is that the \r appears at
the beginning of the buffer when flushing, which would unintentionally
be emitted as an empty string. This commit skips the flush if only a \r
was found.
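
The edge case can be sketched as follows. This is a simplified stand-in for the real stream class, with hypothetical names, in which `flush()` is called once per captured line.

```java
import java.util.ArrayList;
import java.util.List;

final class LineCapture {
    private final StringBuilder buffer = new StringBuilder();
    // Stands in for the log4j logger: every emitted line is recorded here.
    final List<String> emitted = new ArrayList<>();

    void write(String chunk) {
        buffer.append(chunk);
    }

    void flush() {
        String pending = buffer.toString();
        buffer.setLength(0);
        if (pending.equals("\r")) {
            // Only a stray Windows carriage return: skip instead of logging "".
            return;
        }
        if (pending.endsWith("\r")) {
            // Drop the \r of a \r\n line ending.
            pending = pending.substring(0, pending.length() - 1);
        }
        emitted.add(pending);
    }
}
```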

closes #51838
2020-02-10 13:29:50 -08:00
Marios Trivyzas 6b600855a9
SQL: Make parsing of date more lenient (#52137)
Make the parsing of date more lenient

- as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}`
- cast a string to a date: `CAST('2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)`

Closes: #49379
(cherry picked from commit 5863b27500d5e7f6cdd8c6c62b09b84e53ca724a)
2020-02-10 21:47:00 +01:00
Mark Vieira 47255c4fd7
Remove unnecessary CI configuration files
Signed-off-by: Mark Vieira <portugee@gmail.com>
2020-02-10 11:16:35 -08:00
Julie Tibshirani 28a8db730f In FieldTypeLookup, factor out flat object field logic. (#52091)
Currently, the logic for looking up `flattened` field types lives in the
top-level `FieldTypeLookup`. This PR moves it into a dedicated class
`DynamicKeyFieldTypeLookup`.
2020-02-10 10:44:02 -08:00
Bogdan Pintea 7b58ed0dd7
Fix milliseconds handling in intervals (#51675) (#52156)
This fixes:

- the parsing of milliseconds in intervals: everything past the . used to be converted as-is to milliseconds, with no normalisation of the unit; thus, a value of .23 ended up as 23 millis in the interval, instead of 230.
- the printing of a trailing .0, in case the interval lacks the fractional part;
- tests generating a random millisecond value used to simply print it in the string about to be evaluated without a necessary front-filling of 0[s], where the amount was below 100/10.

(The combination of first and last issues above, plus statistical "luck" made the incorrect handling pass the tests.)
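
The unit normalisation and the zero front-filling described above can be sketched as below. This assumes the fractional part has already been isolated as a string of one to three digits; the class and method names are hypothetical.

```java
import java.util.Locale;

final class IntervalMillis {
    // Convert the digits after the '.' to milliseconds by right-padding to
    // three digits, so "23" means 230 ms rather than 23 ms.
    static int fractionToMillis(String fraction) {
        if (fraction.isEmpty() || fraction.length() > 3) {
            throw new IllegalArgumentException("expected 1 to 3 digits, got [" + fraction + "]");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < fraction.length(); i++) {
            char c = fraction.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a digit: [" + c + "]");
            }
            padded.append(c);
        }
        while (padded.length() < 3) {
            padded.append('0');
        }
        return Integer.parseInt(padded.toString());
    }

    // Print milliseconds as a fraction, front-filled with zeros: 7 -> "007".
    static String millisToFraction(int millis) {
        return String.format(Locale.ROOT, "%03d", millis);
    }
}
```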

(cherry picked from commit 4de8c64f63ee37c1bcfdb9b9d3a07d09be243222)
2020-02-10 19:24:26 +01:00
Jason Tedor d188dda7eb
Move docker-compose logging statement to debug (#52107)
When docker-compose is required for a test fixture but is not
available, we log a warning to this effect. This ends up being
noise during configuration, especially when working locally. This
commit changes the logging level of these messages to debug.
2020-02-10 13:13:36 -05:00
William Brafford 610f6814da
Remove unnecessary dirname command (#51968) (#52089)
The elasticsearch-env script changes the working directory to ES_HOME,
so we can just use bin/elasticsearch-keystore to invoke the keystore.
2020-02-10 11:05:36 -05:00
Lee Hinman 37a2e9bac6
[7.x] Allow forcemerge in the hot phase for ILM policies (#520… (#52083)
* Allow forcemerge in the hot phase for ILM policies

This commit changes the `forcemerge` action to also be allowed in the `hot` phase for policies. The
forcemerge will occur after a rollover, and allows users to take advantage of higher disk speeds for
performing the force merge (on a separate node type, for example).

One caveat with this is that a `forcemerge` in the `hot` phase *MUST* be accompanied by a `rollover`
action. ILM validates policies to ensure this is the case.

Resolves #43165

* Use anyMatch instead of findAny in validation

* Make randomTimeseriesLifecyclePolicy single-pass
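
The hot-phase rule described above (a `forcemerge` requires a `rollover`) can be sketched with `anyMatch`. This assumes the phase is represented by a list of action names; it is not the actual ILM validation code, and the class and method names are hypothetical.

```java
import java.util.List;

final class HotPhaseValidation {
    // A hot phase that contains forcemerge must also contain rollover.
    static void validateHotPhase(List<String> actionNames) {
        boolean hasForceMerge = actionNames.stream().anyMatch("forcemerge"::equals);
        boolean hasRollover = actionNames.stream().anyMatch("rollover"::equals);
        if (hasForceMerge && !hasRollover) {
            throw new IllegalArgumentException(
                "the [forcemerge] action may not be used in the [hot] phase without an accompanying [rollover] action");
        }
    }
}
```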
2020-02-10 08:54:49 -07:00
Armin Braun d8169e5fdc
Don't Upload Redundant Shard Files (#51729) (#52147)
Segment(s) info blobs are already stored with their full content
in the "hash" field in the shard snapshot metadata as long as they are
smaller than 1MB. We can make use of this fact and never upload them
physically to the repo.
This saves a non-trivial number of uploads and downloads when restoring
and might also lower the latency of searchable snapshots since they can save
physically loading this information as well.
2020-02-10 16:50:09 +01:00
Przemysław Witek c7cc383d33
[7.x] Update persistent state document in the index the document belongs to (#51751) (#52145) 2020-02-10 16:32:34 +01:00
Martijn van Groningen c77b80f01e
Unmute smoke test monitoring with watcher. (#52140)
Backport of #51490
2020-02-10 15:13:32 +01:00
Nhat Nguyen 864e9d875d Bubble up exception in follow task in ccr tests (#52085)
It's perfectly fine if a bulk request on the follower hits 
IndexShardClosedException in some CCR tests because we sometimes 
close some follower shards while the follow-task is replicating operations.
Instead of failing the test immediately, this commit bubbles up that
failure to the shard follow task.

Closes #52052
2020-02-10 08:27:04 -05:00
Marios Trivyzas 27265f032a SQL: Enhance timestamp escaped literal parsing (#52097)
Allow also whitespace ` ` (together with `T`) as a separator between
date and time parts of the timestamp string. E.g.:
```
{ts '2020-02-08 12.10.45'}
```
or
```
{ts '2020-02-08T12.10.45'}
```

Fixes: #46069
(cherry picked from commit 07c977023fb8ceab5991c359a6cbfe07beaad9bb)
2020-02-10 11:24:55 +01:00