Commit Graph

3263 Commits

Author SHA1 Message Date
David Roberts ab045744ac [ML-DataFrame] Fix off-by-one error in checkpoint operations_behind (#46235)
Fixes a problem where operations_behind would be one less than
expected per shard in a new index matched by the data frame
transform source pattern.

For example, if a data frame transform had a source of foo*
and a new index foo-new was created with 2 shards and 7 documents
indexed in it then operations_behind would be 5 prior to this
change.

The problem was that an empty index has a global checkpoint
number of -1 and the sequence number of the first document that
is indexed into an index is 0, not 1.  This doesn't matter for
indices included in both the last and next checkpoints, as the
off-by-one errors cancelled, but for a new index it affected
the observed result.
2019-09-03 12:45:02 +01:00
Ignacio Vera 59c474e675
reset queryGeometry in ShapeQueryTests (#45974) (#46251) 2019-09-03 10:24:46 +02:00
markharwood a9e3871fc7
Test fix for PinnedQueryBuilderIT (#46187) (#46227)
Fix test issue to stabilise scoring through use of DFS search mode.
Randomised index-then-delete docs introduced by the test framework likely caused an imbalance in IDF scores across shards. Also made number of shards used in test a random number for added test coverage.

Closes #46174
2019-09-02 13:31:22 +01:00
Marios Trivyzas 3bee647e5b
SQL: Fix issue with DataType for CASE with NULL (#46173)
Previously, if the DataType of all the WHEN conditions of a CASE
statement is NULL, then it was set to NULL even if the ELSE clause
has a non-NULL data type, e.g.:
```
CASE WHEN a = 1 THEN NULL
           WHEN a = 5 THEN NULL
ELSE 'foo'
```

Fixes: #46032
(cherry picked from commit 8c1012efbbd3a300afd0dfb9b18250f15ea753f9)
2019-09-02 11:17:24 +03:00
Benjamin Trent d0c5573a51
[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044) (#46096)
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.

This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
2019-08-30 09:27:07 -05:00
Alexander Reelsen 98c32c7846 Fix wrong URL encoding in watcher HTTP client (#45894)
The test assumption was calling the wrong method resulting in a URL
encoding before returning the data.

Closes #44970
2019-08-30 14:02:49 +02:00
Dimitris Athanasiou 5921ae53d8
[7.x][ML] Regression dependent variable must be numeric (#46072) (#46136)
* [ML] Regression dependent variable must be numeric

This adds a validation that the dependent variable of a regression
analysis must be numeric.

* Address review comments and fix some problems

In addition to addressing the review comments, this
commit fixes a few issues I found during testing.

In particular:

- if there were mappings for required fields but they were
not included we were not reporting the error
- if explicitly included fields had unsupported types we were
not reporting the error

Unfortunately, I couldn't get those fixed without refactoring
the code in `ExtractedFieldsDetector`.
2019-08-30 09:57:43 +03:00
Zachary Tong cf8a4171e1
Rename `data-science` plugin to `analytics` (#46133)
Rename `data-science` plugin to `analytics`. 
Also removes enabled flag. Backport of #46092
2019-08-29 12:45:39 -04:00
Simon Willnauer 9b2ea07b17
Flush engine after big merge (#46066) (#46111)
Today we might carry on a big merge uncommitted and therefore
occupy a significant amount of diskspace for quite a long time
if for instance indexing load goes down and we are not quickly
reaching the translog size threshold. This change will cause a
flush if we hit a significant merge (512MB by default) which
frees diskspace sooner.
2019-08-29 17:54:15 +02:00
Nhat Nguyen 028e792e1d Remove already exist assertion while renew ccr lease (#46009)
If a CCR lease is disappeared while we are renewing it, then we will
issue asyncAddRetentionLease to add that lease. And if
asyncAddRetentionLease takes longer than retentionLeaseRenewInterval,
then we can issue another asyncAddRetentionLease request. One of
asyncAddRetentionLease requests will fail with
RetentionLeaseAlreadyExistsException, hence trip the assertion.

Closes #45192
2019-08-29 09:44:40 -04:00
Przemysław Witek b8a0379057
Refactor auditor-related classes (#45893) (#46120) 2019-08-29 14:21:03 +02:00
Przemysław Witek fbe9e8a530
Do not throw an exception if the process finished quickly but without any error. (#46073) (#46113) 2019-08-29 10:47:17 +02:00
Gordon Brown 47bbd9d9a9
[7.x] Fix rollover alias in SLM history index template (#46001)
This commit adds the `rollover_alias` setting required for ILM to work
correctly to the SLM history index template and adds assertions to the
SLM integration tests to ensure that it works correctly.
2019-08-28 14:50:22 -07:00
Tal Levy a356bcff41
Add Circle Processor (#43851) (#46097)
add circle-processor that translates circles to polygons
2019-08-28 14:44:08 -07:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
2019-08-28 11:03:14 -07:00
Mark Tozzi 9ac85a4a2b Fix compilation in CumulativeCardinalityAggregatorTests 2019-08-28 09:31:48 -04:00
Dimitris Athanasiou 25d64508f6
[7.x][ML] Support boolean fields for DF analytics (#46037) (#46054)
This commit adds support for `boolean` fields in data frame
analytics (and currently both outlier detection and regression).
The analytics process expects `boolean` fields to be encoded as
integers with 0 or 1 value.
2019-08-28 12:02:29 +03:00
Jake Landis 154d1dd962
Watcher max_iterations with foreach action execution (#45715) (#46039)
Prior to this commit the foreach action execution had a hard coded 
limit to 100 iterations. This commit allows the max number of 
iterations to be a configuration ('max_iterations') on the foreach 
action. The default remains 100.
2019-08-27 16:57:20 -05:00
Armin Braun fdef293c81 Fix RegressionTests#fromXContent (#46029)
* The `trainingPercent` must be between `1` and `100`, not `0` and `100` which is causing test failures
2019-08-27 18:24:26 +03:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Yogesh Gaikwad 7b6246ec67
Add `manage_own_api_key` cluster privilege (#45897) (#46023)
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. are too permissive and
we would want finer-grained control over the cluster privileges
for API keys. Previously APIs created would also need these
privileges to get its own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.

The changes cover HLRC changes, documentation for the API changes.

Closes #40031
2019-08-28 00:44:23 +10:00
Dimitris Athanasiou dd6c13fdf9
[ML] Add description to DF analytics (#45774) (#46019) 2019-08-27 15:48:59 +03:00
Albert Zaharovits 1ebee5bf9b
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
2019-08-27 14:42:46 +03:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 16:19:55 -04:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Benjamin Trent d64018f8e1
[ML] add supported types to no fields error message (#45926) (#45987)
* [ML] add supported types to no fields error message

* adding supported types to logger debug
2019-08-26 14:18:00 -05:00
Jake Landis 767f648f8e
Watcher add email warning if CSV attachment contains formulas (#44460) (#45557)
* Watcher add email warning if CSV attachment contains formulas (#44460)

This commit introduces a Warning message to the emails generated by 
Watcher's reporting action. This change complements Kibana's CSV 
formula notifications (see elastic/kibana#37930). 

This is implemented by reading a header (kbn-csv-contains-formulas) 
provided by Kibana to notify to attach the Warning to the email. 
The wording of the warning is borrowed from Kibana's UI and may 
be overridden by a dynamic setting
xpack.notification.reporting.warning.kbn-csv-contains-formulas.text.
This warning is enabled by default, but may be disabled via a 
dynamic setting xpack.notification.reporting.warning.enabled.
2019-08-26 08:35:33 -05:00
Jake Landis f2241a152f
watcher tests - increase stop timeout to 60s (#45679) (#45934)
As of #43939 Watcher tests now correctly block until all Watch executions
kicked off by that test are finished. Prior we allowed tests to finish with
outstanding watch executions. It was known that this would increase the
time needed to finish a test. However, running the tests on CI can be slow
and on at least 1 occasion it took 60s to actually finish.

This PR simply increases the max allowable timeout for Watcher tests
to clean up after themselves.
2019-08-26 08:34:54 -05:00
Andrey Ershov 479ab9b8db Fix plaintext on TLS port logging (#45852)
Today if non-TLS record is received on TLS port generic exception will
be logged with the stack-trace.
SSLExceptionHelper.isNotSslRecordException method does not work because
it's assuming that NonSslRecordException would be top-level.
This commit addresses the issue and the log would be more concise.

(cherry picked from commit 6b83527bf0c23d4d5b97fab7f290c43432945d4f)
2019-08-26 12:32:35 +02:00
Ioannis Kakavas 2bee27dd54
Allow Transport Actions to indicate authN realm (#45946)
This commit allows the Transport Actions for the SSO realms to
indicate the realm that should be used to authenticate the
constructed AuthenticationToken. This is useful in the case that
many authentication realms of the same type have been configured
and where the caller of the API(Kibana or a custom web app) already
know which realm should be used so there is no need to iterate all
the realms of the same type.
The realm parameter is added in the relevant REST APIs as optional
so as not to introduce any breaking change.
2019-08-25 19:36:41 +03:00
Jason Tedor 040a810b3c
Add deprecation check for pidfile setting (#45939)
The pidfile setting is deprecated. This commit adds a deprecation check
for usage of this setting.
2019-08-24 17:19:20 -04:00
Jason Tedor 43ca652d11
Add deprecation check for processors (#45925)
The processors setting is deprecated. This commit adds a deprecation
check for the use of the processors setting.
2019-08-23 20:16:40 -04:00
Jason Tedor 6b116a48f3
Skip feature aware check on JDK 14 (#45928)
ASM can not currently handle classes compiled with JDK 14. This commit
skips these checks on JDK 14, for now.
2019-08-23 17:38:15 -04:00
Dimitris Athanasiou be554fe5f0
[7.x][ML] Improve progress reportings for DF analytics (#45856) (#45910)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 23:04:39 +03:00
Benjamin Trent b756e1b9be
[ML][Transforms] adjusting when and what to audit (#45876) (#45916)
* [ML][Transforms] adjusting when and what to audit

* Update DataFrameTransformTask.java

* removing unnecessary audit message
2019-08-23 13:53:02 -05:00
Benjamin Trent 94c2de65b9
[ML][Transforms] fix doSaveState check (#45882) (#45902)
* [ML][Transforms] fix doSaveState check

* removing unnecessary log statement
2019-08-23 09:38:52 -05:00
Alexander Reelsen ecafe4f4ad Update joda to 2.10.3 (#45495) 2019-08-23 10:39:39 +02:00
markharwood 217e41ab6c
Search - added HLRC support for PinnedQueryBuilder (#45779) (#45853)
Added HLRC support for PinnedQueryBuilder

Related  #44074
2019-08-23 09:22:17 +01:00
Przemysław Witek 85d55e30d0
Add test that proves _timing_stats document is deleted when the job is deleted (#45840) (#45854) 2019-08-23 07:03:09 +02:00
Przemysław Witek 2ed19b2c81
Put error message from inside the process into the exception that is thrown when the process doesn't start correctly. (#45846) (#45875) 2019-08-23 07:02:50 +02:00
Tim Vernum f94e4a9151
Set security index refresh interval to 1s (#45888)
The security indices were being created without specifying the
refresh interval, which means it would inherit a value from any
templates that exists.
However, certain security functionality depends on being able to
wait_for refresh, and causes errors (e.g. in Kibana) if that time
exceeds 30s.

This commit changes the security indices configuration to always be
created with a 1s refresh interval. This prevents any templates from
inadvertantly interfering with the proper functioning of security.
It is possible for an administrator to explicitly change the refresh
interval after the indices have been created.

Backport of: #45434
2019-08-23 12:41:37 +10:00
Tim Vernum 029725fc35
Add SSL/TLS settings for watcher email (#45836)
This change adds a new SSL context

    xpack.notification.email.ssl.*

that supports the standard SSL configuration settings (truststore,
verification_mode, etc). This SSL context is used when configuring
outbound SMTP properties for watcher email notifications.

Backport of: #45272
2019-08-23 10:13:51 +10:00
Nhat Nguyen 3393f9599e
Ignore translog retention policy if soft-deletes enabled (#45473)
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.

Backport of #45473
Relates #45136
2019-08-22 16:40:06 -04:00
Benjamin Trent 8e3c54fff7
[7.x] [ML] Adding data frame analytics stats to _usage API (#45820) (#45872)
* [ML] Adding data frame analytics stats to _usage API (#45820)

* [ML] Adding data frame analytics stats to _usage API

* making the size of analytics stats 10k

* adjusting backport
2019-08-22 15:15:41 -05:00
Benjamin Trent dff3e636c2
[ML][Transforms] unifying logging, adding some more logging (#45788) (#45859)
* [ML][Transforms] unifying logging, adding some more logging

* using parameterizedMessage instead of string concat

* fixing bracket closure
2019-08-22 13:15:07 -05:00
Benjamin Trent e50a78cf50
[ML-DataFrame] version data frame transform internal index (#45375) (#45837)
Adds index versioning for the internal data frame transform index. Allows for new indices to be created and referenced, `GET` requests now query over the index pattern and takes the latest doc (based on INDEX name).
2019-08-22 11:46:30 -05:00
Jake Landis 1dab73929f
Watcher add stopped listener (#43939) (#45670)
When Watcher is stopped and there are still outstanding watches running
Watcher will report it self as stopped. In normal cases, this is not problematic.

However, for integration tests Watcher is started and stopped between
each test to help ensure a clean slate for each test. The tests are blocking
only on the stopped state and make an implicit assumption that all watches are
finished if the Watcher is stopped. This is an incorrect assumption since
Stopped really means, "I will not accept any more watches". This can lead to
un-predictable behavior in the tests such as message : "Watch is already queued
in thread pool" and state: "not_executed_already_queued".
This can also change the .watcher-history if watches linger between tests.

This commit changes the semantics of a manual stopping watcher to now mean:
"I will not accept any more watches AND all running watches are complete".
There is now an intermediary step "Stopping" and callback to allow transition
to a "Stopped" state when all Watches have completed.

Additionally since this impacts how long the tests will block waiting for a
"Stopped" state, the timeout has been increased.

Related: #42409
2019-08-22 10:54:29 -05:00
Armin Braun bfddaaa2ae
Acknowledge Indices Were Wiped Successfully in REST Tests (#45832) (#45842)
In internal test clusters tests we check that wiping all indices was acknowledged
but in REST tests we didn't.
This aligns the behavior in both kinds of tests.
Relates #45605 which might be caused by unacked deletes that were just slow.
2019-08-22 17:19:51 +02:00
Przemysław Witek 7512337922
[7.x] Allow the user to specify 'query' in Evaluate Data Frame request (#45775) (#45825) 2019-08-22 11:14:26 +02:00
Benjamin Trent 3ebeaa2557
Fixing rollup state tests after onFailure ordering change (#45784) (#45814)
After the PR #45676 onFailure is now called before the indexer state has transitioned out of indexing.

To fix these tests, I added a new check to make sure that we don't mark it as failed until AFTER doSaveState is called with a STARTED indexer.
2019-08-21 14:46:09 -05:00
Gordon Brown 47b1e2b3d0
[7.x] Use rollover for SLM's history indices (#45686)
Following our own guidelines, SLM should use rollover instead of purely
time-based indices to keep shard counts low. This commit implements lazy
index creation for SLM's history indices, indexing via an alias, and
rollover in the built-in ILM policy.
2019-08-21 13:42:11 -06:00
William Brafford 2b549e7342
CLI tools: write errors to stderr instead of stdout (#45586)
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard output. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.

This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.

Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
2019-08-21 14:46:07 -04:00
Przemysław Witek bf701b83d2
Shorten field names in EstimateMemoryUsageResponse (#45719) (#45772) 2019-08-21 12:45:09 +02:00
Zachary Tong 6b391cd0d5 Mute ShapeQueryTests#testFieldAlias()
Tracking issue: https://github.com/elastic/elasticsearch/issues/45628
2019-08-21 10:31:13 +01:00
David Kyle 982560afeb Mute RollupIndexerStateTests
See #45770
2019-08-21 10:05:15 +01:00
Przemysław Witek c6709f0979
Mute tests affected by renaming fields in Estimate memory usage response (#45743) (#45766) 2019-08-21 09:57:23 +02:00
Dimitris Athanasiou d5c3d9b50f
[7.x][ML] Do not skip rows with missing values for regression (#45751) (#45754)
Regression analysis support missing fields. Even more, it is expected
that the dependent variable has missing fields to the part of the
data frame that is not for training.

This commit allows to declare that an analysis supports missing values.
For such analysis, rows with missing values are not skipped. Instead,
they are written as normal with empty strings used for the missing values.

This also contains a fix to the integration test.

Closes #45425
2019-08-21 08:15:38 +03:00
Benjamin Trent ba7b677618
[ML] better handle empty results when evaluating regression (#45745) (#45759)
* [ML] better handle empty results when evaluating regression

* adding new failure test to ml_security black list

* fixing equality check for regression results
2019-08-20 17:37:04 -05:00
Armin Braun a01bd6c5a3
Stop Executing SLM Policy Transport Action on Snapshot Pool (#45727) (#45748)
* Executing SLM policies on the snapshot thread will block until a snapshot finishes if the pool is completely busy executing that snapshot
* Fixes #45594
2019-08-20 19:15:36 +02:00
Nhat Nguyen 99b21d50b8 Include leases in ccr errmsg when ops no longer available (#45681)
The setting index.soft_deletes.retention.operations is no longer needed
nor recommended in CCR. We, therefore, should hint users about the
retention leases period setting instead when operations are no longer
available for replicating.
2019-08-20 10:40:12 -04:00
Benjamin Trent 43bb5924e6
[ML][Data Frame] fixing _start?force=true bug (#45660) (#45734)
* [ML][Data Frame] fixing _start?force=true bug

* removing unused import

* removing old TODO
2019-08-20 09:23:07 -05:00
Dimitris Athanasiou 49edf9e5b5
[7.x][ML] Remove timeout on waiting for DF analytics result processor to complete (#45724) (#45733)
We cannot know how long the analysis will take to complete thus we should not have
a timeout. Note that if the process crashes, the result processor will pick the
exception due to the stream closing.

Closes #45723
2019-08-20 17:21:40 +03:00
Przemysław Witek b37ebd1adf
Prepare the codebase for new Auditor subclasses (#45716) (#45731) 2019-08-20 16:03:50 +02:00
Przemysław Witek 80dd0a0948
Get rid of EstimateMemoryUsageRequest and EstimateMemoryUsageAction.Request. (#45718) (#45725) 2019-08-20 15:49:17 +02:00
Benjamin Trent 88641a08af
[ML][Data frame] fixing failure state transitions and race condition (#45627) (#45656)
* [ML][Data frame] fixing failure state transitions and race condition (#45627)

There is a small window for a race condition while we are flagging a task as failed.

Here are the steps where the race condition occurs:
1. A failure occurs
2. Before `AsyncTwoPhaseIndexer` calls the `onFailure` handler it does the following:
   a. `finishAndSetState()` which sets the IndexerState to STARTED
   b. `doSaveState(...)` which attempts to save the current state of the indexer
3. Another trigger is fired BEFORE `onFailure` can fire, but AFTER `finishAndSetState()` occurs.

The trick here is that we will eventually set the indexer to failed, but possibly not before another trigger had the opportunity to fire. This could obviously cause some weird state interactions. To combat this, I have put in some predicates to verify the state before taking actions. This is so if state is indeed marked failed, the "second trigger" stops ASAP.

Additionally, I move the task state checks INTO the `start` and `stop` methods, which will now require a `force` parameter. `start`, `stop`, `trigger` and `markAsFailed` are all `synchronized`. This should gives us some guarantees that one will not switch states out from underneath another.

I also flag the task as `failed` BEFORE we successfully write it to cluster state, this is to allow us to make the task fail more quickly. But, this does add the behavior where the task is "failed" but the cluster state does not indicate as much. Adding the checks in `start` and `stop` will handle this "real state vs cluster state" race condition. This has always been a problem for `_stop` as it is not a master node action and doesn’t always have the latest cluster state.

closes #45609

Relates to #45562

* [ML][Data Frame] moves failure state transition for MT safety (#45676)

* [ML][Data Frame] moves failure state transition for MT safety

* removing unused imports
2019-08-20 07:30:17 -05:00
markharwood 7d5ab17bb2
Search enhancement: pinned queries (#44345) (#45657)
* Search enhancement: pinned queries (#44345)

Search enhancement: - new query type allows selected documents to be promoted above any "organic” search results.
This is the first feature in a new module `search-business-rules` which will house licensed (non OSS) logic for rewriting queries according to business rules.
The PinnedQueryBuilder class offers a new `pinned` query in the DSL that takes an array of promoted IDs and an “organic” query and ensures the documents with the promoted IDs rank higher than the organic matches.

Closes #44074
2019-08-20 11:38:22 +01:00
Costin Leau 0f51dd69cb SQL: Improve serialization of SQL processors (#45678)
Encapsulate the serialization/deserialization of SQL client classes.
Make configuration specific parameters (such as ZoneId) generic just
like the version and remove the need for consumer classes to manage them
individually.
This is not only consistent but also provides significant savings in the
cursor.

Fix #40216

(cherry picked from commit 5c844798045d7baa0d932289d2e3d1607ba6a9a4)
2019-08-20 11:50:47 +03:00
Przemysław Witek 7bc8400222
Call the new _estimate_memory_usage API endpoint on df analytics _start (#45536) (#45701) 2019-08-19 21:37:55 +02:00
Costin Leau 1cd58c8ea8 SQL: Break TextFormatter/Cursor dependency (#45613)
Improve the initialization and state passing of TextFormatter in CLI
and TEXT mode by leveraging the Page listener hook. Additionally
simplify the code inside RestSqlQueryAction.

(cherry picked from commit a56db2fa119cf9e8748723e19f1fc9f6a8afe5fc)
2019-08-17 00:16:08 +03:00
Costin Leau 96883dd028 SQL: Refactor away the cycle between Rowset and Cursor (#45516)
Improve encapsulation of pagination of rowsets by breaking the cycle
between cursor and associated rowset implementation, all logic now
residing inside each cursor implementation.

(cherry picked from commit be8fe0a0ce562fe732fae12a0b236b5731e4638c)
2019-08-17 00:16:05 +03:00
Gordon Brown ecb3ebd796
Clean SLM and ongoing snapshots in test framework (#45564)
Adjusts the cluster cleanup routine in ESRestTestCase to clean up SLM
test cases, and optionally wait for all snapshots to be deleted.

Waiting for all snapshots to be deleted, rather than failing if any are
in progress, is necessary for tests which use SLM policies because SLM
policies may be in the process of executing when the test ends.
2019-08-16 14:17:34 -06:00
Igor Motov 98c850c08b
Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618)
Changes the order of parameters in Geometries from lat, lon to lon, lat
and moves all Geometry classes are moved to the
org.elasticsearch.geomtery package.

Backport of #45332

Closes #45048
2019-08-16 14:42:02 -04:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Andrei Stefan 30a0711777 Remove deprecated use of "interval" method, in favor of "fixedInterval". (#45501)
(cherry picked from commit 3fef65160f9e61883e9f8f7f345b814f945e2f4b)
2019-08-16 15:03:43 +03:00
Alpar Torok 7119e54be5 Mute data frame tests on 7.x
Tracking in #45610 #45609
2019-08-15 17:07:53 +03:00
David Roberts d40f3718f2 [ML] Muting 5 SSLErrorMessageTests tests on Windows (#45602)
Due to https://github.com/elastic/elasticsearch/issues/45598
2019-08-15 11:05:00 +01:00
Benjamin Trent fde5dae387
[ML][Data Frame] adjusting change detection workflow (#45511) (#45580)
* [ML][Data Frame] adjusting change detection workflow

* adjusting for PR comment

* disallowing null as an argument value
2019-08-14 17:26:24 -05:00
Nick Knize 647a8308c3
[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363)
* Introduce Spatial Plugin (#44389)

Introduce a skeleton Spatial plugin that holds new licensed features coming to 
Geo/Spatial land!

* [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923)

Refactor DeprecatedParameters specific to legacy geo_shape out of
AbstractGeometryFieldMapper.TypeParser#parse.

* [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980)

Add a new ShapeFieldMapper to the xpack spatial module for
indexing arbitrary cartesian geometries using a new field type called shape.
The indexing approach leverages lucene's new XYShape field type which is
backed by BKD in the same manner as LatLonShape but without the WGS84
latitude longitude restrictions. The new field mapper builds on and
extends the refactoring effort in AbstractGeometryFieldMapper and accepts
shapes in either GeoJSON or WKT format (both of which support non geospatial
geometries).

Tests are provided in the ShapeFieldMapperTest class in the same manner
as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests.
Documentation for how to use the new field type and what parameters are
accepted is included. The QueryBuilder for searching indexed shapes is
provided in a separate commit.

* [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108)

Add a new ShapeQueryBuilder to the xpack spatial module for
querying arbitrary Cartesian geometries indexed using the new shape field
type.

The query builder extends AbstractGeometryQueryBuilder and leverages the
ShapeQueryProcessor added in the previous field mapper commit.

Tests are provided in ShapeQueryTests in the same manner as
GeoShapeQueryTests and docs are updated to explain how the query works.
2019-08-14 16:35:10 -05:00
Benjamin Trent 0c343d8443
[7.x] [ML][Transforms] adjusting stats.progress for cont. transforms (#45361) (#45551)
* [ML][Transforms] adjusting stats.progress for cont. transforms (#45361)

* [ML][Transforms] adjusting stats.progress for cont. transforms

* addressing PR comments

* rename fix

* Adjusting bwc serialization versions
2019-08-14 13:08:27 -05:00
Przemysław Witek df574e5168
[7.x] Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) (#45510) 2019-08-14 08:26:03 +02:00
Gordon Brown 3f5dab99c3
Properly set origin for SLM history store client (#45515)
The origin was not set properly for the SnapshotHistoryStore client,
resulting in errors when SLM was used when security was enabled.
2019-08-13 18:23:20 -06:00
Andrei Stefan adf8e20021
SQL: adds format parameter to range queries for constant date comparisons (#45503)
* Add format parameter to the range queries built for CURRENT_* functions
used in comparison conditions
* Use range queries for date fields equality/non-equality as well.

(cherry picked from commit c1e81e90f937ee5a002524d632bfce74d76962f9)
2019-08-13 23:04:30 +03:00
Armin Braun 90803a5caf
Reenable Integ Tests in native-multi-node-tests (#45482) (#45496)
* Reenable Integ Tests in native-multi-node-tests

* The tests broken here were likely fixed by #45463 => let's reenable them and see if things run fine again
* Relates #45405, #45455
2019-08-13 15:55:54 +02:00
Alexander Reelsen dd527b4e91 Fix watcher HttpClient URL creation (#45207)
The http client could end up creating URLs, that did not resemble the
original one, when encoding. This fixes a couple of corner cases, where
too much or too few slashes were added to an URI.

Closes #44970
2019-08-13 12:15:54 +02:00
Przemysław Witek 1aed388a24
Add view_index_metadata to roles.yml and remove as many df analytics test cases from build.gradle blacklist as possible. (#45451) (#45465) 2019-08-13 08:31:58 +02:00
Yogesh Gaikwad 471d940c44
Refactor cluster privileges and cluster permission (#45265) (#45442)
The current implementations make it difficult for
adding new privileges (example: a cluster privilege which is
more than cluster action-based and not exposed to the security
administrator). On the high level, we would like our cluster privilege
either:
- a named cluster privilege
  This corresponds to `cluster` field from the role descriptor
- or a configurable cluster privilege
  This corresponds to the `global` field from the role-descriptor and
allows a security administrator to configure them.

Some of the responsibilities like the merging of action based cluster privileges
are now pushed at cluster permission level. How to implement the predicate
(using Automaton) is being now enforced by cluster permission.

`ClusterPermission` helps in enforcing the cluster level access either by
performing checks against cluster action and optionally against a request.
It is a collection of one or more permission checks where if any of the checks
allow access then the permission allows access to a cluster action.

Implementations of cluster privilege must be able to provide information
regarding the predicates to the cluster permission so that can be enforced.
This is enforced by making implementations of cluster privilege aware of
cluster permission builder and provide a way to specify how the permission is
to be built for a given privilege.

This commit renames `ConditionalClusterPrivilege` to `ConfigurableClusterPrivilege`.
`ConfigurableClusterPrivilege` is a renderable cluster privilege exposed
as a `global` field in role descriptor.

Other than this there is a requirement where we would want to know if a cluster
permission is implied by another cluster-permission (`has-privileges`).
This is helpful in addressing queries related to privileges for a user.
This is not just simply checking of cluster permissions since we do not
have access to runtime information (like request object).
This refactoring does not try to address those scenarios.

Relates #44048
2019-08-13 09:06:18 +10:00
Mark Vieira 7e3379444b
Fix build failure due to unknown task and disable test conventions
(cherry picked from commit 8ed84bc5cef9bcfae6c817059f764d97e4451a4a)
2019-08-12 09:18:39 -07:00
Przemyslaw Gomulka 421e9b8e8b
Mute integ tests in native-multi-node-tests (#45457)
Tracked at #45405
2019-08-12 17:42:24 +02:00
Przemyslaw Gomulka d11ae08467
Muting ForecastIT.testOverflowToDisk (#45435) (#45438)
awaits #45405
2019-08-12 11:01:32 +02:00
Dimitris Athanasiou d02d6e40c2 [ML] Mute regression integ test
Relates #45425
2019-08-12 10:59:24 +03:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Benjamin Trent fac1a6f8e8
[ML][Data Frame] have DataFrameTransformConfigUpdate#apply set Version (#45391) (#45400) 2019-08-09 14:32:49 -05:00
Hendrik Muhs bf4da6c6ad
[ML-DataFrame] fix starting a batch data frame after stopping at runtime (#45340) (#45381)
fix loading of next checkpoint after data frame transform has been stopped/started within one run

closes #45339
2019-08-09 20:30:11 +02:00
Dimitris Athanasiou 27497ff75f
[7.x][ML] Add regression analysis to DF analytics (#45292) (#45388)
This commit adds a first draft of a regression analysis
to data frame analytics. There is high probability that
the exact syntax might change.

This commit adds the new analysis type and its parameters as
well as appropriate validation. It also modifies the extractor
and the fields detector to be able to handle categorical fields
as regression analysis supports them.
2019-08-09 19:31:13 +03:00
Alpar Torok 634a070430 Restrict which tasks can use testclusters (#45198)
* Restrict which tasks can use testclusters

This PR fixes a problem between the interaction of test-clusters and
build cache.
Before this any task could have used a cluster without tracking it as
input.
With this change a new interface is introduced to track the tasks that
can use clusters and we do consider the cluster as input for all of
them.
2019-08-09 13:38:01 +03:00
Hendrik Muhs 7d0aff0ed5 [ML-DataFrame] fix test failure in checkpoint retrieval (#45297)
gracefully handle if index response returns null, increase and assert timeout

closes #45238
2019-08-09 09:04:53 +02:00
Hendrik Muhs 68f9102550 [ML-DataFrame] audit changes in the source index (#45282)
add audits when the set of source indexes changes and in a special case runs empty
2019-08-08 23:31:55 +02:00
Andrei Stefan 740d58fd46
SQL: Uniquely named inner_hits sections for each nested field condition (#45341)
* Name each inner_hits section of nested queries differently and extract and combine the multiple values it generates into a single list.
This also introduces a limitation (its origin it's with Elasticsearch
though) on the sorting capabilities when the sorting is based on the
nested fields filtered: only one of the conditions applied to nested
documents will be used in the nested sorting.

(cherry picked from commit cfc5cf68f6e83b07bb9006986d0903d6be418ec6)
2019-08-09 00:22:49 +03:00
David Roberts 14545f8958
[ML-DataFrame] Combine task_state and indexer_state in _stats (#45324)
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:

- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state

Backport of #45276
2019-08-08 16:24:26 +01:00
Ioannis Kakavas 99ddb8b3d8 Allow empty token endpoint for implicit flow (#45038)
When using the implicit flow in OpenID Connect, the
op.token_endpoint_url should not be mandatory as there is no need
to contact the token endpoint of the OP.
2019-08-08 12:50:53 +03:00