Commit Graph

3228 Commits

Author SHA1 Message Date
Hendrik Muhs 42ada88c5d cleanup static member 2019-09-06 08:55:57 +02:00
Hendrik Muhs ab5f09d29b [ML-DataFrame] improve error message for timeout case in stop (#46131)
improve error message if stopping of transform times out.

related #45610
2019-09-06 08:36:01 +02:00
Hendrik Muhs 78824cce81 reuse mock client to avoid probles with thread context closed errors (#46398) 2019-09-06 08:17:25 +02:00
Benjamin Trent 457ff3e2fb
7.x/ml fix instance serialization bwc (#46404)
* [ML] Fixing instance serialization version for bwc

* fixing CppLogMessage
2019-09-05 13:23:26 -05:00
Benjamin Trent caf3e4d654
[7.x] [ML][Transforms] fixing rolling upgrade continuous transform test (#45823) (#46347) (#46337)
* [ML][Transforms] fixing rolling upgrade continuous transform test (#45823)

* [ML][Transforms] fixing rolling upgrade continuous transform test

* adjusting wait assert logic

* adjusting wait conditions

* [ML][Transforms] allow executor to call start on started task (#46347)

* making sure we only upgrade from 7.4.0 in test
2019-09-05 10:31:11 -05:00
Benjamin Trent 5201386232
[ML] testFullClusterRestart waiting for stable cluster (#46280) (#46335)
* [ML] waiting for ml indices before waiting task assignment testFullClusterRestart

* waiting for a stable cluster after fullrestart

* removing unused imports
2019-09-05 06:57:33 -05:00
Yogesh Gaikwad d5acb15a71
[Backport] Initialize document subset bit set cache used for DLS (#46211) (#46359)
This commit initializes DocumentSubsetBitsetCache even if DLS
is disabled. Previously it would throw null pointer when querying
usage stats if we explicitly disabled DLS as there would be no instance of DocumentSubsetBitsetCache to query. It is okay to initialize
DocumentSubsetBitsetCache which will be empty as the license enforcement
would prevent usage of DLS feature and it will not fail when accessing usage stats.

Closes #45147
2019-09-05 14:34:19 +10:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors and while computing the vector function. (#46103) 
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Andrey Ershov ece9eb4acd Remove stack trace logging in Security(Transport|Http)ExceptionHandler (#45966)
As per #45852 comment we no longer need to log stack-traces in
SecurityTransportExceptionHandler and SecurityHttpExceptionHandler even
if trace logging is enabled.

(cherry picked from commit c99224a32d26db985053b7b36e2049036e438f97)
2019-09-04 11:50:35 +03:00
Dimitris Athanasiou 8fca5b5204
[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271) (#46282)
The test seems to have been failing due to a race condition between
stopping the task and refreshing the destination index. In particular,
we were going forward with refreshing the destination index even
though the task stopped in the meantime. This was fixed in
request.

Closes #43960

Backport of #46271
2019-09-04 10:57:01 +03:00
Marios Trivyzas fd0affb503
SQL: Fix issue with IIF function when condition folds (#46290)
Previously, when the condition (1st argument) of the IIF function could
be evaluated (folded) to false, the `IfConditional` was eliminated which
caused `IndexOutOfBoundsException` to be thrown when `info()` and
`resolveType()` methods where called.

Fixes: #46268

(cherry picked from commit 9a885a3ac47bc8f52c07770d1d8d670ce0af1e59)
2019-09-04 10:32:49 +03:00
Benjamin Trent 8a4ff7d57d
[ML][Transforms] protecting doSaveState with optimistic concurrency (#46156) (#46281)
* [ML][Transforms] protecting doSaveState with optimistic concurrency

* task code cleanup
2019-09-03 15:27:24 -05:00
Benjamin Trent c3c1dcd2ac
[ML][Transforms] fixing listener being called twice (#46284) (#46292) 2019-09-03 15:26:56 -05:00
Benjamin Trent 53df54c703
[ML][Transforms] fixing stop on changes check bug (#46162) (#46273)
* [ML][Transforms] fixing stop on changes check bug

* Adding new method finishAndCheckState to cover race conditions in early terminations

* changing stopping conditions in `onStart`

* allow indexer to finish when exiting early
2019-09-03 11:04:18 -05:00
Lee Hinman 3d4b8e01c7
Validate SLM policy ids strictly (#45998) (#46145)
This uses strict validation for SLM policy ids, similar to what we use
for index names.

Resolves #45997
2019-09-03 09:20:02 -06:00
David Roberts ab045744ac [ML-DataFrame] Fix off-by-one error in checkpoint operations_behind (#46235)
Fixes a problem where operations_behind would be one less than
expected per shard in a new index matched by the data frame
transform source pattern.

For example, if a data frame transform had a source of foo*
and a new index foo-new was created with 2 shards and 7 documents
indexed in it then operations_behind would be 5 prior to this
change.

The problem was that an empty index has a global checkpoint
number of -1 and the sequence number of the first document that
is indexed into an index is 0, not 1.  This doesn't matter for
indices included in both the last and next checkpoints, as the
off-by-one errors cancelled, but for a new index it affected
the observed result.
2019-09-03 12:45:02 +01:00
Ignacio Vera 59c474e675
reset queryGeometry in ShapeQueryTests (#45974) (#46251) 2019-09-03 10:24:46 +02:00
markharwood a9e3871fc7
Test fix for PinnedQueryBuilderIT (#46187) (#46227)
Fix test issue to stabilise scoring through use of DFS search mode.
Randomised index-then-delete docs introduced by the test framework likely caused an imbalance in IDF scores across shards. Also made number of shards used in test a random number for added test coverage.

Closes #46174
2019-09-02 13:31:22 +01:00
Marios Trivyzas 3bee647e5b
SQL: Fix issue with DataType for CASE with NULL (#46173)
Previously, if the DataType of all the WHEN conditions of a CASE
statement is NULL, then it was set to NULL even if the ELSE clause
has a non-NULL data type, e.g.:
```
CASE WHEN a = 1 THEN NULL
           WHEN a = 5 THEN NULL
ELSE 'foo'
```

Fixes: #46032
(cherry picked from commit 8c1012efbbd3a300afd0dfb9b18250f15ea753f9)
2019-09-02 11:17:24 +03:00
Benjamin Trent d0c5573a51
[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044) (#46096)
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.

This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
2019-08-30 09:27:07 -05:00
Alexander Reelsen 98c32c7846 Fix wrong URL encoding in watcher HTTP client (#45894)
The test assumption was calling the wrong method resulting in a URL
encoding before returning the data.

Closes #44970
2019-08-30 14:02:49 +02:00
Dimitris Athanasiou 5921ae53d8
[7.x][ML] Regression dependent variable must be numeric (#46072) (#46136)
* [ML] Regression dependent variable must be numeric

This adds a validation that the dependent variable of a regression
analysis must be numeric.

* Address review comments and fix some problems

In addition to addressing the review comments, this
commit fixes a few issues I found during testing.

In particular:

- if there were mappings for required fields but they were
not included we were not reporting the error
- if explicitly included fields had unsupported types we were
not reporting the error

Unfortunately, I couldn't get those fixed without refactoring
the code in `ExtractedFieldsDetector`.
2019-08-30 09:57:43 +03:00
Zachary Tong cf8a4171e1
Rename `data-science` plugin to `analytics` (#46133)
Rename `data-science` plugin to `analytics`. 
Also removes enabled flag. Backport of #46092
2019-08-29 12:45:39 -04:00
Simon Willnauer 9b2ea07b17
Flush engine after big merge (#46066) (#46111)
Today we might carry on a big merge uncommitted and therefore
occupy a significant amount of diskspace for quite a long time
if for instance indexing load goes down and we are not quickly
reaching the translog size threshold. This change will cause a
flush if we hit a significant merge (512MB by default) which
frees diskspace sooner.
2019-08-29 17:54:15 +02:00
Nhat Nguyen 028e792e1d Remove already exist assertion while renew ccr lease (#46009)
If a CCR lease is disappeared while we are renewing it, then we will
issue asyncAddRetentionLease to add that lease. And if
asyncAddRetentionLease takes longer than retentionLeaseRenewInterval,
then we can issue another asyncAddRetentionLease request. One of
asyncAddRetentionLease requests will fail with
RetentionLeaseAlreadyExistsException, hence trip the assertion.

Closes #45192
2019-08-29 09:44:40 -04:00
Przemysław Witek b8a0379057
Refactor auditor-related classes (#45893) (#46120) 2019-08-29 14:21:03 +02:00
Przemysław Witek fbe9e8a530
Do not throw an exception if the process finished quickly but without any error. (#46073) (#46113) 2019-08-29 10:47:17 +02:00
Gordon Brown 47bbd9d9a9
[7.x] Fix rollover alias in SLM history index template (#46001)
This commit adds the `rollover_alias` setting required for ILM to work
correctly to the SLM history index template and adds assertions to the
SLM integration tests to ensure that it works correctly.
2019-08-28 14:50:22 -07:00
Tal Levy a356bcff41
Add Circle Processor (#43851) (#46097)
add circle-processor that translates circles to polygons
2019-08-28 14:44:08 -07:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
2019-08-28 11:03:14 -07:00
Mark Tozzi 9ac85a4a2b Fix compilation in CumulativeCardinalityAggregatorTests 2019-08-28 09:31:48 -04:00
Dimitris Athanasiou 25d64508f6
[7.x][ML] Support boolean fields for DF analytics (#46037) (#46054)
This commit adds support for `boolean` fields in data frame
analytics (and currently both outlier detection and regression).
The analytics process expects `boolean` fields to be encoded as
integers with 0 or 1 value.
2019-08-28 12:02:29 +03:00
Jake Landis 154d1dd962
Watcher max_iterations with foreach action execution (#45715) (#46039)
Prior to this commit the foreach action execution had a hard coded 
limit to 100 iterations. This commit allows the max number of 
iterations to be a configuration ('max_iterations') on the foreach 
action. The default remains 100.
2019-08-27 16:57:20 -05:00
Armin Braun fdef293c81 Fix RegressionTests#fromXContent (#46029)
* The `trainingPercent` must be between `1` and `100`, not `0` and `100` which is causing test failures
2019-08-27 18:24:26 +03:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Yogesh Gaikwad 7b6246ec67
Add `manage_own_api_key` cluster privilege (#45897) (#46023)
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. are too permissive and
we would want finer-grained control over the cluster privileges
for API keys. Previously APIs created would also need these
privileges to get its own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.

The changes cover HLRC changes, documentation for the API changes.

Closes #40031
2019-08-28 00:44:23 +10:00
Dimitris Athanasiou dd6c13fdf9
[ML] Add description to DF analytics (#45774) (#46019) 2019-08-27 15:48:59 +03:00
Albert Zaharovits 1ebee5bf9b
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
2019-08-27 14:42:46 +03:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 16:19:55 -04:00
Benjamin Trent a3a4ae0ac2
[ML] fixing bug where analytics process starts with 0 rows (#45879) (#45988)
The native process requires that there be a non-zero number of rows to analyze. If the flag --rows 0 is passed to the executable, it throws and does not start.

When building the configuration for the process we should not start the native process if there are no rows.

Adding some logging to indicate what is occurring.
2019-08-26 14:18:17 -05:00
Benjamin Trent d64018f8e1
[ML] add supported types to no fields error message (#45926) (#45987)
* [ML] add supported types to no fields error message

* adding supported types to logger debug
2019-08-26 14:18:00 -05:00
Jake Landis 767f648f8e
Watcher add email warning if CSV attachment contains formulas (#44460) (#45557)
* Watcher add email warning if CSV attachment contains formulas (#44460)

This commit introduces a Warning message to the emails generated by 
Watcher's reporting action. This change complements Kibana's CSV 
formula notifications (see elastic/kibana#37930). 

This is implemented by reading a header (kbn-csv-contains-formulas) 
provided by Kibana to notify to attach the Warning to the email. 
The wording of the warning is borrowed from Kibana's UI and may 
be overridden by a dynamic setting
xpack.notification.reporting.warning.kbn-csv-contains-formulas.text.
This warning is enabled by default, but may be disabled via a 
dynamic setting xpack.notification.reporting.warning.enabled.
2019-08-26 08:35:33 -05:00
Jake Landis f2241a152f
watcher tests - increase stop timeout to 60s (#45679) (#45934)
As of #43939 Watcher tests now correctly block until all Watch executions
kicked off by that test are finished. Prior we allowed tests to finish with
outstanding watch executions. It was known that this would increase the
time needed to finish a test. However, running the tests on CI can be slow
and on at least 1 occasion it took 60s to actually finish.

This PR simply increases the max allowable timeout for Watcher tests
to clean up after themselves.
2019-08-26 08:34:54 -05:00
Andrey Ershov 479ab9b8db Fix plaintext on TLS port logging (#45852)
Today if non-TLS record is received on TLS port generic exception will
be logged with the stack-trace.
SSLExceptionHelper.isNotSslRecordException method does not work because
it's assuming that NonSslRecordException would be top-level.
This commit addresses the issue and the log would be more concise.

(cherry picked from commit 6b83527bf0c23d4d5b97fab7f290c43432945d4f)
2019-08-26 12:32:35 +02:00
Ioannis Kakavas 2bee27dd54
Allow Transport Actions to indicate authN realm (#45946)
This commit allows the Transport Actions for the SSO realms to
indicate the realm that should be used to authenticate the
constructed AuthenticationToken. This is useful in the case that
many authentication realms of the same type have been configured
and where the caller of the API(Kibana or a custom web app) already
know which realm should be used so there is no need to iterate all
the realms of the same type.
The realm parameter is added in the relevant REST APIs as optional
so as not to introduce any breaking change.
2019-08-25 19:36:41 +03:00
Jason Tedor 040a810b3c
Add deprecation check for pidfile setting (#45939)
The pidfile setting is deprecated. This commit adds a deprecation check
for usage of this setting.
2019-08-24 17:19:20 -04:00
Jason Tedor 43ca652d11
Add deprecation check for processors (#45925)
The processors setting is deprecated. This commit adds a deprecation
check for the use of the processors setting.
2019-08-23 20:16:40 -04:00
Jason Tedor 6b116a48f3
Skip feature aware check on JDK 14 (#45928)
ASM can not currently handle classes compiled with JDK 14. This commit
skips these checks on JDK 14, for now.
2019-08-23 17:38:15 -04:00
Dimitris Athanasiou be554fe5f0
[7.x][ML] Improve progress reportings for DF analytics (#45856) (#45910)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 23:04:39 +03:00
Benjamin Trent b756e1b9be
[ML][Transforms] adjusting when and what to audit (#45876) (#45916)
* [ML][Transforms] adjusting when and what to audit

* Update DataFrameTransformTask.java

* removing unnecessary audit message
2019-08-23 13:53:02 -05:00