Commit Graph

3633 Commits

Author SHA1 Message Date
Lee Hinman 91988c7c26 Throw error retrieving non-existent SLM policy (#47679)
Previously when retrieving an SLM policy it would always return a 200
with `{}` in the body, even if the policy did not exist. This changes
that behavior to throw an error (similar to our other APIs) if a
policy doesn't exist.

This also adds a basic CRUD yml test for the behavior.

Resolves #47664
2019-10-07 19:54:04 -06:00
Lee Hinman 906be45209 Add a test for SLM retention with security enabled (#47608)
This enhances the existing SLM test using users/roles/etc to also test
that SLM retention works when security is enabled.

Relates to #43663
2019-10-07 19:52:09 -06:00
Tal Levy a17f394e27
Geo-Match Enrich Processor (#47243) (#47701)
this commit introduces a geo-match enrich processor that looks up a specific
`geo_point` field in the enrich-index for all entries that have a geo_shape match field
that meets some specific relation criteria with the input field.

For example, the enrich index may contain documents with zipcodes and their respective
geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode
they associate according to which shape they are contained within.

this commit also refactors some of the MatchProcessor by moving a lot of the shared code to
AbstractEnrichProcessor.

Closes #42639.
2019-10-07 15:03:46 -07:00
Jake Landis 74876811c2
Watcher - catch uncaught exception. (#47680) (#47695)
If a thread pool rejection exception happens, an alternative code
path is chosen to write history and delete the trigger. If an exception
happens during deletion of the trigger an exception may be thrown and not
caught.

This commit catches the exception and provides a meaning error message.

fixes #47008
2019-10-07 15:45:45 -05:00
Jake Landis a49a1b6994
Watcher remove assertion that is susceptible to a race conditi… (#47667)
When deactivating a watch, there is a chance that it is fully deactivated
and reporting as not running but the history is not fully written yet.
There is not a tight coupling between the associated watcher history
index and the deactivation. This test assumes that once a watch is
deactivated that all history is fully written in a very short time period.
If the Watch is deactivated, but the history is slow to write it can result
in a failing test.

This change removes an assertion that assumes that the deactivation of a watch
ensured the all of the watch history was written. There is still a minor race
condition with respect to the remaining history assertions. However, if the
history is slow to be written, it will allow the test to still passing.

fixes #47503
2019-10-07 12:07:10 -05:00
Dimitris Athanasiou 7667ea5f6f
[7.x][ML] Additional outlier detection parameters (#47600) (#47669)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values

Backport of #47600
2019-10-07 18:21:33 +03:00
Marios Trivyzas e698e68f06 SQL: Allow whitespaces in escape patterns (#47577)
Previously, we supported only the format `{fn <FUNCTION_NAME>()}`
but other DBs like MSSQL, DB2, MariaDB/MySQL alos allow whitespaces
between `{` and `fn`. Furhermore, also some applications - like PowerBI -
generate escape sequences with spaces: `select { fn name(params) } etc.`

Add support for white spaces between `{` and the escape pattern definition
like `fn`, `ts`, `d`, `guid` etc.

Closes: #47401

(cherry picked from commit 08a22d0b393f4a76c52dabc5e7b9cafcc19c30ca)
2019-10-07 15:05:02 +02:00
Yogesh Gaikwad b6d1d2e6ec
Add 'create_doc' index privilege (#45806) (#47645)
Use case:
User with `create_doc` index privilege will be allowed to only index new documents
either via Index API or Bulk API.

There are two cases that we need to think:
- **User indexing a new document without specifying an Id.**
   For this ES auto generates an Id and now ES version 7.5.0 onwards defaults to `op_type` `create` we just need to authorize on the `op_type`.
- **User indexing a new document with an Id.**
   This is problematic as we do not know whether a document with Id exists or not.
   If the `op_type` is `create` then we can assume the user is trying to add a document, if it exists it is going to throw an error from the index engine.

Given these both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents.

In the `AuthorizationService` when authorizing a bulk request, we check the implied action.
This code changes that to append the `:op_type/index` or `:op_type/create`
to indicate the implied index action.
2019-10-07 23:58:44 +11:00
Yogesh Gaikwad 7c862fe71f
Add support to retrieve all API keys if user has privilege (#47274) (#47641)
This commit adds support to retrieve all API keys if the authenticated
user is authorized to do so.
This removes the restriction of specifying one of the
parameters (like id, name, username and/or realm name)
when the `owner` is set to `false`.

Closes #46887
2019-10-07 23:58:21 +11:00
Tanguy Leroux b5ac0204d2
Fail earlier Put Follow requests for closed leader indices (#47637)
Backport of (#47582)

Today when following a new leader index, we fetch the remote cluster state, 
check the remote cluster license, check the user privileges, retrieve the 
index shard stats before initiating a CCR restore session.

But if the leader index to follow is closed, we're executing a bunch of 
operations that would inevitability fail at some point (on retrieving the
 index shard stats, because this type of request forbid closed indices 
when resolving indices). We could fail a Put Follow request at the first 
step by checking the leader index state directly from the remote cluster 
state.

This also helps the Resume Follow API to fail a bit earlier.
2019-10-07 13:59:04 +02:00
Martijn van Groningen f2f2304c75
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-10-07 10:07:56 +02:00
Andrei Dan 4506b37ed5
ILM: Skip rolling indexes that are already rolled (#47324) (#47592)
An index with an ILM policy that has a rollover action in one of the
phases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This changes this behaviour and has ILM check if the index it's about to
roll has not been rolled over in the meantime.

(cherry picked from commit 37d6106feeb9f9369519117c88a9e7e30f3ac797)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-10-07 07:47:47 +01:00
Dimitris Athanasiou ffacfc642c
[7.x][ML] Mute RegressionIT.testStopAndRestart (#47624) (#47625)
Relates #47612
2019-10-05 23:58:32 +03:00
Jason Tedor 35ca3d68d7
Validating monitoring hosts setting while parsing (#47571)
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
2019-10-04 17:32:49 -04:00
Lee Hinman 79376b7219 Set default SLM retention invocation time (#47604)
This adds a default for the `slm.retention_schedule` setting, setting it
to `0 30 1 * * ?` which is 1:30am every day.

Having retention unset meant that it would never be invoked and clean up
snapshots. We determined it would be better to have a default than never
to be run. When coming to a decision, we weighed the option of an
absolute time (such as 1:30am) versus a periodic invocation (like every
12 hours). In the end we decided on the absolute time because it has
better predictability and consistency than a periodic invocation, which
would rely on when the master node were elected or restarted.

Relates to #43663
2019-10-04 15:00:20 -06:00
James Baiera a66c0dcd95 Add pipeline to ensure unique Enrich index documents (#46348)
Adds a pipeline that removes ids and routing from documents before indexing
them into enrich indices. Enrich documents may come from multiple indices,
and thus have id collisions on them. This pipeline ensures that documents
with colliding id fields do not clobber one another during the reindex operation
while executing an enrich policy.
2019-10-04 12:20:52 -04:00
Przemysław Witek ee952da2e2
[7.x] Implement evaluation API for multiclass classification problem (#47126) (#47343) 2019-10-04 17:54:51 +02:00
Andrei Stefan a46f312ded SQL: fix multi full-text functions usage with aggregate functions (#47444)
* Skip functions involving full-text predicates when replacing multiple
aggregate functions with "stats" or "matrix_stats" aggregations.

(cherry picked from commit bb14ba83128dfb7a70f825ea08b1524072fb9ad0)
2019-10-04 16:27:22 +03:00
Przemysław Witek 8c180a77f0
[7.x] Fix serialization of evaluation response. (#47557) (#47566) 2019-10-04 15:12:18 +02:00
Przemysław Witek ec9b77deaa
[7.x] Implement new analysis type: classification (#46537) (#47559) 2019-10-04 13:47:19 +02:00
David Roberts 31a5e1c7ee [ML] More accurate job memory overhead (#47516)
When an ML job runs the memory required can be
broken down into:

1. Memory required to load the executable code
2. Instrumented model memory
3. Other memory used by the job's main process or
   ancilliary processes that is not instrumented

Previously we added a simple fixed overhead to
account for 1 and 3. This was 100MB for anomaly
detection jobs (large because of the completely
uninstrumented categorization function and
normalize process), and 20MB for data frame
analytics jobs.

However, this was an oversimplification because
the executable code only needs to be loaded once
per machine.  Also the 100MB overhead for anomaly
detection jobs was probably too high in most cases
because categorization and normalization don't use
_that_ much memory.

This PR therefore changes the calculation of memory
requirements as follows:

1. A per-node overhead of 30MB for _only_ the first
   job of any type to be run on a given node - this
   is to account for loading the executable code
2. The established model memory (if applicable) or
   model memory limit of the job
3. A per-job overhead of 10MB for anomaly detection
   jobs and 5MB for data frame analytics jobs, to
   account for the uninstrumented memory usage

This change will enable more jobs to be run on the
same node.  It will be particularly beneficial when
there are a large number of small jobs.  It will
have less of an effect when there are a small number
of large jobs.
2019-10-04 09:57:31 +01:00
Yogesh Gaikwad d371f9d44d
Fix for ApiKeyIntegTests related to Expired API keys remover (#43477) (#47546)
When API key is invalidated we do two things first it tries to trigger `ExpiredApiKeysRemover` task
and second, we do index the invalidation for the API key. The index invalidation may happen
before the `ExpiredApiKeysRemover` task is run and in that case, the API key
invalidated will also get deleted. If the `ExpiredApiKeysRemover` runs before the
API key invalidation is indexed then the API key is not deleted and will be
deleted in the future run.
This behavior was not captured in the tests related to `ExpiredApiKeysRemover`
causing intermittent failures.
This commit fixes those tests by checking if the API key invalidated is reported
back when we get API keys after invalidation and perform the checks based on that.

Closes #41747
2019-10-04 13:17:52 +10:00
Ioannis Kakavas fd6a585009
Fix ADRealmTests in FIPS 140 JVMs (#47437) (#47506)
The changes introduced in #47179 made it so that we could try to
build an SSLContext with verification mode set to None, which is
not allowed in FIPS 140 JVMs. This commit address that
2019-10-03 17:14:26 +03:00
Alpar Torok 0a14bb174f Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 11:55:00 +03:00
Lee Hinman 2e3eb4b24e
Add API to execute SLM retention on-demand (#47405) (#47463)
* Add API to execute SLM retention on-demand (#47405)

This is a backport of #47405

This commit adds the `/_slm/_execute_retention` API endpoint. This
endpoint kicks off SLM retention and then returns immediately.

This in particular allows us to run retention without scheduling it
(for entirely manual invocation) or perform a one-off cleanup.

This commit also includes HLRC for the new API, and fixes an issue
in SLMSnapshotBlockingIntegTests where retention invoked prior to the
test completing could resurrect an index the internal test cluster
cleanup had already deleted.

Resolves #46508
Relates to #43663
2019-10-02 12:29:04 -06:00
Lee Hinman 013d87d716 Fix AllocationRoutedStepTests.testConditionMetOnlyOneCopyAlloc… (#47313)
* Fix AllocationRoutedStepTests.testConditionMetOnlyOneCopyAllocated

These tests were using randomly generated includes/excludes/requires for
routing, however, it was possible to generate mutually exclusive
allocation settings (about 1 out of 50,000 times for my runs).

This splits the test into three different tests, and removes the
randomization (it doesn't add anything to the testing here) to fix the
issue.

Resolves #47142
2019-10-02 10:01:23 -06:00
Ioannis Kakavas 4f722f0f53
Fix Active Directory tests (#47358) (#47440)
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.

Resolves: #33425, #35738
2019-10-02 17:18:12 +03:00
Benjamin Trent 2228a7dd8d
[ML][Inference] adding ensemble model objects (#47241) (#47438)
* [ML][Inference] adding ensemble model objects

* addressing PR comments

* Update TreeTests.java

* addressing PR comments

* fixing test
2019-10-02 09:49:46 -04:00
Dimitris Athanasiou b9541eb3af
[7.x][ML] Make PUT data frame analytics action a master node action (… (#47433)
While it seemed like the PUT data frame analytics action did not
have to be a master node action as the config is stored in an index
rather than the cluster state, there are other subtle nuances which
make it worthwhile to convert it. In particular, it helps maintain
order of execution for put actions which are anyhow user driven and
are expected to have low volume.

This commit converts `TransportPutDataFrameAnalyticsAction` from
a handled transport action to a master node action.

Note this means that the action might fail in a mixed cluster
but as the API is still experimental and not widely used there will
be few moments more suitable to make this change than now.
2019-10-02 16:24:21 +03:00
Yannick Welsch 7b2613db55 Allow optype CREATE for append-only indexing operations (#47169)
Bulk requests currently do not allow adding "create" actions with auto-generated IDs.
This commit allows using the optype CREATE for append-only indexing operations. This is
mainly the user facing aspect of it.
2019-10-02 14:16:52 +02:00
Henning Andersen 42453aec96 Fix XPackPlugin usages in tests (#47252)
XPackPlugin holds data in statics and can only be initialized once. This
caused tests to fail primarily when running with a low max-workers.

Replaced usages with the LocalStateCompositeXPackPlugin, which handles
this properly for testing.
2019-10-02 12:36:02 +02:00
David Roberts 4379a3c52b [ML] Throttle the delete-by-query of expired results (#47177)
Due to #47003 many clusters will have built up a
large backlog of expired results. On upgrading to
a version where that bug is fixed users could find
that the first ML daily maintenance task deletes
a very large amount of documents.

This change introduces throttling to the
delete-by-query that the ML daily maintenance uses
to delete expired results to limit it to deleting an
average 200 documents per second. (There is no
throttling for state/forecast documents as these
are expected to be lower volume.)

Additionally a rough time limit of 8 hours is applied
to the whole delete expired data action. (This is only
rough as it won't stop part way through a single
operation - it only checks the timeout between
operations.)

Relates #47103
2019-10-02 11:16:34 +01:00
Dimitris Athanasiou 36884a3c32
[7.x][ML] Restore analytics state if available (#47128) (#47393)
This commit restores the model state if available in data
frame analytics jobs.

In addition, this changes the start API so that a stopped job
can be restarted. As we now store the progress in the state index
when the task is stopped, we can use it to determine what state
the job was in when it got stopped.

Note that in order to be able to distinguish between a job
that runs for the first time and another that is restarting,
we ensure reindexing progress is reported to be at least 1
for a running task.
2019-10-02 10:24:05 +03:00
Benjamin Trent f5fe5e7cd6
[7.x] [ML][Inference] Adding preprocessors to definition object (#47320) (#47370)
* [ML][Inference] Adding preprocessors to definition object (#47320)

* [ML][Inference] Adding preprocessors to definition object

* Update TrainedModelConfig.java

* adjusting for backport
2019-10-01 13:31:25 -04:00
Michael Basnight 0e1b77568a Add enable checks to missing enrich plugin methods (#47187)
Some of the server side objects that do not need to be created unless
enrich is enabled were still being created. This commit fixes that.
2019-10-01 12:04:46 -05:00
Albert Zaharovits 78558a7b2f
Fix AD realm additional metadata (#47179)
Due to a regression bug the metadata Active Directory realm
setting is ignored (it works correctly for the LDAP realm type).
This commit redresses it.

Closes #45848
2019-10-01 17:05:25 +03:00
Marios Trivyzas f792dbf239 SQL: Implement DATE_PART function (#47206)
DATE_PART(<datetime unit>, <date/datetime>) is a function that allows
the user to extract the specified unit from a date/datetime field
similar to the EXTRACT (<datetime unit> FROM <date/datetime>) but
with different names and aliases for the units and it also provides more
options like `DATE_PART('tzoffset', datetimeField)`.

Implemented following the SQL server's spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/datepart-transact-sql?view=sql-server-2017
with the difference that the <datetime unit> argument is either a
literal single quoted string or gets a value from a table field, whereas
in SQL server keywords are used (unquoted identifiers) and it's not
possible to use a value coming for a table column.

Closes: #46372
(cherry picked from commit ead743d3579eb753fd314d4a58fae205e465d72e)
2019-10-01 16:28:27 +03:00
Benjamin Trent 4335e07716
[7.x] [ML][Inference] adding .ml-inference* index and storage (#47267) (#47310)
* [ML][Inference] adding .ml-inference* index and storage (#47267)

* [ML][Inference] adding .ml-inference* index and storage

* Addressing PR comments

* Allowing null definition, adding validation tests for model config

* fixing line length

* adjusting for backport
2019-10-01 08:20:33 -04:00
Ioannis Kakavas 3b06916fcd Revert "Fix Active Directory tests (#47266)"
This reverts commit 7d9c064218.
2019-10-01 13:32:31 +03:00
Ioannis Kakavas 7d9c064218 Fix Active Directory tests (#47266)
Fixes multiple Active Directory related tests that run against the
samba fixture. Some were failing since we changed the realm settings
format in 7.0 and a few were slightly broken in other ways.
We can move to cleanup the tests in a follow up but this work fits
better to be done with or after we move the tests from a Samba
based fixture to a real(-ish) Microsoft Active Directory based
fixture.

Resolves: #33425, #35738
2019-10-01 10:52:07 +03:00
Ioannis Kakavas 33c5e5b09d Fix SSLErrorMessageTests in Windows (#47315)
- Build paths with PathUtils#get instead of hard-coding a string with
forward slashes.
- Do not try to match the whole message that includes paths. The
file separator is `\\` in windows but when we throw an Elasticsearch
Exception, the message is formatted with LoggerMessageFormat#format
which replaces `\\` with `\` in Path names. That means that in Windows
the Exception message will contain paths with single backslashes while
the expected string that comes from Path#toString on filename and
env.configFile will contain double backslashes. There is no point in
attempting to match the whole message string for the purpose of this test.

Resolves: #45598
2019-10-01 09:14:36 +03:00
Marios Trivyzas fa0b1b641a
SQL: Add examples fo muting sql/csv integ tests (#47291)
Add examples of failures for both sql and csv integeration
tests and instructions on how to mute them.

(cherry picked from commit 591bba46516d770f5fc95a4c536dd7448b74dd49)
2019-10-01 09:12:20 +03:00
Armin Braun 3d23cb44a3
Speed up Snapshot Finalization (#47283) (#47309)
As a result of #45689 snapshot finalization started to
take significantly longer than before. This may be a
little unfortunate since it increases the likelihood
of failing to finalize after having written out all
the segment blobs.
This change parallelizes all the metadata writes that
can safely run in parallel in the finalization step to
speed the finalization step up again. Also, this will
generally speed up the snapshot process overall in case
of large number of indices.

This is also a nice to have for #46250 since we add yet
another step (deleting of old index- blobs in the shards
to the finalization.
2019-09-30 23:28:59 +02:00
Marios Trivyzas bd2abeef40
SQL: [TESTS] Improve error messages on failures (#47308)
When an integration test fails before the assertion of the results it's
missing information, like the file name and the line in the file where
the test resides.

(cherry picked from commit 683dc7213311d13c81e06829e08f3f9f80ebf73a)
2019-09-30 22:18:39 +03:00
emasab 87156ad93b
SQL: Fix issue with duplicate columns in SELECT (#42122)
Previously, if a column (field, scalar, alias) appeared more than once in the
SELECT list, the value was returned only once (1st appearance) in each row.

Fixes: #41811

(cherry picked from commit 097ea36581a751605fc4f2088319d954ce35b5d1)
2019-09-30 15:56:29 +03:00
Martijn van Groningen fe937ea4b8
Add config namespace in get policy api response (#47162)
Currently the policy config is placed directly in the json object
of the toplevel `policies` array field. For example:

```
{
    "policies": [
        {
            "match": {
                "name" : "my-policy",
                "indices" : ["users"],
                "match_field" : "email",
                "enrich_fields" : [
                    "first_name",
                    "last_name",
                    "city",
                    "zip",
                    "state"
                ]
            }
        }
    ]
}
```

This change adds a `config` field in each policy json object:

```
{
    "policies": [
        {
            "config": {
                "match": {
                    "name" : "my-policy",
                    "indices" : ["users"],
                    "match_field" : "email",
                    "enrich_fields" : [
                        "first_name",
                        "last_name",
                        "city",
                        "zip",
                        "state"
                    ]
                }
            }
        }
    ]
}
```

This allows us in the future to add other information about policies
in the get policy api response.

The UI will consume this API to build an overview of all policies.
The UI may in the future include additional information about a policy
and the plan is to include that in the get policy api, so that this
information can be gathered in a single api call.

An example of the information that is likely to be added is:
* Last policy execution time
* The status of a policy (executing, executed, unexecuted)
* Information about the last failure if exists
2019-09-30 14:37:23 +02:00
David Roberts 0807d409bf [ML] Reinstate ML daily maintenance actions (#47103)
A refactoring in 6.6 meant that the ML daily
maintenance actions have not been run at all
since then. This change installs the local
master listener that schedules the ML daily
maintenance, and also defends against some
subtle race conditions that could occur in the
future if a node flipped very quickly between
master and non-master.

Fixes #47003
2019-09-30 13:12:32 +01:00
Jason Tedor 2cba323b4e
Remove use of get raw in token/API key settings (#47260)
These settings were using get raw to fallback to whether or not SSL is
enabled. Yet, we have a formal mechanism for falling back to a
setting. This commit cuts over to that formal mechanism.
2019-09-30 06:35:58 -04:00
Yannick Welsch 9dc90e41fc Remove "force" version type (#47228)
It's been deprecated long ago and can be removed.

Relates to #20377

Closes #19769
2019-09-30 11:58:34 +02:00
Martijn van Groningen bb3e9cb908
fixed checkstyle violation 2019-09-30 08:42:51 +02:00
Martijn van Groningen 66f72bcdbc
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-30 08:12:28 +02:00
Martijn van Groningen 1c3d5b77b5
give monitoring more time 2019-09-30 08:04:29 +02:00
Yogesh Gaikwad 2be351c5d0
Use 'should' clause instead of 'filter' when querying native privileges (#47019) (#47271)
When we added support for wildcard application names, we started to build
the prefix query along with the term query but we used 'filter' clause
instead of 'should', so this would not fetch the correct application
privilege descriptor thereby failing the _has_privilege checks.
This commit changes the clause to use should and with minimum_should_match
as 1.
2019-09-30 14:14:52 +10:00
Rory Hunter 53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
Nhat Nguyen 444b47ce88 Relax maxSeqNoOfUpdates assertion in FollowingEngine (#47188)
We disable MSU optimization if the local checkpoint is smaller than
max_seq_no_of_updates. Hence, we need to relax the MSU assertion in
FollowingEngine for that scenario. Suppose the leader has three
operations: index-0, delete-1, and index-2 for the same doc Id. MSU on
the leader is 1 as index-2 is an append. If the follower applies index-0
then index-2, then the assertion is violated.

Closes #47137
2019-09-27 14:00:20 -04:00
James Rodewig b159305274
[DOCS] Add redirect for SLM API docs (#46838) (#46865) 2019-09-27 11:05:55 -04:00
Martijn van Groningen 7ffe2e7e63
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-27 14:42:11 +02:00
Marios Trivyzas 01623f9f1c
SQL: Add alias DATETRUNC to DATE_TRUNC function (#47173)
To be on the safe side in terms of use cases also add the alias
DATETRUNC to the DATE_TRUNC function.

Follows: #46473

(cherry picked from commit 9ac223cb1fc66486f86e218fa785a32b61e9bacc)
2019-09-27 15:38:51 +03:00
Andrei Dan 4c909438dd
Fix OriginationDate parsing tests. (#47170) (#47200)
Drop the usage of `SimpleDateFormat` and use the `DateFormatter` instead

(cherry picked from commit 7cf509a7a11ecf6c40c44c18e8f03b8e81fcd1c2)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-09-27 13:16:45 +01:00
Przemysław Witek 3fbd58d156
[7.x] Allow evaluation to consist of multiple steps. (#46653) (#47194) 2019-09-27 13:01:51 +02:00
Costin Leau b29a2cb360 SQL: Check case where the pivot limit is reached (#47121)
In some cases, the fetch size affects the way the groups are returned
causing the last page to go beyond the limit. Add dedicated check to
prevent extra data from being returned.

Fix #47002

(cherry picked from commit f4c29646f097bbd29855300342823ef4cef61c05)
2019-09-26 22:32:42 +03:00
Igor Motov ae202fda21 SQL: Add support for shape type (#46464)
Enables support for Cartesian geometries shape type. We still need to
decide how to handle the distance function since it is currently using
the haversine distance formula and returns results in meters, which
doesn't make any sense for Cartesian geometries.

Closes #46412
Relates to #43644
2019-09-26 09:47:42 -04:00
David Roberts 77cc6d5bad [TEST] Work around _cat/indices bug with security enabled (#47160)
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
2019-09-26 13:29:40 +01:00
Dimitris Athanasiou 0765bd4bf7
[7.x][ML] Ensure data frame analytics task is only marked completed once (#47119) (#47157)
Closes #46907
2019-09-26 15:26:06 +03:00
Tanguy Leroux 95e2ca741e
Remove unused private methods and fields (#47154)
This commit removes a bunch of unused private fields and unused
private methods from the code base.

Backport of (#47115)
2019-09-26 12:49:21 +02:00
Martijn van Groningen 8a4eefdd83
Expose enrich stats api to monitoring. (#46708)
This change also slightly modifies the stats response,
so that is can easier consumer by monitoring and other
users. (coordinators stats are now in a list instead of
a map and has an additional field for the node id)

Relates to #32789
2019-09-26 11:04:33 +02:00
Yogesh Gaikwad 9a64b7a888
[Backport] Validate `query` field when creating roles (#46275) (#47094)
In the current implementation, the validation of the role query
occurs at runtime when the query is being executed.

This commit adds validation for the role query when creating a role
but not for the template query as we do not have the runtime
information required for evaluating the template query (eg. authenticated user's
information). This is similar to the scripts that we
store but do not evaluate or parse if they are valid queries or not.

For validation, the query is evaluated (if not a template), parsed to build the
QueryBuilder and verify if the query type is allowed.

Closes #34252
2019-09-26 17:57:36 +10:00
Jim Ferenczi 04972baffa
Merge ShardSearchTransportRequest and ShardSearchLocalRequest (#46996) (#47081)
This change merges the `ShardSearchTransportRequest` and `ShardSearchLocalRequest`
into a single `ShardSearchRequest` that can be used to create a SearchContext.

Relates #46523
2019-09-26 09:20:53 +02:00
Benjamin Trent fcddaa90de
[7.x] [ML][Inference] adding tree model (#47044) (#47141)
* [ML][Inference] adding tree model (#47044)

* [ML][Inference] adding tree model

* renaming features for updated schema

* fixing 7.x compilation
2019-09-25 19:11:15 -04:00
Gordon Brown 7ac647c365
Add support for POST requests to SLM Execute API (#47061)
This commit adds support for POST requests to the SLM `_execute` API,
because POST is a more appropriate HTTP verb for this action as it is
not idempotent. The docs are also changed to favor POST over PUT,
although PUT is not removed or officially deprecated.
2019-09-25 16:15:10 -06:00
Andrei Dan 27520cac3b
ILM: parse origination date from index name (#46755) (#47124)
* ILM: parse origination date from index name (#46755)

Introduce the `index.lifecycle.parse_origination_date` setting that
indicates if the origination date should be parsed from the index name.
If set to true an index which doesn't match the expected format (namely
`indexName-{dateFormat}-optional_digits` will fail before being created.
The origination date will be parsed when initialising a lifecycle for an
index and it will be set as the `index.lifecycle.origination_date` for
that index.

A user set value for `index.lifecycle.origination_date` will always
override a possible parsable date from the index name.

(cherry picked from commit c363d27f0210733dad0c307d54fa224a92ddb569)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

* Drop usage of Map.of to be java 8 compliant
2019-09-25 21:44:16 +01:00
Lee Hinman a267df30fa Wait for snapshot completion in SLM snapshot invocation (#47051)
* Wait for snapshot completion in SLM snapshot invocation

This changes the snapshots internally invoked by SLM to wait for
completion. This allows us to capture more snapshotting failure
scenarios.

For example, previously a snapshot would be created and then registered
as a "success", however, the snapshot may have been aborted, or it may
have had a subset of its shards fail. These cases are now handled by
inspecting the response to the `CreateSnapshotRequest` and ensuring that
there are no failures. If any failures are present, the history store
now stores the action as a failure instead of a success.

Relates to #38461 and #43663
2019-09-25 14:25:22 -06:00
Gordon Brown a46eef9634
Change SLM stats format (#46991)
Using arrays of objects with embedded IDs is preferred for new APIs over
using entity IDs as JSON keys.  This commit changes the SLM stats API to
use the preferred format.
2019-09-25 11:32:08 -06:00
Yannick Welsch 9e17b78fee Mute second test in monitoring/bulk/10_basic
Relates #30101
2019-09-25 14:17:01 +02:00
Benjamin Trent 05fb7be571
[7.x] [ML][Inference] Feature pre-processing objects and functions (#46777) (#47040)
* [ML][Inference] Feature pre-processing objects and functions (#46777)

To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future. 

This PR lays down the ground work for 3 basic encodings:

* HotOne
* Target Mean
* Frequency

More feature encodings or pre-processings could be added in the future:

* Handling missing columns
* Standardization
* Label encoding
* etc....

* fixing compilation for namedxcontent tests
2019-09-25 08:16:24 -04:00
Yannick Welsch a4cecc54ab Mute monitoring/bulk/20_privileges
Relates #30101
2019-09-25 14:03:08 +02:00
Yannick Welsch eb86d71edd Mute MlJobIT.testDeleteJob
Relates #45652
2019-09-25 12:53:09 +02:00
Yannick Welsch 7a5b5af171 Mute MlJobIT.testDeleteJobAsync
Relates #45652
2019-09-25 12:53:05 +02:00
Ioannis Kakavas 23bceaadf8
Handle RelayState in preparing a SAMLAuthN Request (#46534) (#47092)
This change allows for the caller of the `saml/prepare` API to pass
a `relay_state` parameter that will then be part of the redirect
URL in the response as the `RelayState` query parameter.

The SAML IdP is required to reflect back the value of that relay
state when sending a SAML Response. The caller of the APIs can
then, when receiving the SAML Response, read and consume the value
as it see fit.
2019-09-25 13:23:46 +03:00
Yogesh Gaikwad 6f453aa6b2
Validate index and cluster privilege names when creating a role (#46361) (#47063)
This commit adds validation so a role cannot be created with
invalid index or cluster privilege name.

Closes #29703
2019-09-25 18:57:11 +10:00
Yannick Welsch 056ac32738 Mute JdbcCsvSpecIT.testAverageWithOneValueAndLimit
Relates to #47080
2019-09-25 10:36:53 +02:00
Christoph Büscher 0c187e0a10
Add migration tool checks for `_field_names` disabling (#46972)
This change adds a check to the migration tool that warns about the deprecated
`enabled` setting for the `_field_names` field on 7.x indices and issues a
warning for templates containing this setting, which has been removed
with 8.0.

Relates to #42854, #46681
2019-09-25 10:15:10 +02:00
Hendrik Muhs 7377ac4637 [Transform] Replace transforms with transform, index constants (#47023)
- replace "transforms" with "transform" for consistency
 - use constants for internal index naming wherever possible and document required changes
2019-09-25 08:31:43 +02:00
Hendrik Muhs e974f178b5 [Transform] rename data frame transform to transform for hlrc client (#46933)
rename data frame transform to transform for hlrc
2019-09-25 08:31:43 +02:00
Benjamin Trent 00c1c0132b
[ML] fix two datafeed flush lockup bugs (#46982) (#47024)
* [ML] fix two flush lockup bugs

* Addressing PR comments

* moving debug logging line so it is only written on success
2019-09-24 13:03:20 -04:00
James Baiera 9967aff714 Add notice to Enrich index mapping metadata (#45996) 2019-09-24 12:55:11 -04:00
Albert Zaharovits 3a82e0f7f4
Do not rewrite aliases on remove-index from aliases requests (#46989) (#47018)
When we rewrite alias requests, after filtering down to only those that
the user is authorized to see, it can be that there are no aliases
remaining in the request. However, core Elasticsearch interprets this as
_all so the user would see more than they are authorized for. To address
this, we previously rewrote all such requests to have aliases `"*"`,
`"-*"`, which would be interpreted when aliases are resolved as
nome. Yet, this is only needed for get aliases requests and we were
applying it to all alias requests, including remove index requests. If
such a request was sent to a coordinating node that is not the master
node, the request would be rewritten to include `"*"` and `"-*"`, and
then the master would authorize the user for these. If the user had
limited permissions, the request would fail, even if they were
authorized on the index that the remove index action was over. This
commit addresses this by rewriting for get aliases and remove
aliases request types but not for the remove index.

Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Co-authored-by: Tim Vernum <tim@adjective.org>
2019-09-24 19:07:55 +03:00
Dimitris Athanasiou 64bf1b56fe
[7.x] SQL: Mute pivot testAverageWithOneValueAndOrder and testSumWithoutSubquery (#47030) (#47033)
Relates #47002
2019-09-24 19:04:52 +03:00
Ioannis Kakavas 98e6bb4d01
Workaround JDK-8213202 in SSLClientAuthTests (#46995)
This change works around JDK-8213202, which is a bug related to TLSv1.3
session resumption before JDK 11.0.3 that occurs when there are
multiple concurrent sessions being established. Nodes connecting to
each other will trigger this bug when client authentication is
disabled, which is the case for SSLClientAuthTests.

Backport of #46680
2019-09-24 12:47:56 +03:00
Lee Hinman 5ca37db60c Mute SLMSnapshotBlockingIntegTests.testRetentionWhileSnapshotInProgress
Relates to #46508
2019-09-23 17:08:09 -06:00
James Baiera a349b22273 Add the cluster version to enrich policies (#45021)
Adds the Elasticsearch version as a field on the EnrichPolicy object
2019-09-23 18:44:45 -04:00
Julie Tibshirani 9124c94a6c
Add support for aliases in queries on _index. (#46944)
Previously, queries on the _index field were not able to specify index aliases.
This was a regression in functionality compared to the 'indices' query that was
deprecated and removed in 6.0.

Now queries on _index can specify an alias, which is resolved to the concrete
index names when we check whether an index matches. To match a remote shard
target, the pattern needs to be of the form 'cluster:index' to match the
fully-qualified index name. Index aliases can be specified in the following query
types: term, terms, prefix, and wildcard.
2019-09-23 13:21:37 -07:00
Jim Ferenczi 08f28e642b Replace SearchContext with QueryShardContext in query builder tests (#46978)
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates #46523
2019-09-23 20:24:02 +02:00
Costin Leau a610503783 SQL: Add PIVOT support (#46489)
Add initial PIVOT support for transforming a regular table into a
statistics table around an arbitrary pivoting column:

SELECT * FROM
 (SELECT languages, country, salary, FROM mp)
 PIVOT (AVG(salary) FOR countries IN ('NL', 'DE', 'ES', 'RO', 'US'))

In the current implementation PIVOT allows only one aggregation however
this restriction is likely to be lifted in the future.
Also not all aggregations are working, in particular MatrixStats are not yet supported.

(cherry picked from commit d91263746a222915c570d4a662ec48c1d6b4f583)
2019-09-23 21:04:13 +03:00
Martijn van Groningen 33bbc4798b
fixed compile errors after merging 2019-09-23 09:46:14 +02:00
Martijn van Groningen 0cfddca61d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-23 09:46:05 +02:00
Lisa Cawley 875d864be6
[DOCS] Update data frame transform URLs (#46940) (#46946) 2019-09-20 15:57:43 -07:00
Michael Basnight f1c7ed647b Allow comma separated ids in get enrich policy API (#46351)
This commit changes the GET REST api so it will accept an optional comma
separated list of enrich policy ids. This change also modifies the
behavior of the GET API in that it will not error if it is passed a bad
enrich id anymore, but will instead just return an empty list.
2019-09-20 10:06:58 -05:00
Hendrik Muhs 4a2cb05162 add message about transform disabled if license is missing (#46901)
adds a message for transform about what happens if no license has been activated
2019-09-20 13:47:40 +02:00
Hendrik Muhs abe889af75
[7.5][Transform] rename classes in transform plugin (#46867)
rename classes and settings in transform plugin, provide BWC for old settings
2019-09-20 10:43:00 +02:00
Jason Tedor bd77626177
Add the ability to require an ingest pipeline (#46847)
This commit adds the ability to require an ingest pipeline on an
index. Today we can have a default pipeline, but that could be
overridden by a request pipeline parameter. This commit introduces a new
index setting index.required_pipeline that acts similarly to
index.default_pipeline, except that it can not be overridden by a
request pipeline parameter. Additionally, a default pipeline and a
request pipeline can not both be set. The required pipeline can be set
to _none to ensure that no pipeline ever runs for index requests on that
index.
2019-09-19 16:37:45 -04:00
Yannick Welsch 9638ca20b0 Allow dropping documents with auto-generated ID (#46773)
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
2019-09-19 16:46:33 +02:00
Armin Braun 6b09c2cdbb
Limit Netty Workers in NativeRealmIntegTestCase (#46816) (#46850)
The fact that this test randomly uses a relatively large number
of nodes and hence Netty worker threads created a problem with
running out of direct memory on CI.
Tests run with 512M heap (and hence 512M direct memory) by default.
On a CI worker with 16 cores, this means Netty will by default set
up 32 transport workers. If we get unlucky and a lot of them
actually do work (and thus instantiate a `CopyBytesSocketChannel`
which costs 1M per thread for the thread-local IO buffer) we
would run out of memory.

This specific failure was only seen with `NativeRealmIntegTests` so I
only added the constraint on the Netty worker count here.
We can add it to other tests (or `SecurityIntegTestCase`) if need be
but for now it doesn't seem necessary so I opted for least impact.

Closes #46803
2019-09-19 13:07:42 +02:00
Dimitris Athanasiou 02a5e153dc
[7.x][ML] Parse and index data frame analytics state (#46804) (#46820)
This commit reuses the same state processor that is used for autodetect
to parse state output from data frame analytics jobs. We then index the
state document into the state index.

Backport of #46804
2019-09-18 20:37:40 +03:00
Benjamin Trent 9cf9c64ec2
[7.x] [ML][Transforms] remove `force` flag from _start (#46414) (#46748)
* [ML][Transforms] remove `force` flag from _start (#46414)

* [ML][Transforms] remove `force` flag from _start

* fixing expected error message

* adjusting bwc version
2019-09-18 10:06:05 -04:00
Dimitris Athanasiou cebe8da617
[7.x][ML] MlMemoryTracker should ignore analytics tasks without config (#46789) (#46811)
It is possible for a running analytics job that its config is removed
from the '.ml-config' index (perhaps the user deleted the entire index,
etc.). In that case the task remains without a matching config. I have
raised #46781 to discuss how to deal with this issue.

This commit focuses on `MlMemoryTracker` and changes it so that when
we get the configs for the running tasks we leniently ignore missing ones.
This at least means memory tracking will keep working for other jobs
if one or more are missing.

In addition, this commit makes the cleanup code for native analytics
tests more robust by explicitly stopping all jobs and force-stopping
if an error occurs. This helps so that a single failing test does
not cause other tests fail due to pending tasks.

Backport of #46789
2019-09-18 16:35:25 +03:00
Alpar Torok f3e67bdd17 Add resolution rule to allow resolving all deps (#46768)
Since the `resolveAllDependencies` task resolves all the congfigurations
it can find, this was not caught by our testing, but it's required to be
configuraed specifically.

We should probably cut-over to the new configurations at some point to
avoid problems like this.

Closes elastic/infra#14580
2019-09-18 11:09:43 +03:00
Lee Hinman b85468d6ea
Add node setting for disabling SLM (#46794) (#46796)
This adds the `xpack.slm.enabled` setting to allow disabling of SLM
functionality as well as its HTTP API endpoints.

Relates to #38461
2019-09-17 17:39:41 -06:00
Oliver Gupte cbd58d3b78
Give kibana user privileges to create APM agent config index (#46765) (#46792)
* Give kibana user reserved role privileges on .apm-* to create APM agent configuration index.

* fixed test to include checking all .apm-* permissions

* changed pattern from ".apm-*" to the more specific ".apm-agent-configuration"
2019-09-17 15:01:42 -07:00
Costin Leau 92e518e789 SQL: Properly handle indices with no/empty mapping (#46775)
When encountering only indices with empty mapping, the IndexResolver
throws an exception as it expects to find at least one entry.
This commit fixes this case so that an empty mapping is returned.

Fix #46757

(cherry picked from commit 5f4f5807acb93b5fab36718c092c328977a396b6)
2019-09-17 16:01:22 +03:00
Armin Braun b0f09b279f
Make Snapshot Logic Write Metadata after Segments (#45689) (#46764)
* Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581
* Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons
   * Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization
* Fixes #41581
2019-09-17 13:09:39 +02:00
Tomas Della Vedova e1cf103980
Fixes for API specification (#46522) (#46736)
Follow-up of #42346
2019-09-17 11:49:24 +02:00
Costin Leau 683b5fdeca SQL: Support queries with HAVING over SELECT (#46709)
Handle queries with implicit GROUP BY where the aggregation is not in
the projection/SELECT but inside the filter/HAVING such as:

SELECT 1 FROM x HAVING COUNT(*) > 0

The engine now properly identifies the case and handles it accordingly.

Fix #37051

(cherry picked from commit fa53ca05d8219c27079b50b4a5b7aeb220c7cde2)
2019-09-17 11:14:39 +03:00
Costin Leau 90f4c2379b SQL: improve ResultSet behavior when no rows are available (#46753)
Improve the defensive behavior of ResultSet when dealing with incorrect
API usage. In particular handle the case of dealing with no row
available (either because the cursor is before the first entry or after
the last).

Fix #46750

(cherry picked from commit 58fa38e4606625962e879265d35eacb0960c6cdb)
2019-09-17 11:14:38 +03:00
Przemysław Witek e49be611ad
[7.x] Add audit messages for Data Frame Analytics (#46521) (#46738) 2019-09-16 21:21:38 +02:00
Benjamin Trent 92acc732de
[ML][Transform] Use field caps for mapping deductino (#46703) (#46742) 2019-09-16 10:05:55 -04:00
Andrei Stefan 40e9353947 SQL: use the correct data type for types conversion (#46574)
(cherry picked from commit 3e25db2f302c3aafe27e4d8d4fb1743401d85e6d)
2019-09-16 15:36:17 +03:00
Hendrik Muhs c8f52ec4ff
[Transform] Rename data frame plugin to transform: classes in xpack.core (#46644) (#46734)
rename classes in xpack.core of transform plugin from "data frame transform" to "transform"
2019-09-16 13:39:22 +02:00
Andrei Dan c57cca98b2
[ILM] Add date setting to calculate index age (#46561) (#46697)
* [ILM] Add date setting to calculate index age

Add the `index.lifecycle.origination_date` to allow users to configure a
custom date that'll be used to calculate the index age for the phase
transmissions (as opposed to the default index creation date).

This could be useful for users to create an index with an "older"
origination date when indexing old data.

Relates to #42449.

* [ILM] Don't override creation date on policy init

The initial approach we took was to override the lifecycle creation date
if the `index.lifecycle.origination_date` setting was set. This had the
disadvantage of the user not being able to update the `origination_date`
anymore once set.

This commit changes the way we makes use of the
`index.lifecycle.origination_date` setting by checking its value when
we calculate the index age (ie. at "read time") and, in case it's not
set, default to the index creation date.

* Make origination date setting index scope dynamic

* Document orignation date setting in ilm settings

(cherry picked from commit d5bd2bb77ee28c1978ab6679f941d7c02e389d32)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-09-16 08:50:28 +01:00
Dimitris Athanasiou 63eb0d9081
[7.x][ML] Avoid marking data frame analytics task completed twice (#46721) (#46724)
When the stop API is called while the task is running there is
a chance the task gets marked completed twice. This may cause
undesired side effects, like indexing the progress document a second
time after the stop API has returned (the cause for #46705).

This commit adds a check that the task has not been completed before
proceeding to mark it so. In addition, when we update the task's state
we could get some warnings that the task was missing if the stop API
has been called in the meantime. We now check the errors are
`ResourceNotFoundException` and ignore them if so.

Closes #46705

Backports #46721
2019-09-15 17:25:26 +03:00
Nhat Nguyen cabff5a7cd Handle lower retaining seqno retention lease error (#46420)
We renew the CCR retention lease at a fixed interval, therefore it's
possible to have more than one in-flight renewal requests at the same
time. If requests arrive out of order, then the assertion is violated.

Closes #46416
Closes #46013
2019-09-13 08:50:19 -04:00
Dimitris Athanasiou 0bc8acaf5b
[7.x][ML] Create state index and alias before starting an analytics job (#46602) (#46648)
This is fixing a bug where if an analytics job is started before any
anomaly detection job is opened, we create an index after the state
write alias.

Instead, we should create the state index and alias before starting
an analytics job and this commit makes sure this is the case.

Backport of #46602
2019-09-13 10:34:12 +03:00
Luca Cavanna e57756492a Update http-core and http-client dependencies (#46549)
Relates to #45808
Closes #45577
2019-09-12 09:45:29 +02:00
Zachary Tong 6dc8ed5d57
[7.x Backport] Refactor AllocatedPersistentTask#init(), move rollup ctor logic (#46406)
This makes the AllocatedPersistentTask#init() method protected so that
implementing classes can perform their initialization logic there,
instead of the constructor.  Rollup's task is adjusted to use this
init method.

It also slightly refactors the methods to se a static logger in the
AllocatedTask instead of passing it in via an argument.  This is
simpler, logged messages come from the task instead of the
service, and is easier for tests
2019-09-11 17:00:28 -04:00
James Rodewig f9bf10f2b6
[DOCS] Change "a SSL" to "an SSL" in the Java docs (#46524) (#46618) 2019-09-11 15:55:57 -04:00
Marios Trivyzas d956509394 SQL: Implement DATE_TRUNC function (#46473)
DATE_TRUNC(<truncate field>, <date/datetime>) is a function that allows
the user to truncate a timestamp to the specified field by zeroing out
the rest of the fields. The function is implemented according to the
spec from PostgreSQL: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC

Closes: #46319
(cherry picked from commit b37e96712db1aace09f17b574eb02ff6b942a297)
2019-09-11 21:41:02 +03:00
Ryan Ernst 86290cb3d9 Make reuse of sql test code explicit (#45884)
The sql project uses a common set of security tests, which are run in
subprojects. Currently these are shared through a shared directory, but
this is not setup correctly to ensure it is built before tests run. This
commit changes the test classes to be an artifact of the sql/qa/security
project and makes the test runner use the built artifact (a directory of
classes) for tests.

closes #45866
2019-09-11 10:56:07 -07:00
Lee Hinman 09a9cefaa0 Handle partial failure retrieving segments in SegmentCountStep (#46556)
Since the `IndicesSegmentsRequest` scatters to all shards for the index,
it's possible that some of the shards may fail. This adds failure
handling and logging (since this is a best-effort step in the first
place) for this case.
2019-09-11 10:29:31 -06:00
Marios Trivyzas 0963e78164 SQL: Fix issue with common type resolution (#46565)
Many scalar functions try to find out the common type between their
arguments in order to set it as their return time, e.g.:
for `float + double` the common type which is set as the return type
of the + operation is `double`.

Previously, for data types TEXT and KEYWORD (string data types) there
was no common data type found and null was returned causing NPEs when
the function was trying to resolve the return data type.

Fixes: #46551
(cherry picked from commit 291017d69dfc810707c3c7c692f5a50af431b790)
2019-09-11 19:10:15 +03:00
Lee Hinman 52d7b03b49 Wait for no snapshots in state in testRetentionWhileSnapshotIn… (#46573)
This commit adds a wait/check for all running snapshots to be cleared
before taking another snapshot. The previous snapshot was successful but
had not yet been cleared from the cluster state, so the second snapshot
failed due to a `ConcurrentSnapshotException`.

Resolves #46508
2019-09-11 09:47:01 -06:00
David Roberts 461de5b58e [TEST] Remove incorrect data frame analytics state assertion (#46597)
After starting the analytics job and checking its state
the state can be any of "started", "reindexing" or
"analyzing" depending on how quickly the work is done.
2019-09-11 16:33:14 +01:00
David Roberts 07a0140260
[ML-DataFrame] Ensure latest index template exists before indexing docs (#46595)
When upgrading data nodes to a newer version before
master nodes there was a risk that a transform running
on an upgraded data node would index a document into
the new transforms internal index before its index
template was created.  This would cause the index to
be created with entirely dynamic mappings.

This change introduces a check before indexing any
internal transforms document to ensure that the required
index template exists and create it if it doesn't.

Backport of #46553
2019-09-11 16:27:26 +01:00
Jim Ferenczi 23bf310c84 Replace the SearchContext with QueryShardContext when building aggregator factories (#46527)
This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates #46523
2019-09-11 16:43:30 +02:00
Hendrik Muhs efea581dcc
[7.x][Transform]Rename data frame plugin to transform: plugin and package names (#46583)
rename data frame transform plugin to transform:

 - rename plugin data-frame to transform
 - change all package names from o.e.*.dataframe.* to o.e.*.transform.*
 - necessary changes to fix loading/testing
2019-09-11 14:50:08 +02:00
Armin Braun 41633cb9b5
More Efficient Ordering of Shard Upload Execution (#42791) (#46588)
* More Efficient Ordering of Shard Upload Execution (#42791)
* Change the upload order of of snapshots to work file by file in parallel on the snapshot pool instead of merely shard-by-shard
* Inspired by #39657
* Cleanup BlobStoreRepository Abort and Failure Handling (#46208)
2019-09-11 13:59:20 +02:00
Martijn van Groningen a4b0f66919
Add enrich stats api (#46462)
The enrich api returns enrich coordinator stats and
information about currently executing enrich policies.

The coordinator stats include per ingest node:
* The current number of search requests in the queue.
* The total number of outstanding remote requests that
  have been executed since node startup. Each remote
  request is likely to include multiple search requests.
  This depends on how much search requests are in the
  queue at the time when the remote request is performed.
* The number of current outstanding remote requests.
* The total number of search requests that `enrich`
  processors have executed since node startup.

The current execution policies stats include:
* The name of policy that is executing
* A full blow task info object that is executing the policy.

Relates to #32789
2019-09-11 13:40:24 +02:00
Jim Ferenczi 425b1a77e8 Add more context to QueryShardContext (#46584)
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
2019-09-11 12:24:51 +02:00
Dimitris Athanasiou 579af626f5
[7.x][ML] No error when datafeed stops during updating to started (#46495) (#46542)
Investigating the test failure reported in #45518 it appears that
the datafeed task was not found during a tast state update. There
are only two places where such an update is performed: when we set
the state to `started` and when we set it to `stopping`. We handle
`ResourceNotFoundException` in the latter but not in the former.

Thus the test reveals a rare race condition where the datafeed gets
requested to stop before we managed to update its state to `started`.
I could not reproduce this scenario but it would be my best guess.

This commit catches `ResourceNotFoundException` while updating the
state to `started` and lets the task terminate smoothly.

Closes #45518

Backport of #46495
2019-09-11 13:18:42 +03:00
Przemysław Witek e38e631dac
[7.x] Implement DataFrameAnalyticsAuditMessage and DataFrameAnalyticsAuditor (#45967) (#46519) 2019-09-11 12:17:26 +02:00
Ioannis Kakavas 35810bd2ae
Enforce realm name uniqueness (#46580)
We depend on file realms being unique in a number of places. Pre
7.0 this was enforced by the fact that the multiple realm types
with different name would mean identical configuration keys and
cause configuration parsing errors. Since we intoduced affix
settings for realms this is not the case any more as the realm type
is part of the configuration key.
This change adds a check when building realms which will explicitly
fail if multiple realms are defined with the same name.

Backport of #46253
2019-09-11 13:13:59 +03:00
Martijn van Groningen c79a8e448d
Convert enrich qa modules to use testclusters. 2019-09-11 11:40:18 +02:00
Martijn van Groningen 8a48ef2a06
fixed typo 2019-09-11 09:52:25 +02:00
Martijn van Groningen ef33a99e6e
Disable default features that are not needed for enrich indices. (#46525)
Relates to #32789
2019-09-11 09:20:38 +02:00
Tim Vernum 80064652f8
Fallback to realm authc if ApiKey fails (#46552)
This changes API-Key authentication to always fallback to the realm
chain if the API key is not valid. The previous behaviour was
inconsistent and would terminate on some failures, but continue to the
realm chain for others.

Backport of: #46538
2019-09-11 14:33:17 +10:00
Gordon Brown 7a2878b29b
Fix class used to initialize logger in Watcher (#46467)
This class has been using a logger configured for a different class for
quite a while. While the circumstance in which it logs is rare, it
should still use the correct logger.
2019-09-10 12:41:36 -06:00
Lee Hinman cdc3a260af
Add retention to Snapshot Lifecycle Management (backport of #4… (#46506)
* Add retention to Snapshot Lifecycle Management (#46407)

This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown

* Update serialization versions

* Fix type inference

* Use non-Cancellable HLRC return value

* Fix Client mocking in test

* Fix SLMSnapshotBlockingIntegTests for 7.x branch

* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval

* Add serialization guards for SnapshotLifecyclePolicy
2019-09-10 09:08:09 -06:00
Michael Basnight 9304f5c889 Ensure enrich executes on master node only (#46448)
The previous transport action was a read action, which under the right
set of circumstances can execute on a coordinating node. This commit
ensures that cannot happen.
2019-09-10 09:59:36 -05:00
Ioannis Kakavas 690164d0be Change EmailSslTest for FIPS 140 JVMs (#46278)
This commit changes the SSLContext for the email server we use in
the tests so that it loads its key material from an in memory
keystore (that is in turn built from a pair of PEM encoded private key
and certificate) instead of a PKCS#12 one. This is done so that when 
we run our tests in FIPS 140-2 JVMs, the keystore is of a type that the
Security Provider actually supports.

This also mutes testCanSendMessageToSmtpServerByDisablingVerification
as we can't run tests with verification set to `none` in FIPS 140
JVMs.
2019-09-10 14:39:40 +03:00
Alpar Torok b40ac6dee7 mute on 7.x fo windows
Tracking #44942
2019-09-10 12:34:16 +03:00
Przemysław Witek e21deae535
Disallow persisting any documents when datafeed is isolated (#46485) (#46490) 2019-09-09 21:01:27 +02:00
Martijn van Groningen c057fce978
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-09 08:40:54 +02:00
Andrei Stefan 7b26a8c041 Use `null` schema response for `SYS TABLES` command. (#46386)
(cherry picked from commit a6152f42a47a1ccd668e5892778c8bd2d3a78c4c)
2019-09-07 09:24:54 +03:00
Andrei Stefan 7cf100ba07 SQL: fix scripting for grouped by datetime functions (#46421)
* Fix issue with painless scripting not being correctly generated when
datetime functions are used for GROUPing of an INTERVAL operation.

(cherry picked from commit cb92828e8ec9d9d241bd6189e5835fd99f8b9a44)
2019-09-07 09:24:53 +03:00
David Roberts 7c7fb7e32d [ML] Tolerate total_search_time_ms not mapped in get datafeed stats (#46432)
ML users who upgrade from versions prior to 7.4 to 7.4 or later
will have ML results indices that do not have mappings for the
total_search_time_ms field.  Therefore, when searching these
indices we must tolerate this field not having a mapping.

Fixes #46437
2019-09-06 14:31:15 +01:00
Hendrik Muhs c2194aa7e1 [Transform] simplify class structure of indexer (#46306)
simplify transform task and indexer

 - remove redundant transform id
 - moving client data frame indexer (and builder) into a separate file
2019-09-06 15:24:26 +02:00
Dimitris Athanasiou a6834068e3
[7.x][ML] Extract DataFrameAnalyticsTask into its own class (#46402) (#46426)
This refactors `DataFrameAnalyticsTask` into its own class.
The task has quite a lot of functionality now and I believe it would
make code more readable to have it live as its own class rather than
an inner class of the start action class.

Backport of #46402
2019-09-06 14:13:46 +03:00
Hendrik Muhs 42ada88c5d cleanup static member 2019-09-06 08:55:57 +02:00
Hendrik Muhs ab5f09d29b [ML-DataFrame] improve error message for timeout case in stop (#46131)
improve error message if stopping of transform times out.

related #45610
2019-09-06 08:36:01 +02:00
Hendrik Muhs 78824cce81 reuse mock client to avoid probles with thread context closed errors (#46398) 2019-09-06 08:17:25 +02:00
Benjamin Trent 457ff3e2fb
7.x/ml fix instance serialization bwc (#46404)
* [ML] Fixing instance serialization version for bwc

* fixing CppLogMessage
2019-09-05 13:23:26 -05:00
Benjamin Trent caf3e4d654
[7.x] [ML][Transforms] fixing rolling upgrade continuous transform test (#45823) (#46347) (#46337)
* [ML][Transforms] fixing rolling upgrade continuous transform test (#45823)

* [ML][Transforms] fixing rolling upgrade continuous transform test

* adjusting wait assert logic

* adjusting wait conditions

* [ML][Transforms] allow executor to call start on started task (#46347)

* making sure we only upgrade from 7.4.0 in test
2019-09-05 10:31:11 -05:00
Benjamin Trent 5201386232
[ML] testFullClusterRestart waiting for stable cluster (#46280) (#46335)
* [ML] waiting for ml indices before waiting task assignment testFullClusterRestart

* waiting for a stable cluster after fullrestart

* removing unused imports
2019-09-05 06:57:33 -05:00
Yogesh Gaikwad d5acb15a71
[Backport] Initialize document subset bit set cache used for DLS (#46211) (#46359)
This commit initializes DocumentSubsetBitsetCache even if DLS
is disabled. Previously it would throw null pointer when querying
usage stats if we explicitly disabled DLS as there would be no instance of DocumentSubsetBitsetCache to query. It is okay to initialize
DocumentSubsetBitsetCache which will be empty as the license enforcement
would prevent usage of DLS feature and it will not fail when accessing usage stats.

Closes #45147
2019-09-05 14:34:19 +10:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors and while computing the vector function. (#46103) 
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Martijn van Groningen ded98e50b7
Change exact match processor to match processor. (#46041)
Besides a rename, this changes allows to processor to attach multiple
enrich docs to the document being ingested.

Also in order to control the maximum number of enrich docs to be
included in the document being ingested, the `max_matches` setting
is added to the enrich processor.

Relates #32789
2019-09-04 18:05:12 +02:00
Martijn van Groningen 6bec63fdfa
removed redundant cast 2019-09-04 11:18:31 +02:00
Andrey Ershov ece9eb4acd Remove stack trace logging in Security(Transport|Http)ExceptionHandler (#45966)
As per #45852 comment we no longer need to log stack-traces in
SecurityTransportExceptionHandler and SecurityHttpExceptionHandler even
if trace logging is enabled.

(cherry picked from commit c99224a32d26db985053b7b36e2049036e438f97)
2019-09-04 11:50:35 +03:00
Dimitris Athanasiou 8fca5b5204
[7.x][ML] Unmute testStopOutlierDetectionWithEnoughDocumentsToScroll (#46271) (#46282)
The test seems to have been failing due to a race condition between
stopping the task and refreshing the destination index. In particular,
we were going forward with refreshing the destination index even
though the task stopped in the meantime. This was fixed in
request.

Closes #43960

Backport of #46271
2019-09-04 10:57:01 +03:00
Marios Trivyzas fd0affb503
SQL: Fix issue with IIF function when condition folds (#46290)
Previously, when the condition (1st argument) of the IIF function could
be evaluated (folded) to false, the `IfConditional` was eliminated which
caused `IndexOutOfBoundsException` to be thrown when `info()` and
`resolveType()` methods where called.

Fixes: #46268

(cherry picked from commit 9a885a3ac47bc8f52c07770d1d8d670ce0af1e59)
2019-09-04 10:32:49 +03:00
Benjamin Trent 8a4ff7d57d
[ML][Transforms] protecting doSaveState with optimistic concurrency (#46156) (#46281)
* [ML][Transforms] protecting doSaveState with optimistic concurrency

* task code cleanup
2019-09-03 15:27:24 -05:00
Benjamin Trent c3c1dcd2ac
[ML][Transforms] fixing listener being called twice (#46284) (#46292) 2019-09-03 15:26:56 -05:00
Benjamin Trent 53df54c703
[ML][Transforms] fixing stop on changes check bug (#46162) (#46273)
* [ML][Transforms] fixing stop on changes check bug

* Adding new method finishAndCheckState to cover race conditions in early terminations

* changing stopping conditions in `onStart`

* allow indexer to finish when exiting early
2019-09-03 11:04:18 -05:00
Lee Hinman 3d4b8e01c7
Validate SLM policy ids strictly (#45998) (#46145)
This uses strict validation for SLM policy ids, similar to what we use
for index names.

Resolves #45997
2019-09-03 09:20:02 -06:00
David Roberts ab045744ac [ML-DataFrame] Fix off-by-one error in checkpoint operations_behind (#46235)
Fixes a problem where operations_behind would be one less than
expected per shard in a new index matched by the data frame
transform source pattern.

For example, if a data frame transform had a source of foo*
and a new index foo-new was created with 2 shards and 7 documents
indexed in it then operations_behind would be 5 prior to this
change.

The problem was that an empty index has a global checkpoint
number of -1 and the sequence number of the first document that
is indexed into an index is 0, not 1.  This doesn't matter for
indices included in both the last and next checkpoints, as the
off-by-one errors cancelled, but for a new index it affected
the observed result.
2019-09-03 12:45:02 +01:00
Ignacio Vera 59c474e675
reset queryGeometry in ShapeQueryTests (#45974) (#46251) 2019-09-03 10:24:46 +02:00
markharwood a9e3871fc7
Test fix for PinnedQueryBuilderIT (#46187) (#46227)
Fix test issue to stabilise scoring through use of DFS search mode.
Randomised index-then-delete docs introduced by the test framework likely caused an imbalance in IDF scores across shards. Also made number of shards used in test a random number for added test coverage.

Closes #46174
2019-09-02 13:31:22 +01:00
Marios Trivyzas 3bee647e5b
SQL: Fix issue with DataType for CASE with NULL (#46173)
Previously, if the DataType of all the WHEN conditions of a CASE
statement is NULL, then it was set to NULL even if the ELSE clause
has a non-NULL data type, e.g.:
```
CASE WHEN a = 1 THEN NULL
           WHEN a = 5 THEN NULL
ELSE 'foo'
```

Fixes: #46032
(cherry picked from commit 8c1012efbbd3a300afd0dfb9b18250f15ea753f9)
2019-09-02 11:17:24 +03:00
Martijn van Groningen 555b630160
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-02 09:16:55 +02:00
Benjamin Trent d0c5573a51
[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044) (#46096)
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.

This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
2019-08-30 09:27:07 -05:00
Alexander Reelsen 98c32c7846 Fix wrong URL encoding in watcher HTTP client (#45894)
The test assumption was calling the wrong method resulting in a URL
encoding before returning the data.

Closes #44970
2019-08-30 14:02:49 +02:00
Dimitris Athanasiou 5921ae53d8
[7.x][ML] Regression dependent variable must be numeric (#46072) (#46136)
* [ML] Regression dependent variable must be numeric

This adds a validation that the dependent variable of a regression
analysis must be numeric.

* Address review comments and fix some problems

In addition to addressing the review comments, this
commit fixes a few issues I found during testing.

In particular:

- if there were mappings for required fields but they were
not included we were not reporting the error
- if explicitly included fields had unsupported types we were
not reporting the error

Unfortunately, I couldn't get those fixed without refactoring
the code in `ExtractedFieldsDetector`.
2019-08-30 09:57:43 +03:00
Zachary Tong cf8a4171e1
Rename `data-science` plugin to `analytics` (#46133)
Rename `data-science` plugin to `analytics`. 
Also removes enabled flag. Backport of #46092
2019-08-29 12:45:39 -04:00
Simon Willnauer 9b2ea07b17
Flush engine after big merge (#46066) (#46111)
Today we might carry on a big merge uncommitted and therefore
occupy a significant amount of diskspace for quite a long time
if for instance indexing load goes down and we are not quickly
reaching the translog size threshold. This change will cause a
flush if we hit a significant merge (512MB by default) which
frees diskspace sooner.
2019-08-29 17:54:15 +02:00
Michael Basnight 51a703da29
Add enrich transport client support (#46002)
This commit adds an enrich client, as well as a smoke test to validate
the client works.
2019-08-29 09:10:07 -05:00
Nhat Nguyen 028e792e1d Remove already exist assertion while renew ccr lease (#46009)
If a CCR lease is disappeared while we are renewing it, then we will
issue asyncAddRetentionLease to add that lease. And if
asyncAddRetentionLease takes longer than retentionLeaseRenewInterval,
then we can issue another asyncAddRetentionLease request. One of
asyncAddRetentionLease requests will fail with
RetentionLeaseAlreadyExistsException, hence trip the assertion.

Closes #45192
2019-08-29 09:44:40 -04:00
Przemysław Witek b8a0379057
Refactor auditor-related classes (#45893) (#46120) 2019-08-29 14:21:03 +02:00
Przemysław Witek fbe9e8a530
Do not throw an exception if the process finished quickly but without any error. (#46073) (#46113) 2019-08-29 10:47:17 +02:00
Gordon Brown 47bbd9d9a9
[7.x] Fix rollover alias in SLM history index template (#46001)
This commit adds the `rollover_alias` setting required for ILM to work
correctly to the SLM history index template and adds assertions to the
SLM integration tests to ensure that it works correctly.
2019-08-28 14:50:22 -07:00
Tal Levy a356bcff41
Add Circle Processor (#43851) (#46097)
add circle-processor that translates circles to polygons
2019-08-28 14:44:08 -07:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
2019-08-28 11:03:14 -07:00
Mark Tozzi 9ac85a4a2b Fix compilation in CumulativeCardinalityAggregatorTests 2019-08-28 09:31:48 -04:00
Dimitris Athanasiou 25d64508f6
[7.x][ML] Support boolean fields for DF analytics (#46037) (#46054)
This commit adds support for `boolean` fields in data frame
analytics (and currently both outlier detection and regression).
The analytics process expects `boolean` fields to be encoded as
integers with 0 or 1 value.
2019-08-28 12:02:29 +03:00
Martijn van Groningen 1157224a6b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-28 10:14:07 +02:00
Jake Landis 154d1dd962
Watcher max_iterations with foreach action execution (#45715) (#46039)
Prior to this commit the foreach action execution had a hard coded 
limit to 100 iterations. This commit allows the max number of 
iterations to be a configuration ('max_iterations') on the foreach 
action. The default remains 100.
2019-08-27 16:57:20 -05:00
Armin Braun fdef293c81 Fix RegressionTests#fromXContent (#46029)
* The `trainingPercent` must be between `1` and `100`, not `0` and `100` which is causing test failures
2019-08-27 18:24:26 +03:00
Dimitris Athanasiou 873ad3f942
[7.x][ML] Add option to regression to randomize training set (#45969) (#46017)
Adds a parameter `training_percent` to regression. The default
value is `100`. When the parameter is set to a value less than `100`,
from the rows that can be used for training (ie. those that have a
value for the dependent variable) we randomly choose whether to actually
use for training. This enables splitting the data into a training set and
the rest, usually called testing, validation or holdout set, which allows
for validating the model on data that have not been used for training.

Technically, the analytics process considers as training the data that
have a value for the dependent variable. Thus, when we decide a training
row is not going to be used for training, we simply clear the row's
dependent variable.
2019-08-27 17:53:11 +03:00
Yogesh Gaikwad 7b6246ec67
Add `manage_own_api_key` cluster privilege (#45897) (#46023)
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. are too permissive and
we would want finer-grained control over the cluster privileges
for API keys. Previously APIs created would also need these
privileges to get its own information.

This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.

The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.

The changes cover HLRC changes, documentation for the API changes.

Closes #40031
2019-08-28 00:44:23 +10:00
Dimitris Athanasiou dd6c13fdf9
[ML] Add description to DF analytics (#45774) (#46019) 2019-08-27 15:48:59 +03:00
Albert Zaharovits 1ebee5bf9b
PKI realm authentication delegation (#45906)
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.

In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.

Closes #34396
2019-08-27 14:42:46 +03:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 16:19:55 -04:00