Commit Graph

6424 Commits

Author SHA1 Message Date
Alan Woodward 01950bc80f
Move FieldMapper#valueFetcher to MappedFieldType (#62974) (#63220)
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
2020-10-04 14:54:59 +01:00
Jason Tedor 1c136bb7fc
Add tier preference when mounting (#63204)
This commit adds a tier preference when mounting a searchable
snapshot. This sets a preference that a searchable snapshot is mounted
to a node with the cold role if one exists, then the warm role, then the
hot role, assuming that no other allocation rules are in place. This
means that by default, searchable snapshots are mounted to a node with
the cold role.

Note that depending on how we implement frozen functionality of
searchable snapshots (not pre-cached/not fully-cached), we might need to
adjust this to prefer frozen if mounting a not pre-cached/fully-cached
searchable snapshot versus mounting a pre-cached/fully-cached searchable
snapshot. This is a later concern since neither this nor the frozen role
are implemented currently.
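
The preference order described above can be sketched as follows. This is a hypothetical illustration, not the Elasticsearch allocation code; the class name, the method, and the `data_cold`/`data_warm`/`data_hot` role names are assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch, not the Elasticsearch allocation code.
class MountTierPreferenceSketch {
    // Preference order from the commit message: cold, then warm, then hot.
    // The role names are assumptions borrowed from the data tier naming.
    static final List<String> PREFERENCE = List.of("data_cold", "data_warm", "data_hot");

    // Returns the most preferred tier that has at least one node, if any.
    static Optional<String> pickTier(Set<String> rolesPresentInCluster) {
        return PREFERENCE.stream().filter(rolesPresentInCluster::contains).findFirst();
    }

    public static void main(String[] args) {
        // No cold node exists here, so the warm tier is chosen.
        System.out.println(pickTier(Set.of("data_hot", "data_warm")));
    }
}
```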
2020-10-03 07:33:36 -04:00
Nhat Nguyen 4ef8673fdd Fix testRestartAfterCompletion (#63211)
We need to complete the search before closing the iterator, which 
internally closes the point in time; otherwise, the search will fail
with a missing context error.

Closes #62451
2020-10-02 18:14:42 -04:00
Martijn van Groningen 0b6e2b8f16
Fix enrich policy test bug.
Backport #63182 to 7.x branch.

The `randomEnrichPolicy(...)` helper method stores the policy and creates the source indices.
If a source index already exists because it was created for an earlier random policy, then
creating the source index fails, but that failure is ignored and the test continues. However, if the policy
has a match field that doesn't exist in the previous random policy, then the mapping is never updated
and the put policy API fails because the match field can't be found.

This PR fixes that by executing a put mapping call in the event that the source index already exists.

Closes #63126
2020-10-02 19:34:39 +02:00
Benjamin Trent 752ee0288e
[7.x] [ML] optimize delete expired snapshots (#63134) (#63200)
* [ML] optimize delete expired snapshots (#63134)

When deleting expired snapshots, we do an individual delete action per snapshot per job.

We should instead gather the expired snapshots and delete them in a single call.

This commit achieves this and a side-effect is there is less audit log spam on nightly cleanup

closes https://github.com/elastic/elasticsearch/issues/62875
2020-10-02 13:24:36 -04:00
Marios Trivyzas 3cac996373
EQL: Fix syntax for event type (#63169) (#63194)
Event type is actually a string value for event.category which can
contain any kind of characters, or start with a digit, which currently
is not supported, so we introduce the possibility to be able to use the
usual syntax of " and """ for strings and raw strings.

Make the grammar a bit cleaner by using the identifier only where it's
actually an identifier in terms of query semantics.

Fixes: #62933
(cherry picked from commit 306e1d76da3db652db57f11f847705b3995609ff)
2020-10-02 17:28:13 +02:00
markharwood bfb3071539
Wildcard field - add normalisation of ngram tokens to reduce disk space. (#63120) (#63193)
Adds normalisation of ngram tokens to reduce disk space.
All punctuation becomes the / char, and for A-Z0-9 chars even codepoints are turned into the prior odd codepoint, e.g. aab becomes aaa
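
A minimal sketch of this normalisation, under stated assumptions: the token is lowercased first (the `aab` example is lowercase), "punctuation" is taken to mean any other ASCII character, and non-ASCII characters are left untouched. The class and method names are hypothetical and this is not the WildcardFieldMapper code.

```java
// Hypothetical sketch of the ngram token normalisation described above.
class NgramNormalizerSketch {
    static String normalize(String token) {
        StringBuilder out = new StringBuilder(token.length());
        // Assumption: tokens are lowercased before normalisation.
        for (char c : token.toLowerCase(java.util.Locale.ROOT).toCharArray()) {
            boolean asciiAlnum = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (asciiAlnum) {
                // Even codepoints collapse onto the prior odd one, e.g. 'b' (98) -> 'a' (97).
                // Side effect of the rule: '0' (48) becomes 47, which is '/'.
                out.append(c % 2 == 0 ? (char) (c - 1) : c);
            } else if (c < 128) {
                // Assumption: every other ASCII character counts as punctuation.
                out.append('/');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("aab")); // aaa
    }
}
```

Collapsing neighbouring codepoints roughly halves the number of distinct characters, so fewer distinct ngrams are stored; the price is that the ngram index produces more false positives that must be verified against the original value.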

Closes #62817
2020-10-02 16:24:27 +01:00
Przemysław Witek 5370f270d7
[7.x] [ML] Ensure data frame analytics jobs don't run on a node that's too new (#62749) (#63175) 2020-10-02 17:19:58 +02:00
Marios Trivyzas 9cf0722fe6
SQL: Fix exception when using CAST on inexact field (#62943) (#63187)
Currently, CAST will use the first keyword subfield of a text field for
an expression in WHERE clause that gets translated to a painless script
which will lead to an exception thrown:
```
"root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:759)",
          "org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:116)",
          "org.elasticsearch.index.query.QueryShardContext.lambda$lookup$0(QueryShardContext.java:308)",
          "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:101)",
          "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:98)",
          "java.security.AccessController.doPrivileged(Native Method)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:98)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
          "org.elasticsearch.xpack.sql.expression.function.scalar.whitelist.InternalSqlScriptUtils.docValue(InternalSqlScriptUtils.java:79)",
          "InternalSqlScriptUtils.cast(InternalSqlScriptUtils.docValue(doc,params.v0),params.v1)",
          "                                                                      ^---- HERE"
        ],
        "script": "InternalSqlScriptUtils.cast(InternalSqlScriptUtils.docValue(doc,params.v0),params.v1)",
        "lang": "painless"
      }
    ],
```

Instead of allowing a painless translation using the first underlying
keyword silently, which can be confusing, we detect such usage and throw
an error early.

Relates to #60178

(cherry picked from commit 7402e8267ba564e52dc672c25b262824b6048b40)
2020-10-02 16:42:59 +02:00
Joe Gallo d172a18c95 Tidy up some ILM and SLM packages (#63146)
Very minor refactoring, just moving some ILM and SLM classes around to decrease
the total number of packages.
2020-10-02 09:30:24 -04:00
Martijn van Groningen 300e525138
Fix querying a data stream name in _index field. (#63178)
Backport #63170 to 7.x branch.

The _index field is a special field that allows using queries against the name of an index or alias.
Data stream names were not included; this PR fixes that by changing SearchIndexNameMatcher
(which is used via IndexFieldMapper) to also include data streams.
2020-10-02 15:29:20 +02:00
Armin Braun 1663dc7cf8
Fix GCS Repo Cleanup Tool Exception Handling (#63168)
We recently upgraded the SDK, which now wraps the storage exception, so we
must unwrap it to check whether it's a 404.

Closes #63091
2020-10-02 15:26:39 +02:00
Marios Trivyzas 7d74fb8577
EQL: Replace ?"..." with """...""" for unescaped strings (#62539) (#63174)
Use triple double quotes enclosing a string literal to interpret it
as unescaped, in order to use `?` for marking query params and avoid
user confusion. `?` also usually implies regex expressions.

Any character inside the `"""` beginning-closing markings is considered
raw and the only thing that is not permitted is the `"""` sequence itself.
If a user wants to use that sequence, they need to resort to the normal `"` string literal
and use proper escaping.

Relates to #61659

(cherry picked from commit d87c2ca2eacab5552bca1e520d33cf71da40bcfd)
2020-10-02 14:58:50 +02:00
Benjamin Trent cfcf973259
[7.x] [ML] renames */inference* apis to */trained_models* (#63097) (#63136)
* [ML] renames */inference* apis to */trained_models* (#63097)

This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-02 07:34:28 -04:00
Benjamin Trent 535f8a434b
Revert "[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)" (#63144)
This reverts commit 95242eccee.
2020-10-02 07:03:15 -04:00
Luca Cavanna a42a516b67 Shorten runtime field type class names (#63123)
In the codebase there is an unwritten convention that classes that extend `MappedFieldType` are generally called `*FieldType`. With this commit we adopt the same convention for runtime field types, which allows us to shorten their names by removing the `Mapped` portion, which is implicit.
2020-10-02 11:25:25 +02:00
Ioannis Kakavas e91f66e22f
Ensure domain_name setting for AD realm is present (#61983) (#63159)
We would only check for a null value and not for an empty string, which
meant that we were not actually enforcing this mandatory
setting. This commit ensures we check for both and fail
accordingly, if necessary, on startup.
2020-10-02 12:16:08 +03:00
David Kyle 279f951700
[ML] Set parent task Id on ml expired data removers (#62854) (#62966)
Setting the parent task Id (of the delete expired data action) on the ML
expired data removers makes it easier to track and cancel long running
tasks
2020-10-02 10:14:10 +01:00
Christoph Büscher 4c7c540ca1 Update version field yml test skip version (#63139) 2020-10-02 10:01:27 +02:00
Costin Leau 614f4c13a5 EQL: Introduce case-sensitive equality (#63121)
Introduce the : operator for case-insensitive string comparisons.
Recognizes "*" for wildcard matches in string literals.
Restricted only to string types.

Relates #62941

(cherry picked from commit 201e577e65f26a9b958a6197fe6c7268da39de29)
2020-10-02 00:23:08 +03:00
Igor Motov fc13b72cea
Extract histogramFieldDocValues into an utility class (#63100) (#63148)
This function will be needed in the upcoming rate aggs tests.
2020-10-01 15:44:37 -04:00
Marios Trivyzas 3ad4b00c7e
EQL: Clean grammar from `fork` (#63094) (#63138)
Since `fork` is not used, is undocumented in Python EQL and
there is no plan at the moment to implement it in the future,
remove it from the grammar. Users will get parsing exceptions
instead of higher-level messages about unsupported features,
which can lead to wrong expectations.

(cherry picked from commit f6a0f8f01c1b1893bab86629d1de73e9f9dae8dc)
2020-10-01 21:14:41 +02:00
Lee Hinman f0f0da2188
[7.x] Add telemetry for data tiers (#63031) (#63140)
Backports the following commits to 7.x:

    Add telemetry for data tiers (#63031)
2020-10-01 12:37:32 -06:00
Benjamin Trent 95242eccee
[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)
This adds a new `baseline` field to the feature importance values. 

This field contains the baseline importance for a given feature and class.
2020-10-01 09:48:07 -04:00
Dimitris Athanasiou 46c3973400
[7.x][ML] Remove direct access to system index from filter_crud REST test (#63111) (#63115)
This test accesses system indices for 2 reasons.

First, it creates a filter that has a different type. This was done
to assert that filter is not returned from the APIs. However,
now that access to the `.ml-meta` index is restricted,
it is not really a concern.

Second, it creates a `.ml-meta` index without mappings to test
the get API does not fail due to lack of mappings on a sorted field,
namely the `filter_id`. Once again, this test is less useful once
system indices have restricted access.

Relates #62501

Backport of #63111
2020-10-01 15:15:34 +03:00
Costin Leau c2992ea287 EQL: Fix NPE from incorrect use of ids search (#63032)
This fixes a bug introduced when moving from mget to ids query. While
mget returns all the ids given, ids query is a search query and thus by
default returns only 10 documents.
The fix correctly sets the expected size so all the information is
returned inside the response.

Fix #63030

(cherry picked from commit 09ba85548a0142a1fe8376efea9cc4e7764a207c)
2020-10-01 13:49:58 +03:00
Hendrik Muhs e001b4c021 [Transform] fix time rounding in TransformContinuousIT (#63113)
fix a time rounding problem in the test, due to rounding down to epoch
seconds instead of epoch millis

fixes #62951
2020-10-01 11:43:50 +02:00
Ignacio Vera ba5574935e
Remove dependency of Geometry queries with mapped type names (#63077) (#63110)
It extracts the query capabilities from AbstractGeometryFieldType into two new interfaces, GeoshapeQueryable and ShapeQueryable. Those interfaces are implemented by the final mappers.
2020-10-01 10:49:12 +02:00
Howard 8c6e197f51 Remove allocation id from engine (#62680)
We no longer need the allocation id in Engine.
2020-09-30 15:28:27 -04:00
Marios Trivyzas f69d268500
SQL: Allow skip of bwc tests on `check` task (#62936) (#63089)
Bwc tests can consume much time to build and to run so it's nice to be
able to skip them when running the `check` task on the SQL module.

Introduce a new task `checkNoBwc` so one can use:
```
./gradlew -p x-pack/plugin/sql checkNoBwc
```
to skip them.

(cherry picked from commit a52e1846f338f6869273181c6f248579581fa68c)
2020-09-30 20:03:19 +02:00
Marios Trivyzas 0ebaf8a3ec
EQL: Allow escaped backquote in identifiers (#62932) (#63082)
Previously, a backquote could not be used inside an escaped identifier,
e.g.:
```
`my`identifier` = "some_value"
```
was not allowed. Introduce escaping of the backquote by using a
double backquote:
```
`my``identifier` = "some_value"
```
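
A minimal sketch of how such an identifier could be unquoted, assuming the doubled backquote is the only escape inside the delimiters. The class and method are hypothetical and do not reproduce the EQL parser.

```java
// Hypothetical sketch of unquoting a backquoted identifier.
class BackquoteIdentifierSketch {
    // Strips the enclosing backquotes and turns each doubled backquote into one.
    static String unquote(String quoted) {
        int last = quoted.length() - 1;
        if (last < 1 || quoted.charAt(0) != '`' || quoted.charAt(last) != '`') {
            throw new IllegalArgumentException("identifier must be enclosed in backquotes");
        }
        String inner = quoted.substring(1, last);
        // Any backquote left after removing the doubled ones is unescaped.
        if (inner.replace("``", "").indexOf('`') >= 0) {
            throw new IllegalArgumentException("unescaped backquote inside identifier");
        }
        return inner.replace("``", "`");
    }

    public static void main(String[] args) {
        System.out.println(unquote("`my``identifier`")); // my`identifier
    }
}
```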

(cherry picked from commit 49514121486f42a58674b3e5901de4021fda5c15)
2020-09-30 19:10:09 +02:00
Alan Woodward 675d18f6ea Convert dense/sparse vector field mappers to Parametrized form (#62992)
Also adds a proper MapperTestCase test for dense vectors.

Relates to #62988
2020-09-30 16:55:28 +01:00
Dimitris Athanasiou e09074d382
[7.x][ML] Fix online updates with custom rules referencing filters (#63057) (#63064)
When an opened anomaly detection job is updated with a detection
rule that references a filter, apart from updating the c++ process
with the rule, we also need to update it with the referenced filter.

This commit fixes a bug which led to the job not applying such updates
on-the-fly.

Fixes #62948

Backport of #63057
2020-09-30 16:01:06 +03:00
Costin Leau a6b903b783 EQL: Remove unused classes from response API (#62134)
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.

Fix #62131

(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
2020-09-30 15:45:30 +03:00
Mayya Sharipova f221349593 Fix UnsignedLongTests test failure (#63056)
Test testSortDifferentFormatsShouldFail was occasionally failing for
2 reasons:
1) Documents on "idx2" were not available for search before a
search request started
2) Running a test multiple times was causing
occasional ResourceAlreadyExistsException for idx2,
as idx2 was not deleted for a test.

This patch makes the following fixes:
1) Sets up immediate refresh policy for docs in the index "idx2"
2) Creates an index idx2 only once per cluster

Closes: #62997
2020-09-30 08:41:31 -04:00
Yang Wang e31bef4032
Fix API key role descriptors rewrite bug for upgraded clusters (#62917) (#63042)
This PR ensures that API key role descriptors are always rewritten to a target node
compatible format before a request is sent.
2020-09-30 22:16:39 +10:00
Benjamin Trent 0860746bf2
[ML] changing ngram loop order for minor performance improvement (#63033) (#63059)
This is a very minor optimization but trivial to implement, so might as well. 

```
Benchmark                               (nGramStrs)  Mode  Cnt        Score        Error  Units
NGramProcessorBenchmark.ngramInnerLoop        1,2,3  avgt   20  4415092.443 ±  31302.115  ns/op
NGramProcessorBenchmark.ngramOuterLoop        1,2,3  avgt   20  4235550.340 ± 103393.465  ns/op
```

This measurement is in nanoseconds; consequently, the overall performance of inference is dominated by other factors (i.e. map#put). But this optimization adds up over time and is simple.
2020-09-30 07:51:31 -04:00
Benjamin Trent b7c47b1717
[ML] Add data frame analytics bwc testing (#63012)
This commit adds bwc testing for data frame analytics.

The bwc tests only go back to 7.9.0,
meaning that initially only rolling upgrades from 7.9.x -> 7.10.0 are tested.

Since the feature was experimental in < 7.9.0, this is acceptable.
2020-09-30 07:13:40 -04:00
Przemysław Witek 4366d58564
[7.x] [ML] Implement AucRoc metric for classification (#60502) (#63051) 2020-09-30 12:55:52 +02:00
Dimitris Athanasiou 179fe9cc0e
[7.x][ML] Delete dest index and reindex if incompatible (#62960) (#63050)
Data frame analytics results format changed in version `7.10.0`.
If existing jobs that were not completed are restarted, it is
possible the destination index had already been created. That index's
mappings are not suitable for the new results format.

This commit checks the version of the destination index and deletes
it when the version is outdated. The job will then continue by
recreating the destination index and reindexing.

Backport of #62960
2020-09-30 12:57:48 +03:00
Hendrik Muhs df93f46888 [Transform] fix issue in TransformIndexerStateTests.testStopAtCheckpoint (#63006)
fix a test issue by improving counting the number of times the deferred listener is called

fixes #62996
2020-09-30 08:54:45 +02:00
David Roberts 05427c2bb2
[ML] Add timeouts to named pipe connections (#63022)
This PR adds timeouts to the named pipe connections of the
autodetect, normalize and data_frame_analyzer processes.
This argument requires the changes of elastic/ml-cpp#1514 in
order to work, so that PR will be merged before this one.
(The controller process already had a different mechanism,
tied to the ES JVM lifetime.)

Backport of #62993
2020-09-29 18:04:02 +01:00
Costin Leau 3bee28056f EQL: Fix bug in sequences with any pattern (#63007)
Fix query creation inside sequences with any queries due to lacking a
clause to combine, which led to an invalid request being created.

Fix #62967

(cherry picked from commit ff59d8823919a6e70928816e5c3687308ebde33f)
2020-09-29 18:19:25 +03:00
Benjamin Trent 0b3af242d4
[ML] fixing classification feature importance parsing (#63003) (#63015)
Classification feature importance supports various types in the class name:
- string
- boolean
- numerical

The xcontent parsing on the server side and the HLRC side should support and test these types.
2020-09-29 10:54:35 -04:00
Yang Wang 068f605040
Use compilation as validation for painless role template (#62845) (#63010)
* Use compilation as validation for painless role template (#62845)

Role template validation now performs only compilation if the script is painless.
It no longer attempts to execute the script with empty input which is problematic.
The compilation process will catch things like invalid syntax and undefined variables,
which still provides a certain level of protection against ill-defined role templates.
Behaviour for Mustache script is unchanged.

* Checkstyle
2020-09-30 00:37:41 +10:00
Alan Woodward de08ba58bf Convert percolator, murmur3 and histogram mappers to parametrized form (#63004)
Relates to #62988
2020-09-29 14:42:26 +01:00
Dimitris Athanasiou facf9ede0a
[ML] Fix binary classification importance in LegacyFeatureImportanceTests (#63000)
Fixes #62991
2020-09-29 15:53:34 +03:00
Benjamin Trent 2b9032a07d
[7.x] [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976) (#62999)
* [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976)

This fixes the two test failures.

The shard failure seems to be due to the .ml-stats index being in the middle of being created.
2020-09-29 08:12:20 -04:00
Hendrik Muhs 154a0c00b7 [Transform] add debug logging to investigate #62951 (#62990) 2020-09-29 12:06:35 +02:00
Mayya Sharipova ca42726a99 Ensure consistent ordering of hits in test (#62977)
50_script_values/Script query fails sometimes
as resulting hits will be ordered differently from expected.
This patch ensures consistent ordering of hits.

Closes #62975
2020-09-29 06:00:34 -04:00
Armin Braun 678688dc84
Avoid Redundantly Loading Monitoring Templates on CS Applier Thread (#62913) (#62979)
This refactors the loading of monitoring templates slightly so that they aren't loaded over and
over again (from disk) on CS updates. This isn't an important optimization in production for obvious
reasons since it only affects the install stage, but this turned out to cause some slow CS applies
in tests.

Relates #62853
2020-09-29 11:45:22 +02:00
David Kyle f23603dafd
[ML][Transform] Filter null objects from field caps request (#62945) (#62971)
If the transform grouping is a script then exclude the field from the source index
mappings field caps request. A null object caused an NPE in the serialisation of
FieldCapabilitiesIndexRequest.
2020-09-29 09:07:01 +01:00
Dimitris Athanasiou 7f6c1ff5b4
[7.x][ML] Remove top level importance from classification inference results (#62486) (#62964)
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.

Backport of #62486
2020-09-29 10:58:48 +03:00
Mayya Sharipova 4c8c3c8df6
Upgrade lucene to lucene-8.7.0-snapshot-3b59906 (#62978)
Backport for #62970
2020-09-28 16:52:31 -04:00
Benjamin Trent a054e62bc4
[ML] allow datafeeds to run if there are any concrete indices (#62827) (#62965)
This commit allows a datafeed to be assigned to a node if only one index pattern has concrete indices.
2020-09-28 12:58:07 -04:00
Hendrik Muhs be5edcfb26 [Transform] fix possible NPE if transform task has no node assigned (#62946)
ignore transform tasks that do not have a node assigned when collecting
nodes to forward the request for _stop, _stats and _update

fixes #62847
2020-09-28 15:25:38 +02:00
Alan Woodward a3ba24123e Refactor PointParser to not take FieldMapper as a parameter (#62950)
Passing FieldMappers to point parsing functions makes trying to build source-only
fields from MappedFieldTypes more complicated. This small refactoring changes
things so that the relevant parsing and factory functions from
AbstractGeometryFieldMapper are instead passed as lambdas to the PointParser
constructor.
2020-09-28 13:45:13 +01:00
Costin Leau ef7a6ce4b2 EQL: Refactor testing infrastructure (#62928)
Extract reusable methods inside QL TestUtils
Rename abstract base classes for clarity
Clean-up EQL DataLoader

(cherry picked from commit 48db3f285aa8976ead5a9f5d071a9c1046d7bd31)
2020-09-28 14:22:56 +03:00
Hendrik Muhs b1a8437d0b
[7.x][Transform] Improve robustness when saving state (#62927)
refactor how state is persisted: call doSaveState only from the indexer thread, unless there is none.

fixes #60781
fixes #52931
fixes #51629
fixes #52035
2020-09-28 10:12:51 +02:00
Tim Brooks 43a4882951
Move CorsHandler to server (#62007)
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
bug where we are incorrectly closing http channels on early Cors
responses.
2020-09-24 16:32:59 -06:00
Mayya Sharipova 54064a1eec
Unsigned long 64bits (#62892)
Introduce 64-bit unsigned long field type

This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.
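
The difference between the precise and the double-based operations can be illustrated with plain JDK calls. This is only a sketch of the arithmetic, not the field mapper's implementation; the class and helper names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical sketch: exact unsigned comparison versus lossy double conversion.
class UnsignedLongSketch {
    // Parses a decimal string in [0, 18446744073709551615] into the bits of a Java long.
    static long parse(String value) {
        return Long.parseUnsignedLong(value);
    }

    // Lossy conversion: a double carries only 53 bits of mantissa.
    static double toDouble(long unsigned) {
        return new BigInteger(Long.toUnsignedString(unsigned)).doubleValue();
    }

    public static void main(String[] args) {
        long max = parse("18446744073709551615");
        long almostMax = parse("18446744073709551614");
        // Exact: unsigned comparison still tells the two values apart.
        System.out.println(Long.compareUnsigned(max, almostMax) > 0); // true
        // Imprecise: both values collapse onto the same double (2^64).
        System.out.println(toDouble(max) == toDouble(almostMax)); // true
    }
}
```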

Backport for #60050
Closes #32434
2020-09-24 16:51:47 -04:00
Andrei Stefan a43f29cfc9
EQL: data streams tests for PIT and EQL sequences (#62850) (#62889)
* PIT should run well with data streams

(cherry picked from commit 0a89a7db848b015b797c7678874b5c9e33bbd650)
2020-09-24 23:37:46 +03:00
Alan Woodward e28750b001
Add parameter update and conflict tests to MapperTestCase (#62828) (#62902)
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.

Fixes #61631
2020-09-24 20:38:12 +01:00
Armin Braun 4b9ddb48b6
Add Missing Netty Runtime Proc Property to Security Tests (#62846) (#62890)
Same as in the normal Netty tests we have to disable the runtime proc
setting in the normal tests task just like we do for the internal cluster tests.

Closes #61919
Closes #62298
2020-09-24 20:48:38 +02:00
Jim Ferenczi 78a93dc18f
Request-level circuit breaker support on coordinating nodes (#62884)
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException is thrown
if it exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.
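
The accounting described above can be sketched roughly as follows. This is a simplified model, not the search coordinator's code: the class is hypothetical, `IllegalStateException` stands in for `CircuitBreakingException`, and it assumes the 1.5x estimate covers the already-accounted source aggregations, so only the difference is reserved before a reduce.

```java
// Hypothetical sketch of request-breaker accounting around a reduce.
class RequestBreakerSketch {
    private final long limitBytes;
    private long usedBytes;

    RequestBreakerSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    long usedBytes() {
        return usedBytes;
    }

    // Reserves memory; IllegalStateException stands in for CircuitBreakingException.
    void reserve(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException("request breaker limit of " + limitBytes + " bytes exceeded");
        }
        usedBytes += bytes;
    }

    // sourceBytes: serialized size of the buffered shard aggregations, already reserved.
    // reducedBytes: serialized size of the reduce result, known only after the reduce.
    void reduce(long sourceBytes, long reducedBytes) {
        // Assumption: the estimate is 1.5x the source size in total, so only the extra part is new.
        long extra = Math.round(1.5d * sourceBytes) - sourceBytes;
        reserve(extra); // throws before any reduce work is done
        // ... the actual reduce would run here ...
        // On success, drop the sources and the estimate, keep the real result size.
        usedBytes += reducedBytes - sourceBytes - extra;
    }
}
```

With a 1000-byte limit, buffering 400 bytes and reducing them to 100 bytes peaks at 600 reserved bytes and settles at 100; with a 500-byte limit the same reduce is rejected up front.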

As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the meantime is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search requests to a more sane number.

Closes #37182
2020-09-24 18:59:28 +02:00
Benjamin Trent c56424f740
[ML] write deprecation warning when include_model_definition parameter is used (#62834) (#62885)
For get trained models, include_model_definition is now deprecated.

This commit writes a deprecation warning if that parameter is used and suggests that the caller use the replacement.
2020-09-24 11:38:54 -04:00
Stuart Tettemer 8d69334c2f
Scripting: Watcher defaults to unlimited compile rate (#62655) (#62671)
Backport of #62655
2020-09-24 10:22:50 -05:00
Martijn van Groningen 8ca33feffd
Fail with correct error if first backing index exists when auto creating data stream (#62862)
Backport #62825 to 7.x branch.

Today if a data stream is auto created, but an index with the same name as the
first backing index already exists, then internally that error is ignored,
which results in the bulk item failing later in the execution of the bulk
request because the data stream hasn't been auto created.

This situation can only occur if an index with the same name as the future
first backing index of a data stream is created prior to the creation
of the data stream.

Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>
2020-09-24 17:16:34 +02:00
Armin Braun 83ec8dd4e2
Upgrade GCS SDK to 1.113.1 (#62848) (#62864)
Just staying on top of upgrades to the SDK and its dependencies.
2020-09-24 15:43:21 +02:00
Daniel Mitterdorfer d2166030d1
Mute failing test case in DeleteExpiredDataIT (#62870) (#62871)
Relates #62699
2020-09-24 15:42:52 +02:00
Andrei Dan e323c5245b
[7.x] ILM: migrate action configures the _tier_preference setting (#62829) (#62860)
The `migrate` action will now configure the
`index.routing.allocation.include._tier_preference` setting to the corresponding
tiers. For the HOT phase it will configure `data_hot`, for the WARM phase it will
configure `data_warm,data_hot` and for the COLD phase
`data_cold,data_warm,data_hot`.
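
The phase-to-preference mapping can be sketched as below. The class and method are hypothetical, and the sketch assumes that each phase prefers its own tier first and then falls back to the progressively hotter tiers, so COLD ends with `data_hot`.

```java
// Hypothetical sketch of the value written to
// index.routing.allocation.include._tier_preference for each phase.
class MigrateTierPreferenceSketch {
    static String tierPreference(String phase) {
        switch (phase) {
            case "hot":
                return "data_hot";
            case "warm":
                return "data_warm,data_hot";
            case "cold":
                // Assumption: cold falls back to warm, then hot.
                return "data_cold,data_warm,data_hot";
            default:
                throw new IllegalArgumentException("no tier preference for phase [" + phase + "]");
        }
    }

    public static void main(String[] args) {
        System.out.println(tierPreference("warm")); // data_warm,data_hot
    }
}
```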

(cherry picked from commit 9dbf0e6f0c267e40c5bcfb568bb2254da103ae40)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-24 13:37:09 +01:00
Rory Hunter 7771d8b6fa Tweak the ECS fields in DeprecatedMessage (#62855)
Backport of #62855. Follow-up to #61484.
2020-09-24 12:07:48 +01:00
Costin Leau 71b92f8699 QL: Optimize Like/Rlike all (#62682)
Replace common Like and RLike queries that match all characters with
IsNotNull (exists) queries

Fix #62585

(cherry picked from commit 4c23fad0468a9edd7325b06c6a96f7af37625dbf)
2020-09-24 13:44:53 +03:00
Martijn van Groningen 8d73379493
Adjust skip version in data stream yaml test. (#62831) (#62851)
Relates to #62766
2020-09-24 11:00:02 +02:00
Hendrik Muhs a70389015d [Transform] Return parsed count for get transform stats (#62809)
In case of more than 500 transforms, get and stats return paged results which can be requested using
page parameters. For >500 transforms count wasn't parsed out of the server response but taken from
size of the list of transforms.

The change also adds client/server hlrc tests and fixes a wrong type for count in get.

fixes #56245
2020-09-24 08:38:07 +02:00
Nhat Nguyen 38c8a55df8
Better UUID for reader context (#62799)
We can use a single and stronger UUID for all reader contexts
created by the same SearchService.

Backport of #62715
2020-09-23 12:50:18 -04:00
Martijn van Groningen 0baefc8ddc
Always validate that only a create op is allowed in bulk api for data streams (#62820)
Backport #62766 to 7.x branch.

The bulk API caches the resolved concrete indices when resolving the user-provided
index name into the actual index name. The validation that prevents write ops other
than create from being executed in a data stream was only performed if the result
wasn't cached. For cached resolutions, the validation never occurred.

The validation would be skipped for all bulk items for a data stream after a create
operation for that same data stream. This commit ensures that the validation is always
performed for all bulk items (whether the concrete index resolution has been cached or
not cached).
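
A simplified model of the fix, assuming the check simply runs before the cache is consulted. All names are hypothetical, the `.ds-<name>-000001` backing index name is used purely for illustration, and this is not the bulk action's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: validate every bulk item, cached resolution or not.
class BulkDataStreamValidationSketch {
    private final Set<String> dataStreams;
    private final Map<String, String> resolved = new HashMap<>();
    int lookups = 0; // counts cache misses, for illustration only

    BulkDataStreamValidationSketch(Set<String> dataStreams) {
        this.dataStreams = dataStreams;
    }

    String resolveWriteIndex(String name, String opType) {
        // The validation runs for every item, before the cache is consulted.
        if (dataStreams.contains(name) && !"create".equals(opType)) {
            throw new IllegalArgumentException("only create operations are allowed on data stream [" + name + "]");
        }
        return resolved.computeIfAbsent(name, this::lookup);
    }

    // Stand-in for the real resolution of a name to a concrete index.
    private String lookup(String name) {
        lookups++;
        return dataStreams.contains(name) ? ".ds-" + name + "-000001" : name;
    }
}
```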

Closes #62762
2020-09-23 16:27:54 +02:00
Dimitris Athanasiou 7de5201291
[7.x][ML] Handle data frame analytics state spreading over multiple docs (#62564) (#62824)
When state persistence was first implemented for data frame analytics
we had the assumption that state would always fit in a single document.
However this is not the case any more.

This commit adds handling of state that spreads over multiple documents.

Backport of #62564
2020-09-23 16:16:34 +03:00
James Rodewig e3d5915566 [DOCS] Fix JSON spec link for PIT API (#61783) 2020-09-23 14:29:06 +02:00
Dimitris Athanasiou 69e72656fa
[7.x][ML] Reset reindexing progress when DFA job resumes with incomplete reindexing (#62772) (#62816)
This fixes reindexing progress in the scenario when a DFA job that had not finished
reindexing is resumed (either because the user called stop and start or because the
job was reassigned in the middle of reindexing). Before the fix, reindexing progress
stayed at the value it had reached before until it surpassed that value.

When we resume a data frame analytics job we want to preserve reindexing progress
and reset all other phases, except when reindexing was not completed.
In that case we delete the destination index and start reindexing
from scratch, so we need to reset reindexing progress too.

Backport of #62772
2020-09-23 14:09:04 +03:00
Christoph Büscher 054a950ceb Align version field plugin naming (#62757)
To better align the plugin naming with other mapper plugins under x-pack (e.g.
mapper-flattened) this PR changes the plugin name and the containing directory
to "mapper-version"
2020-09-23 11:50:15 +02:00
Christoph Büscher 29074e7055
Add case insensitive prefix and wildcard to 'version' field (#62754) (#62782)
This change adds support for the recently introduced case insensitivity flag for
wildcard and prefix queries. Since version field values are encoded differently we
need to adapt our own AutomatonQuery variation to add both cases if case insensitivity
is turned on.
2020-09-23 11:48:34 +02:00
Luca Cavanna 862fab06d3
Share same existsQuery impl throughout mappers (#57607)
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.

There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they have neither norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.

This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.

At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
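
The fallback order described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `FieldInfo`, `existsQueryFor`, and the returned description strings are hypothetical names that only show the decision order (doc_values first, then norms, then a term query against `_field_names`).

```java
// Hypothetical sketch of the fallback order; none of these names come from the commit.
class ExistsQuerySketch {

    /** Assumed minimal description of which data structures a field has. */
    static final class FieldInfo {
        final String name;
        final boolean hasDocValues;
        final boolean hasNorms;

        FieldInfo(String name, boolean hasDocValues, boolean hasNorms) {
            this.name = name;
            this.hasDocValues = hasDocValues;
            this.hasNorms = hasNorms;
        }
    }

    /** Returns a description of the exists query that would be chosen for the field. */
    static String existsQueryFor(FieldInfo field) {
        if (field.hasDocValues) {
            // Preferred: doc_values are present.
            return "doc_values exists query on [" + field.name + "]";
        } else if (field.hasNorms) {
            // Next choice: norms are available.
            return "norms exists query on [" + field.name + "]";
        } else {
            // Last resort: look the field name up in the _field_names meta field.
            return "term query on _field_names for [" + field.name + "]";
        }
    }

    public static void main(String[] args) {
        System.out.println(existsQueryFor(new FieldInfo("price", true, false)));
        System.out.println(existsQueryFor(new FieldInfo("title", false, true)));
        System.out.println(existsQueryFor(new FieldInfo("flag", false, false)));
    }
}
```

In this sketch, a field type with different needs would simply supply its own version of `existsQueryFor`, which mirrors the commit's point that only the exceptions need to override the default.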
2020-09-23 11:00:53 +02:00
David Kyle bc34ecc581
[ML] Mute annotations index upgrade mapping test (#62814)
For #61908
2020-09-23 09:37:04 +01:00
Luca Cavanna 5ca86d541c
Move stored flag from TextSearchInfo to MappedFieldType (#62717) (#62770) 2020-09-23 09:40:34 +02:00
Albert Zaharovits b4ec821067
Fix doc-update interceptor for indices with DLS and FLS (#61516)
This fixes the protection against updates (and bulk updates) for indices with DLS
and/or FLS, when the request uses date math expressions.
2020-09-23 08:55:22 +03:00
Nhat Nguyen 663b85b98f Make keep alive optional in PointInTimeBuilder (#62720)
Remove the keepAlive parameter from the constructor of PointInTimeBuilder
as it's optional.
2020-09-22 18:52:54 -04:00
Nik Everett fa13585fae
Fix Eclipse build (#62733) (#62786)
Eclipse was confused for two reasons:
1. `:x-pack:plugin` depended on itself.
2. `ql`, `sql`, and `eql` couldn't see some methods.

I fixed problem 1 by only adding the "depends on itself" configuration
outside of eclipse. I fixed problem 2 by making a `test` sub-project in
`ql` that contains test utilities and depending on those where possible.
2020-09-22 17:44:25 -04:00
Jay Modi cb1dc5260f
Dedicated threadpool for system index writes (#62792)
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serve as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.

Backport of #61655
2020-09-22 15:31:38 -06:00
Benjamin Trent 77bfb32635
[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) (#62784)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)

* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
Global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the internal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.

This commit addresses this issue for the ML plugin

* unmuting test
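
The recommendation above (set parameters on each request before adding it, rather than relying on bulk-level globals) can be illustrated with a small stand-alone sketch. The `ItemRequest` and `Bulk` classes below are simplified, hypothetical stand-ins written for this example; they are not the Elasticsearch `IndexRequest` or `BulkRequest` classes, and `routing` is used only as an example parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real Elasticsearch request classes.
class BulkParametersSketch {

    /** Simplified single-document request with one optional parameter. */
    static final class ItemRequest {
        final String index;
        String routing; // stays null unless set explicitly on the item

        ItemRequest(String index) {
            this.index = index;
        }

        ItemRequest routing(String routing) {
            this.routing = routing;
            return this;
        }
    }

    /** Simplified bulk container: add(...) stores the item as-is and copies no globals. */
    static final class Bulk {
        String globalRouting; // modelled as ignored for items added as objects
        final List<ItemRequest> items = new ArrayList<>();

        Bulk add(ItemRequest item) {
            items.add(item);
            return this;
        }
    }

    /** Builds a bulk whose items each carry the desired parameter themselves. */
    static Bulk buildWithPerItemRouting(String index, String routing, int count) {
        Bulk bulk = new Bulk();
        for (int i = 0; i < count; i++) {
            bulk.add(new ItemRequest(index).routing(routing));
        }
        return bulk;
    }

    public static void main(String[] args) {
        // Relying on the global: the added item never sees it in this model.
        Bulk relyingOnGlobal = new Bulk();
        relyingOnGlobal.globalRouting = "shard-key";
        relyingOnGlobal.add(new ItemRequest("my-index"));
        System.out.println("item routing with global only: " + relyingOnGlobal.items.get(0).routing);

        // Setting the parameter on each item before adding it.
        Bulk explicit = buildWithPerItemRouting("my-index", "shard-key", 2);
        System.out.println("item routing when set per item: " + explicit.items.get(0).routing);
    }
}
```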
2020-09-22 15:07:08 -04:00
Marios Trivyzas 1e72144847
EQL: Remove support for `=` for comparisons (#62756) (#62775)
Since `=` is rarely used and is undocumented, we remove its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.

Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
2020-09-22 20:56:04 +02:00
Nik Everett 39a617773d
Rename grok's built-in patterns (backport of #62735) (#62765)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. It's not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 13:06:43 -04:00
Lisa Cawley 7e97f17845 [DOCS] Add SLM security privileges (#62737) 2020-09-22 08:44:18 -07:00
James Rodewig c0e611e0a7
[DOCS] Fix typo: NamedID -> NameID (#62721) (#62767)
Co-authored-by: Greg Back <1045796+gtback@users.noreply.github.com>
2020-09-22 10:30:35 -04:00
markharwood a0df0fb074
Search - add case insensitive flag for "term" family of queries #61596 (#62661)
Backport of fe9145f

Closes #61546
2020-09-22 13:56:51 +01:00
Andrei Dan 0be89bcd7f
Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#62763) 2020-09-22 13:43:15 +01:00
David Kyle 31fbc6800f
[7.x] [ML] Add upgrade mappings assertions to full cluster restart tests (#62293) (#62305)
Refactors the index mapping checks in the rolling upgrade tests
and uses that shared code in the full cluster restart tests.
2020-09-22 13:09:51 +01:00
Luca Cavanna 9ae29713fd
Dense vector field type minor fixes (#62631)
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).

This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
2020-09-22 10:40:51 +02:00
Christoph Büscher 593511e5c9
VersionFieldIT should register transportClientPlugins (#62734) 2020-09-22 10:10:44 +02:00
Yang Wang 28503f04f7
Fix privilege requirement for CCS with Point In Time reader (#62261) (#62696)
When target indices are remote only, CCS does not require the user to have privileges on the local cluster. This PR ensures the Point-In-Time reader follows the same pattern.

Relates: #61827
2020-09-22 12:51:51 +10:00