Commit Graph

5565 Commits

Author SHA1 Message Date
Christoph Büscher 29074e7055
Add case insensitive prefix and wildcard to 'version' field (#62754) (#62782)
This change adds support for the recently introduced case insensitivity flag for
wildcard and prefix queries. Since version field values are encoded differently we
need to adapt our own AutomatonQuery variation to add both cases if case insensitivity
is turned on.
2020-09-23 11:48:34 +02:00
Luca Cavanna 862fab06d3
Share same existsQuery impl throughout mappers (#57607)
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.

There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.

This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.

At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
2020-09-23 11:00:53 +02:00
Luca Cavanna 5ca86d541c
Move stored flag from TextSearchInfo to MappedFieldType (#62717) (#62770) 2020-09-23 09:40:34 +02:00
Albert Zaharovits b4ec821067
Fix doc-update interceptor for indices with DLS and FLS (#61516)
This fixes the protection against updates (and bulk updates) for indices with DLS
and/or FLS, when the request uses date math expressions.
2020-09-23 08:55:22 +03:00
Nhat Nguyen 663b85b98f Make keep alive optional in PointInTimeBuilder (#62720)
Remove the keepAlive parameter from the constructor of PointInTimeBuilder
as it's optional.
2020-09-22 18:52:54 -04:00
Nik Everett fa13585fae
Fix Eclipse build (#62733) (#62786)
Eclipse was confused for two reasons:
1. `:x-pack:plugin` depended on itself.
2. `ql`, `sql`, and `eql` couldn't see some methods.

I fixed problem 1 by only adding the "depends on itself" configuration
outside of eclipse. I fixed problem 2 by making a `test` sub-project in
`ql` that contains test utilities and depending on those where possible.
2020-09-22 17:44:25 -04:00
Jay Modi cb1dc5260f
Dedicated threadpool for system index writes (#62792)
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serves as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.

Backport of #61655
2020-09-22 15:31:38 -06:00
Benjamin Trent 77bfb32635
[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) (#62784)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)

* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
 global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the interal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.

This commit addresses this issue for the ML plugin

* unmuting test
2020-09-22 15:07:08 -04:00
Marios Trivyzas 1e72144847
EQL: Remove support for `=` for comparisons (#62756) (#62775)
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.

Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
2020-09-22 20:56:04 +02:00
Nik Everett 39a617773d
Raname grok's built-in patterns (backport of #62735) (#62765)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
2020-09-22 13:06:43 -04:00
markharwood a0df0fb074
Search - add case insensitive flag for "term" family of queries #61596 (#62661)
Backport of fe9145f

Closes #61546
2020-09-22 13:56:51 +01:00
Andrei Dan 0be89bcd7f
Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#62763) 2020-09-22 13:43:15 +01:00
Luca Cavanna 9ae29713fd
Dense vector field type minor fixes (#62631)
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).

This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
2020-09-22 10:40:51 +02:00
Christoph Büscher 593511e5c9
VersionFieldIT should register transportClientPlugins (#62734) 2020-09-22 10:10:44 +02:00
Yang Wang 28503f04f7
Fix privilege requirement for CCS with Point In Time reader (#62261) (#62696)
When target indices are remote only, CCS does not require user to have privileges on the local cluster. This PR ensure Point-In-Time reader follows the same pattern.

Relates: #61827
2020-09-22 12:51:51 +10:00
Yang Wang 897d2e8a02
Fix ccs permission for search with a scroll id (#62053) (#62695)
CCS with remote indices only does not require any privileges on the local cluster.
This PR ensures that search with scroll follow the permission model.
2020-09-22 11:49:40 +10:00
Andrei Dan 79d0c4ed18
ILM: allow check-migration step to continue if tier setting unset (#62636) (#62724)
This allows the `check-migration` step to move past the allocation check
if the tier routing settings are manually unset.

This helps a user unblock ILM in case a tier is removed (ie. if the warm tier
is decommissioned this will allow users to resume the ILM policies stuck in
`check-migration` waiting for the warm nodes to become available and the managed
index to allocate. this allows the index to allocate on the other available tiers)

(cherry picked from commit d7a1eaa7f51d0972d10c0df1d3cd77d6b755dd41)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-21 20:40:01 +01:00
Marios Trivyzas 1f612cccbb
SQL: Implement FORMAT function (#55454) (#62701)
Implement FORMAT according to the SQL Server spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver15#ExampleD by translating to the java.time patterns used in DATETIME_FORMAT.

Closes: #54965

Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit da511f4e033db6e8a6aa2a54b23e906b5e026845)
2020-09-21 19:22:04 +02:00
Alan Woodward 1dde4983f6 Convert ConstantKeywordFieldMapper to parametrized form (#62688)
As part of the conversion, adds the ability to customize merge validation - in this case, we
allow an update to the constant value if it is currently set to null, but refuse further
updates once it has been set once.

This commit also converts ParametrizedMapperTests to use MapperServiceTestCase.
2020-09-21 15:22:56 +01:00
David Roberts 4537561692 [TEST] Mute IPFilterTests.testThatNodeStartsWithIPFilterDisabled
Due to https://github.com/elastic/elasticsearch/issues/62298
2020-09-21 14:49:43 +01:00
Christoph Büscher d6e977f87b Muting VersionFieldIT.testTermsAggregation 2020-09-21 15:46:48 +02:00
Christoph Büscher 803f78ef05
Add field type for version strings (#59773) (#62692)
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similar to a
'keyword' field but allows efficient sorting and range queries that take into
accound the special ordering needed for version strings. For example, the main
version parts are sorted numerically (ie 2.0.0 < 11.0.0) whereas this wouldn't
be possible with 'keyword' fields today.

Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consiting of
major.minor.patch, we allow less or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.

Relates to #48878
2020-09-21 14:25:42 +02:00
Tanguy Leroux f775cf8594
Add test for snapshot incrementality of snapshot-backed indices (#62641)
This commit adds a test that verifies that snapshots incrementality 
is respected when a snapshot-backed index is snapshotted. This 
test mounts a snapshot as a snapshot-backed index, creates a 
new snapshot from it and then verifies that no new data blobs 
were added to the repository.
2020-09-21 12:06:47 +02:00
Christos Soulios ad79a2b6a1
[7.x] Histogram field type support for min/max aggregations (#62689)
Implement min/max aggregations for histogram fields.

Backports #62532
2020-09-21 12:53:56 +03:00
Henning Andersen 2eeb1bddde
Autoscaling decision return absolute capacity (#61575) (#62670)
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.

The decision API now returns both a tier and node level required
and current capacity as wells as a decider level breakdown of the
same though with in particular current memory still not populated.
2020-09-19 09:05:23 +02:00
Luca Cavanna 1580fc70bd
Minor FlatObjectFieldMapper fix (#62633)
The splitQueriesOnWhitespace instance field can be made final, and setter and getter are not always needed.
2020-09-19 00:24:44 +02:00
William Brafford 8aeab0bec9
Add refresh policy to logstash plugin write requests (#62583) (#62665) 2020-09-18 17:44:53 -04:00
Lee Hinman 4a08928c47
[7.x] Add index.routing.allocation.include._tier_preference setting (#62589) (#62667)
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow them to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
2020-09-18 15:41:36 -06:00
Christos Soulios 6a298970fd
[7.x] Allow metadata fields in the _source (#62616)
Backports #61590 to 7.x

    So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set

    This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not.
    This way each metadata field can configure if it can be included in the document _source
2020-09-18 19:56:41 +03:00
Benjamin Trent 0f142c6afc
[ML] all multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563) (#62629)
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.

- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
2020-09-18 11:06:07 -04:00
Benjamin Trent e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Nhat Nguyen 8bea6b3711 Increase keep alive of point in time in async search tests (#62593)
Async search tests can take more than one minute due to the excessive trace logs. 
And the point in time in the tests can be expired the midway.

Closes #62451
2020-09-18 08:01:57 -04:00
Martijn van Groningen 4c949b0869
Adjust allowed warnings in data stream yaml test. (#62610) 2020-09-18 12:44:57 +02:00
Marios Trivyzas b072de4ce0
EQL: Disallow chained comparisons (#62567) (#62601)
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4 which can lead to confusing
results. Since such expressions don't make so much change for EQL
filters we disallow them in the parser to prevent unexpected results
from their bad usage.

Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax. (counter example would be MySQL which interprets them as we did
before with leftmost priority).

Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
2020-09-18 10:48:14 +02:00
Costin Leau 81f2f84177 EQL: Allow requests with size 0 (#62537)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.

Fix #62494

(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
2020-09-18 11:24:39 +03:00
Martijn van Groningen 5190b0961d
adjust skip reason 2020-09-18 10:10:00 +02:00
Adrien Grand 4de8579455
Upgrade to lucene-8.7.0-snapshot-830bd186a8d. (#62596) 2020-09-18 09:51:34 +02:00
Martijn van Groningen c83d8ce78e
Adjust skip version data stream test (#62597)
after #62527 was backported.
2020-09-18 09:28:16 +02:00
David Turner 7324ee1044 Remove unused upgrade actions (#62552)
These actions were almost completely removed in #40075 but a couple of
classes were left in place. This commit completes their removal.
2020-09-18 08:16:13 +01:00
Ignacio Vera 6a3d731be1
Only call reduce on a single InternalAggregation when needed (#62525) (#62594)
Adds a new abstract method in InternalAggregation that flags the framework if it needs to reduce on a single InternalAggregation.
2020-09-18 08:43:58 +02:00
Jake Landis 5b7246157f
[7.x] Fix projects that failed to build within Intellij (#62258) (#62408)
This commit address some build failures from the perspective of Intellij.
These changes include:
* changing an order of a dependency definition that seems to can cause Intellij build to fail.
* introduction of an abstract class out of the test source set (seems to be an issue sharing 
  classes cross projects with non-standard source sets. 
* a couple of missing dependency definitions (not sure how the command line worked prior to this)
2020-09-17 17:45:12 -05:00
William Brafford b764f8977e
Copy Key Certs for javaRestTest (#62584) 2020-09-17 17:45:42 -04:00
Dimitris Athanasiou 7118ff7976
[7.x][ML] Remove model snapshot legacy doc ids (#62434) (#62569)
Removes methods that were no longer used regarding version 5.4 doc ids of ModelState.

Also adds clean up of 5.4 model state and quantile docs in the daily maintenance.

Backport of #62434
2020-09-17 23:43:28 +03:00
Lee Hinman 9bb7ce0b22
[7.x] Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338) (#62557)
Backports the following commits to 7.x:

    Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338)
2020-09-17 13:29:23 -06:00
William Brafford 5a0dca2491
Deprecate xpack.eql.enabled setting and make it a no-op (#61375) (#62491)
* Deprecate xpack.eql.enabled and make it a no-op
* Remove uses of xpack.eql.enabled
2020-09-17 14:17:27 -04:00
Martijn van Groningen 5f643433c6
Prohibit the usage of create index api in namespaces managed by data stream templates (#62574)
Backport of #62527 to 7.x branch.

This commit adds validation that prohibits the creation of regular indices
in the namespace of templates with data streams enabled.

It shouldn't be possible to create ordinary indices when the name of the index
matches with a composable index template that enables data streams. Auto creation
has logic that creates data streams instead of regular indices. However validation
logic for the create index api was missing.
2020-09-17 20:10:42 +02:00
Jim Ferenczi df93b31b15
Faster sequential access for stored fields (#62509) (#62573)
Faster sequential access for stored fields

Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
So we make no effort to keep state in order to not decompress the same data
multiple times because two documents might be in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the document to retrieve might be
completely sequential:

Scrolls or normal search sorted by document id.
Queries on Runtime fields that extract from _source.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows to leverage the merge instances of stored fields readers that
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.

The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:

|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
Closes #62024
2020-09-17 19:58:18 +02:00
Andrei Dan 3753682877
Fix AllocationRoutedStep equals and hashcode (#62548) (#62559)
(cherry picked from commit 79039e16305c7fb71ee012e693219a0d2b77e97b)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-17 17:40:21 +01:00
Alan Woodward 5421a743a7 Move SearchLookup into FetchContext (#62549)
FetchSubPhase#getProcessor currently takes a SearchLookup parameter. This
however is only needed by a couple of subphases, and will almost certainly change in
future as we want to simplify how fetch phases retrieve values for individual hits.

To future-proof against further signature changes, this commit moves the SearchLookup
reference into FetchContext instead.
2020-09-17 17:39:02 +01:00
Dimitris Athanasiou f5c28e2054
[7.x][ML] Do not start data frame analytics when too many docs are analyzed (#62547) (#62558)
The data frame structure in c++ has a limit on 2^32 documents. This commit
adds a check that the number of documents involved in the analysis are
less than that and fails to start otherwise. That saves the cost of
reindexing when it is unnecessary.

Backport of #62547
2020-09-17 19:06:38 +03:00