Commit Graph

5414 Commits

Author SHA1 Message Date
Mayya Sharipova 4c8c3c8df6
Upgrade lucene to lucene-8.7.0-snapshot-3b59906 (#62978)
Backport for #62970
2020-09-28 16:52:31 -04:00
Armin Braun 2247ab3295
Make TransportNodesAction finishHim Execute on Configured Executor (#62753) (#62955)
Currently, `finishHim` can either execute on the specified executor
(in the less likely case that the local node request is the last to arrive)
or on a transport thread.
In case of e.g. `org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction`
this leads to an expensive execution that deserializes all mapping metadata in the cluster
running on the transport thread and destabilizing the cluster. In case of this transport
action it was specifically moved to the `MANAGEMENT` thread to avoid the high cost of processing
the stats requests on the nodes during fan-out but that did not cover the final execution
on the node that received the initial request. This PR  adds to ability to optionally specify the executor for the final step of the
nodes request execution and uses that to work around the issue for the slow `TransportClusterStatsAction`.

Note: the specific problem that motivated this PR is essentially the same as https://github.com/elastic/elasticsearch/pull/57937 where we moved the execution off the transport and on the management thread as a fix as well.
2020-09-28 18:35:35 +02:00
Alan Woodward a3ba24123e Refactor PointParser to not take FieldMapper as a parameter (#62950)
Passing FieldMappers to point parsing functions makes trying to build source-only
fields from MappedFieldTypes more complicated. This small refactoring changes
things so that the relevant parsing and factory functions from
AbstractGeometryFieldMapper are instead passed as lambdas to the PointParser
constructor.
2020-09-28 13:45:13 +01:00
Hendrik Muhs 4d43fa8816 Make Noderesolver robust against null values (#62893)
make node resolving more robust by ignoring null values. This is a bug in
the usage of this class, however you don't want NPE's in prod. The root cause
might be a corner case. Because silencing the root cause is bad, the assert
causes a fail if assertions are enabled

relates #62847
2020-09-28 13:31:21 +02:00
Armin Braun 21e534e0e6
Fix RareClusterStateIT Publication Cancel (#62662) (#62914)
We have to make sure the applier and not the accept state versions allign here.
Otherwise we can get into the situation where the data node is so slow to process
one version that the next one arrives, gets rejected and the request return with
ack `false` and we fail the assertion that the put mapping request didn't complete.

Closes #62446
2020-09-25 21:57:55 +02:00
Tim Brooks 43a4882951
Move CorsHandler to server (#62007)
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
2020-09-24 16:32:59 -06:00
Mayya Sharipova 54064a1eec
Unsigned long 64bits(#62892)
Introduce 64-bit unsigned long field type

This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Backport for #60050
Closes #32434
2020-09-24 16:51:47 -04:00
Alan Woodward e28750b001
Add parameter update and conflict tests to MapperTestCase (#62828) (#62902)
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.

Fixes #61631
2020-09-24 20:38:12 +01:00
Jim Ferenczi 78a93dc18f
Request-level circuit breaker support on coordinating nodes (#62884)
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException} is thrown
if exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.

As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the mean time is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search request to a more sane number.

Closes #37182
2020-09-24 18:59:28 +02:00
Dan Hermann cd584d49dc Bump version after 7.9.2 release 2020-09-24 10:48:57 -05:00
Martijn van Groningen 8ca33feffd
Fail with correct error if first backing index exists when auto creating data stream (#62862)
Backport #62825 to 7.x branch.

Today if a data stream is auto created, but an index with same name as the
first backing index already exists then internally that error is ignored,
which then result that later in the execution of a bulk request, the
bulk item fails due to that the data stream hasn't been auto created.

This situation can only occur if an index with same is created that
will be the backing index of a data stream prior to the creation
of the data stream.

Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>
2020-09-24 17:16:34 +02:00
Nik Everett ce24115ba3
Speed up date_histogram by precomputing ranges (backport of #61467) (#62880)
A few of us were talking about ways to speed up the `date_histogram`
using the index for the timestamp rather than the doc values. To do that
we'd have to pre-compute all of the "round down" points in the index. It
turns out that *just* precomputing those values speeds up rounding
fairly significantly:
```
Benchmark  (count)      (interval)                   (range)            (zone)  Mode  Cnt          Score         Error  Units
before    10000000  calendar month  2000-10-28 to 2000-10-31               UTC  avgt   10   96461080.982 ±  616373.011  ns/op
before    10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt   10  130598950.850 ± 1249189.867  ns/op
after     10000000  calendar month  2000-10-28 to 2000-10-31               UTC  avgt   10   52311775.080 ±  107171.092  ns/op
after     10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt   10   54800134.968 ±  373844.796  ns/op
```

That's a 46% speed up when there isn't a time zone and a 58% speed up
when there is.

This doesn't work for every time zone, specifically those that have two
midnights in a single day due to daylight savings time will produce wonky
results. So they don't get the optimization.

Second, this requires a few expensive computation up front to make the
transition array. And if the transition array is too large then we give
up and use the original mechanism, throwing away all of the work we did
to build the array. This seems appropriate for most usages of `round`,
but this change uses it for *all* usages of `round`. That seems ok for
now, but it might be worth investigating in a follow up.

I ran a macrobenchmark as well which showed an 11% preformance
improvement. *BUT* the benchmark wasn't tuned for my desktop so it
overwhelmed it and might have produced "funny" results. I think it is
pretty clear that this is an improvement, but know the measurement is
weird:

```
Benchmark  (count)      (interval)                   (range)            (zone)  Mode  Cnt          Score         Error  Units
before    10000000  calendar month  2000-10-28 to 2000-10-31               UTC  avgt   10   96461080.982 ±  616373.011  ns/op
before    10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt   10  g± 1249189.867  ns/op
after     10000000  calendar month  2000-10-28 to 2000-10-31               UTC  avgt   10   52311775.080 ±  107171.092  ns/op
after     10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt   10   54800134.968 ±  373844.796  ns/op

Before:
|               Min Throughput | hourly_agg |        0.11 |  ops/s |
|            Median Throughput | hourly_agg |        0.11 |  ops/s |
|               Max Throughput | hourly_agg |        0.11 |  ops/s |
|      50th percentile latency | hourly_agg |      650623 |     ms |
|      90th percentile latency | hourly_agg |      821478 |     ms |
|      99th percentile latency | hourly_agg |      859780 |     ms |
|     100th percentile latency | hourly_agg |      864030 |     ms |
| 50th percentile service time | hourly_agg |     9268.71 |     ms |
| 90th percentile service time | hourly_agg |        9380 |     ms |
| 99th percentile service time | hourly_agg |     9626.88 |     ms |
|100th percentile service time | hourly_agg |     9884.27 |     ms |
|                   error rate | hourly_agg |           0 |      % |

After:
|               Min Throughput | hourly_agg |        0.12 |  ops/s |
|            Median Throughput | hourly_agg |        0.12 |  ops/s |
|               Max Throughput | hourly_agg |        0.12 |  ops/s |
|      50th percentile latency | hourly_agg |      519254 |     ms |
|      90th percentile latency | hourly_agg |      653099 |     ms |
|      99th percentile latency | hourly_agg |      683276 |     ms |
|     100th percentile latency | hourly_agg |      686611 |     ms |
| 50th percentile service time | hourly_agg |     8371.41 |     ms |
| 90th percentile service time | hourly_agg |     8407.02 |     ms |
| 99th percentile service time | hourly_agg |     8536.64 |     ms |
|100th percentile service time | hourly_agg |     8538.54 |     ms |
|                   error rate | hourly_agg |           0 |      % |
```
2020-09-24 11:03:47 -04:00
Daniel Mitterdorfer 00ce1d7e4b
Mute failing test in IndexRecoveryIT (#62865) (#62868)
Relates #62863
2020-09-24 15:16:40 +02:00
Daniel Mitterdorfer aec7c65af4
Mute DiskThresholdDeciderIT (#62858) (#62859)
Relates #62326
2020-09-24 13:24:11 +02:00
Julie Tibshirani f971146de4
Rename FieldValueRetriever -> FieldFetcher. (#62795) (#62836)
The name `FieldFetcher` fits better with the 'fetch' terminology we use
elsewhere, for example `FetchFieldsPhase` and `ValueFetcher`.

This PR also moves the construction of the fetcher off the context and onto
`FetchFieldsPhase`, which feels like a more natural place for it, and fixes a
TODO in javadocs.
2020-09-23 10:12:23 -07:00
Nhat Nguyen 38c8a55df8
Better UUID for reader context (#62799)
We can use a single and stronger UUID for all reader contexts
created by the same SearchService.

Backport of #62715
2020-09-23 12:50:18 -04:00
Julie Tibshirani 7ba0c95191 Mute ClusterHealthIT.testHealthOnMasterFailover while we await a fix. 2020-09-23 09:17:45 -07:00
Alan Woodward 7984e4e89f
Fix test bug in SpanMultiTermQueryBuilderTests (#62833)
This test checks to see if the index has been created before version 6.4, in which
case index prefixes are unavailable and so it expects to see a span multi-term
wrapper. However, the production code doesn't bother with checking for versions,
because if the field in question is configured with index_prefixes then it knows that
it must have been created post 6.4 (you can't merge in a new index_prefixes
configuration).

This commit alters the test to remove the random version checks, as we know we
will always have a prefix field available in this scenario.

Fixes #58199
2020-09-23 17:02:12 +01:00
Martijn van Groningen 0baefc8ddc
Always validate that only a create op is allowed in bulk api for data streams (#62820)
Backport #62766 to 7.x branch.

The bulk api cache the resolved concrete indices when resolving the user provided
index name into the actual index name. The validation that prevents write ops other
than create from being executed in a data stream was only performed if the result
wasn't cached. In case of cached resolvings, the validation never occurs.

The validation would be skipped for all bulk items for a data stream after a create
operation for that same data stream. This commit ensures that the validation is always
performed for all bulk items (whether the concrete index resolution has been cached or
not cached).

Closes #62762
2020-09-23 16:27:54 +02:00
Armin Braun a754fd8020
Fix CoordinatorTests.testLogsMessagesIfPublicationDelayed (#62815) (#62822)
We need to account for an addional `DEFAULT_DELAY_VARIABILITY` timeout for
the lag detector task to be executed after its scheduled.

Closes #62383
2020-09-23 14:23:28 +02:00
Christoph Büscher 29074e7055
Add case insensitive prefix and wildcard to 'version' field (#62754) (#62782)
This change adds support for the recently introduced case insensitivity flag for
wildcard and prefix queries. Since version field values are encoded differently we
need to adapt our own AutomatonQuery variation to add both cases if case insensitivity
is turned on.
2020-09-23 11:48:34 +02:00
Ignacio Vera 81645ec2cc
nextSetBit should check if the underlaying array contains the current word (#62805) (#62812)
This is a recent addition and it is missing a check as the underlaying array can be smaller that the numBits capacity.
2020-09-23 11:17:26 +02:00
Luca Cavanna 862fab06d3
Share same existsQuery impl throughout mappers (#57607)
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.

There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.

This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.

At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
2020-09-23 11:00:53 +02:00
Luca Cavanna 5ca86d541c
Move stored flag from TextSearchInfo to MappedFieldType (#62717) (#62770) 2020-09-23 09:40:34 +02:00
Nhat Nguyen 663b85b98f Make keep alive optional in PointInTimeBuilder (#62720)
Remove the keepAlive parameter from the constructor of PointInTimeBuilder
as it's optional.
2020-09-22 18:52:54 -04:00
Jay Modi cb1dc5260f
Dedicated threadpool for system index writes (#62792)
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serves as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.

Backport of #61655
2020-09-22 15:31:38 -06:00
Benjamin Trent 77bfb32635
[7.x] [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694) (#62784)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)

* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
 global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the interal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.

This commit addresses this issue for the ML plugin

* unmuting test
2020-09-22 15:07:08 -04:00
Rory Hunter 3f856d1c81 Prioritise recovery of system index shards (#62640)
Closes #61660. When ordering shard for recovery, ensure system index shards are
ordered first so that their recovery will be started first.

Note that I rewrote PriorityComparatorTests to use IndexMetadata instead of its
local IndexMeta POJO.
2020-09-22 15:48:27 +01:00
markharwood a0df0fb074
Search - add case insensitive flag for "term" family of queries #61596 (#62661)
Backport of fe9145f

Closes #61546
2020-09-22 13:56:51 +01:00
Armin Braun 0d5250c99b
Add Trace Logging to File Restore (#62755) (#62761)
Requested by the performance team and generally potentially useful
to log each file at `TRACE` like we do for snapshot create.
2020-09-22 14:44:40 +02:00
Amogh Mishra bc6bea5924 Remove node from cluster when node locks broken (#61400)
In #52680 we introduced a mechanism that will allow nodes to remove
themselves from the cluster if they locally determine themselves to be
unhealthy. The only check today is that their data paths are all
empirically writeable. This commit extends this check to consider a
failure of `NodeEnvironment#assertEnvIsLocked()` to be an indication of
unhealthiness.

Closes #58373
2020-09-22 10:08:41 +01:00
Armin Braun aa0dc56412
Ensure MockRepository is Unblocked on Node Close (#62711) (#62748)
`RepositoriesService#doClose` was never called which lead to
mock repositories not unblocking until the `ThreadPool` interrupts
all threads. Thus stopping a node that is blocked on a mock repository operation wastes `10s`
in each test that does it (which is quite a few as it turns out).
2020-09-22 11:00:18 +02:00
Armin Braun 4bdbc39e9f
Fix testQueuedSnapshotOperationsAndBrokenRepoOnMasterFailOverMultiple (#62713) (#62747)
There's possible retries here that work out if both the snapshot and the delete
operation are retried when master shuts down and hits the unlikely case of the retried delete
executing before the retried snapshot, making both operations pass.

Closes #62686
2020-09-22 10:42:11 +02:00
Luca Cavanna 9ae29713fd
Dense vector field type minor fixes (#62631)
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).

This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
2020-09-22 10:40:51 +02:00
Ignacio Vera 265387f348
override needsScore() on ValueCountAggregator (#62683) (#62745) 2020-09-22 08:47:16 +02:00
Yang Wang 897d2e8a02
Fix ccs permission for search with a scroll id (#62053) (#62695)
CCS with remote indices only does not require any privileges on the local cluster.
This PR ensures that search with scroll follow the permission model.
2020-09-22 11:49:40 +10:00
Jim Ferenczi 1fc78d430b Fix terms aggregation ordering after the final reduce (#62732)
This commit ensures that the final order of the terms aggregations
is registered correctly after the final reduce.
This bug was introduced in #62028 which is not released yet so this PR is marked
as a non-issue.
This issue was discovered when running a terms aggregation under an auto-date
histogram. In such a case, the auto-date histogram may run multiple final reduce
to merge buckets together. This change makes sure that running multiple final reduces
doesn't create duplicates but it doesn't fix the fact that the final reduce may prune
the list of terms prematurely. This other bug is tracked separately in #62731.
2020-09-22 00:03:04 +02:00
Nhat Nguyen f9f4d87437 Remove invalid assertion in SearchService (#62675)
This assertion does not always hold because there can be a race between
`putReaderContext` and `afterIndexRemoved` when an index is deleted.

Closes #62624
2020-09-21 16:29:00 -04:00
Ignacio Vera cadd5dc53f
Fix bug when initializing HyperLogLogPlusPlusSparse (#62602) (#62702)
This is a follow up of #62480 where we are oversizing one array when initialising. In addition it prevents a possible CircuitBreaker leak during initialisation.
2020-09-21 17:30:40 +02:00
Armin Braun 13e28b85ff
Speed up RepositoryData Serialization (#62684) (#62703)
Make serializing `RepositoryData` a little faster and split up/document the code for it a little
as well given how massive this method has gotten at this point.
2020-09-21 17:29:56 +02:00
Dan Hermann a06339ffae
Fix NPE when deleting multiple backing indices on a data stream (#62274) (#62708) 2020-09-21 10:26:47 -05:00
Alan Woodward 1dde4983f6 Convert ConstantKeywordFieldMapper to parametrized form (#62688)
As part of the conversion, adds the ability to customize merge validation - in this case, we
allow an update to the constant value if it is currently set to null, but refuse further
updates once it has been set once.

This commit also converts ParametrizedMapperTests to use MapperServiceTestCase.
2020-09-21 15:22:56 +01:00
Henning Andersen 0c4cfe4c44 Cardinality request breaker leak (#62685)
If HyperLogLogPlusPlus failed during construction, it would
not release already allocated resources, causing the request
circuit breaker to not be adjusted down.

Closes #62439
2020-09-21 15:54:04 +02:00
Christoph Büscher 803f78ef05
Add field type for version strings (#59773) (#62692)
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similar to a
'keyword' field but allows efficient sorting and range queries that take into
accound the special ordering needed for version strings. For example, the main
version parts are sorted numerically (ie 2.0.0 < 11.0.0) whereas this wouldn't
be possible with 'keyword' fields today.

Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consiting of
major.minor.patch, we allow less or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.

Relates to #48878
2020-09-21 14:25:42 +02:00
Alan Woodward 178b25fc4b Fix standard filter BWC check to allow for cacheing bug (#62649)
The `standard` tokenfilter was removed by #33310, and should have been
unuseable in any indexes created since 7.0. However, a cacheing bug fixed
by #51092 meant that it was still possible in certain circumstances to create
indexes referencing the standard filter in versions up to 7.5.2. Our checks
in AnalysisModule still refer to 7.0.0, however, meaning that a cluster that
contains one of these rogue indexes cannot be upgraded.

This commit adjusts the AnalysisModule checks so that we only refuse to
build a mapping referring to standard filter if the index created version is
7.6 or later.

Fixes #62644
2020-09-21 10:12:55 +01:00
Henning Andersen 9a77f41e55 Fix cluster health when closing (#61709)
When master shuts down it's cluster service, a waiting health request
would fail rather than fail over to a new master.
2020-09-19 10:02:36 +02:00
Luca Cavanna 00272ea877
Remove cache key renderer argument from IndicesRequestCache (#62534)
In the context of of a recurring test failure tracked by #32827, we added trace logging and an extra cache key renderer argument to IndicesRequestCache#getOrCompute (see #39475 and #34180).

We addressed the issue with #54071, but the extra argument was left behind, with a NORELEASE comment saying it should be removed.

With this commit, we remove the extra cache key rendered argument and the corresponding log lines which are not so useful without it.

Closes #55837
2020-09-19 00:24:02 +02:00
Lee Hinman 4a08928c47
[7.x] Add index.routing.allocation.include._tier_preference setting (#62589) (#62667)
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:

```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```

If the cluster contains any nodes with the `data_hot` role, the decider will only allow them to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.

This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.

Subsequent work will change the ILM migration to make additional use of this setting.

Relates to #60848
2020-09-18 15:41:36 -06:00
Christos Soulios 6a298970fd
[7.x] Allow metadata fields in the _source (#62616)
Backports #61590 to 7.x

    So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set

    This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not.
    This way each metadata field can configure if it can be included in the document _source
2020-09-18 19:56:41 +03:00
Alan Woodward 17aabaed15 Fix warning on boost docs and warning message on non-implementing fieldmappers 2020-09-18 16:45:08 +01:00