5327 Commits

Author SHA1 Message Date
Armin Braun eae6a3b18e
Fix testMappingVersionAfterDynamicMappingUpdate (#62352) (#62360)
There is a race in this test where the index request will return
once the dynamic mapping update has been observed by the cluster
state observer used internally by indexing but has not yet hit all
state appliers and thus isn't showing up as the applied state returned
by `clusterService.state()` yet.
2020-09-15 11:59:22 +02:00
Alan Woodward a68f7077c7 Rationalise fetch phase exceptions (#62230)
We have a special FetchPhaseExecutionException which contains some useful
information about which shard and doc a fetch phase has failed in. However, this
is not used in many places - currently only the ExplainPhase and the highlighters
throw one, and the FetchPhase itself catches IOExceptions and just passes them
to the ExceptionsHelper with no extra context.

This commit changes FetchPhase to throw FetchPhaseExecutionException if it
encounters problems in any of its subphases, and removes the special handling
from the explain and highlight phases. It also removes the need to pass shard ids
around when building HitContext objects.
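
A minimal Python sketch of the wrapping pattern described above, not the actual Java change. `FetchPhaseError`, `run_subphases`, and `failing_highlight` are hypothetical names chosen for illustration; the real class is `FetchPhaseExecutionException`.

```python
class FetchPhaseError(Exception):
    """Hypothetical stand-in for FetchPhaseExecutionException."""

    def __init__(self, shard, doc_id, message):
        super().__init__(f"shard [{shard}] doc [{doc_id}]: {message}")
        self.shard = shard
        self.doc_id = doc_id


def run_subphases(shard, doc_id, subphases):
    """Run each sub-phase for one hit and attach shard/doc context to any failure."""
    for subphase in subphases:
        try:
            subphase(doc_id)
        except FetchPhaseError:
            raise  # already carries shard and doc context
        except Exception as e:
            # Wrap in one place so sub-phases need no special handling of their own.
            raise FetchPhaseError(shard, doc_id, f"{subphase.__name__} failed: {e}") from e


def failing_highlight(doc_id):
    """Hypothetical sub-phase that fails with a low-level I/O error."""
    raise IOError("cannot read stored field")
```

Because the wrapping happens in a single loop, individual sub-phases can raise plain exceptions and the caller still learns which shard and document were involved.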
2020-09-15 09:28:19 +01:00
Alan Woodward 8089210815 Some small cleanups in TermVectorsService (#62292)
We removed the use of aggregated stats from term vectors back in #16452, but there is
a bunch of dead code left here which can be stripped out.
2020-09-15 09:01:49 +01:00
Ignacio Vera 3536f7f7c2
Initialize BitArray storage as number of bits (#62327) (#62354) 2020-09-15 08:34:22 +02:00
Armin Braun c81a076f5a
Improve Efficiency of ClusterApplierService Iteration (#62282) (#62350)
The complexity of removing a timeout listener was `O(n)` which
means that in case of many queued up CS update tasks (such as in the
case of an avalanche of dynamic mapping updates) we're dealing with
quadratic complexity for timing out N tasks which was observed to be
an issue in practice.

This PR makes the complexity of timing out a task `O(1)` and generally
simplifies the iteration logic of listeners and appliers to be a little
more efficient and to inline better.
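
The complexity argument can be sketched in Python. The commit message does not say which data structure the Java code uses, so the dict-based `ListenerRegistry` below is an assumption made purely for illustration.

```python
class ListenerRegistry:
    """Hypothetical registry of timeout listeners, keyed by a handle."""

    def __init__(self):
        self._listeners = {}  # insertion-ordered dict: handle -> listener
        self._next_handle = 0

    def add(self, listener):
        """Register a listener and return the handle needed to remove it later."""
        handle = self._next_handle
        self._next_handle += 1
        self._listeners[handle] = listener
        return handle

    def remove(self, handle):
        # Dict deletion is O(1) on average. Removing from a plain list or queue
        # would scan it, so timing out N queued tasks would cost O(N^2) overall.
        return self._listeners.pop(handle, None)

    def __iter__(self):
        # Iterate over a snapshot so listeners may be removed while iterating.
        return iter(list(self._listeners.values()))
```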
2020-09-15 05:59:48 +02:00
Julie Tibshirani f56ce4f39b
Fix failure in InnerHitBuilderTests around 'fields' option. (#62344)
The case InnerHitBuilderTests#testEqualsAndHashcode creates a copy of the object
by serializing + deserializing it, then applies a modification. If the 'fields'
list is empty, then deserializing it results in Collections.emptyList. Because
this is immutable, modifying it can throw an UnsupportedOperationException.

This PR takes the same approach as for docvalue_fields, where we create a new
list instead of trying to add to an empty one.
2020-09-14 15:39:03 -07:00
Julie Tibshirani 4a19bdb2ea
Support the 'fields' option in inner_hits and top_hits. (#62337)
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/child docs and field collapsing
* The `top_hits` aggregation

Addresses #61949.
2020-09-14 11:51:45 -07:00
David Turner 9acd2fd1fd Minor cleanups to BytesReferenceStreamInput (#62302)
Followup to #61681:

- reuse the current iterator in `reset()` if possible
- simplify some integer-overflow-avoidance in `skip()`
- clarify some comments
- address some IntelliJ warnings
2020-09-14 17:02:27 +01:00
Christoph Büscher e2eada2498
Fix disabling `allow_leading_wildcard` (#62300) (#62318)
Disabling the `query_string` query's `allow_leading_wildcard` parameter didn't
work after a change probably introduced in #60959 because the various field types'
`wildcardQuery` methods don't check the leading characters like
QueryParserBase#getWildcardQuery does. This PR adds the missing check
before calling the field type's wildcard-generating method.
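
A rough Python sketch of such a check, assuming the usual `*` and `?` wildcard characters. The function name and error message are hypothetical and are not taken from the Java code.

```python
def check_leading_wildcard(term, allow_leading_wildcard):
    """Reject terms that start with a wildcard when leading wildcards are disabled."""
    # term[:1] is "" for an empty term, so empty input passes through untouched.
    if not allow_leading_wildcard and term[:1] in ("*", "?"):
        raise ValueError("'*' or '?' not allowed as first character of a wildcard term")
    return term
```

The point of the fix is that this check has to run before the field type builds its own wildcard query, because the field type does not repeat it.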

Closes #62267
2020-09-14 17:13:17 +02:00
Alan Woodward 5358cee29c Cut over more mapping tests to MapperServiceTestCase (#62312)
Shaves a few more seconds off the build.
2020-09-14 16:00:37 +01:00
Armin Braun 95766da345
Save Some Allocations when Working with ClusterState (#62060) (#62303)
Just a number of obvious spots where we were allocating
duplicate empty structures or were otherwise inefficient, which I
found while investigating snapshot cluster state update performance.
2020-09-14 15:09:54 +02:00
Armin Braun 875af1c976
Remove Dead Variable in BlobStoreIndexShardSnapshots. (#62285) (#62295)
This was never used.

Co-authored-by: Howard <danielhuang@tencent.com>
2020-09-14 13:40:39 +02:00
Luca Cavanna 53bf057a53 [TEST] avoid double null check in TransportSearchActionTests 2020-09-11 10:10:09 +02:00
Nhat Nguyen aafb2cb812 Support point in time cross cluster search (#61827)
This commit integrates point in time into cross cluster search.

Relates #61062
Closes #61790
2020-09-10 19:25:48 -04:00
Nhat Nguyen 808c8689ac Always include the matching node when resolving point in time (#61658)
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.

Closes #61627
2020-09-10 19:25:48 -04:00
Nhat Nguyen 035f0638f4 Support point in time in async_search (#61560)
This commit integrates point in time into async search and
ensures that it works correctly with security enabled.

Relates #61062
2020-09-10 19:25:48 -04:00
Nhat Nguyen 063a6d047c Release search context when scroll keep_alive is too large (#62179)
Previously, we closed related search contexts if the keep_alive of a scroll was too large.
But we accidentally changed this behavior in #62061.
2020-09-10 19:25:48 -04:00
Nhat Nguyen 2eb1e8bc84 Make keep alive of point in time optional in search (#62184)
A search request should not be required to extend the keep_alive of a point in time. 
This change makes that parameter optional.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 3fc35aa76e Shard Search Scroll failures consistency (#62061)
Today some uncaught shard failures such as RejectedExecutionException skip the release of the shard context
and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced,
this behavior can lead to missing data since scrolls always move forward.
In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure
occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures
in the responses.
This change also modifies the retry tests of the reindex feature. Reindex retries scroll search requests that contain a shard failure and
moves on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above.
That means that reindex will now report scroll failures when search rejections happen during the operation instead of skipping documents
silently.
Finally this change removes an old TODO that was fulfilled with #61062.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 4d528e91a1 Ensure validation of the reader context is executed first (#61831)
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext`)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.

Relates #61446

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2020-09-10 19:25:48 -04:00
Luca Cavanna 44bd4a6004 Fix point in time toXContent impl (#62080)
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.

This commit fixes this and adds tests for it.
2020-09-10 19:25:47 -04:00
Nhat Nguyen 3d69b5c41e Introduce point in time APIs in x-pack basic (#61062)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes an `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` has
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
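
The open / search / close lifecycle above can be sketched as three request builders in Python. The helper names and the `(method, path, body)` tuple shape are assumptions for illustration only; the endpoints and body fields are the ones shown in the examples above, and nothing is sent over the network.

```python
def open_pit(index, keep_alive):
    """Build the request that opens a point in time on an index."""
    return ("POST", f"/{index}/_pit?keep_alive={keep_alive}", None)


def search_with_pit(pit_id, query, keep_alive):
    """Build a search request that runs against an already opened point in time."""
    return ("POST", "/_search", {
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    })


def close_pit(pit_id):
    """Build the request that closes a point in time once it is no longer needed."""
    return ("DELETE", "/_pit", {"id": pit_id})
```

A caller would take the `id` from the response to `open_pit`, reuse it for every `search_with_pit` call, and finish with `close_pit` rather than waiting for the `keep_alive` to expire.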

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
2020-09-10 19:25:47 -04:00
Armin Braun e0a81f7d14
Speed up Version Checks (#62216) (#62253)
The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large
in many of the contexts it's used in, and it was consuming a surprising amount of cycles for computing the
min compat versions.

-> extract cold path from `fromId` to make JIT happy and cache minimum compatible versions to fields.
2020-09-10 22:57:06 +02:00
Armin Braun 25db5acb0d
Simplify TimeValue Serialization (#62023) (#62248)
This can be done without map lookups => less code and much smaller methods => better inlining potentially.
2020-09-10 20:16:21 +02:00
Armin Braun 7b941a18e9
Optimize Snapshot Shard Status Update Handling (#62070) (#62219)
Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations.
Also this sets up some simplifications made in the clone snapshot branch.
2020-09-10 16:29:16 +02:00
Ignacio Vera c8981ea93d
upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213) (#62222) 2020-09-10 16:23:18 +02:00
Igor Motov b6bff56a56
Fix hard_bounds interval handling (#62129) (#62188)
The hard bounds were incorrectly scaled for intervals, which was
causing incorrect buckets to show up or no buckets at all for
intervals other than 1.

Closes #62126
2020-09-09 15:42:12 -04:00
Nik Everett 1104d65465
Fix bug with terms' min_doc_count (#62130) (#62177)
The `global_ordinals` implementation of `terms` had a bug when
`min_doc_count: 0` that'd cause sub-aggregations to have array index out
of bounds exceptions. Ooops. My fault. This fixes the bug by assigning
ordinals to those buckets.

Closes #62084
2020-09-09 13:04:51 -04:00
Armin Braun 6710104673
Fix Creating NOOP Tasks on SNAPSHOT Pool (#62152) (#62157)
Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly.
Especially when it comes to mixed master+data nodes and concurrent snapshots these
hurt delete operation performance needlessly.
2020-09-09 14:05:17 +02:00
Luca Cavanna fbf0967e20 QueryPhaseResultConsumer to call notifyPartialReduce (#62083)
As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback.

This commit fixes the call and adds a test for this scenario.
2020-09-09 13:44:07 +02:00
Luca Cavanna ad83261348 Print out search request as part of async search task description (#62057)
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, the async search task doesn't, although it should.

With this commit we address that while also making sure that the description highlights that the task originates from an async search.

Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
2020-09-09 13:44:07 +02:00
Rory Hunter b7fd7cf154
Write deprecation logs to a data stream (#61966)
Backport of #58924.

Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream
as well as to disk.
2020-09-09 12:16:28 +01:00
Armin Braun ed4984a32e
Remove Redundant Stream Wrapping from Compression (#62017) (#62132)
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
2020-09-09 03:27:38 +02:00
Nik Everett b8e9a7125f
Speed up empty highlighting many fields (backport of #61860) (#62122)
Kibana often highlights *everything* like this:
```
POST /_search
{
  "query": ...,
  "size": 500,
  "highlight": {
    "fields": {
      "*": { ... }
    }
  }
}
```

This can get slow when there are hundreds of mapped fields. I tested
this locally and unscientifically and it took a request from 20ms to
150ms when there are 100 fields. I've seen clusters with 2000 fields
where simple searches go from 500ms to 1500ms just by turning on this sort
of highlighting. Even when the query is just a `range` and the
fields are all numbers and stuff so it won't highlight anything.

This speeds up the `unified` highlighter in this case in a few ways:
1. Build the highlighting infrastructure once per field rather than once per
   document per field. This cuts out a *ton* of work analyzing the query
   over and over and over again.
2. Bail out of the highlighter before loading values if we can't produce
   any results.

Combined these take that local 150ms case down to 65ms. This is unlikely
to be really useful when there are only a few fetched docs and only a
few fields, but we often end up having many fields with many fetched
docs.
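
The two optimizations can be sketched in Python. This is an illustration of the loop structure only, not the `unified` highlighter itself; `highlight_all` and the `build_highlighter` callback are hypothetical names.

```python
def highlight_all(docs, fields, build_highlighter):
    """Highlight every field of every doc, building each highlighter only once."""
    results = {}
    for field in fields:
        # Expensive step (analyzing the query) runs once per field,
        # not once per document per field.
        highlighter = build_highlighter(field)
        if highlighter is None:
            # Bail out before loading any values: this field can't match.
            continue
        for doc_id, doc in docs.items():
            snippet = highlighter(doc.get(field))
            if snippet:
                results.setdefault(doc_id, {})[field] = snippet
    return results
```

With F fields and D fetched docs this performs F highlighter builds instead of F x D, which is where most of the saving would come from when both numbers are large.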
2020-09-08 15:49:50 -04:00
Alan Woodward 28fd4a2ae8 Convert RangeFieldMapper to parametrized form (#62058)
This also adds the ability to define a serialization check on Parameters, used
in this case to only serialize format and locale parameters if the mapper is a
date range.
2020-09-08 18:44:13 +01:00
Alan Woodward 5f05eef7e3 Convert some more mapping tests to MapperServiceTestCase (#62089)
We don't need to extend ESSingleNodeTestCase for all these tests.
2020-09-08 17:51:40 +01:00
Tim Brooks 075271758e
Keep checkpoint file channel open across fsyncs (#61744)
Currently we open and close the checkpoint file channel for every fsync.
This file channel can be kept open for the lifecycle of a translog
writer. This avoids the overhead of opening the file, checking file
permissions, and closing the file on every fsync.
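
A minimal Python sketch of the same idea, assuming a writer object that owns the file for its whole lifetime. `CheckpointWriter`, `demo`, and the checkpoint contents are hypothetical; the real change is in the Java translog code.

```python
import os
import tempfile


class CheckpointWriter:
    """Hypothetical writer that opens the checkpoint file once and reuses it."""

    def __init__(self, path):
        self._file = open(path, "wb")  # opened once, not on every fsync

    def write_checkpoint(self, data):
        self._file.seek(0)
        self._file.write(data)
        self._file.truncate()          # drop leftovers from a longer, older checkpoint
        self._file.flush()
        os.fsync(self._file.fileno())  # make the checkpoint durable

    def close(self):
        self._file.close()


def demo():
    """Write two checkpoints through one open file and return the final contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "translog.ckp")
        writer = CheckpointWriter(path)
        try:
            writer.write_checkpoint(b"generation=1 offset=1000")
            writer.write_checkpoint(b"generation=2")
        finally:
            writer.close()
        with open(path, "rb") as f:
            return f.read()
```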
2020-09-08 08:54:53 -06:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
Armin Braun ebd1569028
Fix testMasterFailOverWithQueuedDeletes (#62062) (#62078)
Fixing very rare corner case where the delete retry is slow.

Closes #62031
2020-09-08 10:35:06 +02:00
Nhat Nguyen bb0a583990
Allow enabling soft-deletes on restore from snapshot (#62018)
Closes #61969
2020-09-07 09:45:36 -04:00
Alan Woodward cbc9578cbd Remove SearchPhase interface (#62050)
The interface is never used as an abstraction - implementations are called directly,
and most of them don't need to implement the preProcess method.
2020-09-07 13:45:43 +01:00
David Turner 3389d5ccb2 Introduce integ tests for high disk watermark (#60460)
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-09-07 14:39:39 +02:00
Armin Braun 395538f508
Improve Snapshot State Machine Performance (#62000) (#62049)
Just a few random things to optimize motivated by somewhat sub-standard performance
for large snapshot cluster states with many concurrent snapshots observed in production.
2020-09-07 13:25:40 +02:00
Jim Ferenczi fa8e76abb1
Improve reduction of terms aggregations (#61779) (#62028)
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
looking up every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).
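
A small Python sketch of a merge-sort style reduction, assuming each shard returns its buckets already sorted by term. `reduce_terms` and the `(term, doc_count)` tuple shape are hypothetical and only illustrate the strategy.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_terms(shard_results):
    """Merge per-shard (term, doc_count) lists that are each sorted by term."""
    # heapq.merge streams the sorted inputs in order, so equal terms from
    # different shards end up adjacent and no global map lookup is needed.
    merged = heapq.merge(*shard_results, key=itemgetter(0))
    return [
        (term, sum(count for _, count in group))
        for term, group in groupby(merged, key=itemgetter(0))
    ]
```

For example, `reduce_terms([[("apple", 3), ("banana", 1)], [("apple", 2), ("cherry", 4)]])` yields one bucket per term with the doc counts summed.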

Relates #51857
2020-09-07 13:13:20 +02:00
Alan Woodward a295b0aa86 Fix null_value parsing for date_nanos field mapper (#61994)
The null_value parameter for date fields is always parsed using DateFormatter.parseMillis,
which is incorrect for nanosecond resolution fields. This commit changes the parsing logic
to always use DateFieldType.parse() to parse the null value.
2020-09-07 10:58:54 +01:00
Alan Woodward 1799c0c583 Convert completion, binary, boolean tests to MapperTestCase (#62004)
Also fixes a metadata serialization bug in CompletionFieldMapper.
2020-09-07 10:48:20 +01:00
Luca Cavanna 0c8b438577
Add support for runtime fields (#61776)
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.

We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows defining runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated with runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-09-07 09:14:53 +02:00
Howard b26584dff8 Remove unused deciders in BalancedShardsAllocator (#62026) 2020-09-07 00:04:16 -04:00
Armin Braun 1e3edbbe74
Simplify BytesReference StreamInput (#61681) (#62014)
Flattening both streams into a single stream here saves a few objects and some indirection.
Also, removed the redundant `offset` field which added nothing but complexity by forcing the
incrementation of two counters on every read.
2020-09-05 10:45:52 +02:00
Ryan Ernst 6d3b691048
Add snapshot only test modules (#61954)
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added against their accidental inclusion in release builds.

Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
2020-09-04 16:35:18 -07:00