Commit Graph

5188 Commits

Author SHA1 Message Date
markharwood 66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Nik Everett ce9c5f0e46 Fix diversified sample tests
The test assumed that the aggregator only ran once but we turned that
off. This turns it back on.
2020-08-11 17:49:43 -04:00
Jay Modi 2fa6448a15
System index reads in separate threadpool (#60927)
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to lesser of half of the available processors
or 5. Given the combination of both get and read operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.

In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have
flag added to their metadata upon upgrade to a version with this
capability.

Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.

Relates #50251
Relates #37867
Backport of #57936
2020-08-11 12:16:34 -06:00
Julie Tibshirani a93be8d577
Handle nested arrays in field retrieval. (#60981)
We accept _source values with multiple levels of arrays, such as
`"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested
arrays by unwrapping the arrays before parsing the values.
2020-08-11 10:22:16 -07:00
Mark Tozzi ab8518fb5b
[7.x] Extensibility for Composite Agg #59648 (#60842) 2020-08-11 09:14:33 -04:00
Alan Woodward 54279212cf
Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924)
This commit cuts over all metadata field mappers to parametrized format.
2020-08-11 09:02:28 +01:00
Armin Braun 3e2dfc6eac
Remove GCS Bucket Exists Check (#60899) (#60914)
Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS.
We don't need to do the bucket exists check before using the repo, that just needlessly
increases the necessary permissions for using the GCS repository.
2020-08-11 09:54:27 +02:00
Julie Tibshirani d51eae6e9f
Prevent loading 'fields' with stored fields disabled. (#60938)
Because the 'fields' option loads from _source (which is a stored field), it is
not possible to retrieve 'fields' when stored_fields are disabled.

This also fixes #60912, where setting stored_fields: _none_ prevented the
_ignored fields from being loaded and caused a parsing exception.
2020-08-10 15:40:27 -07:00
Nik Everett 0286d0a769
Move distance_feature query building into MFT (#60614) (#60846)
This moves the `distance_feature` query building out of
`DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`.
Without this we don't have a chance of supporting this for runtime
fields. In general I'm not sad to see the `instanceof`s go.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 16:05:17 -04:00
Julie Tibshirani b216340f50
Make `FetchPhase` logic more readable. (#60779)
* Factor out FieldsVisitor#postProcess call.
* Swap logical order for normal and nested documents.
* Extract the method createStoredFieldsVisitor.
2020-08-10 11:04:54 -07:00
Nik Everett dfd502f9ca
Rework checking if a year is a leap year (#60585) (#60790)
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 12:45:34 -04:00
Jim Ferenczi f30f1f04e2
Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816)
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
2020-08-10 17:23:00 +02:00
David Turner f44c28b595
Deprecate and ignore join timeout (#60872)
There is no point in timing out a join attempt any more once a cluster
is entirely in 7.x. Timing out and retrying with the same master is
pointless, and an in-flight join attempt to one master no longer blocks
attempts to join other masters. This commit deprecates this unnecessary
setting and removes its effect from the joining process.

Relates #60873 which removes this setting in master.
2020-08-10 13:57:41 +01:00
Martijn van Groningen 64bb082f9b
Improve error message for non append-only writes that target data stream (#60874)
Backport of #60809 to 7.x branch.

Closes #60581
2020-08-10 13:18:59 +02:00
Alan Woodward e8d9185045 Cut over IPFieldMapper to parametrized form (#60602)
This commit makes IpFieldMapper extend ParametrizedFieldMapper. It also
updates the IpFieldMapper docs to add the ignore_malformed parameter,
which was not previously documented.
2020-08-10 11:01:10 +01:00
David Turner 1f49e0b9d0 Fix testRerouteOccursOnDiskPassingHighWatermark (#60869)
Sometimes this test would refresh the disk stats so quickly that it hit
the refresh rate limiter even though it was almost completely disabled.
This commit allows the rate limiter to be completely disabled.

Closes #60587
2020-08-10 09:39:44 +01:00
Ryan Ernst ddcfbec569
Add assert message for multiple lines in osprobe (#60796)
Several /proc files are expected to contain a single line. We assert on
this in tests, but the contents of the file are lost and the assertion
therefore lacks important information to debug why the file appeared to
have multiple lines. This commit dumps the contents of the file on
assertion failure.

relates #59284
2020-08-06 15:53:30 -07:00
Ryan Ernst fc38af363e
Ensure latch is counted down when assertion trips (#60800)
The ReloadSecureSettingsIT uses latches to ensure coordination across
requests to the underlying in memory cluster. However, in the case of an
expected failure, if the assertion fails, the latch will never be
counted down, and will cause the test to hang indefinitely. This commit
ensures the latch is always counted down with a try/finally.

relates #51546
2020-08-06 15:33:46 -07:00
Jim Ferenczi 98119578a1 Disable sort optimization on search collapsing (#60838)
Collapse search queries that sort by a field can throw
an ArrayStoreException due to a bug in the [sort optimization](https://github.com/elastic/elasticsearch/pull/51852)
introduced in 7.7.0. Search collapsing were not supposed to
be eligible for this sort optimization so this change explicitly
filters them from this new feature.
2020-08-06 21:37:12 +02:00
Jim Ferenczi 14980ff97e Fix AOOBE when setting min_doc_count to 0 in significant_terms (#60823)
This commit fixes the computation of the subset size on empty buckets (doc count of 0).
The aggregator test refactoring in #60683 revealed this bug.
2020-08-06 18:57:09 +02:00
David Turner 721198c29e Increase logging in testRerouteOccursOnDiskPassingHighWatermark (#60817)
Relates #60587
2020-08-06 14:08:09 +01:00
Armin Braun a2c7991e96
Fix CompressibleBytesOutputStreamTests (#60815) (#60822)
Since #60730 the `bytes` field can be `null`. This adds the missing `null` check to the test
override.

Closes #60814
2020-08-06 15:07:48 +02:00
David Turner 273a6f916d AwaitsFix for #60814 2020-08-06 12:56:28 +01:00
Tim Brooks 2f76c48ea7
Propagate forceExecution when acquiring permit (#60634)
Currently the transport replication action does not propagate the force
execution parameter when acquiring the indexing permit. The logic to
acquire the index permit supports force execution, so this parameter
should be propagate. Fixes #60359.
2020-08-05 15:57:40 -06:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

-  The Directory implementation that's provided by SearchableSnapshots mark the
    snapshot files as reused during recovery. In order to keep track of the
    recovery process as the cache is pre-warmed, those files shouldn't be marked
    as reused.
 - Once the shard is created, the cache starts its pre-warming phase, meaning that
    we should keep track of those downloads during that process and tie the recovery
    to this pre-warming phase. The shard is considered recovered once this pre-warming
    phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Jake Landis f3752ba1d5
7.x suport new path for re-index java-api doc (#60319)
This commit uses the new location for the reindex java-api documentation.
Temporary files have been left behind to pacify the docs build.

related #60339
2020-08-05 09:05:07 -05:00
Armin Braun ebfb93ff26
Improve some BytesStreamOutput Usage (#60730) (#60736)
* Stop redundantly creating a `0` length `ByteArray` that is never used
* Add efficient way to get a minimal size copy of the bytes in a `BytesStreamOutput`
* Avoid multiple redundant `byte[]` copies in search cache key creation
2020-08-05 15:51:06 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Igor Motov 959690a64a
Refactor extendedBounds to use DoubleBounds (#60556) (#60681)
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
2020-08-04 16:45:47 -04:00
Alan Woodward b3ae5d26bd
Move mapper validation to the mappers themselves (#60072) (#60649)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.

This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
2020-08-04 14:39:20 +01:00
Armin Braun 212ce22d15
Optimize CS Persistence Stream Use (#60643) (#60647)
In the metadata persistence logic we failed to override the bulk write
method on the FilterOutputStream resulting in all the writes to it
running byte-by-byte in a loop adding a large number of bounds checks
needlessly.
2020-08-04 15:06:57 +02:00
Armin Braun 859ad761bb
Fix Broken Stream Close in writeRawValue (#60625) (#60644)
Small oversight in #56078 that only showed up during backporting where a stream copy was turned from a non-closing to a closing one. Enhanced part of a test in this PR to make it show up in master also even though we practically never use this method with stream targets that actually close.
2020-08-04 13:39:52 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Julie Tibshirani f99584c6f3
Avoid reloading _source for every inner hit. (#60632)
Previously if an inner_hits block required _ source, we would reload and parse
the root document's source for every hit. This PR adds a shared SourceLookup to
the inner hits context that allows inner hits to reuse parsed source if it's
already available. This matches our approach for sharing the root document ID.

Relates to #32818.
2020-08-03 17:12:27 -07:00
Julie Tibshirani fc63f8224f
Simplify class hierarchy for ordinals field data. (#60606)
This PR simplifies the hierarchy for ordinals field data classes:
* Remove `AbstractIndexFieldData`, since only `AbstractIndexOrdinalsFieldData`
inherits directly from it.
* Make `SortedSetOrdinalsIndexFieldData` extend
`AbstractIndexOrdinalsFieldData`. This lets us remove some redundant code.
2020-08-03 09:58:29 -07:00
Yannick Welsch 3409e019d2 Ignore shutdown when retrying recoveries (#60586)
Avoids failures when shutting down a node.
2020-08-03 15:14:38 +02:00
Nik Everett 2cde43b799
Allows nanosecond resolution in search_after (backport of #60328) (#60426)
Allows nanosecond resolution in search_after (#60328)

This fixes `search_after` to properly parse string formatted dates that
have nanosecond resolution.

Closes #52424
2020-08-03 08:17:48 -04:00
David Turner d2ddf8cd6a Improve deserialization failure logging (#60577)
Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

    java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the `TransportService`
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with `ContextPreservingActionListener`
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates #38939
2020-08-03 11:51:01 +01:00
Armin Braun 3270cb3088
More Efficient Writes for Snapshot Shard Generations (#60458) (#60575)
Same as #59905 but for shard level metadata. Since we wnat to retain
the ability to do safe+atomic writes for non-uuid shard generations
this PR has to create two separate write paths for both kinds of
shard generations.
2020-08-03 11:11:36 +02:00
Armin Braun 204efe9387
Add Repository Setting to Disable Writing index.latest (#60448) (#60576)
Writing the `index.latest` blob is unnecessary unless the contents of the repository
are to be used as a URL-repository. Also, in some edge cases, the fact that `index.latest` is the only
blob in the repository that regularly gets overwritten was causing compatibility issues with
some backing blobstores (Azure no-overwrite policy, Hitachy S3 equivalent).
=> this commit changes behavior to make snapshots not fail if writing `index.latest` fails
and adds a setting to disable writing `index.latest`.
2020-08-03 11:11:24 +02:00
Andrei Dan ac258f10d6
Data streams: throw ResourceAlreadyExists exception (#60518) (#60536)
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.

(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-01 16:31:09 +01:00
Julie Tibshirani f1d4fd8c3e Correct name of IndexFieldData#loadGlobalDirect. (#60492)
It seems 'localGlobalDirect' was just a typo.
2020-07-31 10:53:21 -07:00
Jim Ferenczi 8db896d290 Fix race condition in SearchPhaseControllerTests#testPartialMergeFailure (#60488)
This change ensures that we call the listener for partial merge failure **before**
calling the completion listener in order to avoid race condition in tests.

Closes #60446
2020-07-31 16:29:20 +02:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Julie Tibshirani 8ac81a3447 Remove IndexFieldData#clear since it is unused. (#60475)
This method was never called. It also seemed tricky that calling a method on
`IndexFieldData` could clear the contents of a shared cache.
2020-07-30 14:07:55 -07:00
Julie Tibshirani dfd7f226f0
Clarify SourceLookup sharing across fetch subphases. (#60484)
The `SourceLookup` class provides access to the _source for a particular
document, specified through `SourceLookup#setSegmentAndDocument`. Previously
the search context contained a single `SourceLookup` that was shared between
different fetch subphases. It was hard to reason about its state: is
`SourceLookup` set to the expected document? Is the _source already loaded and
available?

Instead of using a global source lookup, the fetch hit context now provides
access to a lookup that is set to load from the hit document.

This refactor closes #31000, since the same `SourceLookup` is no longer shared
between the 'fetch _source phase' and script execution.
2020-07-30 13:22:31 -07:00
Dan Hermann 5e5503ac28
Change severity of negative stats messages from WARN to DEBUG (#60375) (#60444) 2020-07-30 06:06:13 -05:00
Armin Braun 3bf4c01d8e
Don't Allocate Redundant Pages in BigArrays (#60201) (#60441)
The oversize algorithm was allocating more pages than necessary to accommodate `minTargetSize`.
An example would be that a 16k page size and 15k `minTargetSize` would result in a new size of 32k (2 pages).
The difference between the minimum number of necessary pages and the estimated size then keeps growing as sizes increase.

I don't think there is much value in preemptively allocating pages by over-sizing aggressively since the behavior of
the system is quite different from that of a single array where over-sizing avoids copying
once the minimum target size is more than a single page.

Relates #60173 which lead me to this when `BytesStreamOutput` would allocate a large number of never used
pages during serialization of repository metadata.
2020-07-30 11:09:58 +02:00
Armin Braun a2c49a4f02
Reduce Heap Use during Shard Snapshot (#60370) (#60440)
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger
number of shards the previous execution order would lead to memory use proportional to the number of shards
for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot
pool (except for in the BwC format path).
This PR makes it so that they are written to the repository at the earliest possible point in time
so that they can be garbage collected.
If shard generations are used, we can safely write these right at the beginning of the shard snapshot.
If shard generations are not used we can only write them at the end of the shard snapshot after all
other blobs have been written.

Closes #60173
2020-07-30 10:45:00 +02:00