OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-11 07:25:23 +00:00

Author	SHA1	Message	Date
markharwood	66098e0bf4	Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959 ) (#61010 ) The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type. The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields. Closes #60957	2020-08-12 10:44:52 +01:00
Armin Braun	32423a486d	Simplify and Speed up some Compression Usage (#60953 ) (#61008 ) Use thread-local buffers and deflater and inflater instances to speed up compressing and decompressing from in-memory bytes. Not manually invoking `end()` on these should be safe since their off-heap memory will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals that are not instantiated at a high frequency. This significantly reduces the amount of byte copying and object creation relative to the previous approach which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc. Relates #57284 which should be helped by this change to some degree. Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these code paths.	2020-08-12 11:06:23 +02:00
Nik Everett	ce9c5f0e46	Fix diversified sample tests The test assumed that the aggregator only ran once but we turned that off. This turns it back on.	2020-08-11 17:49:43 -04:00
Jay Modi	2fa6448a15	System index reads in separate threadpool (#60927) This commit introduces a new thread pool, `system_read`, which is intended for use by system indices for all read operations (get and search). The `system_read` pool is a fixed thread pool with a maximum number of threads equal to the lesser of half of the available processors or 5. Given the combination of both get and read operations in this thread pool, the queue size has been set to 2000. The motivation for this change is to allow system read operations to be serviced in spite of the number of user searches. In order to avoid a significant performance hit due to pattern matching on all search requests, a new metadata flag is added to mark indices as system or non-system. Previously created system indices will have the flag added to their metadata upon upgrade to a version with this capability. Additionally, this change also introduces a new class, `SystemIndices`, which encapsulates logic around system indices. Currently, the class provides a method to check if an index is a system index and a method to find a matching index descriptor given the name of an index. Relates #50251 Relates #37867 Backport of #57936	2020-08-11 12:16:34 -06:00
Julie Tibshirani	a93be8d577	Handle nested arrays in field retrieval. (#60981 ) We accept _source values with multiple levels of arrays, such as `"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested arrays by unwrapping the arrays before parsing the values.	2020-08-11 10:22:16 -07:00
Mark Tozzi	ab8518fb5b	[7.x] Extensibility for Composite Agg #59648 (#60842)	2020-08-11 09:14:33 -04:00
Alan Woodward	54279212cf	Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924) This commit cuts over all metadata field mappers to parametrized format.	2020-08-11 09:02:28 +01:00
Armin Braun	3e2dfc6eac	Remove GCS Bucket Exists Check (#60899 ) (#60914 ) Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS. We don't need to do the bucket exists check before using the repo, that just needlessly increases the necessary permissions for using the GCS repository.	2020-08-11 09:54:27 +02:00
Julie Tibshirani	d51eae6e9f	Prevent loading 'fields' with stored fields disabled. (#60938 ) Because the 'fields' option loads from _source (which is a stored field), it is not possible to retrieve 'fields' when stored_fields are disabled. This also fixes #60912, where setting stored_fields: _none_ prevented the _ignored fields from being loaded and caused a parsing exception.	2020-08-10 15:40:27 -07:00
Nik Everett	0286d0a769	Move distance_feature query building into MFT (#60614 ) (#60846 ) This moves the `distance_feature` query building out of `DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`. Without this we don't have a chance of supporting this for runtime fields. In general I'm not sad to see the `instanceof`s go. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-10 16:05:17 -04:00
Julie Tibshirani	b216340f50	Make `FetchPhase` logic more readable. (#60779 ) * Factor out FieldsVisitor#postProcess call. * Swap logical order for normal and nested documents. * Extract the method createStoredFieldsVisitor.	2020-08-10 11:04:54 -07:00
Nik Everett	dfd502f9ca	Rework checking if a year is a leap year (#60585 ) (#60790 ) This way is faster, saving about 8% on the microbenchmark that rounds to the nearest month. That is in the hot path for `date_histogram` which is a very popular aggregation so it seems worth it to at least try and speed it up a little. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-08-10 12:45:34 -04:00
Jim Ferenczi	f30f1f04e2	Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816 ) This commit removes the ability to test the top level result of an aggregator before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test the final output (the one sent to the end user) rather than an intermediary result that could be different. This change also removes spurious commits triggered on top of a random index writer. These commits slow down the tests and are redundant with the commits that the random index writer performs.	2020-08-10 17:23:00 +02:00
David Turner	f44c28b595	Deprecate and ignore join timeout (#60872 ) There is no point in timing out a join attempt any more once a cluster is entirely in 7.x. Timing out and retrying with the same master is pointless, and an in-flight join attempt to one master no longer blocks attempts to join other masters. This commit deprecates this unnecessary setting and removes its effect from the joining process. Relates #60873 which removes this setting in master.	2020-08-10 13:57:41 +01:00
Martijn van Groningen	64bb082f9b	Improve error message for non append-only writes that target data stream (#60874) Backport of #60809 to 7.x branch. Closes #60581	2020-08-10 13:18:59 +02:00
Alan Woodward	e8d9185045	Cut over IPFieldMapper to parametrized form (#60602 ) This commit makes IpFieldMapper extend ParametrizedFieldMapper. It also updates the IpFieldMapper docs to add the ignore_malformed parameter, which was not previously documented.	2020-08-10 11:01:10 +01:00
David Turner	1f49e0b9d0	Fix testRerouteOccursOnDiskPassingHighWatermark (#60869 ) Sometimes this test would refresh the disk stats so quickly that it hit the refresh rate limiter even though it was almost completely disabled. This commit allows the rate limiter to be completely disabled. Closes #60587	2020-08-10 09:39:44 +01:00
Ryan Ernst	ddcfbec569	Add assert message for multiple lines in osprobe (#60796 ) Several /proc files are expected to contain a single line. We assert on this in tests, but the contents of the file are lost and the assertion therefore lacks important information to debug why the file appeared to have multiple lines. This commit dumps the contents of the file on assertion failure. relates #59284	2020-08-06 15:53:30 -07:00
Ryan Ernst	fc38af363e	Ensure latch is counted down when assertion trips (#60800 ) The ReloadSecureSettingsIT uses latches to ensure coordination across requests to the underlying in memory cluster. However, in the case of an expected failure, if the assertion fails, the latch will never be counted down, and will cause the test to hang indefinitely. This commit ensures the latch is always counted down with a try/finally. relates #51546	2020-08-06 15:33:46 -07:00
Jim Ferenczi	98119578a1	Disable sort optimization on search collapsing (#60838) Collapse search queries that sort by a field can throw an ArrayStoreException due to a bug in the [sort optimization](https://github.com/elastic/elasticsearch/pull/51852) introduced in 7.7.0. Search collapsing was not supposed to be eligible for this sort optimization, so this change explicitly excludes it from this new feature.	2020-08-06 21:37:12 +02:00
Jim Ferenczi	14980ff97e	Fix AOOBE when setting min_doc_count to 0 in significant_terms (#60823 ) This commit fixes the computation of the subset size on empty buckets (doc count of 0). The aggregator test refactoring in #60683 revealed this bug.	2020-08-06 18:57:09 +02:00
David Turner	721198c29e	Increase logging in testRerouteOccursOnDiskPassingHighWatermark (#60817) Relates #60587	2020-08-06 14:08:09 +01:00
Armin Braun	a2c7991e96	Fix CompressibleBytesOutputStreamTests (#60815 ) (#60822 ) Since #60730 the `bytes` field can be `null`. This adds the missing `null` check to the test override. Closes #60814	2020-08-06 15:07:48 +02:00
David Turner	273a6f916d	AwaitsFix for #60814	2020-08-06 12:56:28 +01:00
Tim Brooks	2f76c48ea7	Propagate forceExecution when acquiring permit (#60634) Currently the transport replication action does not propagate the force execution parameter when acquiring the indexing permit. The logic to acquire the index permit supports force execution, so this parameter should be propagated. Fixes #60359.	2020-08-05 15:57:40 -06:00
Francisco Fernández Castaño	b4044004aa	Add recovery state tracking for Searchable Snapshots (#60751) This pull request adds recovery state tracking for Searchable Snapshots. In order to track recoveries for searchable snapshot backed indices, this pull request adds a new type of RecoveryState. This new RecoveryState instance is able to deal with the small differences that arise during searchable snapshot recoveries. Those differences can be summarized as follows: - The Directory implementation that's provided by SearchableSnapshots marks the snapshot files as reused during recovery. In order to keep track of the recovery process as the cache is pre-warmed, those files shouldn't be marked as reused. - Once the shard is created, the cache starts its pre-warming phase, meaning that we should keep track of those downloads during that process and tie the recovery to this pre-warming phase. The shard is considered recovered once this pre-warming phase has finished. Backport of #60505	2020-08-05 17:41:49 +02:00
Jake Landis	f3752ba1d5	7.x support new path for re-index java-api doc (#60319) This commit uses the new location for the reindex java-api documentation. Temporary files have been left behind to pacify the docs build. related #60339	2020-08-05 09:05:07 -05:00
Armin Braun	ebfb93ff26	Improve some BytesStreamOutput Usage (#60730 ) (#60736 ) * Stop redundantly creating a `0` length `ByteArray` that is never used * Add efficient way to get a minimal size copy of the bytes in a `BytesStreamOutput` * Avoid multiple redundant `byte[]` copies in search cache key creation	2020-08-05 15:51:06 +02:00
Yannick Welsch	9f6f66f156	Fail searchable snapshot shards on invalid license (#60722) Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.	2020-08-05 13:14:15 +02:00
Igor Motov	959690a64a	Refactor extendedBounds to use DoubleBounds (#60556) (#60681) Refactors extendedBounds to use DoubleBounds instead of 2 variables. This is a follow up for #59175	2020-08-04 16:45:47 -04:00
Alan Woodward	b3ae5d26bd	Move mapper validation to the mappers themselves (#60072 ) (#60649 ) Currently, validation of mappers (checking that cross-references are correct, limits on field name lengths and object depths, multiple definitions, etc) is performed by the MapperService. This means that any mapper-specific validation, for example that done on the CompletionFieldMapper, needs to be called specifically from core server code, and so we can't add validation to mappers that live in plugins. This commit reworks the validation framework so that mapper-specific validation is done on the Mapper itself. Mapper gets a new `validate(MappingLookup)` method (already present on `MetadataFieldMapper` and now pulled up to the parent interface), which is called from a new `DocumentMapper.validate()` method. All the validation code currently living on `MapperService` moves either to individual mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into `MappingLookup`, an altered `DocumentFieldMappers` which now knows about object fields and can check for duplicate definitions, or into DocumentMapper which handles soft limit checks.	2020-08-04 14:39:20 +01:00
Armin Braun	212ce22d15	Optimize CS Persistence Stream Use (#60643 ) (#60647 ) In the metadata persistence logic we failed to override the bulk write method on the FilterOutputStream resulting in all the writes to it running byte-by-byte in a loop adding a large number of bounds checks needlessly.	2020-08-04 15:06:57 +02:00
Armin Braun	859ad761bb	Fix Broken Stream Close in writeRawValue (#60625 ) (#60644 ) Small oversight in #56078 that only showed up during backporting where a stream copy was turned from a non-closing to a closing one. Enhanced part of a test in this PR to make it show up in master also even though we practically never use this method with stream targets that actually close.	2020-08-04 13:39:52 +02:00
Armin Braun	7ae9dc2092	Unify Stream Copy Buffer Usage (#56078 ) (#60608 ) We have various ways of copying between two streams and handling thread-local buffers throughout the codebase. This commit unifies a number of them and removes buffer allocations in many spots.	2020-08-04 09:54:52 +02:00
Julie Tibshirani	f99584c6f3	Avoid reloading _source for every inner hit. (#60632) Previously if an inner_hits block required _source, we would reload and parse the root document's source for every hit. This PR adds a shared SourceLookup to the inner hits context that allows inner hits to reuse parsed source if it's already available. This matches our approach for sharing the root document ID. Relates to #32818.	2020-08-03 17:12:27 -07:00
Julie Tibshirani	fc63f8224f	Simplify class hierarchy for ordinals field data. (#60606 ) This PR simplifies the hierarchy for ordinals field data classes: * Remove `AbstractIndexFieldData`, since only `AbstractIndexOrdinalsFieldData` inherits directly from it. * Make `SortedSetOrdinalsIndexFieldData` extend `AbstractIndexOrdinalsFieldData`. This lets us remove some redundant code.	2020-08-03 09:58:29 -07:00
Yannick Welsch	3409e019d2	Ignore shutdown when retrying recoveries (#60586) Avoids failures when shutting down a node.	2020-08-03 15:14:38 +02:00
Nik Everett	2cde43b799	Allows nanosecond resolution in search_after (backport of #60328 ) (#60426 ) Allows nanosecond resolution in search_after (#60328) This fixes `search_after` to properly parse string formatted dates that have nanosecond resolution. Closes #52424	2020-08-03 08:17:48 -04:00
David Turner	d2ddf8cd6a	Improve deserialization failure logging (#60577 ) Today when a node fails to properly deserialize a transport message with a parent task we log the following relatively uninformative message: java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting In particular, the wrapping of the listener in the `TransportService` obscures all clues as to the source of the problem, e.g. the action name or the identity of the underlying listener. This commit exposes the inner listener to the logs. Also if the listener is wrapped with `ContextPreservingActionListener` then its identity is similarly hidden. This commit also exposes the wrapped listener in this case. Relates #38939	2020-08-03 11:51:01 +01:00
Armin Braun	3270cb3088	More Efficient Writes for Snapshot Shard Generations (#60458) (#60575) Same as #59905 but for shard level metadata. Since we want to retain the ability to do safe+atomic writes for non-uuid shard generations this PR has to create two separate write paths for both kinds of shard generations.	2020-08-03 11:11:36 +02:00
Armin Braun	204efe9387	Add Repository Setting to Disable Writing index.latest (#60448) (#60576) Writing the `index.latest` blob is unnecessary unless the contents of the repository are to be used as a URL-repository. Also, in some edge cases, the fact that `index.latest` is the only blob in the repository that regularly gets overwritten was causing compatibility issues with some backing blobstores (Azure no-overwrite policy, Hitachi S3 equivalent). => this commit changes behavior to make snapshots not fail if writing `index.latest` fails and adds a setting to disable writing `index.latest`.	2020-08-03 11:11:24 +02:00
Andrei Dan	ac258f10d6	Data streams: throw ResourceAlreadyExists exception (#60518 ) (#60536 ) For consistency reasons (and reducing the overload of IllegalArgumentException) this changes the exception thrown when trying to create a data stream that already exists. (cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-08-01 16:31:09 +01:00
Julie Tibshirani	f1d4fd8c3e	Correct name of IndexFieldData#loadGlobalDirect. (#60492) It seems 'localGlobalDirect' was just a typo.	2020-07-31 10:53:21 -07:00
Jim Ferenczi	8db896d290	Fix race condition in SearchPhaseControllerTests#testPartialMergeFailure (#60488 ) This change ensures that we call the listener for partial merge failure before calling the completion listener in order to avoid race condition in tests. Closes #60446	2020-07-31 16:29:20 +02:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Julie Tibshirani	8ac81a3447	Remove IndexFieldData#clear since it is unused. (#60475 ) This method was never called. It also seemed tricky that calling a method on `IndexFieldData` could clear the contents of a shared cache.	2020-07-30 14:07:55 -07:00
Julie Tibshirani	dfd7f226f0	Clarify SourceLookup sharing across fetch subphases. (#60484 ) The `SourceLookup` class provides access to the _source for a particular document, specified through `SourceLookup#setSegmentAndDocument`. Previously the search context contained a single `SourceLookup` that was shared between different fetch subphases. It was hard to reason about its state: is `SourceLookup` set to the expected document? Is the _source already loaded and available? Instead of using a global source lookup, the fetch hit context now provides access to a lookup that is set to load from the hit document. This refactor closes #31000, since the same `SourceLookup` is no longer shared between the 'fetch _source phase' and script execution.	2020-07-30 13:22:31 -07:00
Dan Hermann	5e5503ac28	Change severity of negative stats messages from WARN to DEBUG (#60375) (#60444)	2020-07-30 06:06:13 -05:00
Armin Braun	3bf4c01d8e	Don't Allocate Redundant Pages in BigArrays (#60201) (#60441) The oversize algorithm was allocating more pages than necessary to accommodate `minTargetSize`. An example would be that a 16k page size and 15k `minTargetSize` would result in a new size of 32k (2 pages). The difference between the minimum number of necessary pages and the estimated size then keeps growing as sizes increase. I don't think there is much value in preemptively allocating pages by over-sizing aggressively since the behavior of the system is quite different from that of a single array where over-sizing avoids copying once the minimum target size is more than a single page. Relates #60173 which led me to this when `BytesStreamOutput` would allocate a large number of never used pages during serialization of repository metadata.	2020-07-30 11:09:58 +02:00
Armin Braun	a2c49a4f02	Reduce Heap Use during Shard Snapshot (#60370 ) (#60440 ) Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger number of shards the previous execution order would lead to memory use proportional to the number of shards for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot pool (except for in the BwC format path). This PR makes it so that they are written to the repository at the earliest possible point in time so that they can be garbage collected. If shard generations are used, we can safely write these right at the beginning of the shard snapshot. If shard generations are not used we can only write them at the end of the shard snapshot after all other blobs have been written. Closes #60173	2020-07-30 10:45:00 +02:00
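
The "Optimize CS Persistence Stream Use" commit (212ce22d15) in the log above describes a `FilterOutputStream` subclass that did not override the bulk write method, so every array write was forwarded one byte at a time. The following is a minimal sketch of that effect using only the JDK; the class and method names (`BulkWriteSketch`, `CountingStream`, `BulkFilter`, `countWrites`) are hypothetical and are not taken from the OpenSearch code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BulkWriteSketch {

    /** Hypothetical sink that counts how each write reaches it. */
    static final class CountingStream extends ByteArrayOutputStream {
        int singleByteWrites;
        int bulkWrites;

        @Override
        public synchronized void write(int b) {
            singleByteWrites++;
            super.write(b);
        }

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            bulkWrites++;
            super.write(b, off, len);
        }
    }

    /** Filter that forwards array writes to the wrapped stream in one call. */
    static final class BulkFilter extends FilterOutputStream {
        BulkFilter(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
        }
    }

    /** Returns {singleByteWrites, bulkWrites} seen by the sink for one array write of n bytes. */
    static int[] countWrites(boolean overrideBulk, int n) {
        CountingStream sink = new CountingStream();
        OutputStream filter = overrideBulk ? new BulkFilter(sink) : new FilterOutputStream(sink);
        try {
            filter.write(new byte[n], 0, n);
            filter.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { sink.singleByteWrites, sink.bulkWrites };
    }

    public static void main(String[] args) {
        int[] plain = countWrites(false, 1024);
        int[] bulk = countWrites(true, 1024);
        System.out.println("plain filter: single=" + plain[0] + " bulk=" + plain[1]);
        System.out.println("bulk filter:  single=" + bulk[0] + " bulk=" + bulk[1]);
    }
}
```

With the plain `FilterOutputStream`, the sink receives one `write(int)` call per byte; with the override it receives a single bulk call, which is the saving the commit message refers to.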
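
The "Ensure latch is counted down when assertion trips" commit (fc38af363e) in the log above fixes a test hang by counting the latch down inside a `try`/`finally`. A minimal sketch of that pattern follows, assuming a single worker thread and a simulated failing assertion; `LatchSketch`, `failingAssertion` and `latchReleased` are hypothetical names, not code from `ReloadSecureSettingsIT`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LatchSketch {

    /** Stands in for a test assertion that fails. */
    static void failingAssertion() {
        throw new AssertionError("simulated failed assertion");
    }

    /**
     * Runs a worker whose assertion fails and reports whether the waiting
     * thread was released within the timeout.
     */
    static boolean latchReleased(boolean useFinally, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            if (useFinally) {
                try {
                    failingAssertion();
                } finally {
                    latch.countDown(); // always runs, even when the assertion throws
                }
            } else {
                failingAssertion();
                latch.countDown(); // never reached when the assertion throws
            }
        });
        // Swallow the simulated failure so it does not print a stack trace.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("with finally:    released=" + latchReleased(true, 2000));
        System.out.println("without finally: released=" + latchReleased(false, 200));
    }
}
```

Without the `finally`, the waiter is only released by the timeout; a test that calls `await()` with no timeout would hang indefinitely, which is the failure mode the commit message describes.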
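
The "Don't Allocate Redundant Pages in BigArrays" commit (3bf4c01d8e) in the log above gives the example of a 16k page size and a 15k `minTargetSize`, for which one page is enough. The arithmetic can be sketched as rounding up to the smallest whole number of pages; this is an illustration only, not the actual `BigArrays` oversize code, and `PageRoundingSketch` and `minimalPagedSize` are hypothetical names.

```java
final class PageRoundingSketch {

    /** Smallest multiple of pageSize that is at least minTargetSize (both assumed positive). */
    static long minimalPagedSize(long minTargetSize, long pageSize) {
        long pages = (minTargetSize + pageSize - 1) / pageSize; // ceiling division
        return pages * pageSize;
    }

    public static void main(String[] args) {
        long page = 16 * 1024;
        System.out.println(minimalPagedSize(15 * 1024, page)); // one page
        System.out.println(minimalPagedSize(page + 1, page));  // two pages
    }
}
```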


5188 Commits