OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Turner	9acd2fd1fd	Minor cleanups to BytesReferenceStreamInput (#62302 ) Followup to #61681: - reuse the current iterator in `reset()` if possible - simply some integer-overflow-avoidance in `skip()` - clarify some comments - address some IntelliJ warnings	2020-09-14 17:02:27 +01:00
Christoph Büscher	e2eada2498	Fix disabling `allow_leading_wildcard` (#62300 ) (#62318 ) Disabling the `query_string` queries `allow_leading_wildcard` parameter didn't work after a change probably introduced in #60959 because the various field types `wildcardQuery` don't check the leading characters like QueryParserBase#getWildcardQuery does. This PR adds the missing check also before calling the field types wildcard generating method. Closes #62267	2020-09-14 17:13:17 +02:00
Alan Woodward	5358cee29c	Cut over more mapping tests to MapperServiceTestCase (#62312 ) Shaves a few more seconds off the build.	2020-09-14 16:00:37 +01:00
Armin Braun	95766da345	Save Some Allocations when Working with ClusterState (#62060 ) (#62303 ) Just a number of obvious spots where we were allocating duplicate empty structures or otherwise inefficient that I found while investigating snapshot cluster state update performance.	2020-09-14 15:09:54 +02:00
Armin Braun	875af1c976	Remove Dead Variable in BlobStoreIndexShardSnapshots. (#62285 ) (#62295 ) This was never used. Co-authored-by: Howard <danielhuang@tencent.com>	2020-09-14 13:40:39 +02:00
Luca Cavanna	53bf057a53	[TEST] avoid double null check in TransportSearchActionTests	2020-09-11 10:10:09 +02:00
Nhat Nguyen	aafb2cb812	Support point in time cross cluster search (#61827 ) This commit integrates point in time into cross cluster search. Relates #61062 Closes #61790	2020-09-10 19:25:48 -04:00
Nhat Nguyen	808c8689ac	Always include the matching node when resolving point in time (#61658 ) If shards are relocated to new nodes, then searches with a point in time will fail, although a pit keeps search contexts open. This commit solves this problem by reducing info used by SearchShardIterator and always including the matching nodes when resolving a point in time. Closes #61627	2020-09-10 19:25:48 -04:00
Nhat Nguyen	035f0638f4	Support point in time in async_search (#61560 ) This commit integrates point in time into async search and ensures that it works correctly with security enabled. Relates #61062	2020-09-10 19:25:48 -04:00
Nhat Nguyen	063a6d047c	Release search context when scroll keep_alive is too large (#62179 ) Previously, we close related search contexts if the keep_alive of a scroll is too large. But we accidentally change this behavior in #62061.	2020-09-10 19:25:48 -04:00
Nhat Nguyen	2eb1e8bc84	Make keep alive of point in time optional in search (#62184 ) A search request should not be required to extend the keep_alive of a point in time. This change makes that parameter optional.	2020-09-10 19:25:48 -04:00
Jim Ferenczi	3fc35aa76e	Shard Search Scroll failures consistency (#62061 ) Today some uncaught shard failures such as RejectedExecutionException skips the release of shard context and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced, this behavior can lead to missing data since scrolls always move forward. In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures in the responses. This change also modifies the retry tests of the reindex feature. Reindex retries scroll search request that contains a shard failure and move on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above. That means that reindex will now report scroll failures when search rejection happen during the operation instead of skipping document silently. Finally this change removes an old TODO that was fulfilled with #61062.	2020-09-10 19:25:48 -04:00
Jim Ferenczi	4d528e91a1	Ensure validation of the reader context is executed first (#61831 ) This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext) before any other operation and that it is correctly recycled or removed at the end of the operation. This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once. Relates #61446 Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>	2020-09-10 19:25:48 -04:00
Luca Cavanna	44bd4a6004	Fix point in time toXContent impl (#62080 ) PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object. This commit fixes this and adds tests for it.	2020-09-10 19:25:47 -04:00
Nhat Nguyen	3d69b5c41e	Introduce point in time APIs in x-pack basic (#61062 ) This commit introduces a new API that manages point-in-times in x-pack basic. Elasticsearch pit (point in time) is a lightweight view into the state of the data as it existed when initiated. A search request by default executes against the most recent point in time. In some cases, it is preferred to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, then the results of those requests might not be consistent as changes happening between searches are only visible to the more recent point in time. A point in time must be opened before being used in search requests. The `keep_alive` parameter tells Elasticsearch how long it should keep a point in time around. ``` POST /my_index/_pit?keep_alive=1m ``` The response from the above request includes a `id`, which should be passed to the `id` of the `pit` parameter of search requests. ``` POST /_search { "query": { "match" : { "title" : "elasticsearch" } }, "pit": { "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "keep_alive": "1m" } } ``` Point-in-times are automatically closed when the `keep_alive` is elapsed. However, keeping point-in-times has a cost; hence, point-in-times should be closed as soon as they are no longer used in search requests. ``` DELETE /_pit { "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA=" } ``` #### Notable works in this change: - Move the search state to the coordinating node: #52741 - Allow searches with a specific reader context: #53989 - Add the ability to acquire readers in IndexShard: #54966 Relates #46523 Relates #26472 Co-authored-by: Jim Ferenczi <jimczi@apache.org>	2020-09-10 19:25:47 -04:00
Armin Braun	e0a81f7d14	Speed up Version Checks (#62216 ) (#62253 ) The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large in the contexts it's used in in many cases and was consuming a surprising amount of cycles for computing the min compat versions. -> extract cold path from `fromId` to make JIT happy and cache minimumg compatible versions to fields.	2020-09-10 22:57:06 +02:00
Armin Braun	25db5acb0d	Simplify TimeValue Serialization (#62023 ) (#62248 ) This can be done without map lookups => less code and much smaller methods => better inlining potentially.	2020-09-10 20:16:21 +02:00
Armin Braun	7b941a18e9	Optimize Snapshot Shard Status Update Handling (#62070 ) (#62219 ) Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations. Also this sets up some simplifications made in the clone snapshot branch.	2020-09-10 16:29:16 +02:00
Ignacio Vera	c8981ea93d	upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213 ) (#62222 )	2020-09-10 16:23:18 +02:00
Igor Motov	b6bff56a56	Fix hard_bounds interval handling (#62129 ) (#62188 ) The hard bounds were incorrectly scaled for intervals, which was causing incorrect buckets to show up or no buckets at all for interval other than 1. Closes #62126	2020-09-09 15:42:12 -04:00
Nik Everett	1104d65465	Fix bug with terms' min_doc_count (#62130 ) (#62177 ) The `global_ordinals` implementation of `terms` had a bug when `min_doc_count: 0` that'd cause sub-aggregations to have array index out of bounds exceptions. Ooops. My fault. This fixes the bug by assigning ordinals to those buckets. Closes #62084	2020-09-09 13:04:51 -04:00
Armin Braun	6710104673	Fix Creating NOOP Tasks on SNAPSHOT Pool (#62152 ) (#62157 ) Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly. Especially when it comes to mixed master+data nodes and concurrent snapshots these hurt delete operation performance needlessly.	2020-09-09 14:05:17 +02:00
Luca Cavanna	fbf0967e20	QueryPhaseResultConsumer to call notifyPartialReduce (#62083 ) As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback. This commit fixes the call and adds a test for this scenario.	2020-09-09 13:44:07 +02:00
Luca Cavanna	ad83261348	Print out search request as part of async search task description (#62057 ) Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should. With this commit we address that while also making sure that the description highlights that the task is originated from an async search. Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.	2020-09-09 13:44:07 +02:00
Rory Hunter	b7fd7cf154	Write deprecation logs to a data stream (#61966 ) Backport of #58924. Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream as well as to disk.	2020-09-09 12:16:28 +01:00
Armin Braun	ed4984a32e	Remove Redundant Stream Wrapping from Compression (#62017 ) (#62132 ) In many cases we don't need a `StreamInput` or `StreamOutput` wrapper around these streams so I this commit adjusts the API to just normal streams and adds the wrapping where necessary.	2020-09-09 03:27:38 +02:00
Nik Everett	b8e9a7125f	Speed up empty highlighting many fields (backport of #61860 ) (#62122 ) Kibana often highlights everything like this: ``` POST /_search { "query": ..., "size": 500, "highlight": { "fields": { "": { ... } } } } ``` This can get slow when there are hundreds of mapped fields. I tested this locally and unscientifically and it took a request from 20ms to 150ms when there are 100 fields. I've seen clusters with 2000 fields where simple search go from 500ms to 1500ms just by turning on this sort of highlighting. Even when the query is just a `range` that and the fields are all numbers and stuff so it won't highlight anything. This speeds up the `unified` highlighter in this case in a few ways: 1. Build the highlighting infrastructure once field rather than once pre document per field. This cuts out a ton* of work analyzing the query over and over and over again. 2. Bail out of the highlighter before loading values if we can't produce any results. Combined these take that local 150ms case down to 65ms. This is unlikely to be really useful when there are only a few fetched docs and only a few fields, but we often end up having many fields with many fetched docs.	2020-09-08 15:49:50 -04:00
Alan Woodward	28fd4a2ae8	Convert RangeFieldMapper to parametrized form (#62058 ) This also adds the ability to define a serialization check on Parameters, used in this case to only serialize format and locale parameters if the mapper is a date range.	2020-09-08 18:44:13 +01:00
Alan Woodward	5f05eef7e3	Convert some more mapping tests to MapperServiceTestCase (#62089 ) We don't need to extend ESSingleNodeTestCase for all these tests.	2020-09-08 17:51:40 +01:00
Tim Brooks	075271758e	Keep checkpoint file channel open across fsyncs (#61744 ) Currently we open and close the checkpoint file channel for every fsync. This file channel can be kept open for the lifecycle of a translog writer. This avoids the overhead of opening the file, checking file permissions, and closing the file on every fsync.	2020-09-08 08:54:53 -06:00
Francisco Fernández Castaño	2bb5716b3d	Add repositories metering API (#62088 ) This pull request adds a new set of APIs that allows tracking the number of requests performed by the different registered repositories. In order to avoid losing data, the repository statistics are archived after the repository is closed for a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the statistics for the active repositories as well as the modified/closed repositories. Backport of #60371	2020-09-08 14:01:04 +02:00
Armin Braun	ebd1569028	Fix testMasterFailOverWithQueuedDeletes (#62062 ) (#62078 ) Fixing very rare corner case where the delete retry is slow. Closes #62031	2020-09-08 10:35:06 +02:00
Nhat Nguyen	bb0a583990	Allow enabling soft-deletes on restore from snapshot (#62018 ) Closes #61969	2020-09-07 09:45:36 -04:00
Alan Woodward	cbc9578cbd	Remove SearchPhase interface (#62050 ) The interface is never used as an abstraction - implementations are are called directly, and most of them don't need to implement the preProcess method.	2020-09-07 13:45:43 +01:00
David Turner	3389d5ccb2	Introduce integ tests for high disk watermark (#60460 ) An important goal of the disk threshold decider is to ensure that nodes use less disk space than the high watermark, and to take action if a node ever exceeds this watermark. Today we do not have any integration-style tests of this high-level behaviour. This commit introduces a small test harness that can adjust the apparent size of the disk and verify that the disk threshold decider moves shards around in response. Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-09-07 14:39:39 +02:00
Armin Braun	395538f508	Improve Snapshot State Machine Performance (#62000 ) (#62049 ) Just a few random things to optimize motivated by somewhat sub-standard performance for large snapshot cluster states with many concurrent snapshots observed in production.	2020-09-07 13:25:40 +02:00
Jim Ferenczi	fa8e76abb1	Improve reduction of terms aggregations (#61779 ) (#62028 ) Today, the terms aggregation reduces multiple aggregations at once using a map to group same buckets together. This operation can be costly since it requires to lookup every bucket in a global map with no particular order. This commit changes how term buckets are sorted by shards and partial reduces in order to be able to reduce results using a merge-sort strategy. For bwc, results are merged with the legacy code if any of the aggregations use a different sort (if it was returned by a node in prior versions). Relates #51857	2020-09-07 13:13:20 +02:00
Alan Woodward	a295b0aa86	Fix null_value parsing for data_nanos field mapper (#61994 ) The null_value parameter for date fields is always parsed using DateFormatter.parseMillis, which is incorrect for nanosecond resolution fields. This commit changes the parsing logic to always use DateFieldType.parse() to parse the null value.	2020-09-07 10:58:54 +01:00
Alan Woodward	1799c0c583	Convert completion, binary, boolean tests to MapperTestCase (#62004 ) Also fixes a metadata serialization bug in CompletionFieldMapper.	2020-09-07 10:48:20 +01:00
Luca Cavanna	0c8b438577	Add support for runtime fields (#61776 ) This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch. We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script. The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields. Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-09-07 09:14:53 +02:00
Howard	b26584dff8	Remove unused deciders in BalancedShardsAllocator (#62026 )	2020-09-07 00:04:16 -04:00
Armin Braun	1e3edbbe74	Simplify BytesReference StreamInput (#61681 ) (#62014 ) Flattening both streams into a single stream here saves a few objects and some indirection. Also, removed the redundant `offset` field which added nothing but complexity by forcing the incrementation of two counters on every read.	2020-09-05 10:45:52 +02:00
Ryan Ernst	6d3b691048	Add snapshot only test modules (#61954 ) This commit adds external test modules. These are modules meant for external systems to test edge cases in elasticsearch, but only within snapshots. They are not meant to be used in production, so protections are also added from their accidental inclusion in release builds. Note that this commit does not actually add any new modules, it only adds the infrastructure for the new modules, under `test/external-modules`.	2020-09-04 16:35:18 -07:00
Yannick Welsch	6d08b55d4e	Simplify searchable snapshot shard allocation (#61911 ) Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).	2020-09-04 15:45:00 +02:00
Alan Woodward	66bb1eea98	Improve error messages on bad [format] and [null_value] params for date mapper (#61932 ) Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper configuration, you get a vague error: Failed to parse mapping [_doc]: cannot parse empty date Similarly, if you pass an incorrect format, you get the error: Failed to parse mapping [_doc]: Invalid format [...] This commit improves both these errors by including the mapper name and parameter that are misconfigured. Fixes #61712	2020-09-04 14:13:28 +01:00
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Nik Everett	3d23dcd742	Use standard bit set impl in cardinality (#61816 ) (#61930 ) This replaces a specialized bit set implementation used in cardinality with our standard `BitArray` which works exactly the same way. Its also tracked by `BigArrays` which is great!	2020-09-03 12:37:30 -04:00
Nik Everett	3934e14bc0	Fixup vwhisto test (#60936 ) (#61928 ) This test assumed some random bounds that turned out not to hold in some cases. Closes #60673	2020-09-03 12:37:17 -04:00
Alan Woodward	48870c60c7	Don't spin up a whole node to unit test some data structures (#61923 ) BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase, which builds an entire node just to run some unit tests over entirely in-memory data structures. This commit converts them both to extend ESTestCase.	2020-09-03 17:19:42 +01:00
Alan Woodward	3a1e0edf0a	Convert DateFieldMapperTests to MapperTestCase (#61920 )	2020-09-03 16:04:02 +01:00

1 2 3 4 5 ...

5320 Commits