OpenSearch

Commit Graph

Author	SHA1	Message	Date
Zachary Tong	8d17527050	[TEST] create larger cuckoo filters for tests (#46457 ) The cuckoo filters could be randomly created with too small a capacity or precision, which means that they can only absorb a few values before collisions start to make all filters look identical. This increases the size of the filters we generate (capacity >> than the test cases) and lowers the fpp rate.	2019-09-09 10:18:51 -04:00
David Turner	8428f8e6e8	Remove trailing comma from nodes lists (#46484 ) Today when the membership of the cluster changes we log messages that describe the change like this: added {{node-1}{OPdaTIGmSxaEXXOyg3o96w}{127.0.0.1}{127.0.0.1:9301}{di},} The trailing comma suggests there is some missing string that might contain extra information, but in fact it's an artefact of how these messages are constructed. This commit removes the trailing comma from these lists.	2019-09-09 14:47:32 +01:00
Armin Braun	ee3396735c	Execute SnapshotsService Error Callback on Generic Thread (#46277 ) (#46480 ) I couldn't find a test for this, as it seems we only get into this error handler on a bug. Regardless, we are executing the snapshot finalization on the master update thread here which shouldn't happen and will make debugging a production issue resulting from this trickier than it has to be (because we probably also get a cluster state apply is slow warning in addition to the original bug). Used the generic pool here instead of the snapshot pool because we're resolving the user callback here as well and the generic pool seemed like the safer bet for that.	2019-09-09 14:38:11 +02:00
Nhat Nguyen	24c3a1de3c	Ignore replication for noop updates (#46458 ) Previously, we ignored replication for noop updates because they did not have sequence numbers. Since #44603, we started assigning sequence numbers to noop updates, leading them to be replicated to replicas. This bug occurs only on 8.0 because it requires #41065 and #44603. Closes #46366	2019-09-07 11:32:01 -04:00
markharwood	323ec022be	Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394 ) Following performance optimisations to the adjacency_matrix aggregation we no longer require this setting. Marked as deprecated and due for removal in 8.0 Related #46324	2019-09-06 13:59:47 +01:00
Yunfeng,Wu	7582af27b0	Resolve the incorrect scroll_current when delete or close index (#45226 ) Resolve the incorrect current scroll for deleted or closed index	2019-09-06 09:45:53 +02:00
Jim Ferenczi	f2a6c88f83	Add a system property to ignore awareness attributes (#46375 ) This is a follow up of #19191 for 7.x. This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this commit adds a way to opt-in to this new behavior in a minor version of 7.x. Relates #45735	2019-09-06 09:29:27 +02:00
Paul Sanwald	758680c549	version bump to 6.8.4 (#46409 )	2019-09-05 15:14:36 -04:00
Jason Tedor	92866f977a	Clarify error message on keystore write permissions (#46321 ) When the Elasticsearch process does not have write permissions to upgrade the Elasticsearch keystore, we bail with an error message that indicates there is a filesystem permissions problem. This commit clarifies that error message by pointing out the directory where write permissions are required, or that the user can also run the elasticsearch-keystore upgrade command manually before starting the Elasticsearch process. In this case, the upgrade would not be needed at runtime, so the permissions would not be needed then.	2019-09-05 15:11:54 -04:00
Benjamin Trent	d912a49c6f	[7.x] Support geotile_grid aggregation in composite agg sources (#45810 ) (#46399 ) * Support geotile_grid aggregation in composite agg sources (#45810) Adds support for `geotile_grid` as a source in composite aggs. Part of this change includes adding a new docFormat of `GEOTILE` that formats a hashed `long` value into a geotile formatting string `zoom/x/y`.	2019-09-05 13:22:57 -05:00
Armin Braun	7a9af874ad	Enable Debug Logging for Master and Coordination Packages (#46363 ) (#46374 ) In order to track down #46091: * Enables debug logging in REST tests for `master` and `coordination` packages since we suspect that issues are caused by failed and then retried publications	2019-09-05 14:03:38 +02:00
Yannick Welsch	7e4c633ce3	Quiet down shard lock failures (#46368 ) These were actually never intended to be logged at the warning level but made visible by a refactoring in #19991, which introduced a new exception type but forgot to adapt some of the consumers of the exception.	2019-09-05 13:08:11 +02:00
Nhat Nguyen	03ed18a010	Unmute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-04 22:33:17 -04:00
Julie Tibshirani	40c3225d26	First round of optimizations for vector functions. (#46294 ) This PR merges the `vectors-optimize-brute-force` feature branch, which makes the following changes to how vector functions are computed: * Precompute the L2 norm of each vector at indexing time. (#45390) * Switch to ByteBuffer for vector encoding. (#45936) * Decode vectors while computing the vector function. (#46103) * Use an array instead of a List for the query vector. (#46155) * Precompute the normalized query vector when using cosine similarity. (#46190) Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2019-09-04 14:45:57 -07:00
Nhat Nguyen	a16cb89956	Revert "Sync translog without lock when trim unreferenced readers (#46203 )" Unfortunately, with this change, we won't clean up all unreferenced generations when reopening. We assume that there's at most one unreferenced generation when reopening translog. The previous implementation guarantees this assumption by syncing translog every time after we remove a translog reader. This change, however, only syncs translog once after we have removed all unreferenced readers (can be more than one) and breaks the assumption. Closes #46267 This reverts commit fd8183ee51d7cf08d9def58a2ae027714beb60de.	2019-09-04 17:09:39 -04:00
Jason Tedor	3cbdd84b89	Add test that get triggers shard search active (#46317 ) This commit is a follow-up to a change that fixed that multi-get was not triggering a shard to become search active. In that change, we added a test that multi-get properly triggers a shard to become search active. This commit is a follow-up to that change which adds a test for the get case. While get is already handled correctly in production code, there was not a test for it. This commit adds one. Additionally, we factor all the search idle tests from IndexShardIT into a separate test class, as an effort to keep related tests together instead of a single large test class containing a jumble of tests, and also to keep test classes smaller for better parallelization.	2019-09-04 11:53:32 -04:00
markharwood	408b58dd9d	Adjacency_matrix aggregation optimisation. (#46257 ) (#46315 ) Avoid pre-allocating ((N * N) - N) / 2 “BitsIntersector” objects given N filters. Most adjacency matrices will be sparse and we typically don’t need to allocate all of these objects - can save a lot of allocations when the number of filters is high. Closes #46212	2019-09-04 16:45:32 +01:00
Nhat Nguyen	eb56d23421	Do not send recovery requests with CancellableThreads (#46287 ) Previously, we send recovery requests using CancellableThreads because we send requests and wait for responses in a blocking manner. With async recovery, we no longer need to do so. Moreover, if we fail to submit a request, then we can release the Store using an interruptible thread which can risk invalidating the node lock. This PR is the first step to avoid forking when releasing the Store. Relates #45409 Relates #46178	2019-09-04 11:26:11 -04:00
Henning Andersen	5066835569	Fix SearchService.createContext exception handling (#46258 ) An exception from the DefaultSearchContext constructor could leak a searcher, causing future issues like shard lock obtained exceptions. The underlying cause of the exception in the constructor has been fixed, but as a safety precaution we also fix the exception handling in createContext. Closes #45378	2019-09-04 14:46:30 +02:00
Nhat Nguyen	3f67cbe974	Suppress warning from background sync on relocated primary (#46247 ) If a primary is being relocated, then the global checkpoint and retention lease background sync can emit unnecessary warning logs. This side effect was introduced in #42241. Relates #40800 Relates #42241	2019-09-03 18:44:15 -04:00
Nhat Nguyen	5924df1764	Mute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-03 18:44:08 -04:00
Lee Hinman	57f322f85e	Move MockRespository into test framework (#46298 ) This moves the `MockRespository` class into `test/framework/src/main` so it can be used across all modules and plugins in tests.	2019-09-03 16:21:10 -06:00
Jason Tedor	b8c51ff894	Multi-get requests should wait for search active (#46283 ) When a shard has fallen search idle, and a non-realtime multi-get request is executed, today such requests do not wait for the shard to become search active and therefore such requests do not wait for a refresh to see the latest changes to the index. This also prevents such requests from triggering the shard as non-search idle, influencing the behavior of scheduled refreshes. This commit addresses this by attaching a listener to the shard search active state for multi-get requests. In this way, when the next scheduled refresh is executed, the multi-get request will then proceed.	2019-09-03 14:31:37 -04:00
Henning Andersen	2383acaa89	Fix testSyncFailsIfOperationIsInFlight (#46269 ) testSyncFailsIfOperationIsInFlight could fail due to the index request spawning a GCP sync (new since 7.4). Test now waits for it to finish before testing that flushed sync fails.	2019-09-03 17:30:00 +02:00
dengweisysu	416419e4c9	Sync translog without lock when trim unreferenced readers (#46203 ) With this change, we can avoid blocking writing threads when trimming unreferenced readers; hence improving the translog writing performance in async durability mode. Close #46201	2019-09-02 21:55:06 -04:00
Anup	e01ec802e7	Remove duplicate line in SearchAfterBuilder (#45994 )	2019-09-03 01:30:01 +02:00
Armin Braun	2662c1b417	Wait for all Rec. to Stop on Node Close (#46178 ) (#46237 ) * Wait for all Rec. to Stop on Node Close * This issue is in the `RecoverySourceHandler#acquireStore`. If we submit the store release to the generic threadpool while it is getting shut down, we never complete the future we wait on (in the generic pool as well) and potentially fail to ever release the store. * Fixed by waiting for all recoveries to end on node close so that we always have a healthy thread pool here * Closes #45956	2019-09-02 18:04:37 +02:00
Martijn van Groningen	5747badaa8	Allow ingest processors access to node client. (#46077 ) This is the first PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation.	2019-09-02 08:24:26 +02:00
Nhat Nguyen	db949847e5	Fix translog stats in testPrepareIndexForPeerRecovery (#46137 ) When recovering a shard locally, we use a translog snapshot from newSnapshotFromGen which consists of all readers from a certain generation. In the test, we use newSnapshotFromMinSeqNo for the expectation. The snapshot of this method includes only readers containing operations in the requesting range. Closes #46022	2019-08-30 08:53:27 -04:00
Andrey Ershov	152ce62c58	Enhanced logging when transport is misconfigured to talk to HTTP port (#45964 ) If a node is misconfigured to talk to remote node HTTP port (instead of transport port) eventually it will receive an HTTP response from the remote node on transport port (this happens when a node sends accidentally line terminating byte in a transport request). If this happens today it results in a non-friendly log message and a long stack trace. This commit adds a check if a malformed response is HTTP response. In this case, a concise log message would appear. (cherry picked from commit 911d02b7a9c3ce7fe316360c127a935ca4b11f37)	2019-08-30 13:02:08 +02:00
Paul Sanwald	8bdbc7d9bf	Bump version from 7.4 to 7.5 (#46142 )	2019-08-29 15:03:26 -04:00
Julie Tibshirani	b5d8b364bb	Ensure top docs optimization is fully disabled for queries with unbounded max scores. (#46105 ) (#46139 ) When a query contains a mandatory clause that doesn't track the max score per block, we disable the max score optimization. Previously, we were doing this by wrapping the collector with a FilterCollector that always returned ScoreMode.COMPLETE. However we weren't adjusting totalHitsThreshold, so the collector could still call Scorer#setMinCompetitiveScore. It is against the method contract to call setMinCompetitiveScore when the score mode is COMPLETE, and some scorers like ReqOptSumScorer throw an error in this case. This commit tries to disable the optimization by always setting totalHitsThreshold to max int, as opposed to wrapping the collector.	2019-08-29 10:56:53 -07:00
Simon Willnauer	9b2ea07b17	Flush engine after big merge (#46066 ) (#46111 ) Today we might carry on a big merge uncommitted and therefore occupy a significant amount of diskspace for quite a long time if for instance indexing load goes down and we are not quickly reaching the translog size threshold. This change will cause a flush if we hit a significant merge (512MB by default) which frees diskspace sooner.	2019-08-29 17:54:15 +02:00
Nhat Nguyen	bb49124690	Only verify global checkpoint if translog sync occurred (#45980 ) We only sync translog if the given offset hasn't synced yet. We can't verify the global checkpoint from the latest translog checkpoint unless a sync has occurred. Closes #46065 Relates #45634	2019-08-29 09:44:40 -04:00
David Turner	d340530a47	Avoid overshooting watermarks during relocation (#46079 ) Today the `DiskThresholdDecider` attempts to account for already-relocating shards when deciding how to allocate or relocate a shard. Its goal is to stop relocating shards onto a node before that node exceeds the low watermark, and to stop relocating shards away from a node as soon as the node drops below the high watermark. The decider handles multiple data paths by only accounting for relocating shards that affect the appropriate data path. However, this mechanism does not correctly account for _new_ relocating shards, which are unwittingly ignored. This means that we may evict far too many shards from a node above the high watermark, and may relocate far too many shards onto a node causing it to blow right past the low watermark and potentially other watermarks too. There are in fact two distinct issues that this PR fixes. New incoming shards have an unknown data path until the `ClusterInfoService` refreshes its statistics. New outgoing shards have a known data path, but we fail to account for the change of the corresponding `ShardRouting` from `STARTED` to `RELOCATING`, meaning that we fail to find the correct data path and treat the path as unknown here too. This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths for all shards. With the changes here, the data paths are handled in tests as they are in production, except that their sizes are fake. Fixes #45177	2019-08-29 12:40:55 +01:00
Jason Tedor	9bc4a24118	Handle delete document level failures (#46100 ) Today we assume that document failures can not occur for deletes. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 22:17:16 -04:00
Tal Levy	a356bcff41	Add Circle Processor (#43851 ) (#46097 ) add circle-processor that translates circles to polygons	2019-08-28 14:44:08 -07:00
Jason Tedor	1249e6ba5d	Handle no-op document level failures (#46083 ) Today we assume that document failures can not occur for no-ops. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 13:57:24 -04:00
Tanguy Leroux	9e14ffa8be	Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068 )	2019-08-28 16:29:46 +02:00
Mark Tozzi	aec125faff	Support Range Fields in Histogram and Date Histogram (#46012 ) Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395) * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type. This is similar to how Terms aggregator works. * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation	2019-08-28 09:06:09 -04:00
Henning Andersen	300e717e42	Disallow partial results when shard unavailable (#45739 ) Searching with `allowPartialSearchResults=false` could still return partial search results during recovery. If a shard copy fails with a "shard not available" exception, the failure would be ignored and a partial result returned. The one case where this is known to happen is when a shard copy is recovering when searching, since `IllegalIndexShardStateException` is considered a "shard not available" exception. Relates to #42612	2019-08-27 17:01:23 +02:00
Nhat Nguyen	146e23a8a9	Relax translog assertion in testRestoreLocalHistoryFromTranslog (#45943 ) Since #45473, we trim translog below the local checkpoint of the safe commit immediately if soft-deletes enabled. In testRestoreLocalHistoryFromTranslog, we should have a safe commit after recoverFromTranslog is called; then we will trim translog files which contain only operations that are at most the global checkpoint. With this change, we relax the assertion to ensure that we don't put operations to translog while recovering history from the local translog.	2019-08-26 17:19:19 -04:00
Nhat Nguyen	c66bae39c3	Update translog checkpoint after marking ops as persisted (#45634 ) If two translog syncs happen concurrently, then one can return before its operations are marked as persisted. In general, this should not be an issue; however, peer recoveries currently rely on this assumption. Closes #29161	2019-08-26 17:18:52 -04:00
Nhat Nguyen	f2e8b17696	Do not create engine under IndexShard#mutex (#45263 ) Today we create new engines under IndexShard#mutex. This is not ideal because it can block the cluster state updates which also execute under the same mutex. We can avoid this problem by creating new engines under a separate mutex. Closes #43699	2019-08-26 17:18:29 -04:00
Jason Tedor	3d64605075	Remove node settings from blob store repositories (#45991 ) This commit starts from the simple premise that the use of node settings in blob store repositories is a mistake. Here we see that the node settings are used to get default settings for store and restore throttle rates. Yet, since there are not any node settings registered to this effect, there can never be a default setting to fall back to there, and so we always end up falling back to the default rate. Since this was the only use of node settings in blob store repository, we move them. From this, several places fall out where we were chaining settings through only to get them to the blob store repository, so we clean these up as well. That leaves us with the changeset in this commit.	2019-08-26 16:26:13 -04:00
Zachary Tong	943a016bb2	Add Cumulative Cardinality agg (and Data Science plugin) (#45990 ) This adds a pipeline aggregation that calculates the cumulative cardinality of a field. It does this by iteratively merging in the HLL sketch from consecutive buckets and emitting the cardinality up to that point. This is useful for things like finding the total "new" users that have visited a website (as opposed to "repeat" visitors). This is a Basic+ aggregation and adds a new Data Science plugin to house it and future advanced analytics/data science aggregations.	2019-08-26 16:19:55 -04:00
James Baiera	5535ff0a44	Fix IngestService to respect original document content type (#45799 ) (#45984 ) Backport of #45799 This PR modifies the logic in IngestService to preserve the original content type on the IndexRequest, such that when a document with a content type like SMILE is submitted to a pipeline, the resulting document that is persisted will remain in the original content type (SMILE in this case).	2019-08-26 14:33:33 -04:00
Armin Braun	af2bd75def	Fix Broken HTTP Request Breaking Channel Closing (#45958 ) (#45973 ) This is essentially the same issue fixed in #43362 but for http request version instead of the request method. We have to deal with the case of not being able to parse the request version, otherwise channel closing fails. Fixes #43850	2019-08-26 16:20:58 +02:00
Armin Braun	5a17987e19	Fix SnapshotStatusApisIT (#45929 ) (#45971 ) The snapshot status when blocking can still be INIT in rare cases when the new cluster state that has the snapshot in `STARTED` hasn't yet become visible. Fixes #45917	2019-08-26 15:59:02 +02:00
Andrey Ershov	d96469ddff	Better logging for TLS message on non-secure transport channel (#45835 ) This commit enhances logging for 2 cases: 1. If non-TLS enabled node receives transport message from TLS enabled node on transport port. 2. If non-TLS enabled node receives HTTPs request on transport port. (cherry picked from commit 4f52ebd32eb58526b4c8022f8863210bf88fc9be)	2019-08-26 15:07:13 +02:00


3581 Commits