OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	20cb95ca5e	Fix testSnapshotRelocatingPrimary to Actually Run Relocations (#46594 ) (#46620 ) Without replicas we won't actually get any relocations going when removing the node constraints in this test. Adjusted the code to force relocations by forbidding nodes that hold primaries instead. Also, fixed the timeouts and asserted that we actually get relocations. Fixes #46276	2019-09-16 15:15:33 +02:00
Andrei Dan	c57cca98b2	[ILM] Add date setting to calculate index age (#46561 ) (#46697 ) * [ILM] Add date setting to calculate index age Add the `index.lifecycle.origination_date` setting to allow users to configure a custom date that'll be used to calculate the index age for the phase transitions (as opposed to the default index creation date). This could be useful for users to create an index with an "older" origination date when indexing old data. Relates to #42449. * [ILM] Don't override creation date on policy init The initial approach we took was to override the lifecycle creation date if the `index.lifecycle.origination_date` setting was set. This had the disadvantage of the user not being able to update the `origination_date` anymore once set. This commit changes the way we make use of the `index.lifecycle.origination_date` setting by checking its value when we calculate the index age (i.e. at "read time") and, in case it's not set, default to the index creation date. * Make origination date setting index scope dynamic * Document origination date setting in ilm settings (cherry picked from commit d5bd2bb77ee28c1978ab6679f941d7c02e389d32) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-09-16 08:50:28 +01:00
Armin Braun	2b85dcb201	Parallelize Repository Cleanup Actions (#46647 ) (#46714 ) * Parallelize Repository Cleanup Actions Deleting root blobs and unreferenced indices can safely happen in parallel, no need to have both operations run sequentially when they preclude all other repository operations.	2019-09-16 07:52:03 +02:00
David Turner	272b0ecbdd	Remove docs for proxy mode (#46677 ) We added docs for proxy mode in #40281 but on reflection we should not be documenting this setting since it does not play well with all proxies and we can't recommend its use. This commit removes those docs and expands its Javadoc instead.	2019-09-13 22:20:11 +01:00
Nhat Nguyen	cabff5a7cd	Handle lower retaining seqno retention lease error (#46420 ) We renew the CCR retention lease at a fixed interval, therefore it's possible to have more than one in-flight renewal request at the same time. If requests arrive out of order, then the assertion is violated. Closes #46416 Closes #46013	2019-09-13 08:50:19 -04:00
Nhat Nguyen	e1a33c6283	Fix false positive out of sync warning in synced-flush (#46576 ) Synced-flush consists of three steps: (1) force-flush on every active copy; (2) check for ongoing indexing operations; (3) seal copies if there's no change since step 1. If some indexing operations are completed on the primary but not replicas, then Lucene commits from step 1 on replicas won't be the same as the primary's. And step 2 would pass if it's executed when all pending operations are done. Once step 2 passes, we will incorrectly emit the "out of sync" warning message although nothing is wrong here. Relates #28464 Relates #30244	2019-09-12 16:34:33 -04:00
Nhat Nguyen	5465c8d095	Increase timeout for relocation tests (#46554 ) There's nothing wrong in the logs from these failures. I think 30 seconds might not be enough to relocate shards with many documents as CI is quite slow. This change increases the timeout to 60 seconds for these relocation tests. It also dumps the hot threads in case of a timeout. Closes #46526 Closes #46439	2019-09-12 16:34:01 -04:00
Zachary Tong	e1e06c2589	Add version constant for 7.3.3	2019-09-12 13:50:40 -04:00
Jim Ferenczi	4407f3af1b	Delay the creation of SubSearchContext to the FetchSubPhase (#46598 ) This change delays the creation of the SubSearchContext for nested and parent/child inner_hits to the fetch sub phase in order to ensure that a SearchContext can be built entirely from a QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid. Relates #46523	2019-09-12 14:52:15 +02:00
Igor Motov	35cb93248d	Geo: fix indexing of west to east linestrings crossing the antimeridian (#46601 ) Fixes the way linestrings that cross the antimeridian are indexed. Due to a normalization bug, these lines were decomposed into a line segment that stretched across the entire globe. Fixes #43775	2019-09-11 17:43:17 -04:00
Zachary Tong	6dc8ed5d57	[7.x Backport] Refactor AllocatedPersistentTask#init(), move rollup ctor logic (#46406 ) This makes the AllocatedPersistentTask#init() method protected so that implementing classes can perform their initialization logic there, instead of the constructor. Rollup's task is adjusted to use this init method. It also slightly refactors the methods to use a static logger in the AllocatedTask instead of passing it in via an argument. This is simpler, logged messages come from the task instead of the service, and is easier for tests	2019-09-11 17:00:28 -04:00
Ryan Ernst	fa9327cdb9	Add more meaningful keystore version mismatch errors (#46291 ) This commit changes the version bounds of keystore reading to give better error messages when a user has a too new or too old format. relates #44624	2019-09-11 09:55:19 -07:00
Jim Ferenczi	23bf310c84	Replace the SearchContext with QueryShardContext when building aggregator factories (#46527 ) This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them. The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`. Relates #46523	2019-09-11 16:43:30 +02:00
Armin Braun	27c15f137e	Remove Unused Method from BlobStoreRepository (#46204 ) (#46593 ) This method isn't used anymore and I forgot to delete it.	2019-09-11 16:34:24 +02:00
William Brafford	8c9f15db44	Fix Path comparisons for Windows tests (#46503 ) (#46566 ) * Fix Path comparisons for Windows tests The test NodeEnvironmentTests#testCustonDataPaths worked just fine on Darwin and Linux, but the comparison was breaking in Windows because one path had the "C:\" prefix and the other one didn't. The simple fix is to compare absolute paths rather than potentially relative ones.	2019-09-11 09:33:00 -04:00
Christoph Büscher	aa0c586b73	Deprecate `_field_names` disabling (#42854 ) Currently we allow `_field_names` fields to be disabled explicitly, but since the overhead is negligible now we decided to keep it turned on by default and deprecate the `enable` option on the field type. This change adds a deprecation warning whenever this setting is used; going forward we want to ignore and finally remove it. Closes #27239	2019-09-11 14:58:08 +02:00
Armin Braun	41633cb9b5	More Efficient Ordering of Shard Upload Execution (#42791 ) (#46588 ) * More Efficient Ordering of Shard Upload Execution (#42791) * Change the upload order of snapshots to work file by file in parallel on the snapshot pool instead of merely shard-by-shard * Inspired by #39657 * Cleanup BlobStoreRepository Abort and Failure Handling (#46208)	2019-09-11 13:59:20 +02:00
Jim Ferenczi	80bb08fbda	Replace the SearchContext with QueryShardContext when building collapsing context (#46543 ) This commit replaces the `SearchContext` with the `QueryShardContext` when building collapsing context. Collapse context is part of the `SearchContext` so it shouldn't require a `SearchContext` to create one. Relates #46523	2019-09-11 12:25:38 +02:00
Jim Ferenczi	425b1a77e8	Add more context to QueryShardContext (#46584 ) This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext. It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the query shard context. Relates #46523	2019-09-11 12:24:51 +02:00
Armin Braun	f8d5145472	Fix SnapshotStatusApisIT (#46563 ) (#46582 ) Obviously we have to run the status request again to busy wait for the `STARTED` state, just busy waiting on an existing response won't do anything. Closes #45917	2019-09-11 11:58:42 +02:00
Lee Hinman	cdc3a260af	Add retention to Snapshot Lifecycle Management (backport of #4… (#46506 ) * Add retention to Snapshot Lifecycle Management (#46407) This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria. An example policy would look like: ``` PUT /_slm/policy/snapshot-every-day { "schedule": "0 30 2 * * ?", "name": "<production-snap-{now/d}>", "repository": "my-s3-repository", "config": { "indices": ["foo-", "important"] }, // Newly configured retention options "retention": { // Snapshots should be deleted after 14 days "expire_after": "14d", // Keep a maximum of thirty snapshots "max_count": 30, // Keep a minimum of the four most recent snapshots "min_count": 4 } } ``` SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour. Included in this work is a new SLM stats API endpoint available through ``` json GET /_slm/stats ``` That returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API. Add base framework for snapshot retention (#43605) * Add base framework for snapshot retention This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask` to start as the basis for SLM's retention implementation. 
Relates to #38461 * Remove extraneous 'public' * Use a local var instead of reading class var repeatedly * Add SnapshotRetentionConfiguration for retention configuration (#43777) * Add SnapshotRetentionConfiguration for retention configuration This commit adds the `SnapshotRetentionConfiguration` class and its HLRC counterpart to encapsulate the configuration for SLM retention. Currently only a single parameter is supported as an example (we still need to discuss the different options we want to support and their names) to keep the size of the PR down. It also does not yet include version serialization checks since the original SLM branch has not yet been merged. Relates to #43663 * Fix REST tests * Fix more documentation * Use Objects.equals to avoid NPE * Put `randomSnapshotLifecyclePolicy` in only one place * Occasionally return retention with no configuration * Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764) * Implement SnapshotRetentionTask's snapshot filtering and deletion This commit implements the snapshot filtering and deletion for `SnapshotRetentionTask`. Currently only the expire-after age is used for determining whether a snapshot is eligible for deletion. Relates to #43663 * Fix deletes running on the wrong thread * Handle missing or null policy in snap metadata differently * Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>> * Use the `OriginSettingClient` to work with security, enhance logging * Prevent NPE in test by mocking Client * Allow empty/missing SLM retention configuration (#45018) Semi-related to #44465, this allows the `"retention"` configuration map to be missing. Relates to #43663 * Add min_count and max_count as SLM retention predicates (#44926) This adds the configuration options for `min_count` and `max_count` as well as the logic for determining whether a snapshot meets this criteria to SLM's retention feature. 
These options are optional and one, two, or all three can be specified in an SLM policy. Relates to #43663 * Time-bound deletion of snapshots in retention delete function (#45065) * Time-bound deletion of snapshots in retention delete function With a cluster that has a large number of snapshots, it's possible that snapshot deletion can take a very long time (especially since deletes currently have to happen in a serial fashion). To prevent snapshot deletion from taking forever in a cluster and blocking other operations, this commit adds a setting to allow configuring a maximum time to spend deleting snapshots during retention. This dynamic setting defaults to 1 hour and is best-effort, meaning that it doesn't hard stop a deletion at an hour mark, but ensures that once the time has passed, all subsequent deletions are deferred until the next retention cycle. Relates to #43663 * Wow snapshots suuuure can take a long time. * Use a LongSupplier instead of actually sleeping * Remove TestLogging annotation * Remove rate limiting * Add SLM metrics gathering and endpoint (#45362) * Add SLM metrics gathering and endpoint This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The stats stored include the number of snapshots taken, failed, deleted, the number of retention runs, as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount of time spent deleting snapshots from SLM retention. 
This commit also adds an endpoint for retrieving all stats (further commits will expose this in the SLM get-policy API) that looks like: ``` GET /_slm/stats { "retention_runs" : 13, "retention_failed" : 0, "retention_timed_out" : 0, "retention_deletion_time" : "1.4s", "retention_deletion_time_millis" : 1404, "policy_metrics" : { "daily-snapshots2" : { "snapshots_taken" : 7, "snapshots_failed" : 0, "snapshots_deleted" : 6, "snapshot_deletion_failures" : 0 }, "daily-snapshots" : { "snapshots_taken" : 12, "snapshots_failed" : 0, "snapshots_deleted" : 12, "snapshot_deletion_failures" : 6 } }, "total_snapshots_taken" : 19, "total_snapshots_failed" : 0, "total_snapshots_deleted" : 18, "total_snapshot_deletion_failures" : 6 } ``` This does not yet include HLRC for this, as this commit is quite large on its own. That will be added in a subsequent commit. Relates to #43663 * Version qualify serialization * Initialize counters outside constructor * Use computeIfAbsent instead of being too verbose * Move part of XContent generation into subclass * Fix REST action for master merge * Unused import * Record history of SLM retention actions (#45513) This commit records the deletion of snapshots by the retention component of SLM into the SLM history index for the purposes of reviewing operations taken by SLM and alerting. * Retry SLM retention after currently running snapshot completes (#45802) * Retry SLM retention after currently running snapshot completes This commit adds a ClusterStateObserver to wait until the currently running snapshot is complete before proceeding with snapshot deletion. SLM retention waits for the maximum allowed deletion time for the snapshot to complete, however, the waiting time is not factored into the limit on actual deletions. 
Relates to #43663 * Increase timeout waiting for snapshot completion * Apply patch From `2374316f0d`.patch * Rename test variables * [TEST] Be less strict for stats checking * Skip SLM retention if ILM is STOPPING or STOPPED (#45869) This adds a check to ensure we take no action during SLM retention if ILM is currently stopped or in the process of stopping. Relates to #43663 * Check all actions preventing snapshot delete during retention (#45992) * Check all actions preventing snapshot delete during retention run Previously we only checked to see if a snapshot was currently running, but it turns out that more things can block snapshot deletion. This changes the check to be a check for: - a snapshot currently running - a deletion already in progress - a repo cleanup in progress - a restore currently running This was found by CI where a third party delete in a test caused SLM retention deletion to throw an exception. Relates to #43663 * Add unit test for okayToDeleteSnapshots * Fix bug where SLM retention task would be scheduled on every node * Enhance test logging * Ignore if snapshot is already deleted * Missing import * Fix SnapshotRetentionServiceTests * Expose SLM policy stats in get SLM policy API (#45989) This also adds support for the SLM stats endpoint to the high level rest client. 
Retrieving a policy now looks like: ```json { "daily-snapshots" : { "version": 1, "modified_date": "2019-04-23T01:30:00.000Z", "modified_date_millis": 1556048137314, "policy" : { "schedule": "0 30 1 * * ?", "name": "<daily-snap-{now/d}>", "repository": "my_repository", "config": { "indices": ["data-", "important"], "ignore_unavailable": false, "include_global_state": false }, "retention": {} }, "stats": { "snapshots_taken": 0, "snapshots_failed": 0, "snapshots_deleted": 0, "snapshot_deletion_failures": 0 }, "next_execution": "2019-04-24T01:30:00.000Z", "next_execution_millis": 1556048160000 } } ``` Relates to #43663 Rewrite SnapshotLifecycleIT as ESIntegTestCase (#46356) * Rewrite SnapshotLifecycleIT as ESIntegTestCase This commit splits `SnapshotLifecycleIT` into two different tests. `SnapshotLifecycleRestIT` which includes the tests that do not require slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an integration test using `MockRepository` to simulate a snapshot being in progress. Relates to #43663 Resolves #46205 * Add error logging when exceptions are thrown * Update serialization versions * Fix type inference * Use non-Cancellable HLRC return value * Fix Client mocking in test * Fix SLMSnapshotBlockingIntegTests for 7.x branch * Update SnapshotRetentionTask for non-multi-repo snapshot retrieval * Add serialization guards for SnapshotLifecyclePolicy	2019-09-10 09:08:09 -06:00
Mayya Sharipova	2c5f9b558b	Fix highlighting for script_score query (#46507 )	2019-09-10 08:26:47 -04:00
David Turner	6c67b53932	Load metadata at start time not construction time (#46326 ) Today we load the metadata from disk while constructing the node. However there is no real need to do so, and this commit moves that code to run later while the node is starting instead.	2019-09-10 11:15:10 +01:00
Henning Andersen	9fce5a99d8	Rest Controller wildcard registration (#46487 ) Registering two different http methods on the same path using different wildcard names would result in the last wildcard name being active only. Now throw an exception instead. Closes #46482	2019-09-09 21:49:18 +02:00
Zachary Tong	8d17527050	[TEST] create larger cuckoo filters for tests (#46457 ) The cuckoo filters could be randomly created with too small a capacity or precision, which means that they can only absorb a few values before collisions start to make all filters look identical. This increases the size of the filters we generate (capacity >> than the test cases) and lowers the fpp rate.	2019-09-09 10:18:51 -04:00
David Turner	8428f8e6e8	Remove trailing comma from nodes lists (#46484 ) Today when the membership of the cluster changes we log messages that describe the change like this: added {{node-1}{OPdaTIGmSxaEXXOyg3o96w}{127.0.0.1}{127.0.0.1:9301}{di},} The trailing comma suggests there is some missing string that might contain extra information, but in fact it's an artefact of how these messages are constructed. This commit removes the trailing comma from these lists.	2019-09-09 14:47:32 +01:00
Armin Braun	ee3396735c	Execute SnapshotsService Error Callback on Generic Thread (#46277 ) (#46480 ) I couldn't find a test for this, as it seems we only get into this error handler on a bug. Regardless, we are executing the snapshot finalization on the master update thread here which shouldn't happen and will make debugging a production issue resulting from this trickier than it has to be (because we probably also get a cluster state apply is slow warning in addition to the original bug). Used the generic pool here instead of the snapshot pool because we're resolving the user callback here as well and the generic pool seemed like the safer bet for that.	2019-09-09 14:38:11 +02:00
Nhat Nguyen	24c3a1de3c	Ignore replication for noop updates (#46458 ) Previously, we ignored replication for noop updates because they did not have sequence numbers. Since #44603, we started assigning sequence numbers to noop updates, leading them to be replicated to replicas. This bug occurs only on 8.0 because it requires #41065 and #44603. Closes #46366	2019-09-07 11:32:01 -04:00
markharwood	323ec022be	Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394 ) Following performance optimisations to the adjacency_matrix aggregation we no longer require this setting. Marked as deprecated and due for removal in 8.0 Related #46324	2019-09-06 13:59:47 +01:00
Yunfeng,Wu	7582af27b0	Resolve the incorrect scroll_current when delete or close index (#45226 ) Resolve the incorrect current scroll for deleted or closed index	2019-09-06 09:45:53 +02:00
Jim Ferenczi	f2a6c88f83	Add a system property to ignore awareness attributes (#46375 ) This is a follow up of #19191 for 7.x. This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this commit adds a way to opt-in to this new behavior in a minor version of 7.x. Relates #45735	2019-09-06 09:29:27 +02:00
Paul Sanwald	758680c549	version bump to 6.8.4 (#46409 )	2019-09-05 15:14:36 -04:00
Jason Tedor	92866f977a	Clarify error message on keystore write permissions (#46321 ) When the Elasticsearch process does not have write permissions to upgrade the Elasticsearch keystore, we bail with an error message that indicates there is a filesystem permissions problem. This commit clarifies that error message by pointing out the directory where write permissions are required, or that the user can also run the elasticsearch-keystore upgrade command manually before starting the Elasticsearch process. In this case, the upgrade would not be needed at runtime, so the permissions would not be needed then.	2019-09-05 15:11:54 -04:00
Benjamin Trent	d912a49c6f	[7.x] Support geotile_grid aggregation in composite agg sources (#45810 ) (#46399 ) * Support geotile_grid aggregation in composite agg sources (#45810) Adds support for `geotile_grid` as a source in composite aggs. Part of this change includes adding a new docFormat of `GEOTILE` that formats a hashed `long` value into a geotile formatting string `zoom/x/y`.	2019-09-05 13:22:57 -05:00
Armin Braun	7a9af874ad	Enable Debug Logging for Master and Coordination Packages (#46363 ) (#46374 ) In order to track down #46091: * Enables debug logging in REST tests for `master` and `coordination` packages since we suspect that issues are caused by failed and then retried publications	2019-09-05 14:03:38 +02:00
Yannick Welsch	7e4c633ce3	Quiet down shard lock failures (#46368 ) These were actually never intended to be logged at the warning level but made visible by a refactoring in #19991, which introduced a new exception type but forgot to adapt some of the consumers of the exception.	2019-09-05 13:08:11 +02:00
Nhat Nguyen	03ed18a010	Unmute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-04 22:33:17 -04:00
Julie Tibshirani	40c3225d26	First round of optimizations for vector functions. (#46294 ) This PR merges the `vectors-optimize-brute-force` feature branch, which makes the following changes to how vector functions are computed: * Precompute the L2 norm of each vector at indexing time. (#45390) * Switch to ByteBuffer for vector encoding. (#45936) * Decode vectors and while computing the vector function. (#46103) * Use an array instead of a List for the query vector. (#46155) * Precompute the normalized query vector when using cosine similarity. (#46190) Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2019-09-04 14:45:57 -07:00
Nhat Nguyen	a16cb89956	Revert "Sync translog without lock when trim unreferenced readers (#46203 )" Unfortunately, with this change, we won't clean up all unreferenced generations when reopening. We assume that there's at most one unreferenced generation when reopening translog. The previous implementation guarantees this assumption by syncing translog every time after we remove a translog reader. This change, however, only syncs translog once after we have removed all unreferenced readers (can be more than one) and breaks the assumption. Closes #46267 This reverts commit fd8183ee51d7cf08d9def58a2ae027714beb60de.	2019-09-04 17:09:39 -04:00
Jason Tedor	3cbdd84b89	Add test that get triggers shard search active (#46317 ) This commit is a follow-up to a change that fixed that multi-get was not triggering a shard to become search active. In that change, we added a test that multi-get properly triggers a shard to become search active. This commit is a follow-up to that change which adds a test for the get case. While get is already handled correctly in production code, there was not a test for it. This commit adds one. Additionally, we factor all the search idle tests from IndexShardIT into a separate test class, as an effort to keep related tests together instead of a single large test class containing a jumble of tests, and also to keep test classes smaller for better parallelization.	2019-09-04 11:53:32 -04:00
markharwood	408b58dd9d	Adjacency_matrix aggregation optimisation. (#46257 ) (#46315 ) Avoid pre-allocating ((N * N) - N) / 2 “BitsIntersector” objects given N filters. Most adjacency matrices will be sparse and we typically don’t need to allocate all of these objects - can save a lot of allocations when the number of filters is high. Closes #46212	2019-09-04 16:45:32 +01:00
Nhat Nguyen	eb56d23421	Do not send recovery requests with CancellableThreads (#46287 ) Previously, we send recovery requests using CancellableThreads because we send requests and wait for responses in a blocking manner. With async recovery, we no longer need to do so. Moreover, if we fail to submit a request, then we can release the Store using an interruptible thread which can risk invalidating the node lock. This PR is the first step to avoid forking when releasing the Store. Relates #45409 Relates #46178	2019-09-04 11:26:11 -04:00
Henning Andersen	5066835569	Fix SearchService.createContext exception handling (#46258 ) An exception from the DefaultSearchContext constructor could leak a searcher, causing future issues like shard lock obtained exceptions. The underlying cause of the exception in the constructor has been fixed, but as a safety precaution we also fix the exception handling in createContext. Closes #45378	2019-09-04 14:46:30 +02:00
Nhat Nguyen	3f67cbe974	Suppress warning from background sync on relocated primary (#46247 ) If a primary is being relocated, then the global checkpoint and retention lease background sync can emit unnecessary warning logs. This side effect was introduced in #42241. Relates #40800 Relates #42241	2019-09-03 18:44:15 -04:00
Nhat Nguyen	5924df1764	Mute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-03 18:44:08 -04:00
Lee Hinman	57f322f85e	Move MockRepository into test framework (#46298 ) This moves the `MockRepository` class into `test/framework/src/main` so it can be used across all modules and plugins in tests.	2019-09-03 16:21:10 -06:00
Jason Tedor	b8c51ff894	Multi-get requests should wait for search active (#46283 ) When a shard has fallen search idle, and a non-realtime multi-get request is executed, today such requests do not wait for the shard to become search active and therefore such requests do not wait for a refresh to see the latest changes to the index. This also prevents such requests from triggering the shard as non-search idle, influencing the behavior of scheduled refreshes. This commit addresses this by attaching a listener to the shard search active state for multi-get requests. In this way, when the next scheduled refresh is executed, the multi-get request will then proceed.	2019-09-03 14:31:37 -04:00
Henning Andersen	2383acaa89	Fix testSyncFailsIfOperationIsInFlight (#46269 ) testSyncFailsIfOperationIsInFlight could fail due to the index request spawning a GCP sync (new since 7.4). Test now waits for it to finish before testing that flushed sync fails.	2019-09-03 17:30:00 +02:00
dengweisysu	416419e4c9	Sync translog without lock when trim unreferenced readers (#46203 ) With this change, we can avoid blocking writing threads when trimming unreferenced readers; hence improving the translog writing performance in async durability mode. Close #46201	2019-09-02 21:55:06 -04:00
Anup	e01ec802e7	Remove duplicate line in SearchAfterBuilder (#45994 )	2019-09-03 01:30:01 +02:00
Armin Braun	2662c1b417	Wait for all Rec. to Stop on Node Close (#46178 ) (#46237 ) * Wait for all Rec. to Stop on Node Close * This issue is in the `RecoverySourceHandler#acquireStore`. If we submit the store release to the generic threadpool while it is getting shut down we never complete the future we wait on (in the generic pool as well) and fail to ever release the store potentially. * Fixed by waiting for all recoveries to end on node close so that we always have a healthy thread pool here * Closes #45956	2019-09-02 18:04:37 +02:00
Martijn van Groningen	5747badaa8	Allow ingest processors access to node client. (#46077 ) This is the first PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation.	2019-09-02 08:24:26 +02:00
Nhat Nguyen	db949847e5	Fix translog stats in testPrepareIndexForPeerRecovery (#46137 ) When recovering a shard locally, we use a translog snapshot from newSnapshotFromGen which consists of all readers from a certain generation. In the test, we use newSnapshotFromMinSeqNo for the expectation. The snapshot of this method includes only readers containing operations in the requesting range. Closes #46022	2019-08-30 08:53:27 -04:00
Andrey Ershov	152ce62c58	Enhanced logging when transport is misconfigured to talk to HTTP port (#45964 ) If a node is misconfigured to talk to a remote node's HTTP port (instead of the transport port), it will eventually receive an HTTP response from the remote node on the transport port (this happens when a node accidentally sends a line-terminating byte in a transport request). Today this results in an unfriendly log message and a long stack trace. This commit adds a check for whether a malformed response is an HTTP response. In that case, a concise log message appears instead. (cherry picked from commit 911d02b7a9c3ce7fe316360c127a935ca4b11f37)	2019-08-30 13:02:08 +02:00
Paul Sanwald	8bdbc7d9bf	Bump version from 7.4 to 7.5 (#46142 )	2019-08-29 15:03:26 -04:00
Julie Tibshirani	b5d8b364bb	Ensure top docs optimization is fully disabled for queries with unbounded max scores. (#46105 ) (#46139 ) When a query contains a mandatory clause that doesn't track the max score per block, we disable the max score optimization. Previously, we were doing this by wrapping the collector with a FilterCollector that always returned ScoreMode.COMPLETE. However we weren't adjusting totalHitsThreshold, so the collector could still call Scorer#setMinCompetitiveScore. It is against the method contract to call setMinCompetitiveScore when the score mode is COMPLETE, and some scorers like ReqOptSumScorer throw an error in this case. This commit tries to disable the optimization by always setting totalHitsThreshold to max int, as opposed to wrapping the collector.	2019-08-29 10:56:53 -07:00
Simon Willnauer	9b2ea07b17	Flush engine after big merge (#46066 ) (#46111 ) Today we might carry on a big merge uncommitted and therefore occupy a significant amount of diskspace for quite a long time if for instance indexing load goes down and we are not quickly reaching the translog size threshold. This change will cause a flush if we hit a significant merge (512MB by default) which frees diskspace sooner.	2019-08-29 17:54:15 +02:00
Nhat Nguyen	bb49124690	Only verify global checkpoint if translog sync occurred (#45980 ) We only sync translog if the given offset hasn't synced yet. We can't verify the global checkpoint from the latest translog checkpoint unless a sync has occurred. Closes #46065 Relates #45634	2019-08-29 09:44:40 -04:00
David Turner	d340530a47	Avoid overshooting watermarks during relocation (#46079 ) Today the `DiskThresholdDecider` attempts to account for already-relocating shards when deciding how to allocate or relocate a shard. Its goal is to stop relocating shards onto a node before that node exceeds the low watermark, and to stop relocating shards away from a node as soon as the node drops below the high watermark. The decider handles multiple data paths by only accounting for relocating shards that affect the appropriate data path. However, this mechanism does not correctly account for _new_ relocating shards, which are unwittingly ignored. This means that we may evict far too many shards from a node above the high watermark, and may relocate far too many shards onto a node causing it to blow right past the low watermark and potentially other watermarks too. There are in fact two distinct issues that this PR fixes. New incoming shards have an unknown data path until the `ClusterInfoService` refreshes its statistics. New outgoing shards have a known data path, but we fail to account for the change of the corresponding `ShardRouting` from `STARTED` to `RELOCATING`, meaning that we fail to find the correct data path and treat the path as unknown here too. This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths for all shards. With the changes here, the data paths are handled in tests as they are in production, except that their sizes are fake. Fixes #45177	2019-08-29 12:40:55 +01:00
Jason Tedor	9bc4a24118	Handle delete document level failures (#46100 ) Today we assume that document failures can not occur for deletes. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 22:17:16 -04:00
Tal Levy	a356bcff41	Add Circle Processor (#43851 ) (#46097 ) add circle-processor that translates circles to polygons	2019-08-28 14:44:08 -07:00
Jason Tedor	1249e6ba5d	Handle no-op document level failures (#46083 ) Today we assume that document failures can not occur for no-ops. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 13:57:24 -04:00
Tanguy Leroux	9e14ffa8be	Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068 )	2019-08-28 16:29:46 +02:00
Mark Tozzi	aec125faff	Support Range Fields in Histogram and Date Histogram (#46012 ) Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395) * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type. This is similar to how Terms aggregator works. * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation	2019-08-28 09:06:09 -04:00
Henning Andersen	300e717e42	Disallow partial results when shard unavailable (#45739 ) Searching with `allowPartialSearchResults=false` could still return partial search results during recovery. If a shard copy fails with a "shard not available" exception, the failure would be ignored and a partial result returned. The one case where this is known to happen is when a shard copy is recovering when searching, since `IllegalIndexShardStateException` is considered a "shard not available" exception. Relates to #42612	2019-08-27 17:01:23 +02:00
Nhat Nguyen	146e23a8a9	Relax translog assertion in testRestoreLocalHistoryFromTranslog (#45943 ) Since #45473, we trim translog below the local checkpoint of the safe commit immediately if soft-deletes are enabled. In testRestoreLocalHistoryFromTranslog, we should have a safe commit after recoverFromTranslog is called; then we will trim translog files which contain only operations that are at most the global checkpoint. With this change, we relax the assertion to ensure that we don't put operations to translog while recovering history from the local translog.	2019-08-26 17:19:19 -04:00
Nhat Nguyen	c66bae39c3	Update translog checkpoint after marking ops as persisted (#45634 ) If two translog syncs happen concurrently, then one can return before its operations are marked as persisted. In general, this should not be an issue; however, peer recoveries currently rely on this assumption. Closes #29161	2019-08-26 17:18:52 -04:00
Nhat Nguyen	f2e8b17696	Do not create engine under IndexShard#mutex (#45263 ) Today we create new engines under IndexShard#mutex. This is not ideal because it can block the cluster state updates which also execute under the same mutex. We can avoid this problem by creating new engines under a separate mutex. Closes #43699	2019-08-26 17:18:29 -04:00
Jason Tedor	3d64605075	Remove node settings from blob store repositories (#45991 ) This commit starts from the simple premise that the use of node settings in blob store repositories is a mistake. Here we see that the node settings are used to get default settings for store and restore throttle rates. Yet, since there are not any node settings registered to this effect, there can never be a default setting to fall back to there, and so we always end up falling back to the default rate. Since this was the only use of node settings in blob store repository, we remove them. From this, several places fall out where we were chaining settings through only to get them to the blob store repository, so we clean these up as well. That leaves us with the changeset in this commit.	2019-08-26 16:26:13 -04:00
Zachary Tong	943a016bb2	Add Cumulative Cardinality agg (and Data Science plugin) (#45990 ) This adds a pipeline aggregation that calculates the cumulative cardinality of a field. It does this by iteratively merging in the HLL sketch from consecutive buckets and emitting the cardinality up to that point. This is useful for things like finding the total "new" users that have visited a website (as opposed to "repeat" visitors). This is a Basic+ aggregation and adds a new Data Science plugin to house it and future advanced analytics/data science aggregations.	2019-08-26 16:19:55 -04:00
James Baiera	5535ff0a44	Fix IngestService to respect original document content type (#45799 ) (#45984 ) Backport of #45799 This PR modifies the logic in IngestService to preserve the original content type on the IndexRequest, such that when a document with a content type like SMILE is submitted to a pipeline, the resulting document that is persisted will remain in the original content type (SMILE in this case).	2019-08-26 14:33:33 -04:00
Armin Braun	af2bd75def	Fix Broken HTTP Request Breaking Channel Closing (#45958 ) (#45973 ) This is essentially the same issue fixed in #43362 but for http request version instead of the request method. We have to deal with the case of not being able to parse the request version, otherwise channel closing fails. Fixes #43850	2019-08-26 16:20:58 +02:00
Armin Braun	5a17987e19	Fix SnapshotStatusApisIT (#45929 ) (#45971 ) The snapshot status when blocking can still be INIT in rare cases when the new cluster state that has the snapshot in `STARTED` hasn't yet become visible. Fixes #45917	2019-08-26 15:59:02 +02:00
Andrey Ershov	d96469ddff	Better logging for TLS message on non-secure transport channel (#45835 ) This commit enhances logging for 2 cases: 1. If non-TLS enabled node receives transport message from TLS enabled node on transport port. 2. If non-TLS enabled node receives HTTPs request on transport port. (cherry picked from commit 4f52ebd32eb58526b4c8022f8863210bf88fc9be)	2019-08-26 15:07:13 +02:00
Jason Tedor	599bf2d68b	Deprecate the pidfile setting (#45938 ) This commit deprecates the pidfile setting in favor of node.pidfile.	2019-08-23 21:31:35 -04:00
Mayya Sharipova	3bc1494d38	Correct warning testScalingThreadPoolConfiguration Correct expected warning Closes #45907	2019-08-23 10:30:36 -04:00
Henning Andersen	46d9a575db	Fix RemoteClusterConnection close race (#45898 ) Closing a `RemoteClusterConnection` concurrently with trying to connect could result in double invoking the listener. This fixes RemoteClusterConnectionTest#testCloseWhileConcurrentlyConnecting Closes #45845	2019-08-23 14:26:02 +02:00
Tanguy Leroux	8e66df9925	Move testRetentionLeasesClearedOnRestore (#45896 )	2019-08-23 13:43:40 +02:00
Alexander Reelsen	ecafe4f4ad	Update joda to 2.10.3 (#45495 )	2019-08-23 10:39:39 +02:00
Armin Braun	ba6d72ea9f	Fix TransportSnapshotsStatusAction ThreadPool Use (#45824 ) (#45883 ) In case of an in-progress snapshot this endpoint was broken because it tried to execute repository operations in the callback on a transport thread which is not allowed (only generic or snapshot pool are allowed here).	2019-08-23 06:17:50 +02:00
Jason Tedor	de6b6fd338	Add node.processors setting in favor of processors (#45885 ) This commit namespaces the existing processors setting under the "node" namespace. In doing so, we deprecate the existing processors setting in favor of node.processors.	2019-08-22 22:18:37 -04:00
Nhat Nguyen	3393f9599e	Ignore translog retention policy if soft-deletes enabled (#45473 ) Since #45136, we use soft-deletes instead of translog in peer recovery. There's no need to retain extra translog to increase a chance of operation-based recoveries. This commit ignores the translog retention policy if soft-deletes is enabled so we can discard translog more quickly. Backport of #45473 Relates #45136	2019-08-22 16:40:06 -04:00
dengweisysu	72c6302d12	Fsync translog without writeLock before rolling (#45765 ) Today, when rolling a new translog generation, we block all write threads until a new generation is created. This choice is perfectly fine except in a highly concurrent environment with the translog async setting. We can reduce the blocking time by pre-syncing the current generation without writeLock before rolling. The new step would fsync most of the data of the current generation without blocking write threads. Close #45371	2019-08-22 16:18:42 -04:00
William Brafford	f82c0f56a6	Mute flaky RemoteClusterConnection test (#45850 )	2019-08-22 15:00:43 -04:00
Jake Landis	c60399c77f	introduce 7.3.2 version to 7.x (#45864 )	2019-08-22 12:24:19 -05:00
Andrey Ershov	ed8307c198	Deprecate es.http.cname_in_publish_address setting (#45616 ) Follow up on #32806. The system property es.http.cname_in_publish_address is deprecated starting from 7.0.0 and deprecation warning should be added if the property is specified. This PR will go to 7.x and master. Follow-up PR to remove es.http.cname_in_publish_address property completely will go to the master. (cherry picked from commit a5ceca7715818f47ec87dd5f17f8812c584b592b)	2019-08-22 12:09:35 +02:00
Armin Braun	88acae48ce	Remove index-N Rebuild in Shard Snapshot Updates (#45740 ) (#45778 ) * There is no point in listing out every shard over and over when the `index-N` blob in the shard contains a list of all the files * Rebuilding the `index-N` from the `snap-${uuid}.dat` blobs does not provide any material benefit. It only would in the corner case of a corrupted `index-N` but otherwise uncorrupted blobs since we neither check the correctness of the content of all segment blobs nor do we do a similar recovery at the root of the repository. * Also, at least in version `6.x` we only mark a shard snapshot as successful after writing out the updated `index-N` blob so all snapshots that would work with `7.x` and newer must have correct `index-N` blobs => Removed the rebuilding of the `index-N` content from `snap-${uuid}.dat` files and moved to only listing `index-N` when taking a snapshot instead of listing all files => Removed check of file existence against physical blob listing => Kept full listing on the delete side to retain full cleanup of blobs that aren't referenced by the `index-N`	2019-08-22 11:32:45 +02:00
Luca Cavanna	b95ca9c3bb	Fix compile errors in HttpChannelTaskHandler Relates to #43332	2019-08-22 11:13:26 +02:00
Luca Cavanna	a47ade3e64	Cancel search task on connection close (#43332 ) This PR introduces a mechanism to cancel a search task when its corresponding connection gets closed. That would relieve users from having to manually deal with tasks and cancel them if needed. Especially the process of finding the task_id requires calling get tasks which needs to call every node in the cluster. The implementation is based on associating each http channel with its currently running search task, and cancelling the task when the previously registered close listener gets called.	2019-08-22 10:43:20 +02:00
Nhat Nguyen	3029887451	Never release store using CancellableThreads (#45409 ) Today we can release a Store using CancellableThreads. If we are holding the last reference, then we will verify the node lock before deleting the store. Checking node lock performs some I/O on FileChannel. If the current thread is interrupted, then the channel will be closed and the node lock will also be invalid. Closes #45237	2019-08-21 21:24:31 -04:00
Tal Levy	9b14b7298b	[7.x] Add is_write_index column to cat.aliases (#45798 ) * Add is_write_index column to cat.aliases (#44772) Aliases have had the option to set `is_write_index` since 6.4, but the cat.aliases action was never updated. * correct version bounds to 7.4	2019-08-21 14:15:49 -07:00
William Brafford	2b549e7342	CLI tools: write errors to stderr instead of stdout (#45586 ) Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard error. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools. This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr. Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).	2019-08-21 14:46:07 -04:00
Armin Braun	790765d3f9	Remove Dep. on SnapshotsService in SnapshotShardsService (#45776 ) (#45791 ) SnapshotShardsService depends on the RepositoriesService not the SnapshotsService, no need to have this indirection.	2019-08-21 19:26:19 +02:00
Armin Braun	6aaee8aa0a	Repository Cleanup Endpoint (#43900 ) (#45780 ) * Repository Cleanup Endpoint (#43900) * Snapshot cleanup functionality via transport/REST endpoint. * Added all the infrastructure for this with the HLRC and node client * Made use of it in tests and resolved relevant TODO * Added new `Custom` CS element that tracks the cleanup logic. Kept it similar to the delete and in progress classes and gave it some (for now) redundant way of handling multiple cleanups but only allow one * Use the exact same mechanism used by deletes to have the combination of CS entry and increment in repository state ID provide some concurrency safety (the initial approach of just an entry in the CS was not enough, we must increment the repository state ID to be safe against concurrent modifications, otherwise we run the risk of "cleaning up" blobs that just got created without noticing) * Isolated the logic to the transport action class as much as I could. It's not ideal, but we don't need to keep any state and do the same for other repository operations (like getting the detailed snapshot shard status)	2019-08-21 17:59:49 +02:00
Jim Ferenczi	fe2a7523ec	Add support for inlined user dictionary in the Kuromoji plugin (#45489 ) This change adds a new option called user_dictionary_rules to Kuromoji's tokenizer. It can be used to set additional tokenization rules to the Japanese tokenizer directly in the settings (instead of using a file). This commit also adds a check that no rules are duplicated since this is not allowed in the UserDictionary. Closes #25343	2019-08-21 16:28:30 +02:00
Christos Soulios	2a0c7c40e5	[7.x] Implement AvgAggregatorTests#testDontCacheScripts and remove AvgIT #45746 Backports PR #45737: Similar to PR #45030 integration test testDontCacheScripts() was moved to unit test AvgAggregatorTests#testDontCacheScripts. AvgIT class was removed.	2019-08-20 20:19:51 +03:00
Christos Soulios	96a40acd82	[7.x] Migrate tests from MaxIT to MaxAggregatorTests (#45030 ) #45742 Backports PR #45030 to 7.x: This PR migrates tests from MaxIT integration test to MaxAggregatorTests, as described in #42893	2019-08-20 18:58:47 +03:00
Nhat Nguyen	e9759b2b33	Wait for background refresh in testAutomaticRefresh (#45661 ) If the background refresh is running, but not finished yet then the document might not be visible to the next search. Thus, if scheduledRefresh returns false, we need to wait until the background refresh is done. Closes #45571	2019-08-20 10:40:12 -04:00
Rory Hunter	47b3dccbc4	Always check that cgroup data is present (#45647 ) `OsProbe` fetches cgroup data from the filesystem, and has asserts that check for missing values. This PR changes most of these asserts into runtime checks, since at least one user has reported an NPE where a piece of cgroup data was missing. Backport of #45606 to 7.x.	2019-08-19 10:29:41 +01:00
Nhat Nguyen	6f5d944fbd	Ensure AsyncTask#isScheduled remain false after close (#45687 ) If a scheduled task of an AbstractAsyncTask starts after it was closed, then isScheduledOrRunning can remain true forever although no task is running or scheduled. Closes #45576	2019-08-17 13:48:50 -04:00
Vega	6f2daa85e3	Allow uppercase in keystore setting names (#45222 ) The elasticsearch keystore was originally backed by a PKCS#12 keystore, which had several limitations. To overcome some of these limitations in encoding, the setting names existing within the keystore were limited to lowercase alphanumeric (with underscore). Now that the keystore is backed by an encrypted blob, this restriction is no longer relevant. This commit relaxes that restriction by allowing uppercase ascii characters as well. closes #43835	2019-08-16 17:50:08 -07:00
Igor Motov	98c850c08b	Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618 ) Changes the order of parameters in Geometries from lat, lon to lon, lat and moves all Geometry classes to the org.elasticsearch.geometry package. Backport of #45332 Closes #45048	2019-08-16 14:42:02 -04:00
Ryan Ernst	742213d710	Improve error message when index settings are not a map (#45588 ) This commit adds an explicit error message when a create index request contains a settings key that is not a json object. Prior to this change the user would be given a ClassCastException with no explanation of what went wrong. closes #45126	2019-08-16 11:39:26 -07:00
Zachary Tong	50c65d05ba	Move bucket reduction from Bucket to the InternalAgg (#45566 ) The current idiom is to have the InternalAggregator find all the buckets sharing the same key, put them in a list, get the first bucket and ask that bucket to reduce all the buckets (including itself). This is a somewhat confusing workflow, and it feels like the aggregator should be reducing the buckets (since the aggregator owns the buckets), rather than asking one bucket to do all the reductions. This commit basically moves the `Bucket.reduce()` method to the InternalAgg and renames it `reduceBucket()`. It also moves the `createBucket()` (or equivalent) method from the bucket to the InternalAgg as well.	2019-08-16 13:59:00 -04:00
Andrey Ershov	dbc90653dc	transport.publish_address should contain CNAME (#45626 ) This commit adds CNAME reporting for transport.publish_address same way it's done for http.publish_address. Relates #32806 Relates #39970 (cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)	2019-08-16 17:42:00 +02:00
Armin Braun	d6a9edea16	Lower Limit for Maximum Message Size in TcpTransport (#44496 ) (#45635 ) * Since we're buffering network reads to the heap and then deserializing them it makes no sense to buffer a message that is 90% of the heap size since we couldn't deserialize it anyway * I think `30%` is a more reasonable guess here given that we can reasonably assume that the deserialized message will be larger than the serialized message itself and processing it will take additional heap as well	2019-08-16 12:27:54 +02:00
Armin Braun	a48242c371	Cleanup Redundant TransportLogger Instantiation (#43265 ) (#45629 ) * This class' methods are all effectively `static` => make them `static` and stop instantiating it needlessly	2019-08-15 21:16:56 +02:00
Zachary Tong	cd441f6906	Catch AllocatedTask registration failures (#45300 ) When a persistent task attempts to register an allocated task locally, this creates the Task object and starts tracking it locally. If there is a failure while initializing the task, this is handled by a catch and subsequent error handling (canceling, unregistering, etc). But if the task fails to be created because an exception is thrown in the tasks ctor, this is uncaught and fails the cluster update thread. The ramification is that a persistent task remains in the cluster state, but is unable to create the allocated task, and the exception prevents other tasks "after" the poisoned task from starting too. Because the allocated task is never created, the cancellation tools are not able to remove the persistent task and it is stuck as a zombie in the CS. This commit adds exception handling around the task creation, and attempts to notify the master if there is a failure (so the persistent task can be removed). Even if this notification fails, the exception handling means the rest of the uninitialized tasks can proceed as normal.	2019-08-15 15:14:19 -04:00
Armin Braun	de58353722	Lower Painless Static Memory Footprint (#45487 ) (#45619 ) * Painless generates a ton of duplicate strings and empty `HashMap` instances wrapped as unmodifiable * This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node) * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization	2019-08-15 19:41:45 +02:00
Alpar Torok	03a1645bc6	Use dynamic port ranges for ExternalTestCluster (#45601 ) Moves methods added in #44213 and uses them to configure the port range for `ExternalTestCluster` too. These were still using `9300-9400` (the default) and running into races.	2019-08-15 16:40:12 +03:00
Armin Braun	1beea3588b	Make BlobStoreRepository Validation Read master.dat (#45546 ) (#45578 ) * Fixing this for two reasons: 1. Why not verify that the seed we wrote is actually there when we can 2. The AWS S3 SDK started to log a bunch of WARN messages about not fully reading the stream now that we started to abuse the read blob as an `exists` check after removing that method from the blob container	2019-08-15 07:07:52 +02:00
Nick Knize	647a8308c3	[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363 ) * Introduce Spatial Plugin (#44389) Introduce a skeleton Spatial plugin that holds new licensed features coming to Geo/Spatial land! * [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923) Refactor DeprecatedParameters specific to legacy geo_shape out of AbstractGeometryFieldMapper.TypeParser#parse. * [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980) Add a new ShapeFieldMapper to the xpack spatial module for indexing arbitrary cartesian geometries using a new field type called shape. The indexing approach leverages lucene's new XYShape field type which is backed by BKD in the same manner as LatLonShape but without the WGS84 latitude longitude restrictions. The new field mapper builds on and extends the refactoring effort in AbstractGeometryFieldMapper and accepts shapes in either GeoJSON or WKT format (both of which support non geospatial geometries). Tests are provided in the ShapeFieldMapperTest class in the same manner as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests. Documentation for how to use the new field type and what parameters are accepted is included. The QueryBuilder for searching indexed shapes is provided in a separate commit. * [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108) Add a new ShapeQueryBuilder to the xpack spatial module for querying arbitrary Cartesian geometries indexed using the new shape field type. The query builder extends AbstractGeometryQueryBuilder and leverages the ShapeQueryProcessor added in the previous field mapper commit. Tests are provided in ShapeQueryTests in the same manner as GeoShapeQueryTests and docs are updated to explain how the query works.	2019-08-14 16:35:10 -05:00
Armin Braun	e0d84e7178	Clean up Callback Chains and Duplicate in SnapshotResiliencyTests (#45398 ) (#45563 ) * It's in the title, follow up to #45233 * Flatten more listeners into `StepListener` * Remove duplication from repo and index bootstrap and asserting that the steps execute successfully	2019-08-14 21:53:07 +02:00
Armin Braun	5f6bc6fc2d	Prevent Leaking Search Tasks on Exceptions in FetchSearchPhase and DfsQueryPhase (#45500 ) (#45540 ) * If `counter.onResult` throws an exception we might leak a transport task because the failure is not handled as a phase failure (instead it bubbles up in the transport service eventually hitting the `onFailure` callback again and counting down the `counter` twice). Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>	2019-08-14 14:49:38 +02:00
Armin Braun	00e4fba2fb	Simplify and Optimize RestController Slightly (#45419 ) (#45485 ) * Simplify the path iterator to generate less garbage * `dispatchRequest` always terminates, adjust code accordingly	2019-08-13 10:43:30 +02:00
Julie Tibshirani	dc1856ca53	Make sure to validate the type before attempting to merge a new mapping. (#45157 ) Currently, when adding a new mapping, we attempt to parse + merge it before checking whether its top-level document type matches the existing type. So when a user attempts to introduce a new mapping type, we may give a confusing error message around merging instead of complaining that it's not possible to add more than one type ("Rejecting mapping update to [my-index] as the final mapping would have more than 1 type..."). This PR moves the type validation to the start of `MetaDataMappingService#applyRequest` so that we make sure the type matches before performing any mapper merging. We already partially addressed this issue in #29316, but the tests there focused on `MapperService` and did not catch this problem with end-to-end mapping updates. Addresses #43012.	2019-08-12 14:28:03 -07:00
Zachary Tong	4d97d2c50f	Revert "Only execute one final reduction in InternalAutoDateHistogram (#45359 )" This reverts commit `c0ea8a867e`.	2019-08-12 17:17:17 -04:00
Julie Tibshirani	8c4394d5d7	Fix a bug where mappings are dropped from rollover requests. (#45411 ) We accidentally introduced this bug when adding a typeless version of the rollover request. The bug is not present if include_type_name is set to true.	2019-08-12 12:46:27 -07:00
Michael Basnight	a521e4c86f	Retrieve processors instead of checking existence (#45354 ) The previous hasProcessors method would validate if a processor was present within a pipeline, but would not return the contents of the processors. This does not allow a consumer to inspect the processor for specific metadata. The method now returns the list of processors based on the class of the processor passed in.	2019-08-12 13:48:17 -05:00
Zachary Tong	472f6ef41a	Mute InternalAutoDateHistogramTests#testReduceRandom()	2019-08-12 14:45:08 -04:00
Zachary Tong	c0ea8a867e	Only execute one final reduction in InternalAutoDateHistogram (#45359 ) Because auto-date-histo can perform multiple reductions while merging buckets, we need to ensure that the intermediate reductions are done with a `finalReduce` set to false to prevent Pipeline aggs from generating their output. Once all the buckets have been merged and the output is stable, a mostly-noop reduction can be performed which will allow pipelines to generate their output.	2019-08-12 14:07:38 -04:00
Albert Zaharovits	2cb172f079	CreateIndex and PutIndexTemplate with typeless mapping (#45120 ) This commit makes sure that mapping parameters to `CreateIndex` and `PutIndexTemplate` are keyed by the type name. `IndexCreationTask` expects mappings to be keyed by the type name. It asserts this for template mappings but not for the mappings in the request. The `CreateIndexRequest` and `RestCreateIndexAction` mostly make sure that the mapping is keyed by a type name, but not always. When building the create-index request outside of the REST handler, there are a few methods to set the mapping for the request. Some of them add the type name and some of them do not. For example, `CreateIndexRequest#mapping(String type, Map<String, ?> source)` adds the type name, but `CreateIndexRequest#mapping(String type, XContentBuilder source)` does not. This PR asserts the type name in the request mapping inside `IndexCreationTask` and makes all `CreateIndexRequest#mapping` methods add the type name.	2019-08-12 08:05:07 +03:00
Armin Braun	a9e1402189	Remove Settings from BaseRestRequest Constructor (#45418 ) (#45429 ) * Resolving the todo, cleaning up the unused `settings` parameter * Cleaning up some other minor dead code in affected classes	2019-08-12 05:14:45 +02:00
Nhat Nguyen	cf9a73b5ac	Call afterWriteOperation after trim translog in peer recovery (#45182 ) testShouldFlushAfterPeerRecovery was added in #28350 to make sure the flushing loop triggered by afterWriteOperation eventually terminates. This test relies on the fact that we call afterWriteOperation after making changes in translog. In #44756, we roll a new generation in RecoveryTarget#finalizeRecovery but do not call afterWriteOperation. Relates #28350 Relates #45073	2019-08-10 22:59:02 -04:00
Nhat Nguyen	25c6102101	Trim local translog in peer recovery (#44756 ) Today, if an operation-based peer recovery occurs, we won't trim translog but leave it as is. Some unacknowledged operations existing in translog of that replica might suddenly reappear when it gets promoted. With this change, we ensure trimming translog above the starting sequence number of phase 2. This change can allow us to read translog forward.	2019-08-10 22:59:02 -04:00
Armin Braun	1cd464d675	Isolate Request in Call-Chain for REST Request Handling (#45130 ) (#45417 ) * Follow up to #44949 * Stop using a special code path for multi-line JSON and instead handle its detection like that of other XContent types when creating the request * Only leave a single path that holds a reference to the full REST request * In the next step we can move the copying of request content to happen before the actual request handling and make it conditional on the handler in question to stop copying bulk requests as suggested in #44564	2019-08-10 10:21:01 +02:00
Armin Braun	d1ed9bdbfd	Use StepListener to Simplify SnapshotResiliencyTests (#45233 ) (#45386 ) * Reduces complicated callback relations in `testSuccessfulSnapshotAndRestore` to flat steps of sequential actions * Will refactor the other tests in this suite as a follow up * This format certainly makes it easier to create more complicated tests that involve multiple subsequent snapshots as it would allow adding loops	2019-08-09 18:19:48 +02:00
Yannick Welsch	9e6d874a41	Show BWC version in ClusterFormationFailureHelper (#45352 ) When having a cluster state from 6.x, display the metadata version as the cluster state version. Avoids confusion where a cluster state from 6.x is displayed as version 0 even if it has some actual content.	2019-08-09 16:23:38 +02:00
Yannick Welsch	5ddeb488a6	Allow _update on write alias (#45318 ) Using the document update API on aliases with a write index does not work. Follow-up to #31520	2019-08-09 11:44:24 +02:00
Tal Levy	2a99eaa7c2	Revert "removes the CellIdSource abstraction from geo-grid aggs (#45307 ) (#45353 )" This reverts commit `7b0a8040de`.	2019-08-08 17:40:03 -07:00
Armin Braun	12ed6dc999	Only retain reasonable history for peer recoveries (#45208 ) (#45355 ) Today if a shard is not fully allocated we maintain a retention lease for a lost peer for up to 12 hours, retaining all operations that occur in that time period so that we can recover this replica using an operations-based recovery if it returns. However it is not always reasonable to perform an operations-based recovery on such a replica: if the replica is a very long way behind the rest of the replication group then it can be much quicker to perform a file-based recovery instead. This commit introduces a notion of "reasonable" recoveries. If an operations-based recovery would involve copying only a small number of operations, but the index is large, then an operations-based recovery is reasonable; on the other hand if there are many operations to copy across and the index itself is relatively small then it makes more sense to perform a file-based recovery. We measure the size of the index by computing its number of documents (including deleted documents) in all segments belonging to the current safe commit, and compare this to the number of operations a lease is retaining below the local checkpoint of the safe commit. We consider an operations-based recovery to be reasonable iff it would involve replaying at most 10% of the documents in the index. The mechanism for this feature is to expire peer-recovery retention leases early if they are retaining so much history that an operations-based recovery using that lease would be unreasonable. Relates #41536	2019-08-09 01:56:32 +02:00
Tal Levy	7b0a8040de	removes the CellIdSource abstraction from geo-grid aggs (#45307 ) (#45353 ) CellIdSource is a helper ValuesSource that encodes GeoPoint into a long-encoded representation of the grid bucket the point is associated with. This complicates things as usage evolves to support shapes that are associated with more than one bucket ordinal.	2019-08-08 16:33:16 -07:00
Armin Braun	b19de55095	Add missing wait to testAutomaticReleaseOfIndexBlock (#45342 ) (#45351 ) Today the test waits for one of the shards to be blocked, but this does not mean that the block has been applied on all nodes, so a subsequent indexing operation may still go through. Fixes #45338	2019-08-08 22:39:22 +02:00
Henning Andersen	d139896b66	Reindex share retry between hit sources (#44203 ) (#45348 ) The client and remote hit sources each had their own retry mechanism, and both did the same thing. To support resiliency we would have to expand on the retry mechanisms. As preparation for that, the retry mechanism is now shared, such that each subclass is only responsible for sending requests and converting responses/failures to a common format. Part of #42612	2019-08-08 22:01:29 +02:00
Christoph Büscher	a552b33276	Fix occasional SuggestSearchIT failure (#45330 ) Refreshes happening during indexing can result in different segment counts and slightly skewed term statistics, which in turn has the potential to change suggestion output slightly. In order to prevent this, disable refresh for the affected tests. Closes #43261	2019-08-08 21:06:32 +02:00
Dimitris Athanasiou	e53bb050db	Mute testAutomaticReleaseOfIndexBlock Relates #45338	2019-08-08 17:56:41 +03:00
Andrey Ershov	07c656fba9	Mute testCustomDataPaths on Windows See #45333 (cherry picked from commit 671e1ad1068aee4b593ad0c8ab13ff60b4f125b8)	2019-08-08 16:26:56 +02:00
Zachary Tong	86d6597890	Use newIndexSearcher() instead of newSearcher() (#45248 ) `newSearcher()` from lucene can randomly choose index readers which are not compatible with our tests, like ParallelCompositeReader. The `newIndexSearcher()` method on AggregatorTestCase is a wrapper similar to newSearcher but compatible with our tests	2019-08-08 09:34:38 -04:00
Martijn van Groningen	e066133016	Change the ingest simulate api to not include dropped documents (#44161 ) If documents are dropped by the `drop` processor then these documents are returned as a `null` value in the response. === Example Create pipeline: ``` PUT _ingest/pipeline/droppipeline { "processors": [ { "set": { "field": "bla", "value": "val" } }, { "drop": {} } ] } ``` Simulate request: ``` POST _ingest/pipeline/droppipeline/_simulate { "docs": [ { "_source": { "message": "text" } } ] } ``` Response: ``` { "docs": [ null ] } ``` Response if verbose is enabled: ``` { "docs": [ { "processor_results": [ { "doc": { "_index": "_index", "_type": "_doc", "_id": "_id", "_source": { "message": "text", "bla": "val" }, "_ingest": { "timestamp": "2019-07-10T11:07:10.758315Z" } } }, null ] } ] } ``` Closes #36150 * Abort pipeline simulation in verbose mode when document has been dropped by drop processor	2019-08-08 13:04:33 +02:00
Martijn van Groningen	fb959d188c	Backport: Add description to force-merge tasks (#41365 ) (#45191 ) * Add description to force-merge tasks (#41365) This is static information that is part of the force merge request. Relates to #15975	2019-08-08 08:15:09 +02:00
Michael Basnight	89861d0884	Add ingest processor existence helper method (#45156 ) This commit adds a helper method to the ingest service allowing it to inspect a pipeline by id and verify the existence of a processor in the pipeline. This work exposed a potential bug in that some processors contain inner processors that are passed in at instantiation. These processors needed a common way to expose their inner processors, so the WrappingProcessor was created in order to expose the inner processor.	2019-08-07 11:19:04 -05:00
Bukhtawar	cd304c4def	Auto-release flood-stage write block (#42559 ) If a node exceeds the flood-stage disk watermark then we add a block to all of its indices to prevent further writes as a last-ditch attempt to prevent the node completely exhausting its disk space. However today this block remains in place until manually removed, and this block is a source of confusion for users who currently have ample disk space and did not even realise they nearly ran out at some point in the past. This commit changes our behaviour to automatically remove this block when a node drops below the high watermark again. The expectation is that the high watermark is some distance below the flood-stage watermark and therefore the disk space problem is truly resolved. Fixes #39334	2019-08-07 11:03:53 +01:00
Tanguy Leroux	a869342910	Restore DefaultShardOperationFailedException's reason after deserialization (#45203 ) The reason field of DefaultShardOperationFailedException is lost during serialization. This is sad because this field is checked for nullity during xcontent generation and it means that the cause won't be included in the generated xcontent and won't be printed in two REST API responses (Close Index API and Indices Shard Stores API). This commit simply restores the reason from the cause during deserialization.	2019-08-07 10:37:15 +02:00
Jason Tedor	bd59ee6c72	Fix clock used in update requests (#45262 ) We accidentally switched to using the relative time provider here. This commit fixes this by switching to the appropriate absolute clock.	2019-08-06 21:15:21 -04:00
David Turner	f5d1381e01	Remove always-true param from IndicesService#stats (#45231 ) Parameter `includePrevious` is always true, so this commit inlines it.	2019-08-06 17:22:11 +01:00
David Turner	355713b9ca	Improve slow logging in MasterService (#45241 ) Adds a tighter threshold for logging a warning about slowness in the `MasterService` instead of relying on the cluster service's 30-second warning threshold. This new threshold applies to the computation of the cluster state update in isolation, so we get a warning if computing a new cluster state update takes longer than 10 seconds even if it is subsequently applied quickly. It also applies independently to the length of time it takes to notify the cluster state tasks on completion of publication, in case any of these notifications holds up the master thread for too long. Relates #45007 Backport of #45086	2019-08-06 17:01:49 +01:00
Tanguy Leroux	772ce1f599	Add deprecation warning for Force Merge API (#44903 ) This commit adds a deprecation warning in 7.x for the Force Merge API when both only_expunge_deletes and max_num_segments are set in a request. Relates #44761	2019-08-06 16:04:24 +02:00
Jason Tedor	5b1b146099	Normalize environment paths (#45179 ) This commit applies a normalization process to environment paths, both in how they are stored internally and in their settings values. This normalization is done via two means: - we make the paths absolute - we remove redundant name elements from the path (what Java calls "normalization") This change ensures that when we compare and refer to these paths within the system, we are using a common ground. For example, prior to the change if the data path was relative, we would not compare it correctly to paths from disk usage. This is because the paths in disk usage were being made absolute.	2019-08-06 06:04:30 -04:00
Yannick Welsch	7aeb2fe73c	Add per-socket keepalive options (#44055 ) Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings. By default, these options are disabled for now (i.e. fall back to OS behavior), but we would like to explore whether we can enable them by default, in particular to force keepalive configurations that are better tuned for running ES.	2019-08-06 10:45:44 +02:00
Igor Motov	b5f88120b5	Geo: add Geometry-based query builders to QueryBuilders (#45058 ) Add Geometry-based method for creation of query builders in QueryBuilder Relates to #44715	2019-08-05 13:34:48 -04:00
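
The "reasonable recovery" rule described in commit 12ed6dc999 above can be illustrated with a small sketch. This is not the project's implementation: the function name `is_ops_based_recovery_reasonable` and the constant `MAX_REPLAY_PERCENT` are hypothetical, and only the 10% threshold and the two inputs (documents in the safe commit, including deleted ones, and operations retained by the lease) come from the commit message.

```python
# Hypothetical sketch of the 10% rule from the commit message; names are
# illustrative and do not correspond to the project's actual classes or methods.

MAX_REPLAY_PERCENT = 10  # "at most 10% of the documents in the index"


def is_ops_based_recovery_reasonable(docs_in_safe_commit: int, retained_ops: int) -> bool:
    """Return True if replaying `retained_ops` operations would touch at most
    MAX_REPLAY_PERCENT percent of the documents in the safe commit.

    docs_in_safe_commit: document count (including deleted documents) across
        all segments of the current safe commit.
    retained_ops: number of operations the lease retains below the local
        checkpoint of the safe commit.
    """
    if docs_in_safe_commit < 0 or retained_ops < 0:
        raise ValueError("counts must be non-negative")
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return retained_ops * 100 <= MAX_REPLAY_PERCENT * docs_in_safe_commit


def should_expire_lease_early(docs_in_safe_commit: int, retained_ops: int) -> bool:
    """A lease retaining too much history is expired early, so the replica
    falls back to a file-based recovery."""
    return not is_ops_based_recovery_reasonable(docs_in_safe_commit, retained_ops)
```

Under these assumptions, a lease retaining 100 operations against a 1,000-document safe commit would be kept, while one retaining 101 operations would be expired early.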

3705 Commits
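
The two normalization steps listed in commit 5b1b146099 (make the path absolute, then remove redundant name elements) can be sketched as follows. This is an illustration only: it uses Python's `posixpath` as a stand-in for the Java path handling the commit refers to, and the helper name `normalize_env_path` and the example directories are hypothetical.

```python
# Hypothetical sketch of the two-step normalization from the commit message.
# posixpath is used so the behaviour is the same on every platform.
import posixpath


def normalize_env_path(path: str, working_dir: str) -> str:
    """Make `path` absolute relative to `working_dir`, then drop redundant
    elements such as "." and "dir/.." (a purely lexical clean-up)."""
    # Step 1: make the path absolute. join() keeps `path` unchanged when it
    # is already absolute.
    absolute = posixpath.join(working_dir, path)
    # Step 2: remove redundant name elements.
    return posixpath.normpath(absolute)


# A relative data path and an absolute path reported elsewhere (for example
# by disk usage) compare equal only after both have been normalized.
configured = normalize_env_path("data/../data/./nodes", "/home/es")
reported = normalize_env_path("/home/es/data/nodes", "/home/es")
assert configured == reported
```

Comparing the raw strings `"data/../data/./nodes"` and `"/home/es/data/nodes"` would fail, which is the kind of mismatch the commit message describes for relative data paths.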