OpenSearch

Commit Graph

Author	SHA1	Message	Date
Rory Hunter	cf5f013033	Return 400 when handling invalid JSON (#49558) Backport of #49552. Closes #49428. The code that determines the HTTP status code for an exception didn't consider the JsonParseException case, which meant that an invalid JSON request could result in a 500 Internal Server Error. Now it returns 400 Bad Request.	2019-11-26 12:36:56 +00:00
Tim Brooks	416178c7c8	Enable simple remote connection strategy (#49561) This commit backports three commits related to enabling the simple connection strategy: 1. Allow simple connection strategy to be configured (#49066): Currently the simple connection strategy only exists in the code and cannot be configured. This commit is a step toward making it configurable. It introduces settings for the addresses and socket count. Additionally, it introduces new settings for the sniff strategy so that the more generic number-of-connections and seed-node settings can be deprecated. The simple settings are not yet registered because registration depends on follow-up work to validate the settings. 2. Ensure at least 1 seed configured in remote test (#49389): This fixes #49384. Currently, when we select a random subset of seed nodes from a list, it is possible for 0 seeds to be selected. This test depends on at least 1 seed being selected. 3. Add the simple strategy to cluster settings (#49414): This is related to #49067. This commit adds the simple connection strategy settings and the strategy mode setting to the cluster settings registry. With these changes, the simple connection mode can be used. Additionally, it adds validation to ensure that the settings cannot be misconfigured.	2019-11-25 16:53:07 -07:00
Zachary Tong	99e313695f	Reuse CompensatedSum object in agg collect loops (#49548) The new CompensatedSum is a nice DRY refactor, but it had the unanticipated side effect of causing a lot of object allocation in the aggregation hot collection loop: one object per visited document, per aggregator. In some places (weighted avg, geo centroids, etc.) it created two per document per aggregator, since multiple compensations were being maintained. This PR moves the object creation out of the hot loop so that the object is now created once per segment, and its internal state is reset each time through the loop.	2019-11-25 16:46:48 -05:00
Armin Braun	2502ff39a0	Enhance SnapshotResiliencyTests (#49514) (#49541) A few enhancements to `SnapshotResiliencyTests`: 1. Run requests from random nodes in more spots to improve coverage (this is particularly motivated by #49060, where the additional number of cluster state updates makes it more important to fully cover all kinds of network failures). 2. Fix an issue with restarting only the master node in one test (doing so breaks the test at a very low frequency, which becomes noticeably higher in #49060 with the additional cluster state updates between request and response). 3. Improve cluster formation checks (they now also check the term when forming the cluster) and make sure all nodes are connected to all other nodes (previously the data nodes would at times not be connected to other data nodes, which was shaken out by adding the `client()` method). 4. Make sure the cluster left behind by the test is in a sane state by running the repository cleanup action on it (this also increases coverage of the repository cleanup action and lays the groundwork for making it part of more resiliency tests).	2019-11-25 13:31:28 +01:00
Jared Tan	1d2bfd1af6	Include id to the error msg when it's too long (#49433)	2019-11-24 13:08:26 -05:00
Jason Tedor	69f570ea5f	Adjust version on final pipeline serialization This commit adjusts the version used for final pipeline serialization after the change was backported to the 7.5 branch.	2019-11-22 14:56:56 -05:00
Jay Modi	4fd5fb5297	Stop NodeTests from timing out in certain cases (#49202) (#49503) The NodeTests class contains tests that check behavior when shutting down a node. This involves starting a node, performing some operation, stopping the node, and then awaiting the close of the node. Part of closing a node is the termination of the node's ThreadPool. ThreadPool termination semantics can be deceiving. The ThreadPool#terminate method takes a timeout value, and the first oddity is that the method can take up to twice the timeout value to return. Internally, this method acts on the ExecutorService instances that are held by the ThreadPool. First, an orderly shutdown is attempted, and pending tasks are allowed to execute while waiting for up to the timeout value. If any of the ExecutorService instances have not terminated, a call is made to attempt to stop all active tasks (usually using interrupts), and then the method waits for up to the timeout value a second time for the ExecutorService instances to terminate. This means that if we use a large value when waiting for a node to close, we may not attempt to interrupt any threads that are in a blocking call before the test times out. To avoid causing these tests to time out, this change reduces the timeout passed to Node#awaitClose from 1 day to 10 seconds. This allows blocked threads to be interrupted before the test suite fails due to the timeout. Closes #44256 Closes #42350 Closes #44435	2019-11-22 12:41:52 -07:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470) This commit enhances the required pipeline functionality so that default/request pipelines can also be executed, while the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines while still ensuring that any required pipelines are executed. Since such pipelines are executed last, we rename required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Armin Braun	97c7ea60b9	Add Missing Nullable Assertions in SnapshotsService (#49465) (#49492) We were missing some annotations here, which was somewhat confusing since other methods/parameters have the `Nullable` annotation wherever a `null` can be passed.	2019-11-22 17:27:27 +01:00
Rory Hunter	4fae2bb3b1	Don't close stderr under `--quiet` (#49431) Backport of #47208. Closes #46900. When ES is run with `--quiet` and then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error, e.g. a config validation failure or some fatal runtime exception. This is useful when running under, e.g., systemd, since the error will go into the journal. Note that stderr is still closed in daemon (`-d`) mode.	2019-11-22 14:58:17 +00:00
Jim Ferenczi	ed4eecc00e	Pre-sort shards based on the max/min value of the primary sort field (#49092) This change automatically pre-sorts search shards on search requests that use a primary sort based on the value of a field. When possible, the can_match phase extracts the min/max values (depending on the provided sort order) of each shard and uses them to pre-sort the shards prior to running the subsequent phases. This feature can be useful to ensure that shards that contain recent data are executed first, so that intermediate merges are more likely to contain contiguous data (think of date_histogram, for instance), but it could also be used in a follow-up to early-terminate sorted top-hits queries that don't require the total hit count. The latter could significantly speed up the retrieval of the most/least recent documents from time-based indices. Relates #49091	2019-11-22 11:02:12 +01:00
Igor Motov	e8971ff367	Geo: Fix handling of circles in legacy geo_shape queries (#49410) Brings back support for circles in legacy geo_shape queries that was accidentally lost during query refactoring. Fixes #49296	2019-11-21 14:03:31 -05:00
Christoph Büscher	138d16ab9e	Fix ClusterHealthResponsesTests condition (#49360) Currently the condition that is supposed to test creation of test instances with multiple indices is never true because it compares Strings with an enum. This changes it so the condition uses the enum constants instead.	2019-11-21 17:14:23 +01:00
Alan Woodward	d1eb7e749e	Fix test for index phrases shortcut with multi-term synonyms (#49366) Lucene 8.3 included a root fix for #43976, which had been temporarily worked around in Elasticsearch by #44340. Since we have upgraded to 8.3, we no longer need this workaround. This commit fixes the test that was added to check the workaround, and instead checks that fields with index_phrases enabled correctly build queries when used with multi-term synonyms. Closes #47777	2019-11-21 09:49:58 +00:00
Yannick Welsch	d72bd3a171	Verify translog checksum before UUID check (#49394) When opening a translog file, we check whether the UUID matches what we expect (the UUID from the latest commit). The UUID check can in certain cases fail when the translog is corrupted. This commit changes the ordering of the checks so that corruption is detected first.	2019-11-21 10:12:49 +01:00
Yannick Welsch	8ee70fa9c6	Fix testPeerRecoveryTrimsLocalTranslog (#49385) 7.x uses the transport client, which can throw an IllegalStateException when being closed. Closes #49375	2019-11-21 10:03:25 +01:00
Nhat Nguyen	37a9cd677b	Ignore Lucene index in peer recovery if translog corrupted (#49114) If the translog on a replica is corrupt, we should not perform an operation-based recovery or utilize sync_id, as we won't be able to open an engine in the next step. This change adds an extra validation that ensures the translog is intact when preparing a peer recovery request.	2019-11-20 16:04:09 -05:00
jaymode	d9fd4cc351	Add version 6.8.6	2019-11-20 11:01:57 -07:00
Jim Ferenczi	81548df2d9	Disable caching when queries are profiled (#48195) This change disables the query and request caches when profile is set to true in the request. This means that profiled queries will not check the caches to execute the query, and the result will never be added to the cache either. Closes #33298	2019-11-20 16:02:59 +01:00
Armin Braun	1cde4a6364	Make SnapshotsService#getRepositoryData Async (#49322) (#49358) Follow-up to #49299, removing the blocking step for the snapshot status APIs as well.	2019-11-20 15:22:10 +01:00
Alan Woodward	c6b31162ba	Refactor percolator's QueryAnalyzer to use QueryVisitors Lucene now allows us to explore the structure of a query using QueryVisitors, delegating the knowledge of how to recurse through and collect terms to the query implementations themselves. The percolator currently has a home-grown external version of this API to construct sets of matching terms that must be present in a document in order for it to possibly match the query. This commit removes the home-grown implementation in favour of one using QueryVisitor. This has the added benefit of making interval queries available for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050), it also includes a clone of some of the Lucene intervals logic, which can be removed once the bug has been fixed upstream. Closes #45639	2019-11-20 09:21:01 +00:00
Mark Tozzi	17358b5af7	(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320) (#49330) This is a pure code rearrangement refactor. The logic for which specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValuesSourceType; we extract an interface for future extensibility). ValuesSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.	2019-11-19 16:44:29 -05:00
Jay Modi	eed4cd25eb	ThreadPool and ThreadContext are not closeable (#43249) (#49273) This commit changes the ThreadContext to use a regular ThreadLocal instead of the Lucene CloseableThreadLocal. The CloseableThreadLocal solves issues with ThreadLocals that are no longer needed during runtime, but the ThreadContext is needed for the runtime of the node and is typically not closed until the node closes, so we miss out on the benefits that this class provides. Additionally, removing the close logic simplifies code in other places that deal with exceptions and track whether they happen while the node is closing. Closes #42577	2019-11-19 13:15:16 -07:00
Jack Conradson	14d2e795ae	make dim files mmapped (#49272) This change mmaps dim files in HybridDirectory to take advantage of off-heap BKD trees. This is based on #48509, via LUCENE-8932 (https://issues.apache.org/jira/browse/LUCENE-8932).	2019-11-19 10:22:30 -08:00
Armin Braun	0acba44a2e	Make Repository.getRepositoryData an Async API (#49299) (#49312) This API call is fairly IO-heavy and slow in most implementations, so it is more natural for it to be async in the first place. Concretely, though, this change is a prerequisite of #49060, since determining the repository generation from the cluster state introduces situations where this call would have to wait for other operations to finish. Doing so in a blocking manner would break `SnapshotResiliencyTests` and waste a thread. This also makes it possible in the future to use async IO where the underlying Repository implementation provides it. In a follow-up, `SnapshotsService#getRepositoryData` will be made async as well (not done here, since that is another large change). Note: for now this change does not alter the threading behaviour in any way (since `Repository#getRepositoryData` isn't forking) and is purely mechanical.	2019-11-19 16:49:12 +01:00
Armin Braun	9c00648314	Make Snapshot Delete Concurrency Exception Consistent (#49266) (#49281) We shouldn't be throwing `RepositoryException` when the repository wasn't concurrently modified in an unexpected fashion (i.e. on the blob/file level). When we know that the repository generation tracked in master memory has moved higher, we should throw the concurrent snapshot exception. This change makes concurrent snapshot create and delete always throw the same exception, prevents unnecessary listings when the generation is known to be off, and prevents future test failures in SLM tests that assume the concurrent snapshot exception is always thrown here. Without this change, the newly added test randomly fails the `instanceOf` assertion by running into a `RepositoryException`.	2019-11-19 09:50:52 +01:00
Henning Andersen	2ac38fd315	Reindex and friends fail on RED shards (#45830) Reindex, update by query, and delete by query would silently disregard RED/unavailable shards, thus not copying, updating, or deleting matching data in those shards. They now use `allow_partial_search_results=false` to ensure these operations fail if the search crosses an unavailable shard. Added the option to explicitly specify `allow_partial_search_results=true` for reindex only (it seemed too strange for update/delete by query). Relates #45739 and #42612	2019-11-18 21:23:08 +01:00
Benjamin Trent	eefe7688ce	[7.x][ML] ML Model Inference Ingest Processor (#49052) (#49257) * [ML][Inference] adds lazy model loader and inference (#47410) This adds a couple of things: - A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them. - A Model class and its first subclass, LocalModel, used to cache model information and run inference. - A transport action and handler for requests to infer against a local model. Related Feature PRs: * [ML][Inference] Adjust inference configuration option API (#47812) * [ML][Inference] adds logistic_regression output aggregator (#48075) * [ML][Inference] Adding read/del trained models (#47882) * [ML][Inference] Adding inference ingest processor (#47859) * [ML][Inference] fixing classification inference for ensemble (#48463) * [ML][Inference] Adding model memory estimations (#48323) * [ML][Inference] adding more options to inference processor (#48545) * [ML][Inference] handle string values better in feature extraction (#48584) * [ML][Inference] Adding _stats endpoint for inference (#48492) * [ML][Inference] add inference processors and trained models to usage (#47869) * [ML][Inference] add new flag for optionally including model definition (#48718) * [ML][Inference] adding license checks (#49056) * [ML][Inference] Adding memory and compute estimates to inference (#48955) * fixing version of indexed docs for model inference	2019-11-18 13:19:17 -05:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Armin Braun	25cc8e3663	Fix RepoCleanup not Removed on Master-Failover (#49217) (#49239) The logic for `cleanupInProgress()` was backwards everywhere (in the method itself and in all but one caller). Also, we weren't checking it when removing a repository. This led to a bug (in the one spot that didn't use the method backwards) that prevented the cleanup cluster state entry from ever being removed from the cluster state if the master failed over during the cleanup process. This change corrects the backwards logic, adds a test that makes sure the cleanup is always removed, and adds a check to the repositories service that prevents repository removal during cleanup. Also, the failure handling logic in the cleanup action was broken: repeated invocation would lead to the cleanup being removed from the cluster state even if it was in progress. This is fixed by adding a flag that indicates whether any removal of the cleanup task from the cluster state must be executed. Sorry for mixing this in here, but I had to fix it in the same PR, as the first test (for master failover) otherwise would often just delete the blocked cleanup action as a result of a transport master action retry.	2019-11-18 16:44:09 +01:00
Armin Braun	f7d9e7bdc4	Better Exceptions on Concurrent Snapshot Operations (#49220) (#49237) It is somewhat tricky to debug test failures from concurrent operations without knowing exactly what ran concurrently, so I added that information to these exceptions in all spots.	2019-11-18 14:12:55 +01:00
Armin Braun	42268f0b0e	Fix Broken Network Disruption in SnapshotResiliencyTests (#49216) (#49231) The network disruption was acting on both node ids and node names, which broke reconnects. Moved all usages to node names to fix this, since the map of all nodes in the test is indexed by name and was therefore easier to work with.	2019-11-18 12:02:27 +01:00
Yannick Welsch	af797a77a1	Auto-expand indices according to allocation filtering rules (#48974) Honours allocation filtering rules when auto-expanding indices.	2019-11-18 12:01:56 +01:00
Armin Braun	2886d4c6dd	Make FsBlobContainer Listing Resilient to Concurrent Modifications (#49142) (#49176) If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes, since we don't hold any lock on the directory. Closes #37581	2019-11-15 21:14:53 +01:00
Mark Tozzi	dad68c59fe	Avoid precision loss in DocValueFormat.RAW#parseLong (#49063) (#49169)	2019-11-15 12:32:26 -05:00
markharwood	c3745b03ee	Search optimisation - add canMatch early aborts for queries on "_index" field (#49158) Make queries on the “_index” field fast-fail if the target shard is an index that doesn’t match the query expression. Part of the “canMatch” phase optimisations. Closes #48473	2019-11-15 16:50:32 +00:00
Jason Tedor	36dc544819	Adjust version on ingest processor exception The dedicated ingest processor exception was backported to 7.5. This commit updates the version in the 7.x branch.	2019-11-15 09:35:12 -05:00
Armin Braun	fc505aaa76	Track Repository Gen. in BlobStoreRepository (#48944) (#49116) This is intended as a stop-gap solution/improvement to #38941 that prevents repo modifications without an intermittent master failover from causing inconsistent (outdated due to inconsistent listing of index-N blobs) `RepositoryData` to be written. Tracking the latest repository generation will move to the cluster state in a separate pull request. This is intended as a low-risk change to be backported as far as possible, and is motivated by the recently increased chance of #38941 causing trouble via SLM (see https://github.com/elastic/elasticsearch/issues/47520). Closes #47834 Closes #49048	2019-11-15 09:54:53 +01:00
Tal Levy	5cd6f64f15	Introduce faster approximate sinh/atan math functions (#49009) (#49110) This commit introduces a new class called ESSloppyMath that is meant to reflect the purpose of Lucene's SloppyMath, but adds faster alternatives for math functions that SloppyMath does not implement. The two that geotile-grid uses heavily are sinh and atan. In a quick Elasticsearch Rally benchmark for geotile-grid on Switzerland data points, this shows a 22% (1.22x) speed-up over using Math's functions. Closes #41166.	2019-11-14 14:15:34 -08:00
bellengao	6ce04429c6	Fix `_analyze` API to correctly use normalizers when specified (#48866) Currently the `_analyze` endpoint doesn't correctly use normalizers specified in the request. This change fixes that by returning the resolved normalizer from TransportAnalyzeAction#getAnalyzer and updates the tests so that they catch this in the future. Closes #48650	2019-11-14 19:51:11 +01:00
Jason Tedor	2bcdcb17cd	Introduce dedicated ingest processor exception (#48810) Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. In ExceptionsHelper#unwrapCause we only unwrap causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception that occurs during processor execution does not have its cause unwrapped, so its status is always treated as a 500. As a result, when executing a bulk request with an ingest pipeline, document-level failures that occur in a processor cause the status for that document to be reported as 500. Since that does not give the client any indication that they made a mistake, some clients will retry indefinitely, assuming that there is a server-side problem that merely needs to clear. This commit addresses this by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since the problem of reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.	2019-11-14 11:04:53 -05:00
Christoph Büscher	6c5644335f	Simplify TransportMultiSearchActionTests (#48523) The test doesn't seem to need the threadpool that is created and destroyed in setup and teardown any longer, so it can be removed.	2019-11-14 14:48:16 +01:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	66f0c8900f	Fix Transport Stopped Exception (#48930) (#49035) When a node shuts down, `TransportService` moves to the stopped state and then closes connections. If a request was sent in between, an exception was thrown that was not retried in replication actions. Now a wrapped `NodeClosedException` is thrown instead, which is correctly handled in replication actions. Fixed other usages too. Relates #42612	2019-11-13 18:48:05 +01:00
Yannick Welsch	2dfa0133d5	Always use primary term from primary to index docs on replica (#47583) Ensures that we always use the primary term established by the primary to index docs on the replica. Makes the logic around replication less brittle by always using, on the replica, the operation primary term that comes from the primary.	2019-11-13 12:13:45 +01:00
Igor Motov	40776eedaf	Fix ignoring missing values in min/max aggregations (#48970) Fixes an issue where missing values could be ignored in min/max aggregations due to the BKD optimization. Fixes #48905	2019-11-12 19:57:28 -05:00
Armin Braun	0e1035241d	Fix Broken Snapshots in Mixed Clusters (#48993) (#48995) Reverts #48947 and instead fixes the issue it addressed by removing the assertion. It turns out we can't simply pass empty shard generations to the snapshot finalization in the BwC case, as that results in no indices being added to the metadata for the given snapshot, since we take the indices from the shard generations (even in the BwC case, the `null` generations work fine for this). Closes #48983	2019-11-12 21:35:41 +01:00
David Turner	9baea80853	Ignore metadata of deleted indices at start (#48918) Today in 6.x it is possible to add an index tombstone to the graveyard without deleting the corresponding index metadata, because the deletion is slightly deferred. If you shut down the node and upgrade to 7.x when in this state, then the node will fail to apply any cluster states, reporting java.lang.IllegalStateException: Cannot delete index [...], it is still part of the cluster state. This commit addresses this situation by skipping over any index metadata with a corresponding tombstone, allowing this metadata to be cleaned up by the 7.x node.	2019-11-12 11:16:54 +00:00
David Turner	dc441588b6	Remove support for ancient corrupted markers (#48858) Today we still support reading store corruption markers of versions that haven't been written since 1.7. This commit removes this legacy support.	2019-11-12 11:10:46 +00:00
Yannick Welsch	ab15bce4e7	Auto-expand replicated closed indices (#48973) Fixes a bug where replicated closed indices were not being auto-expanded.	2019-11-12 12:00:05 +01:00
Tim Brooks	0645ee88e2	Send cluster name and discovery node in handshake (#48916) This commit sends the cluster name and discovery node in the transport-level handshake response. This will allow us to stop sending the transport-service-level handshake request in the 8.0-8.x release cycle. It is necessary to start sending this in 7.x so that 8.0 is guaranteed to be communicating with a version that sends the required information.	2019-11-11 18:42:02 -05:00
Jake Landis	c320b499a0	Prevent deadlock by using separate schedulers (#48697) (#48964) Currently the BulkProcessor class uses a single scheduler to schedule flushes and retries. Functionally these are very different concerns, and sharing one scheduler can result in a deadlock. Specifically, the single shared scheduler can kick off a flush task, which only finishes its task when the bulk that is being flushed finishes. If, for whatever reason, any items in that bulk fail, it will (by default) schedule a retry. However, that retry will never run its task, since the flush task is consuming the one and only thread available from the shared scheduler. Since the BulkProcessor is mostly client-side code, the client can provide their own scheduler. As-is, the scheduler would require at minimum 2 worker threads to avoid the potential deadlock. Since the number of threads is a configuration option in the scheduler, the code cannot enforce this 2-worker rule until runtime. For this reason, this commit splits the single task scheduler into 2 schedulers. This eliminates the potential for the flush task to block the retry task and removes this deadlock scenario. This commit also deprecates the Java APIs that presume a single scheduler, and updates any internal code to no longer use those APIs. Fixes #47599 Note: #41451 fixed the general case where a bulk fails and is retried, which can result in a deadlock. This fix should address that case as well as the case where a bulk failure from the flush needs to be retried.	2019-11-11 16:31:21 -06:00
Mark Tozzi	d9e569278f	Refactor and DRY up Kahan Sum algorithm (#48558) (#48959)	2019-11-11 15:09:19 -05:00
Armin Braun	c45470f84f	Fix ShardGenerations in RepositoryData in BwC Case (#48920) (#48947) We were tripping the assertion that makes sure we only have empty `ShardGenerations` in `RepositoryData` in the BwC case, because shard generations were passed to the `Repository` in the BwC case. Fixed by only generating empty shard generations for BwC snapshots in `SnapshotsService`.	2019-11-11 18:02:53 +01:00
Rory Hunter	014e1b1090	Improve resiliency to auto-formatting in server (#48940) Backport of #48450. Make a number of changes so that code in the `server` directory is more resilient to automatic formatting. This covers: * Reformatting multiline JSON to embed whitespace in the strings * Moving some comments around so they aren't auto-formatted to a strange place. This also required moving some `&&` and `||` operators from the end of the line to the start of the line. * Adding a helper method `reformatJson()` to strip whitespace from a JSON document using XContent methods. This is sometimes necessary where a test is comparing some machine-generated JSON with an expected value. Also, `HyperLogLogPlusPlus.java` is now excluded from formatting because it contains large data tables that don't reformat well with the current settings, and changing the settings would be worse for the rest of the codebase.	2019-11-11 14:33:04 +00:00
Yannick Welsch	87862868c6	Allow realtime get to read from translog (#48843) The realtime GET API currently has erratic performance in cases where a document that has just been indexed but not yet refreshed is accessed, as the implementation currently forces an internal refresh in that case. Refreshing can be an expensive operation, and it also blocks the thread that executes the GET operation, preventing other GETs from being processed. When recently indexed documents are accessed frequently, this can lead to a refresh storm and terrible GET performance. While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted to read from the translog for the realtime GET API or update API, this was removed in 5.0 (#20102) to avoid inconsistencies between values returned from the translog and those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and upsert to read from the translog again, as it was easier to guarantee consistency for these, which also brought back more predictable performance characteristics for that API. Calls to the realtime GET API, however, would still always do a refresh if necessary to return consistent results. This means that users who were calling realtime GET APIs to coordinate updates on the client side (realtime GET + CAS for conditional index of the updated doc) would still see very erratic performance. This PR (together with #48707) resolves the inconsistencies between reading from the translog and from the index. In particular, it fixes the inconsistencies that happen when requesting stored fields, which were not available when reading from the translog. Where stored fields are requested, this PR reparses the _source from the translog and derives the stored fields to be returned. With this, it changes the realtime GET API to allow reading from the translog again, avoiding refresh storms and blocking of the GET threadpool, and providing much better and more predictable overall performance for this API.	2019-11-09 17:47:50 +01:00
Nhat Nguyen	ff6c121eb9	Closed shard should never open new engine (#47186) We should not open new engines if a shard is closed. We broke this assumption in #45263, where we stopped verifying the shard state before creating an engine and only verified it before swapping the engine reference. We can fail to snapshot the store metadata or checkIndex a closed shard if there's some IndexWriter holding the index lock. Closes #47060	2019-11-08 23:40:34 -05:00
Nhat Nguyen	9a42e71dd9	Do not cancel recovery for copy on broken node (#48265) This change fixes a poisonous situation where an ongoing recovery was canceled because a better copy was found on a node that the cluster had previously tried, and failed, to allocate the shard to. The solution is to keep track of the set of nodes on which an allocation failed, so that we can avoid canceling the current recovery for a copy on those nodes. Closes #47974	2019-11-08 23:10:47 -05:00
Adrien Grand	3b9ce0a4f3	Elasticsearch 7.5 is on Lucene 8.3. (#48831 )	2019-11-06 10:13:09 -05:00
David Turner	bd5c6c4779	Add preflight check to dynamic mapping updates (#48867 ) Today if the primary discovers that an indexing request needs a mapping update then it will send it to the master for validation and processing. If, however, the put-mapping request is invalid then the master still processes it as a (no-op) cluster state update. When there are a large number of indexing operations that result in invalid mapping updates this can overwhelm the master. However, the primary already has a reasonably up-to-date mapping against which it can check the (approximate) validity of the put-mapping request before sending it to the master. For instance it is not possible to remove fields in a mapping update, so if the primary detects that a mapping update will exceed the fields limit then it can reject it itself and avoid bothering the master. This commit adds a pre-flight check to the mapping update path so that the primary can discard obviously-invalid put-mapping requests itself. Fixes #35564 Backport of #48817	2019-11-05 18:08:22 +01:00
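Below is a minimal sketch of the kind of preflight check described above. It assumes the check is a simple limit on the total number of field names. The class, the method and the error message are hypothetical and are not Elasticsearch's code. A mapping update can only add fields, so the merged set can never be smaller than the current one. That makes a local rejection on the primary safe.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical primary-side preflight check run before sending a put-mapping request
// to the master. Fields can be added but never removed, so if the merged field set
// already exceeds the limit, the master would reject the update too.
final class MappingPreflightSketch {
    static void preflight(Set<String> currentFields, Set<String> newFields, int totalFieldsLimit) {
        Set<String> merged = new HashSet<>(currentFields);
        merged.addAll(newFields);
        if (merged.size() > totalFieldsLimit) {
            throw new IllegalArgumentException(
                "mapping update would exceed the total fields limit of " + totalFieldsLimit);
        }
        // Otherwise the request is plausibly valid and is forwarded to the master.
    }
}
```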
Nhat Nguyen	0887cbc964	Fix testForceMergeWithSoftDeletesRetentionAndRecoverySource (#48766 ) This test failure manifests the limitation of the recovery source merge policy explained in #41628. If we have already merged down to a single segment, then subsequent force merges will be no-ops even though they could prune the recovery source. We need to adjust this test until we have a fix for the merge policy. Relates #41628 Closes #48735	2019-11-02 21:14:12 -04:00
Armin Braun	3c20541823	Cleanup Concurrent RepositoryData Loading (#48329 ) (#48834 ) The loading of `RepositoryData` is not an atomic operation. It uses a list + get combination of calls. This led to accidentally returning empty repository data for generations >= 0, which should always exist unless the repository is corrupted. In the test #48122 (and other SLM tests) there was a low chance of running into this concurrent modification scenario, where the repository moved two index generations between listing out the index-N blobs and loading the latest one. Since we only keep two index-N blobs around at a time, this led to unexpectedly absent snapshots in status APIs. Fixing the behavior to be more resilient is non-trivial but in the works. For now I think we should simply throw in this scenario. This will also help prevent corruption in the unlikely but possible event of running into this issue in a snapshot create or delete operation on master failover on a repository like S3, which doesn't have the "no overwrites" protection on writing a new index-N. Fixes #48122	2019-11-02 20:42:29 +01:00
Armin Braun	a22f6fbe3c	Cleanup Redundant Futures in Recovery Code (#48805 ) (#48832 ) Follow up to #48110 cleaning up the redundant future uses that were left over from that change.	2019-11-02 17:28:12 +01:00
Jason Tedor	c82ecb664c	Do not wrap ingest processor exception with IAE (#48816 ) The problem with wrapping here is that it converts any exception into an IAE, which we treat as a client error (400 status) whereas the exception being wrapped here could be a server error (e.g., NPE). This commit stops wrapping all ingest processor exceptions as IAEs.	2019-11-01 15:11:35 -04:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818 ) This commit introduces a consistent and type-safe way of handling global build parameters throughout our build logic. Primarily this replaces the existing usages of extra properties with static accessors. It also introduces an explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Tal Levy	4be54402de	[7.x] Add ingest info to Cluster Stats (#48485 ) (#48661 ) * Add ingest info to Cluster Stats (#48485) This commit enhances the ClusterStatsNodes response to include global processor usage stats on a per-processor basis. Example output: ``` ... "processor_stats": { "gsub": { "count": 0, "failed": 0, "current": 0, "time_in_millis": 0 }, "script": { "count": 0, "failed": 0, "current": 0, "time_in_millis": 0 } } ... ``` The purpose of this enhancement is to make it easier to collect stats on how specific processors are being used across the cluster, beyond the per-node usage statistics that already exist in node stats. Closes #46146. * fix BWC of ingest stats The introduction of processor types into IngestStats had a bug. It was set to `null` and set as the key to the map. This would throw an NPE. This commit resolves this by setting all the processor types from previous versions that are not serializing it out to `_NOT_AVAILABLE`.	2019-10-31 14:36:54 -07:00
Ioannis Kakavas	99aedc844d	Copy http headers to ThreadContext strictly (#45945 ) (#48675 ) Previously, when copying HTTP headers to the ThreadContext, we would allow multiple HTTP headers with the same name, handling only the first occurrence and disregarding the rest of the values. This can be confusing when dealing with multiple headers, as it is not obvious which value is read and which ones are silently dropped. According to RFC-7230, a client must not send multiple header fields with the same field name in an HTTP message, unless the entire field value for this header is defined as a comma-separated list or this specific header is a well-known exception. This commit changes the behavior to be more compliant with the aforementioned RFC by requiring the classes that implement ActionPlugin to declare whether a header can be multi-valued when registering this header to be copied over to the ThreadContext in ActionPlugin#getRestHeaders. If the header is allowed to be multi-valued, then all such headers are read from the HTTP request and their values get concatenated in a comma-separated string. If the header is not allowed to be multi-valued, and the HTTP request contains multiple such headers with different values, the request is rejected with a 400 status.	2019-10-31 23:05:12 +02:00
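Here is a hypothetical Java sketch of the copy rule for one registered header. The `allowMultiple` flag stands in for the per-header declaration that plugins now make. The method name and the exception type are assumptions; the REST layer is assumed to map the exception to a 400. Following the commit text, repeated identical values are accepted for a single-valued header and only conflicting values are rejected.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of copying one registered HTTP header into a thread context.
final class HeaderCopySketch {
    static Optional<String> copyHeader(Map<String, List<String>> httpHeaders, String name, boolean allowMultiple) {
        List<String> values = httpHeaders.get(name);
        if (values == null || values.isEmpty()) {
            return Optional.empty();
        }
        if (allowMultiple) {
            // Every occurrence is kept, joined into one comma-separated value.
            return Optional.of(String.join(",", values));
        }
        Set<String> distinct = new LinkedHashSet<>(values);
        if (distinct.size() > 1) {
            // Conflicting values for a single-valued header: reject (a 400 in the REST layer).
            throw new IllegalArgumentException("multiple values for single-valued header [" + name + "]");
        }
        return Optional.of(values.get(0));
    }
}
```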
Zachary Tong	34c2375417	Add v7.4.3 version constant	2019-10-31 13:21:25 -04:00
Alexander Reelsen	4ecf234617	Upgrade to joda 2.10.4 (#47805 )	2019-10-31 14:49:50 +01:00
Stéphane Campinas	7ea74918e1	[DOCS] Fix typo in IndexFieldData.java comments (#48743 )	2019-10-31 09:40:35 -04:00
kkewwei	0366c4d4a9	Faster access to INITIALIZING/RELOCATING shards (#47817 ) Today a couple of allocation deciders iterate through all the shards on a node to find the `INITIALIZING` or `RELOCATING` ones, and this can slow down cluster state updates in clusters with very high-density nodes holding many thousands of shards even if those shards belong to closed or frozen indices. This commit pre-computes the sets of `INITIALIZING` and `RELOCATING` shards to speed up this search. Closes #46941 Relates #48579 Co-authored-by: "hongju.xhj" <hongju.xhj@alibaba-inc.com>	2019-10-31 10:55:59 +00:00
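The pre-computation can be pictured as a per-node index of shards by state. The index is kept in sync whenever a shard is added or changes state, so deciders read a set instead of scanning every shard. This is a hypothetical Java sketch: the class, the enum and the string shard ids are all invented for illustration and do not reflect `RoutingNode`'s actual fields.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

enum ShardStateSketch { INITIALIZING, STARTED, RELOCATING }

// Hypothetical per-node index of shards by state, maintained on every add or state change.
final class NodeShardsSketch {
    private final Map<String, ShardStateSketch> stateByShard = new HashMap<>();
    private final Map<ShardStateSketch, Set<String>> shardsByState = new EnumMap<>(ShardStateSketch.class);

    NodeShardsSketch() {
        for (ShardStateSketch state : ShardStateSketch.values()) {
            shardsByState.put(state, new LinkedHashSet<>());
        }
    }

    // Adds a shard or moves it to a new state, keeping both maps in sync.
    void put(String shardId, ShardStateSketch state) {
        ShardStateSketch previous = stateByShard.put(shardId, state);
        if (previous != null) {
            shardsByState.get(previous).remove(shardId);
        }
        shardsByState.get(state).add(shardId);
    }

    // Direct access to, e.g., all INITIALIZING shards on this node, without a full scan.
    Set<String> shardsIn(ShardStateSketch state) {
        return Collections.unmodifiableSet(shardsByState.get(state));
    }
}
```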
Rory Hunter	d96976e2b1	Improve resiliency to formatting JSON in server (#48706 ) Backport of #48553. Make a number of changes so that JSON in the server directory is more resilient to automatic formatting. This covers: * Reformat multiline JSON to embed whitespace in the strings * Add a helper method, `stripWhitespace()`, to strip whitespace from a JSON document using XContent methods. This is sometimes necessary where a test is comparing some machine-generated JSON with an expected value.	2019-10-31 10:48:55 +00:00
Arvind Ramachandran	eefa84bc94	Ignore dangling indices created in newer versions (#48652 ) Today it is possible that we import a dangling index that was created in a newer version than one or more of the nodes in the cluster. Such an index would prevent the older node(s) from rejoining the cluster if they were to briefly leave it for some reason. This commit prevents the import of such dangling indices. Fixes #34264	2019-10-31 10:12:42 +00:00
Yannick Welsch	fe8901b00b	Return consistent source in updates (#48707 )	2019-10-31 10:00:40 +01:00
Ignacio Vera	5bea3898a9	Add IndexOrDocValuesQuery to GeoPolygonQueryBuilder (#48449 ) (#48731 )	2019-10-31 08:46:57 +01:00
Nhat Nguyen	f8ef402027	Do not warm up searcher in engine constructor (#48605 ) With this change, we won't warm up searchers until we externally refresh an engine. We explicitly refresh before allowing reads from a shard (i.e., before moving to the post_recovery state) and during resetting. This guarantees that we have warmed up the engine before exposing the external searcher. Another prerequisite for #47186.	2019-10-30 14:22:59 -04:00
Armin Braun	36039706b5	Fix SnapshotShardStatus Reporting for Failed Shard (#48556 ) (#48687 ) Fixes the shard snapshot status reporting for failed shards in the corner case of failing the shard because of an exception thrown in `SnapshotShardsService` and not the repository. We were missing the update on the `snapshotStatus` instance in this case, which made the transport APIs using this field report back an incorrect status. Fixed by moving the failure handling to the `SnapshotShardsService` for all cases (which also simplifies the code; the exception wrapping in the repository was pointless, as we only used the exception trace upstream anyway). Also added an assertion to another test that already explicitly checks this failure situation (an exception in the `SnapshotShardsService`). Closes #48526	2019-10-30 15:43:41 +01:00
Armin Braun	52e5ceb321	Restore from Individual Shard Snapshot Files in Parallel (#48110 ) (#48686 ) Make restoring shard snapshots run in parallel on the `SNAPSHOT` thread-pool.	2019-10-30 14:36:30 +01:00
Armin Braun	01e326d2e3	Fix ref count handling in Engine.failEngine (#48639 ) (#48646 ) We can run into an already closed store here and hence throw when trying to increment the ref count, so this moves to the guarded ref count increment. Closes #48625	2019-10-30 10:10:48 +01:00
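The difference between an unconditional and a guarded ref count increment can be sketched as below. This is a hypothetical class, not Elasticsearch's `Store`. It assumes that the count starts at 1 for the creator and that the resource counts as closed once the count reaches 0. A fail path would then use `if (store.tryIncRef()) { try { ... } finally { store.decRef(); } }` and simply skip its work when the store is already closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counting sketch: incRef() throws once the resource is closed,
// while tryIncRef() reports failure so callers can skip work on a closed store.
final class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds one reference

    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false; // already closed: never resurrect it
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void incRef() {
        if (!tryIncRef()) {
            throw new IllegalStateException("already closed, can't increment ref count");
        }
    }

    // Releases one reference; a real implementation would free resources when it reaches 0.
    void decRef() {
        refCount.decrementAndGet();
    }
}
```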
Julie Tibshirani	89c65752dc	Update the signature of vector script functions. (#48653 ) Previously the functions accepted a doc values reference, whereas they now accept the name of the vector field. Here's an example of how a vector function was called before and after the change. ``` Before: cosineSimilarity(params.query_vector, doc['field']) After: cosineSimilarity(params.query_vector, 'field') ``` This seems more intuitive, since we don't allow direct access to vector doc values and the meaning of `doc['field']` is unclear. The PR makes the following changes (broken into distinct commits): * Add new function signatures of the form `function(params.query_vector, 'field')` and deprecate the old ones. Because Painless doesn't allow two methods with the same name and number of arguments, we allow a generic `Object` to be passed in to the function and decide on the behavior through an `instanceof` check. * Refactor the class bindings so that the document field is passed to the constructor instead of the instance method. This allows us to avoid retrieving the vector doc values on every function invocation, which gives a tiny speed-up in benchmarks. Note that this PR adds new signatures for the sparse vector functions too, even though sparse vectors are deprecated. It seemed simplest to understand (for both us and users) to keep everything symmetric between dense and sparse vectors.	2019-10-29 15:46:05 -07:00
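This Java sketch shows how one function can accept both call forms through an `instanceof` check on an `Object` argument. Everything here is an assumption made for illustration: the class name, the `Map`-based stand-in for per-document vector doc values, and the use of a `float[]` to represent the legacy `doc['field']` argument. It is not the Painless binding code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one function name accepts either a field name (new form) or a
// legacy doc-values stand-in (deprecated form), resolved with instanceof.
final class CosineSimilaritySketch {
    // Stand-in for the current document's vector doc values, keyed by field name.
    private final Map<String, float[]> docVectors;

    CosineSimilaritySketch(Map<String, float[]> docVectors) {
        this.docVectors = docVectors;
    }

    double cosineSimilarity(List<? extends Number> queryVector, Object field) {
        float[] docVector;
        if (field instanceof String) {
            docVector = docVectors.get((String) field); // new: cosineSimilarity(q, 'field')
        } else if (field instanceof float[]) {
            docVector = (float[]) field; // deprecated: cosineSimilarity(q, doc['field'])
        } else {
            throw new IllegalArgumentException("unsupported field argument: " + field);
        }
        if (docVector == null || docVector.length != queryVector.size()) {
            throw new IllegalArgumentException("missing or mismatched vector for: " + field);
        }
        double dot = 0, queryNorm = 0, docNorm = 0;
        for (int i = 0; i < docVector.length; i++) {
            double q = queryVector.get(i).doubleValue();
            dot += q * docVector[i];
            queryNorm += q * q;
            docNorm += (double) docVector[i] * docVector[i];
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }
}
```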
Stuart Tettemer	55d00cf2b1	Scripting: fill in get contexts REST API (#48319 ) (#48602 ) Updates the response for `GET /_script_context`, returning a `contexts` object with a list of context description objects. The description includes the context name and a list of methods available. The methods list has the signature for the `execute` method and any getters. e.g. ``` { "contexts": [ { "name" : "moving-function", "methods" : [ { "name" : "execute", "return_type" : "double", "params" : [ { "type" : "java.util.Map", "name" : "params" }, { "type" : "double[]", "name" : "values" } ] } ] }, { "name" : "number_sort", "methods" : [ { "name" : "execute", "return_type" : "double", "params" : [ ] }, { "name" : "getDoc", "return_type" : "java.util.Map", "params" : [ ] }, { "name" : "getParams", "return_type" : "java.util.Map", "params" : [ ] }, { "name" : "get_score", "return_type" : "double", "params" : [ ] } ] }, ... ] } ``` fixes: #47411	2019-10-29 14:41:15 -06:00
Nhat Nguyen	2a863ac8ff	Fix testCleanUpCommitsWhenGlobalCheckpointAdvanced Relates #48559	2019-10-29 10:39:16 -04:00
Nhat Nguyen	b08cd058bc	Greedily advance safe commit on new global checkpoint (#48559 ) Today we won't advance the safe commit on a new global checkpoint unless the last commit can become safe. This is not ideal when we have more than two commits, as an earlier commit can become the new safe commit before the last one does. Closes #4853	2019-10-29 10:39:16 -04:00
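The greedy rule can be sketched in Java. The sketch assumes the usual definition: a commit is safe when its max sequence number is at or below the global checkpoint. Commits are described only by that number and are ordered oldest to newest. The class, the methods and the fallback to the oldest commit are hypothetical.

```java
// Hypothetical sketch of picking the safe commit: scan from the newest commit and take the
// first one whose operations are all at or below the global checkpoint, so the safe commit
// advances as soon as any newer commit qualifies, not only when the last one does.
final class SafeCommitSketch {
    static int indexOfSafeCommit(long[] maxSeqNoPerCommit, long globalCheckpoint) {
        for (int i = maxSeqNoPerCommit.length - 1; i >= 0; i--) {
            if (maxSeqNoPerCommit[i] <= globalCheckpoint) {
                return i;
            }
        }
        return 0; // nothing is safe yet: keep the oldest commit (an assumption of this sketch)
    }

    // Greedy rule: advance whenever a newer commit has become safe.
    static boolean shouldAdvance(int currentSafeIndex, long[] maxSeqNoPerCommit, long globalCheckpoint) {
        return indexOfSafeCommit(maxSeqNoPerCommit, globalCheckpoint) > currentSafeIndex;
    }
}
```

With commits whose max sequence numbers are `{10, 20, 30}` and a global checkpoint of 25, the middle commit becomes safe even though the last one is not.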
Jim Ferenczi	aa70ff5ea4	Fix failures in ShuffleForcedMergePolicyTests#testDiagnostics (#48627 ) This commit fixes intermittent failures in ShuffleForcedMergePolicyTests#testDiagnostics by setting a more restricted merge policy that ensures that extra merging will not happen before the forced merge.	2019-10-29 13:46:55 +01:00
Jim Ferenczi	c6abe58f63	Fix expectations in SearchAfter integration tests (#48372 ) This commit fixes the expectations of SearchAfterIT#shouldFail regarding the inner exceptions that should be thrown when testing failures. The exception is sometimes wrapped in a QueryShardException so this change only checks that the toString representation contains the expected message. Closes #43143	2019-10-29 12:37:22 +01:00
Yannick Welsch	6af3ce58f8	Filter on node id in AllocationIdIT (#48623 ) Makes the assertions more targeted. Relates #48529	2019-10-29 12:10:48 +01:00
Jim Ferenczi	028084ce23	Add a new merge policy that interleaves old and new segments on force merge (#48533 ) This change adds a new merge policy that interleaves the oldest and newest segments picked by MergePolicy#findForcedMerges and MergePolicy#findForcedDeletesMerges. This allows time-based indices, which usually have the oldest documents first, to be efficient at finding the most recent documents too. We wrap this merge policy around the merge policy of all indices: although it is mostly useful for time-based indices, it should add no overhead for other types of indices, so this is simpler than adding a setting to enable it. This change is needed in order to ensure that the optimizations that we are working on in # remain efficient even after running a force merge. Relates #37043	2019-10-29 10:44:56 +01:00
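One way to do the interleaving is to alternate between the two ends of the segment list, taking the oldest, then the newest, then the second oldest, and so on. The Java sketch below is hypothetical and generic over the segment type. It illustrates the ordering only and is not the actual `ShuffleForcedMergePolicy` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reorder segments (given oldest first) as
// oldest, newest, second oldest, second newest, ...
final class InterleaveSketch {
    static <T> List<T> interleave(List<T> oldestFirst) {
        List<T> result = new ArrayList<>(oldestFirst.size());
        int left = 0;
        int right = oldestFirst.size() - 1;
        boolean takeOldest = true;
        while (left <= right) {
            result.add(takeOldest ? oldestFirst.get(left++) : oldestFirst.get(right--));
            takeOldest = !takeOldest;
        }
        return result;
    }
}
```

For five segments `s0`..`s4` this yields `[s0, s4, s1, s3, s2]`, so recent documents are no longer all at the end of the merged segment.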
Armin Braun	53a22b8a8a	Fix Validity of RepositoryDataTests Randomness (#48564 ) (#48566 ) Trivial point, but we were accidentally only testing shard generations for a single shard here, and not testing the `null` generation case at all.	2019-10-28 11:04:57 +01:00
Nhat Nguyen	1ef87c9a68	Refresh should not acquire readLock (#48414 ) Today, we hold the engine readLock while refreshing. Although this choice simplifies the correctness reasoning, it can block IndexShard from closing if warming an external reader takes time. The current implementation of refresh does not need to hold readLock, as ReferenceManager can handle errors correctly if the engine is closed midway. This PR is a prerequisite for solving #47186.	2019-10-25 17:32:35 -04:00
Dan Hermann	2e3db518c9	Do not reference values for filtered settings (#48066 ) (#48518 )	2019-10-25 16:22:11 -05:00
Tim Brooks	f5f1072824	Multiple remote connection strategy support (#48496 ) * Extract remote "sniffing" to connection strategy (#47253) Currently the connection strategy used by the remote cluster service is implemented as a multi-step sniffing process in the RemoteClusterConnection. We intend to introduce a new connection strategy that will operate in a different manner. This commit extracts the sniffing logic to a dedicated strategy class. It also implements dedicated tests for this class. Additionally, in previous commits we moved away from a world where the remote cluster connection was mutable. Instead, when setting updates are made, the connection is torn down and rebuilt. We still had methods and tests hanging around for the mutable behavior. This commit removes those. * Introduce simple remote connection strategy (#47480) This commit introduces a simple remote connection strategy which will open remote connections to a configurable list of user-supplied addresses. These addresses can be remote Elasticsearch nodes or intermediate proxies. We will perform normal cluster name and version validation, but otherwise rely on the remote cluster to route requests to the appropriate remote node. * Make remote setting updates support diff strategies (#47891) Currently the entire remote cluster settings infrastructure is designed around the sniff strategy. As we introduce an additional connection strategy, this infrastructure needs to be modified to support it. This commit modifies the code so that the strategy implementations will tell the service if the connection needs to be torn down and rebuilt. As part of this commit, we will wait 10 seconds for new clusters to connect when they are added through the "update" settings infrastructure.	2019-10-25 09:29:41 -06:00
Luca Cavanna	d6d2edf324	Fix .tasks index strict mapping: parent_id should be parent_task_id (#48393 ) * Fix .tasks index strict mapping: parent_id should be parent_task_id The .tasks index has strictly defined mappings. `parent_task_id` was mistakenly defined as `parent_id`, which would cause an exception whenever a task that has a parent task id set is persisted. While at it, a couple of compiler warnings were addressed and a test request builder was removed in favour of using its corresponding request. * increment version	2019-10-25 17:00:06 +02:00
Luca Cavanna	9c48ed12bc	Remove response search phase from ExpandSearchPhase (#48401 ) The expand phase is always created providing a function that builds the next phase to be run, which has a single purpose: sending the response back. Such a small search phase is not necessary and causes some issues when reporting search progress and counting the search phases that need to be executed and that have already been executed. We can instead simply send back the response, without creating a specific phase for that.	2019-10-25 17:00:06 +02:00
Armin Braun	84a47a9632	Remove Outdated AwaitsFix (#48513 ) (#48522 ) This `AwaitsFix` was accidentally added after the test was already fixed in #46594 => we can remove it.	2019-10-25 16:08:56 +02:00
Armin Braun	edab3748e9	Remove Incorrect Assertion from SnapshotsInProgress (#47458 ) (#48514 ) This relates to the effort towards #46250. We added tracking of the shard generation for successful snapshots to `8.0`. This assertion isn't correct though. While an `8.0` master won't create an entry with success state and a null shard generation, it may still (e.g. on master failover) send a success entry created by a 7.x master with a `null` generation over the wire. Closes #47406	2019-10-25 15:03:23 +02:00
Christoph Büscher	3fb3397c12	BlendedTermQuery's equals method should consider boosts (#48193 ) This changes the query's equals() method so that the boost factors for each term are considered for the equality calculation. This means queries are only equal if both their terms and associated boosts match. As before, the ordering of the terms doesn't matter, which is why we internally still need to sort the terms and boosts for comparison on the first equals() call. Boosts that are `null` are considered equal to boosts of 1.0f because topLevelQuery() will only wrap into BoostQuery if boost is not null and different from 1f. Closes #48184	2019-10-25 13:35:14 +02:00
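This is a hypothetical Java sketch of order-independent equality over (term, boost) pairs in which a `null` boost counts as 1.0f. It is not `BlendedTermQuery`'s code. For simplicity it sorts on every call, while the commit sorts once on the first `equals()` call. `hashCode()` sums per-pair hashes, which keeps it consistent with an order-independent `equals()`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

// Hypothetical sketch of equality over (term, boost) pairs that ignores term order
// and treats a null boost as 1.0f.
final class WeightedTermsSketch {
    private final String[] terms;
    private final Float[] boosts; // a null entry means "no boost"

    WeightedTermsSketch(String[] terms, Float[] boosts) {
        if (terms.length != boosts.length) {
            throw new IllegalArgumentException("terms and boosts must have the same length");
        }
        this.terms = terms.clone();
        this.boosts = boosts.clone();
    }

    private float boost(int i) {
        return boosts[i] == null ? 1.0f : boosts[i];
    }

    // Positions sorted by term, then by boost, so comparisons ignore the original order.
    private Integer[] sortedPositions() {
        Integer[] positions = new Integer[terms.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, Comparator.comparing((Integer i) -> terms[i]).thenComparingDouble(this::boost));
        return positions;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WeightedTermsSketch)) return false;
        WeightedTermsSketch other = (WeightedTermsSketch) o;
        if (terms.length != other.terms.length) return false;
        Integer[] mine = sortedPositions();
        Integer[] theirs = other.sortedPositions();
        for (int k = 0; k < mine.length; k++) {
            if (!terms[mine[k]].equals(other.terms[theirs[k]])) return false;
            if (Float.compare(boost(mine[k]), other.boost(theirs[k])) != 0) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        int h = 0;
        for (int i = 0; i < terms.length; i++) {
            h += Objects.hash(terms[i], boost(i)); // a sum is independent of term order
        }
        return h;
    }
}
```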
Yannick Welsch	486794f24d	Show task ID in source of persistent task state update (#48483 ) Relates #48395	2019-10-25 10:29:16 +02:00
Tim Brooks	c0b545f325	Make BytesReference an interface (#48486 ) BytesReference is currently an abstract class which is extended by various implementations. This makes it very difficult to use the delegation pattern. The implication of this is that our releasable BytesReference is a PagedBytesReference type and cannot be used as a generic releasable bytes reference that delegates to any reference type. This commit makes BytesReference an interface and introduces an AbstractBytesReference for common functionality.	2019-10-24 15:39:30 -06:00
Yannick Welsch	acf6d34d69	Always use last properly persisted metadata as previous state (#47779 ) On data-only nodes we were not using the last persisted cluster state as the base point for computing what needed to be stored, but instead the last applied (and not necessarily properly persisted) cluster state.	2019-10-24 13:30:59 +02:00
David Turner	50518359fe	Fix relocating shards size calculation (#48421 ) In #48392 we added a second computation of the sizes of the relocating shards in `canRemain()` but passed the wrong value for `subtractLeavingShards`. This fixes that. It also removes some unnecessary logging in a test case added in the same commit.	2019-10-24 08:58:50 +01:00
Jim Ferenczi	dc5c31d67a	Add a deprecation warning regarding allocation awareness in search request (#48351 ) This is a follow up of https://github.com/elastic/elasticsearch/issues/43453 where we added a system property to disallow allocation awareness in search requests. Since search requests will no longer check the allocation awareness attributes for routing in the next major version, this change adds a deprecation warning on any setup that uses these attributes. Relates #43453	2019-10-24 09:25:50 +02:00
Mayya Sharipova	9e9533f717	Correct syntax from backport Use older format of map Relates to #48425	2019-10-23 17:19:15 -04:00
Mayya Sharipova	975dbecfa9	Correct rewriting of script_score query (#48425 ) Previously there was a bug when a query inside a script_score query was rewritten. If min_score was not set and was equal to null, we were converting it to a float value, which resulted in an NPE. This commit corrects this. Closes #48081	2019-10-23 17:01:51 -04:00
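In Java terms, this bug is an auto-unboxing of a `null` `Float`. The sketch below is hypothetical: the class, its fields and both rewrite methods are invented, and the wrapped query is a plain string. It only contrasts copying `min_score` into a primitive `float` with keeping the boxed value.

```java
// Hypothetical sketch: an unset min_score is held as a null Float, and copying it into a
// primitive float during rewrite auto-unboxes it and throws a NullPointerException.
final class ScriptScoreSketch {
    final String query;   // stand-in for the wrapped query
    final Float minScore; // null when min_score was not set

    ScriptScoreSketch(String query, Float minScore) {
        this.query = query;
        this.minScore = minScore;
    }

    // Buggy rewrite: unboxing a null Float throws NullPointerException.
    ScriptScoreSketch rewriteBuggy(String rewrittenQuery) {
        float score = minScore;
        return new ScriptScoreSketch(rewrittenQuery, score);
    }

    // Fixed rewrite: keep the boxed value, so "not set" survives the rewrite.
    ScriptScoreSketch rewriteFixed(String rewrittenQuery) {
        return new ScriptScoreSketch(rewrittenQuery, minScore);
    }
}
```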
Igor Motov	bdbc353dea	Geo: improve handling of out of bounds points in linestrings (#47939 ) Brings the handling of out-of-bounds points in linestrings in line with standalone points. Now linestring points with latitude above 90 or below -90 are handled the same way as standalone points, by shifting the longitude by 180 degrees. Relates to #43916	2019-10-23 14:17:44 -04:00
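Here is a simplified Java sketch of the point normalization this commit aligns linestrings with. It assumes the usual pole-crossing rule: a latitude past a pole is reflected back into [-90, 90], and the longitude moves to the opposite meridian (+180 degrees) before being wrapped into (-180, 180]. This is an illustration, not Elasticsearch's geo code, and the class and method names are hypothetical.

```java
// Hypothetical, simplified normalization of a {lon, lat} point.
final class GeoNormalizeSketch {
    // Returns {lon, lat} with lat folded into [-90, 90] and lon wrapped into (-180, 180].
    static double[] normalize(double lon, double lat) {
        lat = centered(lat, 360); // now in (-180, 180]
        if (lat > 90) {
            lat = 180 - lat; // crossed the north pole...
            lon += 180;      // ...so we are on the opposite meridian
        } else if (lat < -90) {
            lat = -180 - lat; // crossed the south pole
            lon += 180;
        }
        lon = centered(lon, 360);
        return new double[] {lon, lat};
    }

    // Maps value into (-period/2, period/2].
    static double centered(double value, double period) {
        double r = value % period; // in (-period, period), same sign as value
        if (r > period / 2) {
            r -= period;
        } else if (r <= -period / 2) {
            r += period;
        }
        return r;
    }
}
```

For example, a point at longitude 10 and latitude 100 becomes longitude -170, latitude 80.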
Jim Ferenczi	41116eb7ea	Do not throw errors on unknown types in SearchAfterBuilder (#48147 ) * Do not throw errors on unknown types in SearchAfterBuilder The support for BigInteger and BigDecimal was added for XContent in https://github.com/elastic/elasticsearch/pull/32888. However the SearchAfterBuilder xcontent parser doesn't expect them to be present so it throws an AssertionError. This change fixes this discrepancy by changing the AssertionError into an IllegalArgumentException that will not cause the node to die when thrown. Closes #48074	2019-10-23 20:02:14 +02:00
Tom Callahan	892264a97a	Add versions 7.4.2 and 6.8.5	2019-10-23 13:32:51 -04:00
David Turner	c783a20560	Handle negative free disk space in deciders (#48392 ) Today it is possible that the total size of all relocating shards exceeds the total amount of free disk space. For instance, this may be caused by another user of the same disk increasing their disk usage, or by how Elasticsearch double-counts relocations that are nearly complete, particularly if there are many concurrent relocations in progress. The `DiskThresholdDecider` treats negative free space similarly to zero free space, but it then fails when rendering the messages that explain its decision. This commit fixes its handling of negative free space. Fixes #48380	2019-10-23 18:16:41 +01:00
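This sketch shows one way to keep negative free space from breaking both the decision and its explanation: clamp to zero first, then decide and render. All names are hypothetical. The byte formatter that rejects negative sizes is an assumption, standing in for whatever failed while the decider rendered its message.

```java
import java.util.Locale;

// Hypothetical sketch: free space after subtracting relocating shards can be negative;
// clamping it to zero before deciding and rendering keeps the explanation safe.
final class NegativeFreeSpaceSketch {
    // Stand-in for a byte-size formatter that rejects negative sizes (an assumption).
    static String formatBytes(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("negative size: " + bytes);
        }
        return bytes + "b";
    }

    static String explain(long reportedFreeBytes, long relocatingBytes, long requiredFreeBytes) {
        long free = reportedFreeBytes - relocatingBytes; // may be negative
        long shownFree = Math.max(0L, free);             // treat negative free space like zero
        boolean allowed = shownFree >= requiredFreeBytes;
        return String.format(Locale.ROOT, "%s: free [%s], required [%s]",
            allowed ? "YES" : "NO", formatBytes(shownFree), formatBytes(requiredFreeBytes));
    }
}
```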
Adrien Grand	81ef72d3ef	Lucene#asSequentialBits gets the leadCost backwards. (#48335 ) (#48403 ) The comment says it needs random access, but it passes `Long#MAX_VALUE` as the lead cost, which forces sequential access; it should pass `0` instead. I also took the opportunity to improve the logic so that it uses an estimate of the number of times that `Bits#get` gets called to make better decisions.	2019-10-23 17:48:17 +02:00
Przemyslaw Gomulka	aaa6209be6	[7.x] [Java.time] Calculate week of a year with ISO rules BACKPORT(#48209 ) (#48349 ) Reverting the change that introduced IsoLocale.ROOT and introducing IsoCalendarDataProvider, which defaults the start of the week to Monday and requires a minimum of 4 days in the first week of a year. This extension uses the Java SPI mechanism and applies these defaults for Locale.ROOT only. It requires the JVM property java.locale.providers to be set to SPI,COMPAT. closes #41670 backport #48209	2019-10-23 17:39:38 +02:00
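The week rules that `IsoCalendarDataProvider` supplies for `Locale.ROOT` are the ISO-8601 rules: weeks start on Monday, and the first week of a year needs at least 4 days. Plain `java.time` can express these rules directly. The sketch below does not show the SPI registration. It only illustrates the resulting week numbering, and its class name is hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.WeekFields;

// Hypothetical sketch: week numbering with Monday as the first day of the week and at
// least 4 days in the first week, which are exactly the ISO-8601 rules.
final class IsoWeekSketch {
    static final WeekFields ISO_LIKE = WeekFields.of(DayOfWeek.MONDAY, 4);

    static int isoWeek(LocalDate date) {
        return date.get(ISO_LIKE.weekOfWeekBasedYear());
    }

    static int isoWeekBasedYear(LocalDate date) {
        return date.get(ISO_LIKE.weekBasedYear());
    }
}
```

Under these rules Friday, 1 January 2021 falls in week 53 of week-based year 2020. With Sunday-start, 1-day-minimum rules, the same date is week 1.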
Armin Braun	7215201406	Track Shard-Snapshot Index Generation at Repository Root (#48371 ) This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be accessed by shard id to get the generation of a shard's shard folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` for the shard in question). This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost for updating an empty shard (see #45736) and removes one out of two remaining potential issues with eventually consistent blob stores (see #38941 ... now only the root `index-${N}` is determined by listing). Equally, if not more importantly, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and moving from incremental naming of shard-level index-N to uuid suffixes for these blobs. This change moves the deleting of the previous shard-level `index-${uuid}` blob to the master node instead of the data node, allowing for a safe and consistent update of the shard's generation in the `RepositoryData` by first updating `RepositoryData` and then deleting the previous, now unreferenced `index-${uuid}` blob. __No deletes are executed on the data nodes at all for any operation with this change.__ Note also that previous issues with hanging data nodes interfering with master nodes are now completely impossible, even on S3 (see next section for details). This change switches the naming of the shard-level `index-${N}` blobs to a uuid suffix `index-${UUID}`. The reason for this is that writing a new shard-level `index-` generation blob is no longer atomic in its effect.
Not only does the blob have to be written to have an effect, it must also be referenced by the root-level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository. This leads to a problem if we were to use incrementing names like we did before. If a blob `index-${N+1}` is written but, due to a node/network/cluster/... crash, the root-level `RepositoryData` has not been updated, then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Updates like that are problematic on S3 for consistency reasons, but also create numerous issues when thinking about stuck data nodes. Previously, stuck data nodes that were tasked to write `index-${N+1}` but got stuck and tried to do so after some other node had already written `index-${N+1}` were prevented from doing so (except on S3) by us not allowing overwrites for that blob, and thus no corruption could occur. Were we to continue using incrementing names, we could not do this. The stuck node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless). With uuid naming and moving all deletes to `master`, this becomes a non-issue. Data nodes write updated shard generation `index-${uuid}` blobs, and `master` makes those `index-${uuid}` blobs part of the `RepositoryData` that it deems correct and cleans up all `index-` blobs that are unused. Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2019-10-23 10:58:26 +01:00
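The root-level bookkeeping can be sketched in Java as a map from index id to a `String[]` of per-shard generations. Data nodes only ever write new, UUID-named blobs. The master records the new generation and then deletes the blob that is no longer referenced. This is a hypothetical in-memory illustration, not `RepositoryData` itself. Its class, its methods and the persistence order described in the comments are assumptions based on the commit text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: for each index, a String[] indexed by shard id holds the UUID
// suffix of that shard's current index-${uuid} blob.
final class ShardGenerationsSketch {
    private final Map<String, String[]> generations = new HashMap<>();

    ShardGenerationsSketch(Map<String, Integer> shardCountByIndexId) {
        shardCountByIndexId.forEach((indexId, count) -> generations.put(indexId, new String[count]));
    }

    // Data node side: every new shard-level blob gets a fresh name, so nothing is overwritten.
    static String newGeneration() {
        return UUID.randomUUID().toString();
    }

    // Path of a shard-level blob, following /indices/${indexId}/${shardId}/index-${uuid}.
    static String blobPath(String indexId, int shardId, String generation) {
        return "indices/" + indexId + "/" + shardId + "/index-" + generation;
    }

    // Master side: record the new generation (in the real flow, once the updated RepositoryData
    // is written) and return the now-unreferenced blobs that can then be deleted.
    List<String> update(String indexId, int shardId, String newGeneration) {
        String[] shardGenerations = generations.get(indexId);
        String previous = shardGenerations[shardId];
        shardGenerations[shardId] = newGeneration;
        List<String> unreferenced = new ArrayList<>();
        if (previous != null) {
            unreferenced.add(blobPath(indexId, shardId, previous));
        }
        return unreferenced;
    }

    // Lookup that replaces a LIST call on the shard folder.
    String currentGeneration(String indexId, int shardId) {
        return generations.get(indexId)[shardId];
    }
}
```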
Jim Ferenczi	50f565b158	SearchSlowLog uses a non thread-safe object to escape json (#48363 ) This commit fixes the usage of JsonStringEncoder#quoteAsUTF8 in the SearchSlowLog. JsonStringEncoder#getInstance should always be called to get a thread local object but this assumption was broken by #44642. This means that any slow log can throw an AIOOBE since it uses the same byte array concurrently. Closes #48358	2019-10-23 10:23:06 +02:00
Armin Braun	8a02a5fc7d	Simplify Shard Snapshot Upload Code (#48155 ) (#48345 ) The code here was needlessly complicated when it enqueued all file uploads up-front. Instead, we can go with a cleaner worker + queue pattern here by taking the max-parallelism from the threadpool info. Also, I slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback in the next line), since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.	2019-10-22 17:17:09 +01:00
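The worker + queue pattern can be sketched like this. It is a hypothetical, self-contained illustration: it creates its own fixed thread pool, whereas the commit sizes the workers from the snapshot thread pool's max parallelism. `upload` stands in for the real blob upload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: start at most maxParallelism workers that each drain a shared
// queue of files, instead of enqueuing one task per file up front.
final class UploadWorkersSketch {
    static void uploadAll(List<String> files, int maxParallelism, Consumer<String> upload) throws Exception {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(files);
        int workers = Math.max(1, Math.min(maxParallelism, files.size()));
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                futures.add(executor.submit(() -> {
                    String file;
                    while ((file = queue.poll()) != null) {
                        upload.accept(file); // each file is taken by exactly one worker
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for all workers; rethrows a worker's failure
            }
        } finally {
            executor.shutdown();
        }
    }
}
```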
Nhat Nguyen	d0a4bad95b	Use MultiFileTransfer in CCR remote recovery (#44514 ) Relates #44468	2019-10-21 23:30:52 -04:00
Armin Braun	e65c60915a	Cleanup FileRestoreContext Abstractions (#48173 ) (#48300 ) This class is only used by the blob store repository and CCR and the abstractions didn't really make sense with CCR ignoring the concrete `restoreFiles` method completely and having a method used only by the blobstore overridden as unsupported. => Moved to a more fitting set of abstractions => Dried up the stream wrapping in `BlobStoreRepository` a little now that the `restoreFile` method could be simplified Relates #48110 as it makes changing the API of `FileRestoreContext` to what is needed for async restores simpler	2019-10-21 17:30:35 +02:00
Armin Braun	dc08feadc6	Remove Redundant Version Param from Repository APIs (#48231 ) (#48298 ) This parameter isn't used by any implementation	2019-10-21 16:20:45 +02:00
David Turner	672b2a92ca	Fix compile error from previous commit (#48230 ) The previous commit, `3a6fa0bbdb`, introduced a compile error that was fixed locally but not committed. This commit adds the missing change.	2019-10-21 08:54:04 +01:00
David Turner	3a6fa0bbdb	Close query cache on index service creation failure (#48230 ) Today it is possible that we create the `QueryCache` and then fail to create the owning `IndexService` and this means we do not close the `QueryCache` again. This commit addresses that leak. Fixes #48186	2019-10-21 08:46:53 +01:00
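A common way to avoid this kind of leak is a success flag that is checked in a `finally` block, sketched below in Java. This is a generic pattern, not necessarily the exact code of the fix. `createOwner` and its factory function are hypothetical stand-ins for creating the `IndexService` around an already-created `QueryCache`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Function;

// Hypothetical sketch: build an owner around an already-created resource; if building the
// owner fails, close the resource before the original failure propagates.
final class CloseOnFailureSketch {
    static <R extends Closeable, T> T createOwner(R resource, Function<R, T> ownerFactory) {
        boolean success = false;
        try {
            T owner = ownerFactory.apply(resource);
            success = true;
            return owner;
        } finally {
            if (success == false) {
                try {
                    resource.close();
                } catch (IOException | RuntimeException e) {
                    // keep the original failure; secondary close errors are ignored in this sketch
                }
            }
        }
    }
}
```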
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227 )	2019-10-21 08:21:09 +02:00
Takuya Kajiwara	a56daeae2d	[DOCS] Fix typos in InternalEngine.java comments (#46861 )	2019-10-18 10:36:58 -04:00
David Turner	a8bcbbc38a	Quieter logging from the DiskThresholdMonitor (#48115 ) Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects. This commit enhances the logging emitted by the `DiskThresholdMonitor` to be less misleading. The expected case of hitting the high watermark and immediately relocating one or more shards to bring the node back under the watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages are not emitted repeatedly. Fixes #48038	2019-10-18 15:00:14 +01:00
Armin Braun	1157775074	Remove Support for pre-5.x Indices in Restore (#48181 ) (#48199 ) The logic for handling empty segment files has been unnecessary ever since #24021, which removed support for these files in 6.x, so we can safely remove support for restoring them from 7.x+ to simplify the code.	2019-10-18 09:45:07 +02:00
Przemyslaw Gomulka	02d18f5c1e	[7.x] Slow log must use separate underlying logger for each index BACKPORT(#47234 ) (#48176 ) * Slow log must use separate underlying logger for each index (#47234) SlowLog instances should not share the same underlying logger, as that would cause different indexes to override each other's log levels. When creating the underlying logger, a unique per-index identifier should be used: Name + IndexSettings.UUID. Closes #42432	2019-10-17 20:04:57 +02:00
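The effect of sharing one logger can be reproduced with `java.util.logging`, used here only as a stand-in for the Log4j loggers Elasticsearch actually uses. Loggers are looked up by name, so a shared name means a shared level. The base name and the naming scheme (`BASE_NAME + "." + indexName + "." + indexUuid`) are assumptions for illustration:

```java
import java.util.logging.Logger;

final class PerIndexSlowLog {
    private PerIndexSlowLog() {}

    // Illustrative base name; not necessarily the logger name Elasticsearch uses.
    static final String BASE_NAME = "index.search.slowlog";

    /** Shared logger: every index gets the same instance, so a setLevel call from one index affects all. */
    static Logger sharedLogger() {
        return Logger.getLogger(BASE_NAME);
    }

    /** Per-index logger: adding the index name and UUID yields a distinct instance with an independent level. */
    static Logger loggerFor(String indexName, String indexUuid) {
        return Logger.getLogger(BASE_NAME + "." + indexName + "." + indexUuid);
    }
}
```

Including the UUID, not just the index name, keeps two indices that reuse the same name over time (delete and recreate) from colliding as well.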
Armin Braun	04e3316408	Stop Resolving Fallback IndexId (#48141 ) (#48204 ) There is no reason to still resolve the fallback `IndexId` here. It only applies to `2.x` repos and those we can't read anymore anyway because they use an `/index` instead of an `/index-N` blob at the repo root for which at least 7.x+ does not contain the logic to find it.	2019-10-17 19:27:49 +02:00
Stuart Tettemer	356eef00c8	Scripting: get context names REST API (#48026 ) (#48168 ) Adds `GET /_script_context`, returning a `contexts` object with each available context as a key whose value is an empty object, e.g. ``` { "contexts": { "aggregation_selector": {}, "aggs": {}, "aggs_combine": {}, ... } } ``` refs: #47411	2019-10-17 09:08:55 -06:00
Armin Braun	0ca7cc1848	Safely Close Repositories on Node Shutdown (#48020 ) (#48107 ) We were not closing repositories on Node shutdown. In production this has little effect, but in tests, a node that uses `MockRepository` and is stuck in a simulated blocked-IO situation when it is shut down will only unblock once the node's threadpool is interrupted. In some edge cases (many snapshot threads and some CI slowness) this can result in the execution taking longer than 5s to release all the shard stores, and thus we fail the assertion about unreleased shard stores in the internal test cluster. Regardless of tests, I think we should close repositories and release the resources associated with them when closing a node, and not just when removing a repository from the CS with running nodes, as the current behavior is really unexpected. Fixes #47689	2019-10-17 07:55:05 +02:00
Armin Braun	f1bc3a0753	Remove TestLogging for #46701 (#48156 ) (#48160 ) This hasn't failed in 5 weeks now. Removing the test logging and closing the issue. Closes #46701	2019-10-17 07:54:20 +02:00
Jack Conradson	fa99721295	Drop stored scripts with the old style-id (#48078 ) This PR fixes (#47593). Stored scripts with the old-style id of lang#id are saved through the upgrade process but are no longer accessible in recent versions. This fix will drop those scripts altogether since there is no way for a user to access them.	2019-10-16 16:10:31 -07:00
jimczi	b2dc98562b	Bump version to 7.6	2019-10-16 15:57:12 +02:00
Klemen Košir	8243e99134	Fix typo in QueryBuilders Javadoc. (#47362 ) This PR fixes a typo in the Javadoc for terms queries in QueryBuilders.	2019-10-15 16:16:21 -07:00
Martijn van Groningen	aff0c9babc	This commit merges (#48040 ) the enrich-7.x feature branch, which is a backport merge, and adds a new ingest processor, named the enrich processor, that allows documents being ingested to be enriched with data from other indices. Besides the new enrich processor, this PR adds several APIs to manage an enrich policy. An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner. Related to #32789	2019-10-15 17:31:45 +02:00
jimczi	b858e19bcc	Revert #46598 that breaks the cachability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Martijn van Groningen	cc4b6c43b3	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-15 07:23:47 +02:00
Jim Ferenczi	ef02a736ca	Don't apply the plugin's reader wrapper in can_match phase (#47816 ) This change modifies the local execution of the `can_match` phase to not apply the plugin's reader wrapper (if it is configured) when acquiring the searcher. We must ensure that the phase runs quickly, and since we don't know the cost of applying the wrapper it is preferable to avoid it entirely. The can_match phase can afford false positives, so this is also safe for the built-in plugins that use this functionality. Closes #46817	2019-10-14 13:07:05 +02:00
Martijn van Groningen	d4901a71d7	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-14 10:27:17 +02:00
Nhat Nguyen	8180cf1e68	Mute testDoNotInfinitelyWaitForMapping Tracked at #47974	2019-10-13 22:06:50 -04:00
Nhat Nguyen	2995d4a9c0	Sequence number based replica allocation (#46959 ) With this change, shard allocation prefers allocating replicas on a node that already has a copy of the shard that is as close as possible to the primary, so that it is as cheap as possible to bring the new replica in sync with the primary. Furthermore, if we find a copy that is identical to the primary then we cancel an ongoing recovery, because the new copy which is identical to the primary needs no work to recover as a replica. With this improvement we no longer need to perform a synced flush before performing a rolling upgrade or full cluster restart. Closes #46318	2019-10-13 22:06:50 -04:00
Nhat Nguyen	4f06225928	Avoid unneeded refresh with concurrent realtime gets (#47895 ) This change should reduce refreshes for a use case where we perform multiple realtime gets at the same time on an active index. Currently, we call refresh whenever the index operation is still in the versionMap. However, by the time we call refresh, that operation may already be included, or will be included, in the latest reader, in which case we do not need to refresh. Adding another lock here is not an issue as the refresh is already sequential.	2019-10-13 20:08:21 -04:00
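A simplified sketch of the "re-check under the lock before refreshing" idea. It is hypothetical: the real engine consults its version map, whereas this `RefreshGate` class tracks only the highest sequence number visible to searchers and skips the refresh when an earlier caller has already made the requested operation visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

final class RefreshGate {
    private final ReentrantLock refreshLock = new ReentrantLock();
    private long visibleUpTo = -1;                              // guarded by refreshLock
    private final AtomicLong indexedUpTo = new AtomicLong(-1);  // highest seq no written so far
    private final AtomicInteger refreshes = new AtomicInteger();

    /** Records that an operation with this sequence number has been indexed (not yet searchable). */
    void index(long seqNo) {
        indexedUpTo.accumulateAndGet(seqNo, Math::max);
    }

    /** Called by a realtime get that needs the operation with seqNo to be visible. */
    void ensureVisible(long seqNo) {
        refreshLock.lock();
        try {
            // Re-check under the lock: a concurrent get may already have refreshed past seqNo.
            if (visibleUpTo >= seqNo) {
                return;
            }
            visibleUpTo = indexedUpTo.get(); // "refresh": expose everything indexed so far
            refreshes.incrementAndGet();
        } finally {
            refreshLock.unlock();
        }
    }

    int refreshCount() {
        return refreshes.get();
    }
}
```

Because refreshes are already serialized, taking the lock first and checking second costs little, while it lets the second and later waiters return without refreshing again.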
Nhat Nguyen	4c1bb210cb	Force flush in translog retention policy test (#47879 ) If we roll translog but do not index, then a flush without force is a noop. In this case, the number of retained translog files will be higher than the value specified by the retention policy. Closes #4741	2019-10-13 20:08:21 -04:00
Przemyslaw Gomulka	6ab58de7ef	[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675 ) (#47913 ) Joda was using ResolverStyle.STRICT when parsing. This means that dates will be validated to have a correct year, month-of-year and day-of-month. However, we also want it to work with year-of-era as Joda used to, hence the custom LocalDate temporal query in DateFormatters.from. Within DateFormatters we use the correct uuuu year instead of the yyyy year-of-era. Worth noting: if yyyy (without an era) is used in code, the parsing result will be a TemporalAccessor that fails to be converted into a LocalDate. We mostly use DateFormatters.from, so this takes care of it. If possible, the uuuu format should be used.	2019-10-11 21:19:56 +02:00
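The `uuuu` versus `yyyy` difference under `ResolverStyle.STRICT` can be seen with plain `java.time`. This is a standalone sketch, not the `DateFormatters` code: the class `StrictDates` and its fallback query are hypothetical, and the fallback simply assumes that a bare year-of-era belongs to the current (CE) era.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;

final class StrictDates {
    private StrictDates() {}

    // 'uuuu' is the proleptic year, so STRICT resolution can build a LocalDate directly.
    static final DateTimeFormatter STRICT_UUUU =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // 'yyyy' is year-of-era; with STRICT and no era in the pattern, java.time leaves it unresolved.
    static final DateTimeFormatter STRICT_YYYY =
        DateTimeFormatter.ofPattern("yyyy-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    /** Hypothetical fallback query: accept a resolved year, or treat a bare year-of-era as a CE year. */
    static LocalDate toLocalDate(TemporalAccessor parsed) {
        long year = parsed.isSupported(ChronoField.YEAR)
            ? parsed.getLong(ChronoField.YEAR)
            : parsed.getLong(ChronoField.YEAR_OF_ERA); // assumes era CE
        // LocalDate.of re-validates the day against month and year, e.g. it rejects February 30.
        return LocalDate.of(
            Math.toIntExact(year),
            Math.toIntExact(parsed.getLong(ChronoField.MONTH_OF_YEAR)),
            Math.toIntExact(parsed.getLong(ChronoField.DAY_OF_MONTH)));
    }

    /** Parses with the given formatter; failures surface as DateTimeParseException. */
    static LocalDate parse(DateTimeFormatter formatter, String text) {
        return formatter.parse(text, StrictDates::toLocalDate);
    }
}
```

With `STRICT_YYYY`, `LocalDate.parse("2019-10-11", STRICT_YYYY)` throws because no year can be resolved, while `StrictDates.parse(STRICT_YYYY, "2019-10-11")` succeeds through the fallback. Invalid dates such as `2019-02-30` are rejected with either pattern.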
Christoph Büscher	2ef12c37f5	Add builder for distance_feature to QueryBuilders (#47846 ) The QueryBuilders convenience class is currently missing a shortcut to construct a DistanceFeatureQueryBuilder, which is added here. Closes #47767	2019-10-11 18:20:01 +02:00
Alan Woodward	ec9198d0e2	Adjust Version.V_6_8_4 to refer to Lucene 7.7.2 (#47926 ) 6.8.4 will ship with Lucene 7.7.2, so we need to change our version settings to reflect this. Relates #47901	2019-10-11 17:01:42 +01:00
David Turner	ba62eb3dce	Allow truncation of clean translog (#47866 ) Today the `elasticsearch-shard remove-corrupted-data` tool will only truncate a translog it determines to be corrupt. However there may be other cases in which it is desirable to truncate the translog, for instance if an operation in the translog cannot be replayed for some reason other than corruption. This commit adds a `--truncate-clean-translog` option to skip the corruption check on the translog and blindly truncate it.	2019-10-11 15:48:12 +01:00
Henning Andersen	a0d0866f59	Shrink should not touch max_retries (#47719 ) Shrink would set `max_retries=1` in order to avoid retrying. This however sticks to the shrunk index afterwards, causing issues when a shard copy later fails to allocate just once. Avoiding a retry of a shrink makes sense, since there is no new node to allocate to and a retry will likely fail again. However, the downside of having `max_retries=1` afterwards outweighs the benefit of not retrying the failed shrink a few times. This change ensures shrink no longer sets max_retries and also makes all resize operations (shrink, clone, split) leave the setting at its default value rather than copy it from the source.	2019-10-11 14:22:56 +02:00
Przemyslaw Gomulka	0c439fe495	[7.x] Allow partial parsing dates (#47872 ) backport(#46814 ) Enable partial parsing of the date part. This makes the behaviour of the java.time implementation the same as with Joda: 2018, 2018-01 and 2018-01-01 are all valid dates for date_optional_time or strict_date_optional_time. closes #45284 closes #47473	2019-10-11 11:17:19 +02:00
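A minimal `java.time` sketch of partial date parsing, assuming nested optional sections plus `parseDefaulting` for a missing month or day. The class `PartialDates` is hypothetical and far simpler than the real `strict_date_optional_time` format, which also accepts a time part.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class PartialDates {
    private PartialDates() {}

    // Accepts "yyyy", "yyyy-MM" or "yyyy-MM-dd"; missing month/day default to 1.
    static final DateTimeFormatter YEAR_MONTH_DAY_OPTIONAL = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4)
        .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart()
                .appendLiteral('-')
                .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalEnd()
        .optionalEnd()
        // Defaults apply only when the field was not parsed from the input.
        .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
        .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
        .toFormatter(Locale.ROOT)
        .withResolverStyle(ResolverStyle.STRICT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, YEAR_MONTH_DAY_OPTIONAL);
    }
}
```

Combining this with `ResolverStyle.STRICT` keeps validation intact: an out-of-range month such as `2018-13` is still rejected.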
Zachary Tong	2de3411c9c	Make sibling pipeline agg ctor's protected (#42808 ) SiblingPipelineAggregator is a public interface, but the ctor was package-private. These should be protected so that plugin authors can extend it and implement their own sibling pipeline agg.	2019-10-10 12:31:14 -04:00
Martijn van Groningen	102016d571	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-10 14:44:05 +02:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733 ) Today built-in highlighters and plugins have access to the SearchContext through the highlighter context. However, most of the information exposed in the SearchContext is not needed, and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext with the information that is absolutely required by the highlighter: a QueryShardContext and the SearchContextHighlight. This change reduces the exposure of the complex SearchContext and removes the need to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Jim Ferenczi	3d334a262b	Ensure that we don't call listener twice when detecting a partial failure in _search (#47694 ) This change fixes a bug that can occur when a shard failure is detected while we build the search response and accepting partial failures is set to false. In this case we currently call onFailure on the provided listener but also continue the search as if the failure didn't occur. This can lead to the listener being called twice, once with onFailure and once with onSuccess, which is forbidden by design.	2019-10-10 09:59:49 +02:00
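The commit fixes the control flow so that the search stops after `onFailure`. A complementary defensive pattern is to wrap a listener so that only the first notification is delivered. The sketch below assumes a hypothetical two-method `Listener` interface rather than the real `ActionListener` API, and it is not what this commit changed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical callback interface: exactly one of the two methods should ever be called. */
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

/** Delegates only the first notification; later calls are silently dropped. */
final class NotifyOnce<T> implements Listener<T> {
    private final Listener<T> delegate;
    private final AtomicBoolean notified = new AtomicBoolean(false);

    NotifyOnce(Listener<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onResponse(T response) {
        // compareAndSet makes the "first caller wins" decision atomic across threads.
        if (notified.compareAndSet(false, true)) {
            delegate.onResponse(response);
        }
    }

    @Override
    public void onFailure(Exception e) {
        if (notified.compareAndSet(false, true)) {
            delegate.onFailure(e);
        }
    }
}
```

Dropping the second call hides the bug rather than fixing it, so a guard like this is best paired with an assertion or log line, while the real fix stays in the caller's control flow.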
dengweisysu	dc4224fbdf	Sync translog without lock before trim unreferenced readers (#47790 ) This commit is similar to the optimization made in #45765. With this change, we fsync most of the data of the current generation without holding writeLock when trimming unreferenced readers. Relates #45765	2019-10-09 17:56:30 -04:00
Armin Braun	302e09decf	Simplify some Common ActionRunnable Uses (#47799 ) (#47828 ) Especially in the snapshot code there's a lot of logic chaining `ActionRunnables` in tricky ways now, and the code is getting hard to follow. This change introduces two convenience methods that make it clear that a wrapped listener is invoked with certainty in some trickier spots, and shortens the code a bit.	2019-10-09 23:29:50 +02:00


3996 Commits