OpenSearch

Commit Graph

Author	SHA1	Message	Date
Tim Brooks	c7a7c69b2b	Simplify NioChannel creation and closing process (#25504 ) Currently an NioChannel is created and it is UNREGISTERED. At some point it is registered with a selector. From that point on, the channel can only be closed by the selector. The fact that a channel might not be associated with a selector has significant implications for concurrency and the channel shutdown process. The only thing that is simplified by allowing channels to be in a state independent of a selector is some testing scenarios. This PR modifies channels so that they are given a selector at creation time and are always associated with that selector. Only that selector can close that channel. This simplifies the channel lifecycle and closing intricacies.	2017-07-21 11:55:23 -05:00
Zachary Tong	caef6cc128	[TEST] Move version skip to setup in Indices.GetMapping#70_legacy_multi_type (#25816 ) Since the setup attempts to create an index with two types, and the setup runs before any test, this will fail on versions 6.0+ before it has a chance to check the skip in each individual test. Moving to the setup resolves this issue.	2017-07-21 11:53:48 -04:00
Boaz Leskes	ab1636d547	Engine - do not index operations with seq# lower than the local checkpoint into lucene (#25827 ) When a replica processes out of order operations, it can drop some due to version comparisons. In the past that would have resulted in a VersionConflictException being thrown and the operation was totally ignored. With the seq# push, we started storing these operations in the translog (but not indexing them into lucene) in order to have complete op histories to facilitate ops based recoveries. This in turn had the undesired effect that deleted docs may be resurrected during recovery in some extreme edge situation (see a complete explanation below). This PR contains a simple fix, which is also an optimization for the recovery process, incoming operation that have a seq# lower than the current local checkpoint (i.e., have already been processed) should not be indexed into lucene. Note that sometimes we can also skip storing them in the translog, but this is not required for the fix and is more complicated. This is the equivalent of #25592 ## More details on resurrected ops Consider two operations: - Index d1, seq no 1 - Delete d1, seq no 3 On a replica they come out of order: - Translog gen 1 contains: - delete (seqNo 3) - Translog gen 2 contains: - index (seqNo 1) (wasn't indexed into lucene, but put into the translog) - another operation (seqNo 10) - Translog gen 3 - another op (seqNo 9) - Engine commits with: - local checkpoint 9 - refers to gen 2 If this replica becomes a primary: - Local recovery will replay translog gen 2 and up, causing index #1 to be re-index. - Even if recovery will start at gen 3, the translog retention policy will cause file based recovery to replay the entire translog. If it happens to start at gen 2 (but not 1), we will run into the same problem. #### Some context - out of order delivery involving deletes: On normal operations, this relies on the gc_deletes setting. 
We assume that the setting represents an upper bound on the time between the index and the delete operation. The index operation will be detected as stale based on the tombstone map in the LiveVersionMap. Recovery presents a challenge as it can replay an old index operation that was in the translog and override a delete operation that was done when the engine was opened (and is not part of the replayed snapshot). To deal with this situation, we disable GC deletes (i.e. retain all deletes) for the duration of recoveries. This means that the delete operation will be remembered and the index operation ignored. Both of the above scenarios (local recovery + peer recovery) create a situation where the delete operation is never replayed. It is thus "lost" as lucene doesn't remember it happened and our LiveVersionMap is populated with it. #### Solution: Note that both local and peer recovery represent a scenario where we replay translog ops on top of an existing lucene index, potentially with ongoing indexing. Therefore we can treat them the same. The local checkpoint in Lucene represents a marker indicating that all operations below it were performed on the index. This is the only form of "memory" that we have that relates to deletes. If we can achieve the following: 1) All ops below the local checkpoint are not indexed to lucene. 2) All ops above the local checkpoint are indexed. It will mean that all variants are covered: (i# == index op seq#, d# == delete op seq#, lc == local checkpoint in commit) 1) i# < d# <= lc - document is already deleted in lucene and stays that way. 2) i# <= lc < d# - delete is replayed on index - document is deleted 3) lc < i# < d# - index is replayed and then delete - document is deleted. More formally - we want to make sure that for all ops that were performed on the primary o1 and o2, if o2 is processed on a shard before o1, o1 will be dropped. 
We have the following scenarios 1) If both o1 or o2 are not included in the replayed snapshot and are above it (i.e., have a higher seq#), they fall under the gc deletes assumption. 2) If both o1 is part of the replayed snapshot but o2 is above it: - if o2 arrives first, o1 must arrive due to the recovery and potentially via replication as well. since gc deletes is disabled we are guaranteed to know of o2's existence. 3) If both o2 and o1 are part of the replayed snapshot: - we fall under the same scenarios as #2 - disabling GC deletes ensures we know of o2 if it arrives first. 4) If o1 falls before the snapshot and o2 is either part of the snapshot or higher: - Since the snapshot is guaranteed to contain all ops that are not part of lucene and are above the lc in the commit used, this means that o1 is part of lucene and o1 < local checkpoint. This means it won't be processed and we're not in the scenario we're discussing. 5) If o2 falls before the snapshot but o1 is part of it: - by the same reasoning above, o2 is < local checkpoint. Since o1 < o2, we also get o1 < local checkpoint and this will be dropped. #### Implementation: For local recovery, we can filter the ops we read of the translog and avoid replaying them. For peer recovery this is tricky as we do want to send the operations in order to have some history on the target shard. Filtering operations on the engine level (i.e., not indexing to lucene if op seq# <= lc) would work for both.	2017-07-21 17:19:54 +02:00
Jim Ferenczi	c3784326eb	Refactor field expansion for match, multi_match and query_string query (#25726 ) This commit changes the way we handle field expansion in `match`, `multi_match` and `query_string` query. The main changes are: - For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping. - For partial field names (with `*` suffix), the expansion is done only on `keyword`, `text`, `date`, `ip` and `number` field types. Other field types are simply ignored. - For all fields (`*`), the expansion is done on accepted field types only (see above) and metadata fields are also filtered. - The `*` notation can also be used to set `default_field` option on `query_string` query. This should replace the need for the extra option `use_all_fields` which is deprecated in this change. This commit also rewrites simple `*` query to matchalldocs query when all fields are requested (Fixes #25556). The same change should be done on `simple_query_string` for completeness. `use_all_fields` option in `query_string` is also deprecated in this change, `default_field` should be set to `*` instead. Relates #25551	2017-07-21 16:52:57 +02:00
Boaz Leskes	47f92d7c62	testRejectingJoinWithIncompatibleVersion(WithUnrecoveredState) should use immediate priorities That will prevent race conditions with the join task, causing failures.	2017-07-21 16:43:18 +02:00
Yannick Welsch	49279d26da	Reenable BWC tests after merging #25822 & #25824	2017-07-21 16:36:50 +02:00
Yannick Welsch	a2624dfcef	Move primary term from ReplicationRequest to ConcreteShardRequest (#25822 ) Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries).	2017-07-21 15:57:42 +02:00
Clinton Gormley	4935bce02c	Added a script to change the labels on github issues which match the search (#25828 ) For instance: ./dev-tools/github_relabel.pl --state=open --labels=v5.5.1 --remove=v5.5.1 --add=v5.5.2	2017-07-21 14:41:16 +02:00
Yannick Welsch	d6a8984be6	Make sure shard is not closed when updating local checkpoint If a primary shard is relocated, and then subsequently closed, there is a short window where ReplicationOperation could access the closed shard (engine is not shut down yet) and, because it does not know that the shard was relocated, try to update the local checkpoint, tripping an assertion in GlobalCheckPointTracker that a local checkpoint cannot be updated if it's not in primary mode.	2017-07-21 14:27:39 +02:00
Colin Goodheart-Smithe	f1f1725fcf	[DOCS] improve explanation of dynamic mapping setting (#25829 ) Closes #25825	2017-07-21 12:24:38 +01:00
Simon Willnauer	682abb90ee	[TEST] Rename variable to make it less confusing	2017-07-21 13:02:33 +02:00
Yannick Welsch	fd57101952	Make sure shard is not closed when accessing ReplicationGroup	2017-07-21 11:45:24 +02:00
Clinton Gormley	618ff159eb	Reorganised setup docs into better order	2017-07-21 11:24:46 +02:00
javanna	c4c1e909a3	[TEST] SearchDocumentationIT#testSearch to sort on _uid instead of _id	2017-07-21 11:15:21 +02:00
Adrien Grand	91fe8d5366	Enforce that bash is used when running `gradle run`. Using `sh` means we used whatever default the system has, which is `dash` on Ubuntu, even though our startup script is written for bash (see the shebang).	2017-07-21 10:46:06 +02:00
Jason Tedor	46d75a3552	Fix broken quotes in systemd unit file The quoting for the ExecStart entry is broken as quotes must wrap an entire argument, and arguments are separated by spaces. It turns out that any quoting is unnecessary here, systemd will handle it correctly either way.	2017-07-21 17:04:49 +09:00
Simon Willnauer	0e3ad522a2	Rewrite search requests on the coordinating nodes (#25814 ) This change rewrites search requests on the coordinating node before we send requests to the individual shards. This will reduce the rewrite load and object creation for each rewrite on the executing nodes and will fetch resources only once instead of N times once per shard for queries like `terms` query with index lookups. (among percolator and geo-shape) Relates to #25791	2017-07-21 09:38:38 +02:00
Simon Willnauer	0d0c103451	First increment shard stats before notifying and potentially sending response (#25818 ) When we skip a shard we should first increment the skip and successful shard counters before we notify the super class about a skipped shard which could send back the result before we increment the stats.	2017-07-21 08:46:10 +02:00
Jason Tedor	0310a6a947	Introduce elasticsearch-env This commit introduces the elasticsearch-env script. The purpose of this script is threefold: - vastly simplify the various scripts used in Elasticsearch - provide a script that can be included in other scripts in the Elasticsearch ecosystem (e.g., plugins) - correctly establish the environment for all scripts (e.g., so that users can run `elasticsearch-keystore` from a package distribution without having to worry about setting `CONF_DIR` first, otherwise the keystore would be created in the wrong location) Relates #25815	2017-07-21 09:38:49 +09:00
Ryan Ernst	cfdfa4705e	Bump the min compat version to 5.6.0 (#25805 ) This commit increases the min compat version for 6.0 to 5.6.0. This is already what is being tested by gradle, but the code was out of sync.	2017-07-20 13:02:07 -07:00
Ryan Ernst	8ab0d10387	Add compatibility versions to main action response (#25799 ) This commit adds the min wire/index compat versions to the main action output. Not only will this make the expected compatibility more transparent, but it also allows testing which version others think the compat versions are, similar to how we test the lucene version.	2017-07-20 13:01:41 -07:00
Boaz Leskes	7488877d1a	Validate a joining node's version with version of existing cluster nodes (#25808 ) When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validate that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that: 1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining) 2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join). 3) The node's major version is at least as high as that of the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major from joining (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, they may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way. This restriction only holds if the cluster state has been recovered (i.e., the cluster has properly formed). Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.	2017-07-20 20:11:29 +02:00
Boaz Leskes	de6ad7a704	awaitFix testCorruptTranslogTruncationOfReplica see https://github.com/elastic/elasticsearch/issues/25817	2017-07-20 20:04:42 +02:00
Clinton Gormley	febb4bf7bc	Update removal_of_types.asciidoc Fixed `include_in_type` -> `include_type_name`	2017-07-20 19:18:51 +02:00
Jack Conradson	9f7463e796	remove lang url parameter from stored script requests (#25779 ) Also has updates to ScriptMetaData for allowing the old namespace format to be loaded all the way back through 5.0; however, it will throw an exception if two scripts share the same id but different languages.	2017-07-20 08:51:08 -07:00
Jason Tedor	137ab70d58	Fix elasticsearch-keystore handling of path.conf This commit fixes the elasticsearch-keystore script handling of path.conf; the problem here is that the script is setting a system property that is completely unobserved. Instead, we use the path.conf command line flag. Relates #25811	2017-07-20 23:01:57 +09:00
Jason Tedor	9d8f11dc27	Remove legacy checks for config file settings This commit removes legacy checks for an unsupported environment variable and unsupported system properties. This environment variable and these system properties have not been supported since 1.x so it is safe to stop checking for the existence of these settings. Relates #25809	2017-07-20 22:42:39 +09:00
Simon Willnauer	5e629cfba0	Ensure query resources are fetched asynchronously during rewrite (#25791 ) The `QueryRewriteContext` used to provide a client object that can be used to fetch geo-shapes, terms or documents for percolation. Unfortunately all client calls used to be blocking calls which can have significant impact on the rewrite phase since it occupies an entire search thread until the resource is received. In the case that the index the resource is fetched from isn't on the local node this can have significant impact on query throughput. Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction	2017-07-20 15:37:50 +02:00
Jay Modi	3e4bc027eb	RestClient uses system properties and system default SSLContext (#25757 ) This commit calls the `useSystemProperties` method on the HttpAsyncClientBuilder so that the jvm system properties are used. The primary reason for doing this is to ensure the builder uses the system default SSLContext rather than the default instance created by the http client library. Closes #23231	2017-07-20 07:36:56 -06:00
Jason Tedor	3042b5dc7d	Stop exporting HOSTNAME from scripts Today we explicitly export the HOSTNAME variable from scripts. This is probably a relic from the days when the scripts were not run on bash but instead assume a POSIX-compliant shell only where HOSTNAME is not guaranteed to exist. Yet, bash guarantees that HOSTNAME is set so we do not need to set it in scripts. This commit removes this legacy. Relates #25807	2017-07-20 22:27:47 +09:00
Jason Tedor	67a4288c9a	Remove support for ES_INCLUDE Today we enable users to customize the environment through the use of ES_INCLUDE. This made sense for legacy reasons when we did not have niceties like jvm.options (so dumped JVM options in the default include script) and somewhat duplicates some of the functionality that we will need from a dedicated environment script. This commit removes support for ES_INCLUDE as a first step towards a dedicated include script. Relates #25804	2017-07-20 15:41:59 +09:00
Jason Tedor	9aa42a438b	Unzip quietly while provisioning virtual machines When provisioning the virtual machines used for packaging, we download the Gradle zip archive and unzip. This unzip is noisy, producing a lot of unnecessary output. This commit silences this output. Relates #25803	2017-07-20 12:45:56 +09:00
Boaz Leskes	9989ac69a4	Revert "Validate a joining node's version with version of existing cluster nodes (#25770 )" This reverts commit `1e1f8e6376`.	2017-07-19 17:34:53 +02:00
Simon Willnauer	4d78935df7	Introduce a new Rewriteable interface to streamline rewriting (#25788 ) Today we have duplicated code that is quite complicated to iterate over rewriteable (`QueryBuilders` mainly). This change introduces a `Rewriteable` interface that allows sharing code to do the rewriting as well as encapsulation and composition of queries.	2017-07-19 15:06:49 +02:00
Adrien Grand	d607c3be92	Fix list of unconverted snippets.	2017-07-19 14:57:55 +02:00
Adrien Grand	7a0eeb3978	Fix compilation.	2017-07-19 14:46:30 +02:00
Adrien Grand	55ad318541	Reduce the overhead of timeouts and low-level search cancellation. (#25776 ) Setting a timeout or enforcing low-level search cancellation used to make us wrap the collector and check either the current time or whether the search task was cancelled for every collected document. This can be significant overhead on cheap queries that match many documents. This commit changes the approach to wrap the bulk scorer rather than the collector and exponentially increase the interval between two consecutive checks in order to reduce the overhead of those checks.	2017-07-19 14:15:53 +02:00
Adrien Grand	94a98daa37	Fix parsing of ip range queries. (#25768 ) Closes #25636	2017-07-19 14:12:54 +02:00
Adrien Grand	01f083ca83	Reduce profiling overhead. (#25772 ) Calling `System.nanoTime()` for each method call may have a significant performance impact. Closes #24799	2017-07-19 14:12:14 +02:00
Adrien Grand	f1ff7f2454	Require a field when a `seed` is provided to the `random_score` function. (#25594 ) We currently use fielddata on the `_id` field which is trappy, especially as we do it implicitly. This changes the `random_score` function to use doc ids when no seed is provided and to suggest a field when a seed is provided. For now the change only emits a deprecation warning when no field is supplied but this should be replaced by a strict check on 7.0. Closes #25240	2017-07-19 14:11:15 +02:00
Clinton Gormley	f69decf509	NOCONSOLE -> NOTCONSOLE in removal-of-types	2017-07-19 14:06:04 +02:00
Boaz Leskes	1e1f8e6376	Validate a joining node's version with version of existing cluster nodes (#25770 ) When a node tries to join a cluster, it goes through a validation step to make sure the node is compatible with the cluster. Currently we validate that the node can read the cluster state and that it is compatible with the indexes of the cluster. This PR adds validation that the joining node's version is compatible with the versions of existing nodes. Concretely we check that: 1) The node's min compatible version is higher or equal to any node in the cluster (this prevents a too-new node from joining) 2) The node's version is higher or equal to the min compat version of all cluster nodes (this prevents a too old join where, for example, the master is on 5.6, there's another 6.0 node in the cluster and a 5.4 node tries to join). 3) The node's major version is at least as high as that of the lowest node in the cluster. This is important as we use the minimum version in the cluster to stop executing bwc code for operations that require multiple nodes. If the nodes are already operating in "new cluster mode", we should prevent nodes from the previous major from joining (even if they are wire level compatible). This does mean that if you have a very unlucky partition during the upgrade which partitions all old nodes which are also a minority / data nodes only, they may not be able to re-join the cluster. We feel this edge case risk is well worth the simplification it brings to BWC layers only going one way. Also, the node join validation can now selectively fail specific nodes (previously the entire batch was failed). This is an important preparation for a follow up PR where we plan to have a rejected joining node die with dignity.	2017-07-19 12:57:29 +02:00
Simon Willnauer	9882d2b9d3	Reduce the scope of `QueryRewriteContext` (#25787 ) Today we provide a lot of functionality on the `QueryRewriteContext` that we potentially don't have, i.e., if we rewrite on a coordinating node or when we are percolating. This change moves most of the unnecessary shard level or index level services and dependencies to `QueryShardContext` instead.	2017-07-19 12:30:38 +02:00
Jason Tedor	4b18800df9	Fix handling of invalid error trace parameter If a request contains an invalid error trace parameter, we send an error on the channel. This should immediately abort any additional processing of the request but instead we march on, dispatch the request and subsequently send another message on the channel. The problem here is this means two writes on the channel which leads to the request being released twice, ultimately resulting in an illegal reference count exception. This commit addresses this by performing an early return in the case that the request contained an invalid error trace parameter. Relates #25785	2017-07-19 18:07:11 +09:00
Jason Tedor	82f52b17e1	Remove timed latch await in listeners test This commit removes a timed latch await in a transport client listeners test. The problem with a timed wait here is that on an overloaded machine, the test can fail because the waiting thread was not unlatched quickly enough. This makes the test unnecessarily flaky. Instead, we should wait indefinitely and simply let the test fail by the test timeout if the latch is not counted down for some reason. Closes #25760	2017-07-19 16:51:27 +09:00
Jason Tedor	3d3d99557d	Expand migration note regarding default paths This commit expands on the migration note regarding the removal of default.path.data and default.path.logs to include a note that if users were relying on the defaults (the common case for path.logs) and they carry over their previous elasticsearch.yml configuration file, then they must add explicit values for path.data and path.logs.	2017-07-19 13:40:42 +09:00
Deb Adair	23c810b334	[DOCS] Changes xrefs to cross doc links to enable building GS "mini-docs"	2017-07-18 13:52:38 -07:00
Deb Adair	d9e55179f1	[DOCS] Adding index file for GS "mini book".	2017-07-18 13:44:08 -07:00
Jim Ferenczi	4cd9728f55	[Test] Make sure that QueryPhaseTests#testIndexSortScrollOptimization creates segments that can be early terminated	2017-07-18 19:30:15 +02:00
Christoph Büscher	e24af64de2	Add strict parsing of aggregation ranges (#25769 ) Currently we ignore unknown field names when parsing RangeAggregator.Range and GeoDistanceAggregationBuilder.Range from `range`, `date_range` or `geo_distance` aggregations. This can hide subtle errors in the query. This change makes parsing `ranges` stricter.	2017-07-18 18:31:04 +02:00
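
The engine-level filter described in ab1636d547 (#25827) can be sketched roughly as follows. This is a minimal Python model, not Elasticsearch code: `ToyEngine`, `Op`, `replay` and the document ids are hypothetical names, and the version-map handling for operations above the checkpoint is left out. It only illustrates the rule that an operation whose seq# is at or below the local checkpoint is stored in the translog but not indexed, so the stale index of d1 (seq# 1) cannot resurrect the document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    kind: str      # "index" or "delete"
    doc_id: str
    seq_no: int


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine opened on a commit."""
    local_checkpoint: int                       # checkpoint stored in the commit
    docs: set = field(default_factory=set)      # live documents in the "index"
    translog: list = field(default_factory=list)
    filter_by_checkpoint: bool = True           # False models the old behaviour

    def apply(self, op: Op) -> bool:
        """Record the op in the translog; return True if it was indexed."""
        self.translog.append(op)                # history is always kept
        if self.filter_by_checkpoint and op.seq_no <= self.local_checkpoint:
            return False                        # already processed: skip indexing
        if op.kind == "index":
            self.docs.add(op.doc_id)
        else:
            self.docs.discard(op.doc_id)
        return True


def replay(engine: ToyEngine, ops) -> set:
    """Replay translog operations in the order they are stored."""
    for op in ops:
        engine.apply(op)
    return engine.docs


# Translog generations 2 and up from the commit message: the stale index of d1
# (seq# 1), another op (seq# 10) and another op (seq# 9). Doc ids are made up.
GEN_2_AND_UP = [Op("index", "d1", 1), Op("index", "d2", 10), Op("index", "d3", 9)]

if __name__ == "__main__":
    # The commit has local checkpoint 9: d1 is already deleted, d3 already indexed.
    fixed = ToyEngine(local_checkpoint=9, docs={"d3"})
    print(sorted(replay(fixed, GEN_2_AND_UP)))

    old = ToyEngine(local_checkpoint=9, docs={"d3"}, filter_by_checkpoint=False)
    print(sorted(replay(old, GEN_2_AND_UP)))
```

With the filter enabled d1 stays deleted after the replay; with it disabled the replayed index operation brings d1 back, which is the resurrection the commit describes.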

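The approach in 55ad318541 (#25776), checking for timeout or cancellation between windows of documents whose size grows exponentially instead of on every collected document, can be sketched as below. This is an illustrative Python sketch under stated assumptions: `score_in_windows`, the initial and maximum interval values, and the use of `TimeoutError` are all hypothetical and do not reflect the actual bulk scorer wrapper or its constants.

```python
def score_in_windows(doc_ids, collect, should_stop,
                     initial_interval=16, max_interval=4096):
    """Collect documents window by window, checking `should_stop` only
    between windows. The window size doubles up to `max_interval`.

    Returns the number of checks performed. The interval values are
    assumptions chosen for illustration.
    """
    interval = initial_interval
    pos = 0
    checks = 0
    while pos < len(doc_ids):
        checks += 1
        if should_stop():                       # timeout reached or task cancelled
            raise TimeoutError("search stopped after %d documents" % pos)
        for doc in doc_ids[pos:pos + interval]:
            collect(doc)                        # no per-document check here
        pos += interval
        interval = min(interval * 2, max_interval)
    return checks


if __name__ == "__main__":
    collected = []
    n_checks = score_in_windows(list(range(100_000)), collected.append,
                                should_stop=lambda: False)
    print(len(collected), "documents collected with", n_checks, "checks")
```

The trade-off in this sketch is that a stop request is noticed only at the next window boundary, so a larger maximum interval lowers the overhead but delays the reaction.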


28447 Commits

All Branches