OpenSearch

Author	SHA1	Message	Date
Boaz Leskes	83c8daa5e8	FullClusterRestartIT.testRecovery - add more info on failure	2017-12-05 13:00:42 +01:00
Lee Hinman	d3721c48c3	[TEST] Add AwaitsFix for FullClusterRestartIT.testRecovery. See: #27649	2017-12-04 15:42:43 -07:00
Boaz Leskes	0b50b313d2	RecoveryIT.testRecoveryWithConcurrentIndexing should check for 110 docs in an upgraded cluster. Closes #27650	2017-12-04 18:06:02 +01:00
Boaz Leskes	f58a3d0b96	testRelocationWithConcurrentIndexing: wait for green (on relevant index) and shard initialization to settle down before starting relocation	2017-12-04 13:18:42 +01:00
Boaz Leskes	1a976ea7a4	Cherry-pick tests and seqNo recovery hardening from #27580	2017-12-04 13:15:40 +01:00
Jason Tedor	cd67f6a8d7	Enable GC logs by default. For too long we have been groping around in the dark when faced with GC issues because we rarely have GC logs at our disposal. This commit enables GC logging by default out of the box. Relates #27610	2017-12-03 08:33:21 -05:00
Jason Tedor	0519fa223c	Ensure logging is configured for CLI commands. Any CLI commands that depend on core Elasticsearch might touch classes (directly or indirectly) that depend on logging. If they do this and logging is not configured, Log4j will dump status error messages to the console. As such, we need to ensure that any such CLI command configures logging (with a trivial configuration that dumps log messages to the console). Previously we did this in the base CLI command, but with the refactoring of this class out of core Elasticsearch, we no longer configure logging there (since we did not want this class to depend on settings and logging). However, this meant that for some CLI commands (like the plugin CLI) we were no longer configuring logging. This commit adds base classes between the low-level command and multi-command classes that ensure that logging is configured. Any CLI command that depends on core Elasticsearch should use this infrastructure to ensure logging is configured. There is one exception to this: Elasticsearch itself, because it takes responsibility into its own hands for configuring logging from Elasticsearch settings and log4j2.properties. We preserve this special status. Relates #27523	2017-11-25 11:40:08 -05:00
David Turner	89ba8996c6	Consolidate version numbering semantics (#27397). Fixes to the build system, particularly around BWC testing, and to make future version bumps less painful.	2017-11-23 20:21:53 +00:00
Simon Willnauer	fadbe0de08	Automatically prepare indices for splitting (#27451). Today we require users to prepare their indices for split operations. Yet, we can do this automatically when an index is created, which would make the split feature a much more appealing option since it doesn't have any 3rd party prerequisites anymore. This change automatically sets the number of routing shards such that an index is guaranteed to be able to split once into twice as many shards. The number of routing shards is scaled towards the default shard limit per index such that indices with a smaller number of shards can be split more often than larger ones. For instance, an index with 1 or 2 shards can be split 10x (until it approaches 1024 shards) while an index created with 128 shards can only be split 3x by a factor of 2. Please note this is just a default value and users can still prepare their indices with `index.number_of_routing_shards` for custom splitting. NOTE: this change has an impact on the document distribution since we are changing the hash space. Documents are still uniformly distributed across all shards, but since we are artificially changing the number of buckets in the consistent hashing space, documents might be hashed into different shards compared to previous versions. This is a 7.0-only change.	2017-11-23 09:48:54 +01:00
javanna	3eeccb7791	Update version check for CCS optional remote clusters. Also fixed the remote.info YAML test to clean up the registered remote cluster once the test is completed. Relates to #27182	2017-11-21 16:52:45 +01:00
Luca Cavanna	29450de7b5	Cross Cluster Search: make remote clusters optional (#27182). Today Cross Cluster Search requires at least one node in each remote cluster to be up when the cross cluster search is run. Otherwise the whole search request fails even though some of the data (local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, e.g. in case remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that. This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows skipping certain clusters if they are down when trying to reach them through a cross cluster search request. By default all clusters are mandatory. Scroll requests support this setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile. The search API response now contains a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped: "_clusters" : { "total" : 3, "successful" : 2, "skipped" : 1 } This section won't be part of the response if no clusters have been skipped. The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.	2017-11-21 11:41:47 +01:00
Michael Basnight	cb3e8f4763	Move the CLI into its own subproject (#27114). Projects that depend on the CLI currently depend on core. This should not always be the case. The EnvironmentAwareCommand will remain in :core, but the rest of the CLI components have been moved into their own subproject of :core, :core:cli.	2017-11-18 21:42:57 -06:00
Nhat Nguyen	db688e1a17	Uses TransportMasterNodeAction to update shard snapshot status (#27165). Currently, we are using a plain TransportRequestHandler to post snapshot status messages to the master. However, it doesn't have as robust a retry mechanism as TransportMasterNodeAction. This change migrates from TransportRequestHandler to TransportMasterNodeAction for the new versions and keeps the current implementation for the old versions. Closes #27151	2017-11-17 11:54:44 -05:00
Tanguy Leroux	0b5899c647	[Test] Change Elasticsearch startup timeout to 120s in packaging tests. When the Vagrant box is very, very slow, the Elasticsearch service can take more than 60 sec to start. This commit changes the timeout to 120s. Closes #27372	2017-11-15 11:58:47 +01:00
Clinton Gormley	1caa5c8e32	REST test fixes (#27354). * REST: Rename ingest.processor.grok to ingest.processor_grok * REST: Rename remote.info to cluster.remote_info * REST: Fixed bad YAML comments * REST: Force dummy scripts to be strings, not numbers * REST: Fix bad YAML in search/110_field_collapsing.yml * REST: Adjust percentile tests to work with Perl number handling	2017-11-14 11:14:14 +01:00
Tanguy Leroux	91a23de55e	[Test] Fix POI version in packaging tests. The POI version was not updated in packaging tests in #25003. Closes #27340	2017-11-13 14:20:10 +01:00
Martijn van Groningen	4f43fe70cb	test: Sort hits by _id instead of _doc and clean up tests by removing unneeded parameters and settings.	2017-11-10 12:11:51 +01:00
Martijn van Groningen	b4048b4e7f	Use CoveringQuery to select percolate candidate matches and extract all clauses from a conjunction query. When clauses from a conjunction are extracted, the number of clauses is also stored in an internal doc values field (minimum_should_match field). This field is used by the CoveringQuery and allows the percolator to reduce the number of false positives when selecting candidate matches and in certain cases be absolutely sure that a conjunction candidate match will match and then skip MemoryIndex validation. This can greatly improve performance. Before this change only a single clause was extracted from a conjunction query. The percolator tried to extract the clause that was rarest (based on term length) in order to have fewer candidate queries selected in the first place. However, with this method there is still a very high chance that candidate query matches are false positives. This change also removes the influencing query extraction added via #26081 as this is no longer needed because now all conjunction clauses are extracted. https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction Closes #26307	2017-11-10 07:44:42 +01:00
Yannick Welsch	e04e5ab037	Increase logging on qa:mixed-cluster tests. Hopefully this helps to figure out why the nodes have trouble starting up.	2017-11-09 15:18:53 +01:00
Jason Tedor	d5451b2037	Die with dignity while merging. If an out-of-memory error is thrown while merging, today we quietly rewrap it into a merge exception and the out-of-memory error is lost. Instead, we need to rethrow out-of-memory errors, and in fact any fatal error here, and let those go uncaught so that the node is torn down. This commit causes this to be the case. Relates #27265	2017-11-06 17:55:11 -05:00
Jason Tedor	766d29e7cf	Correctly encode warning headers. The warning headers have a fairly limited set of valid characters (cf. quoted-text in RFC 7230). While we have assertions that we adhere to this set of valid characters, ensuring that our warning messages do not violate the specification, we were neglecting the possibility that arbitrary user input would trickle into these warning headers. Thus, missing here were tests for these situations and encoding of characters that appear outside the set of valid characters. This commit addresses this by encoding any characters in a deprecation message that are not from the set of valid characters. Relates #27269	2017-11-06 13:20:30 -05:00
David Roberts	749c3ec716	Remove the single argument Environment constructor (#27235). Only tests should use the single argument Environment constructor. To enforce this, the single arg Environment constructor has been replaced with a test framework factory method. Production code (beyond initial Bootstrap) should always use the same Environment object that Node.getEnvironment() returns. This Environment is also available via dependency injection.	2017-11-04 13:25:09 +00:00
Martijn van Groningen	9e67cca987	build: Fix setting the incorrect bwc version in mixed cluster qa module. Prior to this change, if the `bwcTest` task was run, it would create a task for each version, but each task in reality would use wireCompatVersions - 1 ES version. So we were not actually testing against 5.6.x versions in the 6.x and 6.0 branches.	2017-11-03 14:18:27 +01:00
Jason Tedor	8b4a92fbb7	Adjust assertions for sequence numbers BWC tests. This commit adjusts the assertions for the sequence number BWC tests to account for the fact that sometimes these tests are run in mixed clusters with 5.6 nodes (that do not understand sequence numbers), and sometimes these tests are run in mixed clusters with 6.0+ nodes (that all understand sequence numbers). Relates #27251	2017-11-03 08:58:05 -04:00
Jason Tedor	77f87732ef	Adjust .DS_Store test assertions on Windows. Windows handles an attempt to read a file that does not exist, because a component of the path is not a directory, differently than other operating systems do. This commit adjusts these assertions for Windows.	2017-10-25 22:36:53 -04:00
Jason Tedor	6722b9c4a2	Ignore .DS_Store files on macOS. Finder creates these files if you browse a directory there. These files are really annoying, but it's an incredible pain for users that these files are created unbeknownst to them, and then they get in the way of Elasticsearch starting. This commit adds leniency on macOS only to skip these files. Relates #27108	2017-10-25 11:25:29 -04:00
Simon Willnauer	8dda827ff4	Don't refresh on `_flush`, `_force_merge` and `_upgrade` (#27000). Today all these API calls have a side effect of making documents visible to search requests. While this is sometimes desired, it's an unnecessary side effect, and now that we have an internal (engine-private) index reader (#26972) we artificially add a refresh call for bwc. This change removes this side effect in 7.0.	2017-10-16 10:16:35 +02:00
Anton Pozhidaev	cee9640c20	Update by Query is modified to accept short `script` parameter (#26841). Closes issue #24898	2017-10-11 21:57:46 +00:00
kel	2e36f19051	Add support for parsing inline script (#23824) (#26846). * Add support for parsing inline script (#23824) * Fix test	2017-10-11 09:15:37 -07:00
Martijn van Groningen	19dc629e6d	Test query builder bwc against previously supported versions instead of just the current version. Relates to #25456	2017-10-09 13:22:01 +02:00
Yannick Welsch	a4436195f8	Set minimum_master_nodes on rolling-upgrade test (#26911). The rolling-upgrade test was only writing the "minimum_master_nodes" setting to the configuration file of the old nodes, but not the upgraded ones. Also changes the value of "minimum_master_nodes" from "number_of_nodes" to "(number_of_nodes / 2) + 1".	2017-10-09 10:45:03 +02:00
Simon Willnauer	cdd7c1e6c2	Return List instead of an array from settings (#26903). Today we return a `String[]` that requires copying values for every access. Yet, we already store the setting as a list, so we can return the unmodifiable list directly. This makes list / array access in settings a much cheaper operation, especially if lists are large.	2017-10-09 09:52:08 +02:00
Nhat	bf4c3642b2	Remove _primary and _replica shard preferences (#26791). The shard preferences _primary, _replica and their variants were useful for asynchronous replication. However, with the current implementation, they are no longer useful and should be removed. Closes #26335	2017-10-08 11:03:06 -04:00
Boaz Leskes	c342cdeab5	Set up debug logging for qa.full-cluster-restart	2017-10-07 23:37:09 +02:00
Boaz Leskes	2d409a912f	full-cluster-restart tests: prevent shards from going inactive. FullClusterRestartIT.testRecovery relies on the translogs not being flushed	2017-10-05 10:08:10 +02:00
Boaz Leskes	2a04118e88	Promote common rest test utility methods to ESRestTestCase. We have duplicates in some classes and I was about to create one more.	2017-10-05 10:08:10 +02:00
Luca Cavanna	9b9cb81c41	Fix serialization errors when cross cluster search goes to a single shard (#26881). The single shard optimization that we have in our search API changes the type of response returned by the query transport action name based on the shard search request. If the request goes to one shard, we will do query and fetch at the same time, hence the response will be different. The proxying layer used in cross cluster search was not aware of this distinction, which causes serialization issues every time a cross cluster search request goes to a single shard and goes through a gateway node which has to forward the shard request to a data node. The coordinating node would then expect a QueryFetchSearchResult while the gateway would return a QuerySearchResult. Closes #26833	2017-10-04 22:39:14 +02:00
Simon Willnauer	d1533e2397	Remove Settings#getAsMap() (#26845). Since `#getAsMap` exposes the internal representation, we are trying to remove it step by step. This commit cleans up some xcontent writing as well as usage in tests	2017-10-04 01:21:38 -06:00
Boaz Leskes	4f8131026e	RecoveryIT.testHistoryUUIDIsGenerated should reduce unassigned shards delay instead of ensure green. The ensure green approach to avoid allocation delays caused problems with other indices created by other tests which didn't use ensure green in the various cluster stages. This aligns testHistoryUUIDIsGenerated to use the same approach used by the other test.	2017-09-30 16:48:23 +02:00
Boaz Leskes	5df77a8c91	enable debug logging for testHistoryUUIDIsGenerated (+1 squashed commit) Squashed commits: [1d4f268] enable debug logging for testHistoryUUIDIsGenerated	2017-09-26 14:49:47 +02:00
Jay Modi	b8cd82e5c2	Increase time to wait for green in rolling upgrade tests (#26781). This commit increases the amount of time to wait for green to account for unassigned shards that have been delayed. The default delay is 60s, so we need to wait longer than that. Previously, the wait would time out at 30s due to the rest client and the default for the cluster health API. Closes #26742	2017-09-25 12:39:33 -06:00
Boaz Leskes	cd2a4372b4	RecoveryIT should wait for green when in mixed cluster to avoid unassigned shards. The test starts with two old nodes and creates indices (without waiting for green, which is fixed here too). Then it restarts one of the nodes and waits for it to join the cluster. This wait condition only uses wait for yellow, as our generic infra doesn't know how many nodes there are in total. Once the restarted node is part of the cluster (mixed mode), the second old node is restarted. If indices are not fully allocated when that happens, the shards will go into delayed unassigned mode. If the recovery of the replica never completed, we may end up with a corrupted or missing secondary copy on the node. This will cause the shards to be delayed for 1m before being reassigned and the test will time out.	2017-09-24 22:38:20 +02:00
Boaz Leskes	2b6f75730e	RecoveryIT up client time out to 40s to see response in a 30s time	2017-09-24 21:33:20 +02:00
Jason Tedor	2e63a13c0a	Upgrade to Log4j 2.9.1. This commit upgrades the Log4j dependency, picking up a fix for an issue with handling stack traces on JDK 9. Relates #26750	2017-09-22 11:57:06 -04:00
Jason Tedor	f35d1de502	Introduce global checkpoint background sync. It is the exciting return of the global checkpoint background sync. Long, long ago, in a snapshot version far, far away we had and only had a global checkpoint background sync. This sync would fire periodically and send the global checkpoint from the primary shard to the replicas so that they could update their local knowledge of the global checkpoint. Later in time, as we sped ahead towards finalizing the initial version of sequence IDs, we realized that we needed the global checkpoint updates to be inline. This means that on a replication operation, the primary shard would piggyback the global checkpoint with the replication operation to the replicas. The replicas would update their local knowledge of the global checkpoint and reply with their local checkpoint. However, this could allow the global checkpoint on the primary to advance again and the replicas would fall behind in their local knowledge of the global checkpoint. If another replication operation never fired, then the replicas would be permanently behind. To account for this, we added one more sync that would fire when the primary shard fell idle. However, this has problems: - the shard idle timer defaults to five minutes, a long time to wait for the replicas to learn of the new global checkpoint - if a replica missed the sync, there was no follow-up sync to catch them up - there is an inherent race condition where the primary shard could fall idle mid-operation (after having sent the replication request to the replicas); in this case, there would never be a background sync after the operation completes - tying the global checkpoint sync to the idle timer was never natural. To fix this, we add two additional changes for the global checkpoint to be synced to the replicas. The first is that we add a post-operation sync that only fires if there are no operations in flight and there is a lagging replica. This gives us a chance to sync the global checkpoint to the replicas immediately after an operation so that they are always kept up to date. The second is that we add back a global checkpoint background sync that fires on a timer. This timer fires every thirty seconds, and is not configurable (for simplicity). This background sync is smarter than what we had previously in the sense that it only sends a sync if the global checkpoint on at least one replica is lagging that of the primary. When the timer fires, we can compare the global checkpoint on the primary to its knowledge of the global checkpoint on the replicas and only send a sync if there is a shard behind. Relates #26591	2017-09-21 15:34:13 -04:00
Christoph Büscher	86b00b84bc	Remove parse field deprecations in query builders (#26711). The `fielddata` field and the use of the `_name` field in the short syntax of the range query have been deprecated in 5.0 and can be removed. The same goes for the deprecated `score_mode` field in HasParentQueryBuilder, the deprecated `like_text`, `ids` and `docs` parameters in the `more_like_this` query, the deprecated query name in the short version of the `regexp` query, and several deprecated alternative field names in other query builders.	2017-09-20 16:22:21 +02:00
Yannick Welsch	ff1e26276d	Deguice ActionFilter (#26691). Allows instantiating TransportAction instances without Guice.	2017-09-20 10:30:21 +02:00
Boaz Leskes	04385a9ce9	Restoring from snapshot should force generation of a new history uuid (#26694). Restoring a shard from snapshot throws the primary back in time, violating assumptions and calling the validity of global checkpoints into question. To avoid problems, we should make sure that a shard that was restored will never be the source of an ops based recovery to a shard that existed before the restore. To this end we have introduced the notion of `history_uuid` in #26577 and required that both source and target have the same history to allow ops based recoveries. This PR makes sure that a shard gets a new uuid after restore. As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery will only generate a uuid if it doesn't already exist (we can make this stricter when we don't need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different than the approach in #26557; I went this way as it means that shards always have a history uuid after being recovered on a 6.x node and will also mean that a rolling restart is enough for old indices to step over to the new seq no model). Local shards and snapshot force the generation of a new translog uuid. Relates #10708 Closes #26544	2017-09-19 15:58:36 +02:00
Michael Basnight	f385e0cf26	Add bad_request to the rest-api-spec catch params (#26539). This adds another request to the catch params. It also makes sure that the generic request param does not allow 400 either.	2017-09-14 14:24:03 -05:00
Boaz Leskes	1ca0b5e9e4	Introduce a History UUID as a requirement for ops based recovery (#26577). The new ops based recovery, introduced as part of #10708, is based on the assumption that all operations below the global checkpoint known to the replica do not need to be synced with the primary. This is based on the guarantee that all ops below it are available on the primary and they are equal. Under normal operations this guarantee holds. Sadly, it can be violated when a primary is restored from an old snapshot. At that point the restored primary can miss operations below the replica's global checkpoint, or even worse may have totally different operations at the same spot. This PR introduces the notion of a history uuid to be able to capture the difference with the restored primary (in a follow-up PR). The History UUID is generated by a primary when it is first created and is synced to the replicas which are recovered via a file based recovery. The PR adds a requirement to ops based recovery to make sure that the history uuid of the source and the target are equal. Under normal operations, all shard copies will stay with that history uuid for the rest of the index lifetime and thus this is a noop. However, it gives us a place to guarantee we fall back to file based syncing in special events like a restore from snapshot (to be done as a follow-up) and when someone calls the truncate translog command which can go wrong when combined with primary recovery (this is done in this PR). We considered in the past using the translog uuid for this function (i.e., syncing it across copies) and thus avoiding adding an extra identifier. This idea was rejected as it removes the ability to verify that a specific translog really belongs to a specific Lucene index. We also feel that having a history uuid will serve us well in the future.	2017-09-14 21:25:02 +03:00
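
Commit fadbe0de08 describes how a default number of routing shards is chosen. The Python sketch below models that rule under two assumptions that are ours, not the commit's: the limit is exactly 1024 shards, and the default is the largest `num_shards * 2^k` that does not exceed it, with at least one split always allowed. The function names are hypothetical and this is not the Elasticsearch implementation.

```python
def ceil_log2(n: int) -> int:
    """Smallest k such that 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()


def default_routing_shards(num_shards: int, max_shards: int = 1024) -> int:
    """Hypothetical default for index.number_of_routing_shards.

    Scales num_shards by a power of two towards max_shards, while
    guaranteeing that the index can be split at least once (x2).
    """
    if num_shards < 1:
        raise ValueError("an index needs at least one shard")
    num_splits = ceil_log2(max_shards) - ceil_log2(num_shards)
    num_splits = max(1, num_splits)  # always allow at least one split
    return num_shards << num_splits


def possible_splits(num_shards: int) -> int:
    """How many times the index can be split by a factor of 2."""
    return ceil_log2(default_routing_shards(num_shards) // num_shards)
```

Under this assumed rule a 1-shard index gets 1024 routing shards (10 doublings) and a 128-shard index also gets 1024 (3 doublings), consistent with the figures in the message; a 2-shard index gets 9 doublings, so "1 or 2 shards ... 10x" reads as an approximation.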
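
The optional-remote-cluster behaviour in commit 29450de7b5 can be modelled as a selection step that runs before the search fans out. This is a minimal Python sketch, assuming each cluster alias maps to its skip flag (the message calls the setting `skip_if_disconnected` in one place and `skip_unavailable` in another; the sketch does not depend on the name) and that the local cluster is simply an entry that is always connected. `resolve_clusters`, `ClustersSection` and `RemoteClusterUnavailableError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


class RemoteClusterUnavailableError(Exception):
    """Raised when a cluster that may not be skipped cannot be reached."""


@dataclass
class ClustersSection:
    """Shape of the `_clusters` section quoted in the commit message."""

    total: int
    successful: int
    skipped: int

    def to_dict(self) -> Dict[str, int]:
        return {"total": self.total, "successful": self.successful, "skipped": self.skipped}


def resolve_clusters(
    may_skip: Dict[str, bool], connected: Set[str]
) -> Tuple[List[str], Optional[ClustersSection]]:
    """Return the clusters to search and the `_clusters` section (None if nothing was skipped)."""
    participating: List[str] = []
    skipped = 0
    for alias, skippable in may_skip.items():
        if alias in connected:
            participating.append(alias)
        elif skippable:
            skipped += 1  # optional cluster is down: skip it and keep going
        else:
            raise RemoteClusterUnavailableError(alias)  # mandatory cluster is down: fail
    if skipped == 0:
        return participating, None  # section is omitted from the response
    total = len(may_skip)
    return participating, ClustersSection(total, total - skipped, skipped)
```

For example, `resolve_clusters({"local": False, "eu": False, "us": True}, {"local", "eu"})` skips `us` and produces a section equal to the one quoted in the message.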
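
Commit 766d29e7cf encodes characters that fall outside the valid set for warning headers. The message does not spell out the encoding scheme, so the Python sketch below assumes percent-encoding of the UTF-8 bytes of each disallowed character, treats only the ASCII part of `qdtext` from RFC 7230's quoted-string grammar as safe, and also encodes `%` itself so the output stays unambiguous. `encode_warning_text` is a hypothetical name.

```python
# ASCII part of qdtext (RFC 7230): HTAB, SP, "!", %x23-5B, %x5D-7E.
# '"' (0x22) and '\' (0x5C) are excluded; obs-text (%x80-FF) is deliberately
# not treated as safe here, so every non-ASCII character gets encoded.
_QDTEXT = {"\t", " ", "!"} | {chr(c) for c in range(0x23, 0x5C)} | {chr(c) for c in range(0x5D, 0x7F)}

# '%' is the escape character of this encoding, so it must be encoded as well;
# otherwise a literal "%22" in the input could not be told apart from an encoded quote.
_SAFE = frozenset(_QDTEXT - {"%"})


def encode_warning_text(message: str) -> str:
    """Percent-encode (as UTF-8) every character that is not safe in quoted text."""
    if all(ch in _SAFE for ch in message):
        return message  # fast path: nothing to encode
    parts = []
    for ch in message:
        if ch in _SAFE:
            parts.append(ch)
        else:
            parts.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(parts)
```

Plain deprecation messages pass through unchanged; only user-supplied characters such as quotes, backslashes, control characters or non-ASCII text are rewritten.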
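
Commit a4436195f8 changes `minimum_master_nodes` from `number_of_nodes` to `(number_of_nodes / 2) + 1`. The short Python check below, assuming integer division, illustrates why a strict majority is the usual choice: two disjoint groups of nodes can never both reach it, and once there are at least three nodes, losing one still leaves a quorum. The function names are hypothetical.

```python
def minimum_master_nodes(number_of_nodes: int) -> int:
    """Strict majority of master-eligible nodes: (number_of_nodes / 2) + 1."""
    if number_of_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return number_of_nodes // 2 + 1


def split_brain_possible(number_of_nodes: int, quorum: int) -> bool:
    """True if the nodes can be split into two disjoint groups that both reach quorum."""
    return any(
        k >= quorum and number_of_nodes - k >= quorum
        for k in range(number_of_nodes + 1)
    )


def survives_one_node_loss(number_of_nodes: int, quorum: int) -> bool:
    """True if the remaining nodes can still elect a master after one node leaves."""
    return number_of_nodes - 1 >= quorum
```

In a rolling upgrade, where nodes are restarted one at a time, the old value (`quorum == number_of_nodes`) means any single restart leaves the remaining nodes unable to elect a master, whereas the majority value does not.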
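
The two sync triggers described in commit f35d1de502 (a post-operation sync and a periodic background sync, both sent only when some replica lags) can be sketched as a small state model. This Python sketch assumes the primary tracks, per replica, the global checkpoint it believes that replica knows; the class and method names are hypothetical, and the 30-second scheduling and the transport layer are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PrimaryCheckpointState:
    """Hypothetical view a primary shard has of the global checkpoint."""

    global_checkpoint: int
    # What the primary believes each replica knows about the global checkpoint.
    replica_known_checkpoints: Dict[str, int] = field(default_factory=dict)
    operations_in_flight: int = 0

    def lagging_replicas(self) -> List[str]:
        """Replicas whose known global checkpoint is behind the primary's."""
        return [
            replica
            for replica, known in self.replica_known_checkpoints.items()
            if known < self.global_checkpoint
        ]

    def should_sync_after_operation(self) -> bool:
        """Post-operation trigger: no operations in flight and some replica lags."""
        return self.operations_in_flight == 0 and bool(self.lagging_replicas())

    def should_sync_on_timer(self) -> bool:
        """Periodic trigger: only send a sync if at least one replica lags."""
        return bool(self.lagging_replicas())

    def on_sync_acknowledged(self) -> None:
        """After every replica acknowledges a sync, they all know the primary's value."""
        for replica in self.replica_known_checkpoints:
            self.replica_known_checkpoints[replica] = self.global_checkpoint
```

Keeping the "is anyone behind?" check in one place means the timer can fire often without generating traffic on an up-to-date shard.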
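
Commits 1ca0b5e9e4 and 04385a9ce9 together make ops based recovery conditional on a shared history UUID and force a new one after a snapshot restore. The Python sketch below models both rules. `RecoveryKind` is a hypothetical stand-in for the recovery kinds mentioned in the messages (store, peer, local shards, snapshot); treating `LOCAL_SHARDS` like `SNAPSHOT` is an assumption, since the message says local shards and snapshot force a new *translog* uuid, and the function names are illustrative only.

```python
import uuid
from enum import Enum
from typing import Optional


class RecoveryKind(Enum):
    """Hypothetical stand-in for the kinds of recovery named in the messages."""

    STORE = "store"
    PEER = "peer"
    LOCAL_SHARDS = "local_shards"
    SNAPSHOT = "snapshot"


# Assumption: both of these start a new history (see the lead-in).
_FORCES_NEW_HISTORY = {RecoveryKind.SNAPSHOT, RecoveryKind.LOCAL_SHARDS}


def history_uuid_after_recovery(kind: RecoveryKind, existing: Optional[str]) -> str:
    """Keep an existing history uuid unless the recovery rewrites history."""
    if kind in _FORCES_NEW_HISTORY or existing is None:
        return uuid.uuid4().hex  # fresh history: ops from older copies no longer apply
    return existing


def ops_based_recovery_allowed(source_uuid: Optional[str], target_uuid: Optional[str]) -> bool:
    """Ops-based recovery needs a shared history; otherwise fall back to file-based recovery."""
    return source_uuid is not None and source_uuid == target_uuid
```

A restored primary therefore never shares a history with a replica that existed before the restore, so `ops_based_recovery_allowed` returns False and the replica is recovered from files.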

828 Commits