OpenSearch

Commit Graph

Author	SHA1	Message	Date
javanna	43a1e1c353	[TEST] create client nodes using node.client: true instead of node.data: false and node.master: false Create client nodes using `node.client: true` instead of `node.data: false` and `node.master: false`. We should create client nodes in our test infra using the `node.client:true` settings as that is the one that users use, and the one that we use as well in `ClientNodePredicate` thus we end up not finding client nodes otherwise as they weren't created with the proper setting. Updated also the `DataNodePredicate` so that `client: true` is enough, no need for `data: false` as well. Closes #7911	2014-09-29 15:24:17 +02:00
Lee Hinman	ab9cc336e5	[TESTS] Additional logging for `testThreadedUpdatesToChildBreakerWithParentLimit`	2014-09-29 15:06:36 +02:00
Boaz Leskes	9b4bf4379a	Test: testNodeNotReachableFromMaster had a typo when choosing a non master node	2014-09-29 11:38:39 +02:00
Alex Ksikes	5014158d6b	MLT Query: use minimum should match more extensive syntax The minimum number of optional should clauses of the generated query to match can now be set using the more extensive minimum should match syntax. This deprecates the `percent_terms_to_match` parameter in favor of a new `minimum_should_match` parameter. Closes #7898	2014-09-29 11:14:56 +02:00
Boaz Leskes	03d880de38	Discovery: master fault detection fall back to cluster state thread upon error With #7834, we simplified ZenDiscovery by making it use the current cluster state for all its decisions. This had the side effect that a node may start its Master FD before the master has fully processed the cluster state update that adds that node (or elects the master). This is due to the fact that master FD is started when a node receives a cluster state from the master, but the master itself may still be publishing to other nodes. This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress. Closes #7908	2014-09-29 11:12:11 +02:00
Lee Hinman	168b3752ef	Refactor the Translog.read(Location) method It was only used by `readSource`, it has been changed to return a Translog.Operation, which can have .getSource() called on it to return the source. `readSource` has been removed. This also removes the checked IOException, any exception thrown is unexpected and should throw a runtime exception. Moves the ReleasableBytesStreamOutput allocation into the body of the try-catch block so the lock can be released in the event of an exception during allocation.	2014-09-29 10:13:45 +02:00
mikemccand	6bf635039c	Core: upgrade to Lucene 4.10.1	2014-09-28 13:42:12 -04:00
mikemccand	9e8c51b70d	fix concurrency bug in index throttling	2014-09-28 12:30:48 -04:00
Boaz Leskes	b70f0d5eef	Internal: MulticastChannel should wait on receiver thread to stop during shutdown This was signaled by our tests which shutdown class and check for thread leakage. Closes #7835	2014-09-27 14:23:07 +02:00
Martijn van Groningen	71adb3ada2	If a node is being shutdown some in flight ping request may be executed. Make sure to keep track of those ping requests and close the unicast connect executor service. Closes #7903	2014-09-27 00:05:15 +02:00
javanna	e85e07941d	Internal: split internal fetch request used within scroll and search Similar to #7856 but relates to the fetch shard level requests. We currently use the same internal request when we need to fetch within search and scroll. The two original requests though diverged after #6933 as SearchRequest implements IndicesRequest while SearchScrollRequest doesn't. That said, with #7319 we made `FetchSearchRequest` implement IndicesRequest by making it hold the original indices taken from the original request, which are null if the fetch was originated by a search scroll, and that is why original indices are optional there. This commit introduces a separate fetch request and transport action for scroll, which doesn't hold original indices. The new action is only used against nodes that expose it, the previous action name will be used for nodes older than 1.4.0.Beta1. As a result, in 1.4 we have a new `indices:data/read/search[phase/fetch/id/scroll]` action that is equivalent to the previous `indices:data/read/search[phase/fetch/id]` whose request implements now IndicesRequest and holds the original indices coming from the original request. The original indices in the latter request can only be null during a rolling upgrade (already existing version checks make sure that serialization is bw compatible), when some nodes are still < 1.4. Closes #7870	2014-09-26 18:24:53 +02:00
Britta Weber	bac1da25f6	node shutdown: make close() synchronized An example scenario where this will help: When the node is shutdown via api call (https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/test/ExternalNode.java#L219 ) then the call returns immediately even if the node is not actually shutdown yet (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/admin/cluster/node/shutdown/TransportNodesShutdownAction.java#L226). If at the same time the process is killed, then the hook that would usually prevent uncontrolled shutdown (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/bootstrap/Bootstrap.java#L75) has no effect: It again calls close() which might then just return for example because one of the lifecycles was moved to closed already. The bwc test FunctionScoreBackwardCompatibilityTests.testSimpleFunctionScoreParsingWorks failed because of this. The translog was not properly written because if the shutdown was called via api, the following process.destroy() (https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/test/ExternalNode.java#L225) killed the node before the translog was written to disk. closes #7885	2014-09-26 12:46:18 +02:00
Boaz Leskes	36c3e896de	NodesFD: simplify concurrency control to fully rely on a single map The node fault detection class is used by the master node to ping the nodes in the cluster and verify they are alive. This PR simplifies the concurrency controls in the class + adds a test for a scenario that surfaced the problem. Closes #7889	2014-09-26 11:21:55 +02:00
Boaz Leskes	db54e9c2d5	Discovery: remove any local state and use clusterService.state instead At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant as the same information is stored in the cluster state. Having a duplicate copy can lead to unneeded concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState. The PR introduces a tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start. Lastly, we add a couple of safety checks to make sure that if a node receives a cluster state from a new master while actively trying to join another one (or electing itself) we go back to pinging to actively join it. Closes #7834	2014-09-26 11:21:55 +02:00
Britta Weber	eb9d39f611	[TEST] wait for yellow else assertSearchResponse will trip	2014-09-26 11:13:12 +02:00
Britta Weber	75d2a84772	[TEST] wait for yellow else assertSearchResponse will trip	2014-09-26 10:52:44 +02:00
Michael McCandless	e207189037	Tests: turn off CheckIndex for now (it's buggy: there is a race w/ deletion of all files in the data dirs)	2014-09-26 04:44:11 -04:00
Michael McCandless	87e9aba2ac	disable CheckIndex for these no-ack tests	2014-09-26 04:08:03 -04:00
Britta Weber	526b464025	field name lookup: return List instead of Set for names matching a pattern The returned sets are only used for iterating. Therefore we might as well return a list since this guarantees order. This is the same effect as in https://github.com/elasticsearch/elasticsearch/pull/7698 The test SimpleIndexQueryParserTests#testQueryStringFieldsMatch failed on openjdk 1.7.0_65 with <jdk.map.althashing.threshold>0</jdk.map.althashing.threshold> closes #7709	2014-09-26 09:59:12 +02:00
Britta Weber	7feb742a9b	script with _score: remove dependency of DocLookup and scorer As pointed out in #7487 DocLookup is a variable that is accessible by all scripts for one doc while the query is executed. But the _score and therefore the scorer depends on the current context, that is, which part of query is currently executed. Instead of setting the scorer for DocLookup and have Script access the DocLookup for getting the score, the Scorer should just be explicitly set for each script. DocLookup should not have any reference to a scorer. This was similarly discussed in #7043. This dependency caused a stackoverflow when running script score in combination with an aggregation on _score. Also the wrong scorer was called when nesting several script scores. closes #7487 closes #7819	2014-09-26 09:59:12 +02:00
Igor Motov	9c9cd01854	Fix NumberFormatException in Simple Query String Query Incorrect usage of XContentParser.hasTextCharacters() can result in NumberFormatException as well as other possible issues in template query parser and phrase suggest parsers. Fixes #7875	2014-09-26 10:49:05 +04:00
Michael McCandless	3db50b2ebf	don't CheckIndex for this test case	2014-09-25 18:21:12 -04:00
Michael McCandless	637c6d1606	Tests: always run Lucene's CheckIndex when shards are closed in tests and fail the test if corruption is detected Today we only run 10% of the time, and the test doesn't fail when corruption is detected. I think it's better to always run and fail the test, so we can catch any possible resiliency bugs in Lucene/Elasticsearch causing corruption. For known tests that create corrupted indices, it's easy to set MockFSDirectoryService.CHECK_INDEX_ON_CLOSE to false... Closes #7730	2014-09-25 16:50:48 -04:00
markharwood	e97b8fd217	Aggs - support for arrays of numeric values in include/exclude clauses Closes #7714	2014-09-25 11:02:29 +01:00
Simon Willnauer	a90d7b1670	[TRANSPORT] never send requests after transport service is stopped With local transport or any transport that doesn't necessarily send notification if connections are closed we might miss a node disconnection and the request handler hangs forever / until the timeout kicks in. This window only exists during shutdown and is likely unproblematic in practice but tests might run into this problem when local transport is used.	2014-09-25 11:51:06 +02:00
Shay Banon	a82d486bda	Add a listener thread pool Today, when executing an action (mainly when using the Java API), a listener threaded flag can be set to true in order to execute the listener on a different thread pool. Today, this thread pool is the generic thread pool, which is cached. This can create problems for Java clients (mainly) around potential thread explosion. Introduce a new thread pool called listener, that is fixed sized and defaults to half the cores maxed at 10, and use it where listeners are executed. relates to #5152 closes #7837	2014-09-25 11:25:13 +02:00
Simon Willnauer	4bd37d7ee6	[TEST] Reenable threadleak filters with 5 sec. lingering	2014-09-25 10:48:13 +02:00
Simon Willnauer	a236b80392	[CORE] Add ThreadPool.terminate to streamline shutdown Shutting down threadpools and executor services is done in very similar fashion across the codebase. This commit streamlines the process by adding a terminate method to ThreadPool.	2014-09-25 10:48:12 +02:00
Alex Ksikes	51bf3e6730	MLT Query: fix percent_terms_to_match The parameter `percent_terms_to_match` (percentage of terms that must match in the generated query) was wrongly set to the top level boolean query. This would lead to zero or all results type of situations. This commit ensures that the parameter is indeed applied to the query of generated terms. Closes #7754	2014-09-25 09:56:53 +02:00
Michael McCandless	5e9e2cf50c	Core: try again to upgrade to Lucene 4.10.1-snapshot	2014-09-24 13:48:49 -04:00
Michael McCandless	ab3be76644	Revert Lucene upgrade	2014-09-24 13:25:55 -04:00
Michael McCandless	15c75b1967	Core: upgrade to Lucene 4.10.1 snapshot Lucene will soon release official 4.10.1, but by upgrading sooner we can 1) sidestep the false failures due to the 1.8.0_20 JVM hotspot bug (has caused a number of false failures in recent Jenkins tests), 2) make sure none of the Lucene changes in 4.10.1 are problematic. Closes #7844	2014-09-24 13:13:07 -04:00
Boaz Leskes	0f121ff351	Test: ClusterServiceTests.testLocalNodeMasterListenerCallbacks - increase ping timeout was 200ms, now 400ms	2014-09-24 17:26:40 +02:00
javanna	17b1fd1a6a	Internal: split internal free context request used after scroll and search We currently use the same internal request when we need to free the search context after a search and a scroll. The two original requests though diverged after #6933 as `SearchRequest` implements `IndicesRequest` while `SearchScrollRequest` and `ClearScrollRequest` don't. That said, with #7319 we made `SearchFreeContextRequest` implement `IndicesRequest` by making it hold the original indices taken from the original request, which are null if the free context was originated by a scroll or by a clear scroll call, and that is why original indices are optional there. This commit introduces a separate free context request and transport action for scroll, which doesn't hold original indices. The new action is only used against nodes that expose it, the previous action name will be used for nodes older than 1.4.0.Beta1. As a result, in 1.4 we have a new `indices:data/read/search[free_context/scroll]` action that is equivalent to the previous `indices:data/read/search[free_context]` whose request implements now `IndicesRequest` and holds the original indices coming from the original request. The original indices in the latter requests can only be null during a rolling upgrade (already existing version checks make sure that serialization is bw compatible), when some nodes are still < 1.4. Closes #7856	2014-09-24 15:33:27 +02:00
Simon Willnauer	ea49a3e269	Update version flags after backporting to 1.3.3 Relates to #7857	2014-09-24 15:23:57 +02:00
Colin Goodheart-Smithe	f37815a53b	Aggregations: Significant Terms Heuristics now registered correctly Closes #7840	2014-09-24 11:48:26 +01:00
Simon Willnauer	ca86e1c824	[TEST] Disable thread filter for now	2014-09-23 14:01:22 +02:00
Simon Willnauer	45319dc27f	[TEST] use more terminate calls and wait for termination	2014-09-23 13:58:47 +02:00
Simon Willnauer	30acba624d	[TEST] Add a more restrictive thread leaks filter Today all threads are allowed to leak a suite. This is tricky since it essentially allows resource leaks by default where for instance test private TransportClients will never get closed and consume resources influencing other tests. It also hides threads that are not fully under Elasticsearch's control like the Lucene TimeLimitingCollector thread. This commit restricts the threads that can leak a suite to the threads spawned from testclusters and fixes several places that leaked threads. Closes #7833	2014-09-23 13:36:21 +02:00
Simon Willnauer	18212ba09c	[TEST] Make sure test actually throttles	2014-09-23 12:53:47 +02:00
Simon Willnauer	68c5206e50	[TEST] disable translog based flushes when corrupting files These tests rely on the fact that all files stay the same after the corruption and if we run into a translog based flush we might use a new / different delete file causing the test to fail.	2014-09-23 12:30:06 +02:00
Shay Banon	d4d77cdb66	Chunk direct buffer usage by networking layer Today, due to how netty works (both on http layer and transport layer), and even though the buffers sent over to netty are paged (CompositeChannelBuffer), it ends up re-copying the whole buffer into another heap buffer (bad), and then send it over directly to sun.nio which allocates a full thread local direct buffer to send it (which can be repeated if not all message is sent). This is problematic for very large messages, aside from the extra heap temporal usage, the large direct buffers will stay around and not released by the JVM. This change forces the use of gathering when building a CompositeChannelBuffer, which results in netty using the sun.nio write method that accepts an array of ByteBuffer (so no extra heap copying), and also reduces the amount of direct memory allocated for large messages. See the doc on NettyUtils#DEFAULT_GATHERING for more info. closes #7811	2014-09-23 12:15:19 +02:00
Simon Willnauer	6c8aa5fa6c	[RECOVERY] Mark last file chunk to fail fast if payload is truncated Today we rely on the metadata length of the file we are recovering to indicate when the last chunk was received. Yet, this might hide bugs on the compression layer if payloads are truncated. We should indicate if the last chunk is sent to make sure we validate checksums accordingly if possible. Closes #7830	2014-09-23 11:32:18 +02:00
Simon Willnauer	5533495171	[TEST] Ensure primaries are allocated before bulk indexing with dynamic mappings	2014-09-23 08:34:25 +02:00
Shay Banon	f6a0fe5c2f	Apply bulk change to 1.3.3 relates to #7729	2014-09-22 15:13:35 +02:00
Boaz Leskes	b1851906d8	Tests: extend testRecoverFromPreviousVersion to sometimes index during relocation Relates to #7729 Closes #7768	2014-09-22 11:19:38 +02:00
Boaz Leskes	4677d05048	Recovery: mapping check during phase2 should be done in cluster state update task Before phase2 we verify that the local mapping is in sync with the cluster state mapping (and send & wait on a master update mapping task if not). This check should be done under a cluster state update task to make sure an incoming cluster state update does not change things while we check. Closes #7744	2014-09-22 11:05:00 +02:00
Boaz Leskes	d17fd26f23	Test: RecoveryWhileUnderLoadTests.recoverWhileRelocating should report cluster state when failing to reach green	2014-09-21 20:16:45 +02:00
Boaz Leskes	41fd5d02f4	Discovery: Give a unique id to each ping response During discovery a node gossips with other nodes to discover the current state of the cluster - what nodes are out there, what version they use and most importantly whether there is an active master out there. During this ping process we may end up in a situation where old information is mixed with new. This is common if a couple of master elections happen in rapid succession. This commit adds a monotonically increasing id to each ping response. This makes it easy to always select the last ping from every node. Closes #7769	2014-09-20 12:58:15 +02:00
Martijn van Groningen	afcbffbfc1	Core: Check if from + size don't cause overflow and fail with a better error. Closes #7778	2014-09-20 12:34:48 +02:00

5170 Commits