Commit Graph

5281 Commits

Author SHA1 Message Date
Boaz Leskes 12cbb3223a Discovery: node join requests should be handled at lower priority than master election
When a node is elected as master or receives a join request, we submit a cluster state update task. We should give the node join update task a lower priority than the elect-as-master task to increase the chance it will not be rejected. During master election there is a big chance that these will happen concurrently.

This commit lowers the priority of node joins from IMMEDIATE to URGENT

Closes #7733
2014-09-16 11:41:59 +02:00
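The effect of the priority change above can be sketched with a small priority queue. This is a simplified model, not the project's actual executor: the `TaskQueue` class and the task names are hypothetical, and only the two priority levels named in the commit message are modelled.

```python
import heapq
from enum import IntEnum
from itertools import count


class Priority(IntEnum):
    # Lower value runs first; only the two levels from the commit message.
    IMMEDIATE = 0
    URGENT = 1


class TaskQueue:
    """Hypothetical stand-in for a prioritized cluster state update queue."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def drain(self):
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order


queue = TaskQueue()
# Join requests arrive first but are submitted at the lower URGENT priority.
queue.submit(Priority.URGENT, "node-join(n2)")
queue.submit(Priority.URGENT, "node-join(n3)")
queue.submit(Priority.IMMEDIATE, "elected-as-master")
execution_order = queue.drain()
```

In this model the election task runs before the pending joins even though it was submitted last, which is the ordering the commit aims to make more likely.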
Simon Willnauer ec28d7c465 [STORE] Fold two hashFile implementations into one 2014-09-16 11:01:49 +02:00
Simon Willnauer 723a40ef34 [VERSION] s/V_1_4_0_Beta/V_1_4_0_Beta1/g 2014-09-16 10:54:41 +02:00
Simon Willnauer a7dde8dd80 [TEST] Make flush in #indexRandom optional
Some tests like CorruptedTranslogTests rely on the fact that we
are recovering from translog. In those cases we need to prevent
flushes from happening during indexing. This change adds an optional
flag on the #indexRandom utility to disable flushes.
2014-09-16 10:43:28 +02:00
javanna 38f5aa2248 [TEST] Fixed ActionNamesTests to not use random action names that conflict with existing ones
ActionNamesTests#testIncomingAction rarely uses a random action name to make sure that actions registered via plugins work properly. In some cases the random action would conflict with an existing one (e.g. tv) and make the test fail. Also fixed testOutgoingAction, although the probability of conflict there is far lower due to the longer action names used from 1.4 on.
2014-09-16 10:23:02 +02:00
Simon Willnauer fbf2c3f9f7 [TEST] Use node names for transport clients and close them in tests 2014-09-16 10:08:25 +02:00
Boaz Leskes 7253646664 Discovery: not all master election related cluster state update tasks use Priority.IMMEDIATE
Most notably the elected_as_master task should run as soon as possible. This is an issue as node join requests do use `Priority.IMMEDIATE` and can be unjustly rejected.

Closes #7718
2014-09-15 21:26:01 +02:00
Boaz Leskes db13eead54 Internal: ClusterHealthAPI does not respect waitForEvents when local flag is set
It uses a cluster state update task and it gets rejected if not run on a master node. We should enable running on non-masters if the local flag is set.

Also, report any unexpected error that may happen during this cluster state update task

Closes #7731
2014-09-15 21:02:23 +02:00
Alexander Reelsen ec86808fa9 Netty: Make sure channel closing never happens on i/o thread
Similar to NettyTransport.doStop() all actions which disconnect
from a node (and thus call awaitUninterruptibly) should not be executed
on the I/O thread.

This patch ensures that all disconnects happen in the generic threadpool, trying to avoid unnecessary `disconnectFromNode` calls.

Also added a missing return statement in case the component was not yet
started when catching an exception on the netty layer.

Closes #7726
2014-09-15 17:54:15 +02:00
Boaz Leskes 2250f58757 Discovery: UnicastZenPing - use temporary node ids if can't resolve node by its address
The Unicast Zen Ping mechanism is configured to ping certain host:port combinations in order to discover other nodes. Since this is only a ping, we do not set up a full connection but rather do a light connect with one channel. This light connection is closed at the end of the pinging.

During pinging, we may discover disco nodes which are not yet connected (via temporalResponses). UnicastZenPing will set up the same light connection for those nodes. However, during pinging a cluster state may arrive with those nodes in it. In that case, we will mistakenly believe those nodes are connected and at the end of pinging we will mistakenly disconnect those valid nodes.

This commit makes sure that all nodes UnicastZenPing connects to have a unique id and can be safely disconnected.

Closes #7719
2014-09-15 16:47:39 +02:00
mikemccand 7ca64237a8 Test: always run CheckIndex for these two tests 2014-09-15 09:58:14 -04:00
Simon Willnauer 509f71cd55 [TEST] Log if we use transport client in trace mode 2014-09-15 15:42:34 +02:00
Boaz Leskes d228606bab Recovery: remove unneeded waits on recovery cancellation
When cancelling recoveries, we wait for up to 10s for the source node to be notified before continuing. This is not needed in two cases:
1) The source node has been disconnected due to node shutdown (recovery is canceled as a response to cluster state processing)
2) The current thread is the one that will be notifying the source node (happens when one of the calls from the source nodes discovers the local index is closed)

The first one is especially important as it may delay cluster state update processing by 10s.

Closes #7717
2014-09-15 15:30:54 +02:00
javanna ede39edbba [TEST] Minor REST tests infra cleanup
Make the http addresses within the REST client final. It makes no sense to update them before each test if we don't check the version of the nodes again, which would mean adding too much overhead (an additional http call before each test) for no reason. We just reuse the same nodes for the whole suite and check the version once while initializing the client. Would be nice to make the REST client within the execution context final but its initialization still needs to happen after the `ElasticsearchIntegrationTest#beforeInternal` that assigns `GLOBAL_CLUSTER` to `currentCluster`.

Closes #7723
2014-09-15 15:27:14 +02:00
uboness b619cd1112 Added scrollId/s setters to the different scroll requests/responses 2014-09-15 13:56:48 +03:00
Lee Hinman 964db64ed1 Only set `breaker` when stats are retrieved
When communicating with 1.3 and earlier nodes, it's possible that the
field data breaker info is not sent at all. When this happens, we should
leave the `breaker` variable as-is (unset) instead of creating an
AllCircuitBreakerStats object with a null fd breaker and fake request &
parent breakers.
2014-09-15 12:29:57 +02:00
Colin Goodheart-Smithe d4e83df3b8 Aggregations: Adds ability to sort on multiple criteria
The terms aggregation can now support sorting on multiple criteria by replacing the sort object with an array of sort objects whose order signifies the priority of the sort. The existing syntax for sorting on a single criterion also still works.

Contributes to #6917
Replaces #7588
2014-09-15 11:08:29 +01:00
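The idea of an ordered list of sort criteria, with earlier entries taking priority, can be sketched as below. This is an illustrative model only: the bucket fields `key` and `doc_count` and the `sort_buckets` helper are hypothetical and do not reproduce the aggregation's request syntax.

```python
def sort_buckets(buckets, criteria):
    """Sort buckets by a list of (field, direction) pairs.

    The first pair has the highest priority. Because Python's sort is
    stable, applying the criteria from last to first yields that order.
    """
    result = list(buckets)
    for field, direction in reversed(criteria):
        result.sort(key=lambda b: b[field], reverse=(direction == "desc"))
    return result


buckets = [
    {"key": "b", "doc_count": 5},
    {"key": "a", "doc_count": 5},
    {"key": "c", "doc_count": 9},
]

# Primary criterion: document count, descending. Tie-breaker: key, ascending.
ordered = sort_buckets(buckets, [("doc_count", "desc"), ("key", "asc")])
ordered_keys = [b["key"] for b in ordered]
```

A single-element criteria list behaves like the original single-criterion sort, which mirrors the statement that the existing syntax still works.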
Britta Weber be7c75c745 function score: fix cast in Gaussian decay function
Also fix the test
FunctionScoreTests#simpleWeightedFunctionsTestWithRandomWeightsAndRandomCombineMode
which sometimes failed due to rounding issues. Make sure
only floats are returned as scores to assure ratio of
expected and returned score is 1.0f.
2014-09-15 11:46:20 +02:00
Boaz Leskes 3142fec206 Test: ZenUnicastDiscoveryTests.testNormalClusterForming should start unicast hosts first
The test starts a cluster with random nodes as unicast hosts but *doesn't* use min_master_nodes. If the unicast hosts are started last, nodes may elect themselves as master as they do not yet have a mechanism to share information.
2014-09-15 11:23:12 +02:00
javanna 8cf922bf9e Internal: make sure that original headers are used when executing search as part of put warmer
Closes #7711
2014-09-15 09:32:17 +02:00
javanna ec5ceecb97 [TEST] Expose ability to provide http headers when sending requests in our REST tests
ElasticsearchRestTests now has a `restClientSettings` method that can be overridden to provide headers as settings (similarly to what we do with transport client). Those headers will be sent together with every REST request within the tests.

Closes #7710
2014-09-15 09:30:45 +02:00
Boaz Leskes f96bfd3773 Tests: added trace action.search.type to GeoBoundsTests 2014-09-12 20:16:42 +02:00
Britta Weber 5a8ebab96e [TEST] Fix test explain now that explanation is fixed 2014-09-12 18:12:34 +02:00
Philipp Jardas 5e0f67b516 Fixed explanation for GaussDecayFunction
The explanation now gives the correct value instead of the negative.
2014-09-12 18:12:34 +02:00
Martijn van Groningen 91144fc92f Parent/child: If a p/c query is wrapped in a query filter then CustomQueryWrappingFilter must always be used and any filter wrapping the query filter must never be cached.
Closes #7685
2014-09-12 17:22:07 +02:00
markharwood 3c8f8cc090 Aggs enhancement - allow Include/Exclude clauses to use array of terms as alternative to a regex
Closes #6782
2014-09-12 15:28:03 +01:00
Lee Hinman 3e589cd25b [TEST] Additional logging info for node with primary 2014-09-12 15:42:41 +02:00
Colin Goodheart-Smithe 722ff1f56e [TEST] added trace logging for index recovery in GeoBoundsTests 2014-09-12 13:23:45 +01:00
Boaz Leskes 1002ff2f15 Discovery: restore preference to latest unicast pings describing the same node
Closes #7702
2014-09-12 14:02:43 +02:00
Colin Goodheart-Smithe f8d75faaad Geo: Fixes BoundingBox across complete longitudinal range
Adds a special case to the GeoBoundingBoxFilterParser so that the left of the box is not normalised in the case where left and right are 360 apart.  Before this change the left would be normalised to 180 in this case and the filter would only match points with a longitude of 180 (or -180).

Closes #5128
2014-09-12 10:09:02 +01:00
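The collapse described above can be reproduced with a small longitude model. This is a sketch under stated assumptions: `normalize_lon` wraps into (-180, 180] so that -180 becomes 180 as the message describes, and `fixed_box` returns the canonical full range for the special case, which is a simplification chosen for this example rather than the parser's actual code. Boxes that cross the date line are not handled.

```python
def normalize_lon(lon):
    """Wrap a longitude into the half-open range (-180, 180]."""
    lon = lon % 360.0  # now in [0, 360)
    if lon > 180.0:
        lon -= 360.0
    return lon


def naive_box(left, right):
    """Behaviour described as the bug: both edges are always normalised."""
    return normalize_lon(left), normalize_lon(right)


def fixed_box(left, right):
    """Special-case a box whose edges are exactly 360 degrees apart."""
    if right - left == 360.0:
        # Assumption of this sketch: use the canonical full range instead
        # of normalising the left edge onto the right one.
        return -180.0, 180.0
    return normalize_lon(left), normalize_lon(right)


def matches(box, lon):
    """Simple containment test; ignores boxes crossing the date line."""
    left, right = box
    return left <= lon <= right


collapsed = naive_box(-180.0, 180.0)  # both edges end up at 180.0
full = fixed_box(-180.0, 180.0)
```

With the naive version only a longitude of exactly 180 matches, while the special-cased version matches every longitude.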
Colin Goodheart-Smithe 2837800500 [TEST] Adds tests for GeoUtils 2014-09-12 09:43:25 +01:00
Boaz Leskes 1bd2a491d1 Tests: add a comment to DiscoveryWithServiceDisruptions.testAckedIndexing reminding to port it to 1.x once the awaitFix is removed 2014-09-12 10:35:18 +02:00
Boaz Leskes 5b461454c2 Tests: add an awaitFix to IndicesLifecycleListenerTests 2014-09-12 10:24:55 +02:00
Simon Willnauer a3f2677b70 [CORE] Ensure GroupShardsIterator is consistent across requests
GroupShardsIterator is used in many places like the search execution
to determine which shards to query. This can hold shards of one index
as well as shards of multiple indices. The iteration order is used
to assign a per-request shard ID for each shard that is used as a
tie-breaker when scores are the same. Today the iteration order is
solely depending on the HashMap iteration order which is undefined or
rather implementation dependent. This causes search requests to return
inconsistent results across requests if for instance different nodes
are coordinating the requests.

Simple queries like `match_all` may return results in arbitrary order
if pagination is used or may even return different results for the same
request even though there hasn't been a refresh call and preferences are
used.
2014-09-12 07:33:07 +02:00
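Why an undefined iteration order leads to inconsistent tie-breaking can be sketched as follows. This is a toy model, not the project's implementation: the `assign_request_shard_ids` helper and the `(index, shard)` keys are hypothetical, and two differently ordered dictionaries stand in for the map order seen on two coordinating nodes.

```python
def assign_request_shard_ids(shard_groups, deterministic):
    """Give each shard group a per-request id based on iteration order."""
    keys = list(shard_groups)
    if deterministic:
        keys.sort()  # a fixed ordering, independent of map internals
    return {key: request_id for request_id, key in enumerate(keys)}


# The same shard groups, seen in a different order on two coordinating nodes.
node_a = {("logs", 1): "copy", ("logs", 0): "copy", ("users", 0): "copy"}
node_b = {("users", 0): "copy", ("logs", 0): "copy", ("logs", 1): "copy"}

unordered_a = assign_request_shard_ids(node_a, deterministic=False)
unordered_b = assign_request_shard_ids(node_b, deterministic=False)
ordered_a = assign_request_shard_ids(node_a, deterministic=True)
ordered_b = assign_request_shard_ids(node_b, deterministic=True)
```

Once the groups are sorted before ids are assigned, both nodes give every shard the same tie-breaker id, so equal-scoring hits come back in the same order.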
Simon Willnauer 929a4a54f7 [VERSION] Added Version [1.5.0] 2014-09-11 22:38:03 +02:00
Simon Willnauer b9ee915763 [Version] Add Version 1.4.0-Beta 2014-09-11 22:10:19 +02:00
Simon Willnauer a0e9951e8a [STORE] Turn unexpected exception into CorruptedIndexException
Today if we run into exception like NumberFormatException or IAE
when we try to open a commit point to retrieve checksums and calculate
store metadata we just bubble them up. Yet, those are very likely index
corruptions. In such a case we should really mark the shard as
corrupted.
2014-09-11 21:15:02 +02:00
Britta Weber 9b5497f6ca [TEST] fix another rounding issue 2014-09-11 19:49:51 +02:00
Simon Willnauer b0a377bae8 [TEST] Use a sorted set since sets are compared and compare is order specific 2014-09-11 17:42:37 +02:00
javanna 7e0481d906 More Like This API: remove unused search_query_hint parameter
Closes #7691
2014-09-11 17:34:54 +02:00
Martijn van Groningen d0300b3f59 Aggregations top_hits: Fixed inconsistent sorting of the hits
In the reduce logic of the top_hits aggregation, if the first shard result to process contained no results then the merging of all the shard results can go wrong, resulting in incorrectly sorted hits.
This bug can only manifest with a sort other than score.

Closes #7697
2014-09-11 17:26:22 +02:00
Simon Willnauer 3ef6860679 [STORE] Improve exception from Store.failIfCorrupted
If you have previously corrupted files, this method currently builds an
exception like:
```
    failed engine [corrupted preexisting index]
    failed to start shard
```

Followed by a CorruptIndexException. This commit writes the entire
stacktrace to provide additional information. It also changes the
failure message from `corrupted preexisting index` to `preexisting
corrupted index` to prevent confusion.

Closes #7596
2014-09-11 17:11:48 +02:00
Simon Willnauer 595472014e [TEST] Use a real unique clustername for InternalTestClusterTests 2014-09-11 16:20:51 +02:00
Martijn van Groningen f8e93fa2aa Test: Let both types have a non_analyzed id field. 2014-09-11 15:20:28 +02:00
javanna fd6798df69 [TEST] parse response body as json depending on the content-type in our REST tests 2014-09-11 14:32:59 +02:00
javanna 4ab268bab2 Internal: refactor copy headers mechanism to not require a client factory
With #7594 we replaced the static `BaseRestHandler#addUsefulHeaders` by introducing the `RestClientFactory` that can be injected and used to register the relevant headers. To simplify things, we can now register relevant headers through the `RestController` and remove the `RestClientFactory` that was just introduced.

Closes #7675
2014-09-11 13:18:08 +02:00
Colin Goodheart-Smithe 8720a4dcd2 [TEST] added exception for GET index API to bwc tests 2014-09-11 11:59:17 +01:00
Colin Goodheart-Smithe 5fe782b784 Indices API: Added GET Index API
Returns information about settings, aliases, warmers, and mappings. Basically returns the IndexMetadata. This new endpoint replaces the /{index}/_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers and /_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers endpoints whilst maintaining the same response formats.  The only exception to this is on the /_alias|_aliases|_warmer|_warmers endpoint which will now return a section for 'aliases' or 'warmers' even if no aliases or warmers exist. This backwards compatibility change is documented in the reference docs.

Closes #4069
2014-09-11 11:19:21 +01:00
Boaz Leskes a50934ea3e Resiliency: Master election should demote nodes which try to join the cluster for the first time
With the change in #7493, we introduced a pinging round when a master node goes down. That pinging round helps validate the current state of the cluster and takes, by default, 3 seconds. It may be that during that window, a new node tries to join the cluster and starts pinging (this is typical when you quickly restart the current master). If this node gets elected as the new master it will force recovery from the gateway (it has no in-memory cluster state), which in turn will cause a full cluster shard synchronisation. While this is not a problem on its own, it's a shame. This commit demotes "new" nodes during master election so they will only be elected if really needed.

Closes #7558
2014-09-11 11:19:10 +02:00
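The demotion rule can be sketched as a sort key that ranks nodes without a cluster state behind all others. This is a hypothetical model: the `has_cluster_state` flag, the node ids, and the fallback ordering by id are assumptions made for illustration, not the election service's actual fields.

```python
def elect_master(candidates):
    """Pick a master, preferring nodes that already hold a cluster state.

    False sorts before True, so established nodes win; the node id is a
    tie-breaker assumed for this sketch.
    """
    return min(candidates, key=lambda n: (not n["has_cluster_state"], n["id"]))


candidates = [
    {"id": "node-a", "has_cluster_state": False},  # freshly restarted node
    {"id": "node-b", "has_cluster_state": True},
    {"id": "node-c", "has_cluster_state": True},
]

elected = elect_master(candidates)

# A "new" node is still electable when no established node is available.
only_new = [{"id": "node-a", "has_cluster_state": False}]
fallback = elect_master(only_new)
```

The new node loses to any established node even though its id sorts first, and it is only elected when it is the sole candidate.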
Simon Willnauer 8618d3624a Add inline comment to prevent confusion 2014-09-11 11:07:23 +02:00
Boaz Leskes 6849fd0378 Translog: remove unused stream
Closes #7683
2014-09-11 10:00:06 +02:00
Adrien Grand ccb3d21781 Bulk UDP: Removal.
This feature is rarely used. Removing it will help reduce the moving parts
of Elasticsearch and focus on the core.

Close #7595
2014-09-11 09:52:09 +02:00
Adrien Grand 8bafb5fc8e Core: Use FixedBitSetFilterCache for delete-by-query.
Leftover from #7037.
Close #7581
2014-09-11 09:35:25 +02:00
Martijn van Groningen 144af9c910 Test: Fields in percolator query must exist before percolating 2014-09-10 19:01:55 +02:00
Colin Goodheart-Smithe dc30bb0ea7 [TEST] Added logging of response to aid debugging 2014-09-10 17:24:22 +01:00
Colin Goodheart-Smithe 76082182aa [TEST] added more information to fail message for a test to debug test failure 2014-09-10 17:24:22 +01:00
Martijn van Groningen dbfac659f9 Aggregation top_hits: Move sort resolution to the reduce method, so it is always guaranteed to be invoked. 2014-09-10 17:51:07 +02:00
Boaz Leskes bb1dfbfa42 missing logging brackets 2014-09-10 16:07:15 +02:00
Boaz Leskes 7d80db7c2c Gateway: added trace logging to translog recovery logic
Enabled it in SimpleRecoveryLocalGatewayTests
2014-09-10 16:02:44 +02:00
Boaz Leskes 4f8ddd97bf [Rest] reroute API response didn't filter metadata
By default the reroute API should return the new cluster state, excluding the metadata. It was, however, wrongly using an old parameter (filter_metadata) and thus failed to do so. This commit restores the behaviour by wiring it to the correct `metric` parameter. We also add an enum representing the possible metrics, to avoid similar future mistakes.

Closes #7520
Closes #7523
2014-09-10 14:48:06 +02:00
Martijn van Groningen ab555e0a33 Test: Added more assertions 2014-09-10 14:45:46 +02:00
Boaz Leskes 09db2e2a27 Tests: log when an ensure green/yellow comes back
Also added some trace logging to SimpleRecoveryLocalGatewayTests
2014-09-10 14:42:35 +02:00
Boaz Leskes fc89316b1b Store: improved trace logging for shard active requests 2014-09-10 12:35:09 +02:00
Boaz Leskes 18f58682e7 Tests: add logging to LocalGatewayIndexStateTests 2014-09-10 12:29:00 +02:00
javanna 5bea31cb96 Internal: refactor copy headers mechanism
The functionality of copying headers in the REST layer (from REST requests to transport requests) remains the same. Made it a bit nicer by introducing a RestClientFactory component that is a singleton and allows to register useful headers without requiring static methods.

Plugins just have to inject the RestClientFactory now, and call its `addRelevantHeaders` method that is not static anymore.

Relates to #6513
Closes #7594
2014-09-10 12:05:53 +02:00
Martijn van Groningen 6f763bded9 Test: Enabled ChildrenTests#testParentWithMultipleBuckets with more logging 2014-09-10 11:19:30 +02:00
Simon Willnauer cb839b56b2 [ThreadPool] Use DirectExecutor instead of deprecated API
Guava deprecated MoreExecutors#sameThreadExecutor in favour of
a more efficient implementation. We should move over to the new impl.
2014-09-10 09:36:30 +02:00
Ryan Ernst 4638d23484 Tests: Re-enable BWC analysis test, skipping strings that could cause issue with LUCENE-5927. 2014-09-09 14:30:31 -07:00
Simon Willnauer 38d88b2e2c [TEST] Added basic test for InternalTestCluster reproducibility 2014-09-09 21:48:42 +02:00
Britta Weber 1138a6ae60 [TEST] fix rounding issue 2014-09-09 19:44:47 +02:00
Simon Willnauer b0cf929637 [TEST] use local random instance rather than thread local version
This influences reproducibility dramatically since it modifies the
test random sequence while it should just use the private random
instance.
2014-09-09 18:31:41 +02:00
javanna ee3cbce118 Internal: make sure that the request context is always copied from REST to transport layer
Also renamed HeadersCopyClientTests, since it tests a class that was renamed, and removed randomization around the client wrapper: the client now needs to be wrapped all the time to copy the context, and this no longer depends on useful headers having been registered.

Relates to #7610
2014-09-09 18:06:46 +02:00
Colin Goodheart-Smithe c8a3460f39 [TEST] more debugging for GeoBoundsTests 2014-09-09 16:49:06 +01:00
Lee Hinman c328397eb6 [TEST] Disable translog flush in CorruptedTranslogTests 2014-09-09 17:48:37 +02:00
Simon Willnauer 32201c762b [TEST] Ensure BWC test run even if node.mode=local is set
Today we throw an error if local transport is configured with BWC tests.
Yet, the BWC tests need network to be enabled so the test can just set the
required defaults.

Closes #7660
2014-09-09 17:30:43 +02:00
Lee Hinman fc0ee42c07 Fix ordering of Regex.simpleMatch() parameters
Previously we incorrectly sent them in the wrong order, which can cause
validators not to be run for dynamic settings that have been added
matching a particular wildcard.

Fixes #7651
2014-09-09 16:23:55 +02:00
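The consequence of swapping pattern and value in a wildcard match can be sketched with Python's `fnmatch`. This is an analogy, not the project's matcher: `simple_match` is a hypothetical wrapper, the setting name is illustrative, and `fnmatchcase` also treats `?` and `[` specially, which the values used here avoid.

```python
from fnmatch import fnmatchcase


def simple_match(pattern, value):
    """Return True if value matches the wildcard pattern (pattern first)."""
    # Note that fnmatchcase itself takes (name, pattern), the reverse order.
    return fnmatchcase(value, pattern)


def validators_for(setting, validators):
    """Collect validators whose wildcard pattern covers the setting name."""
    return [v for pattern, v in validators.items() if simple_match(pattern, setting)]


# Hypothetical registry of validators keyed by wildcard pattern.
validators = {"cluster.routing.allocation.*": "allocation-validator"}
setting = "cluster.routing.allocation.enable"

correct = simple_match("cluster.routing.allocation.*", setting)
swapped = simple_match(setting, "cluster.routing.allocation.*")
found = validators_for(setting, validators)
```

With the arguments swapped, the concrete setting name is treated as a pattern without wildcards, so nothing matches and the validator is silently skipped.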
Lee Hinman 57bcd65ca4 Simplify translog interface
get rid of readSource() entirely, it sucked, Operations should be able
to provide the source themselves.

No more TranslogStream headers, you are now required to pass a
StreamInput or StreamOutput for all operations, which means no extra
state is needed and no need to construct new versions when detecting the
version.

Read and write translog op sizes in TranslogStreams

Previously we handled these integers outside of the translog stream
itself, which was very unclean because other code had to know about
reading the size, or about writing the correct header sometimes.

There is some additional code in LocalIndexShardGateway to handle the
legacy case for older translogs, because we need to read and discard the
size in order to maintain the compatibility for the streaming
operations (they did not read or write the size for 1.3.x and earlier).

Additionally, we need to handle a case where the header is truncated
when recovering from disk

Use a NoopStreamOutput instead of byte arrays

Instead of writing translog operations to a temporary byte array and
then writing that byte array to the stream, we now write the operation
twice, once to a No-op stream to get the size, then again to the real
stream.

This trades a little more CPU usage for less memory usage.
2014-09-09 15:26:13 +02:00
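The write-twice approach described above can be sketched with a stream that only counts bytes. This is a minimal model under stated assumptions: the `NoopCountingStream` class, the operation layout (one type byte, a length, a UTF-8 payload) and the 4-byte size prefix are all hypothetical and do not reflect the real translog format.

```python
import io
import struct


class NoopCountingStream:
    """Discards everything written to it and only records the byte count."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)


def write_operation(out, op):
    """Serialize one operation (hypothetical layout) to any writable stream."""
    payload = op["source"].encode("utf-8")
    out.write(struct.pack(">B", op["type"]))
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


def write_with_size(out, op):
    """Write the size prefix first, without buffering the operation."""
    counter = NoopCountingStream()
    write_operation(counter, op)                 # first pass: measure only
    out.write(struct.pack(">I", counter.count))  # size prefix
    write_operation(out, op)                     # second pass: real write


buffer = io.BytesIO()
write_with_size(buffer, {"type": 1, "source": '{"a":1}'})
data = buffer.getvalue()
recorded_size = struct.unpack(">I", data[:4])[0]
```

The operation is serialized twice, which costs extra CPU, but no temporary byte array of the full operation is ever held in memory.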
Martijn van Groningen 52f1ab6e16 Core: Added the `index.query.parse.allow_unmapped_fields` setting to fail queries if they refer to unmapped fields.
The percolator and filters in aliases by default enforce strict query parsing.

Closes #7335
2014-09-09 15:00:47 +02:00
Alexander Reelsen bd0eb32d9c CORS: Disable by default
In order to deliver a more secure out-of-the-box configuration this commit
disables cross-origin resource sharing by default.

Closes #7151
2014-09-09 11:09:03 +02:00
Ryan Ernst 789c0a9a1b Quiet BWC analysis test case for now.
See https://issues.apache.org/jira/browse/LUCENE-5927.
2014-09-08 14:23:11 -07:00
Simon Willnauer 2619911e17 [STORE] Write Snapshots directly to the blobstore stream
Today we serialize the snapshot metadata to a byte array and then copy
the byte array to a stream. Instead this commit moves the serialization
directly to the target stream without the intermediate representation.

Closes #7637
2014-09-08 22:19:24 +02:00
Colin Goodheart-Smithe b127b52fd3 Revert "Aggregations: Adds ability to sort on multiple criteria"
This reverts commit bfedd11ffa.
2014-09-08 20:27:19 +01:00
Colin Goodheart-Smithe 13d01af940 Revert "[TEST] added @AwaitsFix to failing StringTermsTests while I work on a fix"
This reverts commit 18a713a2ae.
2014-09-08 20:27:16 +01:00
Simon Willnauer ce2e65f6e7 [TEST] Don't print BWC test path - it's different on every machine 2014-09-08 21:10:09 +02:00
Boaz Leskes 9054ce5569 [Stats] update action returns before updating stats for `NONE` operations
We keep around a noop stat indicating how many update operations ended up not updating the document (typically because it didn't change). However, the TransportUpdateAction updates that counter only after returning the result. This can throw off stats checks which are done immediately after, potentially causing test failures.

Closes #7639
2014-09-08 20:46:07 +02:00
Simon Willnauer 72c4cb51cc [CORE] Unify search context cleanup
Today there are two different ways to cleanup search contexts which can
potentially lead to double releasing of a context. This commit unifies
the methods and prevents double closing.

Closes #7625
2014-09-08 20:36:19 +02:00
Andrew Selden 80a3038f83 Make .zip and .tar.gz release artifacts contain same files.
This commit changes the build to include .exe and sigar/.dll files in
both the zip and tar artifacts.

Closes #2793
2014-09-08 10:43:09 -07:00
Colin Goodheart-Smithe 18a713a2ae [TEST] added @AwaitsFix to failing StringTermsTests while I work on a fix 2014-09-08 16:28:12 +01:00
Britta Weber ee5221bd22 _timestamp: enable mapper properties merging
Updates on the _timestamp field were silently ignored.
Now _timestamp undergoes the same merge as regular
fields. This includes exceptions if a property cannot
be changed.
"path" and "default" cannot be changed.

closes #5772
closes #6958
closes #7614
partially fixes #777
2014-09-08 17:17:06 +02:00
Colin Goodheart-Smithe bfedd11ffa Aggregations: Adds ability to sort on multiple criteria
The terms aggregation can now support sorting on multiple criteria by replacing the sort object with an array of sort objects whose order signifies the priority of the sort. The existing syntax for sorting on a single criterion also still works.

Contributes to #6917
2014-09-08 15:20:33 +01:00
Adrien Grand 11fe940ea9 [TESTS] Add explicit mappings to IndexAliasesTests.testSearchingFilteringAliasesSingleIndex
This makes sure that all shards know about the `_uid` field.
2014-09-08 16:11:50 +02:00
Colin Goodheart-Smithe 12ca36574e [TEST] added debug info to GeoBoundsTests to try to solve build issue 2014-09-08 10:50:25 +01:00
Simon Willnauer aadbfa44b4 [SEARCH] Execute search reduce phase on the search threadpool
Reduce Phases can be expensive and some of them like the aggregations
reduce phase might even execute a one-off call via an internal client
that might cause a deadlock due to execution on the network thread
that is needed to handle the one-off call. This commit dispatches
the reduce phase to the search threadpool to ensure we don't wait
for the current thread to be available.

Closes #7623
2014-09-08 11:32:55 +02:00
mikemccand 130fdef367 Core: remove built-in support for Lucene's experimental codecs
Lucene's experimental codecs (from the codecs module) do not provide
backwards compatibility and are free to change from release to
release.  When they do change, they typically cannot in general read
older indices and the resulting exceptions look like index corruption.
So, we are removing built-in support for them to prevent applications
from choosing one and then seeing strange exceptions on upgrade.

Closes #7566

Closes #7604
2014-09-08 04:55:15 -04:00
Ryan Ernst 1a9c82d6b5 RestAPI: Change validation exceptions to respond with 400 status instead of 500.
Validation errors are clearly in the realm of client errors (a problem
with the request).  Thus they should return a 4xx response code.

closes #7619
2014-09-06 22:02:32 -07:00
Simon Willnauer 36f9d39205 [TEST] Close input stream in test to not upset windows 2014-09-06 22:07:01 +02:00
uboness 333a39cf30 Extended ActionFilter to also enable filtering the response side
Enables filtering the actions on both sides - request and response. Also added a base class for filter implementations (cleans up filters that only need to filter one side)

Also refactored the filter & filter chain methods to more intuitive names
2014-09-06 13:18:40 +02:00
Ryan Ernst dd54025b17 Internal: Change LZFCompressedStreamOutput to use buffer recycler when allocating encoder
closes #7613
2014-09-05 13:59:10 -07:00
Ryan Ernst 669a7eb4f1 RestAPI: Add explicit error when PUT mapping API is given an empty request body.
closes #7536
closes #7618
2014-09-05 13:30:39 -07:00
Simon Willnauer 7f32e8c707 [STORE] Simplify reading / writing from and to BlobContainer
BlobContainer used to provide async APIs which are not used
internally. The implementation of these APIs are also not async
by nature and neither is any of the pluggable BlobContainers. This
commit simplifies the API to a simple input / output stream and
reduces the hierarchy of BlobContainer dramatically.
NOTE: This is a breaking change!

Closes #7551
2014-09-05 21:40:20 +02:00
Simon Willnauer 6a0a7afea6 [TEST] Allow SingleNodeTest to reset the node if really needed after test 2014-09-05 21:22:24 +02:00
Robert Muir 223dab8921 [Lucene] Upgrade to Lucene 4.10
Closes #7584
2014-09-05 12:21:08 -04:00
uboness 5df9c048fe Introduced a transient context to the rest request
Similar to the one in `TransportMessage`. Added the `ContextHolder` base class where both `TransportMessage` and `RestRequest` derive from

Now next to the known headers, the context is always copied over from the rest request to the transport request (when the injected client is used)
2014-09-05 16:54:46 +02:00
Alexander Reelsen 8b8cc80ba8 TransportClient: Mark transport client as such when instantiating
This allows plugins to load/inject specific classes, when the client started
is a transport client (compared to being a node client).

Closes #7552
2014-09-05 15:01:14 +02:00
Alex Ksikes 07d741c2cb Term Vectors: Support for artificial documents
This adds the ability to the Term Vector API to generate term vectors for
artificial documents, that is for documents not present in the index. Following
a similar syntax to the Percolator API, a new 'doc' parameter is used, instead
of '_id', that specifies the document of interest. The parameters '_index' and
'_type' determine the mapping and therefore analyzers to apply to each value
field.

Closes #7530
2014-09-05 07:42:43 +02:00
Adrien Grand b49853a619 Internal: Upgrade Guava to 18.0.
17.0 and earlier versions were affected by the following bug
https://code.google.com/p/guava-libraries/issues/detail?id=1761
which caused caches that are configured with weights that are greater than
32GB to actually be unbounded. This is now fixed.

Relates to #6268
Close #7593
2014-09-04 20:14:59 +02:00
javanna a857798e1c Indexed scripts: make sure headers are handed over to internal requests and streamline versioning support
The get, put and delete indexed script apis map to get, index and delete api and internally create those corresponding requests. We need to make sure that the original headers are handed over to the new request by passing the original request in the constructor when creating the new one.

Also streamlined the support for version and version_type in the REST layer since the parameters were not consistently parsed and set to the internal java API requests.

Modified the REST delete template and delete script actions to make use of a client instead of using the `ScriptService` directly.

Closes #7569
2014-09-04 16:00:32 +02:00
uboness 221eafab59 Refactored TransportMessage context
Removed CHM in favour of an OpenHashMap and synchronized accessor/mutator methods. Also, the context is now lazily initialized (just like we do with the headers)
2014-09-04 15:11:28 +02:00
javanna 6633221470 Internal: deduplicate useful headers that get copied from REST to transport requests
The useful headers are now stored into a `Set` instead of an array so we can easily deduplicate them. A set is also returned instead of an array by the `usefulHeaders` static getter.

Relates to #6513

Closes #7590
2014-09-04 15:04:11 +02:00
Adrien Grand 4ca2dd0a0a Core: Remove DocSetCache.
This class was unused.

Close #7582
2014-09-04 11:03:16 +02:00
Colin Goodheart-Smithe 228778ceed Aggregations: Fixes resize bug in Geo bounds Aggregator
Closes #7556
2014-09-03 15:14:07 +01:00
javanna 5b5f4add1e [TEST] added test to verify GetIndexedScriptRequest serialization after recent changes 2014-09-03 15:16:13 +02:00
javanna 5ac77f79c2 [TEST] replaced assert with actual assertions in TemplateQueryTest 2014-09-03 15:16:13 +02:00
Britta Weber 59ecfd67e8 _boost: Fix "index" setting
Serialization of the "index" setting for boost did not work since
the serialization was just true/false instead of valid options
"no"/"not_analyzed"/"analyzed".

closes #7557
2014-09-03 14:25:18 +02:00
javanna 4dab138db7 [TEST] resolved warning in IndexedScriptTests 2014-09-03 14:05:24 +02:00
javanna 19418749e4 Java api: change base class for GetIndexedScriptRequest and improve its javadocs
`GetIndexedScriptRequest` now extends `ActionRequest` instead of `SingleShardOperationRequest`, as the index field that was provided with the previous base class is not needed (hardcoded).

Closes #7553
2014-09-03 12:33:37 +02:00
javanna 851cb3ae8a Internal: fix members visibility, remove unused constant and needless try catch in indexed scripts transport actions 2014-09-03 11:57:10 +02:00
javanna 151b1c47d4 Java api: remove needless copy constructor from DeleteIndexedScriptRequest 2014-09-03 11:57:10 +02:00
javanna 4364b59846 Internal: remove unused constructor and adjust methods visibility in DelegatingActionListener 2014-09-03 11:57:10 +02:00
Renaud AUBIN 4c21db0dca Packaging: Add default oracle jdk 7 (x64) path in debian init script
On Debian amd64, oracle jdk .deb packages made using make-jpkg (from
java-package) default to /usr/lib/jvm/jdk-7-oracle-x64.

Closes #7312
2014-09-03 10:15:35 +02:00
Adrien Grand 4bfad644b3 Aggregations: Forbid usage of aggregations in conjunction with search_type=SCAN.
Aggregations are collection-wide statistics, which is incompatible with the
collection mode of search_type=SCAN since it doesn't collect all matches on
calls to the search API.

Close #7429
2014-09-03 09:03:01 +02:00
Adrien Grand 203e80e650 Aggregations: Only return aggregations on the first page when scrolling.
Aggregations are collection-wide statistics so they would always be the same.
In order to save CPU/bandwidth, we can just return them on the first page.

Same as #1642 but for aggregations.
2014-09-03 09:03:01 +02:00
Boaz Leskes 1f8db672fc [Internal] Do not use a background thread to disconnect nodes which are removed from the ClusterState
After a node fails to respond to a ping correctly (master or node fault detection), it is removed from the cluster state through an UpdateTask. When a node is removed, a background task is scheduled using the generic threadpool to actually disconnect the node. However, in the case of temporary node failures (for example), the node may have been re-added by the time the task gets executed, causing an untimely disconnect call. Disconnect is cheap and should be done during the UpdateTask.

Closes #7543
2014-09-03 08:49:09 +02:00
Robert Muir 395744b0d2 [Analysis] Add missing docs for latvian analysis 2014-09-02 19:22:59 -04:00
Boaz Leskes 8d3dd61b21 typo s/removeDistruptionSchemeFromNode/removeDisruptionSchemeFromNode 2014-09-02 22:00:44 +02:00
Robert Muir 1711041c57 [Engine] Verify checksums on merge
Enable lucene verification of checksums on segments before merging them.
This prevents corruption from existing segments from silently slipping into
newer merged segments.

Closes #7360
2014-09-02 12:18:19 -04:00
Simon Willnauer b00424aba7 [TEST] Use a large threshold to prevent relocations in RecoveryBackwardsCompatibilityTests 2014-09-02 16:50:19 +02:00
Simon Willnauer cb206c94ec [TEST] Add simple test to test RT Lucene IW settings 2014-09-02 16:33:40 +02:00
Boaz Leskes 89f8f6c51e [Tests] ExternalCluster change error message when use local network mode due to wrong system properties 2014-09-02 15:37:07 +02:00
Boaz Leskes 024df242dc [Tests] add proper error message when BWC client creation fail due to node.local=true system property
System properties are typically set via the command line and therefore override the node settings. If one has `node.local=true` or `node.mode=local` it can result in cryptic error messages during the test run.
2014-09-02 15:37:07 +02:00
mikemccand 9c1ac95ba8 Use Flake IDs instead of random UUIDs when auto-generating id field
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #7531

Closes #6004

Closes #5941
2014-09-02 09:13:51 -04:00
Boaz Leskes 20dcb0e08a [Tests] add proper error message when BWC test fail due to node.local=true system property
System properties are typically set via the command line and therefore override the node settings. If one has `node.local=true` or `node.mode=local` it can result in cryptic error messages during the test run.
2014-09-02 14:49:46 +02:00
Boaz Leskes 5d7d86323d [Test] RecoveryBackwardsCompatibilityTests.testReusePeerRecovery used `gateway.recover_after_nodes:3` but may start only a 2 node cluster 2014-09-02 13:38:49 +02:00
Cristiano Fontes df5d22c7d7 Internal: Removing unused methods/parameters.
Close #7474
2014-09-02 09:38:51 +02:00
Boaz Leskes 884a744143 [Test] change the default port base for ClusterDiscoveryConfiguration.UnicastZen to 30000
The previous value of 10000 collided with the standard test cluster ports when 6 or more JVMs are used.
2014-09-01 21:40:52 +02:00
Boaz Leskes 246b2583a3 [Test] ElasticsearchIntegrationTest.clearDisruptionScheme should test if the current cluster is internal
When running on a non-internal cluster the function is a noop.
2014-09-01 21:14:30 +02:00
javanna 0d49a8ec76 [TEST] remove global scope mention from ElasticsearchIntegrationTest#buildTestCluster
The global cluster gets created from a static block and shared through all tests in the same jvm. The `buildTestCluster` method can't get called passing in `Scope.GLOBAL`, hence removed its mention from it as it might be misleading. The only two scopes supported within the `buildTestCluster` method are `SUITE` and `TEST`.
2014-09-01 18:34:32 +02:00
Boaz Leskes 598854dd72 [Discovery] accumulated improvements to ZenDiscovery
Merging the accumulated work from the feature/improve_zen branch. Here are the highlights of the changes:

__Testing infra__
- Networking:
    - all symmetric partitioning
    - dropping packets
    - hard disconnects
    - Jepsen Tests
- Single node service disruptions:
    - Long GC / Halt
    - Slow cluster state updates
- Discovery settings
    - Easy to setup unicast with partial host list

__Zen Discovery__
- Pinging after master loss (no local elects)
- Fixes the split brain issue: #2488
- Batching join requests
- More resilient joining process (wait on a publish from master)

Closes #7493
2014-09-01 16:13:57 +02:00
Boaz Leskes 34f4ca763c [Cluster] Refactored ClusterStateUpdateTask protection against execution on a non master
The previous implementation used a marker interface and had no explicit failure callback for the case where the update task was run on a non-master (i.e., the master stepped down after it was submitted). That led to a couple of instanceof checks.

This approach moves ClusterStateUpdateTask from an interface to an abstract class, which allows adding a flag to indicate whether it should only run on master nodes (defaults to true). It also adds an explicit onNoLongerMaster callback to allow different error handling for that case. This also removed the need for the NoLongerMaster.

Closes #7511
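
The shape described above could look roughly like the following sketch. It is a toy model, not the Elasticsearch class: `SketchUpdateTask`, `SketchClusterService`, and the use of a plain `String` as the cluster state are hypothetical names introduced for illustration. Only the master-only flag (defaulting to true) and the `onNoLongerMaster` callback come from the message.

```java
/** Toy stand-in for an update task; hypothetical, not the real Elasticsearch class. */
abstract class SketchUpdateTask {
    /** Defaults to true: the task should only run on the elected master. */
    boolean runOnlyOnMaster() {
        return true;
    }

    /** Computes the new state from the current one (a String here for brevity). */
    abstract String execute(String currentState);

    /** Explicit callback for the case where the local node is no longer master. */
    void onNoLongerMaster(String source) {
        // default: nothing to do
    }
}

/** Hypothetical service that decides whether a submitted task may run. */
final class SketchClusterService {
    private final boolean localNodeIsMaster;
    private String state = "v0";

    SketchClusterService(boolean localNodeIsMaster) {
        this.localNodeIsMaster = localNodeIsMaster;
    }

    String state() {
        return state;
    }

    void submit(String source, SketchUpdateTask task) {
        if (task.runOnlyOnMaster() && !localNodeIsMaster) {
            // No instanceof check against a marker interface: the task itself is asked.
            task.onNoLongerMaster(source);
            return;
        }
        state = task.execute(state);
    }
}
```

A task that must also run on non-master nodes would override `runOnlyOnMaster()` to return false in this model.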
2014-09-01 15:57:07 +02:00
Boaz Leskes 596a4a0735 [Internal] Extract a common base class for (Master|Nodes)FaultDetection
They share a lot of settings and some logic.

Closes #7512
2014-09-01 15:51:26 +02:00
Britta Weber 889db1c824 [TEST]: remove field_value_factor, was only added in 1.2 2014-09-01 15:08:45 +02:00
Britta Weber 40d86a630b Tests: wait for yellow instead of green 2014-09-01 12:26:14 +02:00
javanna ab57d4a002 [TEST] Unify the randomization logic for number of shards and replicas
We currently have two ways to randomize the number of shards and replicas: random index template, that stays the same for all indices created under the same scope, and the overridable `indexSettings` method, called by `createIndex` and `prepareCreate` which returns different values each time.

Now that the `randomIndexTemplate` method is not static anymore, we can easily apply the same logic to both. Especially for the number of replicas, we used to have slightly different behaviours: more than one replica was only rarely used through the random index template, and that now applies to the `indexSettings` method too (which might speed up the tests a bit).

Side note: `randomIndexTemplate` had its own logic which didn't depend on `numberOfReplicas` or `maximumNumberOfReplicas`, which was causing bw comp tests failures since in some cases too many copies of the data are requested, which cannot be allocated to older nodes, and the write consistency quorum cannot be met, thus indexing times out.

Closes #7522
2014-09-01 12:04:24 +02:00
Britta Weber 3f0288fc59 fix typo in class name 2014-09-01 11:43:52 +02:00
Britta Weber c5ff70bf43 function_score: add optional weight parameter per function
Weights can be defined per function like this:

```
"function_score": {
    "functions": [
        {
            "filter": {},
            "FUNCTION": {},
            "weight": number
        }
        ...
```
If `weight` is given without `FUNCTION` then `weight` behaves like `boost_factor`.
This commit deprecates `boost_factor`.

The following is valid:

```
POST testidx/_search
{
  "query": {
    "function_score": {
      "weight": 2
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "FUNCTION": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {},
          "FUNCTION": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
```

The following is not valid:

```
POST testidx/_search
{
  "query": {
    "function_score": {
      "weight": 2,
      "FUNCTION(including boost_factor)": 2
    }
  }
}

POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "weight": 2,
          "boost_factor": 2
        }
      ]
    }
  }
}
````

closes #6955
closes #7137
2014-09-01 11:04:40 +02:00
Britta Weber 9750375412 mappings: keep parameters in mapping for _timestamp, _index and _size even if disabled
Settings that are not default for _size, _index and _timestamp were only built in
toXContent if these fields were actually enabled.
_timestamp, _index and _size can be dynamically enabled or disabled.
Therefore the settings must be kept, even if the field is disabled.
(Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..)
and SizeMappingTests#testThatDisablingWorksWhenMerging
but actually never worked, see below).

To avoid that _timestamp is overwritten by a default mapping
this commit also adds a check to mapping merging if the type is already
in the mapping. In this case the default is not applied anymore.
(see
SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration)

As a side effect, this fixes
- overwriting of parameters from the _source field by default mappings
  (see DefaultSourceMappingTests).
- dynamic enabling and disabling of _timestamp and _size
  (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and
  SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff )

Tests:

Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again
The missing settings in the mapping for _timestamp, _index and _size caused the
failure: when creating a mapping which has settings other than default and the
field disabled, still empty field mappings were built from the type mappers.
When creating such a mapping, the mapping source on master and the rest of the cluster
can be out of sync for some time:

1. Master creates the index with source _timestamp:{_store:true}
   mapper classes are in a correct state but source is _timestamp:{}
2. Nodes update mapping and refresh source which then completely misses _timestamp
3. After a while source is refreshed again also on master and the _timestamp:{}
   vanishes there also.

The test UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate failed
because the cluster state was sampled from master between 1. and 3. because the
randomized testing injected a default mapping with disabled _size and _timestamp
fields that have settings which are not default.

The test
TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo
must be removed because it actually expected the timestamp to remove
parameters when it was disabled.

closes #7137
2014-09-01 10:39:33 +02:00
Boaz Leskes 0e6bb1f28b [Rest] Add the cluster name to the "/" endpoint
The root endpoint returns basic information about this node, like its name and ES version etc. The cluster name is an important piece of information that belongs in that list.

Closes #7524
2014-09-01 10:05:11 +02:00
Areek Zillur 9df10a07b0 Improved Suggest Client API:
- Added SuggestBuilders (analogous to QueryBuilders)
 - supporting term, phrase, completion and fuzzyCompletion suggestion builders
- Added suggest(SuggestionBuilder) to SuggestRequest
   - previously only suggest(BytesReference) was supported

closes #7435
2014-08-31 21:55:03 -04:00
Boaz Leskes 7fb9e5e28e [Test] make testNoMasterActions more resilient 2014-08-30 18:34:20 +02:00
Martijn van Groningen 2ba4e35cde Aggregations: The nested aggregator should iterate over the child doc ids in ascending order.
The reverse_nested aggregator requires that the emitted doc ids are always in ascending order, which is already enforced on the scorer level,
but this also needs to be enforced on the nested aggregator level, otherwise incorrect counts result.

Closes #7505
Closes #7514
2014-08-29 23:04:17 +02:00
Boaz Leskes d8a5ff0047 [Internal] introduce ClusterState.UNKNOWN_VERSION constant
Used as null value for cluster state versions.
2014-08-29 22:57:23 +02:00
Boaz Leskes 75795e44c1 [Tests] add different node name prefix for the different cluster type
During a test run we have a global shared cluster and potentially a suite level or even a test level cluster running. All of those share the same node name pattern (node_#). This can be confusing if you're debugging discovery related tests where those nodes from the different clusters potentially interact (and reject each other). This commit gives each cluster type a unique prefix to make tracing and log filtering simpler.

Closes #7518
2014-08-29 21:33:54 +02:00
Simon Willnauer 4473cdc503 [TEST] Remove unused plugin isolation leftover 2014-08-29 21:29:48 +02:00
Simon Willnauer 0d07917e99 [TEST] Stabilize SimpleRecoveryLocalGatewayTests#testReusePeerRecovery 2014-08-29 21:29:01 +02:00
Lee Hinman 1e21f27874 [TEST] fix off-by-one error in BigArrays tests
Comparisons for the BigArrays breaker use "greater than" instead of
"greater than or equal", which was never an issue before because the
test size was not right on a page boundary. A test with an exactly
divisible page boundary (4mb exactly in this case) caused the sizes to
be equal to, but not exceed, the limit, and never break.

The limit should be smaller than the test increments the breaker anyway.
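
The boundary condition can be illustrated with a toy model. This is only a sketch under assumptions: `ToyBreaker` and its methods are hypothetical and are not the Elasticsearch circuit breaker; the only behaviour taken from the message is that the breaker trips on "greater than", not "greater than or equal".

```java
/** Toy breaker (hypothetical): trips only when usage strictly exceeds the limit. */
final class ToyBreaker {
    private final long limitBytes;
    private long usedBytes;

    ToyBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Adds bytes and reports whether the breaker trips ("greater than", not ">="). */
    boolean addAndCheck(long bytes) {
        usedBytes += bytes;
        return usedBytes > limitBytes;
    }

    /** Allocates totalBytes page by page; returns true if the breaker tripped. */
    static boolean tripsWhileAllocating(long limitBytes, long totalBytes, long pageBytes) {
        ToyBreaker breaker = new ToyBreaker(limitBytes);
        for (long allocated = 0; allocated < totalBytes; allocated += pageBytes) {
            if (breaker.addAndCheck(pageBytes)) {
                return true;
            }
        }
        return false;
    }
}
```

With a limit equal to a page-aligned size, usage ends exactly at the limit and the toy breaker never trips; with a size that is not page-aligned, the last page overshoots the limit, which is why the problem stayed hidden. Setting the limit below the size makes the test independent of alignment.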
2014-08-29 17:17:03 +02:00
Boaz Leskes ed5b2e0e35 Add an assertion to ZenDiscovery checking that local node is never elected if pings indicate an active master 2014-08-29 17:07:24 +02:00
Boaz Leskes 680fb36637 [Discovery] Add try/catch around repetitive onSuccess calls 2014-08-29 17:03:08 +02:00
Adrien Grand 172a40c55e Docs: Add javadocs to the client-side aggregation APIs. 2014-08-29 16:36:43 +02:00
markharwood 536d3ffed0 Highlighter Javadocs 2014-08-29 16:26:41 +02:00
Martijn van Groningen f416ed4949 Docs: added missing jdocs for the percolate client classes.
Also made constructors package protected where possible
and removed some useless getters in percolator source builder.
2014-08-29 16:26:41 +02:00
Simon Willnauer c10ef110ae [DOCS] Added JavaDocs for ClusterAdminClient, IndicesAdminClient and Warmer API 2014-08-29 16:26:41 +02:00
markharwood 1687c5ad51 Completion suggestion javadocs 2014-08-29 16:26:41 +02:00
Simon Willnauer 1bb0677df7 [CORE] Don't update indexShard if it has been removed before
Today we have logic that removes a shard from the indexservice if
the shard has changed, i.e. from replica to primary, or if its recovery
source vanished etc. This can cause shards to not be allocated at
all on a node, causing delete requests to time out since we were waiting
for shards on nodes that got dropped due to an IndexShardMissingException

Closes #7509
2014-08-29 15:16:22 +02:00
markharwood c0aef4adc4 Suggest API - bugs with encoding multiple levels of geo precision.
1) One issue reported by a user is due to the truncation of the geohash string. Added Junit test for this scenario
2) Another suspect piece of code was the “toAutomaton” method that only merged the first of possibly many precisions into the result.

Closes #7368
2014-08-29 13:41:35 +01:00
Simon Willnauer 88aec9e3c0 [TEST] Fix per-segment / per-commit exclude logic in CorruptFileTest 2014-08-29 11:43:52 +02:00
Lee Hinman b2827a09a9 [TEST] add AwaitsFix for testTranslogChecksums since it may cause OOME
if the size is corrupted
2014-08-29 10:11:50 +02:00
Boaz Leskes d15909716b [Internal] moved ZenDiscovery setting to use string constants 2014-08-29 09:46:28 +02:00
Michael Brackx 0fd3ef6df0 Client: Make the query builder nullable in filteredQuery.
Close #7398
2014-08-29 09:40:38 +02:00
Simon Willnauer d7a068d02c [TEST] Exclude per commit files rather than only segments_N
When we corrupt a file in the snapshot/restore case we have to corrupt
a per-segment file. The .del file might change with the commit / flush
that is triggered by the snapshot operation.
2014-08-29 09:22:03 +02:00
Boaz Leskes 183ca37dfa Code style improvement 2014-08-29 09:01:05 +02:00
Martijn van Groningen c55341bf51 Core: Remove the warmer listener when the FixedBitSetFilterCache gets closed. 2014-08-28 20:58:34 +02:00
Martijn van Groningen 4c690fae47 Scan: Use ConcurrentHashMap instead of HashMap, because the readerStates is accessed by multiple threads during the entire scroll session.
Closes #7499
Closes #7478
2014-08-28 16:36:17 +02:00
Philip Wills a3c4137079 Aggregations: Encapsulate AggregationBuilder name and make getter public
Close #7425
2014-08-28 16:34:41 +02:00
Brian Murphy c165e640fc Indexed Scripts/Templates : Change the default auto_expand to 0-all
This commit changes the auto_expand_replicas setting for the ````.scripts```` index to
0-all from 1-all.
2014-08-28 15:31:44 +01:00
Brian Murphy f44bb502ee Indexed Scripts/Templates : Fix .script index template.
This commit sets the default number of shards for the .scripts index to ````1````; it also
forces the auto_expand replicas to ````1-all````. This change means that script index GET requests to load
scripts from the index should always use the local copy of the scripts index, preventing any network traffic or calls
on script GET.
2014-08-28 14:54:24 +01:00
javanna 88839ec546 [TEST] apply default settings by calling super.nodeSettings method when providing test specific methods 2014-08-28 15:35:35 +02:00
javanna a0e9532dca [TEST] make sure default settings don't override test specific settings 2014-08-28 15:35:34 +02:00
javanna 645db6867b [TEST] apply default settings before test specific ones to external nodes in bw comp tests, otherwise the defaults win all the time 2014-08-28 15:35:34 +02:00
Lee Hinman 09816fdf57 Validate create index requests' number of primary/replica shards
Fixes #7495
2014-08-28 14:20:32 +02:00
Simon Willnauer cc37ae13bc [CORE] Make network interface iteration order consistent
Today the iteration order of the interfaces might change across JVMs
this commit cleans up the NetworkUtils class and attempts to ensure
consistent iteration order across JVMs.
2014-08-28 12:35:56 +02:00
Simon Willnauer c93e6e3f67 [TEST] Fix RandomScoreFunctionTests#testConsistentHitsWithSameSeed 2014-08-28 12:31:47 +02:00
Boaz Leskes c6090e5d9b [Tests] add a debug logging message when starting an external node 2014-08-28 12:13:05 +02:00
Martijn van Groningen 6de18262dd Test: Increase the ping timeout to avoid that a candidate master node makes the decision to elect itself too soon. 2014-08-28 11:49:30 +02:00
Simon Willnauer 1d960d08f7 [TEST] only expand to 1 replica in SnapshotBackwardsCompatibilityTest 2014-08-28 11:20:33 +02:00
Simon Willnauer d062b2b0a4 [TEST] use a dedicated port range per test JVM
For reliability and debug purposes each test JVM should use its own
TCP port range if executed in parallel. This also moves away from the
default port range to prevent conflicts with running ES instance on the local
machine.
2014-08-28 09:18:39 +02:00
Ryan Ernst eb22d9ec24 FunctionScore: Fixed RandomScoreFunction to guard against _uid field not existing.
Also added a test case to check the random score works with queries on
an empty index.
2014-08-27 17:01:01 -07:00
Simon Willnauer 59da079bae [SNAPSHOT] Ensure BWC layer can read chunked blobs 2014-08-27 21:33:40 +02:00
Martijn van Groningen 94eed4ef56 Introduced FixedBitSetFilterCache that guarantees to produce a FixedBitSet and does not evict based on size or time.
Only when segments are merged away due to merging then entries in this cache are cleaned up.

Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this.
Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache.

Closes #7037
Closes #7031
2014-08-27 21:28:36 +02:00
Boaz Leskes 852a1103f3 [Internal] use node's cluster name as a default for an incoming cluster state that misses it
ClusterState has a reference to the cluster name since version 1.1.0 (df7474b9fc). However, if the state was sent from a master of an older version, this name can be set to null. This is unexpected and can cause bugs. The bad part is that it will never correct itself until a full cluster restart where the cluster state is rebuilt using the code of the latest version.

This commit changes the default to the node's cluster name.

Relates to #7386

Closes #7414
2014-08-27 20:24:27 +02:00
Boaz Leskes 55e9f169c3 [Tests] change BasicBackwardsCompatibilityTest to be compatible with 1.0.3
Also increase the time we wait for an external node to join
Sadly tests are not yet stable enough, testing with 1.0.3 is still disabled
2014-08-27 20:14:45 +02:00
Ryan Ernst 65afa1d93b FunctionScore: Refactor RandomScoreFunction to be consistent, and return values in range [0.0, 1.0]
RandomScoreFunction previously relied on the order the documents were
iterated in from Lucene. This caused changes in ordering, with the same
seed, if documents moved to different segments. With this change, a
murmur32 hash of the _uid for each document is used as the "random"
value. Also, the hash is adjusted so as to only return values between
0.0 and 1.0 to enable easier manipulation to fit into users' scoring
models.

closes #6907, #7446
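
The idea of deriving a stable "random" score from a document's `_uid` can be sketched as follows. This is an illustration under assumptions, not the code from the commit: the class name `UidRandomScore`, the use of the user seed as the hash seed, and the 24-bit scaling are all choices made here for the example. The hash is a standard 32-bit MurmurHash3 (x86 variant), and this particular scaling yields values in [0.0, 1.0).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a seed-dependent, order-independent score per document uid. */
final class UidRandomScore {

    /** Standard MurmurHash3 x86 32-bit over a byte array. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int i = 0;
        // Body: process 4-byte little-endian blocks.
        while (i + 4 <= len) {
            int k = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | ((data[i + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
            i += 4;
        }
        // Tail: up to 3 remaining bytes (intentional fall-through).
        int k = 0;
        switch (len - i) {
            case 3:
                k ^= (data[i + 2] & 0xff) << 16;
            case 2:
                k ^= (data[i + 1] & 0xff) << 8;
            case 1:
                k ^= (data[i] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
        }
        // Finalization mix.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps the hash of the uid to a float; the scaling is an assumption of this sketch. */
    static float score(String uid, int seed) {
        int hash = murmur3_32(uid.getBytes(StandardCharsets.UTF_8), seed);
        return (hash & 0x00FFFFFF) / (float) (1 << 24);
    }
}
```

Because the score depends only on the uid and the seed, it does not change when a document moves to a different segment, which is the property the message asks for.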
2014-08-27 08:37:25 -07:00
Alexander Reelsen 3aa72f2738 Test: Allow global test cluster to have configurable settings source
This allows reusing the global test cluster with specific configurations,
which is useful in plugins.
2014-08-27 17:04:14 +02:00
Boaz Leskes d5552a980f [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node loses connection to the master (due to a partition or the master being stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or learn of another master that has been elected in the meantime. This can go wrong if all unicast targets are on the same side of a minority partition, and the node will therefore never rejoin once the partition is healed.

Closes #7336
2014-08-27 15:47:42 +02:00
Boaz Leskes ff8b7409f7 [Discovery] add a debug log if a node responds to a publish request after publishing timed out. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 5932371f21 [TEST] Adapt testNoMasterActions since metadata isn't cleared if there is a no master block 2014-08-27 15:47:41 +02:00
Martijn van Groningen c8919e4bf5 [TEST] Changed action names. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 702890e461 [TEST] Remove the forceful `network.mode` setting in DiscoveryWithServiceDisruptions#testMasterNodeGCs now local transport use worker threads. 2014-08-27 15:47:41 +02:00
Boaz Leskes 26d90882e5 [Transport] Introduced worker threads to prevent alien threads from entering a node.
Requests are handled by the worker thread pool of the target node instead of the generic thread pool of the source node.
This change is also required in order to make GC disruption work with local transport. Previously the handling of a request was performed on a node that was being GC disrupted, resulting in some actions being performed while GC was being simulated.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 966a55d21c Typo: s/Recieved/Received 2014-08-27 15:47:40 +02:00
Martijn van Groningen 47326adb67 [TEST] Make sure all shards are allocated before killing a random data node. 2014-08-27 15:47:40 +02:00
Martijn van Groningen 403ebc9e07 [Discovery] Added cluster version and master node to the nodes fault detecting ping request
The cluster state version allows resolving the case where an old master node becomes unresponsive and later wakes up and pings all the nodes in the cluster, allowing the newly elected master to decide whether it should step down or ask the old master to rejoin.
2014-08-27 15:47:40 +02:00
Boaz Leskes 50f852ffeb [TEST] Added LongGCDisruption and a test simulating GC on master nodes
Also rename DiscoveryWithNetworkFailuresTests to DiscoveryWithServiceDisruptions, which better suits what we do.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 4b8456e954 [Discovery] Master fault detection and nodes fault detection should take cluster name into account.
Both master fault detection and nodes fault detection requests should also send the cluster name, so that on the receiving side the handling of these requests can be failed with an error. This error can be caught on the sending side: for master fault detection the node can fail the master locally, and for nodes fault detection the node can be failed.

Note this validation will most likely never fail in a production cluster, but it can during automated tests where clusters / nodes are created and destroyed very frequently.
2014-08-27 15:47:39 +02:00
Martijn van Groningen 364374dd03 [TEST] Added test that verifies that no shard relocations happen during / after a master re-election. 2014-08-27 15:47:39 +02:00
Martijn van Groningen 130e680cfb [Discovery] Made the handling of the join request batch oriented.
In large clusters, when a new master is elected, there are many join requests to handle. By batching them up, the cluster state doesn't get published for each individual join request; instead many are handled at the same time, which results in a single new cluster state that ends up being published.

Closes #6984
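
The batching described above can be pictured with a small model. This is a hedged sketch, not the ZenDiscovery code: `JoinBatcher`, its methods, and the use of node id strings are hypothetical. It only shows the principle that all queued join requests are applied together and followed by a single publish.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of batch-oriented join handling on the master. */
final class JoinBatcher {
    private final BlockingQueue<String> pendingJoins = new LinkedBlockingQueue<>();
    private final Set<String> clusterNodes = new LinkedHashSet<>();
    private int publishedStates = 0;

    /** Called for every incoming join request; only queues the node id. */
    void onJoinRequest(String nodeId) {
        pendingJoins.add(nodeId);
    }

    /** Drains all queued joins, applies them together, and publishes at most one new state. */
    void processPendingJoins() {
        List<String> batch = new ArrayList<>();
        pendingJoins.drainTo(batch);
        if (batch.isEmpty()) {
            return; // nothing joined, nothing to publish
        }
        clusterNodes.addAll(batch);
        publishedStates++; // one publish for the whole batch
    }

    int publishedStates() {
        return publishedStates;
    }

    Set<String> clusterNodes() {
        return clusterNodes;
    }
}
```

In this model, five queued join requests lead to one published state instead of five.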
2014-08-27 15:47:39 +02:00
Shay Banon 0244ddb0cd retry logic to unwrap exception to check for illegal state
it probably comes wrapped in a remote exception, which we should unwrap in order to detect it..., also, simplified a bit the retry logic
2014-08-27 15:47:39 +02:00
Boaz Leskes cccd060a0c [Discovery] verify we have a master after a successful join request
After master election, nodes send join requests to the elected master. Master is then responsible for publishing a new cluster state which sets the master on the local node's cluster state. If something goes wrong with the cluster state publishing, this process will not successfully complete. We should check it after the join request returns and if it failed, retry pinging.

Closes #6969
2014-08-27 15:47:38 +02:00
Boaz Leskes ffcf1077d8 [Discovery] join master after first election
Currently, pinging results are only used if the local node is elected master or if they detect another *already* active master. This has the effect that master election requires two pinging rounds - one for the elected master to take its role and another for the other nodes to detect it and join the cluster. We can be smarter and use the election of the first round on other nodes as well. Those nodes can try to join the elected master immediately. There is a catch though - the elected master node may still be processing the election and may reject the join request if not ready yet. To compensate, a retry mechanism is introduced to try again (up to 3 times by default) if this happens.

Closes #6943
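
The retry mechanism can be sketched in a few lines. This is an assumption-laden illustration: `JoinRetry`, `joinWithRetry`, and the `BooleanSupplier` standing in for "send a join request and report whether the master accepted it" are hypothetical names, and the sketch omits any delay between attempts. The attempt limit is a parameter here rather than a claim about the actual default.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of retrying a join while the elected master is not ready yet. */
final class JoinRetry {

    /**
     * Tries to join up to maxAttempts times.
     *
     * @return the attempt number on which the join succeeded, or -1 if every attempt was rejected
     */
    static int joinWithRetry(BooleanSupplier sendJoin, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendJoin.getAsBoolean()) {
                return attempt; // master accepted the join
            }
            // Rejected: the master may still be processing its own election; try again.
        }
        return -1; // give up; the caller would go back to pinging
    }
}
```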
2014-08-27 15:47:38 +02:00
Boaz Leskes a40984887b [Tests] Fixed some issues with SlowClusterStateProcessing
Reduced expected time to heal to 0 (we interrupt and wait on stop disruption). It was also wrongly indicated in seconds.
We didn't properly wait between slow cluster state tasks
2014-08-27 15:47:38 +02:00
Martijn van Groningen c2142c0f6d Discovery: Don't include local node to pingMasters list. We might end up electing ourselves without any form of verification. 2014-08-27 15:47:38 +02:00
Martijn van Groningen 5e38e9eb4f Discovery: Only add local node to possibleMasterNodes if it is a master node. 2014-08-27 15:47:37 +02:00
Martijn van Groningen 67685cb026 Discovery: If not enough possible masters are found, but there are masters to ping (ping responses did include master node) then these nodes should be resolved.
After the findMaster() call we try to connect to the node and if it isn't the master we start looking for a new master via pinging again.

Closes #6904
2014-08-27 15:47:37 +02:00
Boaz Leskes f029a24d53 [Store] migrate non-allocated shard deletion to use ClusterStateNonMasterUpdateTask 2014-08-27 15:47:37 +02:00
Boaz Leskes bebaf9799c [Tests] stability improvements
added explicit cleaning of temp unicast ping results
reduce gateway local.list_timeout to 10s.
testVerifyApiBlocksDuringPartition: verify master node has stepped down before restoring partition
2014-08-27 15:47:30 +02:00
Boaz Leskes ea2783787c [Tests] Introduced ClusterDiscoveryConfiguration
Closes #6890
2014-08-27 15:47:23 +02:00
Boaz Leskes ccabb4aa20 Remove unneeded reference to DiscoveryService which potentially causes circular references 2014-08-27 15:47:23 +02:00
Boaz Leskes 7fa3d7081b [logging] don't log an error if scheduled reroute is rejected because local node is no longer master
Since it runs in a background thread after a node is added, or submits a cluster state update when a node leaves, it may be that by the time it is executed the local node is no longer master.
2014-08-27 15:47:23 +02:00
Boaz Leskes e0543b3426 [Internal] Migrate new initial state cluster update task to a ClusterStateNonMasterUpdateTask 2014-08-27 15:47:23 +02:00
Boaz Leskes c12d0901f6 [Tests] Increase timeout when waiting for partitions to heal
the current 30s addition is tricky because we use 30s as timeout in many places...
2014-08-27 15:47:22 +02:00
Boaz Leskes 7b6e194923 [Tests] Don't log about restoring a partition if the partition is not active. 2014-08-27 15:47:22 +02:00
Boaz Leskes 522d4afe0c [Tests] Use local gateway
This is important for proper primary allocation decisions
2014-08-27 15:47:22 +02:00
Boaz Leskes 3586e38c40 [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

Relates to #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 5302a53145 [Discovery] immediately start Master|Node fault detection pinging
After a node joins the cluster, it starts pinging the master to verify its health. Before, the cluster join request was processed async and we had to give it some time to complete. With #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. A similar change can be made to the Node fault detection from the master side.

Closes #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 48c7da1fd4 [Test] testVerifyApiBlocksDuringPartition - wait for stable cluster after partition 2014-08-27 15:47:21 +02:00
Martijn van Groningen d99ca806cb [TEST] Properly clear the disruption schemes after test completed. 2014-08-27 15:47:21 +02:00
Boaz Leskes e897dccb52 [Tests] improved automatic disruption healing after tests 2014-08-27 15:47:21 +02:00
Boaz Leskes 5e5f8a9daf Added java docs to all tests in DiscoveryWithNetworkFailuresTests
Moved testVerifyApiBlocksDuringPartition to test blocks rather than rely on specific API rejections.
Did some cleaning while at it.
2014-08-27 15:47:21 +02:00
Martijn van Groningen 77dae631e1 [TEST] Make sure get request is always local 2014-08-27 15:47:20 +02:00
Martijn van Groningen 52f69c64f7 [TEST] Verify no master block during partition for read and write apis 2014-08-27 15:47:20 +02:00
Martijn van Groningen 98084c02ce [TEST] Added test to verify if 'discovery.zen.rejoin_on_master_gone' is updatable at runtime. 2014-08-27 15:47:20 +02:00
Boaz Leskes c3e84eb639 Fixed compilation issue caused by the lack of a thread pool name 2014-08-27 15:47:20 +02:00
Boaz Leskes 1af82fd96a [Tests] Disabling testAckedIndexing
The test is currently unstable and needs some more work
2014-08-27 15:47:20 +02:00
Boaz Leskes a7a61a0392 [Test] ensureStableCluster failed to pass viaNode parameter correctly
Also improved timeouts & logs
2014-08-27 15:47:19 +02:00
Martijn van Groningen f7b962a417 [TEST] Renamed afterDistribution timeout to expectedTimeToHeal
Accumulate expected shard failures to log later
2014-08-27 15:47:19 +02:00
Martijn van Groningen 785d0e55ab [TEST] Reduced failures in DiscoveryWithNetworkFailuresTests#testAckedIndexing test:
* waiting time should be long enough depending on the type of the disruption scheme
* MockTransportService#addUnresponsiveRule: if the remaining delay is smaller than 0, don't double execute transport logic
2014-08-27 15:47:19 +02:00
Martijn van Groningen 8aed9ee46f [TEST] Check if worker is null to prevent NPE on double stopping 2014-08-27 15:47:19 +02:00
Boaz Leskes 28489cee45 [Tests] Added ServiceDisruptionScheme(s) and testAckedIndexing
This commit adds the notion of ServiceDisruptionScheme allowing for introducing disruptions in our test cluster. This
abstraction is used in a couple of wrappers around the functionality offered by MockTransportService to simulate various
network partitions. There is also one implementation for causing a node to be slow in processing cluster state updates.

This new mechanism is integrated into the existing DiscoveryWithNetworkFailuresTests.

A new test called testAckedIndexing is added to verify retrieval of documents whose indexing was acked during various disruptions.

Closes #6505
2014-08-27 15:47:14 +02:00
Boaz Leskes 5d13571dbe [Discovery] when master is gone, flush all pending cluster states
If the master FD flags the master as gone while there are still pending cluster states, the processing of those cluster states will reinstate that node as master again.

Closes #6526
2014-08-27 15:47:13 +02:00
Boaz Leskes 8b85d97ea6 [Discovery] Improved logging when a join request is not executed because local node is no longer master 2014-08-27 15:47:09 +02:00
Boaz Leskes 7db9e98ee7 [Discovery] Change (Master|Nodes)FaultDetection's connect_on_network_disconnect default to false
The previous default was true, which means that after a node disconnected event we try to connect to it as an extra validation. This can result in slow detection of network partitions if the extra reconnect times out before failure.

Also added tests to verify the settings' behaviour
2014-08-27 15:47:05 +02:00
Boaz Leskes e39ac7eef4 [Test] testIsolateMasterAndVerifyClusterStateConsensus didn't wait on initializing shards before comparing cluster states 2014-08-27 15:46:51 +02:00
Martijn van Groningen f3d90cdb17 [TEST] Remove 'index.routing.allocation.total_shards_per_node' setting in data consistency test 2014-08-27 15:46:51 +02:00
Boaz Leskes 58f8774fa2 [Discovery] do not use versions to optimize cluster state copying for a first update from a new master
We have an optimization which compares routing/meta data version of cluster states and tries to reuse the current object if the versions are equal. This can cause rare failures during recovery from a minimum_master_node breach when using the "new light rejoin" mechanism and simulated network disconnects. This happens where the current master updates its state, doesn't manage to broadcast it to other nodes due to the disconnect and then steps down. The new master will start with a previous version and continue to update it. When the old master rejoins, the versions of its state can be equal but the content is different.

Also improved DiscoveryWithNetworkFailuresTests to simulate this failure (and other improvements)

Closes #6466
2014-08-27 15:46:50 +02:00
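The failure mode described in 58f8774fa2 can be illustrated with a minimal sketch. All class and method names below (`VersionedPart`, `SketchClusterState`, `RoutingReuse.choose`) are hypothetical and are not the Elasticsearch classes touched by the commit; the sketch only assumes that each cluster state records which master published it. It shows why equal version numbers alone are not enough to justify reusing the local object.

```java
// Hypothetical, simplified model; these are NOT the real Elasticsearch classes.
final class VersionedPart {
    final long version;
    final String content;

    VersionedPart(long version, String content) {
        this.version = version;
        this.content = content;
    }
}

final class SketchClusterState {
    final String masterId;       // assumed: id of the master that published this state
    final VersionedPart routing; // stands in for routing table / meta data

    SketchClusterState(String masterId, VersionedPart routing) {
        this.masterId = masterId;
        this.routing = routing;
    }
}

final class RoutingReuse {
    private RoutingReuse() {}

    // Reuse the local object only when the versions match AND the state comes
    // from the same master. After a master change, equal versions may hide
    // different content, so the incoming object is always taken.
    static VersionedPart choose(SketchClusterState current, SketchClusterState incoming) {
        boolean sameMaster = current.masterId != null
                && current.masterId.equals(incoming.masterId);
        if (sameMaster && current.routing.version == incoming.routing.version) {
            return current.routing;
        }
        return incoming.routing;
    }
}
```

With this check, a rejoining old master that holds routing version 5 of its own making takes the new master's version 5 instead of keeping its diverged copy.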
Martijn van Groningen 1849d0966c [Discovery] Made 'discovery.zen.rejoin_on_master_gone' setting updatable at runtime. 2014-08-27 15:46:46 +02:00
Martijn van Groningen 424a2f68c6 [Discovery] Removed METADATA block 2014-08-27 15:46:39 +02:00
Martijn van Groningen 4828e78637 [TEST] Added test that exposes a shard consistency problem when isolated node(s) rejoin the cluster after network segmentation and when the elected master node ended up on the lesser side of the network segmentation. 2014-08-27 15:46:39 +02:00
Martijn van Groningen e7d24ecdd0 [TEST] Make sure there are no initializing shards when network partition is simulated 2014-08-27 15:46:39 +02:00
Martijn van Groningen fc8ae4d30d [TEST] Added test that verifies data integrity during and after a simulated network split. 2014-08-27 15:46:39 +02:00
Martijn van Groningen 2c9ef63676 [TEST] It may take a little bit before the unlucky node deals with the fact the master left 2014-08-27 15:46:38 +02:00
Boaz Leskes d44bed5f48 [Internal] Do not execute cluster state changes if current node is no longer master
When a node steps down from being a master (because, for example, min_master_node is breached), it may still have
cluster state update tasks queued up. Most (but not all) are tasks that should no longer be executed as the node
no longer has authority to do so. Other cluster state updates, like electing the current node as master, should be
executed even if the current node is no longer master.

This commit makes sure that, by default, `ClusterStateUpdateTask` is not executed if the node is no longer master. Tasks
that should run on non masters are changed to implement a new interface called `ClusterStateNonMasterUpdateTask`.

Closes #6230
2014-08-27 15:46:38 +02:00
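The gating rule described in d44bed5f48 can be sketched roughly as follows. The types here (`UpdateTask`, `NonMasterUpdateTask`, `UpdateTaskRunner`) are hypothetical stand-ins that only mirror the names in the commit message; the real `ClusterStateUpdateTask` and `ClusterStateNonMasterUpdateTask` have different signatures, and the cluster state is reduced to a string for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cluster state update task.
interface UpdateTask {
    String execute(String currentState);
}

// Marker interface: tasks implementing it may run even when the local node is not master.
interface NonMasterUpdateTask extends UpdateTask {
}

final class UpdateTaskRunner {
    private String state = "v0";
    private boolean localNodeIsMaster;
    private final List<String> skipped = new ArrayList<>();

    UpdateTaskRunner(boolean localNodeIsMaster) {
        this.localNodeIsMaster = localNodeIsMaster;
    }

    void setLocalNodeIsMaster(boolean localNodeIsMaster) {
        this.localNodeIsMaster = localNodeIsMaster;
    }

    // Returns true if the task ran. By default a task is skipped when the
    // local node is not master; only marked tasks are exempt.
    boolean run(String source, UpdateTask task) {
        if (!localNodeIsMaster && !(task instanceof NonMasterUpdateTask)) {
            skipped.add(source);
            return false;
        }
        state = task.execute(state);
        return true;
    }

    String state() {
        return state;
    }

    List<String> skipped() {
        return skipped;
    }
}
```

On a node that has stepped down, a plain task such as a reroute is skipped and recorded, while a task typed as `NonMasterUpdateTask` (for example, the one electing the local node as master) still executes.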