Commit Graph

4988 Commits

Author SHA1 Message Date
Britta Weber 3f0288fc59 fix typo in class name 2014-09-01 11:43:52 +02:00
Britta Weber c5ff70bf43 function_score: add optional weight parameter per function
Weights can be defined per function like this:

```
"function_score": {
    "functions": [
        {
            "filter": {},
            "FUNCTION": {},
            "weight": number
        }
        ...
```
If `weight` is given without `FUNCTION` then `weight` behaves like `boost_factor`.
This commit deprecates `boost_factor`.

The following is valid:

```
POST testidx/_search
{
  "query": {
    "function_score": {
      "weight": 2
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "FUNCTION": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {},
          "FUNCTION": {},
          "weight": 2
        },
        ...
      ]
    }
  }
}
```

The following is not valid:

```
POST testidx/_search
{
  "query": {
    "function_score": {
      "weight": 2,
      "FUNCTION(including boost_factor)": 2
    }
  }
}

POST testidx/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "weight": 2,
          "boost_factor": 2
        }
      ]
    }
  }
}
````

closes #6955
closes #7137
2014-09-01 11:04:40 +02:00
Britta Weber 9750375412 mappings: keep parameters in mapping for _timestamp, _index and _size even if disabled
Settings that are not default for _size, _index and _timestamp were only build in
toXContent if these fields were actually enabled.
_timestamp, _index and _size can be dynamically enabled or disabled.
Therfore the settings must be kept, even if the field is disabled.
(Dynamic enabling/disabling was intended, see TimestampFieldMapper.merge(..)
and SizeMappingTests#testThatDisablingWorksWhenMerging
but actually never worked, see below).

To avoid that _timestamp is overwritten by a default mapping
this commit also adds a check to mapping merging if the type is already
in the mapping. In this case the default is not applied anymore.
(see
SimpleTimestampTests#testThatUpdatingMappingShouldNotRemoveTimestampConfiguration)

As a side effect, this fixes
- overwriting of paramters from the _source field by default mappings
  (see DefaultSourceMappingTests).
- dynamic enabling and disabling of _timestamp and _size ()
  (see SimpleTimestampTests#testThatTimestampCanBeSwitchedOnAndOff and
  SizeMappingIntegrationTests#testThatTimestampCanBeSwitchedOnAndOff )

Tests:

Enable UpdateMappingOnClusterTests#test_doc_valuesInvalidMappingOnUpdate again
The missing settings in the mapping for _timestamp, _index and _size caused a the
failure: When creating a mapping which has settings other than default and the
field disabled, still empty field mappings were built from the type mappers.
When creating such a mapping, the mapping source on master and the rest of the cluster
can be out of sync for some time:

1. Master creates the index with source _timestamp:{_store:true}
   mapper classes are in a correct state but source is _timestamp:{}
2. Nodes update mapping and refresh source which then completely misses _timestamp
3. After a while source is refreshed again also on master and the _timestamp:{}
   vanishes there also.

The test UpdateMappingOnCusterTests#test_doc_valuesInvalidMappingOnUpdate failed
because the cluster state was sampled from master between 1. and 3. because the
randomized testing injected a default mapping with disabled _size and _timestamp
fields that have settings which are not default.

The test
TimestampMappingTests#testThatDisablingFieldMapperDoesNotReturnAnyUselessInfo
must be removed because it actualy expected the timestamp to remove
parameters when it was disabled.

closes #7137
2014-09-01 10:39:33 +02:00
Boaz Leskes 0e6bb1f28b [Rest] Add the cluster name to the "/" endpoint
The root endpoint returns basic information about this node, like it's name and ES version etc. The cluster name is an important information that belongs in that list.

Closes #7524
2014-09-01 10:05:11 +02:00
Areek Zillur 9df10a07b0 Improved Suggest Client API:
- Added SuggestBuilders (analogous to QueryBuilders)
 - supporting term, phrase, completion and fuzzyCompletion suggestion builders
- Added suggest(SuggestionBuilder) to SuggestRequest
   - previously only suggest(BytesReference) was supported

closes #7435
2014-08-31 21:55:03 -04:00
Boaz Leskes 7fb9e5e28e [Test] make testNoMasterActions more resilient 2014-08-30 18:34:20 +02:00
Martijn van Groningen 2ba4e35cde Aggregations: The nested aggregator should iterate over the child doc ids in ascending order.
The reverse_nested aggregator requires that the emitted doc ids are always in ascending order, which is already enforced on the scorer level,
but this also needs to be enforced on the nested aggrgetor level otherwise incorrect counts are a result.

Closes #7505
Closes #7514
2014-08-29 23:04:17 +02:00
Boaz Leskes d8a5ff0047 [Internal] introduce ClusterState.UNKNOWN_VERSION constant
Used as null value for cluster state versions.
2014-08-29 22:57:23 +02:00
Boaz Leskes 75795e44c1 [Tests] add different node name prefix for the different cluster type
During a test run we have a global shared cluster and potentially a suite level or even a test level cluster running. All of those share the same node name pattern (node_#). This can be confusing if you're debugging discovery related tests where those nodes from the different clusters potentially interact (and reject each other). This commit gives each cluster type a unique prefix to make tracing and log filtering simpler.

Closes #7518
2014-08-29 21:33:54 +02:00
Simon Willnauer 4473cdc503 [TEST] Remove unused plugin isolation leftover 2014-08-29 21:29:48 +02:00
Simon Willnauer 0d07917e99 [TEST] Stabelize SimpleRecoveryLocalGatewayTests#testReusePeerRecovery 2014-08-29 21:29:01 +02:00
Lee Hinman 1e21f27874 [TEST] fix off-by-one error in BigArrays tests
Comparisons for the BigArrays breaker use "greater than" instead of
"greater than or equal", which was never an issue before because the
test size was not right on a page boundary. A test with an exactly
divisible page boundary (4mb exactly in this case) caused the sizes to
be equal to, but not exceed, the limit, and never break.

The limit should be smaller than the test increments the breaker anyway.
2014-08-29 17:17:03 +02:00
Boaz Leskes ed5b2e0e35 Add an assertion to ZenDiscovery checking that local node is never elected if pings indicate an active master 2014-08-29 17:07:24 +02:00
Boaz Leskes 680fb36637 [Discovery] Add try/catch around repetitive onSuccess calls 2014-08-29 17:03:08 +02:00
Adrien Grand 172a40c55e Docs: Add javadocs to the client-side aggregation APIs. 2014-08-29 16:36:43 +02:00
markharwood 536d3ffed0 Highlighter Javadocs 2014-08-29 16:26:41 +02:00
Martijn van Groningen f416ed4949 Docs: added missing jdocs for the percolate client classes.
Also made constructors were possible package protected
and removed some useless getters in percolator source builder.
2014-08-29 16:26:41 +02:00
Simon Willnauer c10ef110ae [DOCS] Added JavaDocs for ClusterAdminClient, IndicesAdminClient and Warmer API 2014-08-29 16:26:41 +02:00
markharwood 1687c5ad51 Completion suggestion javadocs 2014-08-29 16:26:41 +02:00
Simon Willnauer 1bb0677df7 [CORE] Don't update indexShard if it has been removed before
Today we have logic that removes a shard from the indexservice if
the shard has changed ie. from replica to primary or if it's recovery
source vanished etc. This can cause shards from been not allocated at
all on a nodes causeing delete requests to timeout since we were waiting
for shards on nodes that got dropped due to a IndexShardMissingException

Closes #7509
2014-08-29 15:16:22 +02:00
markharwood c0aef4adc4 Suggest API - bugs with encoding multiple levels of geo precision.
1) One issue reported by a user is due to the truncation of the geohash string. Added Junit test for this scenario
2) Another suspect piece of code was the “toAutomaton” method that only merged the first of possibly many precisions into the result.

Closes #7368
2014-08-29 13:41:35 +01:00
Simon Willnauer 88aec9e3c0 [TEST] Fix per-segment / per-commit exclude logic in CorruptFileTest 2014-08-29 11:43:52 +02:00
Lee Hinman b2827a09a9 [TEST] add AwaitsFix for testTranslogChecksums since it may cause OOME
if the size is corrupted
2014-08-29 10:11:50 +02:00
Boaz Leskes d15909716b [Internal] moved ZenDiscovery setting to use string constants 2014-08-29 09:46:28 +02:00
Michael Brackx 0fd3ef6df0 Client: Make the query builder nullable in filteredQuery.
Close #7398
2014-08-29 09:40:38 +02:00
Simon Willnauer d7a068d02c [TEST] Exclude per commit files rather than only segments_N
When we corrupt a file in the snapshot/restore case we have to corrupt
a per-segment file. The .del file might change with the commit / flush
that is triggered by the snapshot operation.
2014-08-29 09:22:03 +02:00
Boaz Leskes 183ca37dfa Code style improvement 2014-08-29 09:01:05 +02:00
Martijn van Groningen c55341bf51 Core: Remove the warmer listener when the FixedBitSetFilterCache gets closed. 2014-08-28 20:58:34 +02:00
Martijn van Groningen 4c690fae47 Scan: Use ConcurrentHashMap instead of HashMap, because the readerStates is accessed by multiple threads during the entire scroll session.
Closes #7499
Closes #7478
2014-08-28 16:36:17 +02:00
Philip Wills a3c4137079 Aggregations: Encapsulate AggregationBuilder name and make getter public
Close #7425
2014-08-28 16:34:41 +02:00
Brian Murphy c165e640fc Indexed Scripts/Templates : Change the default auto_expand to 0-all
This commit changes the auto_expand_replicas setting for the ````.scripts```` index to
0-all from 1-all.
2014-08-28 15:31:44 +01:00
Brian Murphy f44bb502ee Indexed Scripts/Templates : Fix .script index template.
This commit makes the default number of shards for the .scripts index to ````1````, it also
forces the auto_expand replicas to ````1-all````. This change means that script index GET requests to load
scripts from the index should always use the local copy of the scripts index, preventing any network traffic or calls
on script GET.
2014-08-28 14:54:24 +01:00
javanna 88839ec546 [TEST] apply default settings by calling super.nodeSettings method when providing test specific methods 2014-08-28 15:35:35 +02:00
javanna a0e9532dca [TEST] make default settings don't override test specific settings 2014-08-28 15:35:34 +02:00
javanna 645db6867b [TEST] apply default settings before test specific ones to external nodes in bw comp tests, otherwise the defaults win all the time 2014-08-28 15:35:34 +02:00
Lee Hinman 09816fdf57 Validate create index requests' number of primary/replica shards
Fixes #7495
2014-08-28 14:20:32 +02:00
Simon Willnauer cc37ae13bc [CORE] Make network interface iteration order consistent
Today the iteration order of the interfaces might change across JVMs
this commit cleans up the NetworkUtils class and attempts to ensure
consistent iteration order across JVMs.
2014-08-28 12:35:56 +02:00
Simon Willnauer c93e6e3f67 [TEST] Fix RandomScoreFunctionTests#testConsistentHitsWithSameSeed 2014-08-28 12:31:47 +02:00
Boaz Leskes c6090e5d9b [Tests] add a debug logging message when starting an external node 2014-08-28 12:13:05 +02:00
Martijn van Groningen 6de18262dd Test: Increase the ping timeout to avoid that a candidate master node makes the decision to elect itself too soon. 2014-08-28 11:49:30 +02:00
Simon Willnauer 1d960d08f7 [TEST] only expand to 1 replica in SnapshotBackwardsCompatibilityTest 2014-08-28 11:20:33 +02:00
Simon Willnauer d062b2b0a4 [TEST] use a dedicated port range per test JVM
For reliability and debug purposes each test JVM should use it's own
TCP port range if executed in parallel. This also moves away from the
default port range to prevent conflicts with running ES instance on the local
machine.
2014-08-28 09:18:39 +02:00
Ryan Ernst eb22d9ec24 FunctionScore: Fixed RandomScoreFunction to guard against _uid field not existing.
Also added a test case to check the random score works with queries on
an empty index.
2014-08-27 17:01:01 -07:00
Simon Willnauer 59da079bae [SNAPSHOT] Ensure BWC layer can read chunked blobs 2014-08-27 21:33:40 +02:00
Martijn van Groningen 94eed4ef56 Introduced FixedBitSetFilterCache that guarantees to produce a FixedBitSet and does evict based on size or time.
Only when segments are merged away due to merging then entries in this cache are cleaned up.

Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this.
Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache.

Closes #7037
Closes #7031
2014-08-27 21:28:36 +02:00
Boaz Leskes 852a1103f3 [Internal] user node's cluster name as a default for an incoming cluster state who misses it
ClusterState has a reference to the cluster name since version 1.1.0 (df7474b9fc) . However, if the state was  sent from a master of an older version, this name can be set to null. This is an unexpected and can cause bugs. The bad part is that it will never correct it self until a full cluster restart where the cluster state is rebuilt using the code of the latest version.

This commit changes the default to the node's cluster name.

Relates to #7386

Closes #7414
2014-08-27 20:24:27 +02:00
Boaz Leskes 55e9f169c3 [Tests] change BasicBackwardsCompatibilityTest to be compatible with 1.0.3
Also increase the time we wait for an external node to join
Sadly tests are not yet stable enough, testing with 1.0.3 is still disabled
2014-08-27 20:14:45 +02:00
Ryan Ernst 65afa1d93b FunctionScore: Refactor RandomScoreFunction to be consistent, and return values in rang [0.0, 1.0]
RandomScoreFunction previously relied on the order the documents were
iterated in from Lucene. This caused changes in ordering, with the same
seed, if documents moved to different segments. With this change, a
murmur32 hash of the _uid for each document is used as the "random"
value. Also, the hash is adjusted so as to only return values between
0.0 and 1.0 to enable easier manipulation to fit into users' scoring
models.

closes #6907, #7446
2014-08-27 08:37:25 -07:00
Alexander Reelsen 3aa72f2738 Test: Allow global test cluster to have configurable settings source
This allows to reuse the global test cluster with specific configurations,
which is useful in plugins.
2014-08-27 17:04:14 +02:00
Boaz Leskes d5552a980f [Discovery] UnicastZenPing should also ping last known discoNodes
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
2014-08-27 15:47:42 +02:00
Boaz Leskes ff8b7409f7 [Discovery] add a debug log if a node responds to a publish request after publishing timed out. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 5932371f21 [TEST] Adapt testNoMasterActions since metadata isn't cleared if there is a no master block 2014-08-27 15:47:41 +02:00
Martijn van Groningen c8919e4bf5 [TEST] Changed action names. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 702890e461 [TEST] Remove the forceful `network.mode` setting in DiscoveryWithServiceDisruptions#testMasterNodeGCs now local transport use worker threads. 2014-08-27 15:47:41 +02:00
Boaz Leskes 26d90882e5 [Transport] Introduced worker threads to prevent alien threads of entering a node.
Requests are handled by the worked thread pool of the target node instead of the generic thread pool of the source node.
Also this change is required in order to make GC disruption work with local transport. Previously the handling of the a request was performed on on a node that that was being GC disrupted, resulting in some actions being performed while GC was being simulated.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 966a55d21c Typo: s/Recieved/Received 2014-08-27 15:47:40 +02:00
Martijn van Groningen 47326adb67 [TEST] Make sure all shards are allocated before killing a random data node. 2014-08-27 15:47:40 +02:00
Martijn van Groningen 403ebc9e07 [Discovery] Added cluster version and master node to the nodes fault detecting ping request
The cluster state version allows resolving the case where a old master node become unresponsive and later wakes up and pings all the nodes in the cluster, allowing the newly elected master to decide whether it should step down or ask the old master to rejoin.
2014-08-27 15:47:40 +02:00
Boaz Leskes 50f852ffeb [TEST] Added LongGCDisruption and a test simulating GC on master nodes
Also rename DiscoveryWithNetworkFailuresTests to DiscoveryWithServiceDisruptions which better suites what we do.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 4b8456e954 [Discovery] Master fault detection and nodes fault detection should take cluster name into account.
Both master fault detection and nodes fault detection request should also send the cluster name, so that on the receiving side the handling of these requests can be failed with an error. This error can be caught on the sending side and for master fault detection the node can fail the master locally and for nodes fault detection the node can be failed.

Note this validation will most likely never fail in a production cluster, but in during automated tests where cluster / nodes are created and destroyed very frequently.
2014-08-27 15:47:39 +02:00
Martijn van Groningen 364374dd03 [TEST] Added test that verifies that no shard relocations happen during / after a master re-election. 2014-08-27 15:47:39 +02:00
Martijn van Groningen 130e680cfb [Discovery] Made the handeling of the join request batch oriented.
In large clusters when a new elected master is chosen, there are many join requests to handle. By batching them up the the cluster state doesn't get published for each individual join request, but many handled at the same time, which results into a single new cluster state which ends up be published.

Closes #6984
2014-08-27 15:47:39 +02:00
Shay Banon 0244ddb0cd retry logic to unwrap exception to check for illegal state
it probably comes wrapped in a remote exception, which we should unwrap in order to detect it..., also, simplified a bit the retry logic
2014-08-27 15:47:39 +02:00
Boaz Leskes cccd060a0c [Discovery] verify we have a master after a successful join request
After master election, nodes send join requests to the elected master. Master is then responsible for publishing a new cluster state which sets the master on the local node's cluster state. If something goes wrong with the cluster state publishing, this process will not successfully complete. We should check it after the join request returns and if it failed, retry pinging.

Closes #6969
2014-08-27 15:47:38 +02:00
Boaz Leskes ffcf1077d8 [Discovery] join master after first election
Currently, pinging results are only used if the local node is elected master or if they detect another *already* active master. This has the effect that master election requires two pinging rounds - one for the elected master to take is role and another for the other nodes to detect it and join the cluster. We can be smarter and use the election of the first round on other nodes as well. Those nodes can try to join the elected master immediately. There is a catch though - the elected master node may still be processing the election and may reject the join request if not ready yet. To compensate a retry mechanism is introduced to try again (up to 3 times by default) if this happens.

Closes #6943
2014-08-27 15:47:38 +02:00
Boaz Leskes a40984887b [Tests] Fixed some issues with SlowClusterStateProcessing
Reduced expected time to heal to 0 (we interrupt and wait on stop disruption). It was also  wrongly indicated in seconds.
We didn't properly wait between slow cluster state tasks
2014-08-27 15:47:38 +02:00
Martijn van Groningen c2142c0f6d Discovery: Don't include local node to pingMasters list. We might end up electing ourselves without any form of verification. 2014-08-27 15:47:38 +02:00
Martijn van Groningen 5e38e9eb4f Discovery: Only add local node to possibleMasterNodes if it is a master node. 2014-08-27 15:47:37 +02:00
Martijn van Groningen 67685cb026 Discovery: If not enough possible masters are found, but there are masters to ping (ping responses did include master node) then these nodes should be resolved.
After the findMaster() call we try to connect to the node and if it isn't the master we start looking for a new master via pinging again.

Closes #6904
2014-08-27 15:47:37 +02:00
Boaz Leskes f029a24d53 [Store] migrate non-allocated shard deletion to use ClusterStateNonMasterUpdateTask 2014-08-27 15:47:37 +02:00
Boaz Leskes bebaf9799c [Tests] stability improvements
added explicit cleaning of temp unicast ping results
reduce gateway local.list_timeout to 10s.
testVerifyApiBlocksDuringPartition: verify master node has stepped down before restoring partition
2014-08-27 15:47:30 +02:00
Boaz Leskes ea2783787c [Tests] Introduced ClusterDiscoveryConfiguration
Closes #6890
2014-08-27 15:47:23 +02:00
Boaz Leskes ccabb4aa20 Remove unneeded reference to DiscoveryService which potentially causes circular references 2014-08-27 15:47:23 +02:00
Boaz Leskes 7fa3d7081b [logging] don't log an error if scheduled reroute is rejected because local node is no longer master
Since it runs in a background thread after a node is added, or submits a cluster state update when a node leaves, it may be that by the time it is executed the local node is no longer master.
2014-08-27 15:47:23 +02:00
Boaz Leskes e0543b3426 [Internal] Migrate new initial state cluster update task to a ClusterStateNonMasterUpdateTask 2014-08-27 15:47:23 +02:00
Boaz Leskes c12d0901f6 [Tests] Increase timeout when waiting for partitions to heal
the current 30s addition is tricky because we use 30s as timeout in many places...
2014-08-27 15:47:22 +02:00
Boaz Leskes 7b6e194923 [Tests] Don't log about restoring a partition if the partition is not active. 2014-08-27 15:47:22 +02:00
Boaz Leskes 522d4afe0c [Tests] Use local gateway
This is important to for proper primary allocation decisions
2014-08-27 15:47:22 +02:00
Boaz Leskes 3586e38c40 [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 5302a53145 [Discovery] immediately start Master|Node fault detection pinging
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 48c7da1fd4 [Test] testVerifyApiBlocksDuringPartition - wait for stable cluster after partition 2014-08-27 15:47:21 +02:00
Martijn van Groningen d99ca806cb [TEST] Properly clear the disruption schemes after test completed. 2014-08-27 15:47:21 +02:00
Boaz Leskes e897dccb52 [Tests] improved automatic disruption healing after tests 2014-08-27 15:47:21 +02:00
Boaz Leskes 5e5f8a9daf Added java docs to all tests in DiscoveryWithNetworkFailuresTests
Moved testVerifyApiBlocksDuringPartition to test blocks rather then rely on specific API rejections.
Did some cleaning while at it.
2014-08-27 15:47:21 +02:00
Martijn van Groningen 77dae631e1 [TEST] Make sure get request is always local 2014-08-27 15:47:20 +02:00
Martijn van Groningen 52f69c64f7 [TEST] Verify no master block during partition for read and write apis 2014-08-27 15:47:20 +02:00
Martijn van Groningen 98084c02ce [TEST] Added test to verify if 'discovery.zen.rejoin_on_master_gone' is updatable at runtime. 2014-08-27 15:47:20 +02:00
Boaz Leskes c3e84eb639 Fixed compilation issue caused by the lack of a thread pool name 2014-08-27 15:47:20 +02:00
Boaz Leskes 1af82fd96a [Tests] Disabling testAckedIndexing
The test is currently unstable and needs some more work
2014-08-27 15:47:20 +02:00
Boaz Leskes a7a61a0392 [Test] ensureStableCluster failed to pass viaNode parameter correctly
Also improved timeouts & logs
2014-08-27 15:47:19 +02:00
Martijn van Groningen f7b962a417 [TEST] Renamed afterDistribution timeout to expectedTimeToHeal
Accumulate expected shard failures to log later
2014-08-27 15:47:19 +02:00
Martijn van Groningen 785d0e55ab [TEST] Reduced failures in DiscoveryWithNetworkFailuresTests#testAckedIndexing test:
* waiting time should be long enough depending on the type of the disruption scheme
* MockTransportService#addUnresponsiveRule if remaining delay is smaller than 0 don't double execute transport logic
2014-08-27 15:47:19 +02:00
Martijn van Groningen 8aed9ee46f [TEST] Check if worker if null to prevent NPE on double stopping 2014-08-27 15:47:19 +02:00
Boaz Leskes 28489cee45 [Tests] Added ServiceDisruptionScheme(s) and testAckedIndexing
This commit adds the notion of ServiceDisruptionScheme allowing for introducing disruptions in our test cluster. This
abstraction as used in a couple of wrappers around the functionality offered by MockTransportService to simulate various
network partions. There is also one implementation for causing a node to be slow in processing cluster state updates.

This new mechnaism is integrated into existing tests DiscoveryWithNetworkFailuresTests.

A new test called testAckedIndexing is added to verify retrieval of documents whose indexing was acked during various disruptions.

Closes #6505
2014-08-27 15:47:14 +02:00
Boaz Leskes 5d13571dbe [Discovery] when master is gone, flush all pending cluster states
If the master FD flags master as gone while there are still pending cluster states, the processing of those cluster states we re-instate that node a master again.

Closes #6526
2014-08-27 15:47:13 +02:00
Boaz Leskes 8b85d97ea6 [Discovery] Improved logging when a join request is not executed because local node is no longer master 2014-08-27 15:47:09 +02:00
Boaz Leskes 7db9e98ee7 [Discovery] Change (Master|Nodes)FaultDetection's connect_on_network_disconnect default to false
The previous default was true, which means that after a node disconnected event we try to connect to it as an extra validation. This can result in slow detection of network partitions if the extra reconnect times out before failure.

Also added tests to verify the settings' behaviour
2014-08-27 15:47:05 +02:00
Boaz Leskes e39ac7eef4 [Test] testIsolateMasterAndVerifyClusterStateConsensus didn't wait on initializing shards before comparing cluster states 2014-08-27 15:46:51 +02:00
Martijn van Groningen f3d90cdb17 [TEST] Remove 'index.routing.allocation.total_shards_per_node' setting in data consistency test 2014-08-27 15:46:51 +02:00
Boaz Leskes 58f8774fa2 [Discovery] do not use versions to optimize cluster state copying for a first update from a new master
We have an optimization which compares routing/meta data version of cluster states and tries to reuse the current object if the versions are equal. This can cause rare failures during recovery from a minimum_master_node breach when using the "new light rejoin" mechanism and simulated network disconnects. This happens where the current master updates it's state, doesn't manage to broadcast it to other nodes due to the disconnect and then steps down. The new master will start with a previous version and continue to update it. When the old master rejoins, the versions of it's state can equal but the content is different.

Also improved DiscoveryWithNetworkFailuresTests to simulate this failure (and other improvements)

Closes #6466
2014-08-27 15:46:50 +02:00