Commit Graph

4988 Commits

Author SHA1 Message Date
Boaz Leskes ff8b7409f7 [Discovery] add a debug log if a node responds to a publish request after publishing timed out. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 5932371f21 [TEST] Adapt testNoMasterActions since metadata isn't cleared if there is a no master block 2014-08-27 15:47:41 +02:00
Martijn van Groningen c8919e4bf5 [TEST] Changed action names. 2014-08-27 15:47:41 +02:00
Martijn van Groningen 702890e461 [TEST] Remove the forceful `network.mode` setting in DiscoveryWithServiceDisruptions#testMasterNodeGCs now that local transport uses worker threads. 2014-08-27 15:47:41 +02:00
Boaz Leskes 26d90882e5 [Transport] Introduced worker threads to prevent alien threads from entering a node.
Requests are handled by the worker thread pool of the target node instead of the generic thread pool of the source node.
Also this change is required in order to make GC disruption work with local transport. Previously the handling of a request was performed on a node that was being GC disrupted, resulting in some actions being performed while GC was being simulated.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 966a55d21c Typo: s/Recieved/Received 2014-08-27 15:47:40 +02:00
Martijn van Groningen 47326adb67 [TEST] Make sure all shards are allocated before killing a random data node. 2014-08-27 15:47:40 +02:00
Martijn van Groningen 403ebc9e07 [Discovery] Added cluster version and master node to the nodes fault detecting ping request
The cluster state version allows resolving the case where an old master node becomes unresponsive and later wakes up and pings all the nodes in the cluster, allowing the newly elected master to decide whether it should step down or ask the old master to rejoin.
2014-08-27 15:47:40 +02:00
Boaz Leskes 50f852ffeb [TEST] Added LongGCDisruption and a test simulating GC on master nodes
Also rename DiscoveryWithNetworkFailuresTests to DiscoveryWithServiceDisruptions which better suits what we do.
2014-08-27 15:47:40 +02:00
Martijn van Groningen 4b8456e954 [Discovery] Master fault detection and nodes fault detection should take cluster name into account.
Both master fault detection and nodes fault detection request should also send the cluster name, so that on the receiving side the handling of these requests can be failed with an error. This error can be caught on the sending side and for master fault detection the node can fail the master locally and for nodes fault detection the node can be failed.

Note this validation will most likely never fail in a production cluster, but it may during automated tests where clusters / nodes are created and destroyed very frequently.
2014-08-27 15:47:39 +02:00
Martijn van Groningen 364374dd03 [TEST] Added test that verifies that no shard relocations happen during / after a master re-election. 2014-08-27 15:47:39 +02:00
Martijn van Groningen 130e680cfb [Discovery] Made the handling of the join request batch oriented.
In large clusters when a new master is elected, there are many join requests to handle. By batching them up, the cluster state doesn't get published for each individual join request; instead many are handled at the same time, which results in a single new cluster state being published.

Closes #6984
2014-08-27 15:47:39 +02:00
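
The batching idea described in this commit could be sketched roughly as follows. This is a minimal illustration only, not the code from the commit: `JoinBatcher`, `submitJoin`, and `processPending` are hypothetical names, and node ids are plain strings. Pending joins are drained in one pass, and a single new cluster state is counted per non-empty batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: collect join requests and publish one state per batch.
final class JoinBatcher {
    private final BlockingQueue<String> pendingJoins = new LinkedBlockingQueue<>();
    private int publishedStates = 0;

    // Called once per incoming join request; does not publish anything itself.
    void submitJoin(String nodeId) {
        pendingJoins.add(nodeId);
    }

    // Drains everything queued so far and "publishes" a single cluster state
    // for the whole batch (represented here by incrementing a counter).
    List<String> processPending() {
        List<String> batch = new ArrayList<>();
        pendingJoins.drainTo(batch);
        if (!batch.isEmpty()) {
            publishedStates++;
        }
        return batch;
    }

    int publishedStates() {
        return publishedStates;
    }

    public static void main(String[] args) {
        JoinBatcher batcher = new JoinBatcher();
        batcher.submitJoin("node-1");
        batcher.submitJoin("node-2");
        batcher.submitJoin("node-3");
        System.out.println(batcher.processPending() + " handled, states published: "
                + batcher.publishedStates());
    }
}
```

With three joins queued, one call to `processPending` handles all of them and counts one published state, rather than three.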
Shay Banon 0244ddb0cd retry logic to unwrap exception to check for illegal state
it probably comes wrapped in a remote exception, which we should unwrap in order to detect it..., also, simplified a bit the retry logic
2014-08-27 15:47:39 +02:00
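
Unwrapping a wrapped exception to look for an illegal state, as this commit describes, might look like the sketch below. `ExceptionUnwrapper` and `findCause` are hypothetical names, and the depth limit of 10 is an assumption added to guard against cyclic cause chains; the commit itself does not show how the unwrapping is done.

```java
// Hypothetical sketch: walk the cause chain looking for a given exception type.
final class ExceptionUnwrapper {
    private static final int MAX_DEPTH = 10; // assumed bound, guards against cause cycles

    // Returns the first throwable in the cause chain that is an instance of
    // 'type', or null if none is found within MAX_DEPTH levels.
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        Throwable current = t;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (type.isInstance(current)) {
                return type.cast(current);
            }
            current = current.getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable remote = new RuntimeException("remote failure",
                new IllegalStateException("not ready"));
        System.out.println(findCause(remote, IllegalStateException.class) != null);
    }
}
```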
Boaz Leskes cccd060a0c [Discovery] verify we have a master after a successful join request
After master election, nodes send join requests to the elected master. Master is then responsible for publishing a new cluster state which sets the master on the local node's cluster state. If something goes wrong with the cluster state publishing, this process will not successfully complete. We should check it after the join request returns and if it failed, retry pinging.

Closes #6969
2014-08-27 15:47:38 +02:00
Boaz Leskes ffcf1077d8 [Discovery] join master after first election
Currently, pinging results are only used if the local node is elected master or if they detect another *already* active master. This has the effect that master election requires two pinging rounds - one for the elected master to take its role and another for the other nodes to detect it and join the cluster. We can be smarter and use the election of the first round on other nodes as well. Those nodes can try to join the elected master immediately. There is a catch though - the elected master node may still be processing the election and may reject the join request if not ready yet. To compensate, a retry mechanism is introduced to try again (up to 3 times by default) if this happens.

Closes #6943
2014-08-27 15:47:38 +02:00
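
A bounded retry of the join attempt, in the spirit of the "up to 3 times by default" behaviour mentioned above, could be sketched like this. `JoinRetry` and `joinWithRetry` are hypothetical names, the join attempt is modelled as a `BooleanSupplier`, and any waiting between attempts is deliberately left out; none of this is taken from the commit's code.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry a join attempt a bounded number of times.
final class JoinRetry {
    // Runs 'attempt' until it succeeds or 'maxAttempts' is exhausted.
    // Returns the 1-based attempt number that succeeded, or -1 if all failed.
    static int joinWithRetry(BooleanSupplier attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated master that rejects the first request and accepts the second.
        int used = joinWithRetry(() -> ++calls[0] >= 2, 3);
        System.out.println("joined after attempt " + used);
    }
}
```

If every attempt is rejected, the caller gets `-1` and can fall back to another pinging round.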
Boaz Leskes a40984887b [Tests] Fixed some issues with SlowClusterStateProcessing
Reduced expected time to heal to 0 (we interrupt and wait on stop disruption). It was also wrongly indicated in seconds.
We didn't properly wait between slow cluster state tasks
2014-08-27 15:47:38 +02:00
Martijn van Groningen c2142c0f6d Discovery: Don't include local node to pingMasters list. We might end up electing ourselves without any form of verification. 2014-08-27 15:47:38 +02:00
Martijn van Groningen 5e38e9eb4f Discovery: Only add local node to possibleMasterNodes if it is a master node. 2014-08-27 15:47:37 +02:00
Martijn van Groningen 67685cb026 Discovery: If not enough possible masters are found, but there are masters to ping (ping responses did include master node) then these nodes should be resolved.
After the findMaster() call we try to connect to the node and if it isn't the master we start looking for a new master via pinging again.

Closes #6904
2014-08-27 15:47:37 +02:00
Boaz Leskes f029a24d53 [Store] migrate non-allocated shard deletion to use ClusterStateNonMasterUpdateTask 2014-08-27 15:47:37 +02:00
Boaz Leskes bebaf9799c [Tests] stability improvements
added explicit cleaning of temp unicast ping results
reduce gateway local.list_timeout to 10s.
testVerifyApiBlocksDuringPartition: verify master node has stepped down before restoring partition
2014-08-27 15:47:30 +02:00
Boaz Leskes ea2783787c [Tests] Introduced ClusterDiscoveryConfiguration
Closes #6890
2014-08-27 15:47:23 +02:00
Boaz Leskes ccabb4aa20 Remove unneeded reference to DiscoveryService which potentially causes circular references 2014-08-27 15:47:23 +02:00
Boaz Leskes 7fa3d7081b [logging] don't log an error if scheduled reroute is rejected because local node is no longer master
Since it runs in a background thread after a node is added, or submits a cluster state update when a node leaves, it may be that by the time it is executed the local node is no longer master.
2014-08-27 15:47:23 +02:00
Boaz Leskes e0543b3426 [Internal] Migrate new initial state cluster update task to a ClusterStateNonMasterUpdateTask 2014-08-27 15:47:23 +02:00
Boaz Leskes c12d0901f6 [Tests] Increase timeout when waiting for partitions to heal
the current 30s addition is tricky because we use 30s as timeout in many places...
2014-08-27 15:47:22 +02:00
Boaz Leskes 7b6e194923 [Tests] Don't log about restoring a partition if the partition is not active. 2014-08-27 15:47:22 +02:00
Boaz Leskes 522d4afe0c [Tests] Use local gateway
This is important for proper primary allocation decisions
2014-08-27 15:47:22 +02:00
Boaz Leskes 3586e38c40 [Discovery] Start master fault detection after pingInterval
This is to allow the master election to complete on the chosen master.

 Relates to #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 5302a53145 [Discovery] immediately start Master|Node fault detection pinging
After a node joins the cluster, it starts pinging the master to verify its health. Before, the cluster join request was processed async and we had to give some time to complete. With #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. A similar change can be made to the Node fault detection from the master side.

Closes #6706
2014-08-27 15:47:22 +02:00
Boaz Leskes 48c7da1fd4 [Test] testVerifyApiBlocksDuringPartition - wait for stable cluster after partition 2014-08-27 15:47:21 +02:00
Martijn van Groningen d99ca806cb [TEST] Properly clear the disruption schemes after test completed. 2014-08-27 15:47:21 +02:00
Boaz Leskes e897dccb52 [Tests] improved automatic disruption healing after tests 2014-08-27 15:47:21 +02:00
Boaz Leskes 5e5f8a9daf Added java docs to all tests in DiscoveryWithNetworkFailuresTests
Moved testVerifyApiBlocksDuringPartition to test blocks rather than rely on specific API rejections.
Did some cleaning while at it.
2014-08-27 15:47:21 +02:00
Martijn van Groningen 77dae631e1 [TEST] Make sure get request is always local 2014-08-27 15:47:20 +02:00
Martijn van Groningen 52f69c64f7 [TEST] Verify no master block during partition for read and write apis 2014-08-27 15:47:20 +02:00
Martijn van Groningen 98084c02ce [TEST] Added test to verify if 'discovery.zen.rejoin_on_master_gone' is updatable at runtime. 2014-08-27 15:47:20 +02:00
Boaz Leskes c3e84eb639 Fixed compilation issue caused by the lack of a thread pool name 2014-08-27 15:47:20 +02:00
Boaz Leskes 1af82fd96a [Tests] Disabling testAckedIndexing
The test is currently unstable and needs some more work
2014-08-27 15:47:20 +02:00
Boaz Leskes a7a61a0392 [Test] ensureStableCluster failed to pass viaNode parameter correctly
Also improved timeouts & logs
2014-08-27 15:47:19 +02:00
Martijn van Groningen f7b962a417 [TEST] Renamed afterDistribution timeout to expectedTimeToHeal
Accumulate expected shard failures to log later
2014-08-27 15:47:19 +02:00
Martijn van Groningen 785d0e55ab [TEST] Reduced failures in DiscoveryWithNetworkFailuresTests#testAckedIndexing test:
* waiting time should be long enough depending on the type of the disruption scheme
* MockTransportService#addUnresponsiveRule if remaining delay is smaller than 0 don't double execute transport logic
2014-08-27 15:47:19 +02:00
Martijn van Groningen 8aed9ee46f [TEST] Check if worker is null to prevent NPE on double stopping 2014-08-27 15:47:19 +02:00
Boaz Leskes 28489cee45 [Tests] Added ServiceDisruptionScheme(s) and testAckedIndexing
This commit adds the notion of ServiceDisruptionScheme allowing for introducing disruptions in our test cluster. This
abstraction is used in a couple of wrappers around the functionality offered by MockTransportService to simulate various
network partitions. There is also one implementation for causing a node to be slow in processing cluster state updates.

This new mechanism is integrated into existing tests DiscoveryWithNetworkFailuresTests.

A new test called testAckedIndexing is added to verify retrieval of documents whose indexing was acked during various disruptions.

Closes #6505
2014-08-27 15:47:14 +02:00
Boaz Leskes 5d13571dbe [Discovery] when master is gone, flush all pending cluster states
If the master FD flags master as gone while there are still pending cluster states, the processing of those cluster states will re-instate that node as master again.

Closes #6526
2014-08-27 15:47:13 +02:00
Boaz Leskes 8b85d97ea6 [Discovery] Improved logging when a join request is not executed because local node is no longer master 2014-08-27 15:47:09 +02:00
Boaz Leskes 7db9e98ee7 [Discovery] Change (Master|Nodes)FaultDetection's connect_on_network_disconnect default to false
The previous default was true, which means that after a node disconnected event we try to connect to it as an extra validation. This can result in slow detection of network partitions if the extra reconnect times out before failure.

Also added tests to verify the settings' behaviour
2014-08-27 15:47:05 +02:00
Boaz Leskes e39ac7eef4 [Test] testIsolateMasterAndVerifyClusterStateConsensus didn't wait on initializing shards before comparing cluster states 2014-08-27 15:46:51 +02:00
Martijn van Groningen f3d90cdb17 [TEST] Remove 'index.routing.allocation.total_shards_per_node' setting in data consistency test 2014-08-27 15:46:51 +02:00
Boaz Leskes 58f8774fa2 [Discovery] do not use versions to optimize cluster state copying for a first update from a new master
We have an optimization which compares routing/meta data version of cluster states and tries to reuse the current object if the versions are equal. This can cause rare failures during recovery from a minimum_master_node breach when using the "new light rejoin" mechanism and simulated network disconnects. This happens where the current master updates its state, doesn't manage to broadcast it to other nodes due to the disconnect and then steps down. The new master will start with a previous version and continue to update it. When the old master rejoins, the versions of its state can be equal but the content is different.

Also improved DiscoveryWithNetworkFailuresTests to simulate this failure (and other improvements)

Closes #6466
2014-08-27 15:46:50 +02:00
Martijn van Groningen 1849d0966c [Discovery] Made 'discovery.zen.rejoin_on_master_gone' setting updatable at runtime. 2014-08-27 15:46:46 +02:00
Martijn van Groningen 424a2f68c6 [Discovery] Removed METADATA block 2014-08-27 15:46:39 +02:00
Martijn van Groningen 4828e78637 [TEST] Added test that exposes a shard consistency problem when isolated node(s) rejoin the cluster after network segmentation and when the elected master node ended up on the lesser side of the network segmentation. 2014-08-27 15:46:39 +02:00
Martijn van Groningen e7d24ecdd0 [TEST] Make sure there are no initializing shards when network partition is simulated 2014-08-27 15:46:39 +02:00
Martijn van Groningen fc8ae4d30d [TEST] Added test that verifies data integrity during and after a simulated network split. 2014-08-27 15:46:39 +02:00
Martijn van Groningen 2c9ef63676 [TEST] It may take a little bit before the unlucky node deals with the fact the master left 2014-08-27 15:46:38 +02:00
Boaz Leskes d44bed5f48 [Internal] Do not execute cluster state changes if current node is no longer master
When a node steps down from being a master (because, for example, min_master_node is breached), it may still have
cluster state update tasks queued up. Most (but not all) are tasks that should no longer be executed as the node
no longer has authority to do so. Other cluster states updates, like electing the current node as master, should be
executed even if the current node is no longer master.

This commit makes sure that, by default, `ClusterStateUpdateTask` is not executed if the node is no longer master. Tasks
that should run on non masters are changed to implement a new interface called `ClusterStateNonMasterUpdateTask`

Closes #6230
2014-08-27 15:46:38 +02:00
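
The rule described above (skip queued tasks once the node is no longer master, unless the task opts in through a marker interface) could be sketched as follows. The interface names here only echo `ClusterStateUpdateTask` and `ClusterStateNonMasterUpdateTask` from the commit message; their shape, the `TaskQueueSketch` class, and the `drain` method are assumptions made for illustration and are not the actual Elasticsearch signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: master-only tasks are dropped when the node is not master.
final class TaskQueueSketch {
    // Stand-in for a cluster state update task; only carries a name here.
    interface UpdateTask {
        String name();
    }

    // Marker interface: tasks implementing it may run on a non-master node.
    interface NonMasterUpdateTask extends UpdateTask {
    }

    private final Queue<UpdateTask> queue = new ArrayDeque<>();

    void submit(UpdateTask task) {
        queue.add(task);
    }

    // Empties the queue and returns the names of the tasks that were executed.
    List<String> drain(boolean localNodeIsMaster) {
        List<String> executed = new ArrayList<>();
        UpdateTask task;
        while ((task = queue.poll()) != null) {
            if (localNodeIsMaster || task instanceof NonMasterUpdateTask) {
                executed.add(task.name());
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        TaskQueueSketch tasks = new TaskQueueSketch();
        tasks.submit(() -> "reroute");
        tasks.submit((NonMasterUpdateTask) () -> "elect-local-node");
        System.out.println(tasks.drain(false));
    }
}
```

Making non-master execution opt-in keeps the safe behaviour as the default: a task that forgets to declare anything is simply not run after a step-down.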
Boaz Leskes a9aa10ade0 Updated to use ClusterBlocks new constructor signature
Introduced with: 11a3201a09
2014-08-27 15:46:27 +02:00
Martijn van Groningen 2220c66535 [Discovery] Eagerly clean the routing table of shards that exist on nodes that are not in the latestDiscoNodes list.
Only the previous master node has been removed, so only shards allocated to that node will get failed.
This would have happened anyhow later on when AllocationService#reroute is invoked (for example when a cluster setting changes or another cluster event),
but by cleaning the routing table pro-actively, the stale routing table is fixed sooner and therefore the shards
that are not accessible anyhow (because the node these shards were on has left the cluster) will get re-assigned sooner.
2014-08-27 15:46:23 +02:00
Martijn van Groningen 89a50f6013 [Discovery] If available newly elected master node should take over previous known nodes. 2014-08-27 15:46:23 +02:00
Martijn van Groningen 549076eb4c [Discovery] Changed the default for the 'rejoin_on_master_gone' option from false to true in zen discovery.
Added AwaitFix for the FullRollingRestartTests.
2014-08-27 15:46:14 +02:00
Martijn van Groningen 3cdbb1a79d [Discovery] Enable `discovery.zen.rejoin_on_master_gone` setting in DiscoveryWithNetworkFailuresTests only. 2014-08-27 15:46:10 +02:00
Martijn van Groningen 97bdc8f5a2 [Discovery] Make noMasterBlock configurable and added simple test that shows reads do execute (partially) when m_m_n isn't met 2014-08-27 15:45:34 +02:00
Shay Banon 6ede83ab45 [Discovery] add rejoin on master gone flag, defaults to false
defaults to false since there is still work left to properly make it work
2014-08-27 15:45:25 +02:00
Shay Banon 4824f05369 [Internal] make no master lock an instance var so it can be configured 2014-08-27 15:45:10 +02:00
Shay Banon 63d0406b67 [Discovery] lightweight minimum master node recovery
don't perform full recovery when minimum master nodes are not met, keep the state around and use it once elected as master
2014-08-27 15:45:02 +02:00
Lee Hinman eaf392163c Add translog checksums
Switches TranslogStreams to check a header in the file to determine the
translog format, delegating to the version-specific stream.

Version 1 of the translog format writes a header using Lucene's
CodecUtil at the beginning of the file and appends a checksum for each
translog operation written.

Also refactors much of the translog operations, such as merging
.hasNext() and .next() in FsChannelSnapshot

Relates to #6554
2014-08-27 15:18:17 +02:00
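
The layout described in this commit (a header at the start of the file, then a checksum appended to each operation) can be illustrated with the sketch below. It is not the translog format itself: the commit uses Lucene's CodecUtil for the header, whereas this example assumes a hand-rolled magic number and version, a length prefix per operation, and `java.util.zip.CRC32` as the checksum. `ChecksummedLog` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: header + per-operation checksum, verified on read.
final class ChecksummedLog {
    static final int MAGIC = 0x544C4F47; // assumed marker, not the real header
    static final int VERSION = 1;

    // Layout: [magic][version] then per operation [length][payload][crc32].
    static byte[] write(byte[]... operations) {
        int size = 8;
        for (byte[] op : operations) {
            size += 4 + op.length + 8;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(MAGIC).putInt(VERSION);
        for (byte[] op : operations) {
            buf.putInt(op.length).put(op).putLong(crc(op));
        }
        return buf.array();
    }

    // Checks the header, then verifies every operation's checksum.
    // Returns the number of valid operations or throws on the first mismatch.
    static int countValidOps(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        if (buf.getInt() != MAGIC || buf.getInt() != VERSION) {
            throw new IllegalStateException("unknown log format");
        }
        int count = 0;
        while (buf.hasRemaining()) {
            byte[] op = new byte[buf.getInt()];
            buf.get(op);
            if (buf.getLong() != crc(op)) {
                throw new IllegalStateException("checksum mismatch in operation " + count);
            }
            count++;
        }
        return count;
    }

    private static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] file = write("index doc 1".getBytes(), "delete doc 2".getBytes());
        System.out.println("valid operations: " + countValidOps(file));
    }
}
```

Reading the header first is what lets a reader pick a version-specific stream, as the commit message describes, before it touches any operation.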
Adrien Grand b745b0151c Fielddata: Remove soft/resident caches.
These caches have no advantage compared to the default node cache. Additionally,
the soft cache makes use of soft references which make fielddata loading quite
unpredictable in addition to pushing more pressure on the garbage collector.

The `none` cache is still there because of tests. There is no other good
reason to use it.

LongFieldDataBenchmark has been removed because the refactoring exposed a
compilation error in this class, which seems not to have been working for a
long time. In addition it's not as useful now that we are progressively
moving more fields to doc values.

Close #7443
2014-08-27 14:28:41 +02:00
Britta Weber 238efe505b bool query: parser should return match_all in case there are no clauses
This also fixes has_parent filters with a nested empty bool filter
(see test SimpleChildQuerySearchTests#test6722, the test should actually expect
either 0 results when searching for has_parent "test" or one result when
search for has_parent "foo")

closes #7240
closes #7347
2014-08-27 14:07:21 +02:00
Britta Weber a92300c5b5 explain score: fix explanation streaming
Complex explanations were always read as Explanations. Depending
on if the response was streamed or not the explanation was
therefore generated by a ComplexExplanation or by a regular
Explanation.

closes #7257
2014-08-27 14:07:20 +02:00
javanna 92ae3c84fe Index templates: Made template filtering generic and extensible via plugins
Added the ability to register template filters that are being applied when a new index is created. The default filter that checks whether the template pattern matches the index name always runs first, additional filters can also be registered so that templates can be filtered out based on custom logic.

Took the chance to add the handy source(Object... source) method to PutIndexTemplateRequest and corresponding builder

Closes #7459
Closes #7454
2014-08-27 12:37:36 +02:00
Simon Willnauer e4b7395026 [TEST] only bump replicas if we have enough nodes in the cluster 2014-08-27 12:14:45 +02:00
Colin Goodheart-Smithe 6797d73d7e [TEST] removed AwaitsFix, added checks to make sure indexed scripts are put correctly 2014-08-27 11:04:51 +01:00
Brian Murphy 6109ec36b5 Indexed Scripts : Change preference and thread option for GetRequest.
This change forces the GetRequest when a script is being loaded from an index
to use preference("_local") and threaded(false) to prevent the script service from
forking for GetRequests.
2014-08-27 10:45:53 +01:00
Simon Willnauer 5453c08f50 Use physical name to compare files from snapshot metadata
The comparison and read code in the BlobStoreIndexShardRepository
used the physicalName and Name in reverse order. This caused
SnapshotBackwardsCompatibilityTest to fail.

This reverts commit 636af40da1
2014-08-27 10:47:19 +02:00
Cristiano Fontes ee46c3cd3f Mappings: Added support for empty field arrays
Close #7271
2014-08-27 10:17:05 +02:00
Martijn van Groningen b6cdb1d8fb Parent/child: Add missing support for the field data loading option to the `_parent` field.
Closes #7394
Closes #7402
2014-08-27 09:04:42 +02:00
Martijn van Groningen d414d89c62 Parent/child: If _parent field points to a non existing parent type, then skip the has_parent query/filter
Closes #7362
Closes #7349
2014-08-27 09:00:51 +02:00
Boaz Leskes 8a94044b69 [Test] testLargeClusterStatePublishing - bound max shard no to number of nodes and set replica count to 0
ensureGreen sometimes times out due to too many shards and GC kicking in
2014-08-27 08:34:19 +02:00
Ryan Ernst 1804f864d5 Internal: Add all unsafe variants of LZF compress library functions to forbidden APIs.
The "optimized" encoders/decoders have been unreliable and error prone.
Also, fix LZFCompressor.compress to use LZFEncoder.safeEncode, which
creates a new safe encoder, instead of using a shared encoder (which
is not threadsafe).

closes #7468
2014-08-26 20:17:07 -07:00
Ryan Ernst c94c13fa26 Revert part of change in #7466 to fix issue because encoder is not threadsafe so cannot be shared 2014-08-26 14:04:59 -07:00
Ryan Ernst d79c79c7d0 Internal: Add LZF safe encoder in LZFCompressor
Selecting the safe encoder fixes a 64bit JVM crash on big-endian architectures with
LZF UnsafeChunkEncoderBE.

closes #7466
2014-08-26 13:38:03 -07:00
Adrien Grand 636af40da1 Tests: Temporarily ignore SnapshotBackwardsCompatibilityTest 2014-08-26 18:13:36 +02:00
Adrien Grand 7623c5e401 Tests: Fix FileBasedMappingsTests by using the mappings API instead of field mappings. 2014-08-26 17:54:11 +02:00
Boaz Leskes 35b98f5c24 [Test] rewrite testNoMasterActions to use latest tooling
The test's timeout checks were thrown off by a client created randomly (when the timer was running).

Closes #7432
2014-08-26 17:48:24 +02:00
Britta Weber b754d2b36b Test: mute test until we know what is going on 2014-08-26 15:42:24 +02:00
Simon Willnauer c63626b537 [SNAPSHOT] Add BWC layer to .si / segments_N hashing
Due to additional safety added in #7351 we compute now a strong hash for
.si and segments_N files which are compared during snapshot / restore.
Old snapshots don't have this hash which can cause unnecessary copying
of large amount of data. This commit adds the ability to fetch this
hash from the blob store if needed.

Closes #7434
2014-08-26 15:36:46 +02:00
Simon Willnauer 0676869e6d [ENGINE] Wait until engine is started up when acquiring searcher
Today we have a small window where a searcher can be acquired but the
engine is in the state of starting up. This causes a NPE triggering a
shard failure if we are fast enough. This commit fixes this situation
gracefully.

Closes #7455
2014-08-26 14:07:04 +02:00
Britta Weber d7b8d1728e _all: report conflict on merge and throw exception on doc_values
- _all field was never merged when mapping was updated and no conflict reported
- _all accepted doc_values format although it is always tokenized

relates to #777
closes #7377
2014-08-26 12:14:31 +02:00
mikemccand 075bd66713 Core: use Java's built-in ConcurrentHashMap
It's risky to have our own snapshot of Java 8's ConcurrentHashMap:
unless we keep the sources in sync over time (and OpenJDK's version
had already diverged), then we won't get bug/performance fixes.  Users
can choose to upgrade to Java 8 to see the improvements of CHM.

Closes #7392

Closes #7296
2014-08-26 06:11:05 -04:00
Adrien Grand b43c2ced93 [TESTS] Temporary disable field data cache randomization. 2014-08-25 23:12:09 +02:00
mikemccand 783a9cbb18 Stats: add segments.index_writer_max_memory to see index writer's max RAM usage before buffered documents must be written to a new segment
Closes #7438

Closes #7440
2014-08-25 14:43:09 -04:00
Nik Everett 74287865b2 [Internal] discovery.id.seed is ignored
Closes #7439, Closes #7437
2014-08-25 17:32:07 +02:00
javanna 3917ffc0ff [TEST] Explicitly clean up actions to be intercepted in IndicesRequestTests before asserting on collected requests
This helps making sure that no further requests are collected once we start asserting on them
2014-08-25 17:24:53 +02:00
Lee Hinman 1f7be7931b [TEST] fix issue clearing fielddata breaker introduced in 6950c38a04 2014-08-25 16:25:02 +02:00
Adrien Grand 2a67b129e2 [TESTS] Temporarily disable FileBasedMappingsTests. 2014-08-25 12:39:36 +02:00
markharwood 570c679420 Context suggester: infinite loop in GeolocationContextMapping
Close #7433
2014-08-25 11:56:39 +02:00
Simon Willnauer 6950c38a04 Tests: Improve test coverage.
Close #7428
2014-08-25 11:56:38 +02:00
Alexander Reelsen 49f0f0bb5d Test: Fixed pluggable transport module test to support transportclient
Also made sure, that only a change of requests is tested for and not
an initial value, which might not be set in case of a node client.
2014-08-25 10:36:04 +02:00
Martijn van Groningen bd0b68080b Nested: If the `_type` field isn't indexed nested docs must be filtered out. 2014-08-25 00:09:21 +02:00
Martijn van Groningen d471abe4d3 [TEST] Agg may not be a instance of StringTerms, but UnmappedTerms, so use common Terms class instead 2014-08-25 00:07:19 +02:00
Simon Willnauer 24e3c41afa [TEST] use more verbose assertion in IndicesRequestTests 2014-08-24 21:23:46 +02:00
Boaz Leskes e16a461317 [Tests] testNodeVersionIsUpdated stopped but didn't close its node 2014-08-24 19:34:02 +02:00
Boaz Leskes 562fe1ddaf [Tests] NoMasterNodeTests make timeout checks less sensitive
Also remove catching of MasterNotDiscoveredException in bulk operation, as it is only set on a per item basis
2014-08-23 22:04:22 +02:00
javanna 00fc54c2ae Internal: made original indices optional for broadcast delete and delete by query shard requests
Shard requests like broadcast delete and delete by query, which need to be executed on the primary and all replicas, get read and written out to the transport on the same node. That means that if we add a field, version checks alone are not enough to maintain bw comp, since a newer node that holds the primary might receive the request from an older node that didn't provide the field. Yet, when writing the request out again to a newer node that holds the replica, we do try to serialize the field although it's missing. The newer fields just need to be set to optional in these cases, in addition to the version checks.

Re-enabled testDeleteByQuery and testDeleteRoutingRequired bw comp tests since this was the cause of their failures.

Closes #7406
2014-08-23 17:01:33 +02:00
Simon Willnauer 5f188d29fa [TEST] use CFS consistently to not trigger single segment merge without force flag 2014-08-23 16:41:46 +02:00
Boaz Leskes 06fb9ff761 [Tests] verifyThreadNames should account for new threads of shared cluster
The verifyThreadNames starts a node and checks that all new threads on the JVM are properly named. The current test uses the name of the new node which sometimes fails because our shared cluster spawns a new thread which is properly named but not for the new node.

The commit relaxes the requirement of the test and only verifies the threads are properly named (but not necessarily of the new node)
2014-08-23 14:45:08 +02:00
Simon Willnauer 45f062792c [TEST] use a default host name if localAddress is not available
Closes #7409
2014-08-23 13:47:11 +02:00
Simon Willnauer 805f042293 Add toString() method to Segment.java for debugging purposes 2014-08-23 11:17:14 +02:00
Simon Willnauer fdf1998f39 [ENGINE] Force optimize was not passed to shard request
The force flag to trigger optimize calls of a single segment for upgrading
etc. was never passed on to the shard request.

Closes #7404
2014-08-22 15:39:04 +02:00
Alex Ksikes e78694ae82 More Like This Query: defaults to all possible fields for items
Items with no specified field now default to all the possible fields from the
document source. Previously, we had required 'fields' to be specified either
as a top level parameter or for each item. The default behavior is now similar
to the MLT API.

Closes #7382
2014-08-22 15:07:22 +02:00
Adrien Grand a1a9aadab5 [DOCS] Document the contracts of the RootMapper API.
Close #7400
2014-08-22 14:44:28 +02:00
javanna f4168a6382 Internal: move index templates api back to indices category and make put template and create index implement IndicesRequest
Closes #7378
2014-08-22 10:18:36 +02:00
javanna 9a14b3ce6f [TEST] copied delete bw comp tests to usual integration tests
Added AwaitsFix to testDeletebyQuery and testDeleteRoutingRequired while checking if they fail as usual integration tests.
2014-08-22 10:12:17 +02:00
Martijn van Groningen 0196377190 [TEST] Muted tests 2014-08-22 09:45:56 +02:00
Simon Willnauer 3b51342515 Use empty BytesRef if we read from <= 1.4.0 2014-08-21 22:13:06 +02:00
Shay Banon ffcc78ca04 Add back string op type to IndexRequest
This was removed by accident I think, and it breaks backward comp. on the Java API in minor 1.3 version
closes #7387
2014-08-21 12:04:09 -07:00
Igor Motov 80887e8113 [TEST] Trigger random flushes while snapshot is created 2014-08-21 12:48:38 -04:00
Simon Willnauer 058a02b7aa [STORE] Improve recovery / snapshot restoring file identity handling
This commit changes how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis where the
rather weak checksum and the file length in bytes is compared to check if
a file is identical. This is prone to fail in the case of a checksum collision
which can happen under certain circumstances.
The changes in this commit move the identity comparison to a per-commit / per-segment
level where files are only treated as identical iff all the other files in the
commit / segment are the same. This "all or nothing" strategy is reducing the chance for
a collision dramatically since we also use a strong hash to identify commits / segments
based on the content of the ".si" / "segments.N" file.

Closes #7351
2014-08-21 18:00:41 +02:00
Simon Willnauer 4c1bc3ae4f [STORE] Remove unnecessary deduplication 2014-08-21 17:50:04 +02:00
Simon Willnauer 4d3f761d3d [STORE] Ignore segments.gen on metadata snapshots
The segments.gen file is optional and might even change while we
read it. It's safer to just ignore that file in the snapshot instead.
2014-08-21 17:50:04 +02:00
Shay Banon 4af1a29057 [TEST] filter out keep alive timer thread name
Keep-Alive-Timer is an internal Java thread that might be started, make sure to filter it out
2014-08-21 08:40:43 -07:00
Shay Banon 39a64cf4dd [TEST] only reset clients on nightly tests
resetting the clients on each test (in after test) makes the tests run much slower, especially in network mode, since the transport client needs to be created each time when randomized to be used. Also, on OSX, the excessive connections eventually cause bind exceptions, which makes running the network tests much harder on it.
closes #7329
2014-08-21 07:34:26 -07:00
Britta Weber ab9e33e38d _ttl: Report conflict when trying to disable _ttl
_ttl could never be disabled once it was enabled.
But when trying to, no conflict was reported.

relates to #777 and #7293

closes #7316
2014-08-21 16:16:08 +02:00
Simon Willnauer 99ef3408fb [STORE] Allow to get metadata from arbitrary commit points
Today we always use the latest commit point to return the metadata from
the store. This might cause problems for snapshot and restore since in
contrast to recovery it won't prevent concurrent flushes (lucene commits).
This can lead to all kinds of interesting effects if we are snapshotting
while flushing. This change uses the IndexCommit to open the metadata snapshot
from the store which is consistent with what we snapshot.

Closes #7376
2014-08-21 16:09:12 +02:00
Colin Goodheart-Smithe 8550b9e84b Aggregations: Fixes pre and post offset serialisation for histogram aggs
Changes the serialisation of pre and post offset to use Long instead of VLong so that negative values are supported.  This actually only showed up in the case where minDocCount=0 as the rounding is only serialised in this case.

Closes #7312
2014-08-21 14:19:53 +01:00
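The reason a variable-length encoding gets in the way of negative offsets (commit 8550b9e84b above) can be illustrated with a generic sketch. This is not Elasticsearch's stream code: `write_vlong`, `read_vlong`, `write_long` and `read_long` are hypothetical helpers, and the sketch assumes a 7-bits-per-byte encoding that is only defined for non-negative values, next to a fixed 8-byte signed encoding.

```python
import struct


def write_vlong(value):
    """Hypothetical variable-length encoding: 7 payload bits per byte,
    high bit set on every byte except the last. Non-negative values only."""
    if value < 0:
        raise ValueError("variable-length encoding cannot represent negative values")
    out = bytearray()
    while value & ~0x7F:
        out.append((value & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        value >>= 7
    out.append(value)
    return bytes(out)


def read_vlong(data):
    """Inverse of write_vlong."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return result


def write_long(value):
    """Fixed-width alternative: 8 bytes, big-endian, signed."""
    return struct.pack(">q", value)


def read_long(data):
    return struct.unpack(">q", data)[0]


# A small positive offset is compact in the variable-length form...
assert read_vlong(write_vlong(300)) == 300
# ...but only the fixed-width form round-trips a negative offset.
assert read_long(write_long(-5)) == -5
```

The trade-off is size: the fixed-width form always costs 8 bytes, while the variable-length form is shorter for small non-negative values.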
javanna f956920acc Internal: make sure that multi_search request hands over its context and headers to its corresponding search requests
Closes #7374
2014-08-21 15:09:27 +02:00
javanna b6cdaff30c Internal: make sure that multi_percolate request hands over its context and headers to its corresponding shard requests
Closes #7371
2014-08-21 13:45:11 +02:00
Martijn van Groningen 9dd3597f1f [TEST] Sort by the _id field instead of _uid field and also assert the sort value. 2014-08-21 13:30:09 +02:00
javanna c89f941ffa [TEST] added debug lines to bw comp testDeleteByQuery and testDeleteRoutingRequired 2014-08-21 13:11:05 +02:00
Alex Ksikes f1a6b4e9fe More Like This Query: Switch to using the multi-termvectors API
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. The terms are now retrieved directly using the
multi-termvectors API, instead of being generated from the texts retrieved
using the multi-get API.

Closes #7014
2014-08-21 12:18:21 +02:00
Simon Willnauer c4bed91262 [PARSER] Clarify XContentParser/Builder interface for binary vs. utf8 values
Today we have very confusing naming since some method names claim to
read binary but in fact read utf-16 and convert to utf-8 under the hood.
This commit clarifies the interfaces and adds additional documentation.

Closes #7367
2014-08-21 11:46:50 +02:00
Adrien Grand b5b1960a2b Internal: Remove CacheRecycler.
The main consumer of this API was the faceting module. Now that it's gone,
let's remove CacheRecycler as well.

Close #7366
2014-08-21 11:21:04 +02:00
javanna 269a6dfb40 [TEST] bw comp testMultiGet should wait for yellow, not for green 2014-08-21 11:14:48 +02:00
javanna 5709a11d23 [TEST] fixed concurrency issue in IndicesRequestTests 2014-08-21 10:43:58 +02:00
Adrien Grand ea96359d82 Facets: Removal from master.
Close #7337
2014-08-21 10:34:39 +02:00
Adrien Grand ded30e95de Aggregations: Remove the logic to optionally sort/dedup values on the fly.
These options are not used anymore. Instead, numeric values can now contain dups
and it is the responsibility of the aggregation to deal with them (e.g. terms).
Otherwise, all values sources are now sorted, which is the contract of the
interfaces that they implement.

Close #7276
2014-08-21 10:25:50 +02:00
Alex Ksikes 62ef4a30dc Term vector API: return 'found: false' for docs between index and refresh
Closes #7121
2014-08-21 09:58:49 +02:00
Igor Motov 150df5f1c5 [TEST] Improve robustness of restoreIndexWithMissingShards test 2014-08-20 21:11:04 -04:00
Shay Banon 9dc4f3861a Query Cache: Add hit and miss count
closes #7355
2014-08-20 14:39:16 -07:00
Shay Banon 2f3a041070 NPE in ShardStats when routing entry is not set yet on IndexShard
closes #7356
2014-08-20 12:48:52 -07:00
javanna abdbfe768b Internal: adjusted internal requests visibility from public to package private (redo)
This change was reverted by mistake during a failed attempt to isolate it and take it out of #7319.
2014-08-20 21:12:37 +02:00
javanna 441c1c8268 Internal: make sure that all shard level requests hold the original indices
A request that relates to indices (`IndicesRequest` or `CompositeIndicesRequest`) might be converted to some other internal request(s) (e.g. shard level request) that get distributed over the cluster. Those requests contain the concrete index they refer to, but it is not known which indices (or aliases or expressions) the original request related to.

This commit makes sure that the original indices are available as part of the shard level requests and makes them implement `IndicesRequest` as well.

Also every internal request should be created passing in the original request, so that the original headers, together with the eventual original indices and options, get copied to it. Corrected some places where this information was lost.

NOTE: For the bulk API and other multi-item APIs (e.g. multi_get), the shard level requests won't keep the whole set of original indices, only the ones that relate to the bulk items sent to each shard. The important bit is that we keep the original names, not only the concrete ones.

Closes #7319
2014-08-20 21:05:01 +02:00
Colin Goodheart-Smithe 0234b5b9b4 fix to compile issue caused by scripted metric aggregation change 2014-08-20 19:25:57 +01:00
Colin Goodheart-Smithe 7f943f0296 Aggregations: Scriptable Metrics Aggregation
A metrics aggregation which runs specified scripts at the init, collect, combine, and reduce phases

Closes #5923
2014-08-20 18:17:27 +01:00
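How the four phases named in commit 7f943f0296 could fit together can be sketched roughly as follows. This is a toy model, not the aggregation's implementation: it assumes that init, collect and combine run once per shard (collect once per document) and that reduce runs once over all per-shard results. `scripted_metric` and the sample documents are hypothetical.

```python
def scripted_metric(shards, init, collect, combine, reduce):
    """Toy model of a four-phase metric over a list of shards,
    where each shard is a list of documents."""
    shard_results = []
    for docs in shards:
        state = init()                 # init: create per-shard state
        for doc in docs:
            collect(state, doc)        # collect: update state for each document
        shard_results.append(combine(state))  # combine: one result per shard
    return reduce(shard_results)       # reduce: merge the per-shard results


# Hypothetical data: two shards holding documents with an "amount" field.
shards = [[{"amount": 10}, {"amount": 5}], [{"amount": 7}]]

total = scripted_metric(
    shards,
    init=lambda: {"values": []},
    collect=lambda state, doc: state["values"].append(doc["amount"]),
    combine=lambda state: sum(state["values"]),
    reduce=lambda partials: sum(partials),
)
```

Here the combine step shrinks each shard's state to a single number, so the final reduce step only has to merge one value per shard.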
Shay Banon 3a52296358 Warmer (search) to support query cache
Allow search-based warmers to support the query cache flag on the search request, and use the index-level query caching flag if set.
closes #7326
2014-08-20 09:31:29 -07:00
Colin Goodheart-Smithe f7ae4d9d86 Geo: fixes circle radius calculation
This change fixes the creation of circle shapes so that the radius is calculated correctly instead of essentially using the diameter as the radius. The radius has to be converted into degrees by calculating the ratio of the desired radius to the circumference of the earth and then multiplying it by 360 (the number of degrees around the earth's circumference). The issue here was that it was only multiplied by 180, making the result out by a factor of 2. Also made the test for circles actually check that the shape has the correct centre and radius.

Closes #7301
2014-08-20 16:23:21 +01:00
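The conversion described in commit f7ae4d9d86 (ratio of the radius to the earth's circumference, times 360) can be written out as a small sketch. The function names and the circumference constant are assumptions for illustration; the value actually used in the code base may differ.

```python
# Assumed value: approximate equatorial circumference of the earth in metres.
EARTH_CIRCUMFERENCE_M = 40_075_000.0


def radius_to_degrees(radius_m, circumference_m=EARTH_CIRCUMFERENCE_M):
    """Convert a distance along the surface into degrees of arc:
    the fraction of the full circumference, times 360 degrees."""
    return radius_m / circumference_m * 360.0


def radius_to_degrees_with_180(radius_m, circumference_m=EARTH_CIRCUMFERENCE_M):
    """The variant the commit message describes as wrong: multiplying by 180."""
    return radius_m / circumference_m * 180.0


# A quarter of the circumference corresponds to a quarter turn.
quarter = EARTH_CIRCUMFERENCE_M / 4
correct = radius_to_degrees(quarter)
wrong = radius_to_degrees_with_180(quarter)
```

With these inputs the two variants differ by exactly the factor of 2 mentioned in the commit message.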
uboness e2311d5da4 opened up getting the template name from DeleteIndexTemplateRequest 2014-08-20 07:11:15 -07:00
javanna 61da463dd0 Internal: adjusted internal requests visibility from public to package private 2014-08-20 12:11:57 +02:00
javanna 3450e82855 [TEST] UnicastBackwardsCompatibilityTest should not copy internal node settings to external nodes
Recent test failures triggered by #7289 were caused by this, simply because internal node settings (transport type key) that are not supported by the external older nodes were copied to them by mistake.
2014-08-20 12:01:41 +02:00