Commit Graph

8718 Commits

Author SHA1 Message Date
Boaz Leskes 8865e60e93 [Transport] possible NPE during shutdown for requests using timeouts
Closes #6849
2014-07-14 10:52:29 +02:00
Simon Willnauer 86bc79202d [ENGINE] Mark store as corrupted before sending failed shard
We have to mark a shard as corrupted if necessary before the
shard failed event is fired, i.e. before we call the corresponding
listener in the engine. Otherwise the shard might be re-allocated
on the same node and just started up without being marked as corrupted.

Relates to #5924
2014-07-14 10:14:58 +02:00
Simon Willnauer e8ff007852 [RECOVERY] Increment Store refcount on RecoveryTarget
We should make sure we have incremented the store refcount
before we start the recovery on the recovery target.

Closes #6844
2014-07-14 09:18:25 +02:00
Boaz Leskes ab11c6821d [Test] one more tweak to testLocalNodeMasterListenerCallbacks 2014-07-13 17:59:45 +02:00
Boaz Leskes c3e842e363 [Test] renamed testListenerCallbacks to testLocalNodeMasterListenerCallbacks
Also cleaned up internal variable names and fixed the use of a wrong setting that caused the last node not to use the min_master_node settings.
2014-07-13 17:51:01 +02:00
Martijn van Groningen af38b9f7ba Core: Added missing return statements.
Closes #6841
2014-07-13 15:53:05 +02:00
Igor Motov 60b317caa4 Snapshot/Restore: Add ability to restore indices without their aliases
Closes #6457
2014-07-13 17:52:41 +09:00
Shay Banon f7a88fdd3e [TEST] wait for green before deleting mapping 2014-07-13 17:21:26 +09:00
Shay Banon fb6d847aac [TEST] wait for green before deleting mapping 2014-07-13 17:17:14 +09:00
Boaz Leskes 5e3742762a [Test] testHostOnMessages - only decrease latch after setting transport addresses 2014-07-12 09:11:27 +02:00
Florian Hopf 3689f67a76 Docs: Fixed invalid word count in geodistance agg doc
Closes #6838
2014-07-11 18:35:36 +02:00
Martijn van Groningen 05ca763b10 [TEST] Ensure that one node is part of the cluster. 2014-07-11 17:51:35 +02:00
Martijn van Groningen 6547ff3eb0 Print trace log if not enough master nodes could be found. 2014-07-11 17:42:11 +02:00
uboness 25a21c6a01 Cleanup of the transport request/response messages
Both TransportRequest and TransportResponse now inherit from a base TransportMessage that holds the message headers and, newly added, the remote transport address (where the message came from).
2014-07-11 16:41:01 +02:00
Boaz Leskes c4c0270c52 [Tests] Enhance ZenUnicastDiscoveryTest
This started out as a simple correction to a missing setting problem, but grew into more general work on the ZenUnicastDiscoveryTest suite. It now works with both network and local mode. I also merged the different ZenUnicast test suites into a single place.

Closes #6835
2014-07-11 16:37:52 +02:00
mikemccand 6c78147f5f Docs: remove orphan comma 2014-07-11 08:26:08 -04:00
Britta Weber 6d8fff65dc Throw exception if function in function score query is null
closes #6292 #6784
2014-07-11 13:57:11 +02:00
mikemccand b4e80999a7 Docs: fix merge docs to match the code (the max_thread_count default is 'aggressive' (favor SSDs)) 2014-07-11 07:00:57 -04:00
Shay Banon 43a5cbe9be Only use IndexShard instance to lookup recovery status
Make sure we use the instance itself for the lookup, and not the shard id, as we might get another instance.
Leftover from #6825
2014-07-11 11:38:36 +02:00
Boaz Leskes f480969503 [Gateway] set a default of 5m to `recover_after_time` when any of the `expected*Nodes` settings is set
The `recover_after_time` setting tells the gateway to wait before starting recovery from disk. The goal is to allow more nodes to join the cluster and thus avoid starting potentially unneeded replications. The `expectedNodes` setting (and friends) tells the gateway when it can start recovering even if the `recover_after_time` has not yet elapsed. However, `expectedNodes` is useless if one doesn't set `recover_after_time`. This commit changes that by setting a sensible default of 5m for `recover_after_time` *if* an `expectedNodes` setting is present.

Closes #6742
2014-07-11 11:28:45 +02:00
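
The defaulting rule described in the commit above can be sketched as follows. This is a minimal illustration, not the Elasticsearch gateway code: the class name, the method name, the plain `Map` of settings, and the exact setting keys are assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch of the defaulting rule; not the actual gateway implementation.
class RecoverAfterTimeSketch {

    // Setting keys are assumed for illustration.
    private static final String RECOVER_AFTER_TIME = "gateway.recover_after_time";
    private static final String[] EXPECTED_KEYS = {
        "gateway.expected_nodes",
        "gateway.expected_data_nodes",
        "gateway.expected_master_nodes"
    };

    /**
     * Returns the explicit recover_after_time if one is set, "5m" if any
     * expected-nodes setting is present, and null otherwise (no waiting).
     */
    static String effectiveRecoverAfterTime(Map<String, String> settings) {
        String explicit = settings.get(RECOVER_AFTER_TIME);
        if (explicit != null) {
            return explicit; // an explicit value always wins over the default
        }
        for (String key : EXPECTED_KEYS) {
            if (settings.containsKey(key)) {
                return "5m"; // default applied only when an expected-nodes setting exists
            }
        }
        return null;
    }
}
```

An explicit value is checked first so that the new default never overrides a user's own choice.
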
Alex Ksikes af4eee594c More Like This: ensures selection of best terms is indeed O(n)
Previously the size of the priority queue was wrongly set to the total number
of terms. Instead, it should be set to 'maxQueryTerms'. This makes the
selection of best terms O(n), instead of O(n*log(n)).

Jira patch: https://issues.apache.org/jira/browse/LUCENE-5795

Closes #6657
2014-07-11 11:14:31 +02:00
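
The idea behind the fix above, keeping a priority queue bounded by the number of terms wanted rather than by the total number of terms, can be sketched as follows. This is an illustration only, not the Lucene or Elasticsearch code: the class name, method name, and the map of term scores are hypothetical. With a queue capped at `maxQueryTerms` (k), each of the n terms costs at most O(log k), which is linear in n when k is a fixed constant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration of bounded top-k selection; not the actual More Like This code.
class BestTermsSketch {

    /** Returns up to maxQueryTerms terms with the highest scores, best first. */
    static List<String> selectBestTerms(Map<String, Double> termScores, int maxQueryTerms) {
        List<String> best = new ArrayList<>();
        if (maxQueryTerms <= 0) {
            return best;
        }
        // Min-heap on score: the head is always the weakest term kept so far.
        PriorityQueue<Map.Entry<String, Double>> queue = new PriorityQueue<>(
                maxQueryTerms, Map.Entry.<String, Double>comparingByValue());
        for (Map.Entry<String, Double> entry : termScores.entrySet()) {
            if (queue.size() < maxQueryTerms) {
                queue.offer(entry);
            } else if (entry.getValue() > queue.peek().getValue()) {
                queue.poll();       // evict the weakest kept term
                queue.offer(entry); // the queue never grows beyond maxQueryTerms
            }
        }
        // Polling yields weakest first, so reverse to get best first.
        while (!queue.isEmpty()) {
            best.add(queue.poll().getKey());
        }
        Collections.reverse(best);
        return best;
    }
}
```
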
Shay Banon 01ca81e2a3 Improve handling of failed primary replica handling
Out of #6808, we improved the handling of a failing primary to make sure replicas that are initializing are properly failed as well. After double checking it, it has two problems. First, if the same shard routing is failed again, there is no protection against applying the failure again (a protection we do have in failed shard cases). Second, we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior.
The change also exposes a problem in our handling of replica shards that stay initializing while the primary fails and another replica shard is elected as primary: we need to cancel the ongoing recovery of such a shard to make sure it restarts from the newly elected primary.
closes #6825
2014-07-11 10:51:59 +02:00
Simon Willnauer a84777e990 [TEST] Fix CorruptedFileTest to always corrupt the latest delete generation if a .del file is picked 2014-07-11 10:22:11 +02:00
Simon Willnauer 35a52cd04a [TEST] Temporarily don't corrupt .del files since they are generational and we might pick the wrong one 2014-07-11 08:41:38 +02:00
Boaz Leskes 8f0a4ed390 [Test] testCorruptionOnNetworkLayer had a typo in test name. 2014-07-11 08:30:46 +02:00
Simon Willnauer bb964e7817 Revert "Improve handling of failed primary replica handling"
This reverts commit 75ed24f6b6.
2014-07-10 21:30:15 +02:00
Lee Hinman 107534c062 Do not ignore ConnectTransportException for shard replication operations
A ConnectTransportException should fail the replica shard

Closes #6183
2014-07-10 18:49:05 +02:00
Shay Banon 75ed24f6b6 Improve handling of failed primary replica handling
Out of #6808, we improved the handling of a failing primary to make sure replicas that are initializing are properly failed as well. After double checking it, it has two problems. First, if the same shard routing is failed again, there is no protection against applying the failure again (a protection we do have in failed shard cases). Second, we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior.
closes #6816
2014-07-10 18:30:18 +02:00
Simon Willnauer 4f131dfffb [TEST] Fold SuggestActionTest into SuggestSearchTests
Instead of running the tests twice this commit just randomizes the API
that we use to return the suggestions.
2014-07-10 18:02:10 +02:00
Colin Goodheart-Smithe 0e5f9898d1 Aggregations: DateHistogramBuilder accepts String preOffset and postOffset
This is what DateHistogramParser expects, so it enables the builder to build valid requests using these variables.
Also added tests for preOffset and postOffset, since none existed.

Closes #5586
2014-07-10 16:38:09 +01:00
Simon Willnauer 0a988ad8f7 [STORE] Treat reading past EOF as a corrupted index when we fail to read segment infos 2014-07-10 17:25:47 +02:00
Simon Willnauer 81e86eba6e [TEST] Wait for longer on slow nodes until replicating has kicked in 2014-07-10 16:52:51 +02:00
javanna eddb378bae [TEST] added ability to provide settings for external nodes in backwards compatibility tests
Closes #6809
2014-07-10 16:45:16 +02:00
Simon Willnauer e7c67bf03b [TEST] Do RollingUpgrade in BWC tests 2014-07-10 16:24:01 +02:00
Simon Willnauer 62002e8192 [TEST] Close TransportClient after it's used in BulkProcessorTests otherwise it will leave threads behind 2014-07-10 16:06:38 +02:00
Simon Willnauer da148ca8b8 [TEST] Subclass ElasticsearchTestCase in LoggingConfigurationTests 2014-07-10 15:28:38 +02:00
Simon Willnauer 72e6150bc1 [STORE]: Make use of Lucene built-in checksums
Since Lucene version 4.8 each file has a checksum written as its
footer. We used to calculate the checksums for all files transparently
on the filesystem layer (Directory / Store) which is now not necessary
anymore. This commit makes use of the new checksums in a backwards
compatible way such that files written with the old checksum mechanism
are still compared against the corresponding Adler32 checksum while
newer files are compared against the Lucene built-in CRC32 checksum.

Since now every written file is checksummed by default this commit
also verifies the checksum for files during recovery and restore if
applicable.

Closes #5924

This commit also has a fix for #6808 since the added tests in
`CorruptedFileTest.java` exposed the issue.

Closes #6808
2014-07-10 15:04:00 +02:00
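
The backwards-compatible comparison described in the commit above, Adler32 for files written with the old mechanism and CRC32 for newer files, can be sketched with the JDK's `java.util.zip` checksums. This is only an illustration of choosing the algorithm per file: the class, the method names, and the `legacy` flag are hypothetical, and the sketch does not read Lucene footers or use the Store API.

```java
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch; the real code verifies Lucene file footers, which this does not do.
class ChecksumSketch {

    /** Computes Adler32 for legacy files and CRC32 for newer files. */
    static long checksum(byte[] data, boolean legacy) {
        Checksum digest = legacy ? new Adler32() : new CRC32();
        digest.update(data, 0, data.length);
        return digest.getValue();
    }

    /** Returns true if the data matches the stored checksum for its format. */
    static boolean verify(byte[] data, long storedChecksum, boolean legacy) {
        return checksum(data, legacy) == storedChecksum;
    }
}
```

A mismatch reported by `verify` would correspond to the case where a file is treated as corrupted during recovery or restore.
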
Shay Banon 9ca5e6e3e1 Add local node to cluster state
Today, the tribe node needs the local node, so it adds it when it starts, but other APIs would also benefit from having the local node in the cluster state. Adding the local node should also be done in a cleaner manner and where it belongs, which is in the cluster service right after the discovery service starts.
closes #6811
2014-07-10 14:49:52 +02:00
Iulia Pasov eed3513c37 Docs: Update plugins.asciidoc to fix typo
Changed the name of the European Environment Agency (from European Environmental Agency)

Closes #6807
2014-07-10 14:04:26 +02:00
Simon Willnauer c9266e8b6b [TEST] Wait for primary allocations before restart
This commit ensures that all primaries are allocated before we
restart the node. If one primary is in post recovery when we
restart it will not be allocated otherwise.
2014-07-10 11:54:24 +02:00
Karel Minarik 4ddec99703 [DOC] Added comprehensive documentation for the Ruby and Rails integrations 2014-07-10 11:21:27 +02:00
Simon Willnauer 154bd0309c [DOCS] Fix typo in reference 2014-07-10 08:47:18 +02:00
Simon Willnauer fcadab869d [TEST] SuppressSysoutChecks on ElasticsearchTokenStreamTestCase 2014-07-10 07:48:12 +02:00
uboness c324103cbb added a fix to the PluginManagerTests to create config & bin dirs if they don't exist 2014-07-10 00:50:11 +02:00
Guillaume Nodet 263819c674 [ENV] Release node env if initialization fails
If the node initialisation fails, make sure the
node environment is closed correctly so that
all locks (on data directories) are properly released.

Closes #6715
2014-07-10 00:14:52 +02:00
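
The pattern described in the commit above, closing a resource only when initialization fails, can be sketched with a success flag and a `finally` block. This is a generic illustration, not the Elasticsearch node startup code: `Env`, `start`, and the `Consumer` used as the initialization step are hypothetical names.

```java
import java.util.function.Consumer;

// Hypothetical sketch of release-on-failure; not the actual node initialization code.
class NodeInitSketch {

    /** Stand-in for a node environment that holds locks on data directories. */
    static final class Env implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true; // a real environment would release its directory locks here
        }
    }

    /** Runs the initializer and closes the environment only if it throws. */
    static Env start(Env env, Consumer<Env> initializer) {
        boolean success = false;
        try {
            initializer.accept(env);
            success = true;
            return env; // on success the caller owns the still-open environment
        } finally {
            if (!success) {
                env.close(); // initialization failed: release the locks before propagating
            }
        }
    }
}
```
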
Simon Willnauer d82a434d10 [STORE] Make a hybrid directory default using `mmapfs` and `niofs`
`mmapfs` is really good for random access but can have side effects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` files for fast random access
on docvalues and term dictionaries.

Closes #6636
2014-07-10 00:01:43 +02:00
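
The selection rule described in the commit above, memory mapping only `dvd` and `tim` files and reading everything else through NIO, can be sketched as a lookup by file extension. This is a simplified illustration, not the actual directory implementation: the class, the enum, and the method name are hypothetical, and no Lucene Directory classes are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-file access selection; not the actual hybrid directory code.
class HybridDirectorySketch {

    enum Access { MMAP, NIO }

    // Extensions named in the commit message: doc values data and term dictionaries.
    private static final Set<String> MMAP_EXTENSIONS =
            new HashSet<>(Arrays.asList("dvd", "tim"));

    /** Chooses memory mapping for selected extensions and NIO for everything else. */
    static Access accessFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1);
        return MMAP_EXTENSIONS.contains(extension) ? Access.MMAP : Access.NIO;
    }
}
```
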
Simon Willnauer b69fa52588 [TEST] Mute PluginManagerTests#testLocalPluginInstallWithBinAndConfig 2014-07-10 00:00:41 +02:00
Simon Willnauer 9e4d738d7e [TEST] SuppressSysoutChecks on ElasticsearchPostingsFormatTest 2014-07-09 23:24:31 +02:00
uboness 6dae32b09a Added a check on moving bin & config plugin dirs
Plugins can contain bin & config sub-dirs that are copied to es's bin & config directories. If moving these directories fails we now throw an error.
2014-07-09 23:05:12 +02:00
Shay Banon 808c52706a [TEST] relax size test, to not run into OOM 2014-07-09 23:03:06 +02:00