8559 Commits

Author SHA1 Message Date
Boaz Leskes 8909a77724 [Discovery] Handle ConnectionTransportException during a Master/Node fault detection ping
Both the Master and Node fault detection register themselves to be notified when a node disconnects so that they can respond accordingly. As such, when a ConnectionTransportException was raised on a ping request, it was not handled, as it is already handled somewhere else. However, this does introduce a race condition if the disconnect happens during a period where there is no current master (minimum_master_node breach), at which time the fault detection is not active. In this case, we will only discover the disconnect error during the ping request, so we have to respond accordingly.

Closes #6686
2014-07-02 20:49:48 +02:00
Simon Willnauer 3b959706b3 [TEST] Take compatibility version into account for XContentType
randomization

We randomize the XContentType to test deriving the content type on all
APIs. Yet, BWC tests run against versions where CBOR wasn't around.
This commit ensures we don't use CBOR when compatibility version is
less than `1.2.0`

Closes #6691
2014-07-02 20:06:03 +02:00
Martijn van Groningen 0ccc4c7c05 [TEST] Also wait for fields to have been applied in the mapping in cluster state during the waitForConcreteMappingsOnAll call
The concrete DocMapper on the master will be updated before the mapping in the cluster state. The DocMapper is updated during the cluster update task. This can lead to occasional assertion failures on the mapping response, because that is based on the mapping in the cluster state, which may not yet have been updated (there is a time window in which the DocMapper is updated but the mapping in the cluster state isn't).
2014-07-02 17:35:35 +02:00
Shay Banon ccd54dae2d better logic on sending mapping update new type introduction
when an indexing request introduces a new mapping, today we rely on the parsing logic to mark it as modified on the "first" parsing phase. This can cause sending of mapping updates to master even when the mapping has been introduced in the create index/put mapping case, and can cause sending mapping updates without needing to.
 This bubbled up in the disabled field data format test, where we explicitly define mappings to not have the update mapping behavior happening, yet it still happens because of the current logic, and because in our test we delay the introduction of any mapping updates randomly, it can get in and override updated ones.
closes #6669
2014-07-02 17:30:56 +02:00
Alexander Reelsen 4091162d91 Refactoring: Replaced string values with static constants
in TransportShardBulkAction after fixing an issue.
2014-07-02 12:37:40 +02:00
Alexander Reelsen b46d017e5c Bulk API: Fix return of wrong request type on failed updates
In case an update request failed (for example when updating with a
wrongly formatted date), the returned index operation type was index
instead of update.

Closes #6630
2014-07-02 12:37:39 +02:00
Boaz Leskes 7119ffa7bc IndexingMemoryController should only update buffer settings of recovered shards
At the moment the IndexingMemoryController can try to update the index buffer memory of shards at any given moment. This update involves a flush, which may cause a FlushNotAllowedEngineException to be thrown in a concurrently finalizing recovery.

Closes #6642, closes #6667
2014-07-02 12:23:10 +02:00
Adrien Grand b0c21d751d [TEST] Fix SimpleDeleteMappingTests.
The failure was hard to reproduce but it looked to me like dynamic mapping
updates were overriding the delete mappings request.
2014-07-02 12:12:04 +02:00
Adrien Grand 356349599f [TEST] Fix PercolatorTests to wait for mappings on master. 2014-07-02 11:51:58 +02:00
Alexander Reelsen 16fe44c7ec JAVA API: Fix source excludes setting if no includes were provided
Due to a bogus if-check in SearchSourceBuilder.fetchSource(String include, String exclude)
the excludes only got set when the includes were not null. Fixed this and added some
basic tests.

Closes #6632
2014-07-02 11:48:05 +02:00
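The bug described in the commit above can be illustrated with a small, self-contained sketch. The class and method names (`FetchSourceSketch`, `excludesBuggy`, `excludesFixed`) are hypothetical and are not taken from the Elasticsearch source; the sketch only assumes that the faulty check tested the include argument where it should have tested the exclude argument.

```java
import java.util.Arrays;

// Hypothetical stand-in for the include/exclude handling; not the actual SearchSourceBuilder code.
class FetchSourceSketch {
    static final String[] EMPTY = new String[0];

    // Buggy variant: the exclude is only kept when the include is non-null.
    static String[] excludesBuggy(String include, String exclude) {
        return include == null ? EMPTY : new String[] { exclude };
    }

    // Fixed variant: the exclude is null-checked on its own.
    static String[] excludesFixed(String include, String exclude) {
        return exclude == null ? EMPTY : new String[] { exclude };
    }

    public static void main(String[] args) {
        // No include given, one exclude given.
        System.out.println(Arrays.toString(excludesBuggy(null, "secret.*"))); // prints []
        System.out.println(Arrays.toString(excludesFixed(null, "secret.*"))); // prints [secret.*]
    }
}
```

Checking each argument independently keeps the two settings from silently depending on each other.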
Simon Willnauer dbd372cd61 [TEST] Added IntegrationTest to reproduce #6614 2014-07-02 11:45:58 +02:00
Simon Willnauer 06918d547a [TEST] Wait for yellow after enable allocation on all nodes in BWC tests 2014-07-02 11:38:52 +02:00
Adrien Grand e76eb228b2 [TEST] Fix IndexLookupTests.testCallWithDifferentFlagsFails. 2014-07-02 10:09:29 +02:00
Adrien Grand 309a284e8d [TEST] Fix failure in SearchFieldsTests.testUidBasedScriptFields.
Sorting fails on unmapped fields so the new propagation delay of the mappings
exposed this issue. I added explicit mappings as part of index creation to fix it.
2014-07-02 09:40:49 +02:00
Adrien Grand a96f9a7c83 Templates: GET templates doesn't honor the `flat_settings` parameter.
Close #6671
2014-07-02 08:42:31 +02:00
Igor Motov 67882d78aa [TEST] Remove RANDOM_NO_DELETE_OPEN_FILE and RANDOM_PREVENT_DOUBLE_WRITE settings from snapshot/restore tests 2014-07-01 15:55:53 -04:00
Boaz Leskes b2b443130f Fix forbidden API syntax error 2014-07-01 19:49:57 +02:00
Shay Banon 2b1823cf02 wait for mapping updates during local recovery
when the primary shard is recovering its translog, make sure to wait for new mapping introductions till the mappings have been updated on the master before finalizing the recovery itself
also, this change performs the mapping updates in a more optimized manner by batching the types to change into a single set and sending after the translog has been replayed

also, remove the wait for mapping on master in the local state tests since this new behavior covers it

closes #6666

remove waiting for mapping on master since we do it in recovery
2014-07-01 19:36:26 +02:00
Boaz Leskes 72d2ac1328 Better support for partial buffer reads/writes in translog infrastructure
Some IO APIs can return after writing or reading only a part of the requested data. On these rare occasions, we should call the methods again to read/write the rest of the data. This has caused rare translog corruption while writing huge documents on Windows.

Notable parts of the commit:
- A new Channels class with utility methods for reading and writing to channels
- Writing or reading to channels is added to the forbidden API list
- Added locking to SimpleFsTranslogFile
- Removed FileChannelInputStream which was not used

Closes #6441, #6576
2014-07-01 19:11:36 +02:00
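The partial-write problem described in the commit above can be sketched as follows. This is a minimal illustration, not the `Channels` class the commit adds: `FullWriteSketch`, `writeFully`, `TrickleChannel` and `roundTrip` are hypothetical names, and only standard `java.nio` calls are used. The idea is to keep calling `write` until the buffer has no bytes remaining.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "write everything" helper; names are illustrative only.
class FullWriteSketch {
    // A single write() call may consume only part of the buffer, so loop until it is drained.
    static void writeFully(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    // Test channel that accepts at most 3 bytes per call, to simulate partial writes.
    static class TrickleChannel implements WritableByteChannel {
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        @Override
        public int write(ByteBuffer src) {
            int n = Math.min(3, src.remaining());
            for (int i = 0; i < n; i++) {
                sink.write(src.get());
            }
            return n;
        }

        @Override
        public boolean isOpen() {
            return true;
        }

        @Override
        public void close() {
        }
    }

    // Writes the data through the trickling channel and returns what arrived in the sink.
    static byte[] roundTrip(byte[] data) {
        TrickleChannel channel = new TrickleChannel();
        try {
            writeFully(channel, ByteBuffer.wrap(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return channel.sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "a large translog operation".getBytes(StandardCharsets.UTF_8);
        System.out.println(roundTrip(data).length == data.length);
    }
}
```

The same looping pattern applies to reads, where a call to `ReadableByteChannel.read` may fill only part of the buffer and returns -1 at end of stream.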
Martijn van Groningen 5668b1cfc5 Core: cancel entire recovery if shard closes on target node during the recovery operations.
Closes #6645
2014-07-01 18:16:41 +02:00
Simon Willnauer fd1d02fd07 [TEST] Prevent usage of System Properties in the InternalTestCluster
All settings should be passed as settings and the environment should not
influence the test cluster settings. The settings we care about, i.e.
`es.node.mode` and `es.logger.level` should be passed via settings.
This allows tests to override these settings if they for instance need
`network` transport to operate at all.

Closes #6663
2014-07-01 18:05:44 +02:00
Simon Willnauer c9b7bec3cc [INDEX] Ensure `index.version.created` is consistent
Today `index.version.created` depends on the version of the master
node in the cluster. This is potentially causing new features to be
expected on shards that didn't exist when the index was created.
There is no notion of `where was the shard allocated first` such that
`index.version.created` can't be reliably used as a feature flag.

With this change the `index.version.created` can be reliably used to
determine the smallest node version at the point in time when the index
was created. This means we can safely use certain features that would
for instance require reindexing and / or would not work unless the
entire index (all shards and segments) has been created with a certain
version or newer.

Closes #6660
2014-07-01 18:00:13 +02:00
Igor Motov f14edefc9d [TEST] Fix possible race condition in checksum name generator
When three threads are trying to write checksums at the same time, it's possible for all three threads to obtain the same checksum file name A. Then the first thread enters the synchronized section, creates the file with name A and exits. The second thread enters the synchronized section, checks that A exists, creates file A+1 and exits the critical section. Then it proceeds to clean up and deletes all checksum files including A. If this happens before the third thread enters the synchronized section, it's possible for the third thread to check for A and, since it no longer exists, create the checksum file A a second time, which triggers the "file _checksums-XXXXXXXXXXXXX was already written to" exception in MockDirectoryWrapper and fails recovery.
2014-07-01 09:51:42 -04:00
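One general way to avoid the kind of race described in the commit above is to choose the file name and create the file inside the same critical section, using a counter that never goes backwards. The sketch below is an assumption-laden illustration of that idea, not the fix this commit applies: `ChecksumNameSketch`, `writeChecksums` and `listFiles` are hypothetical names, and an in-memory set stands in for the directory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from the Elasticsearch source.
class ChecksumNameSketch {
    // In-memory stand-in for the checksum files present in a directory.
    private final Set<String> files = new HashSet<>();
    // Monotonic generation counter, so a deleted name is never handed out again.
    private long lastGeneration = 0;

    // Name selection, cleanup and creation all happen under one lock,
    // so no thread can act on a candidate name chosen before it entered.
    synchronized String writeChecksums() {
        String name = "_checksums-" + (++lastGeneration);
        files.clear();   // remove older checksum files
        files.add(name); // record the new one
        return name;
    }

    // Returns a snapshot of the current file names.
    synchronized Set<String> listFiles() {
        return new HashSet<>(files);
    }
}
```

Because the name is derived inside the lock, two callers can never end up writing the same file name, regardless of how cleanup interleaves.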
Martijn van Groningen ec74a7e76f Core: Prevent non segment readers from entering the filter cache and the field data caches.
Percolator: Never cache filters and field data in percolator for the percolator query parsing part.

Closes #6553
2014-07-01 15:05:31 +02:00
Florian Hopf 0a93956d9a Fixed link to native Java client
Closes #6590
2014-07-01 14:01:49 +02:00
Adrien Grand 2ed73bb4f7 [TEST] Improve reproducibility of mappings propagation delays related issues. 2014-07-01 13:31:54 +02:00
Martijn van Groningen 85bea22bc8 Core: The ignore_unavailable=true setting also ignores indices that are closed.
Closes #6471
Closes #6475
2014-07-01 13:09:24 +02:00
Ian Babrou 698eb7de9b Fixed JSON in fielddata docs 2014-07-01 12:53:10 +02:00
Shay Banon f0817c31d9 start mapping service earlier to be available for recovery 2014-07-01 11:39:26 +02:00
Adrien Grand 6a1e7b6ad0 [TEST] Fix ExistsMissingTests failures.
They were due to a combination of mappings propagation delays and the behavior
of MapperService.smartName(String) so mappings are now configured up-front.
2014-07-01 11:25:37 +02:00
Duncan Angus Wilkie 60a8515fb7 Update histogram-facet.asciidoc
Spotted a typo, which I've fixed.
2014-07-01 10:49:43 +02:00
Florian Hopf c5cf283517 Docs: Removed Sense mention 2014-07-01 08:30:34 +02:00
Igor Motov 8a20bfcdd5 [TEST] Turn off double write check for restore 2014-06-30 23:12:29 -04:00
Igor Motov 2149a9403d Improve deletion of corrupted snapshots
Makes it possible to delete snapshots that are missing some of the metadata files. This can happen if snapshot creation failed because repository drive ran out of disk space.

Closes #6383
2014-06-30 21:03:46 -04:00
Igor Motov 1425e28639 Add ability to restore partial snapshots
Closes #5742
2014-06-30 20:18:02 -04:00
Shay Banon 46f1e30fa9 Recovery from local gateway should re-introduce new mappings
The delayed mapping intro tests exposed a bug where if a new mapping is introduced, yet not updated on the master, and a full restart occurs, replay of the transaction log will not cause the new mapping to be re-introduced.
closes #6659

add comment on the method
2014-07-01 01:53:44 +02:00
Shay Banon e8519084c9 [TEST] properly wait for mapping on master node
add helper method to do so, by not assuming that the mapping will exist right away by waiting for green or refreshing...
2014-06-30 23:11:23 +02:00
Shay Banon 5c5e13abce [TEST] properly wait for mappings when needed 2014-06-30 22:32:43 +02:00
Shay Banon 5273410be6 Update mapping on master in async manner
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keep on being introduced. We can move this process to another thread that will be responsible for refreshing the source and then sending the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try to batch as many updates as possible; this is handy especially when multiple shards for the same index exist on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).

Testing wise, a new support method, ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll, is added to allow waiting for the concrete manifestation of mappings on all relevant nodes. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see timing-related failures later on (all tests pass), then those will need to be fixed to either awaitBusy on the master for the new mapping or, in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes #6648

allow to change the additional time window dynamically

better sorting on mappers when refreshing source
also, no need to call nodes info in test, we already have the node names

clean calls to mapping update to provide doc mapper and UUID always
also use the internal cluster support method to get the list of nodes an index is on

reverse the order to pick the latest change first

remove unused field

and fix constructor param

move to start/stop on mapping update action

randomize INDICES_MAPPING_ADDITIONAL_MAPPING_CHANGE_TIME
2014-06-30 22:08:39 +02:00
Lee Hinman 761ef5d9f1 Wrap groovy script exceptions in a serializable Exception object
Fixes #6598
2014-06-30 16:50:34 +02:00
Shay Banon c9ff9a6930 [TEST] Randomize netty worker and connection parameters
Try and push our system to a state where there is only a single worker, trying to expose potential deadlocks when we by mistake execute blocking operations on the worker thread
closes #6635
2014-06-30 14:57:36 +02:00
Boaz Leskes c907ce325e [Test] make recovery slow down in rerouteRecoveryTest aware of index size 2014-06-30 10:54:45 +02:00
Boaz Leskes a72c167be2 [Test] improved recovery slow down in rerouteRecoveryTest
only change recovery throttling to slow down recoveries. The recovery file chunk size updates are not picked up by ongoing recoveries. That causes the recovery to take too long even after the default settings are restored.

Also - change document creation to reuse field names in order to speed up the test.
2014-06-29 14:37:12 +02:00
Boaz Leskes bbc82e2821 [Test] add awaitFix to rerouteRecoveryTest 2014-06-29 09:55:03 +02:00
Boaz Leskes ca194594b3 Recovery API should also report ongoing relocation recoveries
We currently only report relocation related recoveries after they are done.

Closes #6585
2014-06-28 21:27:15 +02:00
Boaz Leskes 155620ed8e [Test] testRelocationWhileRefreshing should wait for the first shard to be started 2014-06-28 10:41:06 +02:00
Simon Willnauer 9ce66cb167 [TEST] Testcase for #6639 2014-06-28 09:12:25 +02:00
Simon Willnauer 309c7ceeff Added minimal setup guide for BW Compat tests 2014-06-27 15:39:53 +02:00
Robert Muir a3d5381392 Disable explicit GC by default
We don't rely upon GC to cleanup mappedbytebuffers, we unmap them
explicitly on close in lucene. But the JDK has crazy loops with
explicit GCs in exceptional cases to try to force unmapping.

In general we don't want any of our code or library code calling
this method: so it's banned in forbidden-apis as well.
2014-06-27 14:09:44 +02:00
Simon Willnauer b2685f132a [TEST] Change es.node.mode default for tests to `local`
In order to speed up test execution we should run in local mode by
default. CI builds will still use network builds all the time.

Closes #6624
2014-06-27 11:57:34 +02:00