12125 Commits

Author SHA1 Message Date
Simon Willnauer 5cce09b32d [TEST] Use async durability in load tests 2015-05-22 14:06:24 +02:00
Simon Willnauer 4e1fa3c3b4 [TEST] Provide random instance to rarely(Random) instead of using the threadlocal one 2015-05-22 13:49:01 +02:00
Shay Banon 08e87bd81e Async Fetch: Better logging classification + log when ignored 2015-05-22 11:59:44 +02:00
javanna afb7aabea7 Internal: replace if with existing MetaData#isAllTypes call in MapperService 2015-05-22 11:28:29 +02:00
Simon Willnauer ada98ba0c4 Allow disabling of sigar via settings
Sigar can only be disabled by removing the binaries. This is tricky for our
tests and might cause a lot of trouble if a user wants or needs to do it.
This commit allows sigar to be disabled with a simple boolean flag in the settings.

Closes #9582
2015-05-22 10:07:40 +02:00
Ryan Ernst 19de7039a3 Merge pull request #11292 from rjernst/remove/mapper-generic
Mappings: Remove generics from FieldMapper
2015-05-21 16:25:04 -07:00
Ryan Ernst 49e965fab0 Mappings: Remove generics from FieldMapper
FieldMapper is currently generic, where the templated type
is only used as the return of a single function, value(Object).
This change simply removes this generic type. It is not needed. The
implementations of value() now have a covariant return (so
those methods have not changed).
2015-05-21 16:07:07 -07:00
Ryan Ernst f071c01afc Add back accidentally removed support for _all as a type alias.
See 4dd4f48
2015-05-21 13:11:32 -07:00
Simon Willnauer cbead88273 Check engine reference for null before flushing
If we close the shard before the engine is started we see a NPE in
the logs which is not problematic since the relevant parts are in a
finally block. Yet, the NPE is unnecessary and can be confusing.
2015-05-21 21:30:35 +02:00
Igor Motov 581bfc74c6 Tests: disable rebalancing in restoreIndexWithShardsMissingInLocalGateway
Rebalancing sometimes kicks in at the end of restore and interferes with reuse stats that this test relies on.
2015-05-21 13:37:06 -04:00
Shay Banon d3e36d0940 [TEST] snapshot test should use assertBusy to provide more info when failing 2015-05-21 19:26:46 +02:00
jaymode 8060cd0794 add profile name to TransportChannel
Today, only the NettyTransportChannel implements the getProfileName method
and the other channel implementations do not. The profile name is useful for some
plugins to perform custom actions based on the name. Rather than checking the
type of the channel, it makes sense to always expose the profile name.

For DirectResponseChannels we use a name that cannot be used in the settings
to define another profile with that name. For LocalTransportChannel we use the
same name as the default profile.

Closes #10483
2015-05-21 12:16:04 -04:00
Adrien Grand 6b3918a97c Clean up and make the test work. 2015-05-21 18:11:17 +02:00
Simon Willnauer c8f39dd2b9 Use ReleaseableLock and try to add a test for it.. 2015-05-21 18:11:17 +02:00
Adrien Grand a98d78b3ae Mappings: Make mapping updates atomic wrt document parsing.
When mapping updates happen concurrently with document parsing, bad things can
happen. For instance, when applying a mapping update we first update the Mapping
object which is used for parsing and then FieldNameAnalyzer which is used by
IndexWriter for analysis. So if you are unlucky, it could happen that a document
was parsed successfully without introducing dynamic updates yet IndexWriter does
not see its analyzer yet.

In order to fix this issue, mapping updates are now protected by a write lock
and document parsing is protected by the read lock associated with this write
lock. This ensures that no documents will be parsed while a mapping update is
being applied, so document parsing will either see none of the update or all of
it.
2015-05-21 18:11:17 +02:00
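
The locking scheme described in a98d78b3ae can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class `MappingLockSketch`, its two counters (standing in for the Mapping object and the FieldNameAnalyzer) and all method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration; none of these names come from Elasticsearch.
class MappingLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Two pieces of state that must change together, standing in for
    // the Mapping object and the FieldNameAnalyzer.
    private int mappingVersion = 0;
    private int analyzerVersion = 0;

    // A mapping update holds the write lock, so no parse can run
    // between the two assignments.
    void updateMapping() {
        lock.writeLock().lock();
        try {
            mappingVersion++;
            analyzerVersion++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Parsing holds the read lock: parses may run concurrently with each
    // other, but never overlap an update. Returns true if both pieces of
    // state agree, i.e. the parse saw none of an update or all of it.
    boolean parseSeesConsistentState() {
        lock.readLock().lock();
        try {
            return mappingVersion == analyzerVersion;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs one updating thread against one parsing thread and counts how
    // often a parse observed a half-applied update.
    static int countInconsistentReads(int updates) {
        MappingLockSketch sketch = new MappingLockSketch();
        AtomicInteger inconsistent = new AtomicInteger();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < updates; i++) {
                sketch.updateMapping();
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < updates; i++) {
                if (!sketch.parseSeesConsistentState()) {
                    inconsistent.incrementAndGet();
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inconsistent.get();
    }

    public static void main(String[] args) {
        System.out.println("inconsistent reads: " + countInconsistentReads(100_000));
    }
}
```

A read/write lock fits here because parses vastly outnumber mapping updates: readers do not block each other, and only the rare update takes exclusive access.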
Simon Willnauer 6b63ea49c2 [TEST] now that we check early if we are corrupted this can be run with many nodes 2015-05-21 17:57:49 +02:00
Simon Willnauer 65c646b01d Also log the shard state info for the shard that can't be opened 2015-05-21 16:46:49 +02:00
Colin Goodheart-Smithe 35deb7efea Aggregations: Renaming reducers to Pipeline Aggregators 2015-05-21 14:57:23 +01:00
javanna e7958f46dc Transport: remove support for reading/writing list of strings, use arrays instead
We recently introduced support for reading and writing lists of strings as part of #11056, but that was an oversight; we should be using arrays instead.

Closes #11276
2015-05-21 14:48:20 +02:00
javanna 5a0c456ac2 Internal: Uid#createTypeUids to accept a collection of ids rather than a list, plus rename method variants to avoid clashes
The downside of having createTypeUids accept only a list is that if you provide a collection, nothing breaks at compile time, but you end up calling the method of the same name that accepts an object as its second argument. Renamed both methods to `createUidsForTypesAndId` and `createUidsForTypesAndIds` to avoid clashes. The latter now accepts a Collection of Objects rather than just a List.

Closes #11263
2015-05-21 10:41:54 +02:00
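
The overload pitfall described in 5a0c456ac2 can be reproduced in a few lines. The sketch below is a simplified assumption: the signatures (a single `String type` plus an id argument) and the returned marker strings are hypothetical, and only the method names are taken from the commit message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified signatures; only the method names come from the commit message.
class OverloadSketch {
    // Before: two overloads share one name.
    static String createTypeUids(String type, Object id) {
        return "single id";
    }

    static String createTypeUids(String type, List<?> ids) {
        return "list of " + ids.size() + " ids";
    }

    // After: distinct names, and the plural variant accepts any Collection.
    static String createUidsForTypesAndId(String type, Object id) {
        return "single id";
    }

    static String createUidsForTypesAndIds(String type, Collection<?> ids) {
        return "collection of " + ids.size() + " ids";
    }

    public static void main(String[] args) {
        // The static type is Collection, not List, so overload resolution
        // silently picks the Object variant and treats the whole collection
        // as one id.
        Collection<String> ids = new ArrayList<>(Arrays.asList("1", "2"));
        System.out.println(createTypeUids("doc", ids));
        // With distinct names the caller has to state which one is meant.
        System.out.println(createUidsForTypesAndIds("doc", ids));
    }
}
```

Java chooses overloads at compile time from the static type of the argument, which is why the mistake compiles cleanly and only shows up as wrong behavior at runtime.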
Simon Willnauer e91bf8a267 [TEST] Remove corrupted index before checkindex goes wild 2015-05-21 10:21:11 +02:00
Simon Willnauer 4cb21d02a4 Check if the index can be opened and is not corrupted on state listing
We fetch the state version to find the right shard to be started as
the primary. This can return a valid shard state even if the shard is
corrupted and can't even be opened. This commit adds best effort detection
for this scenario and returns an invalid version for the shard if it's corrupted

Closes #11226
2015-05-21 09:52:39 +02:00
Ryan Ernst ea3c5d5820 Merge pull request #11272 from rjernst/refactor/mapper-names
Mappings: Cleanup names handling
2015-05-21 00:36:14 -07:00
Britta Weber 17eec9ddca Merge pull request #11266 from brwe/npe-commitstats
fix NPE when streaming commit stats
2015-05-21 08:51:22 +02:00
Robert Muir 332e50e44f Merge pull request #11273 from rmuir/useless_version
Remove useless outdated forbidden-api version in `-Pdev` config.
2015-05-21 02:26:36 -04:00
Robert Muir 8abb8e4861 Remove useless outdated forbidden-api version in `-Pdev` config. 2015-05-21 02:11:11 -04:00
Igor Motov dd41c68741 Snapshot/Restore: fix FSRepository location configuration
Closes #11068
2015-05-20 22:14:31 -04:00
Yannick Welsch 14c1743f30 Snapshot/Restore: Batching of snapshot state updates
Similar to the batching of "shards-started" actions, this commit implements batching of snapshot status updates. This is useful when backing up many indices as the cluster state does not need to be republished as many times.

Closes #10295
2015-05-20 20:43:02 -04:00
Ryan Ernst 5203205808 Mappings: Cleanup names handling
This clarifies some of the uses of names, so that the ambiguous
"name" is mostly no longer used (does this include path or not?).
sourcePath is also removed as it was not used. Not all the
uses of .name() have been cleaned up because Mapper still has
this, and ObjectMapper depends on it returning the short name,
but I would like to leave finishing that cleanup for a future issue.
2015-05-20 17:07:22 -07:00
Lee Hinman 0a6f7ef379 [DOCS] Mention Integer.MAX_VALUE limit for http.max_content_length
Fixes #11244
2015-05-20 13:08:59 -06:00
Britta Weber 4bdf8aabd7 fix NPE when streaming commit stats
commit id is only written from lucene 5 on
2015-05-20 20:52:37 +02:00
Shay Banon 72fde6f695 Async fetch of shard started and store during allocation
Today, when a primary shard is not allocated we go to all the nodes to find where it is allocated (listing its started state). When we allocate a replica shard, we head to all the nodes and list its store to allocate the replica on a node that holds the closest matching index files to the primary.

Those two operations today execute synchronously within the GatewayAllocator, which means they execute in a sync manner within the cluster update thread. For large clusters, or environments with very slow disk, those operations will stall the cluster update thread, making it seem like it's stuck.

Worse, if the FS is really slow, we time out the operation after 30s (to not stall the cluster update thread completely). This means that we will have another run for the primary shard if we didn't find one, or we won't find the best node to place a shard since it might have timed out (listing stores needs to list all files and read the checksum at the end of each file).

On top of that, this sync operation happens one shard at a time, so it's effectively compounding the problem in a serial manner the more shards we have and the slower the FS is...

This change makes both listing the shard started states and listing the shard stores asynchronous. During the allocation by the GatewayAllocator, if data needs to be fetched from a node, it is done in an async fashion, with the response triggering a reroute to make sure the results will be taken into account. Also, if there are ongoing operations happening, the relevant shard data will not be taken into account until all the ongoing listing operations are done executing.

The execution of listing shard states and stores has been moved to their own respective thread pools (scaling, so will go down to 0 when not needed anymore, unbounded queue, since we don't want to timeout, just let it execute based on how fast the local FS is). This is needed since we are going to blast nodes with a lot of requests and we need to make sure there is no thread explosion.

This change also improves the handling of shard failures coming from a specific node. Today, those nodes were ignored from allocation only for the single reroute round. Now, since fetching is async, we need to keep those failures around at least until a single successful fetch without the node is done, to make sure not to repeat allocating to the failed node all the time.

Note, if before the indication of slow allocation was high pending tasks since the allocator was waiting for responses, now the pending tasks will be much smaller. In order to still indicate that the cluster is in the middle of fetching shard data, 2 attributes were added to the cluster health API, indicating the number of ongoing fetches of both started shards and shard store.

closes #9502
closes #11101
2015-05-20 17:55:02 +02:00
Shay Banon 1568eca962 [TEST] simplify number of data nodes in #testReplicaCorruption 2015-05-20 17:51:03 +02:00
jaymode a40ba3be5a default socket reuse address value should be non null
The default for reuse address was null on Windows and casting to a boolean would
result in a NPE. This makes the default on Windows false and changes the return
type to a boolean and removes the need to check for nulls.
2015-05-20 09:52:06 -04:00
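
The failure mode fixed in a40ba3be5a is the usual auto-unboxing trap. The sketch below is an illustration under assumptions: the class and method names are hypothetical, and the non-Windows default of `true` is assumed here rather than stated in the commit message.

```java
// Hypothetical illustration; names and the non-Windows default are assumptions.
class ReuseAddressSketch {
    // Before (simplified): a boxed Boolean whose default is null on Windows.
    static Boolean boxedDefault(boolean windows) {
        return windows ? null : Boolean.TRUE;
    }

    // After: a primitive return type cannot be null, so callers need no null check.
    static boolean primitiveDefault(boolean windows) {
        return !windows;
    }

    // Returns true if converting the boxed value to a primitive throws.
    static boolean unboxingThrows(Boolean value) {
        try {
            boolean unboxed = value; // auto-unboxing calls value.booleanValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("boxed default on Windows throws on unboxing: "
                + unboxingThrows(boxedDefault(true)));
        System.out.println("primitive default on Windows: " + primitiveDefault(true));
    }
}
```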
Simon Willnauer 40bd56b918 Merge pull request #11241 from s1monw/readd_fail_on_recovery
[RECOVERY] Add engine failure on recovery finalization corruption back
2015-05-20 14:35:02 +02:00
Clinton Gormley 409e4e5f73 REST test: Fixed index-seal test
Fixed bad YAML, and changed to wait for yellow instead of green, because
REST tests usually run on a single node
2015-05-20 13:46:43 +02:00
Simon Willnauer 0bc5b35a59 [TEST] Add better assertion messages 2015-05-20 13:00:26 +02:00
Adrien Grand 42ad677127 Tests: Compare the length of cluster states serialized as a String, not the binary representation.
Due to the fact that the binary representation uses some compression, we can't
be sure that they will be equal even if they store the same content.
2015-05-20 12:05:04 +02:00
Simon Willnauer 51c9f73947 Include num_docs in the commit stats
This also fixes a potential race condition when the number of docs
is compared across shards with the same seal ID since the assertion
was taking the number of docs from the live index reader which might
not be equivalent to the committed num docs.
2015-05-20 11:33:07 +02:00
Simon Willnauer 44b0edd2b8 Notify Listener if transport throws an exception 2015-05-20 11:23:12 +02:00
Clinton Gormley 5e4d5e1c64 Docs: Included the index-seal docs in the indices section 2015-05-20 11:20:12 +02:00
Alexander Reelsen bc5bf9784d Cleanup: Remove generics need in ContextAndHeaderHolder
Generics were only needed for setting a header, that returned
the object being set (most likely the request), but none of
the other methods did this.
2015-05-20 10:49:45 +02:00
Adrien Grand b4ec9044ed Fix abusive assertion. 2015-05-20 10:10:53 +02:00
Ryan Ernst eaf35c4e4a Merge pull request #11243 from rjernst/remove/type-listener
Mappings: Remove document parse listener
2015-05-20 00:00:57 -07:00
Adrien Grand e90740ad74 Merge pull request #11233 from jpountz/fix/compressedstring_equals
Internal: Fix CompressedString.equals.
2015-05-20 08:39:51 +02:00
Adrien Grand fd1954d74f Internal: Fix CompressedString.equals.
CompressedString relied on the assumption that two CompressedString instances
are equal if their compressed representations are equal. Unfortunately this is
not always true because the compressed representation also depends on when
flush() was called on the output stream or on the size of the hash table that
has been used at compression time.
2015-05-20 08:36:42 +02:00
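
The problem described in fd1954d74f, equal content with unequal compressed bytes, can be shown with the JDK's own `Deflater`. This is a sketch under assumptions, not the CompressedString code: the class `CompressedEqualsSketch` and its methods are hypothetical, and it varies the compression level as a stand-in for the flush timing and hash table size mentioned in the commit message.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; not the Elasticsearch CompressedString class.
class CompressedEqualsSketch {
    // Compresses the input in zlib format at the given compression level.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes from zlib-compressed input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Equality defined on the uncompressed content, not on the compressed bytes.
    static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(decompress(a), decompress(b));
    }

    public static void main(String[] args) {
        byte[] text = "{\"type\":{\"properties\":{}}}".getBytes(StandardCharsets.UTF_8);
        byte[] fast = compress(text, Deflater.BEST_SPEED);
        byte[] small = compress(text, Deflater.BEST_COMPRESSION);
        System.out.println("compressed bytes equal: " + Arrays.equals(fast, small));
        System.out.println("content equal: " + contentEquals(fast, small));
    }
}
```

Comparing decompressed content costs more than comparing the raw bytes, so a real implementation might first check the compressed bytes as a shortcut and fall back to the content only when they differ.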
Igor Motov 21ed6bb90c Core: Don't allow indices containing too-old segments to be opened
When an index is introduced into the cluster via cluster upgrade, restore or as a dangled index, the MetaDataIndexUpgradeService checks if this index can be upgraded to the current version. If upgrade is not possible, the newly upgraded cluster startup and restore process are aborted, and the dangled index is imported as a closed index that cannot be opened.

Closes #10215
2015-05-19 23:37:05 -04:00
Ryan Ernst 26ef40a46d Mappings: Remove document parse listener
There was previously a single user of the parse listener, MLT API.
However, now that this is gone, there is no need for it.
2015-05-19 15:35:45 -07:00
Simon Willnauer 8949cb85fb [RECOVERY] Add engine failure on recovery finalization corruption back
This engine failure on finalization corruption was lost during refactoring and
should be added back.
2015-05-19 22:24:02 +02:00
Simon Willnauer eaa8576bba Remove all corruption markers if finalization fails 2015-05-19 22:15:48 +02:00