We need to register those data paths, otherwise we might miss paths that
need to get cleaned when using the local gateway etc., which can otherwise
cause imports of dangling indices.
This commit fixes the issue caused by the restore process deleting all legacy checksum files at the end of the restore. Instead, it keeps the latest version of the checksum intact. The issue manifests itself as lost checksums for all legacy files restored into a post-1.3.0 cluster, which in turn causes unnecessary snapshotting of files that didn't change.
Fixes #8119
Now each error is reported in the bulk response rather than causing the entire bulk request to fail.
Added a JUnit test, but the use of TransportClient means the error manifests differently than with a REST-based request: instead of a NullPointerException, the whole bulk request failed with a RoutingMissingException. Changed TransportBulkAction to catch this exception and treat it the same as the existing logic for an ElasticsearchParseException: the individual bulk request items are flagged and reported individually rather than failing the whole bulk request.
Closes #8365
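The per-item handling described above can be sketched roughly as follows. This is a minimal illustration, not the TransportBulkAction code: `BulkSketch`, `Item`, `ItemResult`, `RoutingMissing` and `resolve` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkSketch {
    /** Hypothetical stand-in for a routing-missing failure. */
    static class RoutingMissing extends RuntimeException {
        RoutingMissing(String id) {
            super("routing is required for [" + id + "]");
        }
    }

    /** Hypothetical bulk item: an id, an optional routing value, and whether routing is required. */
    static class Item {
        final String id;
        final String routing;
        final boolean routingRequired;

        Item(String id, String routing, boolean routingRequired) {
            this.id = id;
            this.routing = routing;
            this.routingRequired = routingRequired;
        }
    }

    /** Per-item outcome; failure is null when the item succeeded. */
    static class ItemResult {
        final String id;
        final String failure;

        ItemResult(String id, String failure) {
            this.id = id;
            this.failure = failure;
        }
    }

    /** Throws if the item needs a routing value but has none. */
    static void resolve(Item item) {
        if (item.routingRequired && item.routing == null) {
            throw new RoutingMissing(item.id);
        }
    }

    /** Catches the failure per item so one bad item does not fail the whole bulk. */
    static List<ItemResult> execute(List<Item> items) {
        List<ItemResult> results = new ArrayList<>();
        for (Item item : items) {
            try {
                resolve(item);
                results.add(new ItemResult(item.id, null));
            } catch (RoutingMissing e) {
                results.add(new ItemResult(item.id, e.getMessage()));
            }
        }
        return results;
    }
}
```

The point of the design is that the exception is caught inside the loop, so the caller always gets one result per submitted item.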
The test used `indices.recovery.concurrent_streams` when creating an index, but this is a node setting. Moved it to the node settings and added similar settings to speed up concurrent recoveries.
Also fixed a misleading log message in ShardRecoveryHandler when logging a remote corruption.
Today it's possible that the data directory for a single shard is used by more than one
IndexShard->Store instance. While one shard is already closed but has a concurrent recovery
running and a new shard is creating its engine, files can conflict and data can potentially
be lost. We also remove shard data without checking if there are still users of the files
or if files are still open, which can cause pending writes / flushes or the delete operation
to fail. If the latter is the case, the index might be treated as a dangling index and
brought back to life at a later point in time.
This commit introduces a shard level lock that prevents modifications to the shard data
while it's still in use. Locks are created per shard and maintained in NodeEnvironment.java.
In contrast to most Java concurrency primitives, those locks are not reentrant.
This commit also adds infrastructure that checks if all shard locks are released after tests.
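A per-shard, non-reentrant lock of the kind described above could look roughly like this. It is only a sketch under assumptions: `ShardLocks`, `ShardLock`, `tryLock` and `lockedShards` are hypothetical names, shards are identified by a plain string, and nothing here is taken from NodeEnvironment.java.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

public class ShardLocks {
    // One single-permit semaphore per shard; a semaphore has no owner thread,
    // so a second acquire from the same thread fails (non-reentrant).
    private final ConcurrentMap<String, Semaphore> locks = new ConcurrentHashMap<>();

    /** Handle returned to the holder; closing it releases the shard. */
    public static final class ShardLock implements AutoCloseable {
        private final Semaphore semaphore;
        private boolean released;

        ShardLock(Semaphore semaphore) {
            this.semaphore = semaphore;
        }

        @Override
        public synchronized void close() {
            if (!released) { // make double-close harmless
                released = true;
                semaphore.release();
            }
        }
    }

    /** Acquires the lock for a shard or fails immediately if it is still held. */
    public ShardLock tryLock(String shardId) {
        Semaphore semaphore = locks.computeIfAbsent(shardId, k -> new Semaphore(1));
        if (!semaphore.tryAcquire()) {
            throw new IllegalStateException("shard [" + shardId + "] is still locked");
        }
        return new ShardLock(semaphore);
    }

    /** Number of shards currently locked; a test harness could assert this is zero. */
    public int lockedShards() {
        int count = 0;
        for (Semaphore semaphore : locks.values()) {
            if (semaphore.availablePermits() == 0) {
                count++;
            }
        }
        return count;
    }
}
```

A caller would typically hold the lock in a try-with-resources block around any deletion or engine creation, and a test teardown could check that `lockedShards()` returns zero.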
Don't eagerly cache parent type filters in bitset cache or nested object fields that are leaves.
Also let parent/child queries not rely on FixedBitSetFilter, but rather on regular Filter.
Closes #8440
Fixed behaviour where two representations of the default index analyzer weren't being treated as equivalent. Added REST test to confirm fix.
Closes #2716
In addition to `_source`, the following variables are available through
the `ctx` map: `_index`, `_type`, `_id`, `_version`, `_routing`,
`_parent`, `_timestamp`, `_ttl`.
Some of these fields are more useful still within the context of an
Update By Query, see #1607, #2230, #2231.
Based on some test failures, this commit fixes two minor things:
* Bind only to so-called ephemeral ports to avoid trying to
bind to ports that elasticsearch is already running on
* Remove the @Network annotation as it was used in the wrong scope
Prior to this change, all features (_alias, _mapping, _settings, _warmer) were run regardless of which features were actually requested. This change fixes the request object to resolve this bug.
The compression bug fixed in #7210 can still strike us since we are
running BWC tests against these versions. This commit forcefully disables
compression if the compatibility version is < 1.3.2 to avoid debugging
already known issues.
The query_string query has an option for analyzing wildcard/prefix (#787) using a best-effort approach.
This adds the `analyze_wildcard` option to simple_query_string as well.
The default is set to `false` so the existing behavior of simple_query_string is unchanged.
When data nodes receive mapping updates from the master, they parse them and merge them into their own in-memory representation (if one exists). If this results in different bytes than the master sent, the nodes send a refresh-mapping command to tell the master that its byte-level storage of the mapping should be refreshed via the document mappers. This comes in handy when the mapping format has changed in a backwards-compatible manner and we want to make sure we can still rely on the bytes to identify changes. An example of such a change can be seen in #4760.
This commit extends the logic to include the `_default_` type, which was never refreshed before. In some unlucky scenarios, this caused the _default_ mapping to be parsed with every cluster state update.
Closes #8413
This prevents too-difficult regular expressions from consuming
excessive RAM/CPU; the default max_determinized_states is 10,000 (same
as Lucene) but query_string and regexp query/filter can override
per-request.
This also upgrades to a new Lucene 5.0.0 snapshot.
Closes #8386
Closes #8357
DocIdSets.isFast(DocIdSet) has two issues:
- it works on the DocIdSet interface while some doc sets can generate either
slow or fast sets depending on their options (eg. whether an OrDocIdSet is
fast or not depends on the wrapped clauses).
- it only works because the result of this method is only taken into account
when a DocIdSet has non-null `bits()`.
This commit changes this method to work on top of a DocIdSetIterator and to use
a black-list rather than a white list: slow iterators should really be the
exception rather than the rule.
Closes #8380
The rename(String, String) method doesn't allow this implementation to use a simple
concurrent map. There is a race during a rename operation where files are not fully
renamed but already visible via #listAll(). This inconsistency can lead to problems
when opening commit points since the pending_segments_N as well as segments_N are visible
but not yet atomically renamed.
Yet, none of the methods that are synced are long running, so adding synchronization
doesn't introduce bottlenecks here. The Directory#sync(...) method is not synchronized since
it doesn't change any mapping nor does it depend on the mapping.
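The synchronization argument above can be illustrated with a small in-memory file map. This is a sketch under assumptions, not the actual directory implementation: `SyncedFileMap` and its methods are hypothetical, and file contents are plain byte arrays.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class SyncedFileMap {
    // A plain map guarded by the instance monitor; rename needs remove + put
    // to appear as one step, which a concurrent map alone does not provide.
    private final Map<String, byte[]> files = new HashMap<>();

    public synchronized void write(String name, byte[] content) {
        files.put(name, content.clone());
    }

    /** Returns a sorted snapshot; never observes a half-finished rename. */
    public synchronized String[] listAll() {
        return new TreeSet<>(files.keySet()).toArray(new String[0]);
    }

    /** Moves source to dest while holding the same lock as listAll(). */
    public synchronized void rename(String source, String dest) {
        byte[] content = files.remove(source);
        if (content == null) {
            throw new IllegalArgumentException("no such file: " + source);
        }
        files.put(dest, content);
    }

    /** Neither reads nor changes the name mapping, so it takes no lock. */
    public void sync(Iterable<String> names) {
        // nothing to flush for in-memory content
    }
}
```

Because `listAll()` and `rename()` share one monitor, a reader sees either the name before the rename or the name after it, never both and never neither.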
Previously we didn't calculate these checksums even though we have a checksum
to compare against. Since we now also verify checksums for legacy files, #checkIntegrity
should also calculate the legacy checksums.
Closes #8407
When a lucene 4.8+ file is transferred, Store returns a VerifyingIndexOutput
that verifies both the CRC32 integrity and the length of the file.
However, for older files, problems can make it to the lucene level. This is not great
since older lucene files aren't especially strong as far as detecting issues here.
For example, if a network transfer is closed on the remote side, we might write a
truncated file... which old lucene formats may or may not detect.
The idea here is to verify old files with their legacy Adler32 checksum, plus expected
length. If they don't have an Adler32 (segments_N, jurassic elasticsearch?, it's optional
as far as the protocol goes), then at least check the length.
We could improve it for segments_N, which has had an embedded CRC32 forever in lucene, but this
gets trickier. Long term, we should also try to improve tests around here, especially
backwards compat testing: we should test that detected corruptions are handled properly.
Closes #8399
Conflicts:
src/main/java/org/elasticsearch/index/store/Store.java
src/test/java/org/elasticsearch/index/store/StoreTest.java
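The legacy verification idea from the message above (Adler32 plus expected length, falling back to length only) can be sketched as follows. This is an illustration under assumptions, not the code in Store.java: `LegacyChecksumVerifier` and `CorruptFileException` are hypothetical names, and the expected checksum is passed as a nullable `Long` rather than in whatever encoding the store metadata uses.

```java
import java.util.zip.Adler32;

public class LegacyChecksumVerifier {
    /** Hypothetical unchecked exception signalling a failed verification. */
    static class CorruptFileException extends RuntimeException {
        CorruptFileException(String message) {
            super(message);
        }
    }

    private final long expectedLength;
    private final Long expectedAdler32; // null when the file has no legacy checksum
    private final Adler32 adler = new Adler32();
    private long written;

    public LegacyChecksumVerifier(long expectedLength, Long expectedAdler32) {
        this.expectedLength = expectedLength;
        this.expectedAdler32 = expectedAdler32;
    }

    /** Feed each chunk as it is written to the target file. */
    public void update(byte[] bytes, int offset, int length) {
        adler.update(bytes, offset, length);
        written += length;
    }

    /** Always checks the length; checks Adler32 only when one is available. */
    public void verify() {
        if (written != expectedLength) {
            throw new CorruptFileException(
                    "expected length=" + expectedLength + " but wrote " + written);
        }
        if (expectedAdler32 != null && expectedAdler32 != adler.getValue()) {
            throw new CorruptFileException(
                    "expected adler32=" + expectedAdler32 + " but got " + adler.getValue());
        }
    }

    /** Convenience helper computing the Adler32 of a whole array. */
    public static long adler32(byte[] data) {
        Adler32 checksum = new Adler32();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }
}
```

A truncated transfer fails the length check even when no checksum is known, which is the minimum guarantee the message asks for.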