This commit addresses an issue that was leading to snapshot corruption for snapshots stored as blobs in Azure Storage.
The underlying issue is that in cases when multiple snapshots of an index were taken and persisted into Azure Storage, snapshots subsequent
to the first would repeatedly overwrite the snapshot files. This renders all snapshots except the final one useless.
The root cause is String concatenation involving null. In particular, to list all of the blobs in a snapshot directory in
Azure, the code would call the method listBlobsByPrefix with a null prefix. The listBlobsByPrefix method constructs the path keyPath + prefix.
However, per 5.1.11, 5.4 and 15.18.1 of the Java Language Specification, the reference null is first converted to the string
"null" before the concatenation is performed. The resulting path matches no blobs, so none are returned and the snapshot mechanism would
operate as if it were writing the first snapshot of the index. The fix is simply to check whether prefix is null and handle the
concatenation accordingly.
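As a minimal illustration of the concatenation behavior described above (not the actual Azure blob store code; the class and method names here are hypothetical, and only keyPath and prefix come from the description):

```java
final class BlobPrefixExample {
    // Hypothetical stand-in for the buggy path construction:
    // a null prefix is converted to the text "null" during concatenation.
    static String buggyPath(String keyPath, String prefix) {
        return keyPath + prefix;
    }

    // Hypothetical stand-in for the fix: skip the prefix when it is null.
    static String fixedPath(String keyPath, String prefix) {
        return prefix == null ? keyPath : keyPath + prefix;
    }

    public static void main(String[] args) {
        System.out.println(buggyPath("indices/0/", null)); // prints "indices/0/null"
        System.out.println(fixedPath("indices/0/", null)); // prints "indices/0/"
    }
}
```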
Upon fixing this issue so that subsequent snapshots would no longer overwrite earlier snapshots, it was discovered that the snapshot metadata
returned by the listBlobsByPrefix method was not sufficient for the snapshot layer to detect whether or not the Lucene segments had already
been copied to the Azure storage layer in an earlier snapshot. This led the snapshot layer to unnecessarily duplicate these Lucene segments
in Azure Storage.
The root cause is known behavior in the CloudBlobContainer.getBlockBlobReference method in the Azure API. Namely, this method
does not fetch blob attributes from Azure. As such, all the blobs appeared to the snapshot layer to have length zero and
therefore they would compare as not equal to any new blobs that the snapshot layer was going to persist. To remediate this, the method
CloudBlockBlob.downloadAttributes must be invoked. This will fetch the attributes from Azure Storage so that a proper comparison of the
blobs can be performed.
Closes elastic/elasticsearch-cloud-azure#51, closes elastic/elasticsearch-cloud-azure#99
Fold ignored unassigned shards into UnassignedShards and simplify their handling. Also remove the trappy way of adding ignored unassigned shards directly to the list, and have dedicated methods for it.
This change also removes the useless moving of unassigned shards to the end, since, first, we sort those unassigned shards anyhow, and second, we now have persistent "store exceptions" that should not cause "dead letter" shard allocation.
Break it into more manageable code by separating allocating primaries from allocating replicas. Start adding basic unit tests for the primary shard allocator.
Our thread pools support a timeout on a task. To support this, a special background task is scheduled to run at the timeout. That background task fires, checks whether the main task is still in the executor queue, and cancels it if needed. Currently we schedule this background task before adding the main task to the queue. If the timeout is very small (in tests we often use numbers like 2 ms), the background task can fire before the main one is added to the queue, causing the timeout to be missed.
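The ordering issue can be sketched with plain java.util.concurrent types. This is only an illustration under stated assumptions, not the Elasticsearch thread pool code: the class, executeWithTimeout and demo are hypothetical names. The main task is enqueued first, so the timeout check can always find it in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class TimeoutOrderingExample {
    // Hypothetical helper: enqueue the main task first, then schedule the timeout check.
    static void executeWithTimeout(ThreadPoolExecutor executor, ScheduledExecutorService timer,
                                   Runnable task, Runnable onTimeout, long timeoutMillis) {
        executor.execute(task); // the task is submitted before the timer can fire
        timer.schedule(() -> {
            // remove() returns true only if the task was still waiting in the queue
            if (executor.remove(task)) {
                onTimeout.run();
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if a queued task behind a busy worker is timed out and never runs.
    static boolean demo() {
        ThreadPoolExecutor executor =
            new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch timedOut = new CountDownLatch(1);
        AtomicBoolean ran = new AtomicBoolean(false);
        try {
            executor.execute(() -> awaitQuietly(release)); // occupy the only worker
            executeWithTimeout(executor, timer, () -> ran.set(true), timedOut::countDown, 2);
            boolean fired = timedOut.await(5, TimeUnit.SECONDS);
            return fired && !ran.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            executor.shutdown();
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout honored: " + demo());
    }
}
```

With the two calls in executeWithTimeout swapped, a very short timeout could fire before execute() has queued the task, in which case remove() finds nothing and the timeout is silently missed.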
See http://build-us-00.elastic.co/job/es_g1gc_master_metal/11780/testReport/junit/org.elasticsearch.cluster/ClusterServiceTests/testTimeoutUpdateTask/
Closes #12319
On top of that:
1) A relocation target shard's allocation id is changed to include the allocation id of the source shard under relocatingId (similar to shard routing semantics)
2) The logic around state change for finalizing shard relocation is simplified - one simply starts the target shard (we previously had unused logic around relocating state)
Closes #12299
While the GeoJSON spec does say a polygon is represented as an array of LinearRings (where a LinearRing is defined as a 'closed' array of points), the coerce parameter provides users with the flexibility to have ES automatically close polygons. This addresses integrations such as Twitter (where GeoJSON polygons are not closed), so that our users do not have to write extra code to close the polygon. This code change adds the optional coerce parameter to the GeoShapeFieldMapper.
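A rough sketch of what closing a ring means, for illustration only: RingCoerceExample and closeRing are hypothetical names, not the GeoShapeFieldMapper implementation, and rejecting an open ring when coerce is off is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RingCoerceExample {
    // Hypothetical helper: returns a closed ring, appending the first point when coerce is true.
    // Assumption: without coerce, an unclosed ring is rejected.
    static List<double[]> closeRing(List<double[]> ring, boolean coerce) {
        if (ring.isEmpty()) {
            return ring;
        }
        double[] first = ring.get(0);
        double[] last = ring.get(ring.size() - 1);
        if (Arrays.equals(first, last)) {
            return ring; // already closed
        }
        if (!coerce) {
            throw new IllegalArgumentException("polygon ring is not closed");
        }
        List<double[]> closed = new ArrayList<>(ring);
        closed.add(first.clone()); // repeat the first point as the last point
        return closed;
    }

    public static void main(String[] args) {
        List<double[]> open = Arrays.asList(
            new double[] {0, 0}, new double[] {1, 0}, new double[] {1, 1});
        System.out.println(closeRing(open, true).size()); // prints 4
    }
}
```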
closes #11131
Currently this target is "yet another way" to run elasticsearch,
which we can't maintain. It also has the problem that it doesn't
ensure it's running on the latest source code, doesn't configure
any scratch space properly, won't work with securitymanager, and the
list goes on.
Even if we made it work, it would break every day, since it's untested.
Instead, `mvn package -Drun -DskipTests` will run packaging, and then
startup bin/elasticsearch (like integration tests, but in foreground).
It also enables a debugger socket on port 8000, for people that like
IDE debuggers and not System.out.println.
It's a little slower to get started because of all the shading/RPM/DEB
building going on in `package` but that is just what it is right now
until that stuff is moved out.