The implementation this commit replaces was almost k-NN regression with
k=2, but had two bugs: (a) it depends on the empirical raw estimates
being in strictly non-decreasing order for the binary search (which they
are not); and (b) it weights the biases positively with increased
distance from the corresponding raw estimate.
“HyperLogLog in Practice” leaves the choice of exact algorithm here
fairly vague, just noting: “We use k-nearest neighbor interpolation to
get the bias for a given raw estimate (for k = 6).” The majority of
other open source HyperLogLog++ implementations appear to use k-NN
regression with uniform weights (and generally k = 6). Uniform
weighting does decrease variance, but also introduces bias at the domain
extrema. This problem, plus the use of the word “interpolation” in the
original paper, suggests (inverse) distance-weighted k-NN, as
implemented here.
Since #13068 refresh and flush requests go to the primary first and are then replicated.
One difference to before is though that if a shard is not available (INITIALIZING for example)
we wait a little for an indexing request but for refresh we don't and just give up immediately.
Before, refresh requests were just send to the shards regardless of what their state is.
In tests we sometimes create an index, issue an indexing request, refresh and
then get the document. But we do not wait until all nodes know that all primaries have ben assigned.
Now potentially one node can be one cluster state behind and not know yet that
the shards have ben started. If the refresh is executed through this node then the
refresh request will silently fail on shards that are started already because from
the nodes perspective they are still initializing. As a consequence, documents
that expected to be available in the test are now not.
Example test failures are here: http://build-us-00.elastic.co/job/elasticsearch-20-oracle-jdk7/395/
This commit changes the timeout to 1m (default) to make sure we don't miss shards
when we refresh. This will trigger the same retry mechanism as for indexing requests.
We still have to make a decision if this change of behavior is acceptable.
see #13238
From a user perspective, the main benefit from this upgrade is that the new
Lucene53Codec has disk-based norms. The elasticsearch directory has been fixed
to load these norms through mmap instead of nio.
Other changes include the removal of `max_thread_states`, the fact that
PhraseQuery and BooleanQuery are now immutable, and that deleted docs are now
applied on top of the Scorer API.
This change introduces a couple of `AwaitsFix`s but I don't think it should
hold us from merging.
This allows reducing privileges with doPrivileged to work,
otherwise it will fail with NPE.
In general, if some code wants to do that, let it. The null
check is needed, even though ProtectionDomain(CodeSource, PermissionCollection)
is more than a bit misleading: "the current Policy will not be consulted".
Additionally add a defensive check for location, since the docs
there are even more confusing: https://bugs.openjdk.java.net/browse/JDK-8129972
The jdk policy impl has both these checks.
Virtualbox is the default virtualization provier for vagrant but folks
override that from time to time. If they do then the build will fail because
the boxes used by the build don't usually support non-virtualbox providers.
Closes#13217
Prior to 2.0 we summed up the available space on all disk on a node
due to the raid-0 like behavior. Now we don't do this anymore and use the
min & max disk space to make decisions.
Closes#13106
To support spaces in both the command as well as its arguments cmd needs
be called like this:
cmd /C ""c:\a b\c.bat" "argument 1" "argument2""
ant was running
cmd /C "c:\a b\c.bat" "argument 1" "argument2"
which in windows causes to be preprocessed to
cmd /C c:\a b\c.bat" "argument 1" "argument2
Which would make it appear as though ant was not properly quoting (which
it did sort of).
Several shard-level operations that previously broadcasted a request
per shard were converted to broadcast a request per node. This commit
converts upgrade action to this new model as well.
Closes#13204
Switch to use HTTPS by default for all hardcoded plugin URLs.
If users want to install via HTTP they can still specify a HTTP
URL manually.
Closes#12748
The current netty multiport tests bind on localhost and then try to connect
to 127.0.0.1, which may fail, if localhost is resolved to ipv6 by default.
This randomly chooses between 127.0.0.1, localhost and ::1 (if available) for
binding and then uses this throughout the test.
The path that a shard is allocated on is not taken into account when
we decide to move a shard away from a node because it passed a watermark.
Even worse we potentially moved away (relocated) a shard that was not even
allocated on that disk but on another on the node in question. This commit
adds a ShardRouting -> dataPath mapping to ClusterInfo that allows to identify
on which disk the shards are allocated on.
Relates to #13106
The field type tests for mappings had a huge hole: check compatibility
was not tested directly at all! I had meant for this to happen in a
follow up after #8871, and was relying on existing mapping tests.
However, there were a number of issues.
This change reworks the fieldtype tests to be able to check all settable
properties on a field type work with checkCompatibility. It fixes a
handful of small bugs in various field types. In particular, analyzer
comparison was just wrong: it was comparing reference equality for
search analyzer instead of the analyzer name. There was also no check
for search quote analyzer.
closes#13112