1031 Commits

Author SHA1 Message Date
Adrien Grand 656fa69f2d Merge pull request #13243 from llasram/hll-estimate-bias-nn-k-6
Estimate HyperLogLog bias via k-NN regression
2015-09-01 14:59:17 +02:00
Marshall Bockrath-Vandegrift 1c773e235a Estimate HyperLogLog bias via k-NN regression
The implementation this commit replaces was almost k-NN regression with
k=2, but had two bugs: (a) it depends on the empirical raw estimates
being in strictly non-decreasing order for the binary search (which they
are not); and (b) it weights the biases positively with increased
distance from the corresponding raw estimate.

“HyperLogLog in Practice” leaves the choice of exact algorithm here
fairly vague, just noting: “We use k-nearest neighbor interpolation to
get the bias for a given raw estimate (for k = 6).”  The majority of
other open source HyperLogLog++ implementations appear to use k-NN
regression with uniform weights (and generally k = 6).  Uniform
weighting does decrease variance, but also introduces bias at the domain
extrema.  This problem, plus the use of the word “interpolation” in the
original paper, suggests (inverse) distance-weighted k-NN, as
implemented here.
2015-09-01 08:42:29 -04:00
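The inverse distance-weighted k-NN regression described in the commit above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: the function name `estimate_bias` and its arguments are hypothetical, and the empirical tables are assumed to be two parallel lists of raw estimates and their measured biases.

```python
def estimate_bias(raw, raw_estimates, biases, k=6):
    """Inverse-distance-weighted k-NN regression (hypothetical helper).

    raw_estimates and biases are parallel lists of empirical data points.
    """
    # Rank every point by distance to the query. This does not rely on
    # raw_estimates being sorted, unlike a binary search.
    nearest = sorted(
        (abs(raw - r), b) for r, b in zip(raw_estimates, biases)
    )[:k]
    # An exact match would get infinite weight, so return its bias directly.
    if nearest[0][0] == 0:
        return nearest[0][1]
    # Closer neighbors get larger weights (1 / distance).
    weights = [1.0 / d for d, _ in nearest]
    weighted_sum = sum(w * b for w, (_, b) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With uniform weights the division by distance would be dropped and the result would be a plain mean of the k nearest biases, which is the variant the commit message attributes to most other implementations.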
Adrien Grand 7bc1acf956 Merge pull request #13239 from jpountz/upgrade/lucene-5.3.0
Upgrade to lucene-5.3.0.
2015-09-01 14:03:29 +02:00
Britta Weber d386d909fc rename actions back to admin/* and add suffix [s] instead 2015-09-01 12:53:07 +02:00
Britta Weber 05b48b904d set timeout for refresh and flush to default
Since #13068 refresh and flush requests go to the primary first and are then replicated.
One difference from before, though, is that if a shard is not available (INITIALIZING, for example),
we wait a little for an indexing request, but for refresh we don't and just give up immediately.
Before, refresh requests were just sent to the shards regardless of their state.

In tests we sometimes create an index, issue an indexing request, refresh and
then get the document. But we do not wait until all nodes know that all primaries have been assigned.
Now potentially one node can be one cluster state behind and not know yet that
the shards have been started. If the refresh is executed through this node then the
refresh request will silently fail on shards that are started already because from
the node's perspective they are still initializing. As a consequence, documents
that are expected to be available in the test are now not.
Example test failures are here: http://build-us-00.elastic.co/job/elasticsearch-20-oracle-jdk7/395/

This commit changes the timeout to 1m (default) to make sure we don't miss shards
when we refresh. This will trigger the same retry mechanism as for indexing requests.
We still have to make a decision if this change of behavior is acceptable.

see #13238
2015-09-01 12:20:05 +02:00
Robert Muir 7caed74d5d Merge pull request #13232 from rmuir/nullcheck_policy
Add missing null check in ESPolicy.
2015-09-01 06:03:35 -04:00
Adrien Grand 5d9fb2e8a6 Upgrade to lucene-5.3.0.
From a user perspective, the main benefit from this upgrade is that the new
Lucene53Codec has disk-based norms. The elasticsearch directory has been fixed
to load these norms through mmap instead of nio.

Other changes include the removal of `max_thread_states`, the fact that
PhraseQuery and BooleanQuery are now immutable, and that deleted docs are now
applied on top of the Scorer API.

This change introduces a couple of `AwaitsFix`s but I don't think it should
hold us from merging.
2015-09-01 11:58:45 +02:00
Adrien Grand f0b7fa2f31 Merge pull request #13060 from andrestc/enhancement/functionscore-unmapped
Make FunctionScore work on unmapped field with `missing` parameter
2015-09-01 11:05:30 +02:00
Simon Willnauer 7571276b84 Pass in relevant disk usage map for early termination 2015-09-01 10:35:56 +02:00
xuzha f46e66e7d0 Remove the experimental indices.fielddata.cache.expire
closes #10781
2015-09-01 00:40:04 -07:00
Britta Weber 333831c126 Merge pull request #13068 from brwe/broadcast_replication
Make refresh a replicated action
2015-09-01 09:21:54 +02:00
Robert Muir a58c5dba89 Add missing null check in ESPolicy.
This allows reducing privileges with doPrivileged to work,
otherwise it will fail with NPE.

In general, if some code wants to do that, let it. The null
check is needed, even though ProtectionDomain(CodeSource, PermissionCollection)
is more than a bit misleading: "the current Policy will not be consulted".

Additionally add a defensive check for location, since the docs
there are even more confusing: https://bugs.openjdk.java.net/browse/JDK-8129972

The jdk policy impl has both these checks.
2015-09-01 00:34:34 -04:00
Jason Tedor aea00a62f3 Merge pull request #13227 from jasontedor/immutable-lists-be-gone
Remove and forbid use of com.google.common.collect.ImmutableList
2015-08-31 15:29:35 -04:00
Martijn van Groningen 238b56dedf Merge pull request #13046 from jimhooker2002/issue-4665-clean
Turn DestructiveOperations into a Guice module.

To share the same instance between component inside a node.

Closes #4665
2015-08-31 21:22:55 +02:00
Martijn van Groningen 30ffa9a61b test: Allow tests to override whether mock modules are used 2015-08-31 21:02:49 +02:00
Jason Tedor a8bace9f97 Remove and forbid final uses of ImmutableList 2015-08-31 14:35:23 -04:00
Jason Tedor b0af7a1426 Fix NettyTransport 2015-08-31 14:29:00 -04:00
Jason Tedor e39a3bae2c Merge branch 'master' into lists_are_simple 2015-08-31 14:07:00 -04:00
Britta Weber d81f426b68 Make refresh a replicated action
prerequisite to #9421
see also #12600
2015-08-31 19:44:00 +02:00
Martijn van Groningen 1b84cadb7b test: The transport client that interacts with the external cluster should be provided a list of transport client plugins. 2015-08-31 16:58:03 +02:00
Britta Weber a7e240077d Merge pull request #13218 from brwe/resolve-index-default-impl
add default impl for resolveIndex()
2015-08-31 15:53:57 +02:00
Michael McCandless a49217949f Merge pull request #13199 from mikemccand/remove_merge_docs
Move expert segment merge settings documentation off site into javadocs.
2015-08-31 09:52:19 -04:00
Britta Weber 73785e075e add default impl for resolveIndex() 2015-08-31 15:48:32 +02:00
Tanguy Leroux dbbecce8f2 Sort thread pools by name in Nodes Stats 2015-08-31 14:30:43 +02:00
Jason Tedor 6e2dc73023 Merge pull request #13205 from jasontedor/feature/13204
Convert upgrade action to broadcast by node
2015-08-31 06:02:08 -04:00
Jason Tedor d1223b7369 Convert upgrade action to broadcast by node
Several shard-level operations that previously broadcasted a request
per shard were converted to broadcast a request per node. This commit
converts upgrade action to this new model as well.

Closes #13204
2015-08-31 05:59:57 -04:00
Alexander Reelsen 856b040a0a Plugins: Replace HTTP urls with HTTPS
Switch to use HTTPS by default for all hardcoded plugin URLs.
If users want to install via HTTP they can still specify an HTTP
URL manually.

Closes #12748
2015-08-31 11:45:38 +02:00
Alexander Reelsen 00902207a6 Tests: Ensure binding on localhost host is consistently ipv4/v6
The current netty multiport tests bind on localhost and then try to connect
to 127.0.0.1, which may fail, if localhost is resolved to ipv6 by default.

This randomly chooses between 127.0.0.1, localhost and ::1 (if available) for
binding and then uses this throughout the test.
2015-08-31 10:56:42 +02:00
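The host selection described in the commit above might look roughly like this. It is a minimal sketch under assumptions: `pick_bind_host` and the `ipv6_available` flag are hypothetical names, and how IPv6 availability is detected is left out.

```python
import random


def pick_bind_host(ipv6_available, rng=random):
    """Pick one host to use for both binding and connecting (hypothetical helper)."""
    candidates = ["127.0.0.1", "localhost"]
    # Only offer the IPv6 loopback when the environment supports it.
    if ipv6_available:
        candidates.append("::1")
    return rng.choice(candidates)
```

The point of the change is that the chosen value is then reused for the whole test, so the bind address and the connect address can no longer resolve to different protocol families.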
Simon Willnauer a17d7500d3 Take Shard data path into account in DiskThresholdDecider
The path that a shard is allocated on is not taken into account when
we decide to move a shard away from a node because it passed a watermark.
Even worse we potentially moved away (relocated) a shard that was not even
allocated on that disk but on another on the node in question. This commit
adds a ShardRouting -> dataPath mapping to ClusterInfo that makes it possible to identify
which disk each shard is allocated on.

Relates to #13106
2015-08-31 10:40:42 +02:00
Michael McCandless 7ad2222ccc copy over merge docs as javadocs 2015-08-30 18:14:47 -04:00
Ryan Ernst 6295f8e795 Merge branch 'master' into tell_me_your_plugins 2015-08-30 14:20:54 -07:00
Ryan Ernst 2539b779c8 Merge pull request #13137 from rjernst/empty_doc_again
Fix doc parser to still pre/post process metadata fields on disabled type
2015-08-30 12:14:18 -07:00
Jason Tedor aa26b66e96 Remove leftover debugging statement 2015-08-30 14:19:30 -04:00
Simon Willnauer 86a8a0a570 IndicesStatsAction is now a per node operation 2015-08-30 12:48:13 +02:00
David Pilato 03bb28514e Installing plugin without checksums ends up downloading from github
```sh
bin/plugin install lmenezes/elasticsearch-kopf/develop
-> Installing lmenezes/elasticsearch-kopf/develop...
Trying http://download.elastic.co/lmenezes/elasticsearch-kopf/elasticsearch-kopf-develop.zip ...
Trying http://search.maven.org/remotecontent?filepath=lmenezes/elasticsearch-kopf/develop/elasticsearch-kopf-develop.zip ...
Trying https://oss.sonatype.org/service/local/repositories/releases/content/lmenezes/elasticsearch-kopf/develop/elasticsearch-kopf-develop.zip ...
Trying https://github.com/lmenezes/elasticsearch-kopf/archive/develop.zip ...
Downloading .................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................DONE
Verifying https://github.com/lmenezes/elasticsearch-kopf/archive/develop.zip checksums if available ...
Trying https://github.com/lmenezes/elasticsearch-kopf/archive/master.zip ...
Downloading ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................DONE
Verifying https://github.com/lmenezes/elasticsearch-kopf/archive/master.zip checksums if available ...
```

This happens because we no longer have an ElasticsearchWrapperException here, only standard Java exceptions.

Closes #13196.
2015-08-29 23:00:45 +02:00
Jason Tedor 5cb86130ec Add mechanism for transporting shard-level actions by node
Currently, many shard-level operations are transported with a request
per shard via TransportBroadcastAction. These shard-level requests are
then submitted to unbounded execution queues for asynchronous execution
on the receiving node. This transport mechanism and stuffing of the
execution queues can be problematic on large clusters. A better
mechanism would be to aggregate the shard-level requests, transport
them via a single request per node, and execute the shard-level
operations serially on the receiving node.

This commit introduces TransportNodeBroadcastAction which is the
high-level mechanism for transporting the shard-level operations in a
single request per node. The shard-level operations are executed
serially on the receiving node and per-node shard-level results are
aggregated into a single response per node. These node-level results
are then aggregated into a single response to the initial request.

One item of note is a new mechanism for registering request handlers.
This mechanism enables registrants to provide a callback for
instantiating new instances of the request class. Doing this enables
the inner class to be instantiated with the context of its outer class.
This is done so that a single NodeRequest class can be defined rather
than defining a class per operation.

Closes #7990
2015-08-29 16:15:12 -04:00
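The per-node aggregation described in the commit above can be illustrated with a small sketch. This is not the TransportNodeBroadcastAction API: `group_shards_by_node`, `broadcast_by_node`, and the dictionary-based routing table are hypothetical stand-ins, and the network transport is replaced by a plain function call.

```python
from collections import defaultdict


def group_shards_by_node(shard_to_node):
    """Turn a shard -> node routing map into node -> [shards]."""
    by_node = defaultdict(list)
    for shard, node in shard_to_node.items():
        by_node[node].append(shard)
    return dict(by_node)


def broadcast_by_node(shard_to_node, shard_op):
    """Send one (simulated) request per node instead of one per shard."""
    node_responses = {}
    for node, shards in group_shards_by_node(shard_to_node).items():
        # One request per node; its shard-level operations run serially
        # and their results form a single per-node response.
        node_responses[node] = [shard_op(shard) for shard in shards]
    # Aggregate the per-node responses into one response for the caller.
    return [result for results in node_responses.values() for result in results]
```

For a cluster with many shards per node, the number of requests drops from the shard count to the node count, which is the motivation the commit message gives.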
Jim Hooker 05aa1d90b8 Extend AbstractComponent and remove logger 2015-08-29 07:43:17 +01:00
Jason Tedor 0fa8ee1edd Fix logging statement in o.e.a.s.m.TransportMasterNodeAction 2015-08-28 13:39:56 -04:00
Jason Tedor 532d100c22 Fix logging statement 2015-08-28 13:34:22 -04:00
Nik Everett c180defb10 [CAT] Default verbose to false
Closes #13156
2015-08-28 11:15:44 -04:00
Simon Willnauer 0c71328186 Expand ClusterInfo to provide min / max disk usage for allocation decider
Today we sum up the disk usage for the allocation decider which is broken since
we don't stripe across multiple data paths. Each shard has its own private path
now but the allocation deciders still treat all paths as one big disk. This commit
allows allocation deciders to access the least used and most used path to make
better allocation decisions upon canRemain and canAllocate calls.

Yet, this commit doesn't fix all the issues since we still can't tell which shard
can remain and which can't. This problem is out of scope in this commit and will be solved
in a followup commit.

Relates to #13106
2015-08-28 14:04:25 +02:00
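To see why summing disk usage across data paths is misleading, consider this small sketch. The function names and the `(total_bytes, free_bytes)` tuple layout are assumptions for illustration, and "least used" is taken here to mean the lowest used fraction; the actual ClusterInfo representation may differ.

```python
def used_fraction(total_bytes, free_bytes):
    return (total_bytes - free_bytes) / total_bytes


def summed_used_fraction(path_usage):
    """Old view: treat all data paths as one big disk."""
    total = sum(t for t, _ in path_usage.values())
    free = sum(f for _, f in path_usage.values())
    return used_fraction(total, free)


def least_and_most_used(path_usage):
    """New view: report the least used and most used path separately.

    path_usage maps a data path to (total_bytes, free_bytes).
    """
    by_usage = sorted(path_usage, key=lambda p: used_fraction(*path_usage[p]))
    return by_usage[0], by_usage[-1]
```

With one path 20% full and another 90% full, the summed view reports 55% and hides the path that is close to a watermark, while the per-path view exposes it.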
Boaz Leskes 35f9ee7a62 Tests: better isolation of cluster ports
Previously multiple clusters in the same JVM reused the same port ranges, leading to potentially big gaps in port selection, which in turn caused unicast-based discovery to fail because it could not find another node within the default 5-port range.

Also, the previous logic had HTTP use a range that is assigned to another JVM.
2015-08-28 11:39:30 +02:00
Michael McCandless 07b5d22d91 disable new test on windows 2015-08-28 05:06:35 -04:00
Michael McCandless fb703845dd Merge pull request #13158 from mikemccand/new_path_for_shard_test
Add unit test for ShardPath.selectNewPathForShard
2015-08-28 04:15:15 -04:00
Michael McCandless b646ed9cd8 try to work on Windows too 2015-08-28 04:13:21 -04:00
Michael McCandless 8dbc1fbdbd use ShardPath.getRootStatePath; allow forbidden API 2015-08-28 03:59:02 -04:00
Boaz Leskes db5e225a25 Discovery: fix `discovery.zen.join_timeout` default value logic
We default the value to 20x the ping timeout; however, we only use the legacy ping timeout setting's value for the calculation.

Closes #13162
2015-08-28 09:47:15 +02:00
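The intended default logic can be sketched as below. Only `discovery.zen.join_timeout` appears in the commit message; the two ping timeout keys, the 3000 ms default, and the function name are assumptions made for illustration, and values are plain milliseconds rather than time-value strings.

```python
def resolve_join_timeout_ms(settings, default_ping_timeout_ms=3000):
    """Resolve the join timeout from a flat settings dict (hypothetical helper)."""
    # Assumed key names: a legacy ping timeout setting and its replacement.
    legacy_ping = settings.get("discovery.zen.ping.timeout", default_ping_timeout_ms)
    # The effective ping timeout prefers the current key and falls back to legacy.
    ping = settings.get("discovery.zen.ping_timeout", legacy_ping)
    # An explicit join timeout wins; otherwise default to 20x the effective ping timeout.
    return settings.get("discovery.zen.join_timeout", 20 * ping)
```

The bug described above corresponds to computing the default from `legacy_ping` alone, which ignores a ping timeout configured under the current key.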
javanna 9b2e77903d Internal: make ValidationException methods final and fix javadocs 2015-08-28 09:41:47 +02:00
javanna 37ec221df5 Internal: remove unused MapperQueryParser constructor 2015-08-28 09:38:29 +02:00
Jason Tedor 90bc784194 Work around for JDK-8039214 on JDK 9 2015-08-27 23:29:22 -04:00