When you set a BulkProcessor with a bulk actions size of 100, it executes the bulk after 101 documents.
```java
BulkProcessor.builder(client(), listener).setBulkActions(100).setConcurrentRequests(1).setName("foo").build();
```
Same for size. If you set the bulk size to 1024 bytes, it will actually execute the bulk after 1025 bytes.
This patch fix it.
Closes#4265.
We have the default QueryWrapperFilter as well as our custom one while
our wrapper is explicitly marked as no_cache such that it will never
be included in a cache. This was not consistenly used and caused several
problems during tests where p/c related queries were used as filters
and ended up in the cache. This commit adds the QueryWrapperFilter
ctor to the forbidden APIs to enforce the query instance checks.
Some mappers do not support externalValue() to be set. So plugin developers can't use it while building their own mappers.
Support added in this PR for:
* `BinaryFieldMapper`
* `BooleanFieldMapper`
* `GeoPointFieldMapper`
* `GeoShapeFieldMapper`
Closes#4986.
Relative to #4154.
Due to bugs in jvm (specifically OSX), running zen discovery tests causes for "socket close" failure on receive on multicast socket, and under some jvm versions, even crashes. This happens because of the creation of multiple multicast sockets within the same VM. In practice, in our tests, we use the same settings, so we can share the same multicast socket across multiple channels.
This change creates an abstraction called MulticastChannel, that can be shared, with ref counting. Today, the shared option is only enabled under OSX.
closes#5410
A missing call to ArrayUtil.grow prevented the array that stores the values
from growing in case the number of values returned by the script was higher
than the original size of the array.
Close#5414
Seen during CI tests, it could appears that the download service is not available for any reason.
This fix in test will check before each test which requires an internet access (annotated with @Network) that the download service we are testing is still working.
It won't fail the test but will mark the test as `Ignored` in case of failure.
Note that the standard `atLeast` implementation has now Integer.MAX_VALUE as upper bound, thus it behaves differently from what we expect in our tests, as we never expect the upper bound to be that high.
Added our own `atLeast` to `AbstractRandomizedTest` so that it has the expected behaviour with a reasonable upper bound.
See https://github.com/carrotsearch/randomizedtesting/issues/131
Significance is related to the changes in document frequency observed between everyday use in the corpus and
frequency observed in the result set. The asciidocs include extensive details on the applications of this feature.
Closes#5146
Lucene's RamUsageEstimator.sizeOf(Object) is to expensive.
Query size estimation will be enabled when a cheaper way of query size estimation can be found.
Closes#5372
Relates to #5339
During snapshot finalization the snapshot file is getting overwritten. If we try to read the snapshot file at this moment we can get back an empty or incomplete snapshot. This change adds a retry mechanism in case of such failure.
The test was shutting down nodes even if some of the inidces had only a
single shard. This caused that we basically had no shard active that
could sever the docs and caused random failures. This commit fixed the
test to rather allocate enough shards such that we never need to resize
the cluster which also makes the test faster.
This aggregation computes unique term counts using the hyperloglog++ algorithm
which uses linear counting to estimate low cardinalities and hyperloglog on
higher cardinalities.
Since this algorithm works on hashes, it is useful for high-cardinality fields
to store the hash of values directly in the index, which is the purpose of
the new `murmur3` field type. This is less necessary on low-cardinality
string fields because the aggregator is smart enough to only compute the hash
once per unique value per segment thanks to ordinals, or on numeric fields
since hashing them is very fast.
Close#5426
Aggregations were still using CacheRecycler on the reduce phase. They are now
using page-based recycling for both the aggregation phase and the reduce phase.
Close#4929
Due to a regression edit distances > 2 threw exceptions after unifying
the fuzziness factor in Elasticsearch `1.0`. This commit brings back the
expceted behavior.
Closes#5292
Introduced two levels of randomization for the number of replicas when running tests:
1) through the existing random index template, which now sets a random number of replicas that can either be 0 or 1 that is shared across all the indices created in the same test method unless overwritten
2) through createIndex and prepareCreate methods, between 0 and the number of data nodes available, similar to what happens using the indexSettings method, which changes for every createIndex or prepareCreate unless overwritten (overwrites index template for what concerns the number of replicas)
Added the following facilities to deal with the random number of replicas:
- made it possible to retrieve how many data nodes are available in the `TestCluster`
- added common methods similar to indexSettings, to be used in combination with createIndex and prepareCreate method and explicitly control the second level of randomization: numberOfReplicas, minimumNumberOfReplicas and maximumNumberOfReplicas
Tests that specified the number of replicas have been reviewed:
- removed manual replicas randomization where present, replaced with ordinary one that's now available
- adapted tests that didn't need a specific number of replicas to the new random behaviour
- also done some more cleanup, used common methods like assertAcked, ensureGreen, refresh, flush and refreshAndFlush where possible
================
This commit extends the `CompletionSuggester` by context
informations. In example such a context informations can
be a simple string representing a category reducing the
suggestions in order to this category.
Three base implementations of these context informations
have been setup in this commit.
- a Category Context
- a Geo Context
All the mapping for these context informations are
specified within a context field in the completion
field that should use this kind of information.
Per single main ping request we maintain the received ping response per node. Each node level ping response is mapped into that. If from a previous node level ping request the response has already been set for a node, it will be overwritten. We give higher value to the latest response. This change makes sure that this doesn't happen if the previous response has a set master and the current response hasn't a set master. Otherwise a node will lose the fact that another node has elected itself as master, the result of that would be that there would multiple master nodes in a single cluster.
Closes#5413