Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling.
Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard.
Also, on serial merge scheduler, we now have a limit of how many merges it will do at one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it.
Note, that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges).
closes#5447
If we want to have a full picture of versions running in a cluster, we need to add a `_cat/plugins` endpoint.
Response could look like:
```sh
% curl es2:9200/_cat/plugins?v
node component version type url desc
es1 mapper-attachments 1.7.0 j Adds the attachment type allowing to parse difference attachment formats
es1 lang-javascript 1.4.0 j JavaScript plugin allowing to add javascript scripting support
es1 analysis-smartcn 1.9.0 j Smart Chinese analysis support
es1 marvel 1.1.0 j/s http://localhost:9200/_plugins/marvel Elasticsearch Management & Monitoring
es1 kopf 0.5.3 s http://localhost:9200/_plugins/kopf kopf - simple web administration tool for ElasticSearch
es2 mapper-attachments 2.0.0.RC1 j Adds the attachment type allowing to parse difference attachment formats
es2 lang-javascript 2.0.0.RC1 j JavaScript plugin allowing to add javascript scripting support
es2 analysis-smartcn 2.0.0.RC1 j Smart Chinese analysis support
```
Closes#4824.
Significance is related to the changes in document frequency observed between everyday use in the corpus and
frequency observed in the result set. The asciidocs include extensive details on the applications of this feature.
Closes#5146
This aggregation computes unique term counts using the hyperloglog++ algorithm
which uses linear counting to estimate low cardinalities and hyperloglog on
higher cardinalities.
Since this algorithm works on hashes, it is useful for high-cardinality fields
to store the hash of values directly in the index, which is the purpose of
the new `murmur3` field type. This is less necessary on low-cardinality
string fields because the aggregator is smart enough to only compute the hash
once per unique value per segment thanks to ordinals, or on numeric fields
since hashing them is very fast.
Close#5426
================
This commit extends the `CompletionSuggester` by context
informations. In example such a context informations can
be a simple string representing a category reducing the
suggestions in order to this category.
Three base implementations of these context informations
have been setup in this commit.
- a Category Context
- a Geo Context
All the mapping for these context informations are
specified within a context field in the completion
field that should use this kind of information.
Introduced two levels of randomization for the number of shards (between 1 and 10) when running tests:
1) through the existing random index template, which now sets a random number of shards that is shared across all the indices created in the same test method unless overwritten
2) through `createIndex` and `prepareCreate` methods, similar to what happens using the `indexSettings` method, which changes for every `createIndex` or `prepareCreate` unless overwritten (overwrites index template for what concerns the number of shards)
Added the following facilities to deal with the random number of shards:
- `getNumShards` to retrieve the number of shards of a given existing index, useful when doing comparisons based on the number of shards and we can avoid specifying a static number. The method returns an object containing the number of primaries, number of replicas and the total number of shards for the existing index
- added `assertFailures` that checks that a shard failure happened during a search request, either partial failure or total (all shards failed). Checks also the error code and the error message related to the failure. This is needed as without knowing the number of shards upfront, when simulating errors we can run into either partial (search returns partial results and failures) or total failures (search returns an error)
- added common methods similar to `indexSettings`, to be used in combination with `createIndex` and `prepareCreate` method and explicitly control the second level of randomization: `numberOfShards`, `minimumNumberOfShards` and `maximumNumberOfShards`. Added also `numberOfReplicas` despite the number of replicas is not randomized (default not specified but can be overwritten by tests)
Tests that specified the number of shards have been reviewed and the results follow:
- removed number_of_shards in node settings, ignored anyway as it would be overwritten by both mechanisms above
- remove specific number of shards when not needed
- removed manual shards randomization where present, replaced with ordinary one that's now available
- adapted tests that didn't need a specific number of shards to the new random behaviour
- fixed a couple of test bugs (e.g. 3 levels parent child test could only work on a single shard as the routing key used for grand-children wasn't correct)
- also done some cleanup, shared code through shard size facets and aggs tests and used common methods like `assertAcked`, `ensureGreen`, `refresh`, `flush` and `refreshAndFlush` where possible
- made sure that `indexSettings()` is always used as a basis when using `prepareCreate` to inject specific settings
- converted indexRandom(false, ...) + refresh to indexRandom(true, ...)
Supports sorting on sub-aggs down the current hierarchy. This is supported as long as the aggregation in the specified order path are of a single-bucket type, where the last aggregation in the path points to either a single-bucket aggregation or a metrics one. If it's a single-bucket aggregation, the sort will be applied on the document count in the bucket (i.e. doc_count), and if it is a metrics type, the sort will be applied on the pointed out metric (in case of a single-metric aggregations, such as avg, the sort will be applied on the single metric value)
NOTE: this commit adds a constraint on what should be considered a valid aggregation name. Aggregations names must be alpha-numeric and may contain '-' and '_'.
Closes#5253
Lucene 4.7 supports a setter for the `filler_token` that is
inserted if there are gaps in the token stream. This change exposes
this setting.
Closes#4307
In #4052 we added support for highlighting multi term queries using the postings highlighter. That worked only for top-level queries though, and not for multi term queries that are nested for instance within a bool query, or filtered query, or a constant score query.
The way we make this work is by walking the query structure and temporarily overriding the query rewrite method with a method that allows for multi terms extraction.
Closes#5102