For better integration with the Lucene Test Framework and the
availability of RandomizedRunner / Randomized Testing, this commit
moves over from TestNG to JUnit.
This commit also moves relevant places over to RandomizedRunner for
reproducibility and removes copied classes from the Lucene Test
Framework.
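A minimal sketch of what a test under RandomizedRunner looks like (the class and test names are illustrative only):

```java
import org.junit.Test;
import org.junit.runner.RunWith;

import com.carrotsearch.randomizedtesting.RandomizedRunner;
import com.carrotsearch.randomizedtesting.RandomizedTest;

import static org.junit.Assert.assertTrue;

// RandomizedRunner seeds all randomness and reports the seed on failure,
// so a failing run can be reproduced by passing the same -Dtests.seed value.
@RunWith(RandomizedRunner.class)
public class ExampleRandomizedTest extends RandomizedTest {

    @Test
    public void randomValueStaysInBounds() {
        int value = randomIntBetween(0, 100); // derived from the per-test seed
        assertTrue(value >= 0 && value <= 100);
    }
}
```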
If the index already exists, log it at trace level, since this can happen when multiple index requests create the index at the same time. In the same spirit as other APIs, all other such cases should be logged at debug and not warn.
Try to bulk refresh and update mapping events as much as possible, so they are all processed in a single go. This results in fewer cluster change events and also reduces the load of multiple changes to the same index.
Also, change the priority for those to HIGH, since we want URGENT ones (like create index and delete index) to execute first.
Have a dedicated thread pool for explicit optimize calls (shard level optimize operations). By default, its size is 1, matching the current behavior of allowing only one shard level optimize on a node at a time.
The change makes the optimize thread pool stats visible, and allows increasing the thread pool size for beefy machines.
closes#3366
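For reference, the kind of explicit optimize call that now runs on the dedicated pool, issued via the Java client (index name and segment count are only examples):

```java
import org.elasticsearch.client.Client;

public class OptimizeExample {
    // assumes an existing Client instance; "my_index" is only an example
    static void forceOptimize(Client client) {
        client.admin().indices().prepareOptimize("my_index")
                .setMaxNumSegments(1)   // merge each shard down to a single segment
                .execute()
                .actionGet();
    }
}
```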
- master actions often end up being executed on the cluster service, so there is no need to block a management thread waiting for a response; this removes load from the management thread pool and also simplifies the implementing code
- cluster service state update exception handling was improved to include a callback when a failure happens during state update execution; this makes sure we catch all relevant exceptions and invoke the callback, and also simplifies cluster state update tasks, since they can throw failures from within the execute method and have them handled properly
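A generic sketch of the failure-callback shape described above (this is not the actual cluster service API, just an illustration of the pattern):

```java
// Generic sketch: a state update task whose execute() may throw, with all
// failures routed to a callback instead of being swallowed by the runner.
interface StateUpdateTask<S> {
    S execute(S currentState) throws Exception;

    void onFailure(String source, Throwable t);
}

final class StateUpdateRunner<S> {
    private volatile S state;

    StateUpdateRunner(S initialState) {
        this.state = initialState;
    }

    void submit(String source, StateUpdateTask<S> task) {
        try {
            state = task.execute(state);
        } catch (Throwable t) {
            // any failure thrown from execute() ends up in the callback
            task.onFailure(source, t);
        }
    }
}
```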
FastVectorHighlighter fails at highlighting some complex queries such as
multi phrase queries which have two terms at the same position. This can be
easily triggered by running a `match_phrase` query with an analyzer that
outputs synonyms, such as one using SynonymFilter or WordDelimiterFilter.
Close#3357
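A multi phrase query with two terms at the same position, of the kind that trips the highlighter, can be built directly with Lucene (field and terms are made up for illustration):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;

public class MultiPhraseExample {
    static MultiPhraseQuery buildQuery() {
        MultiPhraseQuery query = new MultiPhraseQuery();
        // two terms at the same position, e.g. produced by a synonym filter
        query.add(new Term[] { new Term("body", "quick"), new Term("body", "fast") });
        // followed by a single term at the next position
        query.add(new Term("body", "fox"));
        return query;
    }
}
```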
Currently, the master node might be processing too many cluster state events, and requests can then be blocked waiting for their respective event to be processed. We can use the new cluster state update timeout support to use the master_timeout value and respect it.
closes#3365
Moved alias action validation to IndicesAliasesRequest, so that the Java API and RestIndexPutAliasAction can benefit from it too.
Also added the check in MetaDataIndexAliasesService.
Closes#3363
Today, we have low/med/high channel groups in our transport layer. High is used to publish cluster states and to send ping requests. Sometimes, the overhead of publishing large cluster states can interfere with ping requests.
Introduce a new, dedicated ping channel (with size 1) to have a channel that only handles ping requests.
closes#3362
Master node cluster state events resulting from zen discovery (a node being added or removed, for example) should be processed with priority URGENT, as it is always better to process them as fast as possible and not let other events get in the way.
closes#3361
In case systemd is used, ulimit is not called (as it would be in the initscript)
and has to be configured in the systemd configuration.
For more information about the LimitNOFILE and LimitMEMLOCK parameters,
see http://www.freedesktop.org/software/systemd/man/systemd.exec.html
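For example, the service unit might set (the values below are placeholders only, to be picked per deployment):

```ini
[Service]
LimitNOFILE=65535
LimitMEMLOCK=infinity
```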
Currently, we treat all strings as shared (either by full equality or by identity equality), even though almost all of the time we know whether they should be serialized as shared or not. Add explicit writeSharedString/readSharedString methods and use them where applicable; all other writeString/readString calls no longer treat strings as shared.
relates to #3322
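A generic sketch of the shared vs. non-shared write path (this is not the actual StreamOutput implementation, just the deduplication idea):

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Generic sketch: a shared string is written once and then referenced by id,
// while a plain string is always written inline.
final class SharedStringWriter {
    private final Map<String, Integer> seen = new HashMap<String, Integer>();

    void writeSharedString(DataOutputStream out, String value) throws IOException {
        Integer id = seen.get(value);
        if (id != null) {
            out.writeBoolean(true);   // back-reference marker
            out.writeInt(id);
        } else {
            seen.put(value, seen.size());
            out.writeBoolean(false);  // first occurrence, write the bytes
            out.writeUTF(value);
        }
    }

    void writeString(DataOutputStream out, String value) throws IOException {
        out.writeUTF(value);          // plain strings are never deduplicated
    }
}
```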
When parsing a filter, we use null to indicate that the filter should not match anything, but the top level filter doesn't take this into account.
fixes#3356
Although segments are limited to 2B documents, there is no limit on the number
of unique values that a segment may store. This commit replaces 'int' with
'long' every time a number is used to represent an ordinal and modifies the
data-structures used to store ordinals so that they can actually support more
than 2B ordinals per segment.
This commit also improves memory usage of the multi-ordinals data-structures
and the transient memory usage which is required to build them (OrdinalsBuilder)
by using Lucene's PackedInts data-structures. In the end, loading the ordinals
mapping from disk may be a little slower, field-data-based features such as
faceting may be slightly slower or faster depending on whether being nicer to
the CPU caches balances the overhead of the additional abstraction or not, and
memory usage should be better in all cases, especially when the size of the
ordinals mapping is not negligible compared to the size of the values (numeric
data for example).
Close#3189
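A rough sketch of storing per-document ordinals as longs with Lucene's PackedInts, along the lines described above (the single-valued layout and the names are illustrative only, not the actual OrdinalsBuilder code):

```java
import org.apache.lucene.util.packed.PackedInts;

public class PackedOrdinalsExample {
    static PackedInts.Mutable buildOrdinals(int numDocs, long maxOrd, long[] docToOrd) {
        // use only as many bits per value as the largest ordinal requires
        int bitsPerValue = Math.max(1, PackedInts.bitsRequired(maxOrd));
        PackedInts.Mutable ordinals = PackedInts.getMutable(numDocs, bitsPerValue, PackedInts.DEFAULT);
        for (int doc = 0; doc < numDocs; doc++) {
            ordinals.set(doc, docToOrd[doc]);
        }
        return ordinals;
    }
}
```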
* If all parent ids have been emitted as hits, abort the query / filter execution.
* If a relatively small number of parent ids has been collected in the first phase, then limit the number of second phase parent id lookups by putting a short circuit filter before parent document evaluation, or omit it entirely in the filter case. This is controllable via the `short_circuit_cutoff` option, which is exposed in the `has_child` query & filter.
All parent / child queries and filters (except the `top_children` query) abort execution if no parent ids have been collected in the first phase.
Closes#3190
With this design, percolate queries are stored in a special `_percolator` type with its own mapping, either in the same index as the actual data or in a different index (a dedicated percolation index, which might require different sharding behavior compared to the index that holds the actual data and is being searched on). This approach allows percolate requests to scale to the number of primary shards an index has been configured with and effectively distributes the percolate execution.
This commit doesn't add new percolate features other than scaling. The response remains similar, except that a header similar to the search api's has been added to the percolate response.
Closes#3173
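Registering a percolate query under this design boils down to an index request into the `_percolator` type; the index name, id and query below are made up for illustration:

```java
import org.elasticsearch.client.Client;

public class PercolatorRegistrationExample {
    static void registerQuery(Client client) {
        // the query document lives in the `_percolator` type of the data index
        client.prepareIndex("orders", "_percolator", "urgent-orders")
                .setSource("{\"query\": {\"term\": {\"status\": \"urgent\"}}}")
                .execute()
                .actionGet();
    }
}
```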
It can happen that Eclipse fails at correctly adding a new import entry to an
existing list of imports since we don't use its default rules. This commit
forces Eclipse to organize imports on save.
More-like-this and fuzzy-like-this queries expect analyzers which are able to
generate character terms (CharTermAttribute), so unfortunately this doesn't
work with analyzers which generate binary-only terms (BinaryTermAttribute,
the default CharTermAttribute impl being a special BinaryTermAttribute) such as
our analyzers for numeric fields (byte, short, integer, long, float, double but
also date and ip).
To work around this issue, this commit adds a fail_on_unsupported_field
parameter to the more-like-this and fuzzy-like-this parsers. When this parameter
is false, numeric fields will just be ignored and when it is true, an error will
be returned, saying that these queries don't support numeric fields. By default,
this setting is true, but the mlt API sets it to false in order not to fail on
documents which contain numeric fields.
Close#3252
The XPatternCaptureGroupTokenFilter.java file can be removed once we
upgrade to Lucene 4.4.
This change required the addition of the commaDelimited flag to getAsArray()
to disable parsing strings as comma-delimited values.
Closes#3340
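A sketch of how the new flag might be used; the exact parameter order of the overload is an assumption, and the setting key and pattern are made up:

```java
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;

public class GetAsArrayExample {
    static String[] readPatterns() {
        // a capture-group pattern may legitimately contain commas,
        // so it must not be split into multiple array entries
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("index.analysis.filter.my_capture.patterns", "(\\d+),(\\w+)")
                .build();
        // assumed shape of the new overload: the boolean disables comma-splitting
        return settings.getAsArray("index.analysis.filter.my_capture.patterns",
                new String[0], false);
    }
}
```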