When a thread pool rejects the execution on the local node, the search might not return.
This happens due to the fact that we move to the next shard only *within* the execution on the thread pool in the start method. If it fails to submit the task to the thread pool, it will go through the fail shard logic, but without "counting" the current shard itself. When this happens, the relevant shard will then execute more times than intended, causing the total opes counter to skew, and for example, if on another shard the search is successful, the total ops will be incremented *beyond* the expectedTotalOps, causing the check on == as the exit condition to never happen.
The fix here makes sure that the shard iterator properly progresses even in the case of rejections, and also includes improvement to when cleaning a context is sent in case of failures (which were exposed by the test).
Though the change fixes the problem, we should work on simplifying the code path considerably, the first suggestion as a followup is to remove the support for operation threading (also in broadcast), and move the local optimization execution to SearchService, this will simplify the code in different search action considerably, and will allow to remove the problematic #firstOrNull method on the shard iterator.
The second suggestion is to move the optimization of local execution to the TransportService, so all actions will not have to explicitly do the mentioned optimization.
fixes#4887
Changed getFiniteStrings to use an iterative implementation instead of
recursive, so we don't use a Java stack-frame per character for each
suggestion at build & query time.
The defaults we have today in our data intensive memory structures don't properly add up to properly protected from potential OOM.
The circuit breaker, today at 80%, aims at protecting from extensive field data loading. The default threshold today is too permissive and can still cause OOMs.
The filter cache today is at 20%, and its too high when adding it to other limits we have, reduce it to 10%, which is still a big enough portion of the heap, yet provides improved safety measure.
closes#5990
The regex tests are formatted with blocks for readability. Previously,
they were formatted using folded style blocks (e.g. using `>`). Folded
blocks convert newlines into spaces. This is problematic for our regex,
since comments can only be terminated with a newline.
Effectively, anything after a comment will be commented out, making many
of the regex "silently pass".
This commit replaces them with scalar-style blocks (e.g. using `|`), which
treats newlines as significant, and thus correctly terminates comments
inside the regex.
Also fixes a regex test (`cat.thread_pool/10_basic.yaml`) that started
to fail after the block was fixed. The test was missing a `\s+` before
the closing newline.
If a source node disconnect during recover, the target node will respond by canceling the recovery. Typically the master will respond by removing the disconnected node from the cluster state, promoting another shard to become primary. This is sent it to all nodes and the target node will start recovering from the new primary. However, if the drop of a node caused the node count to go bellow min_master_node, the master will step down and will not promote shard immediately. When a new master is elected we may publish a new cluster state (who's point is to notify of a new master) which is not yet updated. This caused the node to start a recovery to a non existent node. Before we aborted the recovery without cleaning up the shard, causing subsequent correct cluster states to be ignored. We should not start the recovery process but wait for another cluster state to come in.
Closes#6024
A boost terms factor of 1.0 is not the same as no boosting of terms.
The desired behavior is to deactivate boosting by default. If the user
specifies any value other than 0, then boosting is activated.
Closes#6021
Updating to this version allows to configure a special JNA directory,
in case the /tmp directory is mounted with the noexec option, as JNA
extracts some data and tries to execute parts of it.
Also updated documentation to clarify mlockall and memory settings as well
as pointing to the new jna.tmpdir system property.
Closes#5493
In addition, add a special warning if the misplaced function is a "boost_factor"
function to avoid confusion of "boost" and "boost_function".
closes#5995
In the string type, we have an optimization to reuse the StringTokenStream on a thread local when a non analyzed field is used (instead of creating it each time). We should use this across the board on all places where we create a field with a String.
Also, move to a specific XStringField, that we can reuse StringTokenStream instead of copying it.
closes#6001
Some tests run before the thread is started and thus use 0 as a the current time, which later on leads to big time jumps and thus failures.
Ex. InternalEngineTests.testVersioningReplicaConflict2
Sub-aggregations are currently collected directly, by just forwarding the
doc ID and bucket ordinal to them. This change adds the new BucketCollector
abstract class that Aggregator extends, so that we have more flexibility to
add implicit filters or buffering between an aggregator and its sub
aggregators.
Close#5975
Sorting was broken on nested documents because the `missing(slot)` method
didn't correctly set the segment ordinal (readerGen), causing term ordinals to
be compared across segments.
Close#5986
When providing a number (milliseconds since epoch, UTC), range and term query/filter don't handle it correctly and convert it to a string, that is then first tried to parse as a date
closes#5969
Instead of resolving the global ordinal for each hit on the fly, resolve the global ordinals during post collect.
On fields with not so many unique values, that can reduce the number of global ordinals significantly.
Closes#5895Closes#5854
The default number of clients nodes is randomized between 0 and 1, applied to all cluster scopes (global, suite and test). Can be changed through the newly added `@ClusterScope#numClientNodes`.
In our tests we currently refer to nodes in a generic way. All the tests that either stop or start nodes rely on the fact that those nodes hold data though. Made that clearer as that becomes more important when introducing other types of nodes within the test cluster. Reflected this by adapting and renaming the following methods in `TestCluster`:
- ensureAtLeastNumNodes to ensureAtLeastNumDataNodes
- ensureAtMostNumNodes to ensureAtMostNumDataNodes
- stopRandomNode to stopRandomDataNode
and the following ones in `ElasticsearchIntegrationTest`:
- allowNodes to allowDataNodes
- dataNodes to numDataNodes.
- @ClusterScope#numNodes to numDataNodes
- @ClusterScope#minNumNodes to minNumDataNodes
- @ClusterScope#maxNumNodes to maxNumDataNodes
Added facilities to be able to deal with data nodes specifically, like for instance retrieve a client to a data node, or retrieve an instance of a class through guice only from data nodes.
Adapted existing tests to successfully run although there's a node client around.
Fixed _cat/allocation REST tests to make disk.total, disk.avail and disk.percent optional as client nodes won't return that info.
Closes#5949
Decay functions currently only use the first value in a field that contains
multiple values to compute the distance to the origin. Instead, it should
consider all distances if more values are in the field and then use
one of min/max/sum/avg which is defined by the user.
Relates to #3960closes#5940