This commit tightens the load average test assertions by separating out
OS X as a system where we can make tighter assertions about the load
average values.
This commit fixes the test for load averages on FreeBSD. On FreeBSD, it
is either the case that linprocfs is mounted at /compat/linux/proc in
which case the load averages are available, or this is not the case and
no load average are available. Previously, the test on FreeBSD was
falling back to the catch all case which asserts that the five-minute
and fifteen-minute load averages are not available.
This commit normalizes the one-minute load average obtained from
OperatingSystemMXBean#getSystemLoadAverage to -1 when it is not
available. This is to reflect the Javadocs for this method saying "If
the load average is not available, a negative value is returned."
This commit modifies the load_average in the node stats API response
to be an object containing the one-minute, five-minute and
fifteen-minute load averages as fields (if those values are
available). Additionally, this commit modifies the cat nodes API
response to format the one-minute, five-minute and fifteen-minute load
averages as null if any of the respective values are not available.
Setting realTime to false in the get term vector request ensures,
that after a refresh the document is not fetched from the translog,
which seems to yield different in rare test runs.
The change likely triggering this was introduced in #15933
This commit adds some additional assertions that test success is not
falsely indicated by adding assertions that success / failure methods
are not incorrectly invoked in failure / success scenarios.
This commit adds a test for calculating the age of PrioritizedRunnable
that allows real clock time to elapse. The test ensures that at least
one millisecond has passed, and that the resolution of System#nanoTime
on the underlying system is actually able to detect this.
Relates #15995
This commit wraps a trace logging message in a trace logging level check
to prevent allocating an Object array (to hold the logging parameters)
and a String (from the interval) when trace logging is not enabled every
second (with the default index refresh interval) and every five seconds
(with the default translog sync interval) for every open index when
trace logging is not enabled.
This commit simplifies an equality test in IndexShard#sameException
where the messages for two exceptions are being compared. The previous
condition first tested logical equality if the left exception is not
null, and otherwise tested reference equality. There is a convenience
method since JDK 7 for testing equality in this way: Objects#equals.
Closes#16025
Fixes an issue caused by trying to add a LeafReader for a closed index to
ShardCoreKeyMap. It add itself to the map half way before throwing
AlreadyClosedException, leaking some references and causesing Elasticsearch
to refuse to shut down cleanly when testing.
This commit enhances the master channel exception test in
o.e.c.a.s.ShardStateActionTests to test that a retries loop as expected
when requests to the master repeatedly fail.
It had some funky errors, like lenient:true not working and queries with
two integer fields blowing up if there was no analyzer defined on the
query. This throws a bunch more tests at it and rejiggers how non-strings
are handled so they don't wander off into scary QueryBuilder-land unless
they have a nice strong analyzer to protect them.
This commit adds a trace log on a cluster state update while waiting for
a new master, and changes the log level on cluster service close to the
warn level.
This commit tightens the tests in o.e.c.a.s.ShardStateActionTests:
- adds a simple test for a success condition that validates the shard
failed request is correct and sent to the correct place
- remove redundant assertions from the no master and master left tests
- an assertion that success is not falsely indicated in the case of a
unhandled error
This commit addresses a time unit conversion bug in calculating the age
of a PrioritizedRunnable. The issue was an incorrect conversion from
nanoseconds to milliseconds as instead the conversion was to
microseconds. This leads to the timeInQueue metric for pending tasks to
be off by three orders of magnitude.
We have a performance bug that if a filter aggregation is below a terms
aggregation that has a cardinality of 1000, we will call Query.createWeight
1000 times as well. However, Query.createWeight can be a costly operation.
For instance in the case of a TermQuery it will seek the term in every
segment. Instead, we should create the Weight once, and then get as many
iterators as we need from this Weight.
I found this problem while trying to diagnose a performance regression while
upgrading from 1.7 to 2.1[1]. While the problem was not introduced in 2.x, the
fact that 1.7 cached very aggressively had hidden this problem, since you don't
need to seek the term anymore on a cached TermFilter.
Doing things once for every aggregator is not easy with the current API but
I discussed this with Colin and Aggregator factories will need to get an init
method for different reasons, where we will be able to put these steps that
need to be performed only once, no matter haw many aggregators need to be
created.
[1] https://discuss.elastic.co/t/aggregations-in-2-1-0-much-slower-than-1-6-0/38056/26
If an operating system reports -1 for the total bytes of a filesystem
path, we should ignore it when capturing the least and most available
statistics.
Relates to #15919
Squashed commit of the following:
commit 5d2258ffeff8a0d156295dcc754ab9b6cbb4b02e
Author: Lee Hinman <lee@writequit.org>
Date: Thu Jan 14 14:14:27 2016 -0700
Change test to test positive total with negative 'free' value
commit 927e61d4b39692fc147220a955b63b291ad80db5
Author: Lee Hinman <lee@writequit.org>
Date: Thu Jan 14 13:09:28 2016 -0700
Skip capturing least/most FS info for an FS with no total
If an operating system reports -1 for the total bytes of a filesystem
path, we should ignore it when capturing the least and most available
statistics.
Relates to #15919
This commit removes the timeout retry mechanism from ShardStateAction
allowing it to instead be handled by the general master channel retry
mechanism. The idea is that if there is a network issue, the master will
miss a ping timeout causing the channel to be closed which will expose
itself via a NodeDisconnectedException. At this point, we can just wait
for a new master and retry, as with any other master channel exception.
This commit moves the handling of channel failures when failing a shard
to o.e.c.a.s.ShardStateAction. This means that shard failure requests
that timeout or occur when there is no master or the master leaves after
the request is sent will now be retried from here. The listener for a
shard failed request will now only be notified upon successful
completion of the shard failed request, or when a catastrophic
non-channel failure occurs.
This commit adds a simulation of the master leaving after a shard
failure request has been sent. In this case, after a new cluster state
is published (simulating a new master having been elected), the request
to fail the shard should be retried.
This commit handles the situation when we are failing a shard and either
no master is known, or the known master left while failing the shard. We
handle this situation by waiting for a new master to be reelected, and
then sending the shard failed request to the new master.
Adding serialization capabilities to RescoreBuilder and make
all QueryRescorer implement NamedWritable, also requiring
all implementations of RescoreBuilder.Rescorer to implement
equals() and hashCode.
In addition, the current rescore mode enumeration is pulled out to a
separate class to make sharing of constants easier between
the query builders XContent rendering coder and the parser.