There's a currently unhandled edge case for the precion@ metric. When none of
the search hits in the result are rated, we have neither true nor false
positives which currently leads to division by zero. We should return a precion
of 0.0 in this case.
This adds checks for expected warning headers to the query builder test
infrastructure. Tests that are adding deprecation warnings to the response
headers need to check those, otherwise the abstract base class for the test
class will complain at teardown.
The `IndexService#newQueryShardContext()` method creates a QueryShardContext on
shard `0`, with a `null` reader and that uses `System.currentTimeMillis()` to
resolve `now`. This may hide bugs, since the shard id is sometimes used for
query parsing (it is used to salt random score generation in `function_score`),
passing a `null` reader disables query rewriting and for some use-cases, it is
simply not ok to rely on the current timestamp (eg. percolation). So this pull
request removes this method and instead requires that all call sites provide
these parameters explicitly.
It was 10mb and that was causing trouble when folks reindex-from-remoted
with large documents.
We also improve the error reporting so it tells folks to use a smaller
batch size if they hit a buffer size exception. Finally, adds some docs
to reindex-from-remote mentioning the buffer and giving an example of
lowering the size.
Closes#21185
When multiple ratings for the same document (identified by _index, _type,
_id) are specified in the request we should throw an error. This change adds a
check for this in the RatedRequest setter (and ctor that uses that setter).
Closes#20997
This adds support for templating in rank eval requests.
Relates to #20231
Problem: In it's current state the rank-eval request API forces the user to repeat complete queries for each test request. In most use cases the structure of the query to test will be stable with only parameters changing across requests, so this looks like lots of boilerplate json for something that could be expressed in a more concise way.
Uses templating/ ScriptServices to enable users to submit only one test request template and let them only specify template parameters on a per test request basis.
Previously Elasticsearch would only use the package name for logging
levels, truncating the package prefix and the class name. This meant
that logger names for Netty were just prefixed by netty3 and netty. We
changed this for Elasticsearch so that it's the fully-qualified class
name now, but never corrected this for Netty. This commit fixes the
logger names for the Netty modules so that their levels are controlled
by the fully-qualified class name.
Relates #21223
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
Java 9's exception message when lists have an out of bounds index
is much better than java 8 but the painless code asserted on the
java 8 message. Now it'll accept either.
I'm tempted to weaken the assertion but I like asserting that the
message is readable.
Adds support for indexing into lists and arrays with negative
indexes meaning "counting from the back". So for if
`x = ["cat", "dog", "chicken"]` then `x[-1] == "chicken"`.
This adds an extra branch to every array and list access but
some performance testing makes it look like the branch predictor
successfully predicts the branch every time so there isn't a
in execution time for this feature when the index is positive.
When the index is negative performance testing showed the runtime
is the same as writing `x[x.length - 1]`, again, presumably thanks
to the branch predictor.
Those performance metrics were calculated for lists and arrays but
`def`s get roughly the same treatment though instead of inlining
the test they need to make a invoke dynamic so we don't screw up
maps.
Closes#20870
Lucene 6.3 is expected to be released in the next weeks so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
This commit fixes responses to HEAD requests so that the value of the
Content-Length is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length if the request were not a
HEAD request.
This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.
Relates #21123
Versions before 2.0 needed to be told to return interesting fields
like `_parent`, `_routing`, `_ttl`, and `_timestamp`. And they come
back inside a `fields` block which we need to parse.
Closes#21044
This commit upgrades the transport-netty4 module dependency from Netty
version 4.1.5 to version 4.1.6. This is a bug fix release of Netty.
Relates #21051
This allows you to whitelist `localhost:*` or `127.0.10.*:9200`.
It explicitly checks for patterns like `*` in the whitelist and
refuses to start if the whitelist would match everything. Beyond
that the user is on their own designing a secure whitelist.
When running `gradle run`, a developer usually intends to get a running
instance as if they had run elasticsearch from the command line. This is
different than the isolated environment we use for integration testing
plugins. This change switches the run task to use the zip distribution,
so that all modules included in the normal distribution are included.
* Scripting: Add support for booleans in scripts
Since 2.0, booleans have been represented as numeric fields (longs).
However, in scripts, this is odd, since you expect doing a comparison
against a boolean to work. While languages like groovy will auto convert
between booleans and longs, painless does not.
This changes the doc values accessor for boolean fields in scripts to
return Boolean objects instead of Long objects.
closes#20949
* Make Booleans final and remove wrapping of `this` for getValues()
This commit fixes an issue with the handling of the value "keep-alive"
on the Connection header in the Netty 4 HTTP implementation while
handling an HTTP 1.0 request. The issue was using the wrong equals
method to compare an AsciiString instance and a String instance (they
could never be equal). This commit fixes this to use the correct equals
method to compare for content equality.
This commit fixes an issue with the handling of the value "close" on the
Connection header in the Netty 4 HTTP implementation. The issue was
using the wrong equals method to compare an AsciiString instance and a
String instance (they could never be equal). This commit fixes this to
use the correct equals method to compare for content equality.
Relates #20956
The unknown document section in the response for each query can be rendered
using the rated hits that are now also part of the response by just filtering
the documents without a rating.
Currently each implementation of RankedListQualityMetric does some initial
joining operation that links the input search hits with a rated document rating,
if available. Also all metrics collect unknown docs and now also need to add the
list of rated search hits to the partial query evaluation. This change
centralizes this work in some new helper methods in RankedListQualityMetric.
This change adds a `hits` section to the response part for each ranking
evaluation query, containing a list of documents (index/type/id) and ratings (if
the document was rated in the request). This section can be used to better
understand the calculation of the ranking quality of this particular query, but
it can also be used to identify the "unknown" (that is unrated) documents that
were part of the seach hits, for example because a UI later wants to present
those documents to the user to get a rating for them.
If the user specifies a set of field names using a parameter called
`summary_fields` in the request, those fields are also included as part of the
response in addition to "_index", "_type", "_id".
Today Elasticsearch limits the number of processors used in computing
thread counts to 32. This was from a time when Elasticsearch created
more threads than it does now and users would run into out of memory
errors. It appears the real cause of these out of memory errors was not
well understood (it's often due to ulimit settings) and so users were
left hitting these out of memory errors on boxes with high core
counts. Today Elasticsearch creates less threads (but still a lot) and
we have a bootstrap check in place to ensure that the relevant ulimit is
not too low.
There are some caveats still to having too many concurrent indexing
threads as it can lead to too many little segments, and it's not a
magical go faster knob if indexing is already bottlenecked by disk, but
this limitation is artificial and surprising to users and so it should
be removed.
This commit also increases the lower bound of the max processes ulimit,
to prepare for a world where Elasticsearch instances might be running
with more the previous cap of 32 processors. With the current settings,
Elasticsearch wants to create roughly 576 + 25 * p / 2 threads, where p
is the number of processors. Add in roughly 7 * p / 8 threads for the GC
threads and a fudge factor, and 4096 should cover us pretty well up to
256 cores.
Relates #20874
Both netty3 and netty4 http implementation printed the default
toString representation of PortRange if ports couldn't be bound.
This commit adds a better default toString method to PortRange and
uses the string representation for the error message in the http
implementations.
Some objects like maps, iterables or arrays of objects can self-reference themselves. This is mostly due to a bug in code but the XContentBuilder should be able to detect such situations and throws an IllegalArgumentException instead of building objects over and over until a stackoverflow occurs.
closes#20540closes#19475
Update scripts might want to update the documents `_timestamp` but need a notion of `now()`.
Painless doesn't support any notion of now() since it would make scripts non-pure functions. Yet,
in the update case this is a valid value and we can pass it with the context together to allow the
script to record the timestamp the document was updated.
Relates to #17895
* Replace org.elasticsearch.common.lucene.search.MatchNoDocsQuery with its Lucene version (org.apache.lucene.search.MatchNoDocsQuery)
This change removes the ES version of the match no docs query and replaces it with the Lucene version.
relates #18030
* Add missing change
Today we throw an assertion error if we release an AbstractArray more than once.
Yet, it's recommended to implement close methods such that they can be invoked
more than once. Guaranteed single release calls are hard to implement and some
situations might not be tested causing for instance `CircuitBreaker` to operate on
corrupted memory stats.