This change adds the recall@k metric and refactors precision@k to match
the new metric.
Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.
See: https://github.com/elastic/elasticsearch/issues/51676
Backports: https://github.com/elastic/elasticsearch/pull/52577
Generalize how queries on `_index` are handled at rewrite time (#52486)
Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.
This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.
Closes#49254
Commit #52748 fixed a bug where percolate queries wrapped in a constant score
could report incorrect matches. This commit adds a test to check that it also fixes
the case where a percolate query is sorted by something other than score.
Closes#52618
Lucene's RAMDirectory has been deprecated. This commit replaces all uses of
RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses
are in tests, but the percolator and painless executor may get some small speedups.
The block setup by the test could be released by the nodes cluster info
thread before the disk threshold decider was disabled, now disable
decider first.
Currently, date ranges queries using NOW-based date math are rewritten to
MatchAllDocs queries when being preprocessed for the percolator. However,
since we added the verification step, this can result in incorrect matches when
percolator queries are run without scores. This commit changes things to instead
wrap date queries that use NOW with a new DateRangeIncludingNowQuery.
This is a simple wrapper query that returns its delegate at rewrite time, but it can
be detected by the percolator QueryAnalyzer and be dealt with accordingly.
This also allows us to remove a method on QueryRewriteContext, and push all
logic relating to NOW-based ranges into the DateFieldMapper.
Fixes#52617
Before boost in script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of script.
Closes#48465
We consider index level read_only_allow_delete blocks temporary since
the DiskThresholdMonitor can automatically release those when an index
is no longer allocated on nodes above high threshold.
The rest status has therefore been changed to 429 when encountering this
index block to signal retryability to clients.
Related to #49393
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
Backport of #52596
Backport of #52542.
This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. In particular, it removes the Xlint exclusions from
the following files:
- benchmarks/build.gradle
- client/client-benchmark-noop-api-plugin/build.gradle
- x-pack/qa/rolling-upgrade/build.gradle
- x-pack/qa/third-party/active-directory/build.gradle
- modules/transport-netty4/build.gradle
For the first three files no code adjustments were needed. For
x-pack/qa/third-party/active-directory move the suppression at the code
level. For transport-netty4 replace the variable arguments with
ArrayLists and remove any redundant casts.
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
Now that the FIPS 140 security provider is simply a test dependency
we don't need the thirdPartyAudit exceptions, but plugin-cli and
transport-netty4 do need jarHell disabled as they use the non fips
BouncyCastle security provider as a test dependency too.
* Improve Painless compilation performance for nested conditionals (#52056)
This PR changes how conditional expression is handled in `PainlessParser`
in a way that avoids the need for backtracking, which led to exponential
compilation times in case of nested conditionals.
The test was added ensures that we can compile deeply nested conditionals.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Fix Map.of in Java8
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This change fixes flakiness in `CsvProcessorTests` where source field
can be the same as one of the headers used by tests which messes up
asserts when we check that field is not present after processor run.
Closes#50209
* Add empty_value parameter to CSV processor
This change adds `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is ommited. This behavior is the same for both
quoted and unquoted fields.
* docs updated
* Fix compilation problem
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
when a timezone is not provided Ingest logic should consider a time to be in a timezone provided as a parameter.
When a timezone is provided Ingest should recalculate a time to the timezone provided as a parameter
closes#51108
backport(#51215)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
rest-api-spec/test/10_basic.yml would check that transport_types is
`netty4` but we run FIPS 140 tests with default distribution and
transport_types is `security4`
This commit deprecates the creation of dot-prefixed index names (e.g.
.watches) unless they are either 1) a hidden index, or 2) registered by
a plugin that extends SystemIndexPlugin. This is the first step
towards more thorough protections for system indices.
This commit also modifies several plugins which use dot-prefixed indices
to register indices they own as system indices, and adds a plugin to
register .tasks as a system index.
We still test remote reindex against version 0.90. This failed on mac a
few times and rather than spend time investigating this, we no longer
test remote reindex against 0.90 on mac.
Closes#51202
This change changes the way to run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:
- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.
- When runtime java is 11 and higher, using BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests
- When runtime java is 8, using BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.
Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
* Refactor ForEachProcessor to use iteration instead of recursion (#51104)
* Refactor ForEachProcessor to use iteration instead of recursion
This change makes ForEachProcessor iterative and still non-blocking.
In case of non-async processors we use single for loop and no recursion at all.
In case of async processors we continue work on either current thread or thread
started by downstream processor, whichever is slower (usually processor thread).
Everything is synchronised by single atomic variable.
Relates #50514
* Update IngestCommonPlugin.java
Add the character position of a scripting error to error responses.
The contents of the `position` field are experimental and subject to
change. Currently, `offset` refers to the character location where the
error was encountered, `start` and `end` define a range of characters
that contain the error.
eg.
```
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"y = x;",
" ^---- HERE"
],
"script": "def x = new ArrayList(); Map y = x;",
"lang": "painless",
"position": {
"offset": 33,
"start": 29,
"end": 35
}
}
```
Refs: #50993
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the version created of the first index
that is processed.
Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.
Fixes: #50734
(cherry picked from commit 24e1858)
Backport: #50467
This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.
Example usage in pipeline:
PUT /_ingest/pipeline/2
{
"processors": [
{
"set": {
"field": "pipeline_name",
"value": "{{_ingest.pipeline}}"
}
}
]
}
Closes#42106
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
We deprecated and removed the camel-case versions of the nGram and edgeNGram
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in
7. Usage will be disallowed on new indices starting with 8 then.