The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
Now that the FIPS 140 security provider is simply a test dependency
we don't need the thirdPartyAudit exceptions, but plugin-cli and
transport-netty4 do need jarHell disabled as they use the non fips
BouncyCastle security provider as a test dependency too.
* Improve Painless compilation performance for nested conditionals (#52056)
This PR changes how conditional expression is handled in `PainlessParser`
in a way that avoids the need for backtracking, which led to exponential
compilation times in case of nested conditionals.
The test was added ensures that we can compile deeply nested conditionals.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Fix Map.of in Java8
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This change fixes flakiness in `CsvProcessorTests` where source field
can be the same as one of the headers used by tests which messes up
asserts when we check that field is not present after processor run.
Closes#50209
* Add empty_value parameter to CSV processor
This change adds `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is ommited. This behavior is the same for both
quoted and unquoted fields.
* docs updated
* Fix compilation problem
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
when a timezone is not provided Ingest logic should consider a time to be in a timezone provided as a parameter.
When a timezone is provided Ingest should recalculate a time to the timezone provided as a parameter
closes#51108
backport(#51215)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
rest-api-spec/test/10_basic.yml would check that transport_types is
`netty4` but we run FIPS 140 tests with default distribution and
transport_types is `security4`
This commit deprecates the creation of dot-prefixed index names (e.g.
.watches) unless they are either 1) a hidden index, or 2) registered by
a plugin that extends SystemIndexPlugin. This is the first step
towards more thorough protections for system indices.
This commit also modifies several plugins which use dot-prefixed indices
to register indices they own as system indices, and adds a plugin to
register .tasks as a system index.
We still test remote reindex against version 0.90. This failed on mac a
few times and rather than spend time investigating this, we no longer
test remote reindex against 0.90 on mac.
Closes#51202
This change changes the way to run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:
- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.
- When runtime java is 11 and higher, using BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests
- When runtime java is 8, using BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.
Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
* Refactor ForEachProcessor to use iteration instead of recursion (#51104)
* Refactor ForEachProcessor to use iteration instead of recursion
This change makes ForEachProcessor iterative and still non-blocking.
In case of non-async processors we use single for loop and no recursion at all.
In case of async processors we continue work on either current thread or thread
started by downstream processor, whichever is slower (usually processor thread).
Everything is synchronised by single atomic variable.
Relates #50514
* Update IngestCommonPlugin.java
Add the character position of a scripting error to error responses.
The contents of the `position` field are experimental and subject to
change. Currently, `offset` refers to the character location where the
error was encountered, `start` and `end` define a range of characters
that contain the error.
eg.
```
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"y = x;",
" ^---- HERE"
],
"script": "def x = new ArrayList(); Map y = x;",
"lang": "painless",
"position": {
"offset": 33,
"start": 29,
"end": 35
}
}
```
Refs: #50993
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the version created of the first index
that is processed.
Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.
Fixes: #50734
(cherry picked from commit 24e1858)
Backport: #50467
This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.
Example usage in pipeline:
PUT /_ingest/pipeline/2
{
"processors": [
{
"set": {
"field": "pipeline_name",
"value": "{{_ingest.pipeline}}"
}
}
]
}
Closes#42106
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
We deprecated and removed the camel-case versions of the nGram and edgeNGram
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in
7. Usage will be disallowed on new indices starting with 8 then.
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.
This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.
This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.
Fixes#42349
A very large number of recursive calls can cause a stack overflow
exception. This commit forks the recursive calls for non-async
processors. Once forked, each thread will handle at most 10
recursive calls to help keep the stack size and thread count
down to a reasonable size.
Currently, if an updateable synonym filter is included in a multiplexer filter,
it is not reloaded via the _reload_search_analyzers because the multiplexer
itself doesn't pass on the analysis mode of the filters it contains, so its not
recognized as "updateable" in itself. Instead we can check and merge the
AnalysisMode settings of all filters in the multiplexer and use the resulting
mode (e.g. search-time only) for the multiplexer itself, thus making any synonym
filters contained in it reloadable. This, of course, will also make the
analyzers using the multiplexer be usable at search-time only.
Closes#50554
ElasticsearchException.guessRootCauses would return wrapper exception if
inner exception was not an ElasticsearchException. Fixed to never return
wrapper exceptions.
At least following APIs change root_cause.0.type as a result:
_update with bad script
_index with bad pipeline
Relates #50417
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.
In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner
arrays or objects,
- the metadata can't have more than 5 keys in total.
Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.
Here is how the meta might look like in mappings:
```json
{
"properties": {
"latency": {
"type": "long",
"meta": {
"unit": "ms"
}
}
}
}
```
And then in the field capabilities response:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms" ]
}
}
}
}
```
When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms", "ns" ]
}
}
}
}
```
Closes#33267
In order to ensure that logstash and Elasticsearch are able to understand
the same patterns, this commit adapts to changes in logstash, adds a few
patterns and changes a few.
When there several subqueries on different relations of the join field,
and only one of subqueries is using inner_hits, NPE occurs.
This PR prevents NPE error.
Closes#50539
With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendents along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.
This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.
Fixes#50305
*Most* of our parsing can be done without passing any extra context into
the parser that isn't already part of the xcontent stream. While I was
looking around at the places that *do* need a context I found a few
places that were declared to need a context but don't actually need it.
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to a bunch
of these parsers, mostly the ones in xpack and their "paired" parsers in
the high level rest client. I picked these just to have somewhere to
break the up the change so it wouldn't be huge.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However these errors
are currently only thrown for pre-configured filters, adding them as custom
filters doesn't trigger the warning and error. This change adds the appropriate
deprecation warnings for `nGram` and `edgeNGram` respectively on version 7
indices.
Relates #50360
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory. Factories may optionally implement
ScriptFactory. If so, then they can indicate determinism and thus
cacheability.
**Backport**
Relates: #49466