This pull request makes the `RestGetSourceAction` return a `ResourceNotFoundException` with a proper JSON response when source or document itself is missing (see issue #33384).
Here is below a sample JSON output:
```
{
"error": {
"root_cause": [
{
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
}
],
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
},
"status": 404
}
```
Watcher still exposes some dates as joda DateTime objects. This commit
adds back joda to the painless whitelist so they can still be accessed.
closes#35913
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
This change fixes#35351. Users were no longer able to return types of numbers other than doubles for bucket aggregation scripts. This change reverts to the previous behavior of being able to return any type of number and having it converted to a double outside of the script.
This is related to #34483. It introduces a namespaced setting for
compression that allows users to configure compression on a per remote
cluster basis. The transport.tcp.compress remains as a fallback
setting. If transport.tcp.compress is set to true, then all requests
and responses are compressed. If it is set to false, only requests to
clusters based on the cluster.remote.cluster_name.transport.compress
setting are compressed. However, after this change regardless of any
local settings, responses will be compressed if the request that is
received was compressed.
Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.
The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query,
so that the query analyzer can't extract it and the `percolate` query
is then able to evaluate 'now' at query time.
With the removal of SearchScript (#34730), an extraneous compile method
was left behind in PainlessScriptEngine. This change removes the method and
updates the tests that depend on it to use the main compile method which
gives better test coverage.
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
Today if a wildcard, date-math expression or alias expands/resolves
to an index that is search-throttled we still search it. This is likely
not the desired behavior since it can unexpectedly slow down searches
significantly.
This change adds a new indices option that allows `search`, `count`
and `msearch` to ignore throttled indices by default. Users can
force expansion to throttled indices by using `ignore_throttled=true`
on the rest request to expand also to throttled indices.
Relates to #34352
This changes the current script.max_size_in_bytes to be dynamic so it can be
set through the cluster settings API. This setting is also applied to inline scripts
in the compile method of ScriptService to prevent excessively long inline
scripts from being compiled. The script length limit is removed from Painless as
this is no longer necessary with the protection in compile.
Currently we create a new netty event loop group for client connections
and all server profiles. Each new group creates new threads for io
processing. This means 2 * num of processors new threads for each group.
A single group should be able to handle all io processing (for the
transports). This also brings the netty module inline with what we do
for nio.
Additionally, this PR renames the worker threads to be the same for
netty and nio.
With this commit we differentiate between permanent circuit breaking
exceptions (which require intervention from an operator and should not
be automatically retried) and transient ones (which may heal themselves
eventually and should be retried). Furthermore, the parent circuit
breaker will categorize a circuit breaking exception as either transient
or permanent based on the categorization of memory usage of its child
circuit breakers.
Closes#31986
Relates #34460
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.
This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
In order to remove Streamable from the codebase, Response objects need
to be read using the Writeable.Reader interface which this change
enables. This change enables the use of Writeable.Reader by adding the
`Action#getResponseReader` method. The default implementation simply
uses the existing `newResponse` method and the readFrom method. As
responses are migrated to the Writeable.Reader interface, Action
classes can be updated to throw an UnsupportedOperationException when
`newResponse` is called and override the `getResponseReader` method.
Relates #34389
This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.
Relates #34508
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.
Historically, DeprecationLogger was uppercased more frequently than lowercased.
This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
This change adds instance bindings to Painless. This binding allows a whitelisted
method to be called on an instance instantiated prior to script compilation.
Whitelisting must be done in code as there is no practical way to instantiate a
useful instance from a text file (see the tests for an example). Since an
instance can be shared by multiple scripts, each method called must be
thread-safe.
We throw parsing exception when an unknown array is found, but we don't when an unknown top-level field is found. This commit makes sure that unsupported top-level fields are not ignored in a do section.
Closes#34651
Access to special variables _source and _fields were accidentally
removed in recent refactorings. This commit adds them back, along with a
test.
closes#33884
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
This commit introduces two corrections to the way simulate?verbose
handles conditionals on processors.
1) Prior to this change when executing simulate?verbose for
processors with conditionals that evaluate to false, that processor
would still be displayed in the result set. What was displayed was
correct, such that no changes to the document occurred. However, if the
conditional evaluates to false, the processor should not even be
displayed.
2) Prior to this change when executing simulate?verbose for
pipeline processors with conditionals, the individual steps would no
longer be displayed. Commit e37e5df addressed the issue, but
failed account for a conditional on the pipeline processor. Since
a pipeline processor can introduce cycles and is effectively a
single processor that encapsulates multiple other processors that
are potentially guarded by a single conditional, special handling is
needed to for pipeline and conditional pipeline processors.
Currently the StemmerTokenFilterFactory checks the validity of the language
setting only when the first TokenStream is processed. Instead we should throw an
error earlier at mapping creation time. This change adds a check to the
StemmerTokenFilterFactory constructor that checks for a valid `language` setting
by trying to create a new TokenStream from an empty input stream. This will
throw errors about wrong language settings early on.
Closes#34170
This removes the extraneous check to see if the class for a statically imported
method is already whitelisted, so a statically imported method can be whitelisted
independently.
Since all calls to `ESLoggerFactory` outside of the logging package were
deprecated, it seemed like it'd simplify things to migrate all of the
deprecated calls and declare `ESLoggerFactory` to be package private.
This does that.
* SCRIPTING: Add Expr. Compile for TermSetQuery Ctx.
* Follow up to #33602 adding the ability to compile TermsSetQuery
scripts with the expressions engine in the same way we support
SearchScript in Expressions
* Duplicated the code here for now to make the change less complex,
the only difference to SearchScript is that `_score` and `_value` are not handled for TermsSetQuery
* remove redundant check
Optionals containing boxed primitive types are prohibitively costly because they
have two level of boxing. For Optional<Integer> the analogous OptionalInt can be
used to avoid the boxing of the contained int value.
When a script uses the special-cased SearchScript or ExecutableScript, Painless is
internally caching the GenericElasticsearchScript that is generated via compile as part of
the newInstance method for the factories. This breaks the model for class bindings as the
class binding expects to not have any stored information for each instance generated via
newInstance from the factory. This change fixes the issue by creating a new instance of
the GenericElasticsearchScript each time newInstance is called against the factory.
The synonym filters no longer need access to the AnalysisRegistry in their
constructors, so we can remove the special-case code and move them to the
common analysis module.
This commit means that synonyms are no longer available for `server` integration tests,
so several of these are either rewritten or migrated to the common analysis module
as rest-spec-api tests
This commit adds the ability to plug in compilation of custom contexts
in mock script engine. This is needed for testing plugins which add
custom contexts like watcher.
Currently a bad regex in CORS settings throws a PatternSyntaxException, which
then bubbles up through the bootstrap code, meaning users have to parse a
stack trace to work out where the problem is. We should instead catch this
exception and rethrow with a more useful error message.
It is sometimes desirable to pass a class into a script constructor that
will not actually be exposed in the script whitelist. This commit uses
reflection when creating the compiler to find all the classes of the
factory method signature, and make the classloader that wraps lookup
also expose these classes.
* Adds equals/hashcode methods to the Painless* objects within the lookup package.
* Changes the caches to use the Painless* objects as keys as well as values. This forces
future changes to taken into account the appropriate values for caching.
* Deletes the existing caching objects in favor of Painless* objects. This removes a pair of
bugs that were not taking into account subtle corner cases related to augmented methods
and caching.
* Uses the Painless* objects to check for equivalency in existing Painless* objects that
may have already been added to a PainlessClass. This removes a bug related to return
not being appropriately checked when adding methods.
* Cleans up several error messages.
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
This commit removes the sysprop controlling whether ctx is in params for
update scripts and replaces it with use of the new ParameterMap, which
outputs a deprecation warning whenever params.ctx is used.
* INGEST: Tests for Drop Processor
* UT for behavior of dropped callback
and drop processor
* Moved drop processor to `server`
project to enable this test
* Simple IT
* Relates #32278
This commit introduces an AbstractSimpleSecurityTransportTestCase for
security transports. This classes provides transport tests that are
specific for security transports. Additionally, it fixes the tests referenced in
#33285.
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
Drops `Settings` from some of the methods to lookup loggers and
deprecates another logger lookup that takes `Settings` because
`Settings` is no longer required to build a logger.
* ingest: support simulate with verbose for pipeline processor
This change better supports the use of simulate?verbose with the
pipeline processor. Prior to this change any pipeline processors
executed with simulate?verbose would not show all intermediate
processors for the inner pipelines.
This changes also moves the PipelineProcess and TrackingResultProcessor
classes to enable instance checks and to avoid overly public classes.
As well this updates the error message for when cycles are detected
in pipelines calling other pipelines.
With the upcoming instance bindings, the singular *Binding name isn't descriptive enough
with multiple binding types. This renames the existing *Binding classes to *ClassBinding.
Mechanical change (with some error messages changed from binding to class binding by
hand).
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.
This commit adds two methods to the TokenFilterFactory interface.
* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
`CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
list `Analyzer`. By default it returns `true`.
Fixes#33609
* MINOR: Drop Redundant Ctx. Check in ScriptService
* This check is completely redundant, the expression script
engine will throw anyway (and with a similar message) for
those contexts that it cannot compile. Moreover, the update context
is not the only context that is not suported by the expression engine
at this point so handling the update context separately here makes
no sense.
This commit switches the joda time backcompat in scripting to use
augmentation over ZonedDateTime. The augmentation methods provide
compatibility with the missing methods between joda's DateTime and
java's ZonedDateTime. Due to getDayOfWeek returning an enum in the java
API, ZonedDateTime is wrapped so that the method can return int like the
joda time does. The java time api version is renamed to
getDayOfWeekEnum, which will be kept through 7.x for compatibility while
users switch back to getDayOfWeek once joda compatibility is removed.
This allows users to filter out tokens from a TokenStream using painless scripts,
instead of having to write specialised Java code and packaging it up into a plugin.
The commit also refactors the AnalysisPredicateScript.Token class so that it wraps
and makes read-only an AttributeSource.
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Historically we have had a ESLoggingHandler in the netty module that
logs low-level connection operations. This class just extends the netty
logging handler with some (broken) message deserialization. This commit
fixes this message serialization and moves the class to server.
This new logger logs inbound and outbound messages. Eventually, we
should move other event logging to this class (connect, close, flush).
That way we will have consistent logging regards of which transport is
loaded.
Resolves#27306 on master. Older branches will need a different fix.
This commit is related to #32517. It allows an "server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. This functionality is only implemented for
the netty security transport.
This allows tokenfilters to be applied selectively, depending on the status of the current token in the tokenstream. The filter takes a scripted predicate, and only applies its subfilter when the predicate returns true.
We can have multiple documents in Lucene with the same seq_no for
parent-child documents (or without rollback). In this case, the usage
"lastSeenSeqNo + 1" is an off-by-one error as it may miss some
documents. This error merely affects the `skippedOperations` contract.
See: https://github.com/elastic/elasticsearch/pull/33222#discussion_r213842257Closes#33318
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)
Relates #30086Closes#29530
---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:
Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)
Relates #30086Closes#29530
---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:
Co-authored-by: Adrien Grand jpountz@gmail.com
Co-authored-by: Boaz Leskes b.leskes@gmail.com
Co-authored-by: Jason Tedor jason@tedor.me
Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com
Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co
Co-authored-by: Simon Willnauer simonw@apache.org
When the change was made to the format for in the whitelist for bindings, parameters from
both the constructor and the method were combined into a single list instead of separate
lists. The check for method parameters was being executed from the start of the combined
list rather than the correct position. The tests for bindings used a constructor and a method
that only used the int types so this was not caught. The test has been changed to also use
a double type and this issue is fixed.
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips
Closes#33179
Trailers (statements following something like an if statement) that don't use brackets currently require a semicolon even if they're the last statement. This is a regression caused by (#29566) and noted by (#33193). This change fixes the regression and adds a test for the broken case.
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests that I
could find with `grep`. In this PR I change all requests that I could
find by *removing* the deprecated methods. Since this is a non-trivial
change I do not include actually removing the deprecated requests. I'll
do that in a follow up. But this should be the last set of usage
removals before the actual deprecated method removal. Yay!
* ingest: Introduce the dissect processor
The ingest node dissect processor is an alternative to Grok
to split a string based on a pattern. Dissect differs from
Grok such that regular expressions are not used to split the
string.
Dissect can be used to parse a source text field with a
simpler pattern, and is often faster the Grok for basic string
parsing. This processor uses the dissect library which
does most of the work.
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
In our Netty layer we have had to take extra precautions against Netty
catching throwables which prevents them from reaching the uncaught
exception handler. This code has taken on additional uses in NIO layer
and now in the scheduler engine because there are other components in
stack traces that could catch throwables and suppress them from reaching
the uncaught exception handler. This commit is a simple cleanup of the
iterative evolution of this code to refactor all uses into a single
method in ExceptionsHelper.
This is related to #32517. This commit passes the DiscoveryNode to the
initiateChannel method for different Transport implementation. This
will allow additional attributes (besides just the socket address) to be
used when opening channels.
This is a followup to #31886. After that commit the
TransportConnectionListener had to be propogated to both the
Transport and the ConnectionManager. This commit moves that listener
to completely live in the ConnectionManager. The request and response
related methods are moved to a TransportMessageListener. That listener
continues to live in the Transport class.
This removes def from the classes map in PainlessLookup and instead always special
cases it. This prevents potential calls against the def type that shouldn't be made and
forces all cases of def throughout Painless code to be special cased.
This is related to #31835. It moves the default connection profile into
the ConnectionManager class. The will allow us to have different
connection managers with different profiles.
This removes custom Response classes that extend `AcknowledgedResponse` and do nothing, these classes are not needed and we can directly use the non-abstract super-class instead.
While this appears to be a large PR, no code has actually changed, only class names have been changed and entire classes removed.
This changes the whitelist parameter fqn_only to no_import when specifying that a
whitelisted class must have the fully-qualified-name instead of a shortcut name. This more
closely correlates with Java imports, hence the rename.
This is related to #31835. This commit adds a connection manager that
manages client connections to other nodes. This means that the
TcpTransport no longer maintains a map of nodes that it is connected
to.
Currently AbstractBuilderTestCase generates certain random values in its
`beforeTest()` method annotated with @Before only the first time that a test
method in the suite is run while initializing the serviceHolder that we use for
the rest of the test. This changes the values of subsequent random values
and has the effect that when running single methods from a test suite with
"-Dtests.method=*", the random values it sees are different from when the same
test method is run as part of the whole test suite. This makes it hard to use
the reproduction lines logged on failure.
This change runs the inialization of the serviceHolder and the randomization
connected to it using the test runners master seed, so reproduction by running
just one method is possible again.
Closes#32400
This commit adds two pieces. The first is a small set of documentation providing
instructions on how to get setup to run context examples. This will require a download
similar to how Kibana works for some of the examples. The second is an ingest processor
example using the downloaded data. More examples will follow as ideally one per PR.
This also adds a set of tests to individually test each script as a unit test.