This allows users to filter out tokens from a TokenStream using painless scripts,
instead of having to write specialised Java code and packaging it up into a plugin.
The commit also refactors the AnalysisPredicateScript.Token class so that it wraps
and makes read-only an AttributeSource.
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Historically we have had a ESLoggingHandler in the netty module that
logs low-level connection operations. This class just extends the netty
logging handler with some (broken) message deserialization. This commit
fixes this message serialization and moves the class to server.
This new logger logs inbound and outbound messages. Eventually, we
should move other event logging to this class (connect, close, flush).
That way we will have consistent logging regards of which transport is
loaded.
Resolves#27306 on master. Older branches will need a different fix.
This commit is related to #32517. It allows an "server_name"
attribute on a DiscoveryNode to be propagated to the server using
the TLS SNI extentsion. This functionality is only implemented for
the netty security transport.
This allows tokenfilters to be applied selectively, depending on the status of the current token in the tokenstream. The filter takes a scripted predicate, and only applies its subfilter when the predicate returns true.
We can have multiple documents in Lucene with the same seq_no for
parent-child documents (or without rollback). In this case, the usage
"lastSeenSeqNo + 1" is an off-by-one error as it may miss some
documents. This error merely affects the `skippedOperations` contract.
See: https://github.com/elastic/elasticsearch/pull/33222#discussion_r213842257Closes#33318
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)
Relates #30086Closes#29530
---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:
Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:
- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)
Relates #30086Closes#29530
---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:
Co-authored-by: Adrien Grand jpountz@gmail.com
Co-authored-by: Boaz Leskes b.leskes@gmail.com
Co-authored-by: Jason Tedor jason@tedor.me
Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com
Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co
Co-authored-by: Simon Willnauer simonw@apache.org
When the change was made to the format for in the whitelist for bindings, parameters from
both the constructor and the method were combined into a single list instead of separate
lists. The check for method parameters was being executed from the start of the combined
list rather than the correct position. The tests for bindings used a constructor and a method
that only used the int types so this was not caught. The test has been changed to also use
a double type and this issue is fixed.
- third party audit detects jar hell with JDK so we disable it
- jdk non portable in forbiddenapis detects classes being used from the
JDK ( for fips ) that are not portable, this is intended so we don't
scan for it on fips.
- different exclusion rules for third party audit on fips
Closes#33179
Trailers (statements following something like an if statement) that don't use brackets currently require a semicolon even if they're the last statement. This is a regression caused by (#29566) and noted by (#33193). This change fixes the regression and adds a test for the broken case.
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests that I
could find with `grep`. In this PR I change all requests that I could
find by *removing* the deprecated methods. Since this is a non-trivial
change I do not include actually removing the deprecated requests. I'll
do that in a follow up. But this should be the last set of usage
removals before the actual deprecated method removal. Yay!
* ingest: Introduce the dissect processor
The ingest node dissect processor is an alternative to Grok
to split a string based on a pattern. Dissect differs from
Grok such that regular expressions are not used to split the
string.
Dissect can be used to parse a source text field with a
simpler pattern, and is often faster the Grok for basic string
parsing. This processor uses the dissect library which
does most of the work.
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
In our Netty layer we have had to take extra precautions against Netty
catching throwables which prevents them from reaching the uncaught
exception handler. This code has taken on additional uses in NIO layer
and now in the scheduler engine because there are other components in
stack traces that could catch throwables and suppress them from reaching
the uncaught exception handler. This commit is a simple cleanup of the
iterative evolution of this code to refactor all uses into a single
method in ExceptionsHelper.
This is related to #32517. This commit passes the DiscoveryNode to the
initiateChannel method for different Transport implementation. This
will allow additional attributes (besides just the socket address) to be
used when opening channels.
This is a followup to #31886. After that commit the
TransportConnectionListener had to be propogated to both the
Transport and the ConnectionManager. This commit moves that listener
to completely live in the ConnectionManager. The request and response
related methods are moved to a TransportMessageListener. That listener
continues to live in the Transport class.
This removes def from the classes map in PainlessLookup and instead always special
cases it. This prevents potential calls against the def type that shouldn't be made and
forces all cases of def throughout Painless code to be special cased.
This is related to #31835. It moves the default connection profile into
the ConnectionManager class. The will allow us to have different
connection managers with different profiles.
This removes custom Response classes that extend `AcknowledgedResponse` and do nothing, these classes are not needed and we can directly use the non-abstract super-class instead.
While this appears to be a large PR, no code has actually changed, only class names have been changed and entire classes removed.
This changes the whitelist parameter fqn_only to no_import when specifying that a
whitelisted class must have the fully-qualified-name instead of a shortcut name. This more
closely correlates with Java imports, hence the rename.
This is related to #31835. This commit adds a connection manager that
manages client connections to other nodes. This means that the
TcpTransport no longer maintains a map of nodes that it is connected
to.
Currently AbstractBuilderTestCase generates certain random values in its
`beforeTest()` method annotated with @Before only the first time that a test
method in the suite is run while initializing the serviceHolder that we use for
the rest of the test. This changes the values of subsequent random values
and has the effect that when running single methods from a test suite with
"-Dtests.method=*", the random values it sees are different from when the same
test method is run as part of the whole test suite. This makes it hard to use
the reproduction lines logged on failure.
This change runs the inialization of the serviceHolder and the randomization
connected to it using the test runners master seed, so reproduction by running
just one method is possible again.
Closes#32400
This commit adds two pieces. The first is a small set of documentation providing
instructions on how to get setup to run context examples. This will require a download
similar to how Kibana works for some of the examples. The second is an ingest processor
example using the downloaded data. More examples will follow as ideally one per PR.
This also adds a set of tests to individually test each script as a unit test.