Current implementations of the indexer are using aggregations.
Thus each search step executes a search action. However,
we can generalize that to allow for any action that returns a `SearchResponse`.
This commit abstracts the search phase from the search action.
Backport of #61739
The CodecReader wrapper we use to remove the `_recovery_source` field
doesn't override `StoredFieldsreader#getMergeInstance`, which has the
undesired side-effect of preventing the wrapped stored fields reader
from optimizing merging.
`VersionConflictEngineException` is thrown on the hot path for updates,
but stack traces are expensive to compute and transport and rarely
useful for this kind of exception. This commit avoids computing the
stack trace for these exceptions.
This new snapshot contains the following JIRAs that we're interested in:
- [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
- [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
Today we only emit `DEBUG` logs if the source disconnects from the
target during a recovery. This deserves to be noisier by default since
it should be rare and may help users identify other problems with their
network or with their shard movements.
This commit promotes this message to `INFO`. There's no need for `WARN`
since these days we will normally resume the recovery where it left off.
With this commit we rename all of the fielddata, doc_values and mapped field type classes for runtime fields to not start with the Script prefix but rather their runtime type (e.g. Boolean) and only then Script
To preserve the PIT semantics, the retrieval of results has moved from
using multi-get to using an idsQuery.
(cherry picked from commit 1c2362fcf2be62ce568b3772924abce7331ef23c)
This commit makes the LocalNodeMasterListener interface extend the
ClusterStateListener interface and use a default implementation for
detecting whether the local node master status changed.
Backport of #62422
This backport incorporates all the changes to improve compiler extensibility. The reason for this
backport is the changes are now required to support runtime fields.
With this commit we rename the script classes used for each mapped field type used for runtime fields. The new naming is a shorter version of the previous one: from e.g. BooleanScriptFieldScrip to BooleanScript . We also move such classes to the existing mapper package.
Constructing the timout checker FIRST and THEN registering the watcher allows the test to have a race condition.
The timeout value could be reached BEFORE the matcher is added. To prevent the matcher never being interrupted, a new timedOut value is added to the watcher thread entry. Then when a new matcher is registered, if the thread was previously timedout, we interrupt the matcher immediately.
closes#48861
We were checking for loops in queries before, but we had an "off by one"
error where we wouldn't notice the "top level" runtime field when
detecting a loop. So the error message would be wrong.
I also caught a few bugs with query generation caused by missing
`@Override` annotations and fixed a few of them. There is a bug with
`regexp` queries with match options that I'm not fixing in this PR but
will get to later.
Relates to #59332
This commit unmutes the windows check for testTooManyPartitions test.
The assertion has since changed to include a soft_limit check.
This coupled with changes over the past years means the test should be enabled again.
related to: #32033
With the addition of sub aggregations like filter, the validation could fail if 2 sub aggs use the
same output name. This change makes validation sub-agg aware.
fixes#57814
The OpenID Connect specification defines a number of ways for a
client (RP) to authenticate itself to the OP when accessing the
Token Endpoint. We currently only support `client_secret_basic`.
This change introduces support for 2 additional authentication
methods, namely `client_secret_post` (where the client credentials
are passed in the body of the POST request to the OP) and
`client_secret_jwt` where the client constructs a JWT and signs
it using the the client secret as a key.
Support for the above, and especially `client_secret_jwt` in our
integration tests meant that the OP we use ( Connect2id server )
should be able to validate the JWT that we send it from the RP.
Since we run the OP in docker and it listens on an ephemeral port
we would have no way of knowing the port so that we can configure
the ES running via the testcluster to know the "correct" Token
Endpoint, and even if we did, this would not be the Token Endpoint
URL that the OP would think it listens on. To alleviate this, we
run an ES single node cluster in docker, alongside the OP so that
we can configured it with the correct hostname and port within
the docker network.
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.
While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.
Relates to #59332
We were checking for loops in queries before, but we had an "off by one"
error where we wouldn't notice the "top level" runtime field when
detecting a loop. So the error message would be wrong.
I also caught a few bugs with query generation caused by missing
`@Override` annotations and fixed a few of them. There is a bug with
`regexp` queries with match options that I'm not fixing in this PR but
will get to later.
Relates to #59332
This speeds up `StreamOutput#writeVInt` quite a bit which is nice
because it is *very* commonly called when serializing aggregations. Well,
when serializing anything. All "collections" serialize their size as a
vint. Anyway, I was examining the serialization speeds of `StringTerms`
and this saves about 30% of the write time for that. I expect it'll be
useful other places.
Prevent the analyzer for trying to resolve aliases on expressions that
reference themselves (or fields within themselves) as that causes
infinite recursion.
Fix#62296
(cherry picked from commit 021d27815b03e92e02859bc9c0c8eec78f30c72e)
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
out if we're using the optimization for segment ordinals. It adds a few
more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
non-trivial for some aggregations, like cardinality.
We want Logstash indices to be system indices, but the logstash
service will still need to be able to manage its indices. This PR
adds special system index APIs to the logstash plugin so that
logstash can manage its pipelines without direct access to the
underlying indices.
* Add logstash module with dedicated logstash APIs
* merge with x-pack plugin
* add system index access allowance
* Break out serialization tests into distinct classes
* Log failures for partial multiget failure
* Move LogstashSystemIndexIT to javaRestTest task
Co-authored-by: William Brafford <william.brafford@elastic.co>
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
We never see this exception in the logs even though it's pretty severe.
All we might see is an exception about a transport message not having been read fully
from the logic that follows this code.
Technically we should probably bubble up the exception but that's a bigger change
and needs some carefully reasoning, this change for the time being at least simplifies
tracking down deserialization issues in responses.
FastVectorHighlighter uses the top-level reader to rewrite queries against, which
it gets via an IndexSearcher field on HitContext. However, we can already access
this top-level reader via HitContext's existing LeafReaderContext field.
This commit removes the unnecessary field and constructor parameter, and
changes the implementation of topLevelReader to go via ReaderUtils and
the leaf reader context.