Today the translog of an engine is exposed and can be accessed directly.
While this exposure offers much flexibility, it also causes these troubles:
- Inconsistent behavior between translog method and engine method.
For example, rolling a translog generation via an engine also trims
unreferenced files, but translog's method does not.
- An engine does not get notified when critical errors happen in translog
as the access is direct.
This change isolates translog of an engine and enforces all accesses to
translog via the engine.
The index thread pool is no longer needed as its primary use-case for
single-document indexing requests has been relieved now that
single-document indexing requests are converted to bulk indexing
requests (with a single document payload).
This adds 2 testcases that test if a shard goes idle
pending (uncommitted) segments are committed and unreferenced
files will be freed.
Relates to #29482
This change adds the current primary term to the header of the current
translog file. Having a term in a translog header is a prerequisite step
that allows us to trim translog operations given the max valid seq# for
that term.
This commit also updates tests to conform the primary term invariant
which guarantees that all translog operations in a translog file have
its terms at most the term stored in the translog header.
* Add a helper method to get a random java.util.TimeZone
This adds a helper method to ESTestCase that returns a randomized
`java.util.TimeZone`. This can be used when transitioning code from Joda to the
JDK's time classes.
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
Currently rest-based tests do not work from the IDE, as the security
manager is configured to permit certain network operations when
using the snapshot jars compiled by gradle. We have an existing
workaround that explicitly associates a codebase with the path
from which the classes are loaded (in this case, the IDE build
directory). This PR adds the rest client to this workaround list.
* Move Streams.copy into elasticsearch-core and make a multi-release jar
This moves the method `Streams.copy(InputStream in, OutputStream out)` into the
`elasticsearch-core` project (inside the `o.e.core.internal.io` package). It
also makes this class into a multi-release class where the Java 9 equivalent
uses `InputStream#transferTo`.
This is a followup from
https://github.com/elastic/elasticsearch/pull/29300#discussion_r178147495
* Move ObjectParser into the x-content lib
This moves `ObjectParser`, `AbstractObjectParser`, and
`ConstructingObjectParser` into the libs/x-content dependency. This decoupling
allows them to be used for parsing for projects that don't want to depend on the
entire Elasticsearch jar.
Relates to #28504
* Fixes query_string query equals timezone check
This change fixes a bug where two `QueryStringQueryBuilder`s were found
to be equal if they had the same timezone set even if the query string
in the builders were different
Closes#29403
* Adds mutate function to QueryStringQueryBuilderTests
* iter
This improves the way similarities are plugged in in order to:
- reject the classic similarity on 7.x indices and emit a deprecation
warning otherwise
- reject unkwown parameters on 7.x indices and emit a deprecation
warning otherwise
Even though this breaks the plugin API, I'd like to backport to 7.x so
that users can get deprecation warnings when they are doing something
that will become unsupported in the future.
Closes#23208Closes#29035
* Begin moving XContent to a separate lib/artifact
This commit moves a large portion of the XContent code from the `server` project
to the `libs/xcontent` project. For the pieces that have been moved, some
helpers have been duplicated to allow them to be decoupled from ES helper
classes. In addition, `Booleans` and `CheckedFunction` have been moved to the
`elasticsearch-core` project.
This decoupling is a move so that we can eventually make things like the
high-level REST client not rely on the entire ES jar, only the parts it needs.
There are some pieces that are still not decoupled, in particular some of the
XContent tests still remain in the server project, this is because they test a
large portion of the pluggable xcontent pieces through
`XContentElasticsearchException`. They may be decoupled in future work.
Additionally, there may be more piecese that we want to move to the xcontent lib
in the future that are not part of this PR, this is a starting point.
Relates to #28504
Removes a set of assertions in the test framework that verified that
Streamable objects could be serialized and deserialized across different
versions. When this was discussed the consensus was that this approach
has not caught many bugs in a long time and that serialization testing of
objects was best left to their respective unit and integration tests.
This commit also removes a transport interceptor that was used in
ESIntegTestCase tests to make these assertions about objects coming in
or off the wire.
Today we have a few problems with how we handle bad requests:
- handling requests with bad encoding
- handling requests with invalid value for filter_path/pretty/human
- handling requests with a garbage Content-Type header
There are two problems:
- in every case, we give an empty response to the client
- in most cases, we leak the byte buffer backing the request!
These problems are caused by a broader problem: poor handling preparing
the request for handling, or the channel to write to when the response
is ready. This commit addresses these issues by taking a unified
approach to all of them that ensures that:
- we respond to the client with the exception that blew us up
- we do not leak the byte buffer backing the request
We historically removed reading from the transaction log to get consistent
results from _GET calls. There was also the motivation that the read-modify-update
principle we apply should not be hidden from the user. We still agree on the fact
that we should not hide these aspects but the impact on updates is quite significant
especially if the same documents is updated before it's written to disk and made serachable.
This change adds back the ability to read from the transaction log but only for update calls.
Calls to the _GET API will always do a refresh if necessary to return consistent results ie.
if stored fields or DocValues Fields are requested.
Closes#26802
Once a document is deleted and Lucene is refreshed, we will not be able
to look up the `version/seq#` associated with that delete in Lucene. As
conflicting operations can still be indexed, we need another mechanism
to remember these deletes. Therefore deletes should still be stored in
the Version Map, even after Lucene is refreshed. Obviously, we can't
remember all deletes forever so a trimming mechanism is needed.
Currently, we remember deletes for at least 1 minute (the default GC
deletes cycle) and clean them periodically. This is, at the moment, the
best we can do on the primary for user facing APIs but this arbitrary
time limit is problematic for replicas. Furthermore, we can't rely on
the primary and replicas doing the trimming in a synchronized manner,
and failing to do so results in the replica and primary making different
decisions.
The following scenario can cause inconsistency between
primary and replica.
1. Primary index doc (index, id=1, v2)
2. Network packet issue causes index operation to back off and wait
3. Primary deletes doc (delete, id=1, v3)
4. Replica processes delete (delete, id=1, v3)
5. 1+ minute passes (GC deletes runs replica)
6. Indexing op is finally sent to the replica which no processes it
because it forgot about the delete.
We can reply on sequence-numbers to prevent this issue. If we prune only
deletes whose seqno at most the local checkpoint, a replica will
correctly remember what it needs. The correctness is explained as
follows:
Suppose o1 and o2 are two operations on the same document with seq#(o1)
< seq#(o2), and o2 arrives before o1 on the replica. o2 is processed
normally since it arrives first; when o1 arrives it should be discarded:
1. If seq#(o1) <= LCP, then it will be not be added to Lucene, as it was
already previously added.
2. If seq#(o1) > LCP, then it depends on the nature of o2:
- If o2 is a delete then its seq# is recorded in the VersionMap,
since seq#(o2) > seq#(o1) > LCP, so a lookup can find it and
determine that o1 is stale.
- If o2 is an indexing then its seq# is either in Lucene (if
refreshed) or the VersionMap (if not refreshed yet), so a
real-time lookup can find it and determine that o1 is stale.
In this PR, we prefer to deploy a single trimming strategy, which
satisfies both requirements, on primary and replicas because:
- It's simpler - no need to distinguish if an engine is running at
primary mode or replica mode or being promoted.
- If a replica subsequently is promoted, user experience is fully
maintained as that replica remembers deletes for the last GC cycle.
However, the version map may consume less memory if we deploy two
different trimming strategies for primary and replicas.
#28245 has introduced the utility class`EngineDiskUtils` with a set of methods to prepare/change
translog and lucene commit points. That util class bundled everything that's needed to create and
empty shard, bootstrap a shard from a lucene index that was just restored etc.
In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That
would sometime fail due to concurrent shard store fetching or other short activities that require the
files not to be changed while they read from them.
Since there is no way to wait on the index writer lock, the `Store` class has other locks to make
sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this
PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and
shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the
translog manipulations on their own.
This change refactors the composite aggregation to add an execution mode that visits documents in the order of the values
present in the leading source of the composite definition. This mode does not need to visit all documents since it can early terminate
the collection when the leading source value is greater than the lowest value in the queue.
Instead of collecting the documents in the order of their doc_id, this mode uses the inverted lists (or the bkd tree for numerics) to collect documents
in the order of the values present in the leading source.
For instance the following aggregation:
```
"composite" : {
"sources" : [
{ "value1": { "terms" : { "field": "timestamp", "order": "asc" } } }
],
"size": 10
}
```
... can use the field `timestamp` to collect the documents with the 10 lowest values for the field instead of visiting all documents.
For composite aggregation with more than one source the execution can early terminate as soon as one of the 10 lowest values produces enough
composite buckets. For instance if visiting the first two lowest timestamp created 10 composite buckets we can early terminate the collection since it
is guaranteed that the third lowest timestamp cannot create a composite key that compares lower than the one already visited.
This mode can execute iff:
* The leading source in the composite definition uses an indexed field of type `date` (works also with `date_histogram` source), `integer`, `long` or `keyword`.
* The query is a match_all query or a range query over the field that is used as the leading source in the composite definition.
* The sort order of the leading source is the natural order (ascending since postings and numerics are sorted in ascending order only).
If these conditions are not met this aggregation visits each document like any other agg.
* Remove BytesArray and BytesReference usage from XContentFactory
This removes the usage of `BytesArray` and `BytesReference` from
`XContentFactory`. Instead, a regular `byte[]` should be passed. To assist with
this a helper has been added to `XContentHelper` that will preserve the offset
and length from the underlying BytesReference.
This is part of ongoing work to separate the XContent parts from ES so they can
be factored into their own jar.
Relates to #28504
`$_path` is used by documentation tests to ignore a value from a
response, for example:
```
[source,js]
----
{
"count": 1,
"datafeeds": [
{
"datafeed_id": "datafeed-total-requests",
"state": "started",
"node": {
...
"attributes": {
"ml.machine_memory": "17179869184",
"ml.max_open_jobs": "20",
"ml.enabled": "true"
}
},
"assignment_explanation": ""
}
]
}
----
// TESTRESPONSE[s/"17179869184"/$body.$_path/]
```
That example shows `17179869184` in the compiled docs but when it runs
the tests generated by that doc it ignores `17179869184` and asserts
instead that there is a value in that field. This is required because we
can't predict things like "how many milliseconds will this take?" and
"how much memory will this take?".
Before this change it was impossible to use `$_path` when any component
of the path contained a `.`. This fixes the `$_path` evaluator to
properly escape `.`.
Closes#28770
Changes made in #28972 seems to have changed some assumptions about how
SMILE and CBOR write byte[] values and how this is tested. This changes
the generation of the randomized DocumentField values back to BytesArray
while expecting the JSON and YAML deserialisation to produce Base64
encoded strings and SMILE and CBOR to parse back BytesArray instances.
Closes#29080
Currently we have a fairly complicated logic in the engine constructor logic to deal with all the
various ways we want to mutate the lucene index and translog we're opening.
We can:
1) Create an empty index
2) Use the lucene but create a new translog
3) Use both
4) Force a new history uuid in all cases.
This leads complicated code flows which makes it harder and harder to make sure we cover all the
corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens
things as they are and all needed modifications are done by static methods directly on the
directory, one at a time.
* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
As we have factored Elasticsearch into smaller libraries, we have ended
up in a situation that some of the dependencies of Elasticsearch are not
available to code that depends on these smaller libraries but not server
Elasticsearch. This is a good thing, this was one of the goals of
separating Elasticsearch into smaller libraries, to shed some of the
dependencies from other components of the system. However, this now
means that simple utility methods from Lucene that we rely on are no
longer available everywhere. This commit copies IOUtils (with some small
formatting changes for our codebase) into the fold so that other
components of the system can rely on these methods where they no longer
depend on Lucene.
I have long wanted an actual test that dying with dignity works. It is
tricky because if dying with dignity works, it means the test JVM dies
which is usually an abnormal condition. And anyway, how does one force a
fatal error to be thrown. I was motivated to investigate this again by
the fact that I missed a backport to one branch leading to an issue
where Elasticsearch would not successfully die with dignity. And now we
have a solution: we install a plugin that throws an out of memory error
when it receives a request. We hack the standalone test infrastructure
to prevent this from failing the test. To do this, we bypass the
security manager and remove the PID file for the node; this tricks the
test infrastructure into thinking that it does not need to stop the
node. We also bypass seccomp so that we can fork jps to make sure that
Elasticsearch really died. And to be extra paranoid, we parse the logs
of the dead Elasticsearch process to make sure it died with
dignity. Never forget.
Today we have two test base classes that have a lot in common when it comes to testing wire and xcontent serialization: `AbstractSerializingTestCase` and `AbstractXContentStreamableTestCase`. There are subtle differences though between the two, in the way they work, what can be overridden and features that they support (e.g. insertion of random fields).
This commit introduces a new base class called `AbstractWireTestCase` which holds all of the serialization test code in common between `Streamable` and `Writeable`. It has two minimal subclasses called `AbstractWireSerializingTestCase` and `AbstractStreamableTestCase` which are specialized for `Writeable` and `Streamable`.
This commit also introduces a new test class called `AbstractXContentTestCase` for all of the xContent testing, which holds a testFromXContent method for parsing and rendering to xContent. This one can be delegated to from the existing `AbstractStreamableXContentTestCase` and `AbstractSerializingTestCase` so that we avoid code duplicate as much as possible and all these base classes offer the same functionalities in the same way. Having this last base class decoupled from the serialization testing may also help with the REST high-level client testing, as there are some classes where it's hard to implement equals/hashcode and this makes it possible to override `assertEqualInstances` for custom equality comparisons (also this base class doesn't require implementing equals/hashcode as it doesn't test such methods.
This commit is related to #27260. Currently there is a weird
relationship between channel contexts and nio channels. The selectors
use the context for read and writing. But the selector operates directly
on the nio channel for registering, closing, and connecting.
This commit works on improving this relationship. The selector operates
directly on the context which wraps the low level java.nio.channels. The
NioChannel class is simply an API that is used to interact with the
channel (sending messages from outside the selector event loop,
scheduling a close, adding listeners, etc). The context is only used
internally by the channel to implement these apis and by the selector to
perform these operations.
This allows us to save a bit of code, but also adds more coverage as it tests serialization which was missing in some of the existing tests. Also it requires implementing equals/hashcode and we get the corresponding tests for them for free from the base test class.
* Pass InputStream when creating XContent parser
Rather than passing the raw `BytesReference` in when creating the xcontent
parser, this passes the StreamInput (which is an InputStream), this allows us to
decouple XContent from BytesReference.
This also removes the use of `commons.Booleans` so it doesn't require more
external commons classes.
Related to #28504
* Undo boolean removal
* Enhance deprecation javadoc
Add support version and version_type in ingest pipelines
Add support for setting document version and version type in set
processor of an ingest pipeline.
* Remove log4j dependency from elasticsearch-core
This removes the log4j dependency from our elasticsearch-core project. It was
originally necessary only for our jar classpath checking. It is now replaced by
a `Consumer<String>` so that the es-core dependency doesn't have external
dependencies.
The parts of #28191 which were moved in conjunction (like `ESLoggerFactory` and
`Loggers`) have been moved back where appropriate, since they are not required
in the core jar.
This is tangentially related to #28504
* Add javadocs for `output` parameter
* Change @code to @link
Today we have several levels of indirection to acquire an Engine.Searcher.
We first acquire a the reference manager for the scope then acquire an
IndexSearcher and then create a searcher for the engine based on that.
This change simplifies the creation into a single method call instead of
3 different ones.
Previously we introduced a new parameter to `acquireIndexCommit` to
allow acquire either a safe commit or a last commit. However with the
new parameters, callers can provide a nonsense combination - flush first
but acquire the safe commit. This commit separates acquireIndexCommit
method into two different methods to avoid that problem. Moreover, this
change should also improve the readability.
Relates #28038
The REST high-level client supports now encoding of path parts, so that for instance documents with valid ids, but containing characters that need to be encoded as part of urls (`#` etc.), are properly supported. We also make sure that each path part can contain `/` by encoding them properly too.
Closes#28625
Currently the Translog constructor is capable both of opening an existing translog and creating a
new one (deleting existing files). This PR separates these two into separate code paths. The
constructors opens files and a dedicated static methods creates an empty translog.
* Move more XContent.createParser calls to non-deprecated version
Part 2
This moves more of the callers to pass in the DeprecationHandler.
Relates to #28504
* Use parser's deprecation handler where appropriate
* Use logging handler in test that uses deprecated field on purpose
* Move more XContent.createParser calls to non-deprecated version
This moves more of the callers to pass in the DeprecationHandler.
Relates to #28504
* Use parser's deprecation handler where available
Version Utils did not previously have logic that removed the last majors
minor snapshot if there was a next bugfix and maintenance bugfix
release. This adds the logic and fixes some broken assumptions in tests
as well.
relates #28505
The build.snapshot was mistakenly passed in to every snapshot version,
so when release tests were run, these versions were mistaken as released
entities and could not be found in maven, because they do not
exist. This fix removes that bug in logic, and always makes them proper
snapshots. This has a benefit of cleaning up the VersionUtilsTests
because they no longer rely on different sets of versions to check
against, which was also a bug.
Currently if a yaml test has a teardown and a test is failing then
a stash dump of a request in the teardown is logged instead of
a stash dump of a request in the test itself.
By handling the logging of stash dumps separately for setup, tests and
teardown yaml sections we shouldn't miss the stash dump of request/response
that is actually causing the yaml test to fail.