Processing bulk request goes item by item. Sometimes during processing, we need to stop execution and wait for a new mapping update to be processed by the node. This is currently achieved by throwing a `RetryOnPrimaryException`, which is caught higher up. When the exception is caught, we wait for the next cluster state to arrive and process the request again. Sadly this is a problem because all operations that were already done until the mapping change was required are applied again and get new sequence numbers. This in turn means that the previously issued sequence numbers are never replicated to the replicas. That causes the local checkpoint of those shards to be stuck and with it all the seq# based infrastructure.
This commit refactors how we deal with retries with the goal of removing `RetryOnPrimaryException` and `RetryOnReplicaException` (not done yet). It achieves so by introducing a class `BulkPrimaryExecutionContext` that is used the capture the execution state and allows continuing from where the execution stopped. The class also formalizes the steps each item has to go through:
1) A translation phase for updates
2) Execution phase (always index/delete)
3) Waiting for a mapping update to come in, if needed
4) Requires a retry (for updates and cases where the mapping are still not available after the put mapping call returns)
5) A finalization phase which allows updates to the index/delete result to an update result.
Enhance reproduction line with info about jdks
Provide the ability to control compiler and hava versions just by
passing a property. The actual java home comes from the
`JAVA<major>_HOME` env vars that we allready require.
This works better with the Gradle daemon as well.
Output is also changed a bit.
for `-Druntime.java=8 -Dcompiler.java=9`:
```
=======================================
Elasticsearch Build Hamster says Hello!
Gradle Version : 4.9
OS Info : Linux 4.17.8-1-ARCH (amd64)
Compiler JDK Version : 11 (Oracle Corporation 11-ea [OpenJDK 64-Bit Server VM 11-ea+22])
Runtime JDK Version : 11 (Oracle Corporation 11-ea [OpenJDK 64-Bit Server VM 11-ea+22])
Gradle JDK Version : 10 (Oracle Corporation 10.0.1 [OpenJDK 64-Bit Server VM 10.0.1+10])
Compiler java.home : /home/alpar/opt/jdk-11-ea22/
Runtime java.home : /home/alpar/opt/jdk-11-ea22/
Gradle java.home : /usr/lib/jvm/java-10-openjdk
Random Testing Seed : EA858533191E8DFB
=======================================
```
Without configuration:
```
=======================================
Elasticsearch Build Hamster says Hello!
=======================================
Gradle Version : 4.9
OS Info : Linux 4.17.8-1-ARCH (amd64)
JDK Version : 10 (Oracle Corporation 10.0.1 [OpenJDK 64-Bit Server VM 10.0.1+10])
JAVA_HOME : /usr/lib/jvm/java-10-openjdk
Random Testing Seed : 4BD5B2A839C8FCA1
=======================================
```
Here's how a reproduction line will look like (test made to fail):
```
./gradlew :modules:lang-painless:test -Dtests.seed=2DA2379065A4EEAB -Dtests.class=org.elasticsearch.painless.AdditionTests -Dtests.method="testInt" -Dtests.security.manager=true -Dtests.locale=es-PE -Dtests.timezone=WET -Dcompiler.java=10 -Druntime.java=10
```
This adds a java time based date math parser class in order, which will replace the joda date based one in the future. For now the class also returns the date in milliseconds since the epoch.
This commit modifies the test to handle file permission
tests in windows/dos environments. The test requires access
to UserPrincipal and so have modified the plugin-security policy
to access user information.
Closes#32637
This commit adds two pieces. The first is a small set of documentation providing
instructions on how to get setup to run context examples. This will require a download
similar to how Kibana works for some of the examples. The second is an ingest processor
example using the downloaded data. More examples will follow as ideally one per PR.
This also adds a set of tests to individually test each script as a unit test.
Currently if a document cannot be indexed because it violates the defined
mapping for the index, a MapperException is thrown. In some cases it is
useful to expose the expected field type in the exception itself,
so that the user can react based on the error message. This change adds
the expected data type to the MapperException.
Closes#31502
Remove a few of the logger constructors that aren't widely used or
aren't used at all and deprecate a few more logger constructors in favor
of log4j2's `LogManager`.
This commit adds back the publishing section that sets the artifact id
of the generated pom file for the high level rest client. This was
accidentally removed during a consolidationo of the shadow plugin logic.
The incorrect NodeInfo is created when the optional parameter is not used, leading to the incorrect constructor being used. Simplified LocateFunctionProcessorDefinition by using one constructor instead of two.
Fixes https://github.com/elastic/elasticsearch/issues/32554
Skip the comparative tests using lowercasing/uppercasing against H2 (which considers the Locale).
ES-SQL is, so far, ignoring the Locale.
Still, the same queries are executed against ES-SQL alone and results asserted to be correct.
A bug in the test suite prevented to properly check that all date
formatters printed the date the same way like joda time does.
This fixes the test and thus also a fair share of formats, that
now use the strict parser for printing.
We previously discussed moving the classes extending `AcknowledgedResponse` to
simply use `AcknowledgedResponse`, making the class non-abstract.
This moves the first class to do this, removing `WritePipelineResponse` in the
process.
If we like the way this looks, I will switch the remaining classes over to using
`AcknowledgedResponse`.
When Circuit Breaker has tripped, certain diagnostic requests like
"_cluster/health" succeed where as request to / fails with
503 Service Unavailable. This behavior is observed because of this
commit f32b700 where certain API paths are whitelisted from
Circuit Breaking exception, but / is not whitelisted.
Added / to circuit breaker whitelist so that it can be used for
diagnostic purposes
* Fixes suggestion generics
This solves a compile problem in Eclipse where Eclipse could not
resolve the generics for the options field in `PhraseSuggestion.Entry`.
But I think this is also a good change in general because
`PhraseSuggestion.Entry` is now declaring the specific `Option`
implementation it requires rather than `Suggest.Entry.Option` which is
more general and could lead to weird bugs. `CompletionSuggestion.Entry`
and `TermSuggestion.Entry` already declare the more specific class they
use so I think this was an oversight in `PhaseSuggestion.Entry`
* iter
`ShardOperationFailedException` and corresponding implementors seem to suggest that the cause may be null, case that is also handled in a few places. Yet, it does not seem to be possible in practice for the cause to be null, hence we can clean that up and enforce the cause to be a non null value. This is best done by making `ShardOperationFailedException` an abstract class rather than an interface, which holds the basic member instance that all the subclasses have in common and can also enforce that cause, status and reason are non null.
As part of #32608 we made sure that the fully qualified index name is taken from the query shard context whenever creating a new `QueryShardException`. That change introduced a regression as instead of setting the entire `Index` object to the exception, which holds index name and index uuid, we ended up setting only the index name (including cluster alias). With this commit we make sure that the index uuid does not get lost and we try to lower the chances that a similar bug makes it in another time. That's done by making `QueryShardContext` return the fully qualified `Index` (which also holds the uuid) rather than only the fully qualified index name.
This is related to #27260. This commit replaces the netty driven http
client (Netty4HttpClient) with one that is driven by (NioHttpClient).
This client exists in the test package and is used for making http
requests.
Suggestion responses were previously serialized as streamables which
made writing suggesters in plugins with custom suggestion response types
impossible. This commit makes them serialized as named writeables and
provides a facility for registering a reader for suggestion responses
when registering a suggester.
This also makes Suggestion responses abstract, requiring a suggester
implementation to provide its own types. Suggesters which do not need
anything additional to what is defined in Suggest.Suggestion should
provide a minimal subclass.
The existing plugin suggester integration tests are removed and
replaced with an equivalent implementation as an example
plugin.
This change consolidates all the logic for generating a FunctionReference (renamed from
FunctionRef) from several arbitrary constructors to a single static function that is used at
both compile-time and run-time. This increases long-term maintainability as it is much
easier to follow when and how a function reference is being generated. It moves most of
the duplicated logic out of the ECapturingFuncRef, EFuncRef and ELambda nodes and
Def as well.
It will be useful for future efforts to know if the global checkpoint
was updated. To this end, we need to expose whether or not the global
checkpoint was updated when the state of the replication tracker
updates. For this, we add to the tracker a callback that is invoked
whenever the global checkpoint is updated. For primaries this will be
invoked when the computed global checkpoint is updated based on state
changes to the tracker. For replicas this will be invoked when the local
knowledge of the global checkpoint is advanced from the primary.
The MockNioTransport (similar to the MockTcpTransport) is used for integ
tests. The MockTcpTransport has always only opened a single for all of
its work. The MockNioTransport has awlays opened the default number of
connections (13). This means that every test where two transports
connect requires 26 connections. This is more than is necessary. This
commit modifies the MockNioTransport to only require 3 connections.
On some Linux distributions tmpfiles.d cleans files and
directories under /tmp if they haven't been accessed for
10 days.
This can cause problems for ML as ML is currently the only
component that uses the temp directory more than a few
seconds after startup. If you didn't open an ML job for
10 days and then tried to open one then the temp directory
would have been deleted.
This commit prevents the problem occurring in the case of
Elasticsearch being managed by systemd, as systemd private
temp directories are not subject to periodic cleanup (by
default).
Additionally there are now some docs to warn people about
the risk and suggest a manual mitigation for .tar.gz users.
Primary terms were introduced as part of the sequence-number effort (#10708) and added in ES
5.0. Subsequent work introduced the replication tracker which lets the primary own its replication
group (#25692) to coordinate recovery and replication. The replication tracker explicitly exposes
whether it is operating in primary mode or replica mode, independent of the ShardRouting object
that's associated with a shard. During a primary relocation, for example, the primary mode is
transferred between the primary relocation source and the primary relocation target. After
transferring this so-called primary context, the old primary becomes a replication target and the
new primary the replication source, reflected in the replication tracker on both nodes. With the
most recent PR in this area (#32442), we finally have a clean transition between a shard that's
operating as a primary and issuing sequence numbers and a shard that's serving as a replication
target. The transition from one state to the other is enforced through the operation-permit system,
where we block permit acquisition during such changes and perform the transition under this
operation block, ensuring that there are no operations in progress while the transition is being
performed. This finally allows us to turn the best-effort checks that were put in place to prevent
shards from being used in the wrong way (i.e. primary as replica, or replica as primary) into hard
assertions, making it easier to catch any bugs in this area.
Currently, when TranslogCorruptedException is thrown most of the times it does not contain information about the translog location on the file system. There is the translog recovery tool that accepts the translog path as an argument and users are constantly puzzled where to get the path.
This pull request adds "source" information to every TranslogCorruptedException thrown. The source could be local file, remote translog source (used for recovery), assertion (translog entry is constructed to perform some assertion) or translog constructed inside the test.
Closes#24929
This change adds a check so that when parsing the search source, script fields are
ignored when the requested search result size is 0. This helps with e.g. clients like
Kibana that sends a list of script fields that they may need for convenience, but they
don't require any hits. Before this change, user sometimes ran into confusing behaviour,
e.g. the script compilation limit to breaking although no hits were requested.
Closes#31824
* We were comparing the wrong timeout value in the `randomValueOtherThan` call here, leading to no mutation happening for a certain seed
* closes#32639