Move translog recover outside of the engine
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
BulkByScrollTaskTest#testDelayAndRethrottle was getting rejected exceptions
every once in a while. This was reproducible ~20% of the time for me. I
added a CyclicBarrier to prevent the test from shutting down the thread pool
before the threads get finished.
Changes QueryParser into a @FunctionalInterface and provides a way to
register queries using that. Cuts match and function_score queries over
to that registration method as a proof of concept.
Once all queries have been cut over we can remove their PROTOTYPES.
Currently our testing of parsing query builders is limited to the
default order of the parameters that each builders toXContent()
method produces. To better test real queries where the order of
parameters can be different, this change adds a helper
method to ESTestCase that takes a XContentBuilder and randomly
shuffles the order of the fields inside an object. This is
used in AbstractQueryTestCase, but it can be used in other similar
places in the future.
The refactoring of RescoreBuilder and QueryRescoreBuilder moved parsing
previously in RescoreParseElement into the builders. This removes the
left over parse element.
By default, tasks are grouped by node. However, task execution in elasticsearch can be quite complex and an individual task that runs on a coordinating node can have many subtasks running on other nodes in the cluster. This commit makes it possible to list task grouped by common parents instead of by node. When this option is enabled all subtask are grouped under the coordinating node task that started all subtasks in the group. To group tasks by common parents, use the following syntax:
GET /tasks?group_by=parents
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
This allows the user to update the reindex throttle on the fly, with changes
that speed up the throttling being applied immediately and changes that
slow down the throttling being applied during the next batch. This means
that if a user throttles reindex in such a way that it tries to sleep for
16 years and then realizes that they've done something wrong then they
can change the throttle and reindex will wake up again. We don't apply
slow downs immediately so we never get in danger of losing the scan context.
Also, if reindex is canceled while it is sleeping (how it honor throttling)
then it'll immediately wake up and cancel itself.