- Shedding locks at startup is bad, we actually want to keep them. Stop doing that.
- stopGracefully now interrupts the run thread if had started running finishJob. This avoids
waiting for handoff unnecessarily.
- fixes#1970
- extracted out segment handoff callbacks in SegmentHandoffNotifier
which is responsible for tracking segment handoffs and doing callbacks
when handoff is complete.
- Coordinator now maintains a view of segments in the cluster, this
will affect the jam heap requirements for the overlord for large
clusters.
realtime index task and nodes now use HTTP end points exposed by the
coordinator to get serverView
review comment
fix realtime node guide injection
review comments
make test not rely on scheduled exec
fix compilation
fix import
review comment
introduce immutableSegmentLoadInfo
fix son reading
remove unnecessary logging
This is done by killing and respawning the jvms rather than reconnecting to existing
jvms, for a couple reasons. One is that it lets you restore tasks after server reboots
too, and another is that it lets you upgrade all the software on a box at once by just
restarting everything.
The main changes are,
1) Add "canRestore" and "stopGracefully" methods to Tasks that say if a task can
stop gracefully, and actually do a graceful stop. RealtimeIndexTask is the only
one that currently implements this.
2) Add "stop" method to TaskRunners that attempts to do an orderly shutdown.
ThreadPoolTaskRunner- call stopGracefully on restorable tasks, wait for exit
ForkingTaskRunner- close output stream to restorable tasks, wait for exit
RemoteTaskRunner- do nothing special, we actually don't want to shutdown
3) Add "restore" method to TaskRunners that attempts to bootstrap tasks from last run.
Only ForkingTaskRunner does anything here. It maintains a "restore.json" file with
a list of restorable tasks.
4) Have the CliPeon's ExecutorLifecycle lock the task base directory to avoid a restored
task and a zombie old task from stomping on each other.
Deserialization of Optionals does not work quite right- they come back as actual
nulls, rather than absent Optionals. So these probably only ever worked for the local
task action client.
This is a feature meant to allow realtime tasks to work without being told upfront
what shardSpec they should use (so we can potentially publish a variable number
of segments per interval).
The idea is that there is a "pendingSegments" table in the metadata store that
tracks allocated segments. Each one has a segment id (the same segment id we know
and love) and is also part of a sequence.
The sequences are an idea from @cheddar that offers a way of doing replication.
If there are N tasks reading exactly the same data with exactly the same logic
(think Kafka tasks reading a fixed range of offsets) then you can place them
in the same sequence, and they will generate the same sequence of segments.
In #933 the ForkingTaskRunner's logging was changed to buffered from
unbuffered. This means that the last few KB of the logs are generally
not visible while a task is running, which makes debugging running
tasks difficult.