When we introduced [persistent node ids](https://github.com/elastic/elasticsearch/pull/19140) we were concerned that people may copy data folders from one node to another, resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different node with the same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationale was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying). Sadly there were two problems with this:

1) One minor and easy to fix: we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in `AllocationService` in these rare cases. The cluster is good enough to detect this and recover later on, but it's not clean.
2) The assumption that Node Fault Detection will clean up is *wrong* when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume its mastership. In those cases, the Node Fault Detection isn't active.

This PR fixes these two and prefers incoming nodes to existing nodes when finishing an election. On top of that, on request by @ywelsch, `AllocationService` synchronization between the nodes of the cluster and its routing table is now explicit rather than something we do all the time. The same goes for promotion of replicas to primaries.
Elasticsearch Microbenchmark Suite
This directory contains the microbenchmark suite of Elasticsearch. It relies on JMH.
Purpose
We do not want to microbenchmark everything but the kitchen sink and should typically rely on our macrobenchmarks with Rally. Microbenchmarks are intended to spot performance regressions in performance-critical components. The microbenchmark suite is also handy for ad-hoc microbenchmarks but please remove them again before merging your PR.
Getting Started
Just run `gradle :benchmarks:jmh` from the project root directory. It will build all microbenchmarks, execute them and print the result.
Running Microbenchmarks
Benchmarks are always run via Gradle with `gradle :benchmarks:jmh`.
Running via an IDE is not supported as the results are meaningless (we have no control over the JVM running the benchmarks).
If you want to run a specific benchmark class, e.g. `org.elasticsearch.benchmark.MySampleBenchmark`, or have special requirements, generate the uberjar with `gradle :benchmarks:jmhJar` and run it directly with:
java -jar benchmarks/build/distributions/elasticsearch-benchmarks-*.jar
JMH supports lots of command line parameters. Add `-h` to the command above to see the available command line options.
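As a rough sketch of what such an invocation might look like, the snippet below assembles a command line for the sample class mentioned above. The `jmh_cmd` helper is hypothetical and only prints the command, so it can be inspected before it is run. `-wi`, `-i` and `-f` are JMH's options for warmup iterations, measurement iterations and forks; the values shown are arbitrary.

```shell
#!/bin/sh
# Path of the uberjar as given in this README. The glob stays unexpanded
# here; the shell expands it only when the printed command is executed.
JAR='benchmarks/build/distributions/elasticsearch-benchmarks-*.jar'

# Hypothetical helper: print (do not run) a JMH command line.
# $1 is a benchmark class name or regex, the remaining arguments are JMH options.
jmh_cmd() {
  benchmark="$1"
  shift
  echo "java -jar $JAR $benchmark $*"
}

# Select only the sample benchmark, with 5 warmup iterations, 10 measurement
# iterations and a single fork (illustrative values, not recommendations).
jmh_cmd org.elasticsearch.benchmark.MySampleBenchmark -wi 5 -i 10 -f 1
```

To execute the command instead of printing it, pipe the output to `sh` from the project root directory.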
Adding Microbenchmarks
Before adding a new microbenchmark, make yourself familiar with the JMH API. You can check our existing microbenchmarks and also the JMH samples.
In contrast to tests, the actual name of the benchmark class is not relevant to JMH. However, stick to the naming convention and end the class name of a benchmark with `Benchmark`. To have JMH execute a benchmark, annotate the respective methods with `@Benchmark`.
Tips and Best Practices
To get realistic results, you should exercise care when running benchmarks. Here are a few tips:
Do
- Ensure that the system executing your microbenchmarks has as little load as possible. Shut down every process that can cause unnecessary runtime jitter. Watch the `Error` column in the benchmark results to see the run-to-run variance.
- Be sure to run enough warmup iterations to get the benchmark into a stable state. If you are unsure, don't change the defaults.
- Avoid CPU migrations by pinning your benchmarks to specific CPU cores. On Linux you can use `taskset`.
- Fix the CPU frequency to prevent Turbo Boost from kicking in and skewing your results. On Linux you can use `cpufreq-set` and the `performance` CPU governor.
- Vary the problem input size with `@Param`.
- Use the integrated profilers in JMH to dig deeper if benchmark results do not match your hypotheses:
  - Run the generated uberjar directly and use `-prof gc` to check whether the garbage collector runs during a microbenchmark and skews your results. If so, try to force a GC between runs (`-gc true`) but watch out for the caveats.
  - Use `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
- Have your benchmarks peer-reviewed.
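A minimal sketch of how core pinning and the JMH profilers might be combined on Linux follows. The core list `2,3`, the benchmark name and the `profiled_cmd` helper are assumptions for illustration only. The helper prints the command rather than running it, so the snippet does not depend on the uberjar having been built.

```shell
#!/bin/sh
JAR='benchmarks/build/distributions/elasticsearch-benchmarks-*.jar'
CORES='2,3'   # assumption: cores set aside for benchmarking on this machine

# Optionally fix the governor for those cores first (usually needs root), e.g.:
#   cpufreq-set -c 2 -g performance

# Hypothetical helper: print (do not run) a JMH invocation that is pinned
# to $CORES via taskset and has one JMH profiler enabled.
# $1 is a JMH profiler name such as gc, perf or perfasm.
profiled_cmd() {
  echo "taskset -c $CORES java -jar $JAR MySampleBenchmark -prof $1"
}

profiled_cmd gc        # check for garbage collector activity during the run
profiled_cmd perfasm   # Linux only: inspect hotspots at the assembly level
```

Printing first makes it easy to check the pinning and profiler flags before committing to a long benchmark run.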
Don't
- Blindly believe the numbers that your microbenchmark produces but verify them by measuring e.g. with `-prof perfasm`.
- Run more threads than your number of CPU cores (in case you run multi-threaded microbenchmarks).
- Look only at the `Score` column and ignore `Error`. Instead take countermeasures to keep `Error` low / variance explainable.
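One possible way to respect the thread limit above is to cap the requested thread count at the number of online cores before passing it to JMH's `-t` option. The `cap_threads` helper and the requested value of 8 are hypothetical, and `getconf _NPROCESSORS_ONLN` is assumed to be available on the benchmarking machine.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of a requested thread count ($1)
# and an upper limit ($2).
cap_threads() {
  requested="$1"
  limit="$2"
  if [ "$requested" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$requested"
  fi
}

# Number of online cores as reported by the system.
cores="$(getconf _NPROCESSORS_ONLN)"

# Print the JMH thread option, never exceeding the core count.
echo "-t $(cap_threads 8 "$cores")"
```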