OpenSearch/benchmarks
Nik Everett 81cba796e6
Add microbenchmark for LongKeyedBucketOrds (#58608) (#59459)
I've always been confused by the strange behavior that I saw when
working on #57304. Specifically, I saw that switching from a bimorphic
invocation to a monomorphic invocation gave us a 7%-15% performance
bump. This felt *bonkers* to me. And it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```
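The four call-site shapes above can be sketched outside the aggregation code like so. This is a hypothetical `Ords` interface with two implementations standing in for `LongKeyedBucketOrds`, not the real API:

```java
// Hypothetical sketch of bimorphic vs. monomorphic call sites. The Ords
// interface and its implementations are made up for illustration.
interface Ords {
    long add(long owningBucketOrd, long value);
}

final class ForSingle implements Ords {
    public long add(long owningBucketOrd, long value) { return value; }
}

final class FromMany implements Ords {
    public long add(long owningBucketOrd, long value) { return owningBucketOrd + value; }
}

public class DispatchSketch {
    public static void main(String[] args) {
        // Bimorphic call site: the declared type is the interface and two
        // implementations exist, so the JIT emits a type check at the call.
        Ords bimorphic = new ForSingle();

        // Monomorphic call site: the declared type is concrete, so the JIT
        // can inline ForSingle.add directly.
        ForSingle monomorphic = new ForSingle();

        // Reassigning the reference is what distinguishes the "mutable" rows
        // in the results table from the "immutable" ones.
        bimorphic = new ForSingle();

        System.out.println(bimorphic.add(0, 1) + monomorphic.add(0, 1));
    }
}
```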

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to #56487
2020-07-13 17:22:46 -04:00

README.md

Elasticsearch Microbenchmark Suite

This directory contains the microbenchmark suite of Elasticsearch. It relies on JMH.

Purpose

We do not want to microbenchmark everything but the kitchen sink and should typically rely on our macrobenchmarks with Rally. Microbenchmarks are intended to spot performance regressions in performance-critical components. The microbenchmark suite is also handy for ad-hoc microbenchmarks but please remove them again before merging your PR.

Getting Started

Just run `gradlew -p benchmarks run` from the project root directory. It will build all microbenchmarks, execute them and print the results.

Running Microbenchmarks

Running via an IDE is not supported as the results are meaningless because we have no control over the JVM running the benchmarks.

If you want to run a specific benchmark class like, say, `MemoryStatsBenchmark`, you can use `--args`:

```
gradlew -p benchmarks run --args ' MemoryStatsBenchmark'
```

Everything inside the `'`s gets sent on the command line to JMH. The leading space inside the `'`s is important. Without it parameters are sometimes sent to Gradle.

Adding Microbenchmarks

Before adding a new microbenchmark, make yourself familiar with the JMH API. You can check our existing microbenchmarks and also the JMH samples.

In contrast to tests, the actual name of the benchmark class is not relevant to JMH. However, stick to the naming convention and end the class name of a benchmark with Benchmark. To have JMH execute a benchmark, annotate the respective methods with @Benchmark.
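Under these conventions, a new microbenchmark might look like the sketch below. The class name, field names, and workload are made up for illustration; the JMH annotations are the standard ones:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class ExampleBenchmark {  // hypothetical class, name ends in "Benchmark"
    @Param({ "100", "10000" })  // vary the problem input size
    private int size;

    private long[] values;

    @Setup
    public void setup() {
        values = new long[size];
        for (int i = 0; i < size; i++) {
            values[i] = i;
        }
    }

    @Benchmark
    public long sum() {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        // Return the result so the JIT cannot dead-code-eliminate the loop.
        return total;
    }
}
```

Running it with `gradlew -p benchmarks run --args ' ExampleBenchmark'` would then execute every `@Benchmark` method once per `@Param` value.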

Tips and Best Practices

To get realistic results, you should exercise care when running benchmarks. Here are a few tips:

Do

  • Ensure that the system executing your microbenchmarks has as little load as possible. Shut down every process that can cause unnecessary runtime jitter. Watch the `Error` column in the benchmark results to see the run-to-run variance.
  • Run enough warmup iterations to get the benchmark into a stable state. If you are unsure, don't change the defaults.
  • Avoid CPU migrations by pinning your benchmarks to specific CPU cores. On Linux you can use `taskset`.
  • Fix the CPU frequency to keep Turbo Boost from kicking in and skewing your results. On Linux you can use `cpufreq-set` and the `performance` CPU governor.
  • Vary the problem input size with `@Param`.
  • Use the integrated profilers in JMH to dig deeper if benchmark results do not match your hypotheses:
    • Add `-prof gc` to the options to check whether the garbage collector runs during a microbenchmark and skews your results. If so, try to force a GC between runs (`-gc true`) but watch out for the caveats.
    • Add `-prof perf` or `-prof perfasm` (both only available on Linux) to see hotspots.
  • Have your benchmarks peer-reviewed.

Don't

  • Blindly believe the numbers that your microbenchmark produces; verify them, e.g. by measuring with `-prof perfasm`.
  • Run more threads than your number of CPU cores (in case you run multi-threaded microbenchmarks).
  • Look only at the `Score` column and ignore `Error`. Instead take countermeasures to keep `Error` low and the variance explainable.