When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
Microbenchmark results before this change:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
1113.839 ±(99.9%) 6.338 ns/op [Average]
(min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)
Microbenchmark results with this change applied:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
433.190 ±(99.9%) 0.644 ns/op [Average]
(min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
CI (99.9%): [432.546, 433.833] (assumes normal distribution)
The microbenchmark in question was:
@Fork(3)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@SuppressWarnings("unused") //invoked by benchmarking framework
public class RegexStartsWithBenchmark {
private static final String testString = "abcdefghijklmnopqrstuvwxyz";
private static final String[] patterns;
static {
patterns = new String[testString.length() + 1];
for (int i = 0; i <= testString.length(); i++) {
patterns[i] = testString.substring(0, i) + "*";
}
}
@Benchmark
public void performSimpleMatch() {
for (int i = 0; i < patterns.length; i++) {
Regex.simpleMatch(patterns[i], testString);
}
}
}