Improve performance of extracting warning value

When building headers for a REST response, we de-duplicate the warning
headers based on the actual warning value. The current implementation of
this uses a capturing regular expression that is prone to excessive
backtracking. In cases a request involves a large number of warnings,
this extraction can be a severe performance penalty. An example where
this can arise is a bulk indexing request that utilizes a deprecated
feature (e.g., using deprecated forms of boolean values). This commit is
an attempt to address this performance regression. We already know the
format of the warning header, so we do not need to use a regular
expression to parse it but rather can parse it by hand to extract the
warning value. This gains back the vast majority of the performance lost
due to the usage of a deprecated feature. There is still a performance
loss due to logging the deprecation message but we do not address that
concern in this commit.

Relates #24114
This commit is contained in:
Jason Tedor 2017-04-14 12:18:00 -04:00 committed by GitHub
parent 989da585b2
commit 09efdc3151
1 changed files with 32 additions and 1 deletions

View File

@ -226,10 +226,41 @@ public class DeprecationLogger {
* @return the extracted warning value
*/
public static String extractWarningValueFromWarningHeader(final String s) {
/*
* We know the exact format of the warning header, so to extract the warning value we can skip forward from the front to the first
* quote, and skip backwards from the end to the penultimate quote:
*
* 299 Elasticsearch-6.0.0 "warning value" "Sat, 25, Feb 2017 10:27:43 GMT"
* ^ ^ ^
* firstQuote penultimateQuote lastQuote
*
* We do it this way rather than seeking forward after the first quote because there could be escaped quotes in the warning value
* but since there are none in the warning date, we can skip backwards to find the quote that closes the quoted warning value.
*
* We parse this manually rather than using the capturing regular expression because the regular expression involves a lot of
* backtracking and carries a performance penalty. However, when assertions are enabled, we still use the regular expression to
* verify that we are maintaining the warning header format.
*/
final int firstQuote = s.indexOf('\"');
final int lastQuote = s.lastIndexOf('\"');
final int penultimateQuote = s.lastIndexOf('\"', lastQuote - 1);
final String warningValue = s.substring(firstQuote + 1, penultimateQuote - 2);
assert assertWarningValue(s, warningValue);
return warningValue;
}
/**
* Assert that the specified string has the warning value equal to the provided warning value.
*
* @param s the string representing a full warning header
* @param warningValue the expected warning header
* @return {@code true} if the specified string has the expected warning value
*/
private static boolean assertWarningValue(final String s, final String warningValue) {
final Matcher matcher = WARNING_HEADER_PATTERN.matcher(s);
final boolean matches = matcher.matches();
assert matches;
return matcher.group(1);
return matcher.group(1).equals(warningValue);
}
/**