Given the input text 'A B A C', an ordered interval 'A B C' will currently return an incorrect
interval [2, 3] in addition to the correct [0, 3] interval. This is due to a bug in the ORDERED
algorithm, where we assume that after the first interval is returned, the sub-intervals are
always in-order. This assumption only holds during minimization. Once minimization ends,
it may have left earlier terms positioned beyond the trailing terms.
For example, after the initial [0, 3] interval is found above, the algorithm will attempt to
minimize it by advancing A to [2,2]. Because this is still before C at [3,3], but after B at
[1,1], we then try advancing B, leaving it at [Inf,Inf]. Minimization has failed, so we return
the original interval of [0,3]. However, when we come to retrieve the next interval, our
sub-intervals look like this: A[2,2], B[Inf,Inf], C[3,3]. The assumption that they are in order
is broken. The algorithm sees that A is before B, assumes that therefore all subsequent
sub-intervals are in order, and returns the new, incorrect interval [2,3].
This commit fixes things by changing the assumption of ordering to only hold during
minimization. When first finding a candidate interval, the algorithm now checks that
all sub-intervals appear in order.
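
The walkthrough above can be reproduced with a small toy model. This is only a sketch, not
the actual Lucene code: every term is reduced to a list of single positions, and the names
Cursor, ordered_intervals and trust_order_always are hypothetical. Passing
trust_order_always=True mimics the old assumption; passing False applies the full in-order
check whenever a new candidate is being searched for.

```python
INF = float("inf")  # stands in for an exhausted sub-interval, i.e. [Inf,Inf]


class Cursor:
    """Walks the sorted positions of one term (hypothetical helper)."""

    def __init__(self, positions):
        self.positions = positions
        self.idx = -1  # not yet positioned

    def pos(self):
        if self.idx < 0:
            return -1
        return self.positions[self.idx] if self.idx < len(self.positions) else INF

    def advance(self):
        self.idx += 1
        return self.pos()


def _next_interval(cursors, trust_order_always):
    n = len(cursors)
    current = None       # best interval found so far in this call
    last_start = INF     # position of the last term in the current candidate
    minimizing = False
    i = 1
    while True:
        while True:
            if cursors[i - 1].pos() >= last_start:
                return current  # candidate search or minimization failed
            if i == n:
                break           # every term has been checked and is in order
            if cursors[i].pos() > cursors[i - 1].pos():
                if minimizing or trust_order_always:
                    break       # trust that the remaining terms are in order
                i += 1          # fixed behaviour: keep checking later terms
                continue
            # Term i is not after term i-1, so move it forward.
            while cursors[i].pos() <= cursors[i - 1].pos():
                if cursors[i].advance() == INF:
                    return current
            i += 1
        current = (cursors[0].pos(), cursors[n - 1].pos())
        last_start = cursors[n - 1].pos()
        i = 1
        # Try to minimize by advancing the first term.
        if cursors[0].advance() == INF:
            return current
        minimizing = True


def ordered_intervals(position_lists, trust_order_always):
    """Returns all (start, end) intervals the toy algorithm reports."""
    cursors = [Cursor(p) for p in position_lists]
    cursors[0].advance()  # position the first term before the first call
    results = []
    while True:
        interval = _next_interval(cursors, trust_order_always)
        if interval is None:
            return results
        results.append(interval)


# 'A B A C': A occurs at 0 and 2, B at 1, C at 3.
positions = [[0, 2], [1], [3]]
print(ordered_intervals(positions, trust_order_always=True))   # old assumption
print(ordered_intervals(positions, trust_order_always=False))  # full check
```

With the old assumption the toy reports both (0, 3) and the spurious (2, 3); with the full
check on new candidates it reports only (0, 3).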
Add an IndexWriter merge-on-commit feature that selectively merges small segments on commit,
subject to a configurable timeout. This improves search performance by reducing the number of
small segments that must be searched.
Co-authored-by: Michael Froh <msfroh@apache.org>
Co-authored-by: Michael Sokolov <sokolov@falutin.net>
Co-authored-by: Mike McCandless <mikemccand@apache.org>
The previous change *should* have been purely a refactor, but it accidentally changed the path separator exposed by this streaming expression to always be '/'. Apparently this is controversial, so the previous behavior is reinstated here, which also fixes a failing test on Windows.
This commit is a follow-up to the original commit. It adds more documentation, and it adds circuit breaker timing information to the query response only if circuit breakers are enabled. It also adds a test ensuring that the query response is correct when timing is enabled and circuit breakers are in use.
* SOLR-14588: Implement Circuit Breakers
This commit consists of two parts: the initial circuit breaker infrastructure and a real
JVM-memory-based circuit breaker, which monitors incoming search requests and rejects them with
a SERVICE_TOO_BUSY error if the defined threshold is breached, thus giving existing indexing and
search requests headroom to complete.
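
The reject-on-threshold behaviour described above can be sketched roughly as follows. This is
an illustration only, not Solr's implementation: MemoryCircuitBreaker, handle_search and
ServiceTooBusy are hypothetical names, the memory reading is injected as a callable so the
sketch stays self-contained, and treating "breached" as strictly greater than the threshold
is an assumption.

```python
class ServiceTooBusy(Exception):
    """Stand-in for the SERVICE_TOO_BUSY error named in the commit message."""


class MemoryCircuitBreaker:
    """Trips when reported memory usage exceeds a threshold (hypothetical class)."""

    def __init__(self, threshold_pct, read_usage_pct):
        self.threshold_pct = threshold_pct
        # Callable returning current memory usage as a percentage (assumption).
        self.read_usage_pct = read_usage_pct

    def is_tripped(self):
        # Assumption: "breached" means strictly above the threshold.
        return self.read_usage_pct() > self.threshold_pct


def handle_search(query, breakers, run_query):
    """Checks every breaker before running an incoming search request."""
    if any(b.is_tripped() for b in breakers):
        # Reject new work so requests already in flight can complete.
        raise ServiceTooBusy("circuit breaker tripped; request rejected")
    return run_query(query)


# Usage: the fake usage reading is raised to simulate memory pressure.
usage = {"pct": 50.0}
breaker = MemoryCircuitBreaker(75.0, lambda: usage["pct"])
print(handle_search("title:lucene", [breaker], lambda q: "results for " + q))

usage["pct"] = 90.0
try:
    handle_search("title:lucene", [breaker], lambda q: "results for " + q)
except ServiceTooBusy as e:
    print("rejected:", e)
```

Checking the breakers before any query work starts is the point of the design: a rejected
request costs almost nothing, so the memory it would have used remains available to requests
already running.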