QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream. This commit also adds a DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr
With this change, we sort dvUpdates in the term order before applying if
they all update a single field to the same value. This optimization can
reduce the flush time by around 20% for the docValues update user cases.
ConcurrentUpdateSolrClient now propagates its connection and read timeouts to the private HttpSolrClient used to commit and optimize.
(cherry picked from commit 051133c13f)
On newer linux distros, at least, 'python' now means python3. So
we can't rely on what version of python it will invoke (at least for a
few years).
For example in Fedora Linux:
https://fedoraproject.org/wiki/Changes/Python_means_Python3
For python2.x code, explicitly call 'python2.7' and for python3.x code,
explicitly call 'python3'.
Ant variable names are cleaned up, e.g. 'python.exe' is renamed to
'python2.exe' and 'python32.exe' is renamed to 'python3.exe'. This also
makes it easy to identify remaining python 2.x code that should be
migrated to python 3.x
Previous changes to this issue 'fixed' the way the test was creating mock Replica instances,
to ensure all properties were specified -- but these changes tickled a bug in the existing test
scaffolding that caused it's "expecations" to be based on a regex check against only the base "url"
even though the test logic itself looked at the entire "core url"
The result is that there were reproducible failures if/when the randomly generated regex matched
".*1.*" because the existing test logic did not expect that to match the url or a Replica with
a core name of "core1" because it only considered the base url
(cherry picked from commit 49e20dbee4)
SOLR-13996: Refactor HttpShardHandler.prepDistributed method into smaller pieces
This commit introduces an interface named ReplicaSource which is marked as experimental. It has two sub-classes named CloudReplicaSource (for solr cloud) and LegacyReplicaSource for non-cloud clusters. The prepDistributed method now calls out to these sub-classes depending on whether the cluster is running on cloud mode or not.
(cherry picked from commit c65b97665c)
* No Introduction (to Solr) header. Point at solr-upgrade-notes.adoc instead
* No Getting Started header
* No Versions of Major Components header
* No "Upgrade Notes" for subsequent releases. See solr-upgrade-notes.adoc
Closes#1202
(cherry picked from commit 46c0945614)
If you have repeating intervals in an ordered or unordered interval source, you currently
get somewhat confusing behaviour:
* `ORDERED(a, a, b)` will return an extra interval over just a b if it first matches a a b, meaning
that you can get incorrect results if used in a `CONTAINING` filter -
`CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` will match on the document `a x a b y`
* `UNORDERED(a, a)` will match on documents that just containg a single a.
This commit adds a RepeatingIntervalsSource that correctly handles repeats within
ordered and unordered sources. It also changes the way that gaps are calculated within
ordered and unordered sources, by using a new width() method on IntervalIterator. The
default implementation just returns end() - start() + 1, but RepeatingIntervalsSource
instead returns the sum of the widths of its child iterators. This preserves maxgaps filtering
on ordered and unordered sources that contain repeats.
In order to correctly handle matches in this scenario, IntervalsSource#matches now always
returns an explicit IntervalsMatchesIterator rather than a plain MatchesIterator, which adds
gaps() and width() methods so that submatches can be combined in the same way that
subiterators are. Extra checks have been added to checkIntervals() to ensure that the same
intervals are returned by both iterator and matches, and a fix to
DisjunctionIntervalIterator#matches() is also included - DisjunctionIntervalIterator minimizes
its intervals, while MatchesUtils.disjunction does not, so there was a discrepancy between
the two methods.