QueryBuilder currently has special logic for graph phrase queries with no slop,
constructing a spanquery that attempts to follow all paths using a combination of
OR and NEAR queries. However, this type of query has known bugs(LUCENE-7398).
This commit removes this logic and just builds a disjunction of phrase queries, one
phrase per path.
* Create properties for PublicKeyHandler to read existing keys from disk
* Move pregenerated keys from core/test-files to test-framework
* Update tests to use existing keys instead of new keys each run
SOLR-12238: Handle boosts in QueryBuilder
QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream. This commit also adds a DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr
Previous situation:
* The snowball base classes (Among, SnowballProgram, etc) had accumulated local performance-related changes. There was a task that would also "patch" generated classes (e.g. GermanStemmer) after-the-fact.
* Snowball classes had many "non-changes" from the original such as removal of tabs addition of javadocs, license headers, etc.
* Snowball test data (inputs and expected stems) was incorporated into lucene testing, but this was maintained manually. Also files had become large, making the test too slow (Nightly).
* Snowball stopwords lists from their website were manually maintained. In some cases encoding fixes were manually applied.
* Some generated stemmers (such as Estonian and Armenian) exist in lucene, but have no corresponding `.sbl` file in snowball sources at all.
Besides this mess, snowball project is "moving along" and acquiring new languages, adding non-BSD-licensed test data, huge test data, and other complexity. So it is time to automate the integration better.
New situation:
* Lucene has a `gradle snowball` regeneration task. It works on Linux or Mac only. It checks out their repos, applies the `snowball.patch` in our repository, compiles snowball stemmers, regenerates all java code, applies any adjustments so that our build is happy.
* Tests data is automatically regenerated from the commit hash of the snowball test data repository. Not all languages are tested from their data: only where the license is simple BSD. Test data is also (deterministically) sampled, so that we don't have huge files. We just want to make sure our integration works.
* Randomized tests are still set to test every language with generated fake words. The regeneration task ensures all languages get tested (it writes a simple text file list of them).
* Stopword files are automatically regenerated from the commit hash of the snowball website repository.
* The regeneration procedure is idempotent. This way when stuff does change, you know exactly what happened. For example if test data changes to a different license, you may see a git deletion. Or if a new language/stopwords/test data gets added, you will see git additions.
Previous changes to this issue 'fixed' the way the test was creating mock Replica instances,
to ensure all properties were specified -- but these changes tickled a bug in the existing test
scaffolding that caused it's "expecations" to be based on a regex check against only the base "url"
even though the test logic itself looked at the entire "core url"
The result is that there were reproducible failures if/when the randomly generated regex matched
".*1.*" because the existing test logic did not expect that to match the url or a Replica with
a core name of "core1" because it only considered the base url
SOLR-13996: Refactor HttpShardHandler.prepDistributed method into smaller pieces
This commit introduces an interface named ReplicaSource which is marked as experimental. It has two sub-classes named CloudReplicaSource (for solr cloud) and LegacyReplicaSource for non-cloud clusters. The prepDistributed method now calls out to these sub-classes depending on whether the cluster is running on cloud mode or not.
* No Introduction (to Solr) header. Point at solr-upgrade-notes.adoc instead
* No Getting Started header
* No Versions of Major Components header
* No "Upgrade Notes" for subsequent releases. See solr-upgrade-notes.adoc
Closes#1202
Java 13 adds a new doclint check under "accessibility" that the html
header nesting level isn't crazy.
Many are incorrect because the html4-style javadocs had horrible
font-sizes, so developers used the wrong header level to work around it.
This is no issue in trunk (always html5).
Java recommends against using such structured tags at all in javadocs,
but that is a more involved change: this just "shifts" header levels
in documents to be correct.