Commit Graph

34995 Commits

Author SHA1 Message Date
Peter Gromov c3166e1dc3
LUCENE-9750: Hunspell: improve suggestions for mixed-case misspelled words (#2332) 2021-02-10 09:21:01 +01:00
Peter Gromov 5fd18881e9
LUCENE-9749: Hunspell: apply output conversion (OCONV) to the suggestions (#2329) 2021-02-10 09:17:44 +01:00
Peter Gromov f2b7cdc491
LUCENE-9748: Hunspell: suggest inflected dictionary entries similar to the misspelled word (#2330) 2021-02-10 09:16:06 +01:00
Dawid Weiss 1f5b37f299
LUCENE-9747: dodge javadoc reporter NPE bug on Java 11. (#2337) 2021-02-09 21:47:38 +01:00
Dawid Weiss 061b3f29c9
LUCENE-9740: scan affix stream once. (#2327) 2021-02-09 09:58:26 +01:00
Ignacio Vera f93cbb324e
Add TestLucene90FieldInfosFormat (#2269) 2021-02-09 09:32:42 +01:00
Ignacio Vera eafeb66434
LUCENE-9705: Move Lucene50CompoundFormat to Lucene90CompoundFormat (#2268) 2021-02-09 09:19:07 +01:00
Peter Gromov 24984ff4e2
LUCENE-9742: Hunspell: suggest dictionary entries similar to the misspelled word (#2320) 2021-02-09 08:12:34 +01:00
Jason Gerlowski e186d8c439 Fix debug-logging-caused test failures 2021-02-08 19:32:20 -05:00
Dawid Weiss 077f8ccf70
LUCENE-9744: NPE on a degenerate query in MinimumShouldMatchIntervalsSource$MinimumMatchesIterator.getSubMatches() (#2323) 2021-02-08 21:49:00 +01:00
Peter Gromov 80803eb9ad
LUCENE-9746: Hunspell: unify case variation logic in Stemmer and SpellChecker (#2322) 2021-02-08 21:37:32 +01:00
Peter Gromov d0b4ef66d7
LUCENE-9745: Hunspell: tolerate more aff/dic file typos (#2321) 2021-02-08 21:36:44 +01:00
Joel Bernstein da8b8ecdb8 SOLR-15142: Allow the cat Streaming Expression to read gzip files 2021-02-08 15:07:39 -05:00
Jason Gerlowski ed2eebfa4d Debug logging for TestIncrementalCoreBackup Windows failures 2021-02-08 14:36:54 -05:00
Jason Gerlowski cede9723fa SOLR-15118: CHANGES.txt entry 2021-02-08 10:45:37 -05:00
Jason Gerlowski e89fba6fe7
SOLR-15118: Convert /v2/collections APIs to annotations (#2281)
Solr supports two different ways to write v2 APIs: a JSON spec based
approach, and one based on annotated POJOs.  The POJO method is now
preferred.

This commit switches the /v2/collections APIs over to the
annotation-based approach.  Since V2RequestSupport only works with
jsonspec-based APIs, this commit also changes CollectionAdminRequest
to no longer implement that interface.
2021-02-08 10:11:58 -05:00
Peter Gromov 4f64e39ec6
LUCENE-9743: Hunspell: ignore original tests which are out of scope (#2319) 2021-02-08 11:50:40 +01:00
Peter Gromov c3fe9afcc6
LUCENE-9739: Hunspell: speed up numeric flag parsing (#2316) 2021-02-08 11:02:13 +01:00
Peter Gromov 653626399f
LUCENE-9736: Hunspell: support MAP-based suggestions for groups of similar letters (#2314) 2021-02-08 10:59:53 +01:00
Peter Gromov 061233ca4e
LUCENE-9735: Hunspell: speed up flag checks by avoiding allocations (#2315) 2021-02-08 10:56:10 +01:00
Dawid Weiss 903782d756
LUCENE-9727: build side support for running Hunspell tests. (#2313) 2021-02-08 10:50:25 +01:00
Peter Gromov 1cc26b6bb4
LUCENE-9724: Hunspell: tolerate existing aff/dic file typos (#2307) 2021-02-07 12:49:53 +01:00
Peter Gromov 1852d7ad5a
LUCENE-9734: Hunspell: support suggestions based on "ph" morphological data (#2308) 2021-02-06 17:04:12 +01:00
Eric Pugh 573b442903
SOLR-15123: Make all Tool option descriptions follow the same general pattern. (#2275)
* Make all Tool option descriptions follow the same general pattern for describing them.

* Figure out a switch to determine level of either cluster or collection(s)

* better wording on what cluster versus collection params mean

Co-authored-by: epugh@opensourceconnections.com <>
2021-02-05 15:17:58 -05:00
Peter Gromov 825d8dbfd9
LUCENE-9732: Hunspell: support dictionary entries starting with slash (#2301) 2021-02-05 11:25:32 +01:00
Jan Høydahl 2f6807cc76
Split the publish_maven step in two TODOs (#2279) 2021-02-05 09:58:48 +01:00
Peter Gromov 82f8d7ba1d
LUCENE-9728: Hunspell: add a performance test (#2296) 2021-02-05 09:47:02 +01:00
Peter Gromov 650f16ad5d
LUCENE-9729: Hunspell: support CHECKCOMPOUNDREP flags (#2300) 2021-02-05 09:46:22 +01:00
Peter Gromov 16764f1601
LUCENE-9733: Hunspell: exception when loading dictionaries with mixed-case words and aliased flags (#2305) 2021-02-05 09:40:06 +01:00
David Smiley b5c1ed7129
SOLR-15011: BadApple the test 2021-02-04 23:10:09 -05:00
Julie Tibshirani 75755c837c Update changelog with note about BM25FQuery. 2021-02-04 14:00:14 -08:00
Eric Pugh d83a17490d
SOLR-15133: Document how to eliminate Failed to reserve shared memory warning (#2304)
* light copyediting

* document how to avoid shared memory issue

Co-authored-by: epugh@opensourceconnections.com <>
2021-02-04 16:15:43 -05:00
Julie Tibshirani c3f5454d49
LUCENE-9725: Allow BM25FQuery to use other similarities. (#2293)
From a high level, BM25FQuery (1) computes statistics that represent the combined
field content and (2) passes these to a score function. This model makes sense
for many similarities besides BM25.

This PR unhardcodes BM25Similarity in BM25FQuery and instead uses the one
configured on IndexSearcher. It also renames BM25FQuery since it's no longer
specific to BM25.
2021-02-04 12:42:45 -08:00
Michael Sokolov 67f71d453d
LUCENE-9715: fix int overflow in Lucene90VectorReader 2021-02-04 13:52:13 -05:00
Julie Tibshirani f0a2f1fe03
LUCENE-9705: Create Lucene90LiveDocsFormat (#2274)
For now this is just a copy of Lucene50LiveDocsFormat. The existing
Lucene50LiveDocsFormat was moved to backwards-codecs.
2021-02-04 10:43:16 -08:00
Jason Gerlowski 7fd64aabcc CHANGES.txt entry for SOLR-13608 2021-02-04 08:54:25 -05:00
Jason Gerlowski 33d16b570c
SOLR-13608: Incremental backup file format (#2250)
This commit introduces a new way for Solr to do backups (with a new
underlying file structure).  This new "incremental" backup process
improves over the existing backup mechanism in several ways:

- multiple backup "points" can now be stored at a given backup
  location/name, allowing users to choose which point in time they want
  to restore
- subsequent backups skip over uploading files that were uploaded by
  previous backups, saving time and network bandwidth.
- files are checksummed as they're uploaded, ensuring that corrupted
  indices aren't persisted and accidentally restored later.

Incremental backups are now the default, and traditional backups
should now be considered 'deprecated' but can still be created by
passing an `incremental=false` parameter on backup requests.
2021-02-04 08:47:30 -05:00
Dawid Weiss 894d0bbb59 LUCENE-9730: cleaned up temp. folder management in hunspell. 2021-02-04 09:27:02 +01:00
Peter Gromov 04167b27f5
LUCENE-9726: Hunspell: speed up spellchecking by stopping at a single… (#2295) 2021-02-04 09:13:11 +01:00
Michael Sokolov e2cf6ee74d
LUCENE-9731: restore consistent random seed to HnswGraphBuilder (#2299) 2021-02-03 22:14:01 -05:00
Mike Drob 8fccdfe353
SOLR-15122 Replace sleeps with phaser await (#2291) 2021-02-03 19:39:04 -06:00
Mike Drob 40c5d6b750
SOLR-14253 Avoid writes in ZKSR.waitForState (#2297) 2021-02-03 14:40:07 -06:00
Chris Hostetter d693a61185 SOLR-15092: remove link anchors that are no longer necessary due to relaxed validation rules
commit generated using: perl -i -ple 's/<<(.*?)\.adoc#\1,/<<.adoc#,/g' src/*.adoc

...with manual cleanup of src/language-analysis.adoc due to adoc syntax ambiguity
2021-02-03 10:36:12 -07:00
Julie Tibshirani 2544a2243b
Remove write logic from Lucene70NormsFormat. (#2287)
Our policy is to not maintain write logic for old formats that can't be written
to. The write logic is moved to the test folder to support unit testing.
2021-02-03 09:28:48 -08:00
Julie Tibshirani 902ce0809d
Improve backwards compatibility tests for sorted indexes. (#2276)
This commit also cleans up some old checks that only applied to pre-6.0 indices.
2021-02-03 09:27:40 -08:00
Chris Hostetter 8a0c1f5a0e SOLR-15092: eliminate overly strict rules against empty link anchors in ref-guide
legacy enforcement from the days of building a PDF
2021-02-03 10:07:34 -07:00
Peter Gromov d95e405fec
LUCENE-9721: Hunspell: disallow ONLYINCOMPOUND suffixes at the very end of compound words (#2294) 2021-02-03 17:46:54 +01:00
Peter Gromov a79f641561
LUCENE-9720: Hunspell: more ways to vary misspelled word variations for suggestions (#2286) 2021-02-03 17:45:56 +01:00
Andrzej Bialecki d88264ba72 SOLR-14234: Unhelpful message in RemoteExecutionException. 2021-02-03 16:27:47 +01:00
Peter Gromov 84aa683b6b
LUCENE-9723: Hunspell: update sanity tests that load all dictionaries (#2290) 2021-02-03 10:45:35 +01:00