Commit Graph

34763 Commits

Author SHA1 Message Date
Julie Tibshirani c3f5454d49
LUCENE-9725: Allow BM25FQuery to use other similarities. (#2293)
At a high level, BM25FQuery (1) computes statistics that represent the combined
field content and (2) passes these to a score function. This model makes sense
for many similarities besides BM25.

This PR removes the hardcoded BM25Similarity from BM25FQuery and instead uses
the similarity configured on the IndexSearcher. It also renames BM25FQuery, since
the query is no longer specific to BM25.
2021-02-04 12:42:45 -08:00
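The two-step model described in LUCENE-9725 (merge per-field statistics into one combined field, then pass them to whichever similarity is configured) can be illustrated with a small Python sketch. It is not Lucene's code. The weighted-sum aggregation, the `CombinedStats` type, and the `combine`, `bm25`, and `log_tf_idf` functions are hypothetical. They only show that the aggregation step does not depend on the score function applied afterwards.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CombinedStats:
    """Statistics for one term over a virtual field that merges several fields."""
    freq: float         # weighted term frequency summed over the fields
    doc_len: float      # weighted document length summed over the fields
    avg_doc_len: float  # weighted average document length summed over the fields
    doc_freq: int       # documents containing the term (supplied by the caller)
    doc_count: int      # total number of documents


def combine(tf: Dict[str, int], lengths: Dict[str, int],
            avg_lengths: Dict[str, float], weights: Dict[str, float],
            doc_freq: int, doc_count: int) -> CombinedStats:
    """Step (1): merge per-field statistics using per-field weights (an assumption)."""
    freq = sum(w * tf.get(field, 0) for field, w in weights.items())
    doc_len = sum(w * lengths.get(field, 0) for field, w in weights.items())
    avg_doc_len = sum(w * avg_lengths[field] for field, w in weights.items())
    return CombinedStats(freq, doc_len, avg_doc_len, doc_freq, doc_count)


# Step (2): any function of the combined statistics can act as the similarity.
ScoreFn = Callable[[CombinedStats], float]


def bm25(k1: float = 1.2, b: float = 0.75) -> ScoreFn:
    """A BM25-style score function over the combined statistics."""
    def score(s: CombinedStats) -> float:
        idf = math.log(1 + (s.doc_count - s.doc_freq + 0.5) / (s.doc_freq + 0.5))
        norm = k1 * (1 - b + b * s.doc_len / s.avg_doc_len)
        return idf * s.freq / (s.freq + norm)
    return score


def log_tf_idf(s: CombinedStats) -> float:
    """A different, simpler similarity that consumes the same statistics."""
    if s.freq <= 0:
        return 0.0
    return (1 + math.log(s.freq)) * math.log(s.doc_count / s.doc_freq)


stats = combine(tf={"title": 1, "body": 3},
                lengths={"title": 5, "body": 100},
                avg_lengths={"title": 5.0, "body": 100.0},
                weights={"title": 2.0, "body": 1.0},
                doc_freq=10, doc_count=1000)
for name, fn in [("bm25", bm25()), ("log_tf_idf", log_tf_idf)]:
    print(name, round(fn(stats), 4))
```

Swapping `bm25()` for `log_tf_idf` changes only step (2); `combine` stays the same, which is the property the commit relies on when it reads the similarity from the IndexSearcher.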
Michael Sokolov 67f71d453d
LUCENE-9715: fix int overflow in Lucene90VectorReader 2021-02-04 13:52:13 -05:00
Julie Tibshirani f0a2f1fe03
LUCENE-9705: Create Lucene90LiveDocsFormat (#2274)
For now this is just a copy of Lucene50LiveDocsFormat. The existing
Lucene50LiveDocsFormat was moved to backwards-codecs.
2021-02-04 10:43:16 -08:00
Jason Gerlowski 7fd64aabcc CHANGES.txt entry for SOLR-13608 2021-02-04 08:54:25 -05:00
Jason Gerlowski 33d16b570c
SOLR-13608: Incremental backup file format (#2250)
This commit introduces a new way for Solr to do backups (with a new
underlying file structure).  This new "incremental" backup process
improves over the existing backup mechanism in several ways:

- multiple backup "points" can now be stored at a given backup
  location/name, allowing users to choose which point in time they want
  to restore
- subsequent backups skip uploading files that were uploaded by
  previous backups, saving both time and network bandwidth.
- files are checksummed as they're uploaded, ensuring that corrupted
  indices aren't persisted and accidentally restored later.

Incremental backups are now the default, and traditional backups
should now be considered 'deprecated' but can still be created by
passing an `incremental=false` parameter on backup requests.
2021-02-04 08:47:30 -05:00
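The three improvements listed for SOLR-13608 (several restorable points per location, skipping files that an earlier backup already uploaded, and checksum verification) can be illustrated with a small in-memory Python sketch. It does not reproduce Solr's backup file format or API. `BackupLocation`, `upload`, `incremental_backup`, and `restore` are hypothetical names. Deduplicating files by content checksum is an assumption of this sketch, and SHA-256 is an arbitrary choice.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Content checksum; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).hexdigest()


class BackupLocation:
    """Hypothetical in-memory stand-in for a backup location/name."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}       # checksum -> stored file bytes
        self.points: List[Dict[str, str]] = []  # one manifest (file -> checksum) per point


def upload(location: BackupLocation, name: str, data: bytes, expected: str) -> None:
    """Copy one file and verify its checksum before persisting it."""
    copied = bytes(data)  # stands in for a network transfer
    if checksum(copied) != expected:
        raise IOError(f"corrupt upload of {name}; refusing to persist it")
    location.blobs[expected] = copied


def incremental_backup(location: BackupLocation, index_files: Dict[str, bytes]) -> int:
    """Record a new backup point and return how many files were actually uploaded."""
    manifest: Dict[str, str] = {}
    uploaded = 0
    for name, data in index_files.items():
        digest = checksum(data)
        if digest not in location.blobs:  # skip files an earlier point already stored
            upload(location, name, data, digest)
            uploaded += 1
        manifest[name] = digest
    location.points.append(manifest)
    return uploaded


def restore(location: BackupLocation, point: int) -> Dict[str, bytes]:
    """Rebuild the files of one backup point, re-verifying every checksum."""
    files: Dict[str, bytes] = {}
    for name, digest in location.points[point].items():
        data = location.blobs[digest]
        if checksum(data) != digest:
            raise IOError(f"stored copy of {name} is corrupt")
        files[name] = data
    return files


loc = BackupLocation()
first = incremental_backup(loc, {"_0.cfs": b"segment-0", "segments_1": b"commit-1"})
second = incremental_backup(loc, {"_0.cfs": b"segment-0", "_1.cfs": b"segment-1",
                                  "segments_2": b"commit-2"})
print(first, second, len(loc.points))  # prints "2 2 2": _0.cfs is not re-uploaded
```

Each point keeps only a manifest, so `restore(loc, 0)` can still rebuild the first point after later backups have been added to the same location.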
Dawid Weiss 894d0bbb59 LUCENE-9730: cleaned up temp. folder management in hunspell. 2021-02-04 09:27:02 +01:00
Peter Gromov 04167b27f5
LUCENE-9726: Hunspell: speed up spellchecking by stopping at a single… (#2295) 2021-02-04 09:13:11 +01:00
Michael Sokolov e2cf6ee74d
LUCENE-9731: restore consistent random seed to HnswGraphBuilder (#2299) 2021-02-03 22:14:01 -05:00
Mike Drob 8fccdfe353
SOLR-15122 Replace sleeps with phaser await (#2291) 2021-02-03 19:39:04 -06:00
Mike Drob 40c5d6b750
SOLR-14253 Avoid writes in ZKSR.waitForState (#2297) 2021-02-03 14:40:07 -06:00
Chris Hostetter d693a61185 SOLR-15092: remove link anchors that are no longer necessary due to relaxed validation rules
commit generated using: perl -i -ple 's/<<(.*?)\.adoc#\1,/<<$1.adoc#,/g' src/*.adoc

...with manual cleanup of src/language-analysis.adoc due to adoc syntax ambiguity
2021-02-03 10:36:12 -07:00
Julie Tibshirani 2544a2243b
Remove write logic from Lucene70NormsFormat. (#2287)
Our policy is to not maintain write logic for old formats that can't be written
to. The write logic is moved to the test folder to support unit testing.
2021-02-03 09:28:48 -08:00
Julie Tibshirani 902ce0809d
Improve backwards compatibility tests for sorted indexes. (#2276)
This commit also cleans up some old checks that only applied to pre-6.0 indices.
2021-02-03 09:27:40 -08:00
Chris Hostetter 8a0c1f5a0e SOLR-15092: eliminate overly strict rules against empty link anchors in ref-guide
legacy enforcement from the days of building a PDF
2021-02-03 10:07:34 -07:00
Peter Gromov d95e405fec
LUCENE-9721: Hunspell: disallow ONLYINCOMPOUND suffixes at the very end of compound words (#2294) 2021-02-03 17:46:54 +01:00
Peter Gromov a79f641561
LUCENE-9720: Hunspell: more ways to vary misspelled word variations for suggestions (#2286) 2021-02-03 17:45:56 +01:00
Andrzej Bialecki d88264ba72 SOLR-14234: Unhelpful message in RemoteExecutionException. 2021-02-03 16:27:47 +01:00
Peter Gromov 84aa683b6b
LUCENE-9723: Hunspell: update sanity tests that load all dictionaries (#2290) 2021-02-03 10:45:35 +01:00
Peter Gromov d0ae2bd2b9
LUCENE-9717: Hunspell: support CHECKCOMPOUNDPATTERN (#2280) 2021-02-03 08:58:40 +01:00
Nazerke Seidan 6509a3003c
SOLR-15011: /admin/logging now distributes setLevel to all nodes (#2230)
The admin UI will set nodes=all for this.

Co-authored-by: Nazerke Seidan <nseidan@salesforce.com>
Co-authored-by: David Smiley <dsmiley@apache.org>
2021-02-02 21:41:23 -05:00
orenovadia 8d0cbcbb53
LUCENE-9680 - Re-add IndexWriter::getFieldNames 2021-02-02 17:38:43 -05:00
sbeniwal12 a53e8e7228
LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes (from Shubham Beniwal) 2021-02-02 17:26:29 -05:00
Peter Gromov 8f75933f3d
LUCENE-9716: Hunspell: support flag usage before its format is even specified (#2277) 2021-02-02 21:25:56 +01:00
Nhat Nguyen 47e3d06ce0
LUCENE-9722: Close merged readers on abort (#2288)
We fail to close the merged readers of an aborted merge if its
output segment contains no documents.

This bug was discovered by a test in Elasticsearch
(elastic/elasticsearch#67884).
2021-02-02 11:24:10 -05:00
Andrzej Bialecki 4cb1000ea0 SOLR-15122: Tentative fix for the test failure - the node in the test could go down
before the new plugin was active on the Overseer.
2021-02-02 12:06:39 +01:00
Peter Gromov b48d5beb34
LUCENE-9707: Hunspell: check Lucene's implementation against Hunspell's test data (#2267) 2021-02-02 10:46:14 +01:00
Dawid Weiss 2da7a4a86d LUCENE-9686: Add changes entry. 2021-02-02 09:10:03 +01:00
zacharymorn 3835cb4e95
LUCENE-9686: Fix read past EOF handling in DirectIODirectory (#2258) 2021-02-02 09:07:30 +01:00
Chris Hostetter 15aaec60d9 SOLR-14330: ExpandComponent now supports an expand.nullGroup=true option 2021-02-01 16:19:34 -07:00
Mike Drob 99748384cf
SOLR-14253 Replace sleep calls with ZK waits (#1297)
Co-Authored-By: markrmiller <markrmiller@apache.org>
2021-02-01 13:25:17 -06:00
András Salamon e8bc758144
SOLR-15115: Remove unused methods from TestRerankBase (#2261) 2021-02-01 17:31:58 +00:00
Andrzej Bialecki 9e8ca98985 SOLR-15068: RefGuide documentation for replica placement plugins (plus
minor cleanups).
2021-02-01 16:50:25 +01:00
Peter Gromov 7a7949aed2
LUCENE-9708: Hunspell: support FLAG UTF-8 in absence of SET UTF-8 (#2270) 2021-02-01 10:36:24 +01:00
Peter Gromov 8a34cc7afd
LUCENE-9701: Hunspell: implement simple REP-based suggestion algorithm (#2251) 2021-02-01 10:23:54 +01:00
Peter Gromov 9d45dfe776
LUCENE-9710: Hunspell: support minor compounding-related flags (#2272)
* LUCENE-9710: Hunspell: support COMPOUNDFLAG

* LUCENE-9710: Hunspell: fix CHECKCOMPOUNDCASE support

* LUCENE-9710: Hunspell: support CHECKCOMPOUNDDUP

* LUCENE-9710: Hunspell: support triple flags (CHECKCOMPOUNDTRIPLE, SIMPLIFIEDTRIPLE)

* LUCENE-9710: Hunspell: support COMPOUNDFORBIDFLAG

* LUCENE-9710: Hunspell: support FORCEUCASE
2021-02-01 10:20:11 +01:00
Peter Gromov 40e92315ae
LUCENE-9709: Hunspell: no special dotted i treatment outside tr/az languages (#2271) 2021-02-01 10:05:28 +01:00
Mike Drob 5cca464517 SOLR-15122 Debug Logging 2021-01-29 15:49:10 -06:00
Mike McCandless 4d839225b1 LUCENE-9537: move to 8.9 section in CHANGES.txt; make it consistent with 8.x's CHANGES.txt; remove the leading UTF-8 BOM 2021-01-29 16:46:54 -05:00
Mike McCandless cac5c2a4b2 LUCENE-9694: make new DocumentSelector interface public so it is usable outside of its own package 2021-01-29 16:10:59 -05:00
cammiemw 9cc5c9b798
LUCENE-9537: Add initial Indri search engine functionality to Lucene 2021-01-29 14:47:24 -05:00
Patrick Zhai e4cede0e8c
LUCENE-9694: New tool for creating a deterministic index (#2246) 2021-01-29 13:32:24 -05:00
Eric Pugh 6d71a0aced
SOLR-14067: v4 Create /contrib/scripting module with ScriptingUpdateProcessor (#2257)
* Creating Scripting contrib module to centralize the less secure code related to scripts.

* tweak the changelog and update notice to explain why the name changed and the security posture thinking

* the test script happens to be a currency.xml, which made me think we were doing something specific to currency types, but instead any xml formatted file will suffice for the test.

* Update solr/contrib/scripting/src/java/org/apache/solr/scripting/update/ScriptUpdateProcessorFactory.java

* Update solr/contrib/scripting/src/java/org/apache/solr/scripting/update/package-info.java

* drop the ing, and be more specific on the name of the ref guide page

* comment out the script update chain.

The sample techproducts configSet is used by many of the Solr unit tests and by default doesn't have access to the jar file in the contrib module, so the script update chain is commented out, similar to how the lang contrib is.

* using a Mock for the script processor in order to keep the trusted configSets tests all together.

* tweak since we are using a mock script processor

Co-authored-by: David Smiley <dsmiley@apache.org>
2021-01-29 12:27:36 -05:00
Tim Dillon a7a434dbc4
SOLR-15025: MiniSolrCloudCluster.waitForAllNodes ignores passed timeout value (#2193)
* Change timeout values to seconds
2021-01-29 11:22:06 -06:00
Tim Owen 715caaae52
SOLR-15085 Prevent EmbeddedSolrServer calling shutdown on a CoreContainer that was passed to it 2021-01-29 11:15:22 -06:00
Mike Drob 0d4769e174
SOLR-15120 Reduce duplicated core creation work (#2266)
Use j.u.c collections instead of sync block
Rework how we load implicit handlers
Additional debug and trace logging for ZooKeeper comms
2021-01-29 10:20:16 -06:00
Peter Gromov ff943ece8f
LUCENE-9702: Hunspell: support alternate casing for short language codes (#2253) 2021-01-29 11:46:45 +01:00
Peter Gromov 6635d7a5e7
LUCENE-9704: Hunspell: support capitalization for German ß (#2260) 2021-01-29 10:03:37 +01:00
Peter Gromov 71705c900b
LUCENE-9703: Hunspell: prohibit FORBIDDENWORD words and their case variations (#2254) 2021-01-29 08:36:37 +01:00
Peter Gromov 4ba78f2ab2
LUCENE-9706: Hunspell: support NEEDAFFIX flag on affixes (#2262) 2021-01-29 08:24:23 +01:00
Peter Gromov 800f4d0919
LUCENE-9700: Hunspell: support words with trailing dots (#2249) 2021-01-29 08:23:03 +01:00