Commit Graph

35110 Commits

Author SHA1 Message Date
zacharymorn 56cb9a304c
LUCENE-9639: Add unit tests for SimpleTextVector format (#2404)
... and fix the implementation so it passes!
2021-02-24 15:37:34 -05:00
Joel Bernstein 9a30406871 SOLR-15193: Fix typos 2021-02-24 15:19:35 -05:00
Joel Bernstein 4fb530c52e SOLR-15193: Add Graph to the Visual Guide to Streaming Expressions and Math Expressions 2021-02-24 15:02:57 -05:00
Peter Gromov a5d8463119
LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees (#2427)
* LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees

we skip roots longer than misspelled+4 anyway, so there's no need to read their arcs

* check more in TestPerformance.de_suggest
2021-02-24 11:58:32 -05:00
Peter Gromov 3a99e2aa82
LUCENE-9806: Hunspell: speed up affix condition checking (#2423)
* LUCENE-9806: Hunspell: speed up affix condition checking

check only stem beginning/end without strip/condition, not the whole candidate
avoid regexp if possible

* hunspell: simplify AffixCondition, add more tests

* add a license to the test
2021-02-24 11:45:35 -05:00
Peter Gromov e1ff4c1354
LUCENE-9808: Hunspell suggestions: consider space/dash-separated words for each case variation (#2425) 2021-02-24 11:43:37 -05:00
Peter Gromov 9d6fd98810
LUCENE-9811: Hunspell suggestions: speed up ngram calculation by not searching for substrings in impossible places (#2428) 2021-02-24 11:41:50 -05:00
Ignacio Vera f8be421ae1
LUCENE-9705: Create Lucene90TermVectorsFormat (#2334) 2021-02-24 11:15:11 +01:00
Robert Muir 84a35dfaea
LUCENE-9794: Optimize skipBytes implementation in remaining DataInput subclasses
Fix various DataInputs to no longer use skipBytesSlowly, add new tests.
2021-02-24 02:46:24 -05:00
Timothy Potter eba0e25535
SOLR-15181: update schema to not specify the docValuesFormat (#2424) 2021-02-23 17:34:07 -07:00
Timothy Potter cfb9764250 Add backcompat indices for 8.8.1 2021-02-23 12:32:50 -07:00
Timothy Potter 10ece1bf2b Fix :lucene:core:spotlessApply failure due to 8.8.1 version update 2021-02-23 12:21:49 -07:00
Timothy Potter bb607bf3fd Add bugfix version 8.8.1 2021-02-23 09:02:52 -07:00
Timothy Potter b37e165a2a DOAP changes for release 8.8.1 2021-02-23 08:33:07 -07:00
Peter Gromov 381a5cacb0
LUCENE-9805: Hunspell: fix space + mixed case heuristics on suggestions (#2420) 2021-02-23 07:00:02 -05:00
Peter Gromov c61b458719
LUCENE-9804: Hunspell: fix most similar dictionary entry search by reversing the comparator (#2419) 2021-02-23 06:58:22 -05:00
Peter Gromov 342ea856d3
LUCENE-9803: Hunspell: don't check second stage suffixes if the first stage flag only occurs in prefixes (#2418) 2021-02-23 06:55:45 -05:00
Robert Muir 7d3f3d61ce
Fix tests.profile output to not run many many times (#2417)
The profiler should only be invoked once at the end of the build. During
refactoring the buildFinished() hook became nested underneath stuff such
as allProjects which causes it to run too many times.
2021-02-23 06:54:39 -05:00
Peter Gromov 34993c22dd
LUCENE-9801: Hunspell suggestions: speed up expandWord by enumerating only applicable affixes (#2416) 2021-02-22 23:25:21 -05:00
Robert Muir af49df4851
Fix compilation failure on linux due to wrong case of package name
Correct package name in backwards-codecs from Lucene87 -> lucene87

It may cause no issues for case-insensitive filesystems such as on Mac
OS X or Windows, but it breaks on linux.
2021-02-22 22:39:14 -05:00
Noble Paul d1a5b9df02 refactor /cluster/aliases V2 API to use annotations 2021-02-23 13:03:38 +11:00
Julie Tibshirani 4d7b2aebfe
LUCENE-9705: Create Lucene90DocValuesFormat and Lucene90NormsFormat (#2392)
For now these are just copies of Lucene80DocValuesFormat and
Lucene80NormsFormat. The existing formats were moved to backwards-codecs.
2021-02-22 11:49:02 -08:00
Peter Gromov 42da2b45e6
LUCENE-9800: Hunspell: put a time limit on suggestion calculation (#2414)
* LUCENE-9800: Hunspell: put a time limit on suggestion calculation

* fix review remarks
2021-02-22 14:06:24 -05:00
Julie Tibshirani bfce5f36da
LUCENE-9616: Add developer docs on how to update a format. (#2395)
This commit adds simple guidelines on how to make a change to a file format:
* Document how the 'copy-on-write' approach works with backwards-codecs
* Clarify that we prefer to copy the format instead of using internal versions
2021-02-22 11:02:37 -08:00
Julie Tibshirani f43fe7642e
LUCENE-9705: Create Lucene90PostingsFormat (#2310)
For now this is just a copy of Lucene84PostingsFormat. The existing
Lucene84PostingsFormat was moved to backwards-codecs, along with its utility
classes.
2021-02-22 10:45:13 -08:00
Peter Gromov f783848e71
LUCENE-9799: Hunspell: don't check second-level affixes when the first level isn't a continuation (#2413)
* LUCENE-9799: Hunspell: don't check second-level affixes when the first level isn't a continuation

* check more words in TestPerformance
2021-02-22 05:35:36 -05:00
Gus Heck e420e6c8f6
SOLR-15160 update cloud.sh (#2393) 2021-02-21 14:36:19 -05:00
Ilan Ginzburg c472be5b86
SOLR-15157: fix wrong assumptions on stats returned by Overseer when cluster state updates are distributed (#2410) 2021-02-21 19:04:53 +01:00
Gus Heck 88ff3cd58d SOLR-14787 CHANGES.txt entry. 2021-02-21 12:05:53 -05:00
Gus Heck 7619165470 Documenting CloneFieldUpdateProcessorFactory once is enough :). 2021-02-21 11:57:24 -05:00
Kevin Watters b298d7fb16
SOLR-14787 - Adding support to use inequalities to the payload check query parser. (#1954) 2021-02-21 11:49:36 -05:00
Robert Muir 107926e486
LUCENE-9795: fix CheckIndex not to validate SortedDocValues as if they were BinaryDocValues
CheckIndex already validates SortedDocValues properly: reads every
document's ordinal and validates derefing all the ordinals back to bytes
from the terms dictionary.

It should not do an additional (very slow) pass where it treats the
field as if it were binary (doc -> ord -> byte[]), this is slow and
doesn't validate any additional index data.

Now that the term dictionary of SortedDocValues may be compressed, it is
especially slow to misuse the docvalues field in this way.
2021-02-21 11:19:41 -05:00
Dawid Weiss d2fb89c22f LUCENE-9793: Add task time aggregation utility (enabled with -Ptask.times=true). 2021-02-20 20:18:16 +01:00
Dawid Weiss 224843a2ba Clean up stale comments a bit. 2021-02-20 20:18:02 +01:00
Robert Muir c51fee9c1a
LUCENE-9480: Make DataInput.skipBytes(long) abstract
skipBytes() is a "relative" version of seek(), but DataInput previously
implemented it via read() calls, because DataInput's API does not
include absolute positioning methods (seek, getFilePointer).

This resulted in inefficiencies: calls to skipBytes() would cause
buffers to be allocated, bytes copied, etc.

Instead, make the subclass implement skipBytes() explicitly. The old
DataInput implementation is marked deprecated and renamed to skipBytesSlowly().

Some subclasses still implement skipBytes() via skipBytesSlowly(), to be
fixed in future improvements.
2021-02-20 12:11:32 -05:00
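
To make the LUCENE-9480 message above concrete, here is a minimal, language-neutral sketch of the two ways to skip bytes. It is not Lucene's code: the class names ReadBasedInput and SeekingInput, the 1024-byte scratch size, and the bytes_copied counter are all hypothetical and exist only for illustration. The first class skips by reading and discarding, which is the behavior the commit describes as inefficient. The second advances a position directly, as a subclass that knows its backing storage can do.

```python
import io


class ReadBasedInput:
    """Hypothetical sequential input that can only skip by reading and discarding."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._length = len(data)
        self.bytes_copied = 0  # counts bytes copied only in order to be thrown away

    def read_byte(self) -> int:
        b = self._stream.read(1)
        if not b:
            raise EOFError("read past end of input")
        return b[0]

    def skip_bytes(self, n: int) -> None:
        # Slow path: every skipped byte is copied into a temporary chunk.
        if n < 0:
            raise ValueError("n must be non-negative")
        remaining = n
        while remaining > 0:
            chunk = self._stream.read(min(remaining, 1024))
            if not chunk:
                raise EOFError("skip past end of input")
            self.bytes_copied += len(chunk)
            remaining -= len(chunk)


class SeekingInput(ReadBasedInput):
    """Hypothetical subclass that knows its storage and can move the position."""

    def skip_bytes(self, n: int) -> None:
        # Fast path: a relative seek, with no buffer allocation and no copying.
        if n < 0:
            raise ValueError("n must be non-negative")
        if self._stream.tell() + n > self._length:
            raise EOFError("skip past end of input")
        self._stream.seek(n, io.SEEK_CUR)


if __name__ == "__main__":
    data = bytes(range(10))
    slow, fast = ReadBasedInput(data), SeekingInput(data)
    slow.skip_bytes(4)
    fast.skip_bytes(4)
    print(slow.read_byte(), slow.bytes_copied)
    print(fast.read_byte(), fast.bytes_copied)
```

Both variants leave the input at the same position. Only the read-based one touches the skipped bytes, which is why the commit pushes the implementation down to subclasses.
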
Eric Pugh 2f0d191452
SOLR-15162: Add some parameters to make MODIFYCOLLECTION v1 and v2 more similar. (#2402)
* expose readOnly parameter to v2 of modifycollection.


Co-authored-by: epugh@opensourceconnections.com <>
2021-02-20 10:49:09 -05:00
Jason Gerlowski 582a9f2e14 SOLR-15087: CHANGES.txt entry 2021-02-19 15:54:26 -05:00
Dawid Weiss 515a41dee9
LUCENE-9792: add testRegressions task that downloads and runs hunspell regression tests. (#2407) 2021-02-19 21:13:40 +01:00
Peter Gromov 31a64927a4
LUCENE-9785: Hunspell: don't check case in compound middle and end (#2398) 2021-02-19 20:16:39 +01:00
Peter Gromov 5325d2e6f4
LUCENE-9786: Hunspell suggestions: try moving the last character into the middle (#2399) 2021-02-19 20:15:57 +01:00
Peter Gromov 3ddc3c04a5
LUCENE-9787: Hunspell: speed up suggesting a bit by not creating a huge TreeSet (#2400) 2021-02-19 20:13:19 +01:00
Peter Gromov 58e3b7a854
LUCENE-9790: Hunspell: avoid slow dictionary lookup if the word's hash isn't there (#2405) 2021-02-19 20:10:06 +01:00
Peter Gromov 4b3fb1e065
LUCENE-9776: Hunspell: allow to inflect the last part of COMPOUNDRULE compound (#2397) 2021-02-19 20:03:34 +01:00
Ilan Ginzburg e7c80f6445
SOLR-15157: refactor Collection API to separate from Overseer and message handling abstractions (#2390)
No functional changes. In preparation of distributing the Collection API command execution.
2021-02-19 14:40:23 +01:00
Robert Muir 6deee14382
LUCENE-9774: Fix TestDirectIODirectory to probe for supported filesystem (#2396)
TestDirectIODirectory will currently fail if run on an unsupported
filesystem (e.g. tmpfs). Add an "assume" that probes if the filesystem
supports Direct I/O.

Also tweak javadocs to indicate correct @throws clauses for the
IndexInput and IndexOutput. You'll get an IOException (translated from
EINVAL) if the filesystem doesn't support it, not a UOE.
2021-02-18 20:36:18 -05:00
epugh@opensourceconnections.com f920b9b14e I do not want to backport build tool changes from gradle to ant, so will leave this feature for Solr 9 2021-02-18 17:26:01 -05:00
Eric Pugh f70a518f1b
SOLR-8138: Simple UI for issuing SQL queries (#2381)
* Updated SOLR-8138 files for Solr 9.

This code was mostly written by Michael Suzuki. I just tweaked it to load and updated ui-grid to version 4.10.

* unused file, we use the .min version.

* add an entry for the ui-grid project to license file.

Co-authored-by: epugh@opensourceconnections.com <>
2021-02-18 17:21:21 -05:00
Peter Gromov 5e834b39eb
LUCENE-9769: Hunspell: KEEPCASE should take precedence over affixed forms (#2374)
and disregard KEEPCASE in Stemmer to make it more consistent with "hunspell -s"
2021-02-18 09:30:09 +01:00
Peter Gromov 589eefc32b
LUCENE-9782: Hunspell suggestions: split by space (but not dash) also before last char (#2387) 2021-02-18 09:28:29 +01:00
Peter Gromov f879c6ad84
LUCENE-9783: Hunspell: don't suggest more than 4 ngram corrections by default (#2388) 2021-02-18 09:27:06 +01:00