Commit Graph

34876 Commits

Author SHA1 Message Date
Robert Muir dade99cb4d
LUCENE-9816: lazy-init LZ4-HC hashtable in BlockTreeTermsWriter
LZ4-HC hashtable is heavy (128kb int[] + 128kb short[]) and must be
filled with special values on initialization. This is a lot of overhead
for fields that might not use the compression at all.

Don't initialize this for a field until we see hints that the data might
be compressible and need to use the table in order to test it out.
2021-02-28 17:54:30 -05:00
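
The lazy-initialization idea in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual Lucene code: the class names `LazyHashTable` and `FieldWriter`, the `looksCompressible` heuristic, and the fill values are all hypothetical, and the array sizes are chosen only to match the 128kb figures quoted in the message.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a heavy compression hash table; not Lucene's class. */
final class LazyHashTable {
  private static final int HASH_SLOTS = 1 << 15;  // 32768 ints   = 128 KB
  private static final int CHAIN_SLOTS = 1 << 16; // 65536 shorts = 128 KB

  private int[] hashTable;    // stays null until first use
  private short[] chainTable; // stays null until first use

  boolean isAllocated() {
    return hashTable != null;
  }

  /** Allocates and fills both arrays on the first call only. */
  void ensureInitialized() {
    if (hashTable == null) {
      hashTable = new int[HASH_SLOTS];
      chainTable = new short[CHAIN_SLOTS];
      Arrays.fill(hashTable, -1);             // assumed "empty slot" marker
      Arrays.fill(chainTable, (short) 0xFFFF); // assumed "no chain" marker
    }
  }
}

/** Hypothetical per-field writer that pays for the table only when needed. */
final class FieldWriter {
  private final LazyHashTable table = new LazyHashTable();

  /** Crude, made-up hint: any repeated byte value suggests the block may compress. */
  static boolean looksCompressible(byte[] block) {
    boolean[] seen = new boolean[256];
    for (byte b : block) {
      int v = b & 0xFF;
      if (seen[v]) {
        return true;
      }
      seen[v] = true;
    }
    return false;
  }

  /** Returns true if compression was attempted (and the table therefore touched). */
  boolean write(byte[] block) {
    if (!looksCompressible(block)) {
      return false; // no hint of compressibility: skip the table entirely
    }
    table.ensureInitialized();
    return true;
  }

  boolean tableAllocated() {
    return table.isAllocated();
  }
}
```

A field whose blocks never look compressible never allocates or fills the two arrays, which is the overhead the commit aims to avoid.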
Robert Muir 96eb043131
fix TestKnnGraph test failure if it gets SimpleText
This test reaches into lucene90 internals and fails with a ClassCastException
if it happens to get SimpleText.
2021-02-28 14:43:36 -05:00
Ilan Ginzburg 1fff174690
SOLR-14928: add exponential backoff wait time when Compare And Swap fails in distributed cluster state update due to concurrent update (#2438) 2021-02-28 00:53:42 +01:00
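
As a rough illustration of the technique named in the commit above, the sketch below retries a compare-and-swap with an exponentially growing, randomized wait. It is an in-process analogy built on `AtomicReference`, not Solr's distributed cluster state update code; the class `BackoffCas`, its method names, and the 1 ms base and 100 ms cap are assumptions made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical helper: compare-and-swap retry loop with exponential backoff. */
final class BackoffCas {

  /** Upper bound of the wait before the next retry: base * 2^attempt, capped at maxMillis. */
  static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
    long exponential = baseMillis << Math.min(attempt, 20); // bound the shift to avoid overflow
    return Math.min(exponential, maxMillis);
  }

  /**
   * Applies fn to the current value and tries to swap the result in.
   * Returns the number of attempts used; throws if maxAttempts is exhausted.
   */
  static <T> int update(AtomicReference<T> ref, UnaryOperator<T> fn, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      T current = ref.get();
      T next = fn.apply(current);
      if (ref.compareAndSet(current, next)) {
        return attempt + 1;
      }
      // A concurrent writer won: wait a random time up to the current cap, then retry.
      long cap = backoffMillis(attempt, 1, 100);
      try {
        Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while backing off", e);
      }
    }
    throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
  }
}
```

Randomizing the wait below the cap is a common way to keep competing writers from retrying in lockstep; whether the Solr change uses jitter is not stated in the commit message.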
Thomas Wöckinger 988a16fe95 SOLR-15191: Fix JSON faceting on EnumFieldType (#2426)
* Fix JSON Faceting on EnumFieldType if allBuckets, numBuckets or missing is set.
* Enhance hash method of JSON faceting to support EnumFieldType and perhaps some other/custom field types

Co-authored-by: Thomas Wöckinger <two@silbergrau.com>
Co-authored-by: David Smiley <dsmiley@apache.org>
2021-02-27 14:35:29 -05:00
Eric Pugh d4fb023756
SOLR-15194: relax requirements and allow http urls. (#2430)
Relax the need for https urls for JWT IDP's if you pass in solr.auth.jwt.allowOutboundHttp=true system property.
2021-02-27 09:13:51 -05:00
Robert Muir 6348c284fd
Merge branch 'master' of https://gitbox.apache.org/repos/asf/lucene-solr 2021-02-26 20:27:23 -05:00
Robert Muir 373e1d6c83
LUCENE-9814: fix extremely slow 7.0 backwards tests in master
The 7.0 backwards tests added to master must have come from an older
branch before they were fixed: they've added minutes to my test times.

These tests have already been fixed in master, so that the crazy
corner-case stress tests are only running slowly in Jenkins and we don't
have 15-30s long tests locally.

Re-applying same fixes to 7.0 tests removes minutes from my test times.
2021-02-26 20:24:01 -05:00
Christine Poerschke e88b3e9c20 Fix 'invoke' typo in UUIDUpdateProcessorFactory javadocs. 2021-02-26 17:38:06 +00:00
Peter Gromov 4f6469b173
LUCENE-9812: Hunspell: honor empty stripping affixes when generating suggestions (#2432) 2021-02-26 00:24:57 -05:00
zacharymorn 5bca3d1960
LUCENE-9639: Implements SimpleTextVectorReader#ramBytesUsed (#2433)
* Use single class imports
2021-02-25 21:32:34 -05:00
David Smiley 62971c4f99
SOLR-13034: RTG sometimes didn't materialize LazyField (#2408)
Partial (AKA Atomic) updates could encounter "LazyField" instances in the document
cache and not know how to deal with them when writing the updated doc to the update log.
2021-02-25 16:29:30 -05:00
Joel Bernstein 220db76311 SOLR-15193: Fix wording 2021-02-25 08:39:10 -05:00
Noble Paul 119aec804e removed empty file 2021-02-25 22:47:04 +11:00
Ilan Ginzburg 04c95c71af
SOLR-15146: remove unreachable code (#2431) 2021-02-25 00:08:16 +01:00
Joel Bernstein 1f8b708a54 SOLR-15193: Fix typos 2021-02-24 17:39:02 -05:00
Joel Bernstein a3691bc81d SOLR-15193: Fix typos 2021-02-24 17:33:42 -05:00
zacharymorn 56cb9a304c
LUCENE-9639: Add unit tests for SimpleTextVector format (#2404)
... and fix the implementation so it passes!
2021-02-24 15:37:34 -05:00
Joel Bernstein 9a30406871 SOLR-15193: Fix typos 2021-02-24 15:19:35 -05:00
Joel Bernstein 4fb530c52e SOLR-15193: Add Graph to the Visual Guide to Streaming Expressions and Math Expressions 2021-02-24 15:02:57 -05:00
Peter Gromov a5d8463119
LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees (#2427)
* LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees

we skip roots longer than misspelled+4 anyway, so there's no need to read their arcs

* check more in TestPerformance.de_suggest
2021-02-24 11:58:32 -05:00
Peter Gromov 3a99e2aa82
LUCENE-9806: Hunspell: speed up affix condition checking (#2423)
* LUCENE-9806: Hunspell: speed up affix condition checking

check only stem beginning/end without strip/condition, not the whole candidate
avoid regexp if possible

* hunspell: simplify AffixCondition, add more tests

* add a license to the test
2021-02-24 11:45:35 -05:00
Peter Gromov e1ff4c1354
LUCENE-9808: Hunspell suggestions: consider space/dash-separated words for each case variation (#2425) 2021-02-24 11:43:37 -05:00
Peter Gromov 9d6fd98810
LUCENE-9811: Hunspell suggestions: speed up ngram calculation by not searching for substrings in impossible places (#2428) 2021-02-24 11:41:50 -05:00
Ignacio Vera f8be421ae1
LUCENE-9705: Create Lucene90TermVectorsFormat (#2334) 2021-02-24 11:15:11 +01:00
Robert Muir 84a35dfaea
LUCENE-9794: Optimize skipBytes implementation in remaining DataInput subclasses
Fix various DataInputs to no longer use skipBytesSlowly, add new tests.
2021-02-24 02:46:24 -05:00
Timothy Potter eba0e25535
SOLR-15181: update schema to not specify the docValuesFormat (#2424) 2021-02-23 17:34:07 -07:00
Timothy Potter cfb9764250 Add backcompat indices for 8.8.1 2021-02-23 12:32:50 -07:00
Timothy Potter 10ece1bf2b Fix :lucene:core:spotlessApply failure due to 8.8.1 version update 2021-02-23 12:21:49 -07:00
Timothy Potter bb607bf3fd Add bugfix version 8.8.1 2021-02-23 09:02:52 -07:00
Timothy Potter b37e165a2a DOAP changes for release 8.8.1 2021-02-23 08:33:07 -07:00
Peter Gromov 381a5cacb0
LUCENE-9805: Hunspell: fix space + mixed case heuristics on suggestions (#2420) 2021-02-23 07:00:02 -05:00
Peter Gromov c61b458719
LUCENE-9804: Hunspell: fix most similar dictionary entry search by reversing the comparator (#2419) 2021-02-23 06:58:22 -05:00
Peter Gromov 342ea856d3
LUCENE-9803: Hunspell: don't check second stage suffixes if the first stage flag only occurs in prefixes (#2418) 2021-02-23 06:55:45 -05:00
Robert Muir 7d3f3d61ce
Fix tests.profile output to not run many many times (#2417)
The profiler should only be invoked once at the end of the build. During
refactoring, the buildFinished() hook became nested underneath blocks such
as allProjects, which caused it to run too many times.
2021-02-23 06:54:39 -05:00
Peter Gromov 34993c22dd
LUCENE-9801: Hunspell suggestions: speed up expandWord by enumerating only applicable affixes (#2416) 2021-02-22 23:25:21 -05:00
Robert Muir af49df4851
Fix compilation failure on linux due to wrong case of package name
Correct package name in backwards-codecs from Lucene87 -> lucene87

It may cause no issues on case-insensitive filesystems such as those on Mac
OS X or Windows, but it breaks on Linux.
2021-02-22 22:39:14 -05:00
Noble Paul d1a5b9df02 refactor /cluster/aliases V2 API to use annotations 2021-02-23 13:03:38 +11:00
Julie Tibshirani 4d7b2aebfe
LUCENE-9705: Create Lucene90DocValuesFormat and Lucene90NormsFormat (#2392)
For now these are just copies of Lucene80DocValuesFormat and
Lucene80NormsFormat. The existing formats were moved to backwards-codecs.
2021-02-22 11:49:02 -08:00
Peter Gromov 42da2b45e6
LUCENE-9800: Hunspell: put a time limit on suggestion calculation (#2414)
* LUCENE-9800: Hunspell: put a time limit on suggestion calculation

* fix review remarks
2021-02-22 14:06:24 -05:00
Julie Tibshirani bfce5f36da
LUCENE-9616: Add developer docs on how to update a format. (#2395)
This commit adds simple guidelines on how to make a change to a file format:
* Document how the 'copy-on-write' approach works with backwards-codecs
* Clarify that we prefer to copy the format instead of using internal versions
2021-02-22 11:02:37 -08:00
Julie Tibshirani f43fe7642e
LUCENE-9705: Create Lucene90PostingsFormat (#2310)
For now this is just a copy of Lucene84PostingsFormat. The existing
Lucene84PostingsFormat was moved to backwards-codecs, along with its utility
classes.
2021-02-22 10:45:13 -08:00
Peter Gromov f783848e71
LUCENE-9799: Hunspell: don't check second-level affixes when the first level isn't a continuation (#2413)
* LUCENE-9799: Hunspell: don't check second-level affixes when the first level isn't a continuation

* check more words in TestPerformance
2021-02-22 05:35:36 -05:00
Gus Heck e420e6c8f6
SOLR-15160 update cloud.sh (#2393) 2021-02-21 14:36:19 -05:00
Ilan Ginzburg c472be5b86
SOLR-15157: fix wrong assumptions on stats returned by Overseer when cluster state updates are distributed (#2410) 2021-02-21 19:04:53 +01:00
Gus Heck 88ff3cd58d SOLR-14787 CHANGES.txt entry. 2021-02-21 12:05:53 -05:00
Gus Heck 7619165470 Documenting CloneFieldUpdateProcessorFactory once is enough :). 2021-02-21 11:57:24 -05:00
Kevin Watters b298d7fb16
SOLR-14787 - Adding support to use inequalities to the payload check query parser. (#1954) 2021-02-21 11:49:36 -05:00
Robert Muir 107926e486
LUCENE-9795: fix CheckIndex not to validate SortedDocValues as if they were BinaryDocValues
CheckIndex already validates SortedDocValues properly: reads every
document's ordinal and validates derefing all the ordinals back to bytes
from the terms dictionary.

It should not do an additional (very slow) pass where it treats the
field as if it were binary (doc -> ord -> byte[]), this is slow and
doesn't validate any additional index data.

Now that the term dictionary of SortedDocValues may be compressed, it is
especially slow to misuse the docvalues field in this way.
2021-02-21 11:19:41 -05:00
Dawid Weiss d2fb89c22f LUCENE-9793: Add task time aggregation utility (enabled with -Ptask.times=true). 2021-02-20 20:18:16 +01:00
Dawid Weiss 224843a2ba Clean up stale comments a bit. 2021-02-20 20:18:02 +01:00