The LZ4-HC hashtable is heavy (128kb int[] + 128kb short[]) and must be
filled with special values on initialization. This is a lot of overhead
for fields that might not use the compression at all.
Don't initialize the table for a field until we see hints that the data
might be compressible and we actually need the table to test it.
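A minimal sketch of the lazy-initialization idea (the class and method names below are illustrative, not the actual Lucene compression code): allocate and fill the heavy tables only once a block looks compressible enough to be worth testing.

```java
// Hypothetical sketch of lazily allocating the heavy LZ4-HC hash tables.
// Names are illustrative, not the actual Lucene classes.
final class LazyHighCompressor {
  private static final int HASH_TABLE_SIZE = 1 << 15;

  private int[] hashTable;    // ~128kb once allocated
  private short[] chainTable; // ~128kb once allocated

  void compress(byte[] block, boolean looksCompressible) {
    if (!looksCompressible) {
      // Field never showed compressible data: skip the expensive tables entirely.
      storeUncompressed(block);
      return;
    }
    if (hashTable == null) {
      // First compressible block for this field: pay the allocation/fill cost now.
      hashTable = new int[HASH_TABLE_SIZE];
      chainTable = new short[HASH_TABLE_SIZE * 2];
      java.util.Arrays.fill(hashTable, -1); // the "special value" initialization
    }
    compressWithTables(block, hashTable, chainTable);
  }

  private void storeUncompressed(byte[] block) { /* ... */ }
  private void compressWithTables(byte[] block, int[] hash, short[] chain) { /* ... */ }
}
```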
* Fix JSON Faceting on EnumFieldType if allBuckets, numBuckets or missing is set.
* Enhance the hash method of JSON faceting to support EnumFieldType and potentially other custom field types
Co-authored-by: Thomas Wöckinger <two@silbergrau.com>
Co-authored-by: David Smiley <dsmiley@apache.org>
The 7.0 backwards tests added to master must have come from an older
branch, before they were fixed: they add minutes to my test times.
These tests have already been fixed in master, so the crazy
corner-case stress runs only happen slowly in Jenkins and we don't
have 15-30s long tests locally.
Re-applying the same fixes to the 7.0 tests removes minutes from my test times.
Partial (AKA Atomic) updates could encounter "LazyField" instances in the document
cache and not know how to deal with them when writing the updated doc to the update log.
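One plausible way to handle this, shown as a hedged sketch (the helper class below is hypothetical, not necessarily the actual Solr fix): force any lazily loaded stored values to materialize before the document is written back out, so downstream code never sees a LazyField.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: copy a cached (possibly lazily loaded) document into
// fully materialized StoredFields before it is re-serialized.
final class LazyFieldMaterializer {
  Document materialize(Document cached) {
    Document copy = new Document();
    for (IndexableField f : cached) {
      BytesRef bin = f.binaryValue();
      if (bin != null) {
        copy.add(new StoredField(f.name(), bin));             // forces a lazy binary load
      } else if (f.numericValue() != null) {
        copy.add(new StoredField(f.name(), f.numericValue().longValue()));
      } else if (f.stringValue() != null) {
        copy.add(new StoredField(f.name(), f.stringValue())); // forces a lazy string load
      }
    }
    return copy;
  }
}
```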
* LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees
we skip roots longer than misspelled+4 anyway, so there's no need to read their arcs (see the sketch below)
* check more in TestPerformance.de_suggest
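A rough illustration of the pruning idea, using a plain recursive trie walk rather than Lucene's actual FST traversal (all names below are illustrative): once the prefix exceeds misspelled.length() + 4, the subtree cannot contain a usable suggestion root, so its arcs are never read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative trie-based sketch of the depth cutoff; the real code walks the
// dictionary's FST, but the pruning idea is the same.
final class SuggestionRootCollector {
  static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean isRoot; // a complete dictionary root ends here
  }

  List<String> collectRoots(Node trie, String misspelled) {
    int maxLength = misspelled.length() + 4; // longer roots are skipped anyway
    List<String> roots = new ArrayList<>();
    collect(trie, new StringBuilder(), maxLength, roots);
    return roots;
  }

  private void collect(Node node, StringBuilder prefix, int maxLength, List<String> out) {
    if (prefix.length() > maxLength) {
      return; // too deep: don't read this subtree at all
    }
    if (node.isRoot) {
      out.add(prefix.toString());
    }
    for (Map.Entry<Character, Node> e : node.children.entrySet()) {
      prefix.append(e.getKey());
      collect(e.getValue(), prefix, maxLength, out);
      prefix.setLength(prefix.length() - 1);
    }
  }
}
```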
* LUCENE-9806: Hunspell: speed up affix condition checking
check only stem beginning/end without strip/condition, not the whole candidate (see the sketch below)
avoid regexp if possible
* hunspell: simplify AffixCondition, add more tests
* add a license to the test
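A simplified sketch of the suffix case (a hypothetical helper that ignores character classes and strip handling of real AffixCondition rules): compare the condition's trailing characters directly against the end of the stem instead of building the whole candidate string and matching a regexp over it.

```java
// Simplified, hypothetical sketch: for a suffix rule, check the condition's
// trailing characters directly against the end of the stem; no candidate
// string is built and no regexp is compiled.
final class SuffixConditionCheck {
  /**
   * @param stem the word stem being considered
   * @param conditionChars the last characters the condition requires, e.g. "en"
   */
  boolean acceptsSuffix(CharSequence stem, String conditionChars) {
    int needed = conditionChars.length();
    if (stem.length() < needed) {
      return false;
    }
    int offset = stem.length() - needed;
    for (int i = 0; i < needed; i++) {
      if (stem.charAt(offset + i) != conditionChars.charAt(i)) {
        return false; // cheap char comparison, no regexp involved
      }
    }
    return true;
  }
}
```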
The profiler should only be invoked once, at the end of the build. During
refactoring, the buildFinished() hook became nested underneath blocks such
as allProjects, which caused it to run too many times.
Correct package name in backwards-codecs from Lucene87 -> lucene87
It may cause no issues on case-insensitive filesystems such as Mac
OS X or Windows, but it breaks on Linux.
This commit adds simple guidelines on how to make a change to a file format:
* Document how the 'copy-on-write' approach works with backwards-codecs
* Clarify that we prefer to copy the format instead of using internal versions
For now this is just a copy of Lucene90PostingsFormat. The existing
Lucene84PostingsFormat was moved to backwards-codecs, along with its utility
classes.
CheckIndex already validates SortedDocValues properly: it reads every
document's ordinal and validates dereferencing all the ordinals back to
bytes from the terms dictionary.
It should not do an additional (very slow) pass where it treats the
field as if it were binary (doc -> ord -> byte[]); this is slow and
doesn't validate any additional index data.
Now that the term dictionary of SortedDocValues may be compressed, it is
especially slow to misuse the docvalues field in this way.
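For reference, the proper per-document check amounts to a loop like this (a minimal sketch against the public DocValues API, not the actual CheckIndex code):

```java
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;

// Minimal sketch: validate a SortedDocValues field by reading each document's
// ordinal and dereferencing it back to its term bytes.
final class SortedDocValuesCheck {
  void checkSorted(LeafReader reader, String field) throws IOException {
    SortedDocValues dv = DocValues.getSorted(reader, field);
    for (int doc = dv.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = dv.nextDoc()) {
      int ord = dv.ordValue();
      if (ord < 0 || ord >= dv.getValueCount()) {
        throw new RuntimeException("ord " + ord + " out of bounds for doc " + doc);
      }
      BytesRef term = dv.lookupOrd(ord); // deref the ordinal back to its term bytes
      if (term == null) {
        throw new RuntimeException("null term for ord " + ord);
      }
    }
  }
}
```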