The 7.0 backwards tests added to master must have come from an older
branch before they were fixed: they've added minutes to my test times.
These tests have already been fixed in master, so that the crazy
corner-case stress tests only run slowly in Jenkins and we don't
have 15-30s long tests locally.
Re-applying the same fixes to the 7.0 tests removes minutes from my test times.
Partial (AKA Atomic) updates could encounter "LazyField" instances in the document
cache and not know how to deal with them when writing the updated doc to the update log.
* LUCENE-9810: Hunspell: when generating suggestions, skip too deep word FST subtrees
we skip roots longer than misspelled+4 anyway, so there's no need to read their arcs
* check more in TestPerformance.de_suggest
* LUCENE-9806: Hunspell: speed up affix condition checking
check only stem beginning/end without strip/condition, not the whole candidate
avoid regexp if possible
* hunspell: simplify AffixCondition, add more tests
* add a license to the test
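The subtree skipping described for LUCENE-9810 can be pictured with a plain
character trie. This is only a sketch under stated assumptions: the
DepthPrunedTrie class, its Node type, and the visitedNodes counter are
hypothetical names, and a simple trie stands in for the word FST. Only the
"misspelled+4" length limit comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical trie standing in for the word FST; not Lucene code. */
public class DepthPrunedTrie {
  // From the commit message: roots longer than misspelled+4 are skipped.
  static final int MAX_EXTRA_LENGTH = 4;

  static final class Node {
    final Map<Character, Node> children = new TreeMap<>();
    boolean isWord;
  }

  final Node root = new Node();
  int visitedNodes; // counts the nodes touched by the last traversal

  void add(String word) {
    Node node = root;
    for (char c : word.toCharArray()) {
      node = node.children.computeIfAbsent(c, k -> new Node());
    }
    node.isWord = true;
  }

  /** Collects every stored word no longer than misspelled.length() + 4. */
  List<String> candidateRoots(String misspelled) {
    List<String> out = new ArrayList<>();
    visitedNodes = 0;
    collect(root, new StringBuilder(), misspelled.length() + MAX_EXTRA_LENGTH, out);
    return out;
  }

  private void collect(Node node, StringBuilder prefix, int maxLength, List<String> out) {
    visitedNodes++;
    if (node.isWord) {
      out.add(prefix.toString());
    }
    // Every word below this node would exceed the limit, so the child
    // arcs are never read at all.
    if (prefix.length() >= maxLength) {
      return;
    }
    for (Map.Entry<Character, Node> arc : node.children.entrySet()) {
      prefix.append(arc.getKey());
      collect(arc.getValue(), prefix, maxLength, out);
      prefix.setLength(prefix.length() - 1);
    }
  }

  public static void main(String[] args) {
    DepthPrunedTrie trie = new DepthPrunedTrie();
    trie.add("cat");
    trie.add("cats");
    trie.add("catastrophically");
    System.out.println(trie.candidateRoots("ca") + " visited=" + trie.visitedNodes);
  }
}
```

Checking the length before descending, rather than filtering the collected
words afterwards, is what avoids the cost of reading the deep arcs.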
The profiler should only be invoked once at the end of the build. During
refactoring, the buildFinished() hook became nested inside blocks such
as allProjects, which causes it to run too many times.
Correct package name in backwards-codecs from Lucene87 -> lucene87
This may cause no issues on case-insensitive filesystems, such as on Mac
OS X or Windows, but it breaks on Linux.
This commit adds simple guidelines on how to make a change to a file format:
* Document how the 'copy-on-write' approach works with backwards-codecs
* Clarify that we prefer to copy the format instead of using internal versions
For now this is just a copy of Lucene90PostingsFormat. The existing
Lucene84PostingsFormat was moved to backwards-codecs, along with its utility
classes.
CheckIndex already validates SortedDocValues properly: it reads every
document's ordinal and validates dereferencing all the ordinals back to
bytes from the terms dictionary.
It should not do an additional (very slow) pass where it treats the
field as if it were binary (doc -> ord -> byte[]): this is slow and
doesn't validate any additional index data.
Now that the term dictionary of SortedDocValues may be compressed, it is
especially slow to misuse the docvalues field in this way.
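A rough sketch of why the extra pass adds nothing, using plain arrays in
place of real doc values. SortedOrdCheckSketch, its check() method, and the
use of String terms are all assumptions made for illustration; this is not
the CheckIndex code.

```java
/** Hypothetical illustration with arrays; not the CheckIndex implementation. */
public class SortedOrdCheckSketch {

  /**
   * Validates per-document ordinals and the terms dictionary.
   * Returns the number of terms-dictionary lookups performed.
   */
  static int check(int[] docToOrd, String[] termsByOrd) {
    // Pass 1: every document's ordinal must be in range. This needs no
    // dictionary access at all.
    for (int doc = 0; doc < docToOrd.length; doc++) {
      int ord = docToOrd[doc];
      if (ord < 0 || ord >= termsByOrd.length) {
        throw new IllegalStateException("doc " + doc + " has out-of-range ord " + ord);
      }
    }
    // Pass 2: dereference each ordinal exactly once and check that the
    // terms are strictly increasing.
    int lookups = 0;
    String previous = null;
    for (int ord = 0; ord < termsByOrd.length; ord++) {
      String term = termsByOrd[ord];
      lookups++;
      if (term == null) {
        throw new IllegalStateException("ord " + ord + " has no term");
      }
      if (previous != null && previous.compareTo(term) >= 0) {
        throw new IllegalStateException("terms out of order at ord " + ord);
      }
      previous = term;
    }
    return lookups;
  }

  public static void main(String[] args) {
    int[] docToOrd = {0, 2, 2, 1, 0, 2};
    String[] terms = {"a", "b", "c"};
    System.out.println("dictionary lookups: " + check(docToOrd, terms));
  }
}
```

In this sketch the dictionary is read once per distinct term, while a
doc -> ord -> byte[] pass would read it once per document and revisit the
same entries without checking anything new.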
skipBytes() is a "relative" version of seek(), but DataInput previously
implemented it via read() calls, because DataInput's API does not
include absolute positioning methods (seek, getFilePointer).
This resulted in inefficiencies: calls to skipBytes() would cause
buffers to be allocated, bytes copied, etc.
Instead, make each subclass implement skipBytes() explicitly. The old
DataInput implementation is marked deprecated and renamed to skipBytesSlowly().
Some subclasses still implement skipBytes() via skipBytesSlowly(), to be
fixed in future improvements.
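The difference between the two skipping strategies can be sketched as
follows. SketchInput, ArrayInput, and SlowInput are hypothetical stand-ins
written for this example, not Lucene's DataInput hierarchy; only the method
names skipBytes() and skipBytesSlowly() come from the commit message.

```java
import java.io.EOFException;
import java.io.IOException;

public class SkipBytesSketch {

  /** Hypothetical sequential-only input; not Lucene's DataInput. */
  abstract static class SketchInput {
    abstract byte readByte() throws IOException;

    /** Each subclass must provide its own skip. */
    abstract void skipBytes(long numBytes) throws IOException;

    /** Old approach: skip by reading and discarding, which allocates and copies. */
    @Deprecated
    void skipBytesSlowly(long numBytes) throws IOException {
      if (numBytes < 0) {
        throw new IllegalArgumentException("numBytes must be >= 0");
      }
      byte[] scratch = new byte[1024];
      long skipped = 0;
      while (skipped < numBytes) {
        int step = (int) Math.min(scratch.length, numBytes - skipped);
        for (int i = 0; i < step; i++) {
          scratch[i] = readByte();
        }
        skipped += step;
      }
    }
  }

  /** Array-backed input that knows its position, so it can skip by moving it. */
  static class ArrayInput extends SketchInput {
    private final byte[] data;
    private int pos;

    ArrayInput(byte[] data) {
      this.data = data;
    }

    @Override
    byte readByte() throws IOException {
      if (pos >= data.length) {
        throw new EOFException();
      }
      return data[pos++];
    }

    @Override
    void skipBytes(long numBytes) throws IOException {
      if (numBytes < 0) {
        throw new IllegalArgumentException("numBytes must be >= 0");
      }
      if (numBytes > data.length - pos) {
        throw new EOFException();
      }
      pos += (int) numBytes; // no buffer, no copying
    }
  }

  /** A subclass that has not been optimized yet and delegates to the slow path. */
  static class SlowInput extends ArrayInput {
    SlowInput(byte[] data) {
      super(data);
    }

    @Override
    void skipBytes(long numBytes) throws IOException {
      skipBytesSlowly(numBytes);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {10, 20, 30, 40, 50};
    SketchInput fast = new ArrayInput(data);
    SketchInput slow = new SlowInput(data);
    fast.skipBytes(3);
    slow.skipBytes(3);
    System.out.println(fast.readByte() + " " + slow.readByte());
  }
}
```

Both variants leave the input at the same position; the explicit override
simply gets there without allocating a scratch buffer or copying bytes.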