Instead of configuring a dictionary size and a block size, the format
now tries to have 10 sub blocks per bigger block, and adapts the size of
the dictionary and of the sub blocks to this overall block size.
When index sorting is enabled, stored fields and term vectors can't be
written on the fly like in the normal case, so they are written into
temporary files that then get resorted. For these temporary files,
disabling compression speeds up indexing significantly.
On a synthetic test that indexes stored fields and a doc value field
populated with random values that is used for index sorting, this
resulted in a 3x indexing speedup.
* Restore lucene/version.properties
* Switch release wizard commands from ant to gradle equivalents
* Remove remaining checks for ant
* Remove checks for Java 8
* Update Copyright year
* Minor bug fixes around determining next version for a major release
This removes the ability to replace the IndexingChain / DocConsumer
in Lucenes IndexWriter. The interface is not sufficient to efficiently
replace the functionality with reasonable efforts. It also seems it's completely
unused at this point and hasn't been maintained in years.
Added mentions for BBoxField, NestPathField, RankField (and placehold for QParser, from SOLR-14590), RptWithGeometrySpatialField
Moved Deprecated types into separate table to improve reading comprehension
Added some cross-references for more in-depth reading.
With recent changes to stored fields that split blocks into several sub
blocks, the merge instance has become much slower at random access since
it would decompress all sub blocks when accessing a document. Since
stored fields likely get accessed in random order at flush time when
index sorting is enabled, it's better not to use the merge instance.
On a synthetic benchmark that has one stored field and one numeric
doc-value field that is used for sorting and fed with random values,
this made indexing more than 4x faster.
TermVectorsWriter might consume some heap space memory that
can have a significant impact on decisions made in the IW if
writers should be stalled or DWPTs should be flushed if memory
settings are small in IWC and flushes are frequent. This change adds
RAM accounting to the TermVectorsWriter since it's part of the
DWPT lifecycle and not just present during flush.
Previously the DocIdSetIterator returned an old value for docID advance
returned NO_MORE_DOCS. This violates the DocIdSetIterator contract and made it
possiblefor the iterator's advance method to be called even after it was
already exhausted.
It's a minor mistake and it doesn't affect the output due to the scale change (It will just make PAST_HOUR, PAST_SIX_HOURS not as intended). Still it's better to be correct.
- Enhance DocComparator to provide an iterator over competitive
documents when searching with "after". This iterator can quickly position
on the desired "after" document skipping all documents and segments before
"after".
- Redesign numeric comparators to provide skipping functionality
by default.
Relates to LUCENE-9280