Require that the segment has enough dirty documents to create a clean
chunk before recompressing during merge: there must be at least maxChunkSize.
This prevents wasteful recompression with small flushes (e.g. every
document): we ensure recompression achieves some "permanent" progress.
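The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual merge code: RecompressionGuard, hasEnoughDirtyDocs and docsPerCleanChunk are hypothetical names, and the sketch assumes the threshold is expressed as a number of documents.

```java
/** Hypothetical guard: recompress only when the dirty docs could fill a clean chunk. */
final class RecompressionGuard {
  // Assumed threshold: how many documents are needed to build one clean chunk.
  private final long docsPerCleanChunk;

  RecompressionGuard(long docsPerCleanChunk) {
    if (docsPerCleanChunk <= 0) {
      throw new IllegalArgumentException("docsPerCleanChunk must be positive");
    }
    this.docsPerCleanChunk = docsPerCleanChunk;
  }

  /** Returns true only if recompressing would yield at least one full, clean chunk. */
  boolean hasEnoughDirtyDocs(long numDirtyDocs) {
    return numDirtyDocs >= docsPerCleanChunk;
  }
}
```

With a threshold of 128, a segment holding a single dirty document is left alone, so flushing after every document no longer triggers recompression on each merge.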
Expose maxDocsPerChunk as a parameter for term vectors too, matching the
stored fields format. This allows for easy testing.
Increment numDirtyDocs for partially optimized merges:
If segment N needs recompression, we have to flush any buffered docs
before bulk-copying segment N+1. Don't just increment numDirtyChunks;
make sure numDirtyDocs is incremented too.
This doesn't have a performance impact, and is unrelated to tooDirty()
improvements, but it is easier to reason about things with correct
statistics in the index.
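A rough sketch of the bookkeeping described above, under stated assumptions: DirtyStatsSketch and its methods are hypothetical stand-ins rather than the real writer, and only the counter names numDirtyChunks and numDirtyDocs come from the description.

```java
/** Hypothetical writer that tracks dirty statistics across forced flushes. */
final class DirtyStatsSketch {
  private final int maxDocsPerChunk;
  private int bufferedDocs;
  long numChunks;
  long numDirtyChunks;
  long numDirtyDocs;

  DirtyStatsSketch(int maxDocsPerChunk) {
    this.maxDocsPerChunk = maxDocsPerChunk;
  }

  /** Buffers one document and flushes a full (clean) chunk when the buffer is full. */
  void addDocument() {
    bufferedDocs++;
    if (bufferedDocs >= maxDocsPerChunk) {
      flush(false);
    }
  }

  /** Writes the buffered docs as one chunk; a forced flush produces a dirty chunk. */
  void flush(boolean force) {
    if (bufferedDocs == 0) {
      return; // nothing buffered, so no chunk is written
    }
    numChunks++;
    if (force) {
      numDirtyChunks++;
      numDirtyDocs += bufferedDocs; // keep both counters in sync
    }
    bufferedDocs = 0;
  }

  /** Bulk-copies the chunks of the next segment; buffered docs must be flushed first. */
  void bulkCopySegment(int chunksInSegment) {
    flush(true);
    numChunks += chunksInSegment;
  }
}
```

If three documents are still buffered when the next segment is bulk-copied, the forced flush records one dirty chunk and three dirty documents.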
Further tuning of how dirtiness is measured: to simplify, just use the percentage
of dirty chunks.
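A percentage-based dirtiness measure might look like the sketch below. The class name, method name and the maxDirtyPercent parameter are assumptions for illustration; the message above does not state the actual cutoff.

```java
/** Hypothetical dirtiness check based only on the share of dirty chunks. */
final class DirtinessSketch {
  private DirtinessSketch() {}

  /**
   * Returns true when strictly more than maxDirtyPercent percent of the chunks are dirty.
   * Integer arithmetic avoids floating-point rounding at the boundary.
   */
  static boolean exceedsDirtyChunkPercentage(
      long numDirtyChunks, long numChunks, int maxDirtyPercent) {
    if (numChunks <= 0) {
      return false; // an empty segment cannot be dirty
    }
    return numDirtyChunks * 100 > numChunks * (long) maxDirtyPercent;
  }
}
```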
Co-authored-by: Adrien Grand <jpountz@gmail.com>
Detects common cases of unreachable/dead code.
For generated javacc code, the check is disabled via
SuppressWarnings("unused") because javacc generates strange/bad code such as:
if ("" == null)
For TestStressNRTReplication's startNode() method, the check is also
disabled because analysis folds the "test evilness controls" which are
static final constants. This itself is a WTF: shouldn't we instead
randomize these evil things in our tests rather than hardcoding them to
specific values?
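The two suppressed cases above can be illustrated with a small sketch. DeadCodeSketch, its methods and the DO_CRASH constant are hypothetical; the sketch assumes, as described above, that the compiler check honors SuppressWarnings("unused") for dead code.

```java
/** Hypothetical examples of code that a dead-code check would flag. */
final class DeadCodeSketch {
  // Stand-in for a hardcoded "test evilness control".
  static final boolean DO_CRASH = false;

  private DeadCodeSketch() {}

  /** Mimics the generated-code pattern: the branch can never be taken. */
  @SuppressWarnings("unused")
  static int generatedStyleCheck() {
    int taken = 0;
    if ("" == null) { // always false: a string literal is never null
      taken = 1;
    }
    return taken;
  }

  /** The constant is folded at compile time, so the first branch is dead. */
  @SuppressWarnings("unused")
  static String startNode() {
    if (DO_CRASH) {
      return "crashing";
    }
    return "started";
  }
}
```

Randomizing DO_CRASH at runtime instead of hardcoding it would make both branches reachable, which is the alternative the question above hints at.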
1. Add an option to supply a custom leaf sorter for IndexWriter.
A DirectoryReader opened from this IndexWriter will have its leaf
readers sorted with the provided leaf sorter. This is useful for
indices on which it is expected to run many queries with particular
sort criteria (e.g. for time-based indices this is usually a
descending sort on timestamp). Providing a leafSorter speeds up
early termination for this particular type of
sort query.
2. Add an option to supply a custom sub-readers sorter for
BaseCompositeReader. In this case sub-readers will be sorted
according to the provided leafSorter.
3. Add an option to supply a custom leaf sorter for
StandardDirectoryReader. The leaf readers of this
StandardDirectoryReader will be sorted according to
the provided leaf sorter.
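To illustrate the idea behind the leaf sorter options listed above, here is a self-contained sketch that does not use Lucene classes. Leaf, maxTimestamp and visitOrder are hypothetical stand-ins; in the real API the comparator would operate on leaf readers rather than on this toy type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of sorting leaves so the newest data is visited first. */
final class LeafSorterSketch {
  /** Stand-in for a leaf reader, described only by its largest timestamp. */
  static final class Leaf {
    final String name;
    final long maxTimestamp;

    Leaf(String name, long maxTimestamp) {
      this.name = name;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Leaves with the most recent documents come first (descending timestamp). */
  static final Comparator<Leaf> NEWEST_FIRST =
      Comparator.comparingLong((Leaf leaf) -> leaf.maxTimestamp).reversed();

  private LeafSorterSketch() {}

  /** Returns leaf names in the order a sorted reader would visit them. */
  static List<String> visitOrder(List<Leaf> leaves) {
    List<Leaf> sorted = new ArrayList<>(leaves); // do not mutate the caller's list
    sorted.sort(NEWEST_FIRST);
    List<String> names = new ArrayList<>();
    for (Leaf leaf : sorted) {
      names.add(leaf.name);
    }
    return names;
  }
}
```

A query sorted by descending timestamp fills its top hits from the first leaves visited, so the remaining leaves can often be skipped early.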
Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors, preventing dead code from being accidentally left behind.
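A small sketch of the effect, assuming the annotation in question is @Override (the message does not name it). BaseVisitor and ConcreteVisitor are hypothetical classes.

```java
/** Hypothetical base class declaring an abstract method. */
abstract class BaseVisitor {
  abstract String describe();
}

/** Hypothetical implementation that marks the method as an override. */
final class ConcreteVisitor extends BaseVisitor {
  // If describe() were deleted from BaseVisitor, this annotation would turn the
  // leftover method into a compile error instead of silently dead code.
  @Override
  String describe() {
    return "concrete";
  }
}
```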
Co-authored-by: Robert Muir <rmuir@apache.org>
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9. Apply SuppressWarnings("unused") to generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.
Co-authored-by: Mike McCandless <mikemccand@apache.org>