Previously the DocIdSetIterator returned an old value for docID advance
returned NO_MORE_DOCS. This violates the DocIdSetIterator contract and made it
possiblefor the iterator's advance method to be called even after it was
already exhausted.
It's a minor mistake and it doesn't affect the output due to the scale change (It will just make PAST_HOUR, PAST_SIX_HOURS not as intended). Still it's better to be correct.
- Enhance DocComparator to provide an iterator over competitive
documents when searching with "after". This iterator can quickly position
on the desired "after" document skipping all documents and segments before
"after".
- Redesign numeric comparators to provide skipping functionality
by default.
Relates to LUCENE-9280
StoredFieldsWriter might consume some heap space memory that
can have a significant impact on decisions made in the IW if
writers should be stalled or DWPTs should be flushed if memory
settings are small in IWC and flushes are frequent. This change adds
RAM accounting to the StoredFieldsWriter since it's part of the
DWPT lifecycle and not just present during flush.
* Abstract classes don't need public constructors since they can only be
called by subclasses
* Don't escape html characters in @code tags in javadoc
* Fixed a few int/long arithmetic
* Use explicit Term.toString instead of implicit byte[].toString
* Javadoc typos
* Consistent capitalization for field and parameter names
Today we need to decide on an index sorting before we create the index.
In some situations it might make a lot of sense to sort an index afterwards
when the index doesn't change anymore or to compress older indices.
This change adds the ability to wrap readers from an unsorted index and merge it
into a sorted index by using IW#addIndices.
Some queries use DocValues.unwrapSingleton to execute different logic for
single-valued doc values. When tests use an AssertingLeafReader, unwrapSingleton
will never unwrap the doc values, as they don't have the expected class. So some
queries have code paths that are never exercised with an AssertingLeafReader.
This change makes sure to preserve the expected classes when creating asserting
doc values.
This has the same logic as the previous python, but no longer relies
upon parsing HTML output, instead using java's doclet processor.
The errors are reported like "normal" javadoc errors with source file
name and line number and happen when running "gradlew javadoc"
Although the "rules" are the same as the previous python, the python had
some bugs where the checker didn't quite do exactly what we wanted, so
some fixes were applied throughout.
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Uwe Schindler <uschindler@apache.org>
* Remove DIH example directory
* Remove contrib code directories
* Remove contrib package related configurations for build tools
* Remove mention of DIH example
* remove dih as build dependencies and no-longer needed version pins
* Remove README references to DIH
* Remove dih mention from the script that probably does need to exist at all
* More build artifact references
* More removed dependencies leftovers (licenses/versions)
* No need to smoke exclude DIH anymore
* Remove Admin UI's DIH integration
* Remove DIH from shortname package list
* Remove unused DIH (related? not?) dataset
Unclear what is happening here, but there is no reference to that directory anywhere else
The other parallel directories ARE referenced in a TestConfigSetsAPI.java
* Hidden Idea files references
* No DIH to ignore anymore
* Remove last Derby DB references
* Remove DIH from documentation
Add the information in Major Changes document with the link to the external repo
* Added/updated a mention to CHANGES
* Fix leftover library mentions
* Fix Spellings
Stored fields have a metadata file, but it currently only records
metadata about the index, not the actual data. This commit moves all
metadata to the metadata file.
In LUCENE-9304 we introduced some fixes that unfortunately hold on to the previous
DWPTDeleteQueue which is essentially leaking IW memory and cause applications to fail.
This fixes the memory leak and adds a test to ensure its not leaking memory.
If we fail to rollback an already renamed pending segments file during
commit due to a failure in directory syncing we might not fully roll back
to a proper state if we hit a failure during rollback which leaves the index
in a broken state. This is a best effort approach to remove the renamed file
in the case of a failure during sync.
Add IndexWriter merge-on-refresh feature to selectively merge
small segments on getReader, subject to a configurable timeout,
to improve search performance by reducing the number of small
segments for searching.
Co-authored-by: Mike McCandless <mikemccand@apache.org>