This includes the following changes:
- New `IndexInput#slice(String, long, long, ReadAdvice)` API that allows creating slices with different advices.
- `PosixNativeAccess` now explicitly sets `MADV_NORMAL` when called with `ReadAdvice.NORMAL`. This is required to be able to override a `RANDOM` advice of a compound file with a `NORMAL` advice of a sub file of this compound file.
- `PosixNativeAccess` now only ignores the first page if a range of bytes starts before the `MemorySegment` instead of the whole range.
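
The last change above can be sketched as plain alignment arithmetic. This is a minimal illustration, not the actual Lucene code: the class `AdviseRange`, the method `align`, and the 4096-byte page size used in the usage note are assumptions. `madvise` needs a page-aligned start, so the start is moved down to the page boundary. If that boundary lies before the segment, only the first page is skipped and the rest of the range is still advised.

```java
final class AdviseRange {
  /**
   * Returns {offset, length} relative to the segment start, aligned to a page
   * boundary, or null if no full page inside the segment can be advised.
   */
  static long[] align(long segmentAddress, long offset, long length, long pageSize) {
    long offsetInPage = (segmentAddress + offset) % pageSize;
    if (offsetInPage > 0) {
      // Move the start down to the page boundary and grow the range to match.
      offset -= offsetInPage;
      length += offsetInPage;
      if (offset < 0) {
        // The page starts before the segment: skip only this first page.
        offset += pageSize;
        length -= pageSize;
        if (length <= 0) {
          return null; // nothing left beyond the first page
        }
      }
    }
    return new long[] {offset, length};
  }
}
```

For example, with a segment at address 4196 and a 4096-byte page, a request for 10000 bytes at offset 0 becomes 6004 bytes at offset 3996, which starts exactly at address 8192.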
The introduction of the doc values skip index in #13449 broke the backward codec tests because those codecs do not
support it. This commit fixes that by breaking up the base class for the tests.
Adds an optional skip list on top of doc values, exposed via the DocValuesSkipper abstraction. A new flag is
added to FieldType.java that configures whether to create a "skip index" for doc values.
Co-authored-by: Adrien Grand <jpountz@gmail.com>
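
To illustrate why a skip index on top of doc values helps, here is a toy model. It is not the `DocValuesSkipper` API: the class `ToySkipIndex`, its block layout, and the `blocksVisited` counter are all assumptions made for the example. The idea sketched is that storing a min and max value per block of documents lets a range filter skip whole blocks without reading their values.

```java
import java.util.ArrayList;
import java.util.List;

final class ToySkipIndex {
  private final long[] values; // one value per document, indexed by doc ID
  private final int blockSize;
  private final long[] blockMin;
  private final long[] blockMax;
  int blocksVisited; // how many blocks the last query had to scan

  ToySkipIndex(long[] values, int blockSize) {
    this.values = values;
    this.blockSize = blockSize;
    int numBlocks = (values.length + blockSize - 1) / blockSize;
    blockMin = new long[numBlocks];
    blockMax = new long[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      long min = Long.MAX_VALUE;
      long max = Long.MIN_VALUE;
      int end = Math.min(values.length, (b + 1) * blockSize);
      for (int doc = b * blockSize; doc < end; doc++) {
        min = Math.min(min, values[doc]);
        max = Math.max(max, values[doc]);
      }
      blockMin[b] = min;
      blockMax[b] = max;
    }
  }

  /** Returns the doc IDs whose value lies in [lo, hi], skipping non-overlapping blocks. */
  List<Integer> docsInRange(long lo, long hi) {
    List<Integer> hits = new ArrayList<>();
    blocksVisited = 0;
    for (int b = 0; b < blockMin.length; b++) {
      if (blockMax[b] < lo || blockMin[b] > hi) {
        continue; // the block cannot contain a match, so its values are never read
      }
      blocksVisited++;
      int end = Math.min(values.length, (b + 1) * blockSize);
      for (int doc = b * blockSize; doc < end; doc++) {
        if (values[doc] >= lo && values[doc] <= hi) {
          hits.add(doc);
        }
      }
    }
    return hits;
  }
}
```

Skipping is most effective when values are clustered by doc ID, for instance when the index is sorted on the field.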
We consume a lot of memory for the `indexIn` slices. If `indexIn` is of
type `MemorySegmentIndexInput` the overhead of keeping loads of slices
around just for cloning is far higher than the extra 12 bytes per reader this
adds (the slice description alone often costs a lot).
In a number of Elasticsearch example uses with high segment counts I
investigated, this change would save up to O(GB) of heap.
MultiTermQuery returns null for ScorerSupplier if no terms in the index match the
query terms.
With the introduction of PR #12156 we saw a degradation in performance of bool queries
where one of the mandatory clauses is a TermInSetQuery with query terms not present in
the field. Previously, in such cases, TermInSetQuery returned null for ScorerSupplier, which
would shortcut the whole bool query.
This PR adds the ability for MultiTermQuery to return null for ScorerSupplier if a field
doesn't contain any query terms.
Relates to PR #12156
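
The shortcut described above can be sketched with a toy conjunction. None of this is Lucene code: `ToyConjunction`, its `Clause` interface, and the use of `Supplier<String>` as a stand-in for a scorer supplier are assumptions for illustration. The point is that a null supplier from one mandatory clause lets the whole conjunction return null without touching the remaining clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ToyConjunction {
  /** Hypothetical stand-in for a query clause. */
  interface Clause {
    /** Returns null when the clause cannot match any document in the segment. */
    Supplier<String> scorerSupplier();
  }

  /** Returns null as soon as one mandatory clause reports that it cannot match. */
  static Supplier<String> scorerSupplier(List<Clause> mandatory) {
    List<Supplier<String>> suppliers = new ArrayList<>();
    for (Clause clause : mandatory) {
      Supplier<String> supplier = clause.scorerSupplier();
      if (supplier == null) {
        return null; // remaining clauses are never asked for a supplier
      }
      suppliers.add(supplier);
    }
    // All clauses can match: defer the (potentially costly) scorer creation.
    return () -> {
      StringBuilder sb = new StringBuilder("AND(");
      for (int i = 0; i < suppliers.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(suppliers.get(i).get());
      }
      return sb.append(')').toString();
    };
  }
}
```

If a clause always returns a non-null supplier, even when none of its terms exist in the field, the conjunction loses this early exit and has to set up every other clause first.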
Merges all immutable attributes in FieldInfos.FieldNumbers into one hash map, saving memory when
writing big indices. Fixes an exotic bug where calling clear did not clear all attributes.
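
A minimal sketch of the idea, assuming hypothetical names throughout (`ToyFieldNumbers`, `FieldProperties`, and its three fields are not the real `FieldInfos.FieldNumbers` members): bundling the immutable per-field attributes into one value object means a single map entry per field, and `clear` has only one map to empty, so no attribute can be left behind.

```java
import java.util.HashMap;
import java.util.Map;

final class ToyFieldNumbers {
  /** Immutable per-field attributes bundled into one object (fields are hypothetical). */
  static final class FieldProperties {
    final int number;
    final String docValuesType;
    final int pointDimensionCount;

    FieldProperties(int number, String docValuesType, int pointDimensionCount) {
      this.number = number;
      this.docValuesType = docValuesType;
      this.pointDimensionCount = pointDimensionCount;
    }
  }

  // One map keyed by field name instead of one map per attribute.
  private final Map<String, FieldProperties> byName = new HashMap<>();

  void add(String name, FieldProperties properties) {
    byName.putIfAbsent(name, properties);
  }

  FieldProperties get(String name) {
    return byName.get(name);
  }

  void clear() {
    byName.clear(); // a single call drops every attribute of every field
  }
}
```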
If a caller requires the Weight, it has to keep track of the Weight with which the Scorer was created in the first place instead of relying on Scorer.
Closes #13410
This follows a similar approach to postings and only prefetches the first page
of data.
I verified that it works well for collectors such as `TopFieldCollector`, as
`IndexSearcher` first pulls a `LeafCollector`, then a `BulkScorer` and only
then starts feeding the `BulkScorer` into the `LeafCollector`. So the
background I/O for the `LeafCollector`, which prefetches the first page of
doc values, and the background I/O for the `BulkScorer` run in parallel.
This applies to files where performing readahead could help:
- Doc values data (`.dvd`)
- Norms data (`.nvd`)
- Docs and freqs in postings lists (`.doc`)
- Points data (`.kdd`)
Other files (KNN vectors, stored fields, term vectors) keep using a `RANDOM`
advice.
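
The file list above can be summarized as a small lookup. This is only a restatement of the list in code; the class `ToyReadAdvice`, its `Advice` enum, and the extension-based lookup are assumptions for illustration, and Lucene's actual selection mechanism is not shown here.

```java
final class ToyReadAdvice {
  enum Advice { NORMAL, RANDOM }

  /** Maps a file name to the advice listed above, based on its extension. */
  static Advice forFile(String fileName) {
    int dot = fileName.lastIndexOf('.');
    String extension = dot < 0 ? "" : fileName.substring(dot + 1);
    switch (extension) {
      case "dvd": // doc values data
      case "nvd": // norms data
      case "doc": // docs and freqs in postings lists
      case "kdd": // points data
        return Advice.NORMAL; // readahead could help
      default:
        return Advice.RANDOM; // e.g. KNN vectors, stored fields, term vectors
    }
  }
}
```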