This adds multi-leaf optimizations for diversified children collector. This means as children vectors are collected within a block join, we can share information between leaves to speed up vector search.
To make this happen, I refactored the multi-leaf collector slightly. Now, instead of inheriting from TopKnnCollector, we inject a inner collector.
* Hunspell: don't proceed with other suggestions if we found good REP ones
This makes "rep" and "ph" from TestHunspellRepositoryTestCases pass on latest revisions of the original Hunspell (after b88f9ea57b)
The current logic for reordering splits a slice of doc IDs into a left side and
a right side, and for each document it computes the expected gain of moving to
the other side. Then it swaps documents from both sides as long as the sum of
the gain of moving the left doc to the right and the right doc to the left is
positive.
This works well, but I would like to extend BP reordering to also work with
blocks, and the swapping logic is challenging to modify as two parent documents
may have different numbers of children.
One of the follow-up papers on BP suggested to use a different logic, where one
would compute a bias for all documents that is negative when a document is
attracted to the left and positive otherwise. Then we only have to partition doc
IDs around the mid point, e.g. with quickselect.
A benefit of this change is that it will make it easier to generalize BP
reordering to indexes that have blocks, e.g. by using a stable sort on biases.
This commit updates the FieldInfosFormat translation of vector similarity functions to be independent of the VectorSimilartyFunction enum.
The VectorSimilartyFunction enum lives outside of the codec format, and the format should not inadvertently depend upon the declaration order or values in VectorSimilartyFunction. The format should be in charge of the translation of similarity function to format ordinal (and visa versa). In reality, and for now, the translation remains the same as the declaration order, but this may not be the case in the future.