mirror of https://github.com/apache/lucene.git (synced 2025-03-07 00:39:21 +00:00)
parent 3c235bb7b4, commit 376bd24693
@@ -196,6 +196,50 @@ public class OrdinalMap implements Accountable {
// ram usage
final long ramBytesUsed;

/**
* Here is how the OrdinalMap encodes the mapping between global ords and local segment ords.
* Assume we have the following global mapping for a doc values field: <br>
* bar -> 0, cat -> 1, dog -> 2, foo -> 3 <br>
* Also assume the index is split into 2 segments with the following local mappings for that same
* doc values field: <br>
* Segment 0: bar -> 0, foo -> 1 <br>
* Segment 1: cat -> 0, dog -> 1 <br>
* We then encode the delta between the global and local ords (globalOrd - segmentOrd) in a packed
* 2D array keyed by (segmentIndex, segmentOrd). For example, foo has global ord 3 and ord 1 in
* segment 0, so its delta is 3 - 1 = 2. For this field, OrdinalMap creates the following 2D
* array: <br>
* [[0, 2], [1, 1]]
*
* <p>The general algorithm for creating an OrdinalMap (skipping over some implementation details
* and optimizations) is as follows:
*
* <p>[1] Create and populate a PQ with ({@link TermsEnum}, index) tuples, where index is the
* position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself
* is ordered by {@link TermsEnum#term()}.
*
* <p>[2] We then iterate through every term in the index, starting with the term at the top of
* the PQ. We keep track of a global ord and record the difference between the global ord and
* {@link TermsEnum#ord()} in ordDeltas, which maps: <br>
* (segmentIndex, {@link TermsEnum#ord()}) -> globalTermOrdinal - {@link TermsEnum#ord()} <br>
* We then call {@link TermsEnum#next()} and update the PQ (remember that the PQ maintains an
* order based on {@link TermsEnum#term()}, which changes with each next() call). If the current
* term also exists in another segment, the top of the queue will now be that segment. If not,
* the top of the queue will be a segment holding the next term in the index, and the global ord
* is incremented.
*
* <p>[3] We use some information gathered in the previous step to optimize memory usage and
* build time in the following steps; see the code for details.
*
* <p>[4] We then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) ->
* globalOrd. Since ordDeltas stores globalOrd - segmentOrd, each entry is simply segmentOrd plus
* the corresponding delta.
*
* @param owner For caching purposes
* @param subs A TermsEnum[], where each index corresponds to a segment
* @param segmentMap Provides two maps: newToOld, which lists segments in descending 'weight'
*     order (see {@link SegmentMap} for more details), and oldToNew, which maps each original
*     segment index to its position in newToOld
* @param acceptableOverheadRatio Acceptable memory overhead ratio for some packed data structures
* @throws IOException if an I/O error occurs while iterating the terms
*/
OrdinalMap(
IndexReader.CacheKey owner,
TermsEnum[] subs,
|
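The following is a minimal, hypothetical sketch of steps [1], [2] and [4] from the Javadoc. It is not Lucene's implementation. It assumes each segment's terms are given as a sorted String[] (a stand-in for TermsEnum) and stores the deltas in a plain long[][] rather than Lucene's packed structures. It skips the segment reordering by size and the step [3] optimizations. Terms are compared with String.compareTo, which matches BytesRef order for ASCII terms but not for all Unicode input.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical, simplified model of the OrdinalMap build; not Lucene's implementation. */
public class OrdinalMapSketch {

  /** Cursor over one segment's sorted terms; a stand-in for a TermsEnum. */
  static final class Cursor {
    final int segmentIndex;
    final String[] terms;
    int ord; // current segment ord, analogous to TermsEnum#ord()

    Cursor(int segmentIndex, String[] terms) {
      this.segmentIndex = segmentIndex;
      this.terms = terms;
    }

    String term() {
      return terms[ord];
    }
  }

  /** Steps [1] and [2]: returns deltas[segmentIndex][segmentOrd] = globalOrd - segmentOrd. */
  static long[][] buildDeltas(String[][] segments) {
    long[][] deltas = new long[segments.length][];
    // [1] PQ of cursors ordered by their current term (ties broken by segment index).
    PriorityQueue<Cursor> pq =
        new PriorityQueue<>(
            (a, b) -> {
              int cmp = a.term().compareTo(b.term());
              return cmp != 0 ? cmp : Integer.compare(a.segmentIndex, b.segmentIndex);
            });
    for (int i = 0; i < segments.length; i++) {
      deltas[i] = new long[segments[i].length];
      if (segments[i].length > 0) {
        pq.add(new Cursor(i, segments[i]));
      }
    }
    // [2] Repeatedly take the smallest current term; equal terms share one global ord.
    long globalOrd = -1;
    String previous = null;
    while (!pq.isEmpty()) {
      Cursor top = pq.poll();
      if (!top.term().equals(previous)) {
        globalOrd++; // first time we see this term in the merged term space
        previous = top.term();
      }
      deltas[top.segmentIndex][top.ord] = globalOrd - top.ord;
      top.ord++; // analogous to TermsEnum#next()
      if (top.ord < top.terms.length) {
        pq.add(top); // re-insert so the PQ reflects the cursor's new term
      }
    }
    return deltas;
  }

  /** Step [4]: segmentToGlobalOrds[segmentIndex][segmentOrd] = segmentOrd + delta. */
  static long[][] segmentToGlobalOrds(long[][] deltas) {
    long[][] global = new long[deltas.length][];
    for (int seg = 0; seg < deltas.length; seg++) {
      global[seg] = new long[deltas[seg].length];
      for (int ord = 0; ord < deltas[seg].length; ord++) {
        global[seg][ord] = ord + deltas[seg][ord];
      }
    }
    return global;
  }

  public static void main(String[] args) {
    // The two segments from the Javadoc example.
    String[][] segments = {{"bar", "foo"}, {"cat", "dog"}};
    long[][] deltas = buildDeltas(segments);
    System.out.println(Arrays.deepToString(deltas));
    System.out.println(Arrays.deepToString(segmentToGlobalOrds(deltas)));
  }
}
```

Running main on the Javadoc's example segments prints the delta array [[0, 2], [1, 1]], followed by the recovered global ords [[0, 3], [1, 2]]. Storing deltas instead of raw global ords keeps the values small. That is what makes packed encodings worthwhile when segments are large.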