diff --git a/lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java b/lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java
index bcee2f43214..5785c5dc938 100644
--- a/lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java
+++ b/lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java
@@ -196,6 +196,50 @@ public class OrdinalMap implements Accountable {
// ram usage
final long ramBytesUsed;
+ /**
+ * Here is how the OrdinalMap encodes the mapping from global ords to local segment ords. Assume
+ * we have the following global mapping for a doc values field:
+ * bar -> 0, cat -> 1, dog -> 2, foo -> 3
+ * And our index is split into 2 segments with the following local mappings for that same doc
+ * values field:
+ * Segment 0: bar -> 0, foo -> 1
+ * Segment 1: cat -> 0, dog -> 1
+ * We then encode the delta (global ord - segment ord) between the two mappings in a packed 2d
+ * array keyed by (segmentIndex, segmentOrd). So OrdinalMap will create the following 2d array:
+ * [[0, 2], [1, 1]]
+ *
+ * The general algorithm for creating an OrdinalMap (skipping over some implementation details
+ * and optimizations) is as follows:
+ *
+ * [1] Create and populate a PQ with ({@link TermsEnum}, index) tuples, where index is the
+ * position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself
+ * is ordered by {@link TermsEnum#term()}.
+ *
+ * [2] We then iterate through every term in the index. To do so, we start with the term at
+ * the top of the PQ. We keep track of a global ord, and record the difference between the
+ * global ord and {@link TermsEnum#ord()} in ordDeltas, which maps:
+ * (segmentIndex, {@link TermsEnum#ord()}) -> globalTermOrdinal - {@link TermsEnum#ord()}
+ * We then call {@link TermsEnum#next()} and update the PQ to continue iterating (remember that
+ * the PQ maintains an order based on {@link TermsEnum#term()}, which changes on each next()
+ * call). If the current term exists in some other segment, the top of the queue will contain
+ * that segment. If not, the top of the queue will contain a segment with the next term in the
+ * index, and the global ord will be incremented.
+ *
+ * [3] We use some information gathered in the previous step to optimize memory usage and
+ * build time in the following steps. For more detail on those optimizations, see the code.
+ *
+ * [4] We then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) ->
+ * globalOrd. Using the deltas tracked in ordDeltas, this mapping is relatively easy to
+ * construct.
+ *
+ * @param owner For caching purposes
+ * @param subs A TermsEnum[], where each index corresponds to a segment
+ * @param segmentMap Provides two maps: newToOld, which lists segments in descending 'weight'
+ * order (see {@link SegmentMap} for more details), and oldToNew, which maps each original
+ * segment index to its position in newToOld
+ * @param acceptableOverheadRatio Acceptable overhead memory usage for some packed data structures
+ * @throws IOException if an I/O error occurs while reading terms
+ */
OrdinalMap(
IndexReader.CacheKey owner,
TermsEnum[] subs,