HybridDirectory should mmap postings. (#52641) (#52873)

Since Lucene 8.4, `MMapDirectory` has an optimization to read long[]
arrays directly in little-endian order, which postings leverage. So it is
more efficient to open postings with `MMapDirectory`.
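The idea behind this optimization can be sketched in plain Java. This is a minimal illustration, not Lucene's code: the class `LittleEndianLongs` and its methods are hypothetical, and a heap `ByteBuffer` stands in for the memory-mapped buffer that `MMapDirectory` reads from (the same `order(...)` and `asLongBuffer()` calls also work on a `MappedByteBuffer` obtained from `FileChannel.map`). Viewing the bytes as a little-endian `LongBuffer` lets the JDK copy whole 8-byte words, whereas the naive path reassembles every long one byte at a time.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianLongs {
    // Hypothetical bulk decoder: reads `count` little-endian longs starting at
    // byte offset `pos` through a LongBuffer view, without per-byte assembly.
    static long[] readLELongs(ByteBuffer buf, int pos, int count) {
        // duplicate() so the caller's position/order are left untouched
        ByteBuffer view = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.position(pos);
        long[] dst = new long[count];
        view.asLongBuffer().get(dst);
        return dst;
    }

    // Byte-by-byte equivalent, for comparison: the byte at the lowest offset
    // is the least-significant one, so it is OR-ed in last.
    static long[] readLELongsNaive(byte[] bytes, int pos, int count) {
        long[] dst = new long[count];
        for (int i = 0; i < count; i++) {
            long v = 0;
            for (int b = 7; b >= 0; b--) {
                v = (v << 8) | (bytes[pos + i * 8 + b] & 0xFFL);
            }
            dst[i] = v;
        }
        return dst;
    }

    public static void main(String[] args) {
        // Encode two longs in little-endian order, then decode them both ways.
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0x0102030405060708L).putLong(-42L);
        long[] fast = readLELongs(buf, 0, 2);
        long[] slow = readLELongsNaive(buf.array(), 0, 2);
        System.out.println(java.util.Arrays.equals(fast, slow)); // prints "true"
    }
}
```

Both decoders return the same values; the bulk path simply does less work per byte, which is presumably why decoding postings blocks benefits from it.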

I also refactored the existing logic a bit to better explain why each listed
file extension is opened with `mmap`.
Adrien Grand 2020-02-28 18:45:46 +01:00 committed by GitHub
parent 090bdf69c0
commit 331d4bb0af
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 15 additions and 3 deletions

@@ -152,15 +152,27 @@ public class FsDirectoryFactory implements IndexStorePlugin.DirectoryFactory {
         boolean useDelegate(String name) {
             String extension = FileSwitchDirectory.getExtension(name);
             switch(extension) {
-                // We are mmapping norms, docvalues as well as term dictionaries, all other files are served through NIOFS
-                // this provides good random access performance and does not lead to page cache thrashing.
+                // Norms, doc values and term dictionaries are typically performance-sensitive and hot in the page
+                // cache, so we use mmap, which provides better performance.
                 case "nvd":
                 case "dvd":
                 case "tim":
                 // We want to open the terms index and KD-tree index off-heap to save memory, but this only performs
                 // well if using mmap.
                 case "tip":
-                case "cfs":
                 case "dim":
+                // Compound files are tricky because they store all the information for the segment. Benchmarks
+                // suggested that not mapping them hurts performance.
+                case "cfs":
+                // MMapDirectory has special logic to read long[] arrays in little-endian order that helps speed
+                // up the decoding of postings. The same logic applies to positions (.pos) or offsets (.pay) but we
+                // are not mmapping them as queries that leverage positions are more costly and the decoding of postings
+                // tends to be less of a bottleneck.
+                case "doc":
                     return true;
+                // Other files are either less performance-sensitive (e.g. stored field index, norms metadata)
+                // or are large and have a random access pattern and mmap leads to page cache thrashing
+                // (e.g. stored fields and term vectors).
                 default:
                     return false;
             }
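The routing decision in `useDelegate` boils down to a lookup on the file extension. Below is a minimal standalone sketch, assuming no Lucene dependency: `MmapRouting`, `extension` and `useMmap` are hypothetical names, and `extension` only mimics what `FileSwitchDirectory.getExtension` is used for here (the text after the last dot, or an empty string when there is none, as assumed in this sketch).

```java
import java.util.Set;

public class MmapRouting {
    // Hypothetical stand-in for FileSwitchDirectory.getExtension:
    // returns the text after the last '.', or "" when the name has no dot.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // Extensions that the commit routes to the mmap delegate.
    static final Set<String> MMAP_EXTENSIONS =
        Set.of("nvd", "dvd", "tim", "tip", "dim", "cfs", "doc");

    // true: open with mmap; false: fall back to NIOFS.
    static boolean useMmap(String fileName) {
        return MMAP_EXTENSIONS.contains(extension(fileName));
    }

    public static void main(String[] args) {
        String[] files = {"_0.doc", "_0.pos", "_0.fdt", "_0_Lucene80_0.dvd", "segments_1"};
        for (String f : files) {
            System.out.println(f + " -> " + (useMmap(f) ? "mmap" : "niofs"));
        }
    }
}
```

A `Set` keeps the sketch short, but the `switch` in the commit has an advantage: each group of cases carries its own comment explaining why it is mmapped, which is the point of the refactoring.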