Compress blocks of suffixes in order to make the terms dictionary more
space-efficient. Two compression algorithms are used depending on which one is
more space-efficient:
- LowercaseAsciiCompression, which applies when all bytes are in the
`[0x1F,0x3F)` or `[0x5F,0x7F)` ranges, which notably include all digits,
lowercase ASCII letters, '.', '-' and '_', and encodes 4 chars in 3 bytes.
It is very often applicable to analyzed content and decompresses very quickly
thanks to auto-vectorization support in the JVM.
- LZ4, when the compression ratio (compressed size over original size) is less than 0.75.
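
The "4 chars on 3 bytes" figure follows from the two ranges holding 32 values each, 64 in total, so every byte fits in 6 bits. Here is a minimal sketch of that idea, assuming a plain sequential bit layout. It is not the byte layout used by LowercaseAsciiCompression, which is arranged to help auto-vectorization, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: packs bytes from [0x1F,0x3F) or [0x5F,0x7F) into 6 bits each.
class SixBitPackingSketch {

  /** Returns true if b lies in [0x1F,0x3F) or [0x5F,0x7F). */
  static boolean isCompressible(byte b) {
    int v = b & 0xFF;
    return (v >= 0x1F && v < 0x3F) || (v >= 0x5F && v < 0x7F);
  }

  /** Maps the first range to 0..31 and the second range to 32..63. */
  static int toSixBits(byte b) {
    int v = b & 0xFF;
    return v < 0x3F ? v - 0x1F : v - 0x5F + 32;
  }

  /** Inverse of toSixBits. */
  static byte fromSixBits(int code) {
    return (byte) (code < 32 ? code + 0x1F : code - 32 + 0x5F);
  }

  /** Packs the input at 6 bits per byte, or returns null if any byte is out of range. */
  static byte[] pack(byte[] in) {
    for (byte b : in) {
      if (!isCompressible(b)) {
        return null; // caller would fall back to another scheme
      }
    }
    byte[] out = new byte[(in.length * 6 + 7) / 8];
    int acc = 0, accBits = 0, o = 0;
    for (byte b : in) {
      acc = (acc << 6) | toSixBits(b);
      accBits += 6;
      while (accBits >= 8) {
        accBits -= 8;
        out[o++] = (byte) (acc >>> accBits); // emit the 8 oldest pending bits
      }
      acc &= (1 << accBits) - 1; // keep only the bits not yet written
    }
    if (accBits > 0) {
      out[o] = (byte) (acc << (8 - accBits)); // left-align the final partial byte
    }
    return out;
  }

  /** Restores 'length' original bytes from the packed form. */
  static byte[] unpack(byte[] packed, int length) {
    byte[] out = new byte[length];
    int acc = 0, accBits = 0, p = 0;
    for (int i = 0; i < length; i++) {
      if (accBits < 6) {
        acc = (acc << 8) | (packed[p++] & 0xFF);
        accBits += 8;
      }
      accBits -= 6;
      out[i] = fromSixBits((acc >>> accBits) & 0x3F);
      acc &= (1 << accBits) - 1;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] raw = "lucene-8.5_x".getBytes(StandardCharsets.US_ASCII);
    byte[] packed = pack(raw);
    System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    System.out.println(Arrays.equals(raw, unpack(packed, raw.length)));
  }
}
```

A 12-byte suffix made only of in-range characters packs into 9 bytes, while any uppercase letter makes `pack` return null, which is where a fallback such as LZ4 or no compression would apply.
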
I was a bit unhappy with the complexity of the high-compression LZ4 option, so
I simplified it in order to only keep the logic that detects duplicate strings.
The logic about what to do in case overlapping matches are found, which was
responsible for most of the complexity while only yielding tiny benefits, has
been removed.
This also fixes a bug where an inability to assign a node based on existing autoscaling policy resulted in a server error instead of a bad request.
This closes #1152.
* DocValuesFieldExistsQuery and NormsFieldExistsQuery are used for existence queries when possible.
* Added documentation on the difference between `field:*` and `field:[* TO *]`.
Calming down individual test methods with double-digit execution times
after running tests many times.
There are a few more issues remaining, but this solves the majority of them.
This is consistently the slowest test for me in all of lucene core by
far. Takes around an entire minute. Mark it nightly: should catch any
issues with RAM estimation but keep local builds fast.
Adds some build parameters to tune how tests run. Run
"gradle helpLocalSettings" to see an example.
Turn C2 off by default in tests, as it is wasteful locally and slows down
test runs. You can override this by setting tests.jvmargs for gradle,
or args for ant.
Some crazy lucene stress tests may need to be toned down after the
change, as they may have been doing too many iterations by default...
but this is not a new problem.
* Use Caffeine impl and weak values (to the schema). Previously the cache never evicted!
* now populating the configSet name from ZK into CloudDescriptor when CloudDescriptor is loaded
* actual schema name needs to be deterministic now; fallback from non-existent managed-schema to schema.xml will thwart this cache
* a test conf/core.properties wasn't actually used and became a problem in its weird location after I refactored some logic
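
To illustrate why weak values let the cache evict and why the key must be deterministic, here is a minimal stdlib-only sketch. It is a stand-in built on `WeakReference`, not the Caffeine-based code from this change, and the class name and the key format in the usage note are hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative stand-in for a weak-valued cache; not the Caffeine implementation.
class WeakValueCacheSketch<V> {
  private final ConcurrentHashMap<String, WeakReference<V>> map = new ConcurrentHashMap<>();

  /** Returns the cached value for key, loading it if absent or already collected. */
  @SuppressWarnings("unchecked")
  V get(String key, Function<String, V> loader) {
    // The holder keeps a strong reference so the value cannot be collected
    // between the lookup and the return.
    Object[] holder = new Object[1];
    map.compute(key, (k, ref) -> {
      V existing = ref == null ? null : ref.get();
      if (existing != null) {
        holder[0] = existing;
        return ref; // still reachable elsewhere: reuse it
      }
      V loaded = loader.apply(k);
      holder[0] = loaded;
      return new WeakReference<>(loaded); // the cache alone does not keep it alive
    });
    return (V) holder[0];
  }
}
```

With a key such as `configSetName + "/" + schemaName` (an assumed format), two cores that resolve to the same key share one instance for as long as either still holds it. Once no core references the value, the garbage collector may reclaim it. Unlike Caffeine, this sketch leaves the cleared map entry in place until the key is requested again.
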