Stored Fields and Term Vectors are block-compressed. Decompressing and
recompressing all the documents on every merge is too slow, so we try to
avoid doing it unless it will actually improve the compression ratio. If
we can get away with it, we just bulk-copy existing compressed blocks to
the new segment.
Previously, small segments were always considered dirty and
recompressed: the optimized bulk merge wouldn't kick in until
segments were relatively large. But as block size and compression ratio
(shared dictionaries, etc.) have increased, "relatively large" has become
a much bigger number.
So try to avoid doing wasted work: if there's only 1 dirty chunk
(incompletely filled compression block), then don't recompress: it will
likely only give us 1 dirty chunk as a result, at the expense of CPU.
Require at least 2 dirty chunks to recompress: this way the
recompression actually buys us something (reduces 2 to 1).
The change also means that bulk merge will now happen often in
the unit test suite, increasing coverage.
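
The threshold described above can be sketched as follows. This is a minimal illustration, not the actual merge code: the class name `DirtyChunkPolicy`, the method `shouldRecompress`, and the idea that the caller already knows the dirty chunk count are all assumptions made for the example.

```java
/** Hypothetical sketch of the merge decision; names are illustrative, not Lucene's API. */
final class DirtyChunkPolicy {
    private DirtyChunkPolicy() {}

    /**
     * Recompress only when doing so can reduce the number of dirty chunks.
     * One dirty chunk in, one dirty chunk out is wasted CPU, so require at least two.
     *
     * @param numDirtyChunks number of incompletely filled compression blocks (assumed known)
     * @return true to decompress and recompress, false to bulk-copy the compressed blocks
     */
    static boolean shouldRecompress(int numDirtyChunks) {
        if (numDirtyChunks < 0) {
            throw new IllegalArgumentException("numDirtyChunks must be >= 0, got " + numDirtyChunks);
        }
        return numDirtyChunks >= 2;
    }
}
```

With this rule, inputs with 0 or 1 dirty chunks take the bulk-copy path, and recompression starts at 2.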
Removes the `scratch1` field in `BytesRefHash` by accessing the underlying bytes pool
directly in the `equals` method. As a result, it is now possible to call `BytesRefHash#find`
concurrently, as long as there are no concurrent modifications to the `BytesRefHash` instance
and it is correctly published.
This addresses the concurrency issue with Monitor (aka Luwak) since it
is using `BytesRefHash#find` concurrently without additional synchronization.
It was never truly required there.
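
To illustrate why dropping a shared scratch buffer makes lookups safe for concurrent readers, here is a minimal sketch. `TinyBytesHash` is a hypothetical, heavily simplified stand-in written for this example. It is not `BytesRefHash`, and its layout (one flat `byte[]` pool, linear probing) is an assumption. The point is only that `find` and `equalsAt` touch nothing but locals and already-written state.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a bytes hash; not Lucene's BytesRefHash. */
final class TinyBytesHash {
    private byte[] pool = new byte[64]; // all keys stored back to back
    private int poolUsed = 0;
    private final int[] offsets;        // per slot: start of the key in the pool
    private final int[] lengths;        // per slot: key length, or -1 if the slot is empty
    private int size = 0;

    TinyBytesHash(int capacity) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be >= 2");
        offsets = new int[capacity];
        lengths = new int[capacity];
        Arrays.fill(lengths, -1);
    }

    /** Not thread-safe: mutates the pool and the slot tables. Returns the key's slot. */
    int add(byte[] key) {
        int slot = findSlot(key);
        if (lengths[slot] != -1) return slot; // already present
        // Keep one slot empty so that probing always terminates.
        if (size == lengths.length - 1) throw new IllegalStateException("table full");
        if (poolUsed + key.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUsed + key.length));
        }
        System.arraycopy(key, 0, pool, poolUsed, key.length);
        offsets[slot] = poolUsed;
        lengths[slot] = key.length;
        poolUsed += key.length;
        size++;
        return slot;
    }

    /** Read-only lookup: returns the slot of the key, or -1 if it is absent. */
    int find(byte[] key) {
        int slot = findSlot(key);
        return lengths[slot] == -1 ? -1 : slot;
    }

    // Linear probing from the key's hash until the key or an empty slot is found.
    private int findSlot(byte[] key) {
        int slot = Math.floorMod(Arrays.hashCode(key), lengths.length);
        while (lengths[slot] != -1 && !equalsAt(slot, key)) {
            slot = (slot + 1) % lengths.length;
        }
        return slot;
    }

    // Compares against the pool in place: only locals, no shared scratch buffer.
    private boolean equalsAt(int slot, byte[] key) {
        if (lengths[slot] != key.length) return false;
        int off = offsets[slot];
        for (int i = 0; i < key.length; i++) {
            if (pool[off + i] != key[i]) return false;
        }
        return true;
    }
}
```

In this sketch, several threads may call `find` at once provided all `add` calls have completed and the instance was handed over safely (for example, built before the reader threads are started). Interleaving `add` with `find` remains unsafe.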
Pervasive use of "javabin" reduces the need to care about client-side XML speed. Better to reduce dependencies and let clients use the libs they want.
ACL is no longer provided by default in Ubuntu 20.04
Other changes:
- Made tests easier to debug
- Removed two inconsequential lines from the Dockerfile
Credentials can now be set explicitly at the client level. When using PreemptiveBasicAuthClientBuilderFactory, they can also be read from System properties, as in the previous version of the client. Other implementations of HttpClientBuilderFactory can now also be used.
used in parallel() streaming expression. Hash algorithm is different.
* Simpler
* Don't use Filter (to be removed)
* Do use TwoPhaseIterator, not PostFilter
* Don't pre-compute matching docs (wasteful)
* Support more fields, and more field types
* Faster hash on Strings (avoid Char conversion)
* Stronger hash when using multiple fields
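
As a rough illustration of hash partitioning over several fields, the sketch below hashes raw UTF-8 bytes (so no char conversion is needed) and mixes the per-field hashes before taking the remainder by the worker count. Everything here is an assumption made for the example: the class `HashPartitioner`, its method names, and the simple multiply-by-31 mixing. The hash algorithm actually used by the change is not shown in this message and is likely different.

```java
/** Hypothetical sketch of hash partitioning across workers; not Solr's actual algorithm. */
final class HashPartitioner {
    private HashPartitioner() {}

    /** Hashes the encoded bytes directly, with no per-character decoding. */
    static int hashBytes(byte[] utf8) {
        int h = 0;
        for (byte b : utf8) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    /** Mixes the per-field hashes so that every field, and the field order, contributes. */
    static int combine(byte[]... fieldValues) {
        int h = 1;
        for (byte[] value : fieldValues) {
            h = 31 * h + hashBytes(value);
        }
        return h;
    }

    /** True when the document's field values fall into this worker's partition. */
    static boolean matches(int workerId, int numWorkers, byte[]... fieldValues) {
        if (numWorkers <= 0) throw new IllegalArgumentException("numWorkers must be > 0");
        // floorMod keeps the partition non-negative even for negative hashes.
        return Math.floorMod(combine(fieldValues), numWorkers) == workerId;
    }
}
```

Because the check is evaluated per document, nothing has to be pre-computed: each worker can test candidates lazily, and every document matches exactly one `workerId` in `[0, numWorkers)`.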
For/PFor code has BLOCK_SIZE=128 as a static final constant, with a lot
of assumptions and optimizations for that case. For example it will
encode 3 exceptions at most and optimizes the exception encoding with a
single byte.
This would not work at all if you changed the constant in the code to
something like 512, but an assertion at an early stage helps make
experimentation less painful, and better "documents" the assumption of how
the exception encoding currently works.
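
A minimal sketch of such an early assertion follows. It assumes that the single-byte optimization refers to storing an in-block position in one byte, which only works while the block size is at most 256. That reading, along with the class `BlockSizeCheck` and its method names, is an assumption made for illustration.

```java
/** Hypothetical sketch; the class, constant, and method names are illustrative. */
final class BlockSizeCheck {
    static final int BLOCK_SIZE = 128;

    static {
        // Fails fast (when assertions are enabled with -ea) if the constant is changed
        // to a value that the single-byte encoding below cannot represent.
        assert fitsInOneByte(BLOCK_SIZE) : "block size must fit in one byte, got " + BLOCK_SIZE;
    }

    private BlockSizeCheck() {}

    /** An in-block index lies in [0, blockSize), so it fits in an unsigned byte iff blockSize <= 256. */
    static boolean fitsInOneByte(int blockSize) {
        return blockSize > 0 && blockSize <= 256;
    }

    /** Stores an in-block index in a single byte. */
    static byte encodeIndex(int index) {
        if (index < 0 || index >= BLOCK_SIZE) {
            throw new IllegalArgumentException("index out of block: " + index);
        }
        return (byte) index;
    }

    /** Reads the index back, treating the byte as unsigned. */
    static int decodeIndex(byte encoded) {
        return encoded & 0xff;
    }
}
```

With `BLOCK_SIZE` changed to 512, the static initializer would trip immediately under `-ea` instead of silently truncating indexes later on.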