druid/codestyle
Gian Merlino 67fbd8e7fc
Add "stringEncoding" parameter to DataSketches HLL. (#11201)
* Add "stringEncoding" parameter to DataSketches HLL.

Builds on the concept from #11172 and adds a way to feed HLL sketches
with UTF-8 bytes.

This must be an option rather than always-on, because prior to this
patch, HLL sketches used UTF-16LE encoding when hashing strings. To
remain compatible with sketch images created prior to this patch -- which
matters during rolling updates and when reading sketches that have been
written to segments -- we must keep UTF-16LE as the default.
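
To illustrate why the two encodings cannot be mixed, here is a minimal sketch using only the JDK. It assumes the sketch hashes the raw encoded bytes of each string, as described above; the class and method names are hypothetical and are not part of Druid or DataSketches.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not Druid or DataSketches code.
final class StringEncodingSketch
{
  // Bytes a hasher would see under the legacy default encoding (UTF-16LE).
  static byte[] utf16le(String s)
  {
    return s.getBytes(StandardCharsets.UTF_16LE);
  }

  // Bytes a hasher would see under the opt-in encoding (UTF-8).
  static byte[] utf8(String s)
  {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args)
  {
    String value = "druid";

    // prints [100, 0, 114, 0, 117, 0, 105, 0, 100, 0]
    System.out.println(Arrays.toString(utf16le(value)));

    // prints [100, 114, 117, 105, 100]
    System.out.println(Arrays.toString(utf8(value)));

    // prints false: different input bytes, so a byte-level hash would differ too
    System.out.println(Arrays.equals(utf16le(value), utf8(value)));
  }
}
```

Because the same string yields different bytes under each encoding, a sketch built with one encoding would likely count that string as a distinct value when merged with a sketch built with the other.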

Not currently documented, because I'm not yet sure how best to expose
this functionality to users. I think the first place would be in the SQL
layer: we could have it automatically select UTF-8 or UTF-16LE when
building sketches at query time. We need to be careful about this, though,
because UTF-8 isn't always faster. Sometimes, like for the results of
expressions, UTF-16LE is faster. I expect we will sort this out in
future patches.

* Fix benchmark.

* Fix style issues, improve test coverage.

* Put round back, to make IT updates easier.

* Fix test.

* Fix issue with filtered aggregators and add test.

* Use DS native update(ByteBuffer) method. Improve test coverage.

* Add another suppression.

* Fix ITAutoCompactionTest.

* Update benchmarks.

* Updates.

* Fix conflict.

* Adjustments.
2023-06-30 12:45:55 -07:00
LICENSE.txt
checkstyle-suppressions.xml handle timestamps of complex types when parsing protobuf messages (#11293) 2021-06-07 15:19:39 +05:30
checkstyle.xml Add Checkstyle check for String literal equality (#8386) 2019-08-28 17:53:42 +03:00
druid-forbidden-apis.txt Removing the forbidden check on getOrDefault due to java8 incompatibility. (#13920) 2023-03-11 09:49:32 +05:30
guava16-forbidden-apis.txt Forbiddenapis: Split the guava16-only signatures file from main signatures file (#12170) 2022-01-19 17:50:28 -08:00
joda-time-forbidden-apis.txt Introduce SegmentId class (#6370) 2019-01-21 11:11:10 -08:00
pmd-ruleset.xml Remove use of deprecated PMD ruleset (#12044) 2021-12-09 13:04:27 -08:00
spotbugs-exclude.xml Add "stringEncoding" parameter to DataSketches HLL. (#11201) 2023-06-30 12:45:55 -07:00