druid/extensions-core
Clint Wylie d9e5245ff0
allow string dimension indexer to handle byte[] as base64 strings (#13573)
This PR expands `StringDimensionIndexer` to convert `byte[]` values to base64-encoded strings, rather than the current behavior of calling Java `toString`.

This issue was uncovered by a regression of sorts introduced by #13519, which updated the protobuf extension to convert values directly to Java types. As a result, `bytes` typed values are now converted to `byte[]` instead of the base64 string that the previous JSON-based conversion produced. While outputting `byte[]` is more consistent with other input formats, and preferable when the bytes can be consumed directly (such as by complex type serde), feeding it to a `StringDimensionIndexer` resulted in an ugly Java `toString`, because `processRowValsToUnsortedEncodedKeyComponent` is fed the output of `row.getRaw(..)`. Converting `byte[]` to a base64 string within `StringDimensionIndexer` is consistent with the behavior of `row.getDimension(..)`, which already does this coercion (and which is why many tests on binary types appeared to be doing the expected thing).
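
To illustrate the difference described above, here is a minimal sketch in plain Java. It is not the code from this PR: the class `BytesToDimensionSketch` and the helper `coerceToString` are hypothetical names, and `java.util.Base64` is used as a stand-in for whatever base64 utility the indexer actually calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BytesToDimensionSketch {
    // Hypothetical helper (an assumption, not Druid's API): coerce a raw row
    // value to a string, base64-encoding byte[] instead of calling toString().
    static String coerceToString(Object raw) {
        if (raw == null) {
            return null;
        }
        if (raw instanceof byte[]) {
            return Base64.getEncoder().encodeToString((byte[]) raw);
        }
        return String.valueOf(raw);
    }

    public static void main(String[] args) {
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);

        // Default Object.toString() on an array: type tag plus identity hash,
        // which varies from run to run and says nothing about the contents.
        System.out.println(value.toString());

        // Base64 coercion: a stable, content-based string.
        System.out.println(coerceToString(value));
    }
}
```

The first line printed is an identity-based string that is useless as a dimension value, while the second is derived from the bytes themselves, so equal byte arrays map to equal dimension values.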

I added some protobuf `bytes` tests, but they don't really exercise the new `StringDimensionIndexer` behavior because they operate on the `InputRow` directly and call `getDimension` to validate the results. The parser-based version still uses the old conversion mechanisms, so when no flattener is used it incorrectly calls `toString` on the `ByteString`. I have encoded this behavior in the test for now; if we either update the parser to use the new flattener or just remove parsers, we can remove this part of the test.
2022-12-16 14:50:17 +05:30
avro-extensions add protobuf flattener, direct to plain java conversion for faster flattening (#13519) 2022-12-09 12:24:21 -08:00
azure-extensions Revert "Add filter in cloud object input source for backward compatibility (#13437)" (#13450) 2022-11-30 16:33:05 +05:30
datasketches Better error message when theta_sketch_intersect is used on scalar expression (#13508) 2022-12-07 09:35:43 +05:30
druid-aws-rds-extensions Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
druid-basic-security Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
druid-bloom-filter SQL test framework extensions (#13426) 2022-12-02 09:11:59 -08:00
druid-catalog Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
druid-kerberos Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
druid-pac4j Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
druid-ranger-security Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
ec2-extensions Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
google-extensions Add InputStats to track bytes processed by a task (#13520) 2022-12-13 18:54:42 +05:30
hdfs-storage Add InputStats to track bytes processed by a task (#13520) 2022-12-13 18:54:42 +05:30
histogram SQL test framework extensions (#13426) 2022-12-02 09:11:59 -08:00
kafka-extraction-namespace Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
kafka-indexing-service Add InputStats to track bytes processed by a task (#13520) 2022-12-13 18:54:42 +05:30
kinesis-indexing-service Add InputStats to track bytes processed by a task (#13520) 2022-12-13 18:54:42 +05:30
kubernetes-extensions update org.bouncycastle:bcprov-jdk15on 1.68 to 1.69 (#13440) 2022-11-30 21:57:38 +05:30
lookups-cached-global Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
lookups-cached-single Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
multi-stage-query Track input processedBytes with MSQ ingestion (#13559) 2022-12-16 02:20:01 +05:30
mysql-metadata-storage Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
orc-extensions add protobuf flattener, direct to plain java conversion for faster flattening (#13519) 2022-12-09 12:24:21 -08:00
parquet-extensions add protobuf flattener, direct to plain java conversion for faster flattening (#13519) 2022-12-09 12:24:21 -08:00
postgresql-metadata-storage Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
protobuf-extensions allow string dimension indexer to handle byte[] as base64 strings (#13573) 2022-12-16 14:50:17 +05:30
s3-extensions Add InputStats to track bytes processed by a task (#13520) 2022-12-13 18:54:42 +05:30
simple-client-sslcontext Prepare master branch for next release, 26.0.0 (#13401) 2022-11-22 15:31:01 +05:30
stats SQL test framework extensions (#13426) 2022-12-02 09:11:59 -08:00
testing-tools SQL test framework extensions (#13426) 2022-12-02 09:11:59 -08:00