druid/extensions-core
Gian Merlino 202c78c8f3
Enable rewriting certain inner joins as filters. (#11068)
* Enable rewriting certain inner joins as filters.

The main logic for doing the rewrite is in JoinableFactoryWrapper's
segmentMapFn method. The requirements are:

- It must be an inner equi-join.
- The right-hand columns referenced by the condition must not contain any
  duplicate values. (If they did, the inner join would not be guaranteed
  to return at most one row for each left-hand-side row.)
- No columns from the right-hand side can be used by anything other than
  the join condition itself.
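The three requirements above can be folded into one check. The sketch below is a minimal, self-contained Java illustration, not the code in JoinableFactoryWrapper. The `Clause` record, the `JoinType` enum, `tryRewrite`, `describe`, the `j0.` prefix used to mark right-hand columns, and the choice to skip null keys are all assumptions made for this example. It shows why the rewrite is safe: an inner equi-join against unique keys, whose right-hand columns nobody reads, can only drop left-hand rows, so it behaves like an IN filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical model of the join-to-filter rewrite check (requires Java 16+ for records).
 * None of these types are Druid classes; they only illustrate the three requirements.
 */
public class JoinToFilterSketch {
    enum JoinType { INNER, LEFT }

    /** One join clause whose condition is leftColumn == (rightPrefix + rightColumn). */
    record Clause(JoinType type, boolean equiJoin, String leftColumn, String rightPrefix, String rightColumn) {}

    /** Returns the values for an IN filter on the left-hand column, or empty if no rewrite is allowed. */
    static Optional<Set<String>> tryRewrite(Clause clause, List<String> rightKeyValues,
                                            Set<String> columnsUsedElsewhere) {
        // Requirement 1: only inner equi-joins qualify.
        if (clause.type() != JoinType.INNER || !clause.equiJoin()) {
            return Optional.empty();
        }
        // Requirement 3: no right-hand column may be used by anything but the join condition.
        for (String column : columnsUsedElsewhere) {
            if (column.startsWith(clause.rightPrefix())) {
                return Optional.empty();
            }
        }
        // Requirement 2: right-hand key values must be unique. Nulls are skipped here,
        // on the assumption (for this sketch) that a null key never matches.
        Set<String> values = new HashSet<>();
        for (String value : rightKeyValues) {
            if (value != null && !values.add(value)) {
                return Optional.empty(); // duplicate key: the join could emit several rows per left row
            }
        }
        return Optional.of(values);
    }

    /** Renders the rewritten clause as a readable IN filter, with values sorted for stable output. */
    static String describe(Clause clause, Set<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return clause.leftColumn() + " IN " + sorted;
    }

    public static void main(String[] args) {
        Clause clause = new Clause(JoinType.INNER, true, "countryIsoCode", "j0.", "k");
        List<String> rightKeys = Arrays.asList("US", "CA", null, "MX");
        // The query reads only left-hand columns, so the right side is used by the condition alone.
        Set<String> used = Set.of("countryIsoCode", "__time");
        tryRewrite(clause, rightKeys, used)
            .ifPresent(values -> System.out.println(describe(clause, values))); // countryIsoCode IN [CA, MX, US]
    }
}
```

With these inputs the clause is rewritable and the null right-hand key simply drops out of the IN list. Adding `"j0.v"` to the used-columns set, or a second `"US"` key, makes `tryRewrite` return `Optional.empty()`, and the join would have to stay a join.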

HashJoinSegmentStorageAdapter is also modified to pass through to
the base adapter (even allowing vectorization!) in the case where 100%
of join clauses could be rewritten as filters.

In support of this goal:

- Add Query getRequiredColumns() method to help us figure out whether
  the right-hand side of a join datasource is being used or not.
- Add JoinConditionAnalysis getRequiredColumns() method to help us
  figure out if the right-hand side of a join is being used by later
  join clauses acting on the same base.
- Add Joinable getNonNullColumnValuesIfAllUnique() method to enable
  retrieving the set of values that will form the "in" filter.
- Add LookupExtractor canGetKeySet() and keySet() methods to support
  LookupJoinable in its efforts to implement the new Joinable method.
- Add "enableRewriteJoinToFilter" feature flag to
  JoinFilterRewriteConfig. The default is disabled.
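A rough sketch of what a "non-null values if all unique" method might look like for a simple map-backed lookup. Everything here is hypothetical: `LookupUniqueValuesSketch` and its methods only mirror the roles of canGetKeySet(), keySet() and getNonNullColumnValuesIfAllUnique described above, and the column names `"k"` (key) and `"v"` (value) are assumed for the example rather than taken from Druid's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical map-backed lookup, loosely mirroring canGetKeySet()/keySet() and
 * getNonNullColumnValuesIfAllUnique. Not Druid code; the "k"/"v" column names are assumptions.
 */
public class LookupUniqueValuesSketch {
    private final Map<String, String> map;

    public LookupUniqueValuesSketch(Map<String, String> map) {
        this.map = map;
    }

    /** An in-memory map can always enumerate its keys. */
    public boolean canGetKeySet() {
        return true;
    }

    public Set<String> keySet() {
        return Collections.unmodifiableSet(map.keySet());
    }

    /** Returns the column's non-null values if they are all unique; otherwise empty. */
    public Optional<Set<String>> nonNullColumnValuesIfAllUnique(String column) {
        if ("k".equals(column) && canGetKeySet()) {
            // Map keys are unique by construction; only a null key has to be dropped.
            Set<String> keys = new HashSet<>(keySet());
            keys.remove(null);
            return Optional.of(keys);
        }
        if ("v".equals(column)) {
            // Several keys may map to the same value, so duplicates must be checked.
            Set<String> seen = new HashSet<>();
            for (String value : map.values()) {
                if (value != null && !seen.add(value)) {
                    return Optional.empty();
                }
            }
            return Optional.of(seen);
        }
        return Optional.empty(); // unknown column: no rewrite
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>();
        countries.put("US", "United States");
        countries.put("USA", "United States");
        countries.put("CA", "Canada");
        LookupUniqueValuesSketch lookup = new LookupUniqueValuesSketch(countries);
        // Keys are unique, so they could feed an "in" filter.
        System.out.println(new TreeSet<>(lookup.nonNullColumnValuesIfAllUnique("k").orElseThrow())); // [CA, US, USA]
        // "United States" appears twice, so the value column is not eligible.
        System.out.println(lookup.nonNullColumnValuesIfAllUnique("v").isPresent()); // false
    }
}
```

Returning an empty `Optional` rather than throwing lets a caller fall back to executing the real join whenever the values turn out not to be unique.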

* Test improvements.

* Test fixes.

* Avoid slow size() call.

* Remove invalid test.

* Fix style.

* Fix mistaken default.

* Small fixes.

* Fix logic error.
2021-04-14 10:49:27 -07:00
avro-extensions add avro stream input format (#11040) 2021-04-12 21:53:41 -07:00
azure-extensions Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
datasketches Fix CAST being ignored when aggregating on strings after cast (#11083) 2021-04-12 22:21:24 -07:00
druid-aws-rds-extensions Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
druid-basic-security basic security extension ignore permissions that use unknown ResourceType or Action (#10896) 2021-02-23 14:49:09 -08:00
druid-bloom-filter vector group by support for string expressions (#11010) 2021-04-08 19:23:39 -07:00
druid-kerberos Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
druid-pac4j Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
druid-ranger-security Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
ec2-extensions Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
google-extensions GCS lookup support (#11026) 2021-03-30 01:40:41 +05:30
hdfs-storage DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) 2021-03-25 10:32:21 -07:00
histogram Fix CAST being ignored when aggregating on strings after cast (#11083) 2021-04-12 22:21:24 -07:00
kafka-extraction-namespace Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
kafka-indexing-service Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676) 2021-04-08 21:03:00 -07:00
kinesis-indexing-service Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676) 2021-04-08 21:03:00 -07:00
kubernetes-extensions k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value (#10961) 2021-04-07 17:45:28 -07:00
lookups-cached-global Allow list for JDBC connection properties to address CVE-2021-26919 (#11047) 2021-04-01 17:30:47 -07:00
lookups-cached-single Enable rewriting certain inner joins as filters. (#11068) 2021-04-14 10:49:27 -07:00
mysql-metadata-storage Enforce allow list for JDBC properties by default (#11063) 2021-04-06 19:46:19 -07:00
orc-extensions DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) 2021-03-25 10:32:21 -07:00
parquet-extensions DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) 2021-03-25 10:32:21 -07:00
postgresql-metadata-storage Enforce allow list for JDBC properties by default (#11063) 2021-04-06 19:46:19 -07:00
protobuf-extensions add protobuf inputformat (#11018) 2021-04-12 22:03:13 -07:00
s3-extensions DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) 2021-03-25 10:32:21 -07:00
simple-client-sslcontext Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
stats Fix CAST being ignored when aggregating on strings after cast (#11083) 2021-04-12 22:21:24 -07:00