StringDictionaryEncodedColumn dimSelector to return CARDINALITY_UNKNOWN with extractionFn (#8433)

* update DimensionDictionarySelector.getValueCardinality() javadoc

* unknown cardinality in StringDictionaryEncodedColumn dim selector

* revert StringDictionaryEncodedColumn change as that fails GroupBy-v1 execution for many working queries

* fix/add more comments
Himanshu 2019-09-06 14:19:25 -07:00 committed by GitHub
parent 645799f977
commit 1fe4ecf17a
2 changed files with 14 additions and 1 deletion


@@ -47,7 +47,12 @@ public interface DimensionDictionarySelector
    * dimension selector has no dictionary, and avoid storing ids, calling "lookupId", or calling "lookupName"
    * outside of the context of operating on a single row.
    *
-   * @return the value cardinality, or -1 if unknown.
+   * If cardinality is known then it is assumed that underlying dictionary is lexicographically sorted by the encoded
+   * value.
+   * For example if there are values "A" , "B" , "C" in a column with cardinality 3 then it is assumed that
+   * id("A") < id("B") < id("C")
+   *
+   * @return the value cardinality, or {@link DimensionDictionarySelector#CARDINALITY_UNKNOWN} if unknown.
    */
   int getValueCardinality();
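
The contract in this javadoc can be illustrated with a minimal, self-contained sketch of how a caller might branch on the returned cardinality. `DictSelector`, `countByValue`, and `selectorOver` are hypothetical stand-ins written for this illustration and are not Druid classes; only the sentinel name `CARDINALITY_UNKNOWN` and its value of -1 are taken from the javadoc.

```java
import java.util.Map;
import java.util.TreeMap;

public class CardinalitySketch
{
  // Hypothetical stand-in for the selector contract; not the real Druid interface.
  public interface DictSelector
  {
    int CARDINALITY_UNKNOWN = -1;

    int getValueCardinality();

    String lookupName(int id);
  }

  // Counts how often each value occurs in a sequence of per-row dictionary ids.
  public static Map<String, Integer> countByValue(DictSelector selector, int[] rowIds)
  {
    Map<String, Integer> result = new TreeMap<>();
    int cardinality = selector.getValueCardinality();
    if (cardinality != DictSelector.CARDINALITY_UNKNOWN) {
      // Known cardinality: ids are dense in [0, cardinality), so an array indexed by id is safe.
      int[] counts = new int[cardinality];
      for (int id : rowIds) {
        counts[id]++;
      }
      for (int id = 0; id < cardinality; id++) {
        if (counts[id] > 0) {
          result.put(selector.lookupName(id), counts[id]);
        }
      }
    } else {
      // Unknown cardinality: resolve each id to its value immediately and key on the value.
      for (int id : rowIds) {
        result.merge(selector.lookupName(id), 1, Integer::sum);
      }
    }
    return result;
  }

  // Builds a hypothetical selector over a sorted value array (assumption: values are pre-sorted).
  public static DictSelector selectorOver(String[] values, boolean cardinalityKnown)
  {
    return new DictSelector()
    {
      @Override
      public int getValueCardinality()
      {
        return cardinalityKnown ? values.length : CARDINALITY_UNKNOWN;
      }

      @Override
      public String lookupName(int id)
      {
        return values[id];
      }
    };
  }
}
```

Both branches produce the same counts; the array path is only valid when the selector reports a real cardinality, which is why a caller should check for the sentinel before sizing anything by id.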


@@ -121,6 +121,14 @@ public class StringDictionaryEncodedColumn implements DictionaryEncodedColumn<St
       @Override
       public int getValueCardinality()
       {
+        /*
+         This is technically wrong if
+         extractionFn != null && (extractionFn.getExtractionType() != ExtractionFn.ExtractionType.ONE_TO_ONE ||
+                                    !extractionFn.preservesOrdering())
+         However current behavior allows some GroupBy-V1 queries to work that wouldn't work otherwise and doesn't
+         cause any problems due to special handling of extractionFn everywhere.
+         See https://github.com/apache/incubator-druid/pull/8433
+         */
         return getCardinality();
       }
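
To see why the comment calls the returned value "technically wrong", here is a small sketch under stated assumptions. It models an extraction function as a plain `java.util.function.Function<String, String>` rather than Druid's `ExtractionFn`, and the class and method names are hypothetical. It checks the two properties the comment names: whether the mapping keeps values distinct (one-to-one) and whether it keeps the sorted order of the dictionary.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ExtractionCardinalitySketch
{
  // Number of distinct values left after applying fn to every dictionary entry.
  public static long distinctAfter(List<String> sortedDictionary, Function<String, String> fn)
  {
    return sortedDictionary.stream().map(fn).distinct().count();
  }

  // True only if the mapped values are still strictly increasing, i.e. the
  // mapping is one-to-one on this dictionary and preserves its ordering.
  public static boolean isStrictlySortedAfter(List<String> sortedDictionary, Function<String, String> fn)
  {
    List<String> mapped = sortedDictionary.stream().map(fn).collect(Collectors.toList());
    for (int i = 1; i < mapped.size(); i++) {
      if (mapped.get(i - 1).compareTo(mapped.get(i)) >= 0) {
        return false;
      }
    }
    return true;
  }
}
```

For a dictionary of "apple", "avocado", "banana", a first-character function collapses three ids onto two values, and a string-reversing function keeps three distinct values but breaks the id ordering. In either case the dictionary size no longer describes the extracted values.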