mirror of https://github.com/apache/druid.git
Moving in filter check to broker (#12195)
* Moving in filter check to broker
* Adding more unit tests, making error message meaningful
* Spelling and doc changes
* Updating default to -1 and making this feature hidden by default. The number of IN filters can grow up to a max limit of 100
* Removing upper limit of 100, updated docs
* Making documentation more meaningful
* Moving check outside to PlannerConfig, updating test cases and adding back max limit
* Updated with some additional code comments
* Missed removing one line during the checkin
* Addressing doc changes and one forbidden API correction
* Final doc change
* Adding a spelling exception, correcting a testcase
* Reading entire filter tree to address combinations of ANDs and ORs
* Specifying in docs that, this case works only for ORs
* Revert "Reading entire filter tree to address combinations of ANDs and ORs"
This reverts commit 81ca8f8496.
* Covering a class cast exception and updating docs
* Counting changed
Co-authored-by: Jihoon Son <jihoonson@apache.org>
parent 26bc4b7345
commit eae163a797
|
@ -1844,6 +1844,7 @@ The Druid SQL server is configured through the following properties on the Broke
|
|||
|`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
|
||||
|`druid.sql.planner.authorizeSystemTablesDirectly`|If true, Druid authorizes queries against any of the system schema tables (`sys` in SQL) as `SYSTEM_TABLE` resources which require `READ` access, in addition to permissions based content filtering.|false|
|
||||
|`druid.sql.planner.useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite. It can be overridden per query with `useNativeQueryExplain` context key.|false|
|
||||
|`druid.sql.planner.maxNumericInFilters`|Maximum number of numeric values that can be compared against a string-type dimension when the entire SQL WHERE clause of a query translates to an [OR](../querying/filters.md#or) of [Bound filters](../querying/filters.md#bound-filter). By default, Druid does not restrict the number of numeric Bound filters on string columns, although a large number of them may block other queries from running. Set this property to a positive limit to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN ('123', '456')`. If this property is disabled, `maxNumericInFilters` set through the query context is ignored.|`-1` (disabled)|
|
||||
|`druid.sql.approxCountDistinct.function`|Implementation to use for the [`APPROX_COUNT_DISTINCT` function](../querying/sql-aggregations.md). Without extensions loaded, the only valid value is `APPROX_COUNT_DISTINCT_BUILTIN` (a HyperLogLog, or HLL, based implementation). If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, this can also be `APPROX_COUNT_DISTINCT_DS_HLL` (alternative HLL implementation) or `APPROX_COUNT_DISTINCT_DS_THETA`.<br><br>Theta sketches use significantly more memory than HLL sketches, so you should prefer one of the two HLL implementations.|APPROX_COUNT_DISTINCT_BUILTIN|
|
||||
|
||||
> Previous versions of Druid had properties named `druid.sql.planner.maxQueryCount` and `druid.sql.planner.maxSemiJoinRowsInMemory`.
|
||||
|
|
|
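The rewrite recommended in the `druid.sql.planner.maxNumericInFilters` description (string literals instead of numbers in the `IN` list) can be sketched in Java as follows. This is a minimal illustration, not Druid code: the class `StringInClause` and its method `of` are hypothetical names introduced here, and the column name is taken from the example in the docs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; it is not part of Druid.
final class StringInClause
{
  private StringInClause() {}

  // Renders "column IN ('v1', 'v2', ...)", quoting every numeric value as a SQL string literal.
  static String of(String column, List<Long> values)
  {
    if (values.isEmpty()) {
      throw new IllegalArgumentException("IN list must not be empty");
    }
    return values.stream()
                 .map(v -> "'" + v + "'")
                 .collect(Collectors.joining(", ", column + " IN (", ")"));
  }

  public static void main(String[] args)
  {
    // prints: someString IN ('123', '456')
    System.out.println(of("someString", Arrays.asList(123L, 456L)));
  }
}
```

Because the values are numeric, no quote escaping is needed here; a helper that accepts arbitrary strings would have to escape single quotes or use query parameters instead.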
@ -62,7 +62,7 @@ Unless otherwise noted, the following parameters apply to all query types.
|
|||
|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions.|
|
||||
|enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have joins. For joins where the left child is a simple scan with a filter, Druid by default runs the scan as a query and then joins the results to the right child on the Broker. Setting this flag to true overrides that behavior, and Druid attempts to push the join to data servers instead. Note that the flag could apply to queries even if there is no explicit join, since queries can be internally translated into a join by the SQL planner.|
|
||||
|debug| `false` | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, the following additional logs will be produced:<br />- Log the stack trace of the exception (if any) produced by the query |
|
||||
|
||||
|maxNumericInFilters|`-1`|Maximum number of numeric values that can be compared against a string-type dimension when the entire SQL WHERE clause of a query translates only to an [OR](../querying/filters.md#or) of [Bound filters](../querying/filters.md#bound-filter). By default, Druid does not restrict the number of numeric Bound filters on string columns, although a large number of them may block other queries from running. Set this parameter to a positive limit to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, `WHERE someString IN ('123', '456')`. This value cannot exceed the system configuration `druid.sql.planner.maxNumericInFilters`. This value is ignored if `druid.sql.planner.maxNumericInFilters` is not set explicitly.|
|
||||
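As a hedged sketch of how a client might pass `maxNumericInFilters` through the query context, the Java snippet below builds a JSON request body with a `query` field and a `context` object, which is the shape assumed here for Druid's SQL HTTP API. The class `SqlRequestBody` and its methods are hypothetical, and the escaping covers only quotes, backslashes, and newlines; a real client would use a JSON library.

```java
// Hypothetical helper for illustration only; it is not part of Druid.
final class SqlRequestBody
{
  private SqlRequestBody() {}

  // Wraps a string in double quotes, escaping only the characters this example needs.
  static String jsonString(String s)
  {
    StringBuilder sb = new StringBuilder("\"");
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"':
          sb.append("\\\"");
          break;
        case '\\':
          sb.append("\\\\");
          break;
        case '\n':
          sb.append("\\n");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.append('"').toString();
  }

  // Builds a body of the form {"query": ..., "context": {"maxNumericInFilters": N}}.
  static String build(String sql, int maxNumericInFilters)
  {
    return "{\"query\":" + jsonString(sql)
           + ",\"context\":{\"maxNumericInFilters\":" + maxNumericInFilters + "}}";
  }

  public static void main(String[] args)
  {
    System.out.println(build("SELECT COUNT(*) FROM druid.numfoo WHERE dim6 IN (1,2,3)", 50));
  }
}
```

Per the parameter description, the context value only takes effect when `druid.sql.planner.maxNumericInFilters` is set on the Broker, and it must not exceed that system value.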
## Parameters by query type
|
||||
|
||||
Some query types offer context parameters specific to that query type.
|
||||
|
|
|
@ -56,6 +56,7 @@ public class QueryContexts
|
|||
public static final String JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS_ENABLE_KEY = "enableJoinFilterRewriteValueColumnFilters";
|
||||
public static final String REWRITE_JOIN_TO_FILTER_ENABLE_KEY = "enableRewriteJoinToFilter";
|
||||
public static final String JOIN_FILTER_REWRITE_MAX_SIZE_KEY = "joinFilterRewriteMaxSize";
|
||||
public static final String MAX_NUMERIC_IN_FILTERS = "maxNumericInFilters";
|
||||
// This flag controls whether a SQL join query with left scan should be attempted to be run as direct table access
|
||||
// instead of being wrapped inside a query. With direct table access enabled, Druid can push down the join operation to
|
||||
// data servers.
|
||||
|
|
|
@ -21,6 +21,8 @@ package org.apache.druid.sql.calcite.planner;
|
|||
|
||||
import com.fasterxml.jackson.annotation.JsonProperty;
|
||||
import org.apache.druid.java.util.common.IAE;
|
||||
import org.apache.druid.java.util.common.Numbers;
|
||||
import org.apache.druid.java.util.common.UOE;
|
||||
import org.joda.time.DateTimeZone;
|
||||
import org.joda.time.Period;
|
||||
|
||||
|
@ -34,6 +36,8 @@ public class PlannerConfig
|
|||
public static final String CTX_KEY_USE_APPROXIMATE_TOPN = "useApproximateTopN";
|
||||
public static final String CTX_COMPUTE_INNER_JOIN_COST_AS_FILTER = "computeInnerJoinCostAsFilter";
|
||||
public static final String CTX_KEY_USE_NATIVE_QUERY_EXPLAIN = "useNativeQueryExplain";
|
||||
public static final String CTX_MAX_NUMERIC_IN_FILTERS = "maxNumericInFilters";
|
||||
public static final int NUM_FILTER_NOT_USED = -1;
|
||||
|
||||
@JsonProperty
|
||||
private Period metadataRefreshPeriod = new Period("PT1M");
|
||||
|
@ -74,11 +78,19 @@ public class PlannerConfig
|
|||
@JsonProperty
|
||||
private boolean useNativeQueryExplain = false;
|
||||
|
||||
@JsonProperty
|
||||
private int maxNumericInFilters = NUM_FILTER_NOT_USED;
|
||||
|
||||
public long getMetadataSegmentPollPeriod()
|
||||
{
|
||||
return metadataSegmentPollPeriod;
|
||||
}
|
||||
|
||||
public int getMaxNumericInFilters()
|
||||
{
|
||||
return maxNumericInFilters;
|
||||
}
|
||||
|
||||
public boolean isMetadataSegmentCacheEnable()
|
||||
{
|
||||
return metadataSegmentCacheEnable;
|
||||
|
@ -180,6 +192,14 @@ public class PlannerConfig
|
|||
CTX_KEY_USE_NATIVE_QUERY_EXPLAIN,
|
||||
isUseNativeQueryExplain()
|
||||
);
|
||||
final int systemConfigMaxNumericInFilters = getMaxNumericInFilters();
|
||||
final int queryContextMaxNumericInFilters = getContextInt(
|
||||
context,
|
||||
CTX_MAX_NUMERIC_IN_FILTERS,
|
||||
getMaxNumericInFilters()
|
||||
);
|
||||
newConfig.maxNumericInFilters = validateMaxNumericInFilters(queryContextMaxNumericInFilters,
|
||||
systemConfigMaxNumericInFilters);
|
||||
newConfig.requireTimeCondition = isRequireTimeCondition();
|
||||
newConfig.sqlTimeZone = getSqlTimeZone();
|
||||
newConfig.awaitInitializationOnStart = isAwaitInitializationOnStart();
|
||||
|
@ -190,6 +210,46 @@ public class PlannerConfig
|
|||
return newConfig;
|
||||
}
|
||||
|
||||
private int validateMaxNumericInFilters(int queryContextMaxNumericInFilters, int systemConfigMaxNumericInFilters)
|
||||
{
|
||||
// if maxNumericInFilters from the query context == 0, throw an exception
|
||||
// else if query context exceeds system set value throw error
|
||||
if (queryContextMaxNumericInFilters == 0) {
|
||||
throw new UOE("[%s] must be greater than 0", CTX_MAX_NUMERIC_IN_FILTERS);
|
||||
} else if (queryContextMaxNumericInFilters > systemConfigMaxNumericInFilters
|
||||
&& systemConfigMaxNumericInFilters != NUM_FILTER_NOT_USED) {
|
||||
throw new UOE(
|
||||
"Expected parameter[%s] cannot exceed system set value of [%d]",
|
||||
CTX_MAX_NUMERIC_IN_FILTERS,
|
||||
systemConfigMaxNumericInFilters
|
||||
);
|
||||
}
|
||||
// if system set value is not present, thereby inferring default of -1
|
||||
if (systemConfigMaxNumericInFilters == NUM_FILTER_NOT_USED) {
|
||||
return systemConfigMaxNumericInFilters;
|
||||
}
|
||||
// all other cases return the valid query context value
|
||||
return queryContextMaxNumericInFilters;
|
||||
}
|
||||
|
||||
private static int getContextInt(
|
||||
final Map<String, Object> context,
|
||||
final String parameter,
|
||||
final int defaultValue
|
||||
)
|
||||
{
|
||||
final Object value = context.get(parameter);
|
||||
if (value == null) {
|
||||
return defaultValue;
|
||||
} else if (value instanceof String) {
|
||||
return Numbers.parseInt(value);
|
||||
} else if (value instanceof Integer) {
|
||||
return (Integer) value;
|
||||
} else {
|
||||
throw new IAE("Expected parameter[%s] to be integer", parameter);
|
||||
}
|
||||
}
|
||||
|
||||
private static boolean getContextBoolean(
|
||||
final Map<String, Object> context,
|
||||
final String parameter,
|
||||
|
|
|
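The precedence between the system property and the query context value in `validateMaxNumericInFilters` can be restated as a standalone sketch. This is an approximation written for illustration, not the Druid class itself: `MaxNumericInFiltersRule` and `resolve` are hypothetical names, and `UnsupportedOperationException` stands in for Druid's `UOE`.

```java
// Hypothetical standalone restatement of the validation rules shown in the diff.
final class MaxNumericInFiltersRule
{
  // Mirrors PlannerConfig.NUM_FILTER_NOT_USED in the diff.
  static final int NOT_USED = -1;

  private MaxNumericInFiltersRule() {}

  // Returns the effective limit for a query, or NOT_USED when the check is disabled.
  static int resolve(int contextValue, int systemValue)
  {
    if (contextValue == 0) {
      // A limit of zero is rejected outright.
      throw new UnsupportedOperationException("[maxNumericInFilters] must be greater than 0");
    }
    if (systemValue != NOT_USED && contextValue > systemValue) {
      // The query context may tighten the system limit but not loosen it.
      throw new UnsupportedOperationException(
          String.format("maxNumericInFilters cannot exceed system set value of [%d]", systemValue)
      );
    }
    if (systemValue == NOT_USED) {
      // No system limit configured: the context value is ignored.
      return NOT_USED;
    }
    return contextValue;
  }

  public static void main(String[] args)
  {
    System.out.println(resolve(50, 100));  // context value within the system limit
    System.out.println(resolve(2, NOT_USED));  // system limit unset, context value ignored
  }
}
```

In the diff, the context value defaults to the system value when the key is absent, so a caller of this sketch would pass the system value for both arguments in that case.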
@ -35,12 +35,17 @@ import org.apache.druid.java.util.common.DateTimes;
|
|||
import org.apache.druid.java.util.common.IAE;
|
||||
import org.apache.druid.java.util.common.ISE;
|
||||
import org.apache.druid.java.util.common.Intervals;
|
||||
import org.apache.druid.java.util.common.StringUtils;
|
||||
import org.apache.druid.java.util.common.UOE;
|
||||
import org.apache.druid.java.util.common.guava.Sequence;
|
||||
import org.apache.druid.java.util.common.guava.Sequences;
|
||||
import org.apache.druid.math.expr.Evals;
|
||||
import org.apache.druid.query.InlineDataSource;
|
||||
import org.apache.druid.query.Query;
|
||||
import org.apache.druid.query.QueryToolChest;
|
||||
import org.apache.druid.query.filter.BoundDimFilter;
|
||||
import org.apache.druid.query.filter.DimFilter;
|
||||
import org.apache.druid.query.filter.OrDimFilter;
|
||||
import org.apache.druid.query.planning.DataSourceAnalysis;
|
||||
import org.apache.druid.query.spec.QuerySegmentSpec;
|
||||
import org.apache.druid.query.timeseries.TimeseriesQuery;
|
||||
|
@ -53,6 +58,7 @@ import org.apache.druid.server.QueryLifecycleFactory;
|
|||
import org.apache.druid.server.security.Access;
|
||||
import org.apache.druid.server.security.AuthenticationResult;
|
||||
import org.apache.druid.sql.calcite.planner.Calcites;
|
||||
import org.apache.druid.sql.calcite.planner.PlannerConfig;
|
||||
import org.apache.druid.sql.calcite.planner.PlannerContext;
|
||||
import org.apache.druid.sql.calcite.rel.CannotBuildQueryException;
|
||||
import org.apache.druid.sql.calcite.rel.DruidQuery;
|
||||
|
@ -124,6 +130,36 @@ public class NativeQueryMaker implements QueryMaker
|
|||
);
|
||||
}
|
||||
}
|
||||
int numFilters = plannerContext.getPlannerConfig().getMaxNumericInFilters();
|
||||
|
||||
// special corner case handling for numeric IN filters
|
||||
// in case of query containing IN (v1, v2, v3,...) where Vi is numeric
|
||||
// a BoundFilter is created internally for each of the values
|
||||
// whereas when the Vi are Strings, the filters are converted from BoundFilter to SelectorFilter to InFilter,
|
||||
// which takes less processing for bitmaps
|
||||
// So in a case where user executes a query with multiple numeric INs, flame graph shows BoundFilter.getBitmapResult
|
||||
// and BoundFilter.match predicate eating up processing time which stalls a historical for a query with large number
|
||||
// of numeric INs (> 10K). In such cases user should change the query to specify the IN clauses as String
|
||||
// Instead of IN(v1,v2,v3) user should specify IN('v1','v2','v3')
|
||||
if (numFilters != PlannerConfig.NUM_FILTER_NOT_USED) {
|
||||
if (query.getFilter() instanceof OrDimFilter) {
|
||||
OrDimFilter orDimFilter = (OrDimFilter) query.getFilter();
|
||||
int numBoundFilters = 0;
|
||||
for (DimFilter filter : orDimFilter.getFields()) {
|
||||
numBoundFilters += filter instanceof BoundDimFilter ? 1 : 0;
|
||||
}
|
||||
if (numBoundFilters > numFilters) {
|
||||
String dimension = ((BoundDimFilter) (orDimFilter.getFields().get(0))).getDimension();
|
||||
throw new UOE(StringUtils.format(
|
||||
"The number of values in the IN clause for [%s] in query exceeds configured maxNumericFilter limit of [%s] for INs. Cast [%s] values of IN clause to String",
|
||||
dimension,
|
||||
numFilters,
|
||||
orDimFilter.getFields().size()
|
||||
));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
final List<String> rowOrder;
|
||||
if (query instanceof TimeseriesQuery && !druidQuery.getGrouping().getDimensions().isEmpty()) {
|
||||
|
|
|
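The counting check added to `NativeQueryMaker` can be illustrated without Druid on the classpath. The sketch below uses hypothetical stand-in types (`Filter`, `Bound`, `Selector`) rather than `DimFilter`, `BoundDimFilter`, and `OrDimFilter`, and it deviates from the diff in one respect: it reads the dimension name from the first bound filter it encounters instead of casting the first field of the OR, which is an assumption about how a mixed OR might be handled safely.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Druid's filter classes; for illustration only.
final class NumericInFilterCheck
{
  interface Filter {}

  static final class Bound implements Filter
  {
    final String dimension;

    Bound(String dimension)
    {
      this.dimension = dimension;
    }
  }

  static final class Selector implements Filter {}

  private NumericInFilterCheck() {}

  // Throws if the fields of an OR contain more bound filters than the limit allows.
  static void check(List<Filter> orFields, int limit)
  {
    if (limit == -1) {
      return; // -1 means the check is disabled
    }
    int numBound = 0;
    String firstDimension = null;
    for (Filter field : orFields) {
      if (field instanceof Bound) {
        numBound++;
        if (firstDimension == null) {
          firstDimension = ((Bound) field).dimension;
        }
      }
    }
    if (numBound > limit) {
      throw new UnsupportedOperationException(
          String.format("IN clause for [%s] has %d numeric values, limit is [%d]", firstDimension, numBound, limit)
      );
    }
  }

  public static void main(String[] args)
  {
    List<Filter> fields = Arrays.<Filter>asList(new Bound("dim6"), new Bound("dim6"), new Bound("dim6"));
    check(fields, 3); // passes: three bound filters do not exceed a limit of three
    System.out.println("within limit");
  }
}
```

With a limit of 2, the same three-element list would be rejected, matching the scenario exercised by `testQueryWithMoreThanMaxNumericInFilter` in the diff.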
@ -201,6 +201,16 @@ public class BaseCalciteQueryTest extends CalciteTestBase
|
|||
}
|
||||
};
|
||||
|
||||
public static final int MAX_NUM_IN_FILTERS = 100;
|
||||
public static final PlannerConfig PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER = new PlannerConfig()
|
||||
{
|
||||
@Override
|
||||
public int getMaxNumericInFilters()
|
||||
{
|
||||
return MAX_NUM_IN_FILTERS;
|
||||
}
|
||||
};
|
||||
|
||||
public static final String DUMMY_SQL_ID = "dummy";
|
||||
public static final String LOS_ANGELES = "America/Los_Angeles";
|
||||
|
||||
|
|
|
@ -30,6 +30,7 @@ import org.apache.druid.java.util.common.HumanReadableBytes;
|
|||
import org.apache.druid.java.util.common.Intervals;
|
||||
import org.apache.druid.java.util.common.JodaUtils;
|
||||
import org.apache.druid.java.util.common.StringUtils;
|
||||
import org.apache.druid.java.util.common.UOE;
|
||||
import org.apache.druid.java.util.common.granularity.Granularities;
|
||||
import org.apache.druid.java.util.common.granularity.PeriodGranularity;
|
||||
import org.apache.druid.math.expr.ExprMacroTable;
|
||||
|
@ -6903,6 +6904,67 @@ public class CalciteQueryTest extends BaseCalciteQueryTest
|
|||
);
|
||||
}
|
||||
|
||||
@Test
|
||||
public void testZeroMaxNumericInFilter() throws Exception
|
||||
{
|
||||
expectedException.expect(UOE.class);
|
||||
expectedException.expectMessage("[maxNumericInFilters] must be greater than 0");
|
||||
|
||||
testQuery(
|
||||
PLANNER_CONFIG_DEFAULT,
|
||||
ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 0),
|
||||
"SELECT COUNT(*)\n"
|
||||
+ "FROM druid.numfoo\n"
|
||||
+ "WHERE dim6 IN (\n"
|
||||
+ "1,2,3\n"
|
||||
+ ")\n",
|
||||
CalciteTests.REGULAR_USER_AUTH_RESULT,
|
||||
ImmutableList.of(),
|
||||
ImmutableList.of()
|
||||
);
|
||||
}
|
||||
|
||||
@Test
|
||||
public void testHighestMaxNumericInFilter() throws Exception
|
||||
{
|
||||
expectedException.expect(UOE.class);
|
||||
expectedException.expectMessage("Expected parameter[maxNumericInFilters] cannot exceed system set value of [100]");
|
||||
|
||||
testQuery(
|
||||
PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
|
||||
ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 20000),
|
||||
"SELECT COUNT(*)\n"
|
||||
+ "FROM druid.numfoo\n"
|
||||
+ "WHERE dim6 IN (\n"
|
||||
+ "1,2,3\n"
|
||||
+ ")\n",
|
||||
CalciteTests.REGULAR_USER_AUTH_RESULT,
|
||||
ImmutableList.of(),
|
||||
ImmutableList.of()
|
||||
);
|
||||
}
|
||||
|
||||
@Test
|
||||
public void testQueryWithMoreThanMaxNumericInFilter() throws Exception
|
||||
{
|
||||
expectedException.expect(UOE.class);
|
||||
expectedException.expectMessage("The number of values in the IN clause for [dim6] in query exceeds configured maxNumericFilter limit of [2] for INs. Cast [3] values of IN clause to String");
|
||||
|
||||
testQuery(
|
||||
PLANNER_CONFIG_MAX_NUMERIC_IN_FILTER,
|
||||
ImmutableMap.of(QueryContexts.MAX_NUMERIC_IN_FILTERS, 2),
|
||||
"SELECT COUNT(*)\n"
|
||||
+ "FROM druid.numfoo\n"
|
||||
+ "WHERE dim6 IN (\n"
|
||||
+ "1,2,3\n"
|
||||
+ ")\n",
|
||||
CalciteTests.REGULAR_USER_AUTH_RESULT,
|
||||
ImmutableList.of(),
|
||||
ImmutableList.of()
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@Test
|
||||
public void testExplainExactCountDistinctOfSemiJoinResult() throws Exception
|
||||
{
|
||||
|
|
|
@ -305,7 +305,7 @@ public class CalciteTests
|
|||
new TimestampSpec(TIMESTAMP_COLUMN, "iso", null),
|
||||
new DimensionsSpec(
|
||||
ImmutableList.<DimensionSchema>builder()
|
||||
.addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5")))
|
||||
.addAll(DimensionsSpec.getDefaultSchemas(ImmutableList.of("dim1", "dim2", "dim3", "dim4", "dim5", "dim6")))
|
||||
.add(new DoubleDimensionSchema("d1"))
|
||||
.add(new DoubleDimensionSchema("d2"))
|
||||
.add(new FloatDimensionSchema("f1"))
|
||||
|
@ -537,6 +537,7 @@ public class CalciteTests
|
|||
.put("dim3", ImmutableList.of("a", "b"))
|
||||
.put("dim4", "a")
|
||||
.put("dim5", "aa")
|
||||
.put("dim6", "1")
|
||||
.build(),
|
||||
ImmutableMap.<String, Object>builder()
|
||||
.put("t", "2000-01-02")
|
||||
|
@ -553,6 +554,7 @@ public class CalciteTests
|
|||
.put("dim3", ImmutableList.of("b", "c"))
|
||||
.put("dim4", "a")
|
||||
.put("dim5", "ab")
|
||||
.put("dim6", "2")
|
||||
.build(),
|
||||
ImmutableMap.<String, Object>builder()
|
||||
.put("t", "2000-01-03")
|
||||
|
@ -569,6 +571,7 @@ public class CalciteTests
|
|||
.put("dim3", ImmutableList.of("d"))
|
||||
.put("dim4", "a")
|
||||
.put("dim5", "ba")
|
||||
.put("dim6", "3")
|
||||
.build(),
|
||||
ImmutableMap.<String, Object>builder()
|
||||
.put("t", "2001-01-01")
|
||||
|
@ -579,6 +582,7 @@ public class CalciteTests
|
|||
.put("dim3", ImmutableList.of(""))
|
||||
.put("dim4", "b")
|
||||
.put("dim5", "ad")
|
||||
.put("dim6", "4")
|
||||
.build(),
|
||||
ImmutableMap.<String, Object>builder()
|
||||
.put("t", "2001-01-02")
|
||||
|
@ -589,6 +593,7 @@ public class CalciteTests
|
|||
.put("dim3", ImmutableList.of())
|
||||
.put("dim4", "b")
|
||||
.put("dim5", "aa")
|
||||
.put("dim6", "5")
|
||||
.build(),
|
||||
ImmutableMap.<String, Object>builder()
|
||||
.put("t", "2001-01-03")
|
||||
|
@ -597,6 +602,7 @@ public class CalciteTests
|
|||
.put("dim1", "abc")
|
||||
.put("dim4", "b")
|
||||
.put("dim5", "ab")
|
||||
.put("dim6", "6")
|
||||
.build()
|
||||
);
|
||||
public static final List<InputRow> ROWS1_WITH_NUMERIC_DIMS =
|
||||
|
|
|
@ -313,6 +313,7 @@ lookback
|
|||
lookups
|
||||
mapreduce
|
||||
masse
|
||||
maxNumericInFilters
|
||||
maxNumFiles
|
||||
maxNumSegments
|
||||
max_map_count
|
||||
|
@ -563,6 +564,7 @@ _common
|
|||
- ../docs/dependencies/deep-storage.md
|
||||
druid-hdfs-storage
|
||||
druid-s3-extensions
|
||||
druid.sql.planner.maxNumericInFilters
|
||||
- ../docs/dependencies/metadata-storage.md
|
||||
BasicDataSource
|
||||
- ../docs/dependencies/zookeeper.md
|
||||
|
|