From 5b6727f3195ac9bad906e3416bf8997b069c222f Mon Sep 17 00:00:00 2001 From: Gian Merlino Date: Mon, 16 May 2022 03:13:53 -0700 Subject: [PATCH] Enable vectorized virtual column processing by default. (#12520) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, ) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue --- docs/querying/query-context.md | 2 +- .../src/main/java/org/apache/druid/query/QueryContexts.java | 2 +- .../druid/segment/virtual/VectorizedVirtualColumnTest.java | 3 +-- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/docs/querying/query-context.md b/docs/querying/query-context.md index 8c88e8749d4..bded042ecb5 100644 --- a/docs/querying/query-context.md +++ b/docs/querying/query-context.md @@ -126,4 +126,4 @@ vectorization. These query types will ignore the "vectorize" parameter even if i |--------|-------|------------| |vectorize|`true`|Enables or disables vectorized query execution. Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override `druid.query.default.context.vectorize` if it's set.| |vectorSize|`512`|Sets the row batching size for a particular query. This will override `druid.query.default.context.vectorSize` if it's set.| -|vectorizeVirtualColumns|`false`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.| +|vectorizeVirtualColumns|`true`|Enables or disables vectorized query processing of queries with virtual columns, layered on top of `vectorize` (`vectorize` must also be set to true for a query to utilize vectorization). Possible values are `false` (disabled), `true` (enabled if possible, disabled otherwise, on a per-segment basis), and `force` (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The `"force"` setting is meant to aid in testing, and is not generally useful in production. This will override `druid.query.default.context.vectorizeVirtualColumns` if it's set.| diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java b/processing/src/main/java/org/apache/druid/query/QueryContexts.java index a4defa7b08e..73bf04fcae2 100644 --- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java +++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java @@ -78,7 +78,7 @@ public class QueryContexts public static final boolean DEFAULT_POPULATE_RESULTLEVEL_CACHE = true; public static final boolean DEFAULT_USE_RESULTLEVEL_CACHE = true; public static final Vectorize DEFAULT_VECTORIZE = Vectorize.TRUE; - public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.FALSE; + public static final Vectorize DEFAULT_VECTORIZE_VIRTUAL_COLUMN = Vectorize.TRUE; public static final int DEFAULT_PRIORITY = 0; public static final int DEFAULT_UNCOVERED_INTERVALS_LIMIT = 0; public static final long DEFAULT_TIMEOUT_MILLIS = TimeUnit.MINUTES.toMillis(5); diff --git a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java index 126e304ae04..9bfdafeb353 100644 --- a/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java +++ b/processing/src/test/java/org/apache/druid/segment/virtual/VectorizedVirtualColumnTest.java @@ -289,9 +289,8 @@ public class VectorizedVirtualColumnTest } @Test - public void testTimeseriesTrueVirtualContextCannotVectorize() + public void testTimeseriesTrueVirtualContextDefault() { - expectNonvectorized(); testTimeseries( ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ColumnType.FLOAT), CONTEXT_USE_DEFAULTS,