* fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode
changes:
* json_value 'returning' decimal will now plan to native double typed query instead of ending up with default string typing, allowing decimal vector math expressions to work with this type
* vector math expressions now zero out 'null' values even in 'default' mode (druid.generic.useDefaultValueForNull=false) to prevent downstream things that do not check the null vector from producing incorrect results
* more better
* test and why not vectorize
* more test, more fix
This adds a sql function, "BIG_SUM", that uses
CompressedBigDecimal to do a sum. Other misc changes:
1. handle NumberFormatExceptions when parsing a string (default to set
to 0, configurable in agg factory to be strict and throw on error)
2. format pom file (whitespace) + add dependency
3. scaleUp -> scale and always require scale as a parameter
* Converted Druid planner to use statement handlers
Converts the large collection of if-statements for statement
types into a set of classes: one per supported statement type.
Cleans up a few error messages.
* Revisions from review comments
* Build fix
* Build fix
* Resolve merge confict.
* More merges with QueryResponse PR
* More parameterized type cleanup
Forces a rebuild due to a flaky test
* SQL: Fix round-trips of floating point literals.
When writing RexLiterals into Druid expressions, we now write non-integer
numeric literals in such a way that ensures they are parsed as doubles
on the other end.
* Updates from code review, and some additional stuff inspired by the
investigation.
- Remove unnecessary formatting code from DruidExpression.doubleLiteral:
it handles things just fine with its default behavior.
- Fix a problem where expression literals could not represent Long.MIN_VALUE.
Now, integer literals start life off as BigIntegerExpr instead of LongExpr,
and are converted to LongExpr during flattening. This is necessary because,
in order to avoid ambiguity between unary minus and negative literals, our
grammar does not actually have true negative literals. Negative numbers must
be represented as unary minus next to a positive literal.
- Fix a bug introduced in #12230 where shuttle.visitAll(args) delegated
to shuttle.visit(arg) instead of arg.visit(shuttle). The latter does
a recursive visitation, which is the intended behavior.
* Style fixes.
* Move regexp to the right place.
* Expose HTTP Response headers from SqlResource
This change makes the SqlResource expose HTTP response
headers in the same way that the QueryResource exposes them.
Fundamentally, the change is to pipe the QueryResponse
object all the way through to the Resource so that it can
populate response headers. There is also some code
cleanup around DI, as there was a superfluous FactoryFactory
class muddying things up.
* more consistent expression error messages
* review stuff
* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff
* fixes
* add expression transform name to transformer failure, better parse_json error messaging
Two changes:
1) Restore the text of the SQL query. It was removed in #12897, but
then it was later pointed out that the text is helpful for end
users querying Druid through tools that do not show the SQL queries
that they are making.
2) Adjust wording slightly, from "Cannot build plan for query" to
"Query not supported". This will be clearer to most users. Generally
the reason we get these errors is due to unsupported SQL constructs.
* json_value adjustments
changes:
* native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type
* rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument
* fix json_value_any to be usable by humans too, coverage
* fix bug
* checkstyle
* checkstyle
* review stuff
* validate that options to json_value are the supported options rather than ignore them
* remove more legacy undocumented functions
The method wasn't following its contract, leading to pollution of the
overall planner context, when really we just want to create a new
context for a specific query.
* SQL: Morph QueryMakerFactory into SqlEngine.
Groundwork for introducing an indexing-service-task-based SQL engine
under the umbrella of #12262. Also includes some other changes related
to improving error behavior.
Main changes:
1) Elevate the QueryMakerFactory interface (an extension point that allows
customization of how queries are made) into SqlEngine. SQL engines
can influence planner behavior through EngineFeatures, and can fully
control the mechanics of query execution using QueryMakers.
2) Remove the server-wide QueryMakerFactory choice, in favor of the choice
being made by the SQL entrypoint. The indexing-service-task-based
SQL engine would be associated with its own entrypoint, like
/druid/v2/sql/task.
Other changes:
1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based
on analysis of the planned rels; never try both. In particular, we
no longer try BINDABLE when DRUID fails. This simplifies the logic
and improves error messages.
2) Adjust error message "Cannot build plan for query" to omit the SQL
query text. Useful because the text can be quite long, which makes it
easy to miss the text about the problem.
3) Add a feature to block context parameters used internally by the SQL
planner from being supplied by end users.
4) Add a feature to enable adding row signature to the context for
Scan queries. This is useful in building the task-based engine.
5) Add saffron.properties file that turns off sets and graphviz dumps
in "cannot plan" errors. Significantly reduces log spam on the Broker.
* Fixes from CI.
* Changes from review.
* Can vectorize, now that join-to-filter is on by default.
* Checkstyle! And variable renames!
* Remove throws from test.
* Refactor SqlLifecycle into statement classes
Create direct & prepared statements
Remove redundant exceptions from tests
Tidy up Calcite query tests
Make PlannerConfig more testable
* Build fixes
* Added builder to SqlQueryPlus
* Moved Calcites system properties to saffron.properties
* Build fix
* Resolve merge conflict
* Fix IntelliJ inspection issue
* Revisions from reviews
Backed out a revision to Calcite tests that didn't work out as planned
* Build fix
* Fixed spelling errors
* Fixed failed test
Prepare now enforces security; before it did not.
* Rebase and fix IntelliJ inspections issue
* Clean up exception handling
* Fix handling of JDBC auth errors
* Build fix
* More tweaks to security messages
This is used to control access to the EXTERN function, which allows
reading external data in SQL. The EXTERN function is not usable in
production as of today, but it is used by the task-based SQL engine
contemplated in #12262.
Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog.
As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities.
This PR is purely a refactoring move: no functionality changed. There is no difference to user functionality or external APIs. Functionality changes will come later as we add the catalog itself.
DruidSchema
In the present code, DruidSchema does three tasks:
Holds the segment metadata cache
Interfaces with an external schema manager
Acts as a schema to Calcite
This PR splits those responsibilities.
DruidSchema holds the Calcite schema for the druid namespace, combining information fro the segment metadata cache, from the external schema manager and (later) from the catalog.
SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema.
DruidTable
The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports, and uses if-statements to handle behavior that differs between types. Yet, any given DruidTable will handle only one such table type. To more clearly model the actual table types, we split DruidTable into several classes:
DruidTable becomes an abstract base class to hold Druid-specific methods.
DatasourceTable represents a datasource.
ExternalTable represents an external table, such as from EXTERN or (later) from the catalog.
InlineTable represents the internal case in which we attach data directly to a table.
LookupTable represents Druid’s lookup table mechanism.
The new subclasses are more focused: they can be selective about the data they hold and the various predicates since they represent just one table type. This will be important as the catalog information will differ depending on table type and the new structure makes adding that logic cleaner.
DatasourceMetadata
Previously, the DruidSchema segment cache would work with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To fix this, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry to hold only the “physical” segment metadata information: it is up to the DruidTable to combine this with the catalog information in a later PR.
More Efficient Table Resolution
Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables and Calcite does the rest. This means that, to resolve any single datasource, say, foo, we need to cache segment metadata, external schema information, and catalog information for all tables. Just so Calcite can do a map lookup.
There is nothing special about AbstractSchema. We can handle table lookups ourselves. The new AbstractTableSchema does this. In fact, all the rest of Calcite wants is to resolve individual tables by name, and, for commands we don’t use, to provide a list of table names.
DruidSchema now extends AbstractTableSchema. SegmentMetadataCache resolves individual tables (and provides table names.)
DruidSchemaManager
DruidSchemaManager provides a way to specify table schemas externally. In this sense, it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it implements provide a map of tables. This PR provides new optional methods for the table lookup and table names operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead. Druid code no longer calls the original getTables() method.
The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, we should return a Map, not a ConcurrentMap.
* Adjust "in" filter null behavior to match "selector".
Now, both of them match numeric nulls if constructed with a "null" value.
This is consistent as far as native execution goes, but doesn't match
the behavior of SQL = and IN. So, to address that, this patch also
updates the docs to clarify that the native filters do match nulls.
This patch also updates the SQL docs to describe how Boolean logic is
handled in addition to how NULL values are handled.
Fixes#12856.
* Fix test.
* Refactor Guice initialization
Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder
* Revisions from review comments
* Build fixes
* Revisions from review comments
add NumericRangeIndex interface and BoundFilter support
changes:
* NumericRangeIndex interface, like LexicographicalRangeIndex but for numbers
* BoundFilter now uses NumericRangeIndex if comparator is numeric and there is no extractionFn
* NestedFieldLiteralColumnIndexSupplier.java now supports supplying NumericRangeIndex for single typed numeric nested literal columns
* better faster stronger and (ever so slightly) more understandable
* more tests, fix bug
* fix style
* Druid planner now makes only one pass through Calcite planner
Resolves the issue that required two parse/plan cycles: one
for validate, another for plan. Creates a clone of the Calcite
planner and validator to resolve the conflict that prevented
the merger.
* Fixes for the Avatica JDBC driver
Correctly implement regular and prepared statements
Correctly implement result sets
Fix race condition with contexts
Clarify when parameters are used
Prepare for single-pass through the planner
* Addressed review comments
* Addressed review comment
Some queries like `REPLACE INTO ... SELECT TIME_PARSE("__time") AS __time FROM ...`
fail at the Calcite layer because any column with name `__time` is considered to be of
type `SqlTypeName.TIMESTAMP`.
Changes:
- Modify `RowSignatures.toRelDataType()` so that the type of `__time` column
is determined by the RowSignature's type.
* Automatic sizing for GroupBy dictionary sizes.
Merging and selector dictionary sizes currently both default to 100MB.
This is not optimal, because it can lead to OOM on small servers and
insufficient resource utilization on larger servers. It also invites
end users to try to tune it when queries run out of dictionary space,
which can make things worse if the end user sets it to too high.
So, this patch:
- Adds automatic tuning for selector and merge dictionaries. Selectors
use up to 15% of the heap and merge buffers use up to 30% of the heap
(aggregate across all queries).
- Updates out-of-memory error messages to emphasize enabling disk
spilling vs. increasing memory parameters. With the memory parameters
automatically sized, it is more likely that an end user will get
benefit from enabling disk spilling.
- Removes the query context parameters that allow lowering of configured
dictionary sizes. These complicate the calculation, and I don't see a
reasonable use case for them.
* Adjust tests.
* Review adjustments.
* Additional comment.
* Remove unused import.
* Preserve column order in DruidSchema, SegmentMetadataQuery.
Instead of putting columns in alphabetical order. This is helpful
because it makes query order better match ingestion order. It also
allows tools, like the reindexing flow in the web console, to more
easily do follow-on ingestions using a column order that matches the
pre-existing column order.
We prefer the order from the latest segments. The logic takes all
columns from the latest segments in the order they appear, then adds
on columns from older segments after those.
* Additional test adjustments.
* Adjust imports.
* Frame format for data transfer and short-term storage.
As we move towards query execution plans that involve more transfer
of data between servers, it's important to have a data format that
provides for doing this more efficiently than the options available to
us today.
This patch adds:
- Columnar frames, which support fast querying.
- Row-based frames, which support fast sorting via memory comparison
and fast whole-row copies via memory copying.
- Frame files, a container format that can be stored on disk or
transferred between servers.
The idea is we should use row-based frames when data is expected to
be sorted, and columnar frames when data is expected to be queried.
The code in this patch is not used in production yet. Therefore, the
patch involves minimal changes outside of the org.apache.druid.frame
package. The main ones are adjustments to SqlBenchmark to add benchmarks
for queries on frames, and the addition of a "forEach" method to Sequence.
* Fixes based on tests, static analysis.
* Additional fixes.
* Skip DS mapping tests on JDK 14+
* Better JDK checking in tests.
* Fix imports.
* Additional comment.
* Adjustments from code review.
* Update test case.
* Add EIGHT_HOUR into possible list of Granularities.
* Add the missing definition.
* fix test.
* Fix another test.
* Stylecheck finally passed.
Co-authored-by: Didip Kerabat <didip@apple.com>
This commit contains the cleanup needed for the new integration test framework.
Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
SQL expressions such as those containing `MV_FILTER_ONLY` and `MV_FILTER_NONE`
are planned as specialized virtual columns instead of the default `expression`-type virtual columns.
This commit adds a new context parameter to force the `expression`-type virtual columns.
Changes
- Add query context param `forceExpressionVirtualColumns`
- Use context param to determine if specialized virtual columns should be used or not
- Moved some tests into `CalciteExplainQueryTest`
* Add TIME_IN_INTERVAL SQL operator.
The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.
* SqlParserPos cannot be null here.
* Remove unused import.
* Doc updates.
* Add words to dictionary.
True, false, and null have different meanings: true/false mean "legacy"
and "not legacy"; null means use the default set by ScanQueryConfig.
So, we need to respect this in the JsonIgnore setup.
* Remove null and empty fields from native queries
* Test fixes
* Attempted IT fix.
* Revisions from review comments
* Build fixes resulting from changes suggested by reviews
* IT fix for changed segment size
Fixes an issue where sql query request logs do not include the default query context
values set via `druid.query.default.context.xyz` runtime properties.
# Change summary
* Inject `DefaultQueryConfig` into `SqlLifecycleFactory`
* Add params from `DefaultQueryConfig` to the query context in `SqlLifecycle`
# Description
- This change does not affect query execution. This is because the
`DefaultQueryConfig` was already being used in `QueryLifecycle`,
which is initialized when the SQL is translated to a native query.
- This also handles any potential use case where a context parameter should be
handled at the SQL stage itself.
RowBasedColumnSelectorFactory inherited strange behavior from
Rows.objectToStrings for nulls that appear in lists: instead of being
left as a null, it is replaced with the string "null". Some callers may
need compatibility with this strange behavior, but it should be opt-in.
Query-time call sites are changed to opt-out of this behavior, since it
is not consistent with query-time expectations. The IncrementalIndex
ingestion-time call site retains the old behavior, as this is traditionally
when Rows.objectToStrings would be used.
Description
Fixes a bug when running q's like
SELECT cntarray,
Count(*)
FROM (SELECT dim1,
dim2,
Array_agg(cnt) AS cntarray
FROM (SELECT dim1,
dim2,
dim3,
Count(*) AS cnt
FROM foo
GROUP BY 1,
2,
3)
GROUP BY 1,
2)
GROUP BY 1
This generates an error:
org.apache.druid.java.util.common.ISE: Unable to convert type [Ljava.lang.Object; to org.apache.druid.segment.data.ComparableList
at org.apache.druid.segment.DimensionHandlerUtils.convertToList(DimensionHandlerUtils.java:405) ~[druid-xx]
Because it's an array of numbers it looks like it does the convertToList call, which looks like:
@Nullable
public static ComparableList convertToList(Object obj)
{
if (obj == null) {
return null;
}
if (obj instanceof List) {
return new ComparableList((List) obj);
}
if (obj instanceof ComparableList) {
return (ComparableList) obj;
}
throw new ISE("Unable to convert type %s to %s", obj.getClass().getName(), ComparableList.class.getName());
}
I.e. it doesn't know about arrays. Added the array handling as part of this PR.
In the case that the clustered by is before the partitioned by for an sql query, the error message is a bit confusing.
insert into foo select * from bar clustered by dim1 partitioned by all
Error: SQL parse failed
Encountered "PARTITIONED" at line 1, column 88.
Was expecting one of: <EOF> "," ... "ASC" ... "DESC" ... "NULLS" ... "." ... "NOT" ... "IN" ... "<" ... "<=" ... ">" ... ">=" ... "=" ... "<>" ... "!=" ... "BETWEEN" ... "LIKE" ... "SIMILAR" ... "+" ... "-" ... "*" ... "/" ... "%" ... "||" ... "AND" ... "OR" ... "IS" ... "MEMBER" ... "SUBMULTISET" ... "CONTAINS" ... "OVERLAPS" ... "EQUALS" ... "PRECEDES" ... "SUCCEEDS" ... "IMMEDIATELY" ... "MULTISET" ... "[" ... "FORMAT" ... "(" ... Less...
org.apache.calcite.sql.parser.SqlParseException
This is a bit confusing and adding a check could be added to throw a more user friendly message stating that the order should be reversed.
Add error message for incorrectly ordered clause in sql.
* Direct UTF-8 access for "in" filters.
Directly related:
1) InDimFilter: Store stored Strings (in ValuesSet) plus sorted UTF-8
ByteBuffers (in valuesUtf8). Use valuesUtf8 whenever possible. If
necessary, the input set is copied into a ValuesSet. Much logic is
simplified, because we always know what type the values set will be.
I think that there won't even be an efficiency loss in most cases.
InDimFilter is most frequently created by deserialization, and this
patch updates the JsonCreator constructor to deserialize
directly into a ValuesSet.
2) Add Utf8ValueSetIndex, which InDimFilter uses to avoid UTF-8 decodes
during index lookups.
3) Add unsigned comparator to ByteBufferUtils and use it in
GenericIndexed.BYTE_BUFFER_STRATEGY. This is important because UTF-8
bytes can be compared as bytes if, and only if, the comparison
is unsigned.
4) Add specialization to GenericIndexed.singleThreaded().indexOf that
avoids needless ByteBuffer allocations.
5) Clarify that objects returned by ColumnIndexSupplier.as are not
thread-safe. DictionaryEncodedStringIndexSupplier now calls
singleThreaded() on all relevant GenericIndexed objects, saving
a ByteBuffer allocation per access.
Also:
1) Fix performance regression in LikeFilter: since #12315, it applied
the suffix matcher to all values in range even for type MATCH_ALL.
2) Add ObjectStrategy.canCompare() method. This fixes LikeFilterBenchmark,
which was broken due to calls to strategy.compare in
GenericIndexed.fromIterable.
* Add like-filter implementation tests.
* Add in-filter implementation tests.
* Add tests, fix issues.
* Fix style.
* Adjustments from review.
* SQL: Add is_active to sys.segments, update examples and docs.
is_active is short for:
(is_published = 1 AND is_overshadowed = 0) OR is_realtime = 1
It's important because this represents "all the segments that should
be queryable, whether or not they actually are right now". Most of the
time, this is the set of segments that people will want to look at.
The web console already adds this filter to a lot of its queries,
proving its usefulness.
This patch also reworks the caveat at the bottom of the sys.segments
section, so its information is mixed into the description of each result
field. This should make it more likely for people to see the information.
* Wording updates.
* Adjustments for spellcheck.
* Adjust IT.
- Add user friendly error messages for missing or incorrect OVERWRITE clause for REPLACE SQL query
- Move validation of missing OVERWRITE clause at code level instead of parser for custom error message
Relevant Issue: #11929
- Add custom replace statement to Druid SQL parser.
- Edit DruidPlanner to convert relevant fields to Query Context.
- Refactor common code with INSERT statements to reuse them for REPLACE where possible.
Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filter consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use indexing in the current set filters that Druid provides.
* Add feature flag for sql planning of TimeBoundary queries
* fixup! Add feature flag for sql planning of TimeBoundary queries
* Add documentation for enableTimeBoundaryPlanning
* fixup! Add documentation for enableTimeBoundaryPlanning
* Vectorized version of string last aggregator
* Updating string last and adding testcases
* Updating code and adding testcases for serializable pairs
* Addressing review comments
* Reduce allocations due to Jackson serialization.
This patch attacks two sources of allocations during Jackson
serialization:
1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
DefaultSerializerProvider instance for each call. It has lots of
fields and creates pressure on the garbage collector. So, this patch
adds helper functions in JacksonUtils that enable reuse of
SerializerProvider objects and updates various call sites to make
use of this.
2) GroupByQueryToolChest copies the ObjectMapper for every query to
install a special module that supports backwards compatibility with
map-based rows. This isn't needed if resultAsArray is set and
all servers are running Druid 0.16.0 or later. This release was a
while ago. So, this patch disables backwards compatibility by default,
which eliminates the need to copy the heavyweight ObjectMapper. The
patch also introduces a configuration option that allows admins to
explicitly enable backwards compatibility.
* Add test.
* Update additional call sites and add to forbidden APIs.
* SQL: Create millisecond precision timestamp literals.
Fixes a bug where implicit casts of strings to timestamps would use seconds
precision rather than milliseconds. The new test case
testCountStarWithBetweenTimeFilterUsingMillisecondsInStringLiterals
exercises this.
* Update sql/src/main/java/org/apache/druid/sql/calcite/planner/Calcites.java
Co-authored-by: Frank Chen <frankchen@apache.org>
* Correct precision handling.
- Set default precision to 3 (millis) for things involving timestamps.
- Respect precision specified in types when available.
* Silence, checkstyle.
Co-authored-by: Frank Chen <frankchen@apache.org>
Unnamed columns in the select part of insert SQL statements currently create a table with the column name such as "EXPR$3". This PR adds a check for this.
* Vectorizing Latest aggregator Part 1
* Updating benchmark tests
* Changing appropriate logic for vectors for null handling
* Introducing an abstract class and moving the commonalities there
* Adding vectorization for StringLast aggregator (initial version)
* Updated bufferized version of numeric aggregators
* Adding some javadocs
* Making sure this PR vectorizes numeric latest agg only
* Adding another benchmarking test
* Fixing intellij inspections
* Adding tests for double
* Adding test cases for long and float
* Updating testcases
* Checkstyle oops..
* One tiny change in test case
* Fixing spotbug and rhs not being used
* Support array based results in timeBoundary query
* Fix bug with query interval in timeBoundary
* Convert min(__time) and max(__time) SQL queries to timeBoundary
* Add tests for timeBoundary backed SQL queries
* Fix query plans for existing tests
* fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary
* fixup! Add tests for timeBoundary backed SQL queries
* fixup! Fix bug with query interval in timeBoundary
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.
Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, any context params are allowed to users. This can cause
1) a bad UX if the context param is not matured yet or
2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows.
This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.
{
"resourceAction" : {
"resource" : {
"name" : "maxSubqueryRows",
"type" : "QUERY_CONTEXT"
},
"action" : "WRITE"
},
"resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.
When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,
HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.
The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
For a query like
INSERT INTO tablename SELECT channel, added as count FROM wikipedia the error message is Encountered "as count". However, for the insert statement
INSERT INTO t SELECT channel, added as count FROM wikipedia PARTITIONED BY ALL
returns INSERT statements must specify PARTITIONED BY clause explictly (incorrectly). This PR corrects this.
Add EOF to end of Druid SQL Insert statements
Rename SQL Insert statements in the parser to reflect the behaviour change
Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.
* Fix error message for groupByEnableMultiValueUnnesting.
It referred to the incorrect context parameter.
Also, create a dedicated exception class, to allow easier detection of this
specific error.
* Fix other test.
* More better error messages.
* Test getDimensionName method.
* upgrade Airline to Airline 2
https://github.com/airlift/airline is no longer maintained, updating to
https://github.com/rvesse/airline (Airline 2) to use an actively
maintained version, while minimizing breaking changes.
Note, this is a backwards incompatible change, and extensions relying on
the CliCommandCreator extension point will also need to be updated.
* fix dependency checks where jakarta.inject is now resolved first instead
of javax.inject, due to Airline 2 using jakarta
As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key.
Added a config groupByEnableMultiValueUnnesting which can be set in the query context.
The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior.
If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.
* Moving in filter check to broker
* Adding more unit tests, making error message meaningful
* Spelling and doc changes
* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100
* Removing upper limit of 100, updated docs
* Making documentation more meaningful
* Moving check outside to PlannerConfig, updating test cases and adding back max limit
* Updated with some additional code comments
* Missed removing one line during the checkin
* Addressing doc changes and one forbidden API correction
* Final doc change
* Adding a speling exception, correcting a testcase
* Reading entire filter tree to address combinations of ANDs and ORs
* Specifying in docs that, this case works only for ORs
* Revert "Reading entire filter tree to address combinations of ANDs and ORs"
This reverts commit 81ca8f8496.
* Covering a class cast exception and updating docs
* Counting changed
Co-authored-by: Jihoon Son <jihoonson@apache.org>
#12163 makes PARTITIONED BY a required clause in INSERT queries. While this is required, if a user accidentally omits the clause, it emits a JavaCC/Calcite error, since it's syntactically incorrect. The error message is cryptic. Since it's a custom clause, this PR aims to make the clause optional on the syntactic side, but move the validation to DruidSqlInsert where we can surface a friendlier error.
* rework sql planner expression and virtual column handling
* simplify a bit
* add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests
* spotbugs
* fix bugs with multi-value string array expression handling
* javadocs and adjust test
* better
* fix tests
* array_concat_agg and array_agg support for array inputs
changes:
* added array_concat_agg to aggregate arrays into a single array
* added array_agg support for array inputs to make nested array
* added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating
* fix test
* tie capabilities type to legacy mode flag about coercing arrays to strings
* oops
* better javadoc
* changes:
* remove SystemSchema duplicate ServerInventoryView in broker
* suppress duplicate segment added/removed warnings in HttpServerInventoryView when doing a full sync
* fixes