Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00

75 lines
3.1 KiB
Plaintext

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C.UTF-8
# JAVA OPTS
COMMON_DRUID_JAVA_OPTS=-Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dlog4j.configurationFile=/shared/docker/lib/log4j2.xml -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
DRUID_DEP_LIB_DIR=/shared/hadoop_xml:/shared/docker/lib/*:/usr/local/druid/lib/mysql-connector-java.jar
# Druid configs
druid_extensions_loadList=[]
druid_extensions_directory=/shared/docker/extensions
druid_auth_authenticator_basic_authorizerName=basic
druid_auth_authenticator_basic_initialAdminPassword=priest
druid_auth_authenticator_basic_initialInternalClientPassword=warlock
druid_auth_authenticator_basic_type=basic
druid_auth_authenticatorChain=["basic"]
druid_auth_authorizer_basic_type=basic
druid_auth_authorizers=["basic"]
druid_client_https_certAlias=druid
druid_client_https_keyManagerPassword=druid123
druid_client_https_keyStorePassword=druid123
druid_client_https_keyStorePath=/tls/server.jks
druid_client_https_protocol=TLSv1.2
druid_client_https_trustStoreAlgorithm=PKIX
druid_client_https_trustStorePassword=druid123
druid_client_https_trustStorePath=/tls/truststore.jks
druid_enableTlsPort=true
druid_escalator_authorizerName=basic
druid_escalator_internalClientPassword=warlock
druid_escalator_internalClientUsername=druid_system
druid_escalator_type=basic
druid_lookup_numLookupLoadingThreads=1
druid_server_http_numThreads=20
# Allow OPTIONS method for ITBasicAuthConfigurationTest.testSystemSchemaAccess
druid_server_http_allowedHttpMethods=["OPTIONS"]
druid_server_https_certAlias=druid
druid_server_https_keyManagerPassword=druid123
druid_server_https_keyStorePassword=druid123
druid_server_https_keyStorePath=/tls/server.jks
druid_server_https_keyStoreType=jks
druid_server_https_requireClientCertificate=true
druid_server_https_trustStoreAlgorithm=PKIX
druid_server_https_trustStorePassword=druid123
druid_server_https_trustStorePath=/tls/truststore.jks
druid_server_https_validateHostnames=true
druid_zk_service_host=druid-zookeeper-kafka
druid_auth_basic_common_maxSyncRetries=20
druid_indexer_logs_directory=/shared/tasklogs
druid_sql_enable=true
druid_extensions_hadoopDependenciesDir=/shared/hadoop-dependencies
druid_request_logging_type=slf4j
# Testing the legacy config from https://github.com/apache/druid/pull/10267
# Can remove this when the flag is no longer needed
druid_indexer_task_ignoreTimestampSpecForDruidInputSource=true