Vishesh Garg 197c54f673
Auto-Compaction using Multi-Stage Query Engine (#16291)
Description:
Compaction operations issued by the Coordinator currently run using the native query engine.
As majority of the advancements that we are making in batch ingestion are in MSQ, it is imperative
that we support compaction on MSQ to make Compaction more robust and possibly faster. 
For instance, we have seen OOM errors in native compaction that MSQ could have handled by its
auto-calculation of tuning parameters. 

This commit enables compaction on MSQ to remove the dependency on native engine. 

Main changes:
* `DataSourceCompactionConfig` now has an additional field `engine` that can be one of 
`[native, msq]` with `native` being the default.
*  if engine is MSQ, `CompactSegments` duty assigns all available compaction task slots to the
launched `CompactionTask` to ensure full capacity is available to MSQ. This is to avoid stalling which
could happen in case a fraction of the tasks were allotted and they eventually fell short of the number
of tasks required by the MSQ engine to run the compaction.
* `ClientCompactionTaskQuery` has a new field `compactionRunner` with just one `engine` field.
* `CompactionTask` now has `CompactionRunner` interface instance with its implementations
`NativeCompactinRunner` and `MSQCompactionRunner` in the `druid-multi-stage-query` extension.
The objectmapper deserializes `ClientCompactionRunnerInfo` in `ClientCompactionTaskQuery` to the
`CompactionRunner` instance that is mapped to the specified type [`native`, `msq`]. 
* `CompactTask` uses the `CompactionRunner` instance it receives to create the indexing tasks.
* `CompactionTask` to `MSQControllerTask` conversion logic checks whether metrics are present in 
the segment schema. If present, the task is created with a native group-by query; if not, the task is
issued with a scan query. The `storeCompactionState` flag is set in the context.
* Each created `MSQControllerTask` is launched in-place and its `TaskStatus` tracked to determine the
final status of the `CompactionTask`. The id of each of these tasks is the same as that of `CompactionTask`
since otherwise, the workers will be unable to determine the controller task's location for communication
(as they haven't been launched via the overlord).
2024-07-12 16:40:20 +05:30

85 lines
4.0 KiB
Plaintext

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C.UTF-8
# JAVA OPTS
COMMON_DRUID_JAVA_OPTS=-Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dlog4j.configurationFile=/shared/docker/lib/log4j2.xml -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='chmod 644 /shared/logs/*.hprof' -XX:HeapDumpPath=/shared/logs
DRUID_DEP_BIN_DIR=/shared/docker/bin
DRUID_DEP_LIB_DIR=/shared/hadoop_xml:/shared/docker/lib/*:/usr/local/druid/lib/mysql-connector-j.jar
# Druid configs
# If you are making a change in load list below, make the necessary changes in github actions too
druid_extensions_loadList=["mysql-metadata-storage","druid-basic-security","simple-client-sslcontext","druid-testing-tools","druid-lookups-cached-global","druid-histogram","druid-datasketches","druid-parquet-extensions","druid-avro-extensions","druid-protobuf-extensions","druid-orc-extensions","druid-kafka-indexing-service","druid-s3-extensions","druid-multi-stage-query"]
druid_startup_logging_logProperties=true
druid_extensions_directory=/shared/docker/extensions
druid_auth_authenticator_basic_authorizerName=basic
druid_auth_authenticator_basic_initialAdminPassword=priest
druid_auth_authenticator_basic_initialInternalClientPassword=warlock
druid_auth_authenticator_basic_type=basic
druid_auth_authenticatorChain=["basic"]
druid_auth_authorizer_basic_type=basic
druid_auth_authorizers=["basic"]
druid_auth_authorizeQueryContextParams=true
druid_client_https_certAlias=druid
druid_client_https_keyManagerPassword=druid123
druid_client_https_keyStorePassword=druid123
druid_client_https_keyStorePath=/tls/server.p12
druid_client_https_protocol=TLSv1.2
druid_client_https_trustStoreAlgorithm=PKIX
druid_client_https_trustStorePassword=druid123
druid_client_https_trustStorePath=/tls/truststore.jks
druid_enableTlsPort=true
druid_escalator_authorizerName=basic
druid_escalator_internalClientPassword=warlock
druid_escalator_internalClientUsername=druid_system
druid_escalator_type=basic
druid_lookup_numLookupLoadingThreads=1
druid_server_http_numThreads=20
# Allow OPTIONS method for ITBasicAuthConfigurationTest.testSystemSchemaAccess
druid_server_http_allowedHttpMethods=["OPTIONS"]
druid_server_https_certAlias=druid
druid_server_https_keyManagerPassword=druid123
druid_server_https_keyStorePassword=druid123
druid_server_https_keyStorePath=/tls/server.p12
druid_server_https_keyStoreType=PKCS12
druid_server_https_requireClientCertificate=true
druid_server_https_trustStoreAlgorithm=PKIX
druid_server_https_trustStorePassword=druid123
druid_server_https_trustStorePath=/tls/truststore.jks
druid_server_https_validateHostnames=true
druid_zk_service_host=druid-zookeeper-kafka
druid_auth_basic_common_maxSyncRetries=20
druid_indexer_logs_directory=/shared/tasklogs
druid_sql_enable=true
druid_extensions_hadoopDependenciesDir=/shared/hadoop-dependencies
druid_request_logging_type=slf4j
druid_coordinator_kill_supervisor_on=true
druid_coordinator_kill_supervisor_period=PT10S
druid_coordinator_kill_supervisor_durationToRetain=PT0M
druid_coordinator_period_metadataStoreManagementPeriod=PT10S
druid_sql_planner_authorizeSystemTablesDirectly=true
druid_audit_manager_type=log
# Testing the legacy config from https://github.com/apache/druid/pull/10267
# Can remove this when the flag is no longer needed
druid_indexer_task_ignoreTimestampSpecForDruidInputSource=true