druid/integration-tests/docker/environment-configs/common

84 lines
3.9 KiB
Plaintext
Raw Normal View History

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
LANG=C.UTF-8
LANGUAGE=C.UTF-8
LC_ALL=C.UTF-8
# JAVA OPTS
Fixes and tests related to the Indexer process. (#10631) * Fixes and tests related to the Indexer process. Three bugs fixed: 1) Indexers would not announce themselves as segment servers if they did not have storage locations defined. This used to work, but was broken in #9971. Fixed this by adding an "isSegmentServer" method to ServerType and updating SegmentLoadDropHandler to always announce if this method returns true. 2) Certain batch task types were written in a way that assumed "isReady" would be called before "run", which is not guaranteed. In particular, they relied on it in order to initialize "taskLockHelper". Fixed this by updating AbstractBatchIndexTask to ensure "isReady" is called before "run" for these tasks. 3) UnifiedIndexerAppenderatorsManager did not properly handle complex datasources. Introduced DataSourceAnalysis in order to fix this. Test changes: 1) Add a new "docker-compose.cli-indexer.yml" config that spins up an Indexer instead of a MiddleManager. 2) Introduce a "USE_INDEXER" environment variable that determines if docker-compose will start up an Indexer or a MiddleManager. 3) Duplicate all the jdk8 tests and run them in both MiddleManager and Indexer mode. 4) Various adjustments to encourage fail-fast errors in the Docker build scripts. 5) Various adjustments to speed up integration tests and reduce memory usage. 6) Add another Mac-specific approach to determining a machine's own IP. This was useful on my development machine. 7) Update segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused). Javadoc updates: 1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX that make it clear when taskLockHelper will be initialized as a side effect. (Related to the second bug above.) 2) Task: Clarified that "isReady" is not guaranteed to be called before "run". It was already implied, but now it's explicit. 3) ZkCoordinator: Clarified deprecation message. 4) DataSegmentServerAnnouncer: Clarified deprecation message. * Fix stop_cluster script. * Fix sanity check in script. * Fix hashbang lines. * Test and doc adjustments. * Additional tests, and adjustments for tests. * Split ITs back out. * Revert change to druid_coordinator_period_indexingPeriod. * Set Indexer capacity to match MM. * Bump up Historical memory. * Bump down coordinator, overlord memory. * Bump up Broker memory.
2020-12-08 19:02:26 -05:00
COMMON_DRUID_JAVA_OPTS=-Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dlog4j.configurationFile=/shared/docker/lib/log4j2.xml -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
DRUID_DEP_BIN_DIR=/shared/docker/bin
DRUID_DEP_LIB_DIR=/shared/hadoop_xml:/shared/docker/lib/*:/usr/local/druid/lib/mysql-connector-java.jar
# Druid configs
# If you are making a change in load list below, make the necessary changes in github actions too
druid_extensions_loadList=["mysql-metadata-storage","druid-basic-security","simple-client-sslcontext","druid-testing-tools","druid-lookups-cached-global","druid-histogram","druid-datasketches","druid-parquet-extensions","druid-avro-extensions","druid-protobuf-extensions","druid-orc-extensions","druid-kafka-indexing-service","druid-s3-extensions"]
druid_startup_logging_logProperties=true
druid_extensions_directory=/shared/docker/extensions
druid_auth_authenticator_basic_authorizerName=basic
druid_auth_authenticator_basic_initialAdminPassword=priest
druid_auth_authenticator_basic_initialInternalClientPassword=warlock
druid_auth_authenticator_basic_type=basic
druid_auth_authenticatorChain=["basic"]
druid_auth_authorizer_basic_type=basic
druid_auth_authorizers=["basic"]
Add support for authorizing query context params (#12396) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
2022-04-21 04:51:16 -04:00
druid_auth_authorizeQueryContextParams=true
druid_client_https_certAlias=druid
druid_client_https_keyManagerPassword=druid123
druid_client_https_keyStorePassword=druid123
druid_client_https_keyStorePath=/tls/server.p12
druid_client_https_protocol=TLSv1.2
druid_client_https_trustStoreAlgorithm=PKIX
druid_client_https_trustStorePassword=druid123
druid_client_https_trustStorePath=/tls/truststore.jks
druid_enableTlsPort=true
druid_escalator_authorizerName=basic
druid_escalator_internalClientPassword=warlock
druid_escalator_internalClientUsername=druid_system
druid_escalator_type=basic
druid_lookup_numLookupLoadingThreads=1
druid_server_http_numThreads=20
# Allow OPTIONS method for ITBasicAuthConfigurationTest.testSystemSchemaAccess
druid_server_http_allowedHttpMethods=["OPTIONS"]
druid_server_https_certAlias=druid
druid_server_https_keyManagerPassword=druid123
druid_server_https_keyStorePassword=druid123
druid_server_https_keyStorePath=/tls/server.p12
druid_server_https_keyStoreType=PKCS12
druid_server_https_requireClientCertificate=true
druid_server_https_trustStoreAlgorithm=PKIX
druid_server_https_trustStorePassword=druid123
druid_server_https_trustStorePath=/tls/truststore.jks
druid_server_https_validateHostnames=true
druid_zk_service_host=druid-zookeeper-kafka
druid_auth_basic_common_maxSyncRetries=20
druid_indexer_logs_directory=/shared/tasklogs
druid_sql_enable=true
druid_extensions_hadoopDependenciesDir=/shared/hadoop-dependencies
druid_request_logging_type=slf4j
druid_coordinator_kill_supervisor_on=true
druid_coordinator_kill_supervisor_period=PT10S
druid_coordinator_kill_supervisor_durationToRetain=PT0M
druid_coordinator_period_metadataStoreManagementPeriod=PT10S
druid_sql_planner_authorizeSystemTablesDirectly=true
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.
2021-03-25 13:32:21 -04:00
# Testing the legacy config from https://github.com/apache/druid/pull/10267
# Can remove this when the flag is no longer needed
druid_indexer_task_ignoreTimestampSpecForDruidInputSource=true