druid/test-config.md at 51b2f6cb4524a79fd9ea4abbe4e35416f30b1cdb

20 KiB

Raw Blame History

Test Configuration

Tests typically need to understand how the cluster is structured. To create a test, you must supply at least three key components:

A cluster/<category>/docker-compose.yaml file that launches the desired cluster. (The folder name <category> becomes the application name in Docker.)
A src/test/resources/cluster/<category>/docker.yaml file that describes the cluster for tests. This file can also include Metastore SQL statements needed to populate the metastore.
The test itself, as a JUnit test that uses the Initializer class to configure the tests to match the cluster.

This section explains the test configuration file which defines the test cluster.

Note that you can create multiple versions of the docker.yaml file. For example, you might want to create one that lists hosts and credentials unique to your debugging environment. You then use your custom version in place of the standard one.

Cluster Types

The integration tests can run in a variety of cluster types, depending on the details of the test:

Docker Compose: the normal configuration that all tests support.
Micro Quickstart: allows for a manual cluster setup, if, say, you want to run services in your IDE. Supported by a subset of tests.
Kubernetes: (Details needed.)

Each cluster type has its own quirks. The job of the tests's cluster configuration file is to communicate those quirks to the test.

Docker and Kubernetes use proxies to communicate to the cluster. Thus, the host known to the tests is different than the hosts known within the cluster. Ports may also are mapped differently "outside" than "inside."

Clusters outside of Docker don't provide a good way to start and stop services, so tests that want to do that (to, say, test high availability) can't run except in a Docker cluster.

Specify the Cluster Type

To reflect this, tests provide named configuration files. The configuration itself is passed in via the environment:

export TEST_CONFIG=quickstart

java ... -DtestConfig=quickstart

The system property taskes precedence over the environment variable. If neither are set, docker is the default. The configuration file itself is assumed to be a resource named /yaml/<config>.yaml.

As a debug aide, a test can specify and ad-hoc file in the file system to load for one-off special cases. See Initialization.Builder for details.

Cluster Configuration Files

Cluster configuration is specified in a file for ease of debugging. Since configuration is in a file (resource), and not in environment variables or system properties, you should need no special launch setup in your IDE to run a test that uses the standard Docker Compose cluster for that test.

The configuration file has the same name as the cluster type and resides on the class path at /yaml/<type>.yaml and in the source tree at <test module>/src/test/resources/yaml/<type>.yaml. The standard names are:

docker.yaml: the default and required for all tests. Describes a Docker Compose based test.
k8s.yaml: a test cluster running in Kubernetes. (Details needed.)
local.yaml: a local cluser such as Micro Quickstart cluster. (Details needed.)
<other>.yaml: custom cluster configuration.

Configuration files support include files. Most of the boiler-plate configuration should appear in commmon files. As a result, you should only need to specify test-specific differences in your docker.yaml file, with all else obtained from the included files.

Configuration File Syntax

The configuration is a YAML file that has a few top-level properties and an entry for each service in your cluster.

`type`

type: docker|k8s|local|disabled

The type explains the infrastructure that runs the cluster:

docker: a cluster launched in Docker, typically via Docker Compose. A proxy host is needed. (See below.)
k8s: a cluster run in Kubernets. (Details needed). A proxy host is needed.
local: a cluster running as processes on a network directly reachable by the tests. Example: a micro-quickstart cluster running locally.
disabled: the configuration is not supported by the test.

The disabled type is handy for tests that require Docker: you can say that the test is not available when the cluster is local.

If the test tries to load a cluster name that does not exist, a "dummy" configuration is loaded instead with the type set to disabled.

The type is separate from the cluster name (as explained earlier): there may be multiple names for the same type. For example, you might have two or three local cluster setups you wish to test.

`include`

include:
  - <file>

Allows including any number of other files. Similar to inheritance for Docker Compose. The inheritance rules are:

Properties set later in the list take precedence over properties set in files earlier in the list.
Properties set in the file take precedence over properties set in included files.
Includes can nest to any level.

Merging occurs as follows:

Top level scalars: newer values replace older values.
Services: newer values replace all older settings for that service.
Metastore init: newer values add more queries to any list defined by an earlier file.
Properties: newer values replace values defined by earlier files.

The files are assumed to be resources (on the class path) and require the full path name. Example: /cluster/Commmon/base.yaml

`proxyHost`

proxyHost: <host name>

When tests run in either Docker or Kubernetes, the test communicate with a proxy, which forwards requests to the cluster hosts and ports. In Docker, the proxy host is the machine that runs Docker. In Kubernetes, the proxy host is the host running the Kubernetes proxy service.

There is no proxy host for clusters running directly on a machine.

If the proxy host is omitted for Docker, localhost is assumed.

`datasourceSuffix`

datasourceSuffix: <suffix>

Suffix to append to data source names in indexer tests. The default is the empty string.

`zk`

zk:
  <service object>

Specifies the ZooKeeper instances.

`startTimeoutSecs`

startTimeoutSecs: <secs>

Specifies the amount of time to wait for ZK to become available when using the test client. Optional.

`metastore`

metastore:
  <service object>

Describes the Druid "metadata storage" (metastore) typically hosted in the offical MySql container. See MetastoreConfig for configuration options.

`driver`

driver: <full class name>

The Driver to use to work with the metastore. The driver must be available on the tests's class path.

`connectURI`

connectURI: <url>

The JDBC connetion URL. Example:

jdbc:mysql://<host>:<port>/druid

The config system supports two special fields: <host> and <port>. A string of form <host> will be replaced by the resolved host name (proxy host for Docker) and <port> with the resolved port number.

`user`

user: <user name>

The MySQL user name.

`password`

user: <password>

The MySQL password.

`properties`

properties:
  <key>: <value>

Optional map of additional key/value pairs to pass to the JDBC driver.

`kafka`

zk:
  <service object>

Describes the optional Kafka service.

`druid`

druid:
  <service>:
    <service object>

Describes the set of Druid services using the ServiceConfig object. Each service is keyed by the standard service name: the same name used by the Druid server option.

When using inheritance, overrides replace entire services: it is not possible to override individual instances of the service. That is, an include file might define coordinator, but a test-specific file might override this with a definition of two Coordinators.

`properties`

properties:
  <key>: <value>

Optional set of properties to use to configuration the Druid components loaded by tests. This is the test-specific form of the standard Druid common.runtime.properties and runtime.properties files. Because the test runs as a client, the server files are not available, and might not even make sense. (The client is not a "service", for example.) Technically, the properties listed here are added to Guice as the one and only Properties object.

Typically most components work using the default values. Tests are free to change any of these values for a given test scenario. The properties are the same for all tests within a category. However, they can be changed via environment variables via the environment variable "binding" mechanism described in tests.

The "JSON configuration" mechanism wants all properties to be strings. YAML will deserialize number-like properties as numbers. To avoid confusion, all properties are converted to strings before being passed to Druid.

When using inheritance, later properties override earlier properties. Environment variables, if bound, override the defaults specified in this section. Command-line settings, if provided, have the highest priority.

A number of test-specific properties are avilable:

druid.test.config.cloudBucket
druid.test.config.cloudPath

`settings`

The settings section is much like the properties section, and, indeed, are converted to properties internally. Settings are a fixed set of values that map to the config files used in the prior tests. Keys include:

The above replaces the config file mechanism from the older tests. In general, when a setting is fixed for a test category, list it in the docker.yaml configuration file. When it varies, pass it in as an environment variable. As a result, the prior configuration file is not needed. As a result, the prior override.config.path property is not supported.

`metastoreInit`

metastoreInit:
  - sql: |
      <sql query>

A set of MySQL statements to be run against the metadata storage before the test starts. Queries run in the order specified. Ensure each is idempotent to allow running tests multiple times against the same database.

To be kind to readers, please format the statements across multiple lines. The code will compress out extra spaces before submitting the query so that JSON payloads are as compact as possible.

The sql keyword is the only one supported at present. The idea is that there may need to be context for some queries in some tests. (To be enhanced as query conversion proceeds.)

When using inheritance, the set of queries is the union of all queries in all configuration files. Base statements appear first, then included statements.

`metastoreInitDelaySec`

metastoreInitDelaySec: <sec>

The default value is 6 seconds.

The metastore init section issues queries to the MySQL DB read by the Coordinator. For performance, the Coordinator does not directly query the database: instead, it queries an in-memory cache. This leads to the following behavior:

The Coordinator starts, checks the DB, and records the poll time.
The test starts and updates the DB.
The test runs and issues a query that needs the DB contents.
The Coordinator checks that its poll timeout has not yet occurred and returns the (empty) contents of the cache.
The test checks the empty contents against the expected contents, notices the results differ, and fails the test.

To work around this, we must change two settings. First, change the following Druid configuration for the Coordinator:

      - druid_manager_segments_pollDuration=PT5S

Second, change the metastoreInitDelaySec to be a bit longer:

metastoreInitDelaySec: 6

The result is that the test will sit idle for 6 seconds, but that is better than random failures.

Note: a better fix would be for the Coordinator to have an API that causes it to flush its cache. Since some tests run two coordinators, the message must be sent to both. An even better fix would be fore the Coordinator to detect such changes itself somehow.

Service Object

Generic object to describe Docker Compose services.

`if`

Conditionally defines a service. The system defines a set of configutation tags. At present there are only two:

middleManager: the cluster runs a MiddleManager
indexer: the cluster runs an Indexer

The if tag conditionally enables a service only if the corresponding tag is set. Thus, for a cluster that can use either a middle manager or an indexer:

  middlemanager:
    if: middleManager
    instances:
      - port: 8091
  indexer:
    if: indexer
    instances:
      - port: 8091

`instances`

instances:
  - <service-instance>

Describes the instances of the service as ServiceInstance objects. Each service requires at least one instance. If more than one, then each instance must define a tag that is a suffix that distinguishes the instances.

Service Instance Object

The service sections all allow multiple instances of each service. Service instances define each instance of a service and provide a number of properties:

`tag`

When a service has more than one instance, the instances must have unique names. The name is made up of the a base name (see below) with the tag appended. Thus, if the service is cooordinator and the tag is one, then the instance name is coordinator-one.

The tag is required when there is more than one instance of a service, and is optional if there is only one instance. The tag corresponds to the DRUID_INSTANCE environment variable passed into the container.

`container`

container: <container name>

Name of the Docker container. If omitted, defaults to:

<service name>-<tag> if a tag is provided (see below.)
The name of the service (if there is only one instance).

`host`

host: <host name or IP>

The host name or IP address on which the instance runs. This is the host name known to the cluster: the name inside a Docker overlay network. Has the same defaults as container.

`port`

port: <port>

The port number of the service on the container as seen by other services running within Docker. Required.

(TODO: If TLS is enabled, this is the TLS port.)

`proxyPort`

proxyPort: <port>

The port number for the service as exposed on the proxy host. Defaults to the same as port. You must specify a value if you run multiple instances of the same service.

Conversion Guide

In prior tests, a config file, and the ConfigFileConfigProvider class, provided test configuration. In this version, the file described here provides configuration. This section presents a mapping from the old to the new form.

The IntegrationTestingConfig class, which the above class used to provide, is reimplemented to provide the same information to tests as before; only the source of the information has changed.

The new framework assumes that each Druid node is configured either for plain text or for TLS. (If this assumption is wrong, we'll change the config file to match.)

Many of the properties are derived from information in the configuration file. For example, host names (within Docker) are those given in the druid section, and ports (within the cluster and for the client) are given in druid.<service>.intances.port, from which the code computes the URL.

The old system hard-codes the idea that there are two coordinators or overlords. The new system allows any number of instances.

Method	Old Property	New Format
Router
`getRouterHost()`	`router_host`	`'router'`
`getRouterUrl()`	`router_url`	`'router'` & `instances.port`
`getRouterTLSUrl()`	`router_tls_url`	"
`getPermissiveRouterUrl()`	`router_permissive_url`	"
`getPermissiveRouterTLSUrl()`	`router_permissive_tls_url`	"
`getNoClientAuthRouterUrl()`	`router_no_client_auth_url`	"
`getNoClientAuthRouterTLSUrl()`	`router_no_client_auth_tls_url`	"
`getCustomCertCheckRouterUrl()`		"
`getCustomCertCheckRouterTLSUrl()`		"
Broker
`getBrokerHost()`	`broker_host`	`'broker'`
`getBrokerUrl()`	`broker_url`	`'broker'` & `instances.port`
`getBrokerTLSUrl()`	`broker_tls_url`	"
Coordinator
`getCoordinatorHost()`	`coordinator_host`	`'coordinator'` + `tag`
`getCoordinatorTwoHost()`	`coordinator_two_host`	"
`getCoordinatorUrl()`	`coordinator_url`	host & `instances.port`
`getCoordinatorTLSUrl()`	`coordinator_tls_url`	"
`getCoordinatorTwoUrl()`	`coordinator_two_url`	"
`getCoordinatorTwoTLSUrl()`	`coordinator_two_tls_url`	"
Overlord
`getOverlordUrl()`	?	`'overlord'` + `tag`
`getOverlordTwoHost()`	`overlord_two_host`	"
`getOverlordTwoUrl()`	`overlord_two_url`	host & `instances.port`
`getOverlordTLSUrl()`	?	"
`getOverlordTwoTLSUrl()`	`overlord_two_tls_url`	"
Overlord
`getHistoricalHost()`	`historical_host`	`historical'`
`getHistoricalUrl()`	`historical_url`	`'historical'` & `instances.port`
`getHistoricalTLSUrl()`	`historical_tls_url`	"
Overlord
`getMiddleManagerHost()`	`middlemanager_host`	`'middlemanager'`
Dependencies
`getZookeeperHosts()`	`zookeeper_hosts`	`'zk'`
`getKafkaHost()`	`kafka_host`	'`kafka`'
`getSchemaRegistryHost()`	`schema_registry_host`	?
`getProperty()`	From config file	From `settings`
`getProperties()`	"	"
`getUsername()`	`username`	Setting
`getPassword()`	`password`	Setting
`getCloudBucket()`	`cloud_bucket`	Setting
`getCloudPath()`	`cloud_path`	Setting
`getCloudRegion()`	`cloud_region`	Setting
`getS3AssumeRoleWithExternalId()`	`s3_assume_role_with_external_id`	Setting
`getS3AssumeRoleExternalId()`	`s3_assume_role_external_id`	Setting
`getS3AssumeRoleWithoutExternalId()`	`s3_assume_role_without_external_id`	Setting
`getAzureKey()`	`azureKey`	Setting
`getHadoopGcsCredentialsPath()`	`hadoopGcsCredentialsPath`	Setting
`getStreamEndpoint()`	`stream_endpoint`	Setting
`manageKafkaTopic()`	?	?
`getExtraDatasourceNameSuffix()`	?	?

Pre-defined environment bindings:

Others can be added in Initializer.Builder.

20 KiB Raw Blame History

Test Configuration

Cluster Types

Specify the Cluster Type

Cluster Configuration Files

Configuration File Syntax

type

include

proxyHost

datasourceSuffix

zk

startTimeoutSecs

metastore

driver

connectURI

user

password

properties

kafka

druid

properties

settings

metastoreInit

metastoreInitDelaySec

Service Object

if

instances

Service Instance Object

tag

container

host

port

proxyPort

Conversion Guide

20 KiB

Raw Blame History

`type`

`include`

`proxyHost`

`datasourceSuffix`

`zk`

`startTimeoutSecs`

`metastore`

`driver`

`connectURI`

`user`

`password`

`properties`

`kafka`

`druid`

`properties`

`settings`

`metastoreInit`

`metastoreInitDelaySec`

`if`

`instances`

`tag`

`container`

`host`

`port`

`proxyPort`