Docker Test Image for Druid

Integration tests need a Druid cluster. While some tests support using Kubernetes for the Quickstart cluster, most need a cluster with some test-specific configuration. We use Docker Compose to create that cluster, based on a test-oriented Docker image built by the it-image Maven module (activated by the test-image profile). The image contains the unpacked Druid distribution, along with the MySQL and MariaDB client libraries and the Kafka protobuf dependency. Docker Compose passes configuration specific to each service.

In addition to the Druid image, we use "official" images for dependencies such as ZooKeeper, MySQL and Kafka.

The image here is distinct from the "retail" image used for getting started. The test image:

  • Uses a shared directory to hold logs and some configuration.
  • Uses "official" images for dependencies.
  • Assumes the wrapper Docker Compose scripts.
  • Has some additional test-specific extensions as defined in it-tools.

Build Process

Assuming DRUID_DEV points to your Druid build directory, to build the image (only):

cd $DRUID_DEV/docker-tests/it-image
mvn -P test-image install

Building of the image occurs in four steps:

  • The Maven pom.xml file gathers versions and other information from the build. It also uses the normal Maven dependency mechanism to download the MySQL, MariaDB and Kafka client libraries, then copies them to the target/docker directory. It then invokes the build-image.sh script.
  • build-image.sh adds the Druid build tarball from distribution/target, copies the contents of test-image/docker to target/docker and then invokes the docker build command.
  • docker build uses target/docker as the context, and thus uses the Dockerfile to build the image. The Dockerfile copies artifacts into the image, then defers to the test-setup.sh script.
  • The test-setup.sh script is copied into the image and run. This script does the work of installing Druid.

The resulting image is named org.apache.druid/test:<version>.
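To confirm that the build produced the image, a small check along these lines can help. This is a minimal sketch, not part of the build: the function names and the DOCKER override variable are hypothetical, and the version string is whatever your build used.

```shell
# DOCKER is a hypothetical override so the command can be swapped out
# for a dry run; by default the real docker CLI is used.
DOCKER="${DOCKER:-docker}"

# Compose the image name described above from a Druid version string.
test_image_name() {
  printf 'org.apache.druid/test:%s\n' "$1"
}

# Succeed only if the local Docker daemon knows the image.
test_image_exists() {
  "$DOCKER" image inspect "$(test_image_name "$1")" >/dev/null 2>&1
}
```

For example, `test_image_exists "<version>" || echo "image not built yet"`.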

Clean

A normal mvn clean won't remove the Docker image because that is often not what you want. Instead, do:

mvn clean -P test-image

You can also remove the image using Docker or Docker Desktop.

target/docker

Docker requires that all build resources be within the build context directory. We don't want to change the source directory: in Maven, only the target directories should contain build artifacts. So, the pom.xml file builds up a target/docker directory. The pom.xml file then invokes the build-image.sh script to complete the setup. The resulting directory structure is:

/target/docker
|- Dockerfile (from docker/)
|- scripts (from docker/)
|- apache-druid-<version>-bin.tar.gz (from distribution, by build-image.sh)
|- MySQL client (done by pom.xml)
|- MariaDB client (done by pom.xml)
|- Kafka protobuf client (done by pom.xml)

Then, we invoke docker build to build our test image. The Dockerfile copies files into the image. Actual setup is done by the test-setup.sh script copied into the image.
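If docker build fails because something is missing from the context, a quick sanity check of target/docker can narrow things down. The sketch below is an illustration, not part of the build scripts: the function name is hypothetical, and it checks only the Dockerfile and the distribution tarball, since the exact names of the other items are not listed here.

```shell
# Verify that a directory looks like the build context shown above.
# Returns non-zero and names the first missing item on stderr.
check_docker_context() {
  ctx="$1"
  [ -f "$ctx/Dockerfile" ] || { echo "missing: Dockerfile" >&2; return 1; }
  # The tarball name embeds the version, so match it with a glob.
  for f in "$ctx"/apache-druid-*-bin.tar.gz; do
    [ -f "$f" ] && return 0
  done
  echo "missing: apache-druid-<version>-bin.tar.gz" >&2
  return 1
}
```

Run it as `check_docker_context target/docker` from the image module directory.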

Many Dockerfiles issue Linux commands inline. In some cases, this can speed up subsequent builds because Docker can reuse layers. However, such Dockerfiles are tedious to debug. It is far easier to do the detailed setup in a script within the image. With this approach, you can debug the script by copying it into the image without running it from the Dockerfile, then launching the image with a bash shell and running the script by hand. Since our build process is quick, we don't lose much by giving up layer reuse.
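One way to get that bash shell is to override the image's entry point when starting a container. The following is a sketch under assumptions: the function name and the DOCKER override are hypothetical, and where test-setup.sh ends up inside the image depends on the Dockerfile, so locate it once the shell is open.

```shell
# DOCKER is a hypothetical override for dry runs (for example DOCKER=echo).
DOCKER="${DOCKER:-docker}"

# Start a throwaway container with bash in place of the normal entry
# point, so the setup script can be run and inspected by hand.
debug_image_shell() {
  image="$1"   # for example org.apache.druid/test:<version>
  "$DOCKER" run --rm -it --entrypoint bash "$image"
}
```

The --rm flag discards the container on exit, so each debugging session starts from the image as built.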

Manual Image Rebuilds

You can quickly rebuild the image if you've previously run a Maven image build. Assume DRUID_DEV points to your Druid development root. Start with a Maven build:

cd $DRUID_DEV/docker/test-image
mvn -P test-image install

Maven is rather slow to do its part. Let it grind away once to populate target/docker. Then, as you debug the Dockerfile, or test-setup.sh, you can build faster:

cd $DRUID_DEV/docker/test-image
./rebuild.sh

This works because the Maven build creates a file target/env.sh that contains the Maven-defined environment. rebuild.sh reads that environment, then proceeds as would the Maven build. Image build time shrinks from about a minute to just a few seconds. rebuild.sh will fail if target/env.sh is missing, which reminds you to do the full Maven build that first time.
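The guard described above can be pictured roughly as follows. This is a sketch of the idea, not the actual rebuild.sh: the function name is hypothetical, and the variables that target/env.sh defines are not listed here.

```shell
# Load the Maven-generated environment, or fail with a hint about
# the full build that has to run first.
load_build_env() {
  env_file="${1:-target/env.sh}"
  if [ ! -f "$env_file" ]; then
    echo "$env_file is missing: run 'mvn -P test-image install' first" >&2
    return 1
  fi
  # Source the file so its variables land in the current shell.
  . "$env_file"
}
```

Sourcing the file, rather than executing it, is what makes the Maven-defined values visible to the rest of the script.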

Remember to do a full Maven build if you change the actual Druid code. You'll need Maven to rebuild the affected jar file and to recreate the distribution image. You can do this the slow way by doing a full rebuild, or, if you are comfortable with Maven, you can selectively run just the one module build followed by just the distribution build.

Image Contents

The Druid test image consists of the following:

  • A Debian base image with the target JDK installed.
  • Druid in /usr/local/druid
  • Script to run Druid: /usr/local/launch.sh
  • Extra libraries (Kafka, MySQL, MariaDB) placed in the Druid lib directory.

The specific "bill of materials" follows. DRUID_HOME is the location of the Druid install and is set to /usr/local/druid.

| Variable or Item | Source | Destination |
| --- | --- | --- |
| Druid build | distribution/target | $DRUID_HOME |
| MySQL Connector | Maven repo | $DRUID_HOME/lib |
| Kafka Protobuf | Maven repo | $DRUID_HOME/lib |
| Druid launch script | docker/launch.sh | /usr/local/launch.sh |
| Env-var-to-config script | docker/druid.sh | /usr/local/druid.sh |

Several environment variables are defined. DRUID_HOME is useful at runtime.

| Name | Description |
| --- | --- |
| DRUID_HOME | Location of the Druid install |
| DRUID_VERSION | Druid version used to build the image |
| JAVA_HOME | Java location |
| JAVA_VERSION | Java version |
| MYSQL_VERSION | MySQL version (DB, connector) (not actually used) |
| MYSQL_DRIVER_CLASSNAME | Name of the MySQL driver (not actually used) |
| CONFLUENT_VERSION | Kafka Protobuf library version (not actually used) |

Shared Directory

The image assumes a "shared" directory that passes in additional configuration information and exports logs and other items for inspection.

  • Location in the container: /shared
  • Location on the host: <project>/target/shared

This means that each test group has a distinct shared directory, populated as needed for that test.

Input items:

| Item | Description |
| --- | --- |
| conf/ | log4j.xml config (optional) |
| hadoop-xml/ | Hadoop configuration (optional) |
| hadoop-dependencies/ | Hadoop dependencies (optional) |
| lib/ | Extra Druid class path items (optional) |

Output items:

| Item | Description |
| --- | --- |
| logs/ | Log files from each service |
| tasklogs/ | Indexer task logs |
| kafka/ | Kafka persistence |
| db/ | MySQL database |
| druid/ | Druid persistence, etc. |

Note on the db directory: the MySQL container creates this directory when it starts. If you start, then restart the MySQL container, you must remove the db directory before restart or MySQL will fail due to existing files.
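A small helper can make that cleanup harder to get wrong. This is a sketch: the function name is hypothetical, and the default path assumes you run it from the test project directory, where the shared directory is target/shared. Depending on how the container wrote the files, removing them may need elevated permissions.

```shell
# Clear MySQL's data directory before restarting its container.
reset_mysql_data() {
  shared_dir="${1:-target/shared}"
  # ${var:?} aborts rather than expanding to "/db" if the path is empty.
  rm -rf "${shared_dir:?}/db"
}
```

Only db/ is removed; logs and the other output directories are left in place for inspection.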

Third-Party Logs

The three third-party containers are configured to log to the /shared directory rather than to Docker:

  • Kafka: /shared/logs/kafka.log
  • ZooKeeper: /shared/logs/zookeeper.log
  • MySQL: /shared/logs/mysql.log

Entry Point

The container launches the launch.sh script which:

  • Converts environment variables to config files.
  • Assembles the Java command line arguments, including those described under Run Configuration below, and the just-generated config files.
  • Launches Java as "pid 1" so it will receive signals.
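The first and last of these steps can be sketched in shell. This is an illustration under assumptions, not the contents of launch.sh or druid.sh: both function names are hypothetical, and the naming rule (a druid_ prefix with underscores mapped to dots) is assumed here rather than taken from the scripts.

```shell
# ASSUMPTION: variables prefixed with druid_ become properties by
# replacing underscores with dots, for example
# druid_zk_service_host -> druid.zk.service.host.
env_to_properties() {
  env | grep '^druid_' | while IFS='=' read -r name value; do
    printf '%s=%s\n' "$(printf '%s' "$name" | tr '_' '.')" "$value"
  done
}

# exec replaces the shell with the given command, so the command keeps
# the shell's process ID (pid 1 in the container) and receives signals
# directly instead of through an intermediate shell.
launch_as_pid1() {
  exec "$@"
}
```

Without exec, the shell would stay on as pid 1 and Java would run as its child, so a stop signal sent to the container would reach the shell rather than Java.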

Run Configuration

The "raw" Java environment variables are a bit overly broad and result in copy/paste when a test wants to customize only part of the option, such as JVM arguments. To assist, the image breaks configuration down into smaller pieces, which it assembles prior to launch.

| Environment Variable | Description |
| --- | --- |
| DRUID_SERVICE | Name of the Druid service to run, used in the server $DRUID_SERVICE option |
| DRUID_INSTANCE | Suffix added to the DRUID_SERVICE to create the log file name. Use when running more than one of the same service. |
| DRUID_COMMON_JAVA_OPTS | Java options common to all services |
| DRUID_SERVICE_JAVA_OPTS | Java options for this one service or instance |
| DEBUG_OPTS | Optional debugging Java options |
| LOG4J_CONFIG | Optional Log4J configuration used in -Dlog4j.configurationFile=$LOG4J_CONFIG |
| DRUID_CLASSPATH | Optional extra Druid class path |

In addition, three other shared directories are added to the class path if they exist:

  • /shared/hadoop-xml - included itself
  • /shared/lib - included as /shared/lib/* to include extra jars
  • /shared/resources - included itself to hold extra class-path resources
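The assembly of these pieces might look roughly like the sketch below. It is not the actual launch script: the function names are hypothetical, the order in which the options and class path entries are joined is an assumption, and the root directory is a parameter only so the functions can be tried outside a container.

```shell
# Join the per-piece Java options into one string, skipping empty ones.
assemble_java_opts() {
  opts=""
  for part in "${DRUID_COMMON_JAVA_OPTS:-}" "${DRUID_SERVICE_JAVA_OPTS:-}" "${DEBUG_OPTS:-}"; do
    if [ -n "$part" ]; then opts="${opts:+$opts }$part"; fi
  done
  if [ -n "${LOG4J_CONFIG:-}" ]; then
    opts="${opts:+$opts }-Dlog4j.configurationFile=$LOG4J_CONFIG"
  fi
  printf '%s\n' "$opts"
}

# Append each optional shared directory to the class path if it exists.
shared_classpath() {
  root="${1:-/shared}"
  classpath="${DRUID_CLASSPATH:-}"
  for entry in "$root/hadoop-xml" "$root/lib" "$root/resources"; do
    [ -d "$entry" ] || continue
    # lib holds jars, so it needs the /* wildcard; the others are added as-is.
    if [ "$entry" = "$root/lib" ]; then entry="$entry/*"; fi
    classpath="${classpath:+$classpath:}$entry"
  done
  printf '%s\n' "$classpath"
}
```

The /* wildcard is kept quoted throughout so that Java, not the shell, expands it into the jars in that directory.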

init Process

Middle Manager launches Peon processes which must be reaped. Add the following option to the Docker Compose configuration for this service:

   init: true

Extensions

The following extensions are installed in the image:

druid-avro-extensions
druid-aws-rds-extensions
druid-azure-extensions
druid-basic-security
druid-bloom-filter
druid-datasketches
druid-ec2-extensions
druid-google-extensions
druid-hdfs-storage
druid-histogram
druid-kafka-extraction-namespace
druid-kafka-indexing-service
druid-kerberos
druid-kinesis-indexing-service
druid-kubernetes-extensions
druid-lookups-cached-global
druid-lookups-cached-single
druid-orc-extensions
druid-pac4j
druid-parquet-extensions
druid-protobuf-extensions
druid-ranger-security
druid-s3-extensions
druid-stats
it-tools
mysql-metadata-storage
postgresql-metadata-storage
simple-client-sslcontext

If more are needed, they should be added during the image build.
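Before adding an extension to the build, it can be worth checking whether a given image already has it. The sketch below assumes each extension is a directory under $DRUID_HOME/extensions, as in a standard Druid distribution; the function name is hypothetical.

```shell
# Print the names of the requested extensions that are absent from
# a Druid install. Prints nothing when all are present.
# ASSUMPTION: extensions live in <druid home>/extensions/<name>.
missing_extensions() {
  druid_home="$1"
  shift
  for ext in "$@"; do
    [ -d "$druid_home/extensions/$ext" ] || echo "$ext"
  done
}
```

Inside a running container this could be called as `missing_extensions "$DRUID_HOME" druid-stats it-tools`.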