Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984)

This commit is contained in:
Victoria Lim 2023-05-15 15:20:52 -07:00 committed by GitHub
parent c4aa98953b
commit 66d4ea014c
14 changed files with 1635 additions and 131 deletions

.gitignore

@ -33,9 +33,10 @@ integration-tests/gen-scripts/
**/.ipython/
**/.jupyter/
**/.local/
**/druidapi.egg-info/
examples/quickstart/jupyter-notebooks/docker-jupyter/notebooks
# ignore NetBeans IDE specific files
nbproject
nbactions.xml
nb-configuration.xml


@ -0,0 +1,201 @@
---
id: tutorial-jupyter-docker
title: "Docker for Jupyter Notebook tutorials"
sidebar_label: "Docker for tutorials"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Apache Druid provides a custom Jupyter container that contains the prerequisites
for all Jupyter-based Druid tutorials, as well as all of the tutorials themselves.
You can run the Jupyter container, as well as containers for Druid and Apache Kafka,
using the Docker Compose file provided in the Druid GitHub repository.
You can run any of the following combinations of applications:
* [Jupyter only](#start-only-the-jupyter-container)
* [Jupyter and Druid](#start-jupyter-and-druid)
* [Jupyter, Druid, and Kafka](#start-jupyter-druid-and-kafka)
## Prerequisites
Jupyter in Docker requires that you have **Docker** and **Docker Compose**.
We recommend installing these through [Docker Desktop](https://docs.docker.com/desktop/).
## Launch the Docker containers
You run Docker Compose to launch Jupyter and optionally Druid or Kafka.
Docker Compose references the configuration in `docker-compose.yaml`.
Running Druid in Docker also requires the `environment` file, which
sets the configuration properties for the Druid services.
To get started, download both `docker-compose.yaml` and `environment` from
[`tutorial-jupyter-docker.zip`](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip).
Alternatively, you can clone the [Apache Druid repo](https://github.com/apache/druid) and
access the files in `druid/examples/quickstart/jupyter-notebooks/docker-jupyter`.
### Start only the Jupyter container
If you already have Druid running locally, you can run only the Jupyter container to complete the tutorials.
In the same directory as `docker-compose.yaml`, start the application:
```bash
docker compose --profile jupyter up -d
```
The Docker Compose file assigns port `8889` to Jupyter.
To use a different port, set the `JUPYTER_PORT` environment variable before starting the Docker application.
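The port override follows the default-value expansion in the Compose port mapping (`${JUPYTER_PORT:-8889}:8888`). As a small illustration of the same default-or-override pattern, here is a hypothetical Python helper (not part of the tutorials) that computes the resulting notebook URL:

```python
import os

def jupyter_url() -> str:
    """Build the local Jupyter URL, honoring the JUPYTER_PORT override.

    Mirrors the Compose mapping "${JUPYTER_PORT:-8889}:8888":
    the host port defaults to 8889 unless JUPYTER_PORT is set.
    """
    port = os.environ.get("JUPYTER_PORT", "8889")
    return f"http://localhost:{port}"

print(jupyter_url())
```

For example, starting the container with `JUPYTER_PORT=8890` set would make the notebooks available at `http://localhost:8890`.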
### Start Jupyter and Druid
Running Druid in Docker requires the `environment` file as well as an environment variable named `DRUID_VERSION`,
which determines the version of Druid to use. The value of `DRUID_VERSION` corresponds to a Docker tag on the
[Apache Druid Docker Hub](https://hub.docker.com/r/apache/druid/tags).
In the same directory as `docker-compose.yaml` and `environment`, start the application:
```bash
DRUID_VERSION={{DRUIDVERSION}} docker compose --profile druid-jupyter up -d
```
### Start Jupyter, Druid, and Kafka
Running Druid in Docker requires the `environment` file as well as the `DRUID_VERSION` environment variable.
In the same directory as `docker-compose.yaml` and `environment`, start the application:
```bash
DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services up -d
```
### Update image from Docker Hub
If you already have a local cache of the Jupyter image, you can update the image before running the application using the following command:
```bash
docker compose pull jupyter
```
### Use locally built image
The default Docker Compose file pulls the custom Jupyter Notebook image from a third-party repository on Docker Hub.
If you prefer to build the image locally from the official source, do the following:
1. Clone the Apache Druid repository.
2. Navigate to `examples/quickstart/jupyter-notebooks/docker-jupyter`.
3. Start the services using `-f docker-compose-local.yaml` in the `docker compose` command. For example:
```bash
DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services -f docker-compose-local.yaml up -d
```
## Access Jupyter-based tutorials
The following steps show you how to access the Jupyter notebook tutorials from the Docker container.
At startup, Docker creates and mounts a volume to persist data from the container to your local machine.
This way you can keep the work you complete within the Docker container.
1. Navigate to the notebooks at http://localhost:8889.
> If you set `JUPYTER_PORT` to another port number, replace `8889` with the value of the Jupyter port.
2. Select a tutorial. If you don't plan to save your changes, you can use the notebook directly as is. Otherwise, continue to the next step.
3. Optional: To save a local copy of your tutorial work,
select **File > Save as...** from the navigation menu. Then enter `work/<notebook name>.ipynb`.
If the notebook still displays as read-only, you may need to refresh the page in your browser.
Access the saved files in the `notebooks` folder in your local working directory.
## View the Druid web console
To access the Druid web console in Docker, go to http://localhost:8888/unified-console.html.
Use the web console to view datasources and ingestion tasks that you create in the tutorials.
## Stop Docker containers
Shut down the Docker application using the following command:
```bash
docker compose down -v
```
The `-v` option also removes the named volumes defined in the Docker Compose file, which deletes any Druid metadata and segment data stored in them.
## Tutorial setup without using Docker
To use the Jupyter Notebook-based tutorials without using Docker, do the following:
1. Clone the Apache Druid repo, or download the [tutorials](tutorial-jupyter-index.md#tutorials)
as well as the [Python client for Druid](tutorial-jupyter-index.md#python-api-for-druid).
2. Install the prerequisite Python packages with the following commands:
```bash
# Install requests
pip install requests
```
```bash
# Install JupyterLab
pip install jupyterlab
# Install Jupyter Notebook
pip install notebook
```
Individual notebooks may list additional packages you need to install to complete the tutorial.
3. In your Druid source repo, install `druidapi` with the following commands:
```bash
cd examples/quickstart/jupyter-notebooks/druidapi
pip install .
```
4. Start Jupyter, in the same directory as the tutorials, using either JupyterLab or Jupyter Notebook:
```bash
# Start JupyterLab on port 3001
jupyter lab --port 3001
# Start Jupyter Notebook on port 3001
jupyter notebook --port 3001
```
5. Start Druid. You can use the [Quickstart (local)](./index.md) instance. The tutorials
assume that you are using the quickstart, so no authentication or authorization
is expected unless explicitly mentioned.
If you contribute to Druid and work with Druid integration tests, you can use a test cluster instead.
The following commands assume an environment variable, `DRUID_DEV`, that points to your Druid source repo.
```bash
cd $DRUID_DEV
./it.sh build
./it.sh image
./it.sh up <category>
```
Replace `<category>` with one of the available integration test categories. See the integration
test `README.md` for details.
You should now be able to access and complete the tutorials.
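If a notebook fails to import `druidapi`, you can confirm whether step 3 succeeded from the same Python environment. A minimal sketch using only the standard library (the package name `druidapi` is the only assumption):

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in the current environment."""
    return importlib.util.find_spec(package) is not None

# After `pip install .` in examples/quickstart/jupyter-notebooks/druidapi,
# this should print True:
print(is_installed("druidapi"))
```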
## Learn more
See the following topics for more information:
* [Jupyter Notebook tutorials](tutorial-jupyter-index.md) for the available Jupyter Notebook-based tutorials for Druid
* [Tutorial: Run with Docker](docker.md) for running Druid from a Docker container


@ -32,67 +32,34 @@ the Druid API to complete the tutorial.
## Prerequisites
Make sure you meet the following requirements before starting the Jupyter-based tutorials:
The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.
For more information, see [Docker for Jupyter Notebook tutorials](tutorial-jupyter-docker.md).
Otherwise, you can install the prerequisites on your own. Here's what you need:
- An available Druid instance.
- Python 3.7 or later
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port.
  By default, Druid and Jupyter both try to use port `8888`, so start Jupyter on a different port.
- The `requests` Python package
- The `druidapi` Python package
For setup instructions, see [Tutorial setup without using Docker](tutorial-jupyter-docker.md#tutorial-setup-without-using-docker).
Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.
## Python API for Druid
The `druidapi` Python package is a Python client for the Druid REST API.
One of the notebooks shows how to use the Druid REST API. The others focus on other
topics and use a simple set of Python wrappers around the underlying REST API. The
wrappers reside in the `druidapi` package within the notebooks directory. While the package
can be used in any Python program, the key purpose, at present, is to support these
notebooks. See
[Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
for an overview of the Python API.
The `druidapi` package is already installed in the custom Jupyter Docker container for Druid tutorials.
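As a rough sketch of the kind of REST call those wrappers cover, the example below builds a request to Druid's `/status` endpoint with the `requests` package, assuming the quickstart router on port 8888. The request is prepared but not sent, so no running cluster is required:

```python
import requests

# Prepare (but do not send) a GET against the Druid /status endpoint,
# which reports the version and loaded modules of a running service.
request = requests.Request("GET", "http://localhost:8888/status").prepare()
print(request.method, request.url)

# Against a running quickstart, you could send it like this:
# response = requests.Session().send(request)
# print(response.json()["version"])
```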
## Tutorials
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/). You can either clone the repo or download the notebooks you want individually.


@ -41,24 +41,27 @@
"source": [
"## Prerequisites\n",
"\n",
"Before starting the Jupyter-based tutorials, make sure you meet the requirements listed in this section.\n",
"The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.\n",
"For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
"\n",
"Otherwise, you need the following:\n",
"- An available Druid instance. You can use the local quickstart configuration\n",
" described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
"- Python 3.7 or later\n",
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
"- The `requests` Python package\n",
"- The `druidapi` Python package\n",
"\n",
"For setup instructions, see [Tutorial setup without using Docker](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html#tutorial-setup-without-using-docker).\n",
"Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.\n",
"\n",
"## Simple Druid API\n",
"\n",
"The `druidapi` Python package is a Python client for the Druid REST API.\n",
"One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
"topics and use a simple set of Python wrappers around the underlying REST API. The\n",
"wrappers reside in the `druidapi` package within this directory. While the package\n",
@ -148,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,


@ -0,0 +1,65 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -------------------------------------------------------------
# This Dockerfile creates a custom Docker image for Jupyter
# to use with the Apache Druid Jupyter notebook tutorials.
# Build using `docker build -t imply/druid-notebook:latest .`
# -------------------------------------------------------------
# Use the Jupyter base notebook as the base image
# Copyright (c) Project Jupyter Contributors.
# Distributed under the terms of the 3-Clause BSD License.
FROM jupyter/base-notebook
# Set the container working directory
WORKDIR /home/jovyan
# Install required Python packages
RUN pip install requests pandas numpy seaborn bokeh kafka-python sortedcontainers
# Install druidapi client from apache/druid
# Local install requires sudo privileges
USER root
ADD druidapi /home/jovyan/druidapi
WORKDIR /home/jovyan/druidapi
RUN pip install .
WORKDIR /home/jovyan
# Import data generator and configuration file
# Change permissions to allow import (requires sudo privileges)
# WIP -- change to apache repo
ADD https://raw.githubusercontent.com/shallada/druid/data-generator/examples/quickstart/jupyter-notebooks/data-generator/DruidDataDriver.py .
ADD docker-jupyter/kafka_docker_config.json .
RUN chmod 664 DruidDataDriver.py
RUN chmod 664 kafka_docker_config.json
USER jovyan
# Copy the Jupyter notebook tutorials from the
# build directory to the image working directory
COPY ./*ipynb .
# Add location of the data generator to PYTHONPATH
ENV PYTHONPATH "${PYTHONPATH}:/home/jovyan"


@ -1,12 +1,5 @@
# Jupyter Notebook tutorials for Druid
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
@ -26,70 +19,13 @@ the other too. -->
~ under the License.
-->
If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
notebook instead.
You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.
## Prerequisites
For information on prerequisites and getting started with the Jupyter-based tutorials,
see [Jupyter Notebook tutorials](../../../docs/tutorials/tutorial-jupyter-index.md).
## Continue in Jupyter
Start Jupyter (see above) and navigate to the "0-START-HERE" notebook for more information.


@ -0,0 +1,60 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Jupyter in Docker
For details on getting started with Jupyter in Docker,
see [Docker for Jupyter Notebook tutorials](../../../../docs/tutorials/tutorial-jupyter-docker.md).
## Contributing
### Rebuild Jupyter image
You may want to update the Jupyter image to access new or updated tutorial notebooks,
include new Python packages, or update configuration files.
To build the custom Jupyter image locally:
1. Clone the Druid repo if you haven't already.
2. Navigate to `examples/quickstart/jupyter-notebooks` in your Druid source repo.
3. Edit the image definition in `Dockerfile`.
4. Navigate to the `docker-jupyter` directory.
5. Generate the new build using the following command:
```shell
DRUID_VERSION=25.0.0 docker compose --profile all-services -f docker-compose-local.yaml up -d --build
```
You can change the value of `DRUID_VERSION` or select a different profile defined in the Docker Compose file.
### Update Docker Compose
The Docker Compose file defines a multi-container application that allows you to run
the custom Jupyter Notebook container, Apache Druid, and Apache Kafka.
Any changes to `docker-compose.yaml` should also be made to `docker-compose-local.yaml`
and vice versa. These files should be identical except that `docker-compose.yaml`
contains an `image` attribute while `docker-compose-local.yaml` contains a `build` subsection.
If you update `docker-compose.yaml`, recreate the ZIP file using the following command:
```bash
zip tutorial-jupyter-docker.zip docker-compose.yaml environment
```
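A quick way to sanity-check the regenerated archive is to list its contents. The sketch below demonstrates the check with placeholder files in a temporary directory, using only the Python standard library:

```python
import os
import tempfile
import zipfile

expected = ["docker-compose.yaml", "environment"]

with tempfile.TemporaryDirectory() as workdir:
    # Create empty placeholder files standing in for the real ones.
    for name in expected:
        open(os.path.join(workdir, name), "w").close()

    # Equivalent of `zip tutorial-jupyter-docker.zip docker-compose.yaml environment`.
    archive = os.path.join(workdir, "tutorial-jupyter-docker.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for name in expected:
            zf.write(os.path.join(workdir, name), arcname=name)

    # List the archive contents, like `unzip -l`.
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)  # ['docker-compose.yaml', 'environment']
```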


@ -0,0 +1,172 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
---
version: "2.2"
volumes:
metadata_data: {}
middle_var: {}
historical_var: {}
broker_var: {}
coordinator_var: {}
router_var: {}
druid_shared: {}
services:
postgres:
image: postgres:latest
container_name: postgres
profiles: ["druid-jupyter", "all-services"]
volumes:
- metadata_data:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=FoolishPassword
- POSTGRES_USER=druid
- POSTGRES_DB=druid
# Need 3.5 or later for container nodes
zookeeper:
image: zookeeper:latest
container_name: zookeeper
profiles: ["druid-jupyter", "all-services"]
ports:
- "2181:2181"
environment:
- ZOO_MY_ID=1
- ALLOW_ANONYMOUS_LOGIN=yes
kafka:
image: bitnami/kafka:latest
container_name: kafka-broker
profiles: ["all-services"]
ports:
# To learn about configuring Kafka for access across networks see
# https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
- "9092:9092"
depends_on:
- zookeeper
environment:
- KAFKA_BROKER_ID=1
- KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
- KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
coordinator:
image: apache/druid:${DRUID_VERSION}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- coordinator_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
ports:
- "8081:8081"
command:
- coordinator
env_file:
- environment
broker:
image: apache/druid:${DRUID_VERSION}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
- broker_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8082:8082"
command:
- broker
env_file:
- environment
historical:
image: apache/druid:${DRUID_VERSION}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- historical_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8083:8083"
command:
- historical
env_file:
- environment
middlemanager:
image: apache/druid:${DRUID_VERSION}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- middle_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8091:8091"
- "8100-8105:8100-8105"
command:
- middleManager
env_file:
- environment
router:
image: apache/druid:${DRUID_VERSION}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
- router_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8888:8888"
command:
- router
env_file:
- environment
jupyter:
build:
context: ..
dockerfile: Dockerfile
container_name: jupyter
profiles: ["jupyter", "all-services"]
environment:
DOCKER_STACKS_JUPYTER_CMD: "notebook"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
- ./notebooks:/home/jovyan/work


@ -0,0 +1,170 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
---
version: "2.2"
volumes:
metadata_data: {}
middle_var: {}
historical_var: {}
broker_var: {}
coordinator_var: {}
router_var: {}
druid_shared: {}
services:
postgres:
image: postgres:latest
container_name: postgres
profiles: ["druid-jupyter", "all-services"]
volumes:
- metadata_data:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=FoolishPassword
- POSTGRES_USER=druid
- POSTGRES_DB=druid
# Need 3.5 or later for container nodes
zookeeper:
image: zookeeper:latest
container_name: zookeeper
profiles: ["druid-jupyter", "all-services"]
ports:
- "2181:2181"
environment:
- ZOO_MY_ID=1
- ALLOW_ANONYMOUS_LOGIN=yes
kafka:
image: bitnami/kafka:latest
container_name: kafka-broker
profiles: ["all-services"]
ports:
# To learn about configuring Kafka for access across networks see
# https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
- "9092:9092"
depends_on:
- zookeeper
environment:
- KAFKA_BROKER_ID=1
- KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
- KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
coordinator:
image: apache/druid:${DRUID_VERSION}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- coordinator_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
ports:
- "8081:8081"
command:
- coordinator
env_file:
- environment
broker:
image: apache/druid:${DRUID_VERSION}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
volumes:
- broker_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8082:8082"
command:
- broker
env_file:
- environment
historical:
image: apache/druid:${DRUID_VERSION}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- historical_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8083:8083"
command:
- historical
env_file:
- environment
middlemanager:
image: apache/druid:${DRUID_VERSION}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
volumes:
- druid_shared:/opt/shared
- middle_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8091:8091"
- "8100-8105:8100-8105"
command:
- middleManager
env_file:
- environment
router:
image: apache/druid:${DRUID_VERSION}
container_name: router
profiles: ["druid-jupyter", "all-services"]
volumes:
- router_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
ports:
- "8888:8888"
command:
- router
env_file:
- environment
jupyter:
image: imply/druid-notebook:latest
container_name: jupyter
profiles: ["jupyter", "all-services"]
environment:
DOCKER_STACKS_JUPYTER_CMD: "notebook"
NOTEBOOK_ARGS: "--NotebookApp.token=''"
ports:
- "${JUPYTER_PORT:-8889}:8888"
volumes:
- ./notebooks:/home/jovyan/work


@ -0,0 +1,56 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Java tuning
#DRUID_XMX=1g
#DRUID_XMS=1g
#DRUID_MAXNEWSIZE=250m
#DRUID_NEWSIZE=250m
#DRUID_MAXDIRECTMEMORYSIZE=6172m
DRUID_SINGLE_NODE_CONF=micro-quickstart
druid_emitter_logging_logLevel=debug
druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-multi-stage-query", "druid-kafka-indexing-service"]
druid_zk_service_host=zookeeper
druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword
druid_coordinator_balancer_strategy=cachingCost
druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB
druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs
druid_processing_numThreads=2
druid_processing_numMergeBuffers=2
DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>


@ -0,0 +1,90 @@
{
"target": {
"type": "kafka",
"endpoint": "kafka:9092",
"topic": "social_media"
},
"emitters": [
{
"name": "example_record_1",
"dimensions": [
{
"type": "enum",
"name": "username",
"values": ["willow", "mia", "leon", "milton", "miette", "gus", "jojo", "rocket"],
"cardinality_distribution": {
"type": "uniform",
"min": 0,
"max": 7
}
},
{
"type": "string",
"name": "post_title",
"length_distribution": {"type": "uniform", "min": 1, "max": 140},
"cardinality": 0,
"chars": "abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
},
{
"type": "int",
"name": "views",
"distribution": {
"type": "exponential",
"mean": 10000
},
"cardinality": 0
},
{
"type": "int",
"name": "upvotes",
"distribution": {
"type": "normal",
"mean": 70,
"stddev": 20
},
"cardinality": 0
},
{
"type": "int",
"name": "comments",
"distribution": {
"type": "normal",
"mean": 10,
"stddev": 5
},
"cardinality": 0
},
{
"type": "enum",
"name": "edited",
"values": ["True","False"],
"cardinality_distribution": {
"type": "uniform",
"min": 0,
"max": 1
}
}
]
}
],
"interarrival": {
"type": "constant",
"value": 1
},
"states": [
{
"name": "state_1",
"emitter": "example_record_1",
"delay": {
"type": "constant",
"value": 1
},
"transitions": [
{
"next": "state_1",
"probability": 1.0
}
]
}
]
}
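To make the generator semantics concrete, the sketch below samples one record the way this configuration appears to describe it: `enum` dimensions pick from `values` via the cardinality distribution, `int` dimensions draw from the named distribution, and the string dimension draws random characters with a uniform length. This is an illustrative approximation, not the actual `DruidDataDriver` code:

```python
import random

USERS = ["willow", "mia", "leon", "milton", "miette", "gus", "jojo", "rocket"]
CHARS = "abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."

def sample_record(rng: random.Random) -> dict:
    """Sample one social_media record per the emitter config (approximation)."""
    return {
        "username": USERS[rng.randint(0, 7)],      # uniform index over 0..7
        "post_title": "".join(                     # 1-140 random characters
            rng.choice(CHARS) for _ in range(rng.randint(1, 140))
        ),
        "views": int(rng.expovariate(1 / 10000)),  # exponential, mean 10000
        "upvotes": int(rng.normalvariate(70, 20)), # normal(mean=70, stddev=20)
        "comments": int(rng.normalvariate(10, 5)), # normal(mean=10, stddev=5)
        "edited": ["True", "False"][rng.randint(0, 1)],
    }

record = sample_record(random.Random(0))
print(sorted(record))  # the six dimension names defined above
```

Records like this are emitted to the `social_media` Kafka topic once per second, matching the constant interarrival time of 1 in the config.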

File diff suppressed because one or more lines are too long


@ -27,6 +27,7 @@
"tutorials/tutorial-sql-query-view",
"tutorials/tutorial-unnest-arrays",
"tutorials/tutorial-jupyter-index",
"tutorials/tutorial-jupyter-docker",
"tutorials/tutorial-jdbc"
],
"Design": [