mirror of
https://github.com/apache/druid.git
synced 2025-02-22 02:05:01 +00:00
Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984)
This commit is contained in:
parent
c4aa98953b
commit
66d4ea014c
3
.gitignore
vendored
@ -33,9 +33,10 @@ integration-tests/gen-scripts/
|
||||
**/.ipython/
|
||||
**/.jupyter/
|
||||
**/.local/
|
||||
**/druidapi.egg-info/
|
||||
examples/quickstart/jupyter-notebooks/docker-jupyter/notebooks
|
||||
|
||||
# ignore NetBeans IDE specific files
|
||||
nbproject
|
||||
nbactions.xml
|
||||
nb-configuration.xml
|
||||
|
||||
|
201
docs/tutorials/tutorial-jupyter-docker.md
Normal file
@ -0,0 +1,201 @@
---
id: tutorial-jupyter-docker
title: "Docker for Jupyter Notebook tutorials"
sidebar_label: "Docker for tutorials"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements. See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership. The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License. You may obtain a copy of the License at
  ~
  ~ http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied. See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

Apache Druid provides a custom Jupyter container that contains the prerequisites
for all Jupyter-based Druid tutorials, as well as all of the tutorials themselves.
You can run the Jupyter container, as well as containers for Druid and Apache Kafka,
using the Docker Compose file provided in the Druid GitHub repository.

You can run the following combinations of applications:
* [Jupyter only](#start-only-the-jupyter-container)
* [Jupyter and Druid](#start-jupyter-and-druid)
* [Jupyter, Druid, and Kafka](#start-jupyter-druid-and-kafka)

## Prerequisites

Jupyter in Docker requires that you have **Docker** and **Docker Compose**.
We recommend installing these through [Docker Desktop](https://docs.docker.com/desktop/).

## Launch the Docker containers

You run Docker Compose to launch Jupyter and optionally Druid or Kafka.
Docker Compose references the configuration in `docker-compose.yaml`.
Running Druid in Docker also requires the `environment` file, which
sets the configuration properties for the Druid services.
To get started, download both `docker-compose.yaml` and `environment` from
[`tutorial-jupyter-docker.zip`](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/docker-jupyter/tutorial-jupyter-docker.zip).

Alternatively, you can clone the [Apache Druid repo](https://github.com/apache/druid) and
access the files in `druid/examples/quickstart/jupyter-notebooks/docker-jupyter`.

### Start only the Jupyter container

If you already have Druid running locally, you can run only the Jupyter container to complete the tutorials.
In the same directory as `docker-compose.yaml`, start the application:

```bash
docker compose --profile jupyter up -d
```

The Docker Compose file assigns `8889` for the Jupyter port.
You can override the port number by setting the `JUPYTER_PORT` environment variable before starting the Docker application.
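
As a rough illustration of the port override, the following Python sketch mirrors the `${JUPYTER_PORT:-8889}` substitution used in the port mapping of the Docker Compose file: the default `8889` applies when `JUPYTER_PORT` is unset or empty. The `jupyter_url` helper is a hypothetical name introduced here for illustration; it is not part of Druid or `druidapi`.

```python
import os


def jupyter_url(env=None, host="localhost"):
    """Return the URL where the Jupyter container should be reachable.

    Mirrors the shell-style default `${JUPYTER_PORT:-8889}`: the default
    applies when the variable is unset or set to an empty string.
    """
    env = os.environ if env is None else env
    port = env.get("JUPYTER_PORT") or "8889"
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Prints the URL for the current shell environment.
    print(jupyter_url())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check what URL a given `JUPYTER_PORT` setting would produce before you start the containers.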

### Start Jupyter and Druid

Running Druid in Docker requires the `environment` file as well as an environment variable named `DRUID_VERSION`,
which determines the version of Druid to use. The Druid version references the Docker tag to pull from the
[Apache Druid Docker Hub](https://hub.docker.com/r/apache/druid/tags).

In the same directory as `docker-compose.yaml` and `environment`, start the application:

```bash
DRUID_VERSION={{DRUIDVERSION}} docker compose --profile druid-jupyter up -d
```

### Start Jupyter, Druid, and Kafka

Running Druid in Docker requires the `environment` file as well as the `DRUID_VERSION` environment variable.

In the same directory as `docker-compose.yaml` and `environment`, start the application:

```bash
DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services up -d
```
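
Each `--profile` value enables only the services whose `profiles` list in the Compose file contains that value. The Python sketch below models that selection for a small subset of services. The profile lists for `postgres`, `router`, and `kafka` are copied from the Compose file in this commit, while `SERVICE_PROFILES` and `services_for` are hypothetical names used only for illustration.

```python
# Profile lists copied from three services in the Compose file.
# This is a subset for illustration, not the full service list.
SERVICE_PROFILES = {
    "postgres": ["druid-jupyter", "all-services"],
    "router": ["druid-jupyter", "all-services"],
    "kafka": ["all-services"],
}


def services_for(profile, service_profiles=SERVICE_PROFILES):
    """Return the services that a single `--profile` value would enable."""
    return sorted(
        name for name, profiles in service_profiles.items() if profile in profiles
    )


if __name__ == "__main__":
    for profile in ("druid-jupyter", "all-services"):
        print(profile, "->", services_for(profile))
```

In this subset, Kafka starts only with `all-services`, which matches the distinction between the two commands above.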

### Update image from Docker Hub

If you already have a local cache of the Jupyter image, you can update the image before running the application using the following command:

```bash
docker compose pull jupyter
```

### Use locally built image

The default Docker Compose file pulls the custom Jupyter Notebook image from a third-party Docker Hub repository.
If you prefer to build the image locally from the official source, do the following:
1. Clone the Apache Druid repository.
2. Navigate to `examples/quickstart/jupyter-notebooks/docker-jupyter`.
3. Start the services using `-f docker-compose-local.yaml` in the `docker compose` command. For example:

   ```bash
   DRUID_VERSION={{DRUIDVERSION}} docker compose --profile all-services -f docker-compose-local.yaml up -d
   ```

## Access Jupyter-based tutorials

The following steps show you how to access the Jupyter notebook tutorials from the Docker container.
At startup, Docker creates and mounts a volume to persist data from the container to your local machine.
This way you can save your work completed within the Docker container.

1. Navigate to the notebooks at http://localhost:8889.
   > If you set `JUPYTER_PORT` to another port number, replace `8889` with the value of the Jupyter port.

2. Select a tutorial. If you don't plan to save your changes, you can use the notebook directly as is. Otherwise, continue to the next step.

3. Optional: To save a local copy of your tutorial work,
   select **File > Save as...** from the navigation menu. Then enter `work/<notebook name>.ipynb`.
   If the notebook still displays as read only, you may need to refresh the page in your browser.
   Access the saved files in the `notebooks` folder in your local working directory.
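
The save path in step 3 can be traced through the container's volume mapping. The sketch below assumes the `./notebooks:/home/jovyan/work` bind mount from the Docker Compose file in this commit, and that Jupyter serves files from `/home/jovyan`, the working directory set in the Dockerfile. The `host_path` helper is a hypothetical name used only for illustration.

```python
from pathlib import PurePosixPath

# Container side of the bind mount, as listed in the Compose file.
CONTAINER_MOUNT = PurePosixPath("/home/jovyan/work")
# Host side of the bind mount, relative to the directory holding the Compose file.
HOST_DIR = PurePosixPath("notebooks")


def host_path(container_path):
    """Translate a path under the container mount to its host-side location.

    Raises ValueError if the path is outside the mounted directory, which
    means the file would not be persisted to the host.
    """
    relative = PurePosixPath(container_path).relative_to(CONTAINER_MOUNT)
    return HOST_DIR / relative


if __name__ == "__main__":
    print(host_path("/home/jovyan/work/my-notebook.ipynb"))
```

Files saved outside `work/` stay inside the container only, so they are lost when the container is removed.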

## View the Druid web console

To access the Druid web console in Docker, go to http://localhost:8888/unified-console.html.
Use the web console to view datasources and ingestion tasks that you create in the tutorials.
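
If the web console or Jupyter does not load, a quick connectivity check can help narrow down the cause. The following is a minimal sketch using only the Python standard library. It assumes the default ports `8888` (Druid router) and `8889` (Jupyter) described on this page, and `is_port_open` is a hypothetical helper name. An open port only shows that something is listening; it does not confirm that the service has finished starting.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports from this page; adjust 8889 if you set JUPYTER_PORT.
    for name, port in {"Druid web console": 8888, "Jupyter": 8889}.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```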

## Stop Docker containers

Shut down the Docker application using the following command:

```bash
docker compose down -v
```

## Tutorial setup without using Docker

To use the Jupyter Notebook-based tutorials without using Docker, do the following:

1. Clone the Apache Druid repo, or download the [tutorials](tutorial-jupyter-index.md#tutorials)
   as well as the [Python client for Druid](tutorial-jupyter-index.md#python-api-for-druid).

2. Install the prerequisite Python packages with the following commands:

   ```bash
   # Install requests
   pip install requests
   ```

   ```bash
   # Install JupyterLab
   pip install jupyterlab

   # Install Jupyter Notebook
   pip install notebook
   ```

   Individual notebooks may list additional packages you need to install to complete the tutorial.

3. In your Druid source repo, install `druidapi` with the following commands:

   ```bash
   cd examples/quickstart/jupyter-notebooks/druidapi
   pip install .
   ```

4. Start Jupyter, in the same directory as the tutorials, using either JupyterLab or Jupyter Notebook:
   ```bash
   # Start JupyterLab on port 3001
   jupyter lab --port 3001

   # Start Jupyter Notebook on port 3001
   jupyter notebook --port 3001
   ```

5. Start Druid. You can use the [Quickstart (local)](./index.md) instance. The tutorials
   assume that you are using the quickstart, so no authentication or authorization
   is expected unless explicitly mentioned.

   If you contribute to Druid, and work with Druid integration tests, you can use a test cluster.
   Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.

   ```bash
   cd $DRUID_DEV
   ./it.sh build
   ./it.sh image
   ./it.sh up <category>
   ```

   Replace `<category>` with one of the available integration test categories. See the integration
   test `README.md` for details.

You should now be able to access and complete the tutorials.

## Learn more

See the following topics for more information:
* [Jupyter Notebook tutorials](tutorial-jupyter-index.md) for the available Jupyter Notebook-based tutorials for Druid
* [Tutorial: Run with Docker](docker.md) for running Druid from a Docker container

|
@ -32,67 +32,34 @@ the Druid API to complete the tutorial.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Make sure you meet the following requirements before starting the Jupyter-based tutorials:
|
||||
The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.
|
||||
For more information, see [Docker for Jupyter Notebook tutorials](tutorial-jupyter-docker.md).
|
||||
|
||||
Otherwise, you can install the prerequisites on your own. Here's what you need:
|
||||
|
||||
- An available Druid instance.
|
||||
- Python 3.7 or later
|
||||
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port.
|
||||
By default, Druid and Jupyter both try to use port `8888`, so start Jupyter on a different port.
|
||||
- The `requests` Python package
|
||||
- The `druidapi` Python package
|
||||
|
||||
- The `requests` package for Python. For example, you can install it with the following command:
|
||||
For setup instructions, see [Tutorial setup without using Docker](tutorial-jupyter-docker.md#tutorial-setup-without-using-docker).
|
||||
Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.
|
||||
|
||||
```bash
|
||||
pip3 install requests
|
||||
```
|
||||
|
||||
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
|
||||
and Jupyter both try to use port `8888`, so start Jupyter on a different port.
|
||||
|
||||
|
||||
- Install JupyterLab or Notebook:
|
||||
|
||||
```bash
|
||||
# Install JupyterLab
|
||||
pip3 install jupyterlab
|
||||
# Install Jupyter Notebook
|
||||
pip3 install notebook
|
||||
```
|
||||
- Start Jupyter using either JupyterLab
|
||||
```bash
|
||||
# Start JupyterLab on port 3001
|
||||
jupyter lab --port 3001
|
||||
```
|
||||
|
||||
Or using Jupyter Notebook
|
||||
```bash
|
||||
# Start Jupyter Notebook on port 3001
|
||||
jupyter notebook --port 3001
|
||||
```
|
||||
|
||||
- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials
|
||||
assume that you are using the quickstart, so no authentication or authorization
|
||||
is expected unless explicitly mentioned.
|
||||
|
||||
If you contribute to Druid, and work with Druid integration tests, you can use a test cluster.
|
||||
Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
|
||||
|
||||
```bash
|
||||
cd $DRUID_DEV
|
||||
./it.sh build
|
||||
./it.sh image
|
||||
./it.sh up <category>
|
||||
```
|
||||
|
||||
Replace `<category>` with one of the available integration test categories. See the integration
|
||||
test `README.md` for details.
|
||||
|
||||
## Simple Druid API
|
||||
## Python API for Druid
|
||||
|
||||
The `druidapi` Python package is a Python client for the Druid REST API.
|
||||
One of the notebooks shows how to use the Druid REST API. The others focus on other
|
||||
topics and use a simple set of Python wrappers around the underlying REST API. The
|
||||
wrappers reside in the `druidapi` package within the notebooks directory. While the package
|
||||
can be used in any Python program, the key purpose, at present, is to support these
|
||||
notebooks. See the [Introduction to the Druid Python API]
|
||||
(https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
|
||||
notebooks. See
|
||||
[Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/python-api-tutorial.ipynb)
|
||||
for an overview of the Python API.
|
||||
|
||||
The `druidapi` package is already installed in the custom Jupyter Docker container for Druid tutorials.
|
||||
|
||||
## Tutorials
|
||||
|
||||
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/). You can either clone the repo or download the notebooks you want individually.
|
||||
|
@ -41,24 +41,27 @@
|
||||
"source": [
|
||||
"## Prerequisites\n",
|
||||
"\n",
|
||||
"To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
|
||||
"\n",
|
||||
"- The `requests` package for Python. For example, you can install it with the following command:\n",
|
||||
"\n",
|
||||
" ```bash\n",
|
||||
" pip install requests\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
|
||||
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
|
||||
"Before starting the Jupyter-based tutorials, make sure you meet the requirements listed in this section.\n",
|
||||
"The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.\n",
|
||||
"For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
|
||||
"\n",
|
||||
"Otherwise, you need the following:\n",
|
||||
"- An available Druid instance. You can use the local quickstart configuration\n",
|
||||
" described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
|
||||
" The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
|
||||
" is expected unless explicitly mentioned.\n",
|
||||
"- Python 3.7 or later\n",
|
||||
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
|
||||
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
|
||||
"- The `requests` Python package\n",
|
||||
"- The `druidapi` Python package\n",
|
||||
"\n",
|
||||
"For setup instructions, see [Tutorial setup without using Docker](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html#tutorial-setup-without-using-docker).\n",
|
||||
"Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.\n",
|
||||
"\n",
|
||||
"## Simple Druid API\n",
|
||||
"\n",
|
||||
"The `druidapi` Python package is a Python client for the Druid REST API.\n",
|
||||
"One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
|
||||
"topics and use a simple set of Python wrappers around the underlying REST API. The\n",
|
||||
"wrappers reside in the `druidapi` package within this directory. While the package\n",
|
||||
@ -148,7 +151,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
"version": "3.9.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
65
examples/quickstart/jupyter-notebooks/Dockerfile
Normal file
@ -0,0 +1,65 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# -------------------------------------------------------------
# This Dockerfile creates a custom Docker image for Jupyter
# to use with the Apache Druid Jupyter notebook tutorials.
# Build using `docker build -t imply/druid-notebook:latest .`
# -------------------------------------------------------------

# Use the Jupyter base notebook as the base image
# Copyright (c) Project Jupyter Contributors.
# Distributed under the terms of the 3-Clause BSD License.
FROM jupyter/base-notebook

# Set the container working directory
WORKDIR /home/jovyan

# Install required Python packages
RUN pip install requests
RUN pip install pandas
RUN pip install numpy
RUN pip install seaborn
RUN pip install bokeh
RUN pip install kafka-python
RUN pip install sortedcontainers

# Install druidapi client from apache/druid
# Local install requires sudo privileges
USER root
ADD druidapi /home/jovyan/druidapi
WORKDIR /home/jovyan/druidapi
RUN pip install .
WORKDIR /home/jovyan

# Import data generator and configuration file
# Change permissions to allow import (requires sudo privileges)
# WIP -- change to apache repo
ADD https://raw.githubusercontent.com/shallada/druid/data-generator/examples/quickstart/jupyter-notebooks/data-generator/DruidDataDriver.py .
ADD docker-jupyter/kafka_docker_config.json .
RUN chmod 664 DruidDataDriver.py
RUN chmod 664 kafka_docker_config.json
USER jovyan

# Copy the Jupyter notebook tutorials from the
# build directory to the image working directory
COPY ./*ipynb .

# Add location of the data generator to PYTHONPATH
ENV PYTHONPATH="${PYTHONPATH}:/home/jovyan"

|
@ -1,12 +1,5 @@
|
||||
# Jupyter Notebook tutorials for Druid
|
||||
|
||||
If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
|
||||
notebook instead.
|
||||
|
||||
<!-- This README, the "0-START-HERE" notebook, and the tutorial-jupyter-index.md file in
|
||||
docs/tutorials share a lot of the same content. If you make a change in one place, update
|
||||
the other too. -->
|
||||
|
||||
<!--
|
||||
~ Licensed to the Apache Software Foundation (ASF) under one
|
||||
~ or more contributor license agreements. See the NOTICE file
|
||||
@ -26,70 +19,13 @@ the other too. -->
|
||||
~ under the License.
|
||||
-->
|
||||
|
||||
If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
|
||||
notebook instead.
|
||||
|
||||
You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These
|
||||
tutorials provide snippets of Python code that you can use to run calls against
|
||||
the Druid API to complete the tutorial.
|
||||
|
||||
## Prerequisites
|
||||
For information on prerequisites and getting started with the Jupyter-based tutorials,
|
||||
see [Jupyter Notebook tutorials](../../../docs/tutorials/tutorial-jupyter-index.md).
|
||||
|
||||
Make sure you meet the following requirements before starting the Jupyter-based tutorials:
|
||||
|
||||
- Python 3
|
||||
|
||||
- The `requests` package for Python. For example, you can install it with the following command:
|
||||
|
||||
```bash
|
||||
pip install requests
|
||||
```
|
||||
|
||||
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
|
||||
and Jupyter both try to use port `8888`, so start Jupyter on a different port.
|
||||
|
||||
- Install JupyterLab or Notebook:
|
||||
|
||||
```bash
|
||||
# Install JupyterLab
|
||||
pip install jupyterlab
|
||||
# Install Jupyter Notebook
|
||||
pip install notebook
|
||||
```
|
||||
- Start Jupyter using either JupyterLab
|
||||
```bash
|
||||
# Start JupyterLab on port 3001
|
||||
jupyter lab --port 3001
|
||||
```
|
||||
|
||||
Or using Jupyter Notebook
|
||||
```bash
|
||||
# Start Jupyter Notebook on port 3001
|
||||
jupyter notebook --port 3001
|
||||
```
|
||||
|
||||
- The Python API client for Druid. Clone the Druid repo if you haven't already.
|
||||
Go to your Druid source repo and install `druidapi` with the following commands:
|
||||
|
||||
```bash
|
||||
cd examples/quickstart/jupyter-notebooks/druidapi
|
||||
pip install .
|
||||
```
|
||||
|
||||
- An available Druid instance. You can use the [quickstart deployment](https://druid.apache.org/docs/latest/tutorials/index.html).
|
||||
The tutorials assume that you are using the quickstart, so no authentication or authorization
|
||||
is expected unless explicitly mentioned.
|
||||
|
||||
If you contribute to Druid, and work with Druid integration tests, you can use a test cluster.
|
||||
Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
|
||||
|
||||
```bash
|
||||
cd $DRUID_DEV
|
||||
./it.sh build
|
||||
./it.sh image
|
||||
./it.sh up <category>
|
||||
```
|
||||
|
||||
Replace `<category>` with one of the available integration test categories. See the integration
|
||||
test `README.md` for details.
|
||||
|
||||
## Continue in Jupyter
|
||||
|
||||
Start Jupyter (see above) and navigate to the "0-START-HERE" notebook for more information.
|
||||
|
@ -0,0 +1,60 @@
|
||||
<!--
|
||||
~ Licensed to the Apache Software Foundation (ASF) under one
|
||||
~ or more contributor license agreements. See the NOTICE file
|
||||
~ distributed with this work for additional information
|
||||
~ regarding copyright ownership. The ASF licenses this file
|
||||
~ to you under the Apache License, Version 2.0 (the
|
||||
~ "License"); you may not use this file except in compliance
|
||||
~ with the License. You may obtain a copy of the License at
|
||||
~
|
||||
~ http://www.apache.org/licenses/LICENSE-2.0
|
||||
~
|
||||
~ Unless required by applicable law or agreed to in writing,
|
||||
~ software distributed under the License is distributed on an
|
||||
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
~ KIND, either express or implied. See the License for the
|
||||
~ specific language governing permissions and limitations
|
||||
~ under the License.
|
||||
-->
|
||||
|
||||
# Jupyter in Docker
|
||||
|
||||
For details on getting started with Jupyter in Docker,
|
||||
see [Docker for Jupyter Notebook tutorials](../../../../docs/tutorials/tutorial-jupyter-docker.md).
|
||||
|
||||
## Contributing
|
||||
|
||||
### Rebuild Jupyter image
|
||||
|
||||
You may want to update the Jupyter image to access new or updated tutorial notebooks,
|
||||
include new Python packages, or update configuration files.
|
||||
|
||||
To build the custom Jupyter image locally:
|
||||
|
||||
1. Clone the Druid repo if you haven't already.
|
||||
2. Navigate to `examples/quickstart/jupyter-notebooks` in your Druid source repo.
|
||||
3. Edit the image definition in `Dockerfile`.
|
||||
4. Navigate to the `docker-jupyter` directory.
|
||||
5. Generate the new build using the following command:
|
||||
|
||||
```shell
|
||||
DRUID_VERSION=25.0.0 docker compose --profile all-services -f docker-compose-local.yaml up -d --build
|
||||
```
|
||||
|
||||
You can change the value of `DRUID_VERSION` or the profile used from the Docker Compose file.
|
||||
|
||||
### Update Docker Compose
|
||||
|
||||
The Docker Compose file defines a multi-container application that allows you to run
|
||||
the custom Jupyter Notebook container, Apache Druid, and Apache Kafka.
|
||||
|
||||
Any changes to `docker-compose.yaml` should also be made to `docker-compose-local.yaml`
|
||||
and vice versa. These files should be identical except that `docker-compose.yaml`
|
||||
contains an `image` attribute while `docker-compose-local.yaml` contains a `build` subsection.
|
||||
|
||||
If you update `docker-compose.yaml`, recreate the ZIP file using the following command:
|
||||
|
||||
```bash
|
||||
zip tutorial-jupyter-docker.zip docker-compose.yaml environment
|
||||
```
|
||||
|
@ -0,0 +1,172 @@
|
||||
#
|
||||
# Licensed to the Apache Software Foundation (ASF) under one
|
||||
# or more contributor license agreements. See the NOTICE file
|
||||
# distributed with this work for additional information
|
||||
# regarding copyright ownership. The ASF licenses this file
|
||||
# to you under the Apache License, Version 2.0 (the
|
||||
# "License"); you may not use this file except in compliance
|
||||
# with the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing,
|
||||
# software distributed under the License is distributed on an
|
||||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, either express or implied. See the License for the
|
||||
# specific language governing permissions and limitations
|
||||
# under the License.
|
||||
#
|
||||
---
|
||||
version: "2.2"
|
||||
|
||||
volumes:
|
||||
metadata_data: {}
|
||||
middle_var: {}
|
||||
historical_var: {}
|
||||
broker_var: {}
|
||||
coordinator_var: {}
|
||||
router_var: {}
|
||||
druid_shared: {}
|
||||
|
||||
|
||||
services:
|
||||
postgres:
|
||||
image: postgres:latest
|
||||
container_name: postgres
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- metadata_data:/var/lib/postgresql/data
|
||||
environment:
|
||||
- POSTGRES_PASSWORD=FoolishPassword
|
||||
- POSTGRES_USER=druid
|
||||
- POSTGRES_DB=druid
|
||||
|
||||
# Need 3.5 or later for container nodes
|
||||
zookeeper:
|
||||
image: zookeeper:latest
|
||||
container_name: zookeeper
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
ports:
|
||||
- "2181:2181"
|
||||
environment:
|
||||
- ZOO_MY_ID=1
|
||||
- ALLOW_ANONYMOUS_LOGIN=yes
|
||||
|
||||
kafka:
|
||||
image: bitnami/kafka:latest
|
||||
container_name: kafka-broker
|
||||
profiles: ["all-services"]
|
||||
ports:
|
||||
# To learn about configuring Kafka for access across networks see
|
||||
# https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
|
||||
- "9092:9092"
|
||||
depends_on:
|
||||
- zookeeper
|
||||
environment:
|
||||
- KAFKA_BROKER_ID=1
|
||||
- KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
|
||||
- KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
|
||||
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
|
||||
- ALLOW_PLAINTEXT_LISTENER=yes
|
||||
|
||||
coordinator:
|
||||
image: apache/druid:${DRUID_VERSION}
|
||||
container_name: coordinator
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- druid_shared:/opt/shared
|
||||
- coordinator_var:/opt/druid/var
|
||||
depends_on:
|
||||
- zookeeper
|
||||
- postgres
|
||||
ports:
|
||||
- "8081:8081"
|
||||
command:
|
||||
- coordinator
|
||||
env_file:
|
||||
- environment
|
||||
|
||||
broker:
|
||||
image: apache/druid:${DRUID_VERSION}
|
||||
container_name: broker
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- broker_var:/opt/druid/var
|
||||
depends_on:
|
||||
- zookeeper
|
||||
- postgres
|
||||
- coordinator
|
||||
ports:
|
||||
- "8082:8082"
|
||||
command:
|
||||
- broker
|
||||
env_file:
|
||||
- environment
|
||||
|
||||
historical:
|
||||
image: apache/druid:${DRUID_VERSION}
|
||||
container_name: historical
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- druid_shared:/opt/shared
|
||||
- historical_var:/opt/druid/var
|
||||
depends_on:
|
||||
- zookeeper
|
||||
- postgres
|
||||
- coordinator
|
||||
ports:
|
||||
- "8083:8083"
|
||||
command:
|
||||
- historical
|
||||
env_file:
|
||||
- environment
|
||||
|
||||
middlemanager:
|
||||
image: apache/druid:${DRUID_VERSION}
|
||||
container_name: middlemanager
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- druid_shared:/opt/shared
|
||||
- middle_var:/opt/druid/var
|
||||
depends_on:
|
||||
- zookeeper
|
||||
- postgres
|
||||
- coordinator
|
||||
ports:
|
||||
- "8091:8091"
|
||||
- "8100-8105:8100-8105"
|
||||
command:
|
||||
- middleManager
|
||||
env_file:
|
||||
- environment
|
||||
|
||||
router:
|
||||
image: apache/druid:${DRUID_VERSION}
|
||||
container_name: router
|
||||
profiles: ["druid-jupyter", "all-services"]
|
||||
volumes:
|
||||
- router_var:/opt/druid/var
|
||||
depends_on:
|
||||
- zookeeper
|
||||
- postgres
|
||||
- coordinator
|
||||
ports:
|
||||
- "8888:8888"
|
||||
command:
|
||||
- router
|
||||
env_file:
|
||||
- environment
|
||||
|
||||
jupyter:
|
||||
build:
|
||||
context: ..
|
||||
dockerfile: Dockerfile
|
||||
container_name: jupyter
|
||||
profiles: ["jupyter", "druid-jupyter", "all-services"]
|
||||
environment:
|
||||
DOCKER_STACKS_JUPYTER_CMD: "notebook"
|
||||
NOTEBOOK_ARGS: "--NotebookApp.token=''"
|
||||
ports:
|
||||
- "${JUPYTER_PORT:-8889}:8888"
|
||||
volumes:
|
||||
- ./notebooks:/home/jovyan/work
|
@@ -0,0 +1,170 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
---
version: "2.2"

volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  router_var: {}
  druid_shared: {}


services:
  postgres:
    image: postgres:latest
    container_name: postgres
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid

  # Need 3.5 or later for container nodes
  zookeeper:
    image: zookeeper:latest
    container_name: zookeeper
    profiles: ["druid-jupyter", "all-services"]
    ports:
      - "2181:2181"
    environment:
      - ZOO_MY_ID=1
      - ALLOW_ANONYMOUS_LOGIN=yes

  kafka:
    image: bitnami/kafka:latest
    container_name: kafka-broker
    profiles: ["all-services"]
    ports:
      # To learn about configuring Kafka for access across networks see
      # https://www.confluent.io/blog/kafka-client-cannot-connect-to-broker-on-aws-on-docker-etc/
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes

  coordinator:
    image: apache/druid:${DRUID_VERSION}
    container_name: coordinator
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - druid_shared:/opt/shared
      - coordinator_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8081:8081"
    command:
      - coordinator
    env_file:
      - environment

  broker:
    image: apache/druid:${DRUID_VERSION}
    container_name: broker
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - broker_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8082:8082"
    command:
      - broker
    env_file:
      - environment

  historical:
    image: apache/druid:${DRUID_VERSION}
    container_name: historical
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - druid_shared:/opt/shared
      - historical_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8083:8083"
    command:
      - historical
    env_file:
      - environment

  middlemanager:
    image: apache/druid:${DRUID_VERSION}
    container_name: middlemanager
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - druid_shared:/opt/shared
      - middle_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8091:8091"
      - "8100-8105:8100-8105"
    command:
      - middleManager
    env_file:
      - environment

  router:
    image: apache/druid:${DRUID_VERSION}
    container_name: router
    profiles: ["druid-jupyter", "all-services"]
    volumes:
      - router_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8888:8888"
    command:
      - router
    env_file:
      - environment

  jupyter:
    image: imply/druid-notebook:latest
    container_name: jupyter
    profiles: ["jupyter", "all-services"]
    environment:
      DOCKER_STACKS_JUPYTER_CMD: "notebook"
      NOTEBOOK_ARGS: "--NotebookApp.token=''"
    ports:
      - "${JUPYTER_PORT:-8889}:8888"
    volumes:
      - ./notebooks:/home/jovyan/work
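Note on the port mappings above: the router publishes 8888 on the host, and the Jupyter service maps `${JUPYTER_PORT:-8889}` to the container's 8888, so the notebook server lands on host port 8889 unless `JUPYTER_PORT` is set. The following is a minimal Python sketch of how that `${VAR:-default}` form resolves, assuming Compose's documented behavior that the default applies when the variable is unset or empty. The helper names `interpolate` and `host_port` are hypothetical and are not part of Docker Compose.

```python
import os
import re

# Hypothetical helper, not a Docker Compose API. It emulates only the
# ${VAR:-default} form: use the default when VAR is unset or empty.
_PATTERN = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def interpolate(text, env=None):
    """Expand every ${VAR:-default} occurrence in text using env."""
    env = os.environ if env is None else env

    def _sub(match):
        value = env.get(match.group("name"), "")
        return value if value else match.group("default")

    return _PATTERN.sub(_sub, text)


def host_port(mapping, env=None):
    """Return the host side of a 'HOST:CONTAINER' port mapping as an int."""
    host, _container = interpolate(mapping, env).split(":")
    return int(host)


if __name__ == "__main__":
    # Uses the real process environment; prints the default unless
    # JUPYTER_PORT is exported in the shell.
    print(host_port("${JUPYTER_PORT:-8889}:8888"))
```

Passing an explicit dictionary as `env` makes the behavior easy to check without touching the real environment.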
@@ -0,0 +1,56 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

# Java tuning
#DRUID_XMX=1g
#DRUID_XMS=1g
#DRUID_MAXNEWSIZE=250m
#DRUID_NEWSIZE=250m
#DRUID_MAXDIRECTMEMORYSIZE=6172m
DRUID_SINGLE_NODE_CONF=micro-quickstart

druid_emitter_logging_logLevel=debug

druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-multi-stage-query", "druid-kafka-indexing-service"]

druid_zk_service_host=zookeeper

druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword

druid_coordinator_balancer_strategy=cachingCost

druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB



druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs

druid_processing_numThreads=2
druid_processing_numMergeBuffers=2

DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>

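For readers unfamiliar with the naming scheme in the environment file above, here is a rough Python sketch of how a `druid_`-prefixed variable could correspond to a Druid runtime property. It assumes the `apache/druid` image's entrypoint turns each `druid_` variable into a property by replacing underscores with dots; that conversion is not shown in this diff, and the function name `to_property` is hypothetical.

```python
def to_property(env_line):
    """Map one NAME=value line to a (property, value) pair, or None.

    Assumption: only names starting with 'druid_' are converted, and every
    underscore in the name becomes a dot. Upper-case variables such as
    DRUID_SINGLE_NODE_CONF are treated as settings for the startup script
    and are skipped here.
    """
    name, _, value = env_line.partition("=")  # split at the first '=' only
    if not name.startswith("druid_"):
        return None
    return name.replace("_", "."), value


if __name__ == "__main__":
    for line in (
        "druid_metadata_storage_type=postgresql",
        "druid_zk_service_host=zookeeper",
        "DRUID_SINGLE_NODE_CONF=micro-quickstart",
    ):
        print(line, "->", to_property(line))
```

Splitting at the first `=` keeps values that themselves contain `=` intact.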
@@ -0,0 +1,90 @@
{
  "target": {
    "type": "kafka",
    "endpoint": "kafka:9092",
    "topic": "social_media"
  },
  "emitters": [
    {
      "name": "example_record_1",
      "dimensions": [
        {
          "type": "enum",
          "name": "username",
          "values": ["willow", "mia", "leon", "milton", "miette", "gus", "jojo", "rocket"],
          "cardinality_distribution": {
            "type": "uniform",
            "min": 0,
            "max": 7
          }
        },
        {
          "type": "string",
          "name": "post_title",
          "length_distribution": {"type": "uniform", "min": 1, "max": 140},
          "cardinality": 0,
          "chars": "abcdefghijklmnopqrstuvwxyz0123456789_ABCDEFGHIJKLMNOPQRSTUVWXYZ!';:,."
        },
        {
          "type": "int",
          "name": "views",
          "distribution": {
            "type": "exponential",
            "mean": 10000
          },
          "cardinality": 0
        },
        {
          "type": "int",
          "name": "upvotes",
          "distribution": {
            "type": "normal",
            "mean": 70,
            "stddev": 20
          },
          "cardinality": 0
        },
        {
          "type": "int",
          "name": "comments",
          "distribution": {
            "type": "normal",
            "mean": 10,
            "stddev": 5
          },
          "cardinality": 0
        },
        {
          "type": "enum",
          "name": "edited",
          "values": ["True","False"],
          "cardinality_distribution": {
            "type": "uniform",
            "min": 0,
            "max": 1
          }
        }
      ]
    }
  ],
  "interarrival": {
    "type": "constant",
    "value": 1
  },
  "states": [
    {
      "name": "state_1",
      "emitter": "example_record_1",
      "delay": {
        "type": "constant",
        "value": 1
      },
      "transitions": [
        {
          "next": "state_1",
          "probability": 1.0
        }
      ]
    }
  ]
}
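A quick sanity check for a generator configuration like the one above might look like the following sketch. It assumes two rules that the file itself does not state: every state names an emitter defined under `emitters`, and each state's transition probabilities sum to 1. The function name `check_generator_config` is hypothetical, and the data generator may enforce different or additional rules.

```python
import json


def check_generator_config(config):
    """Return a list of human-readable problems found in config.

    Illustrative checks only (assumptions, not the generator's own rules):
    - each state's "emitter" must match an entry in "emitters"
    - each state's transition probabilities must sum to 1
    """
    problems = []
    emitter_names = {emitter["name"] for emitter in config.get("emitters", [])}
    for state in config.get("states", []):
        if state["emitter"] not in emitter_names:
            problems.append(
                f"state {state['name']!r} uses unknown emitter {state['emitter']!r}"
            )
        total = sum(t["probability"] for t in state.get("transitions", []))
        if abs(total - 1.0) > 1e-9:
            problems.append(
                f"state {state['name']!r} transition probabilities sum to {total}"
            )
    return problems


if __name__ == "__main__":
    # Trimmed-down copy of the configuration shown in the diff.
    sample = json.loads("""
    {
      "emitters": [{"name": "example_record_1", "dimensions": []}],
      "states": [
        {
          "name": "state_1",
          "emitter": "example_record_1",
          "transitions": [{"next": "state_1", "probability": 1.0}]
        }
      ]
    }
    """)
    print(check_generator_config(sample))
```

An empty list means neither assumed rule was violated.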
Binary file not shown.
782
examples/quickstart/jupyter-notebooks/kafka-tutorial.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -27,6 +27,7 @@
      "tutorials/tutorial-sql-query-view",
      "tutorials/tutorial-unnest-arrays",
      "tutorials/tutorial-jupyter-index",
      "tutorials/tutorial-jupyter-docker",
      "tutorials/tutorial-jdbc"
    ],
    "Design": [