mirror of https://github.com/apache/druid.git
{
"cells": [
{
"cell_type": "markdown",
"id": "e415d732",
"metadata": {},
"source": [
"# Jupyter Notebook tutorials for Druid\n",
"\n",
"<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials share a lot of the same content.\n",
"If you make a change in one place, update the other too. -->\n",
"\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"\n",
"You can try out the Druid APIs using the Jupyter Notebook-based tutorials. Each\n",
"tutorial provides snippets of Python code that you run to call the Druid API\n",
"as you work through the steps."
]
},
{
"cell_type": "markdown",
"id": "60015702",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Before starting the Jupyter-based tutorials, make sure you meet the requirements listed in this section.\n",
"The simplest way to get started is to use Docker. In this case, you only need to set up Docker Desktop.\n",
"For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
"\n",
"Otherwise, you need the following:\n",
"- An available Druid instance. You can use the local quickstart configuration\n",
" described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
"- Python 3.7 or later\n",
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
"- The `requests` Python package\n",
"- The `druidapi` Python package\n",
"\n",
"For setup instructions, see [Tutorial setup without using Docker](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html#tutorial-setup-without-using-docker).\n",
"Individual tutorials may require additional Python packages, such as for visualization or streaming ingestion.\n",
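"\n",
"As a quick check of the prerequisites above, the following sketch reports whether the\n",
"running Python meets the 3.7 minimum and whether a Druid process answers on the quickstart\n",
"port. It assumes that the Router listens at `http://localhost:8888` and exposes the `/status`\n",
"endpoint. The function names are illustrative and are not part of `druidapi`. The sketch uses\n",
"only the standard library so that it runs before you install `requests`.\n",
"\n",
"```python\n",
"import http.client\n",
"import sys\n",
"import urllib.request\n",
"\n",
"\n",
"def status_url(base_url):\n",
"    # Build the URL of the process status endpoint (assumed to be /status).\n",
"    return base_url.rstrip('/') + '/status'\n",
"\n",
"\n",
"def druid_is_reachable(base_url='http://localhost:8888', timeout=5):\n",
"    # Return True only if the status endpoint answers with HTTP 200.\n",
"    try:\n",
"        with urllib.request.urlopen(status_url(base_url), timeout=timeout) as response:\n",
"            return response.status == 200\n",
"    except (OSError, http.client.HTTPException):\n",
"        # Connection errors, HTTP errors, and timeouts all count as unreachable.\n",
"        return False\n",
"\n",
"\n",
"def python_is_supported(version_info=sys.version_info):\n",
"    # The tutorials list Python 3.7 or later as a prerequisite.\n",
"    return tuple(version_info[:2]) >= (3, 7)\n",
"\n",
"\n",
"print('Python 3.7 or later:', python_is_supported())\n",
"print('Druid reachable:', druid_is_reachable())\n",
"```\n",
"\n",
"If the second line prints `False`, start Druid or pass the base URL of your own cluster\n",
"to `druid_is_reachable`.\n",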
"\n",
"## Simple Druid API\n",
"\n",
"The `druidapi` Python package is a simple set of Python wrappers around the Druid REST API.\n",
"One of the notebooks shows how to call the Druid REST API directly. The others focus on other\n",
"topics and use the wrappers instead. The wrappers reside in the `druidapi` package within\n",
"this directory. While you can use the package in any Python program, its main purpose\n",
"at present is to support these notebooks.\n",
"See the [Introduction to the Druid Python API](../01-introduction/01-druidapi-package-intro.ipynb)\n",
"for an overview of the Python API."
]
},
{
"cell_type": "markdown",
"id": "d9e18342",
"metadata": {},
"source": [
"## Tutorials\n",
"\n",
"If you run the [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html), all the notebooks are included.\n",
"\n",
"Otherwise, you can find the notebooks in the [apache/druid repo](\n",
"https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
"You can either clone the repo or download the notebooks you want individually.\n",
"\n",
"The links that follow are the raw GitHub URLs, so you can use them to download a\n",
"notebook directly, such as with `wget`, or manually through your web browser. If you\n",
"save the file from your web browser, remove the `.txt` extension from the file name.\n",
"\n",
"- [Introduction to the Druid REST API](../04-api/00-getting-started.ipynb) walks you through some of the\n",
" basics related to the Druid REST API and several endpoints.\n",
"- [Introduction to the Druid Python API](../01-introduction/01-druidapi-package-intro.ipynb) walks you through some of the\n",
" basics related to the Druid API using the Python wrapper API.\n",
"- [Learn the basics of Druid SQL](../03-query/00-using-sql-with-druidapi.ipynb) introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
"- [Learn to use the Data Generator](./02-datagen-intro.ipynb) gets you started with streaming and batch file data generation for testing of any data schema.\n",
"- [Ingest and query data from Apache Kafka](../02-ingestion/01-streaming-from-kafka.ipynb) walks you through ingesting an event stream from Kafka."
]
},
{
"cell_type": "markdown",
"id": "1a4b986a",
"metadata": {},
"source": [
"## Contributing\n",
"\n",
"If you build a Jupyter tutorial, you need to do a few things to add it to the docs\n",
"in addition to saving the notebook in this directory. The process typically takes two PRs to the repo.\n",
"\n",
"For the first PR, do the following:\n",
"\n",
"1. Depending on the goal of the notebook, you may want to clear the outputs from your notebook\n",
"   before you make the PR. You can use the following command:\n",
"\n",
"   ```bash\n",
"   jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb\n",
"   ```\n",
" \n",
"   This can also be done in Jupyter Notebook itself: `Kernel` → `Restart & Clear Output`\n",
"\n",
"2. Create the PR as you normally would. Make sure to note that this PR is the one that\n",
"   contains only the Jupyter notebook and that there will be a subsequent PR that updates\n",
"   related pages.\n",
"\n",
"3. After this first PR is merged, grab the \"raw\" URL for the file from GitHub. For example,\n",
"   navigate to the file in the GitHub web UI and select **Raw**. Use the URL for this in the\n",
"   second PR as the download link.\n",
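"\n",
"As an illustrative alternative to the `nbconvert` command in step 1 of the first PR, the\n",
"following sketch clears outputs using only the Python standard library. It assumes the\n",
"nbformat 4 layout, in which each code cell stores `outputs` and `execution_count`. The\n",
"function names and the file path are hypothetical, and `nbconvert` remains the documented\n",
"way to do this.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"\n",
"def clear_outputs(notebook):\n",
"    # Work on a deep copy (JSON round trip) so the caller's dict is untouched.\n",
"    cleaned = json.loads(json.dumps(notebook))\n",
"    for cell in cleaned.get('cells', []):\n",
"        if cell.get('cell_type') == 'code':\n",
"            cell['outputs'] = []\n",
"            cell['execution_count'] = None\n",
"    return cleaned\n",
"\n",
"\n",
"def clear_outputs_in_file(path):\n",
"    # Rewrite the notebook at the given (hypothetical) path without outputs.\n",
"    with open(path, encoding='utf-8') as f:\n",
"        notebook = json.load(f)\n",
"    with open(path, 'w', encoding='utf-8') as f:\n",
"        json.dump(clear_outputs(notebook), f, indent=1, ensure_ascii=False)\n",
"        f.write('\\n')\n",
"```\n",
"\n",
"For example, `clear_outputs_in_file('./path/to/notebook/notebookName.ipynb')` rewrites that\n",
"file in place. Markdown cells are left unchanged.\n",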
"\n",
"For the second PR, do the following:\n",
"\n",
"1. Update the list of [Tutorials](#tutorials) on this page and in the\n",
"   [Jupyter tutorial index page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials)\n",
"   in the `docs/tutorials` directory.\n",
"\n",
"2. Update `tutorial-jupyter-index.md` and provide the URL to the raw version of the file\n",
"   that becomes available after the first PR is merged.\n",
"\n",
"Note that you can skip the second PR if you copy the link prefix from one of the\n",
"existing notebook links when doing your first PR."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}