Python Druid API for use in notebooks (#13787)

Python Druid API for use in notebooks

Revises existing notebooks and readme to reference
the new API.

Notebook to explain the new API.

Split README into a console version and a notebook
version to work around lack of a nice display for
md files.

Update the REST API notebook to use simpler Requests calls

Converted the SQL tutorial to use the Python library

README file, converted to using properties

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Paul Rogers 2023-03-04 18:25:19 -08:00 committed by GitHub
parent b68180fc44
commit a580aca551
25 changed files with 4550 additions and 500 deletions

.gitignore

@ -29,3 +29,4 @@ integration-tests/gen-scripts/
/bin/
*.hprof
**/.ipynb_checkpoints/
*.pyc


@ -22,15 +22,19 @@ title: "Jupyter Notebook tutorials"
~ under the License.
-->
<!-- tutorial-jupyter-index.md and examples/quickstart/jupyter-notebooks/README.md
share a lot of the same content. If you make a change in one place, update the other
too. -->
You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.
## Prerequisites
Make sure you meet the following requirements before starting the Jupyter-based tutorials:
- Python 3.7 or later
- The `requests` package for Python. For example, you can install it with the following command:
@ -38,7 +42,9 @@ Make sure you meet the following requirements before starting the Jupyter-based
pip3 install requests
```
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
and Jupyter both try to use port `8888`, so start Jupyter on a different port.
- Install JupyterLab or Notebook: - Install JupyterLab or Notebook:
@ -48,19 +54,44 @@ Make sure you meet the following requirements before starting the Jupyter-based
# Install Jupyter Notebook
pip3 install notebook
```
- Start Jupyter using either JupyterLab
```bash
# Start JupyterLab on port 3001
jupyter lab --port 3001
```
Or using Jupyter Notebook
```bash
# Start Jupyter Notebook on port 3001
jupyter notebook --port 3001
```
- An available Druid instance. You can use the [Quickstart (local)](./index.md) instance. The tutorials
assume that you are using the quickstart, so no authentication or authorization
is expected unless explicitly mentioned.
If you contribute to Druid and work with Druid integration tests, you can use a test cluster.
Assuming you have an environment variable, `DRUID_DEV`, that identifies your Druid source repo:
```bash
cd $DRUID_DEV
./it.sh build
./it.sh image
./it.sh up <category>
```
Replace `<category>` with one of the available integration test categories. See the integration
test `README.md` for details.
## Simple Druid API
One of the notebooks shows how to use the Druid REST API. The others focus on other
topics and use a simple set of Python wrappers around the underlying REST API. The
wrappers reside in the `druidapi` package within the notebooks directory. While the package
can be used in any Python program, the key purpose, at present, is to support these
notebooks. See the [Introduction to the Druid Python API](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb)
for an overview of the Python API.
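To give a flavor of the wrapper, here is a minimal sketch of the pattern the notebooks use. The `list_tables_sql()` helper is hypothetical (not part of `druidapi`), and the commented-out `druidapi` calls assume a local quickstart cluster with the router on port `8888`:

```python
# Sketch of the kind of query the notebooks run through the Python wrapper.
# The SQL text can be built and inspected offline; the druidapi calls
# themselves (shown commented out) need a running Druid cluster.

def list_tables_sql(schema: str) -> str:
    """Build an INFORMATION_SCHEMA query that lists the tables in a schema."""
    return (
        "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}'"
    )

if __name__ == "__main__":
    sql = list_tables_sql("INFORMATION_SCHEMA")
    print(sql)
    # With a quickstart cluster running locally (router on port 8888):
    # import druidapi
    # druid = druidapi.jupyter_client('http://localhost:8888')
    # druid.display.sql(sql)
```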
## Tutorials
@ -68,5 +99,10 @@ The notebooks are located in the [apache/druid repo](https://github.com/apache/d
The links that follow are the raw GitHub URLs, so you can use them to download the notebook directly, such as with `wget`, or manually through your web browser. Note that if you save the file from your web browser, make sure to remove the `.txt` extension.
- [Introduction to the Druid REST API](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/api-tutorial.ipynb) walks you through some of the basics related to the Druid REST API and several endpoints.
- [Introduction to the Druid Python API](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb) walks you through some of the basics related to the Druid API using the Python wrapper API.
- [Introduction to Druid SQL](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb) covers the basics of Druid SQL.


@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e415d732",
"metadata": {},
"source": [
"# Jupyter Notebook tutorials for Druid\n",
"\n",
"<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials share a lot of the same content.\n",
"If you make a change in one place, update the other too. -->\n",
"\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"\n",
"You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These\n",
"tutorials provide snippets of Python code that you can use to run calls against\n",
"the Druid API to complete the tutorial."
]
},
{
"cell_type": "markdown",
"id": "60015702",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
"\n",
"- The `requests` package for Python. For example, you can install it with the following command:\n",
"\n",
" ```bash\n",
" pip3 install requests\n",
" ````\n",
"\n",
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
" and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
"\n",
"- An available Druid instance. You can use the local quickstart configuration\n",
" described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
" The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
" is expected unless explicitly mentioned.\n",
"\n",
"## Simple Druid API\n",
"\n",
"One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
"topics and use a simple set of Python wrappers around the underlying REST API. The\n",
"wrappers reside in the `druidapi` package within this directory. While the package\n",
"can be used in any Python program, the key purpose, at present, is to support these\n",
"notebooks. See the [Introduction to the Druid Python API](Python_API_Tutorial.ipynb)\n",
"for an overview of the Python API."
]
},
{
"cell_type": "markdown",
"id": "d9e18342",
"metadata": {},
"source": [
"## Tutorials\n",
"\n",
"The notebooks are located in the [apache/druid repo](\n",
"https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
"You can either clone the repo or download the notebooks you want individually.\n",
"\n",
"The links that follow are the raw GitHub URLs, so you can use them to download the\n",
"notebook directly, such as with `wget`, or manually through your web browser. Note\n",
"that if you save the file from your web browser, make sure to remove the `.txt` extension.\n",
"\n",
"- [Introduction to the Druid REST API](api-tutorial.ipynb) walks you through some of the\n",
" basics related to the Druid REST API and several endpoints.\n",
"- [Introduction to the Druid Python API](Python_API_Tutorial.ipynb) walks you through some of the\n",
" basics related to the Druid API using the Python wrapper API.\n",
"- [Learn the basics of Druid SQL](sql-tutorial.ipynb) introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement. "
]
},
{
"cell_type": "markdown",
"id": "1a4b986a",
"metadata": {},
"source": [
"## Contributing\n",
"\n",
"If you build a Jupyter tutorial, you need to do a few things to add it to the docs\n",
"in addition to saving the notebook in this directory. The process requires two PRs to the repo.\n",
"\n",
"For the first PR, do the following:\n",
"\n",
"1. Depending on the goal of the notebook, you may want to clear the outputs from your notebook\n",
" before you make the PR. You can use the following command:\n",
"\n",
" ```bash\n",
" jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb\n",
" ```\n",
" \n",
" This can also be done in Jupyter Notebook itself: `Kernel` &rarr; `Restart & Clear Output`\n",
"\n",
"2. Create the PR as you normally would. Make sure to note that this PR is the one that\n",
" contains only the Jupyter notebook and that there will be a subsequent PR that updates\n",
" related pages.\n",
"\n",
"3. After this first PR is merged, grab the \"raw\" URL for the file from GitHub. For example,\n",
" navigate to the file in the GitHub web UI and select **Raw**. Use the URL for this in the\n",
" second PR as the download link.\n",
"\n",
"For the second PR, do the following:\n",
"\n",
"1. Update the list of [Tutorials](#tutorials) on this page and in the\n",
" [Jupyter tutorial index page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials)\n",
" in the `docs/tutorials` directory.\n",
"\n",
"2. Update `tutorial-jupyter-index.md` and provide the URL to the raw version of the file\n",
" that becomes available after the first PR is merged.\n",
"\n",
"Note that you can skip the second PR, if you just copy the prefix link from one of the\n",
"existing notebook links when doing your first PR."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@ -0,0 +1,743 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ce2efaaa",
"metadata": {},
"source": [
"# Learn the Druid Python API\n",
"\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
" ~ distributed with this work for additional information\n",
" ~ regarding copyright ownership. The ASF licenses this file\n",
" ~ to you under the Apache License, Version 2.0 (the\n",
" ~ \"License\"); you may not use this file except in compliance\n",
" ~ with the License. You may obtain a copy of the License at\n",
" ~\n",
" ~ http://www.apache.org/licenses/LICENSE-2.0\n",
" ~\n",
" ~ Unless required by applicable law or agreed to in writing,\n",
" ~ software distributed under the License is distributed on an\n",
" ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
" ~ KIND, either express or implied. See the License for the\n",
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"\n",
"This notebook provides a quick introduction to the Python wrapper around the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are familiar with the basics of the REST API, and the [set of operations which Druid provides](https://druid.apache.org/docs/latest/operations/api-reference.html). This tutorial focuses on using Python to access those APIs rather than explaining the APIs themselves. The APIs themselves are covered in other notebooks that use the Python API.\n",
"\n",
"The Druid Python API is primarily intended to help with these notebook tutorials. It can also be used in your own ad-hoc notebooks, or in a regular Python program.\n",
"\n",
"The Druid Python API is a work in progress. The Druid team adds API wrappers as needed for the notebook tutorials. If you find you need additional wrappers, please feel free to add them, and post a PR to Apache Druid with your additions.\n",
"\n",
"The API provides two levels of functions. Most are simple wrappers around Druid's REST APIs. Others add additional code to make the API easier to use. The SQL query interface is a prime example: extra code translates a simple SQL query into Druid's `SQLQuery` object and interprets the results into a form that can be displayed in a notebook.\n",
"\n",
"This notebook contains sample output to allow it to function as a reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & Clear Output` menu command to clear the sample output.\n",
"\n",
"Start by importing the `druidapi` package from the same folder as this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d90ca5d",
"metadata": {},
"outputs": [],
"source": [
"import druidapi"
]
},
{
"cell_type": "markdown",
"id": "fb68a838",
"metadata": {},
"source": [
"Next, connect to your cluster by providing the router endpoint. The code assumes the cluster is on your local machine, using the default port. Go ahead and change this if your setup is different.\n",
"\n",
"The API uses the router to forward messages to each of Druid's services so that you don't have to keep track of the host and port for each service.\n",
"\n",
"The `jupyter_client()` method waits for the cluster to be ready and sets up the client to display tables and messages as HTML. To use this code without waiting and without HTML formatting, use the `client()` method instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae601081",
"metadata": {},
"outputs": [],
"source": [
"druid = druidapi.jupyter_client('http://localhost:8888')"
]
},
{
"cell_type": "markdown",
"id": "8b4e774b",
"metadata": {},
"source": [
"## Status Client\n",
"\n",
"The SDK groups Druid REST API calls into categories, with a client for each. Start with the status client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff16fc3b",
"metadata": {},
"outputs": [],
"source": [
"status_client = druid.status"
]
},
{
"cell_type": "markdown",
"id": "be992774",
"metadata": {},
"source": [
"Use the Python `help()` function to learn what methods are avaialble."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03f26417",
"metadata": {},
"outputs": [],
"source": [
"help(status_client)"
]
},
{
"cell_type": "markdown",
"id": "e803c9fe",
"metadata": {},
"source": [
"Check the version of your cluster. Some of these notebooks illustrate newer features available only on specific versions of Druid."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2faa0d81",
"metadata": {},
"outputs": [],
"source": [
"status_client.version"
]
},
{
"cell_type": "markdown",
"id": "d78a6c35",
"metadata": {},
"source": [
"You can also check which extensions are loaded in your cluster. Some notebooks require specific extensions to be available."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1001f412",
"metadata": {},
"outputs": [],
"source": [
"status_client.properties['druid.extensions.loadList']"
]
},
{
"cell_type": "markdown",
"id": "012b2e61",
"metadata": {},
"source": [
"## Display Client\n",
"\n",
"The display client performs Druid operations, then formats the results for display in a notebook. Running SQL queries in a notebook is easy with the display client.\n",
"\n",
"When run outside a notebook, the display client formats results as text. The display client is the most convenient way to work with Druid in a notebook. Most operations also have a form that returns results as Python objects rather than displaying them. Use these methods if you write code to work with the results. Here the goal is just to interact with Druid."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f867f1f0",
"metadata": {},
"outputs": [],
"source": [
"display = druid.display"
]
},
{
"cell_type": "markdown",
"id": "d051bc5e",
"metadata": {},
"source": [
"Start by getting a list of schemas."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd8387e0",
"metadata": {},
"outputs": [],
"source": [
"display.schemas()"
]
},
{
"cell_type": "markdown",
"id": "b8261ab0",
"metadata": {},
"source": [
"Then, retreive the tables (or datasources) within any schema."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64dcb46a",
"metadata": {},
"outputs": [],
"source": [
"display.tables('INFORMATION_SCHEMA')"
]
},
{
"cell_type": "markdown",
"id": "ff311595",
"metadata": {},
"source": [
"The above shows the list of datasources by default. You'll get an empty result if you have no datasources yet."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "616770ce",
"metadata": {},
"outputs": [],
"source": [
"display.tables()"
]
},
{
"cell_type": "markdown",
"id": "7392e484",
"metadata": {},
"source": [
"You can easily run a query and show the results:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c649eef",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"SELECT TABLE_NAME\n",
"FROM INFORMATION_SCHEMA.TABLES\n",
"WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
"'''\n",
"display.sql(sql)"
]
},
{
"cell_type": "markdown",
"id": "c6c4e1d4",
"metadata": {},
"source": [
"The query above showed the same results as `tables()`. That is not surprising: `tables()` just runs this query for you."
]
},
{
"cell_type": "markdown",
"id": "f414d145",
"metadata": {},
"source": [
"## SQL Client\n",
"\n",
"While the display client is handy for simple queries, sometimes you need more control, or want to work with the data returned from a query. For this you use the SQL client."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9951e976",
"metadata": {},
"outputs": [],
"source": [
"sql_client = druid.sql"
]
},
{
"cell_type": "markdown",
"id": "7b944084",
"metadata": {},
"source": [
"The SQL client allows you create a SQL request object that enables passing context parameters and query parameters. Druid will work out the query parameter type based on the Python type. Use the display client to show the query results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd559827",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"SELECT TABLE_NAME\n",
"FROM INFORMATION_SCHEMA.TABLES\n",
"WHERE TABLE_SCHEMA = ?\n",
"'''\n",
"req = sql_client.sql_request(sql)\n",
"req.add_parameter('INFORMATION_SCHEMA')\n",
"req.add_context(\"someParameter\", \"someValue\")\n",
"display.sql(req)"
]
},
{
"cell_type": "markdown",
"id": "937dc6b1",
"metadata": {},
"source": [
"The request has other features for advanced use cases: see the code for details. The query API actually returns a sql response object. Use this if you want to get the values directly, work with the schema, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd7a1827",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"SELECT TABLE_NAME\n",
"FROM INFORMATION_SCHEMA.TABLES\n",
"WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
"'''\n",
"resp = sql_client.sql_query(sql)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fe6a749",
"metadata": {},
"outputs": [],
"source": [
"col1 = resp.schema[0]\n",
"print(col1.name, col1.sql_type, col1.druid_type)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41d27bb1",
"metadata": {},
"outputs": [],
"source": [
"resp.rows"
]
},
{
"cell_type": "markdown",
"id": "481af1f2",
"metadata": {},
"source": [
"The `show()` method uses this information for format an HTML table to present the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8dba807b",
"metadata": {},
"outputs": [],
"source": [
"resp.show()"
]
},
{
"cell_type": "markdown",
"id": "99f8db7b",
"metadata": {},
"source": [
"The display and SQL clients are intened for exploratory queries. The [pydruid](https://pythonhosted.org/pydruid/) library provides a robust way to run native queries, to run SQL queries, and to convert the results to various formats."
]
},
{
"cell_type": "markdown",
"id": "9e3be017",
"metadata": {},
"source": [
"## MSQ Ingestion\n",
"\n",
"The SQL client also performs MSQ-based ingestion using `INSERT` or `REPLACE` statements. Use the extension check above to ensure that `druid-multi-stage-query` is loaded in Druid 26. (Later versions may have MSQ built in.)\n",
"\n",
"An MSQ query is run using a different API: `task()`. This API returns a response object that describes the Overlord task which runs the MSQ query. For tutorials, data is usually small enough you can wait for the ingestion to complete. Do that with the `run_task()` call which handles the waiting. To illustrate, here is a query that ingests a subset of columns, and includes a few data clean-up steps:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10f1e451",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"REPLACE INTO \"myWiki1\" OVERWRITE ALL\n",
"SELECT\n",
" TIME_PARSE(\"timestamp\") AS \"__time\",\n",
" namespace,\n",
" page,\n",
" channel,\n",
" \"user\",\n",
" countryName,\n",
" CASE WHEN isRobot = 'true' THEN 1 ELSE 0 END AS isRobot,\n",
" \"added\",\n",
" \"delta\",\n",
" CASE WHEN isNew = 'true' THEN 1 ELSE 0 END AS isNew,\n",
" CAST(\"deltaBucket\" AS DOUBLE) AS deltaBucket,\n",
" \"deleted\"\n",
"FROM TABLE(\n",
" EXTERN(\n",
" '{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
" '{\"type\":\"json\"}',\n",
" '[{\"name\":\"isRobot\",\"type\":\"string\"},{\"name\":\"channel\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"flags\",\"type\":\"string\"},{\"name\":\"isUnpatrolled\",\"type\":\"string\"},{\"name\":\"page\",\"type\":\"string\"},{\"name\":\"diffUrl\",\"type\":\"string\"},{\"name\":\"added\",\"type\":\"long\"},{\"name\":\"comment\",\"type\":\"string\"},{\"name\":\"commentLength\",\"type\":\"long\"},{\"name\":\"isNew\",\"type\":\"string\"},{\"name\":\"isMinor\",\"type\":\"string\"},{\"name\":\"delta\",\"type\":\"long\"},{\"name\":\"isAnonymous\",\"type\":\"string\"},{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"deltaBucket\",\"type\":\"long\"},{\"name\":\"deleted\",\"type\":\"long\"},{\"name\":\"namespace\",\"type\":\"string\"},{\"name\":\"cityName\",\"type\":\"string\"},{\"name\":\"countryName\",\"type\":\"string\"},{\"name\":\"regionIsoCode\",\"type\":\"string\"},{\"name\":\"metroCode\",\"type\":\"long\"},{\"name\":\"countryIsoCode\",\"type\":\"string\"},{\"name\":\"regionName\",\"type\":\"string\"}]'\n",
" )\n",
")\n",
"PARTITIONED BY DAY\n",
"CLUSTERED BY namespace, page\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d752b1d4",
"metadata": {},
"outputs": [],
"source": [
"sql_client.run_task(sql)"
]
},
{
"cell_type": "markdown",
"id": "ef4512f8",
"metadata": {},
"source": [
"MSQ reports task completion as soon as ingestion is done. However, it takes a while for Druid to load the resulting segments. Wait for the table to become ready."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37fcedf2",
"metadata": {},
"outputs": [],
"source": [
"sql_client.wait_until_ready('myWiki1')"
]
},
{
"cell_type": "markdown",
"id": "11d9c95a",
"metadata": {},
"source": [
"`describe_table()` lists the columns in a table."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b662697b",
"metadata": {},
"outputs": [],
"source": [
"display.table('myWiki1')"
]
},
{
"cell_type": "markdown",
"id": "936f57fb",
"metadata": {},
"source": [
"You can sample a few rows of data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4cfa5dc",
"metadata": {},
"outputs": [],
"source": [
"display.sql('SELECT * FROM myWiki1 LIMIT 10')"
]
},
{
"cell_type": "markdown",
"id": "c1152f41",
"metadata": {},
"source": [
"## Datasource Client\n",
"\n",
"The Datasource client lets you perform operations on datasource objects. The SQL layer allows you to get metadata and do queries. The datasource client works with the underlying segments. Explaining the full functionality is the topic of another notebook. For now, you can use the datasource client to clean up the datasource created above. The `True` argument asks for \"if exists\" semantics so you don't get an error if the datasource was alredy deleted."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fba659ce",
"metadata": {},
"outputs": [],
"source": [
"ds_client = druid.datasources\n",
"ds_client.drop('myWiki', True)"
]
},
{
"cell_type": "markdown",
"id": "c96fdcc6",
"metadata": {},
"source": [
"## Tasks Client\n",
"\n",
"Use the tasks client to work with Overlord tasks. The `run_task()` call above actually uses the task client internally to poll Overlord."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4f5ea17",
"metadata": {},
"outputs": [],
"source": [
"task_client = druid.tasks\n",
"task_client.tasks()"
]
},
{
"cell_type": "markdown",
"id": "1deaf95f",
"metadata": {},
"source": [
"## REST Client\n",
"\n",
"The Druid Python API starts with a REST client that itself is built on the `requests` package. The REST client implements the common patterns seen in the Druid REST API. You can create a client directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1e55635",
"metadata": {},
"outputs": [],
"source": [
"from druidapi.rest import DruidRestClient\n",
"rest_client = DruidRestClient(\"http://localhost:8888\")"
]
},
{
"cell_type": "markdown",
"id": "dcb8055f",
"metadata": {},
"source": [
"Or, if you have already created the Druid client, you can reuse the existing REST client. This is how the various other clients work internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "370ba76a",
"metadata": {},
"outputs": [],
"source": [
"rest_client = druid.rest"
]
},
{
"cell_type": "markdown",
"id": "2654e72c",
"metadata": {},
"source": [
"Use the REST client if you need to make calls that are not yet wrapped by the Python API, or if you want to do something special. To illustrate the client, you can make some of the same calls as in the [Druid REST API notebook](api_tutorial.ipynb).\n",
"\n",
"The REST API maintains the Druid host: you just provide the specifc URL tail. There are methods to get or post JSON results. For example, to get status information:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e42dfbc",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"rest_client.get_json('/status')"
]
},
{
"cell_type": "markdown",
"id": "837e08b0",
"metadata": {},
"source": [
"A quick comparison of the three approaches (Requests, REST client, Python client):\n",
"\n",
"Status:\n",
"\n",
"* Requests: `session.get(druid_host + '/status').json()`\n",
"* REST client: `rest_client.get_json('/status')`\n",
"* Status client: `status_client.status()`\n",
"\n",
"Health:\n",
"\n",
"* Requests: `session.get(druid_host + '/status/health').json()`\n",
"* REST client: `rest_client.get_json('/status/health')`\n",
"* Status client: `status_client.is_healthy()`\n",
"\n",
"Ingest data:\n",
"\n",
"* Requests: See the [REST tutorial](api_tutorial.ipynb)\n",
"* REST client: as the REST tutorial, but use `rest_client.post_json('/druid/v2/sql/task', sql_request)` and\n",
" `rest_client.get_json(f\"/druid/indexer/v1/task/{ingestion_taskId}/status\")`\n",
"* SQL client: `sql_client.run_task(sql)`, also a form for a full SQL request.\n",
"\n",
"List datasources:\n",
"\n",
"* Requests: `session.get(druid_host + '/druid/coordinator/v1/datasources').json()`\n",
"* REST client: `rest_client.get_json('/druid/coordinator/v1/datasources')`\n",
"* Datasources client: `ds_client.names()`\n",
"\n",
"Query data, where `sql_request` is a properly formatted `SqlRequest` dictionary:\n",
"\n",
"* Requests: `session.post(druid_host + '/druid/v2/sql', json=sql_request).json()`\n",
"* REST client: `rest_client.post_json('/druid/v2/sql', sql_request)`\n",
"* SQL Client: `sql_client.show(sql)`, where `sql` is the query text\n",
"\n",
"In general, you have to provide the all the details for the Requests library. The REST client handles the low-level repetitious bits. The Python clients provide methods that encapsulate the specifics of the URLs and return formats."
]
},
{
"cell_type": "markdown",
"id": "edc4ee39",
"metadata": {},
"source": [
"## Constants\n",
"\n",
"Druid has a large number of special constants: type names, options, etc. The `consts` module provides definitions for many of these:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a90187c6",
"metadata": {},
"outputs": [],
"source": [
"from druidapi import consts"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc535898",
"metadata": {},
"outputs": [],
"source": [
"help(consts)"
]
},
{
"cell_type": "markdown",
"id": "b661b29f",
"metadata": {},
"source": [
"Using the constants avoids typos:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3393af62",
"metadata": {},
"outputs": [],
"source": [
"sql_client.show_tables(consts.SYS_SCHEMA)"
]
},
{
"cell_type": "markdown",
"id": "5e789ca7",
"metadata": {},
"source": [
"## Tracing\n",
"\n",
"It is often handy to see what the Druid API is doing: what messages it sends to Druid. You may need to debug some function that isn't working as expected. Or, perhaps you want to see what is sent to Druid so you can replicate it in your own code. Either way, just turn on tracing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac68b60e",
"metadata": {},
"outputs": [],
"source": [
"druid.trace(True)"
]
},
{
"cell_type": "markdown",
"id": "7b9dc7e3",
"metadata": {},
"source": [
"Then, each call to Druid prints what it sends:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72c955c0",
"metadata": {},
"outputs": [],
"source": [
"sql_client.show_tables()"
]
},
{
"cell_type": "markdown",
"id": "ddaf0dc2",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"This notebook have you a whirlwind tour of the Python Druid API: just enough to check your cluster, ingest some data with MSQ and query that data. Druid has many more APIs. As noted earlier, the Python API is a work in progress: the team adds new wrappers as needed for tutorials. Your [contributions](https://github.com/apache/druid/pulls) and [feedback](https://github.com/apache/druid/issues) are welcome."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
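The API comparison at the top of this notebook (raw Requests calls versus the `druidapi` clients) can be sketched as plain Python. This is illustrative code, not part of the `druidapi` library: the helper names below are made up for this sketch, and it assumes the quickstart Router at `localhost:8888`.

```python
import requests

DRUID_HOST = 'http://localhost:8888'  # assumption: quickstart Router address

def build_sql_request(sql, max_tasks=None):
    # A SqlRequest payload is a plain dict: 'query' plus an optional 'context'.
    request = {'query': sql}
    if max_tasks is not None:
        request['context'] = {'maxNumTasks': max_tasks}
    return request

def list_datasources(session, druid_host=DRUID_HOST):
    # Raw-Requests equivalent of ds_client.names(): you supply the full URL.
    return session.get(druid_host + '/druid/coordinator/v1/datasources').json()

def run_sql(session, sql, druid_host=DRUID_HOST):
    # Raw-Requests equivalent of the SQL client's query calls:
    # POST the SqlRequest dict to the /druid/v2/sql endpoint.
    return session.post(druid_host + '/druid/v2/sql',
                        json=build_sql_request(sql)).json()
```

With a live quickstart cluster, `list_datasources(requests.Session())` returns the same list as `ds_client.names()` above; the typed clients simply hide this URL and payload plumbing.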

View File

@ -1,6 +1,11 @@
# Jupyter Notebook tutorials for Druid # Jupyter Notebook tutorials for Druid
<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials share a lot of the same content. If you make a change in one place, update the other too. --> If you are reading this in Jupyter, switch over to the [- START HERE -](- START HERE -.ipynb)
notebook instead.
<!-- This README, the "- START HERE -" notebook, and the tutorial-jupyter-index.md file in
docs/tutorials share a lot of the same content. If you make a change in one place, update
the other too. -->
<!-- <!--
~ Licensed to the Apache Software Foundation (ASF) under one ~ Licensed to the Apache Software Foundation (ASF) under one
@ -21,7 +26,9 @@
~ under the License. ~ under the License.
--> -->
You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These tutorials provide snippets of Python code that you can use to run calls against the Druid API to complete the tutorial. You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These
tutorials provide snippets of Python code that you can use to run calls against
the Druid API to complete the tutorial.
## Prerequisites ## Prerequisites
@ -33,9 +40,10 @@ Make sure you meet the following requirements before starting the Jupyter-based
```bash ```bash
pip3 install requests pip3 install requests
```` ```
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid and Jupyter both try to use port `8888,` so start Jupyter on a different port. - JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
and Jupyter both try to use port `8888`, so start Jupyter on a different port.
- Install JupyterLab or Notebook: - Install JupyterLab or Notebook:
@ -45,47 +53,36 @@ Make sure you meet the following requirements before starting the Jupyter-based
# Install Jupyter Notebook # Install Jupyter Notebook
pip3 install notebook pip3 install notebook
``` ```
- Start Jupyter using either JupyterLab
- Start Jupyter:
- JupyterLab
```bash ```bash
# Start JupyterLab on port 3001 # Start JupyterLab on port 3001
jupyter lab --port 3001 jupyter lab --port 3001
``` ```
- Jupyter Notebook
Or using Jupyter Notebook
```bash ```bash
# Start Jupyter Notebook on port 3001 # Start Jupyter Notebook on port 3001
jupyter notebook --port 3001 jupyter notebook --port 3001
``` ```
- An available Druid instance. You can use the `micro-quickstart` configuration described in [Quickstart (local)](../../../docs/tutorials/index.md). The tutorials assume that you are using the quickstart, so no authentication or authorization is expected unless explicitly mentioned. - An available Druid instance. You can use the `micro-quickstart` configuration
described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).
The tutorials assume that you are using the quickstart, so no authentication or authorization
is expected unless explicitly mentioned.
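Once the quickstart cluster is running, you can confirm from Python that it is reachable before starting a tutorial. A minimal sketch, assuming the default Router address of `localhost:8888` (the helper names here are illustrative, not part of Druid or `druidapi`):

```python
import requests

def health_endpoint(druid_host='http://localhost:8888'):
    # Build the health-check URL; /status/health returns JSON true
    # once the cluster has finished starting up.
    return druid_host + '/status/health'

def is_druid_up(druid_host='http://localhost:8888'):
    # True only when a live Druid instance answers the health check.
    try:
        return requests.get(health_endpoint(druid_host), timeout=5).json() is True
    except (requests.RequestException, ValueError):
        return False
```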
## Tutorials If you contribute to Druid and work with Druid integration tests, you can use a test cluster.
Assume you have an environment variable, `DRUID_DEV`, which identifies your Druid source repo.
The notebooks are located in the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/). You can either clone the repo or download the notebooks you want individually.
The links that follow are the raw GitHub URLs, so you can use them to download the notebook directly, such as with `wget`, or manually through your web browser. Note that if you save the file from your web browser, make sure to remove the `.txt` extension.
- [Introduction to the Druid API](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/api-tutorial.ipynb) walks you through some of the basics related to the Druid API and several endpoints.
- [Introduction to Druid SQL](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb) covers the basics of Druid SQL.
## Contributing
If you build a Jupyter tutorial, you need to do a few things to add it to the docs in addition to saving the notebook in this directory. The process requires two PRs to the repo.
For the first PR, do the following:
1. Clear the outputs from your notebook before you make the PR. You can use the following command:
```bash ```bash
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb cd $DRUID_DEV
./it.sh build
./it.sh image
./it.sh up <category>
``` ```
2. Create the PR as you normally would. Make sure to note that this PR is the one that contains only the Jupyter notebook and that there will be a subsequent PR that updates related pages. Replace `<category>` with one of the available integration test categories. See the integration
test `README.md` for details.
3. After this first PR is merged, grab the "raw" URL for the file from GitHub. For example, navigate to the file in the GitHub web UI and select **Raw**. Use the URL for this in the second PR as the download link. ## Continue in Jupyter
For the second PR, do the following: Start Jupyter (see above) and navigate to the "- START HERE -" page for more information.
1. Update the list of [Tutorials](#tutorials) on this page and in the [ Jupyter tutorial index page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials) in the `docs/tutorials` directory.
2. Update `tutorial-jupyter-index.md` and provide the URL to the raw version of the file that becomes available after the first PR is merged.

View File

@ -7,7 +7,7 @@
"tags": [] "tags": []
}, },
"source": [ "source": [
"# Tutorial: Learn the basics of the Druid API\n", "# Learn the basics of the Druid API\n",
"\n", "\n",
"<!--\n", "<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n", " ~ Licensed to the Apache Software Foundation (ASF) under one\n",
@ -38,6 +38,8 @@
"\n", "\n",
"For more information, see the [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html), which is organized by server type.\n", "For more information, see the [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html), which is organized by server type.\n",
"\n", "\n",
"For work within other notebooks, prefer to use the [Python API](Python_API_Tutorial.ipynb) which is a notebook-friendly wrapper around the low-level API calls shown here.\n",
"\n",
"## Table of contents\n", "## Table of contents\n",
"\n", "\n",
"- [Prerequisites](#Prerequisites)\n", "- [Prerequisites](#Prerequisites)\n",
@ -58,29 +60,38 @@
"source": [ "source": [
"## Prerequisites\n", "## Prerequisites\n",
"\n", "\n",
"You'll need install the Requests library for Python before you start. For example:\n", "Install the [Requests](https://requests.readthedocs.io/en/latest/) library for Python before you start. For example:\n",
"\n", "\n",
"```bash\n", "```bash\n",
"pip3 install requests\n", "pip3 install requests\n",
"```\n", "```\n",
"\n", "\n",
"Please read the [Requests Quickstart](https://requests.readthedocs.io/en/latest/user/quickstart/) to gain a basic understanding of how Requests works.\n",
"\n",
"Next, you'll need a Druid cluster. This tutorial uses the `micro-quickstart` config described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html). So download that and start it if you haven't already. In the root of the Druid folder, run the following command to start Druid:\n", "Next, you'll need a Druid cluster. This tutorial uses the `micro-quickstart` config described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html). So download that and start it if you haven't already. In the root of the Druid folder, run the following command to start Druid:\n",
"\n", "\n",
"```bash\n", "```bash\n",
"./bin/start-micro-quickstart\n", "./bin/start-micro-quickstart\n",
"```\n", "```\n",
"\n", "\n",
"Finally, you'll need either JupyterLab (recommended) or Jupyter Notebook. Both the quickstart Druid cluster and Jupyter are deployed at `localhost:8888` by default, so you'll \n", "Finally, you'll need either JupyterLab (recommended) or Jupyter Notebook. Both the quickstart Druid cluster and Jupyter are deployed at `localhost:8888` by default, so you'll need to change the port for Jupyter. To do so, stop Jupyter and start it again with the `port` parameter included. For example, you can use the following command to start Jupyter on port `3001`:\n",
"need to change the port for Jupyter. To do so, stop Jupyter and start it again with the `port` parameter included. For example, you can use the following command to start Jupyter on port `3001`:\n",
"\n", "\n",
"```bash\n", "```bash\n",
"# If you're using JupyterLab\n", "# If you're using JupyterLab\n",
"jupyter lab --port 3001\n", "jupyter lab --port 3001\n",
"# If you're using Jupyter Notebook\n", "# If you're using Jupyter Notebook\n",
"jupyter notebook --port 3001 \n", "jupyter notebook --port 3001 \n",
"```\n", "```"
]
},
{
"cell_type": "markdown",
"id": "12c0e1c3",
"metadata": {},
"source": [
"To start this tutorial, run the next cell. It imports the Python packages you'll need and defines a variable for the the Druid host, where the Router service listens.\n",
"\n", "\n",
"To start this tutorial, run the next cell. It imports the Python packages you'll need and defines a variable for the the Druid host, where the Router service listens. " "`druid_host` is the hostname and port for your Druid deployment. In a distributed environment, you can point to other Druid services. In this tutorial, you'll use the Router service as the `druid_host`."
] ]
}, },
{ {
@ -91,13 +102,27 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"import requests\n", "import requests\n",
"import json\n",
"\n", "\n",
"# druid_host is the hostname and port for your Druid deployment. \n",
"# In a distributed environment, you can point to other Druid services. In this tutorial, you'll use the Router service as the `druid_host`. \n",
"druid_host = \"http://localhost:8888\"\n", "druid_host = \"http://localhost:8888\"\n",
"dataSourceName = \"wikipedia_api\"\n", "druid_host"
"print(f\"\\033[1mDruid host\\033[0m: {druid_host}\")" ]
},
{
"cell_type": "markdown",
"id": "a22c69c8",
"metadata": {},
"source": [
"If your cluster is secure, you'll need to provide authorization information on each request. You can automate it by using the Requests `session` feature. Although this tutorial assumes no authorization, the configuration below defines a session as an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06ea91de",
"metadata": {},
"outputs": [],
"source": [
"session = requests.Session()"
] ]
}, },
{ {
@ -105,7 +130,7 @@
"id": "2093ecf0-fb4b-405b-a216-094583580e0a", "id": "2093ecf0-fb4b-405b-a216-094583580e0a",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In the rest of this tutorial, the `endpoint`, `http_method`, and `payload` variables are updated in code cells to call a different Druid endpoint to accomplish a task." "In the rest of this tutorial, the `endpoint` and other variables are updated in code cells to call a different Druid endpoint to accomplish a task."
] ]
}, },
{ {
@ -119,7 +144,28 @@
"\n", "\n",
"In this cell, you'll use the `GET /status` endpoint to return basic information about your cluster, such as the Druid version, loaded extensions, and resource consumption.\n", "In this cell, you'll use the `GET /status` endpoint to return basic information about your cluster, such as the Druid version, loaded extensions, and resource consumption.\n",
"\n", "\n",
"The following cell sets `endpoint` to `/status` and updates the HTTP method to `GET`. When you run the cell, you should get a response that starts with the version number of your Druid deployment." "The following cell sets `endpoint` to `/status` and updates the HTTP method to `GET`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a1b453e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"endpoint = druid_host + '/status'\n",
"endpoint"
]
},
{
"cell_type": "markdown",
"id": "1e853795",
"metadata": {},
"source": [
"The Requests `Session` has a `get()` method that posts an HTTP `GET` request. The method takes multiple arguments. Here you only need the URL. The method returns a Requests `Response` object, which can convert the returned JSON result to Python. JSON objects become Python dictionaries. JSON arrays become Python arrays. When you run the cell, you should receive a response that starts with the version number of your Druid deployment."
] ]
}, },
{ {
@ -131,15 +177,27 @@
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/status\"\n", "response = session.get(endpoint)\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", "json = response.json()\n",
"http_method = \"GET\"\n", "json"
"\n", ]
"payload = {}\n", },
"headers = {}\n", {
"\n", "cell_type": "markdown",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", "id": "de82029e",
"print(\"\\033[1mResponse\\033[0m: : \\n\" + json.dumps(response.json(), indent=4))" "metadata": {},
"source": [
"Because the JSON result is converted to Python, you can use Python to pull out the information you want. For example, to see just the version:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64e83cec",
"metadata": {},
"outputs": [],
"source": [
"json['version']"
] ]
}, },
{ {
@ -151,7 +209,18 @@
"source": [ "source": [
"### Get cluster health\n", "### Get cluster health\n",
"\n", "\n",
"The `/status/health` endpoint returns `true` if your cluster is up and running. It's useful if you want to do something like programmatically check if your cluster is available. When you run the following cell, you should get `true` if your Druid cluster has finished starting up and is running." "The `/status/health` endpoint returns JSON `true` if your cluster is up and running. It's useful if you want to do something like programmatically check if your cluster is available. When you run the following cell, you should receive the `True` Python value if your Druid cluster has finished starting up and is running."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21121c13",
"metadata": {},
"outputs": [],
"source": [
"endpoint = druid_host + '/status/health'\n",
"endpoint"
] ]
}, },
{ {
@ -161,21 +230,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/status/health\"\n", "is_healthy = session.get(endpoint).json()\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", "is_healthy"
"http_method = \"GET\"\n",
"\n",
"payload = {}\n",
"headers = {}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(\"\\033[1mResponse\\033[0m: \" + response.text)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "1de51db8-4c51-4b7e-bb3b-734ff15c8ab3", "id": "1917aace",
"metadata": { "metadata": {
"tags": [] "tags": []
}, },
@ -184,19 +245,34 @@
"\n", "\n",
"Now that you've confirmed that your cluster is up and running, you can start ingesting data. There are different ways to ingest data based on what your needs are. For more information, see [Ingestion methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods).\n", "Now that you've confirmed that your cluster is up and running, you can start ingesting data. There are different ways to ingest data based on what your needs are. For more information, see [Ingestion methods](https://druid.apache.org/docs/latest/ingestion/index.html#ingestion-methods).\n",
"\n", "\n",
"This tutorial uses the multi-stage query (MSQ) task engine and its `sql/task` endpoint to perform SQL-based ingestion. The `/sql/task` endpoint accepts [SQL requests in the JSON-over-HTTP format](https://druid.apache.org/docs/latest/querying/sql-api.html#request-body) using the `query`, `context`, and `parameters` fields\n", "This tutorial uses the multi-stage query (MSQ) task engine and its `sql/task` endpoint to perform SQL-based ingestion.\n",
"\n",
"To learn more about SQL-based ingestion, see [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/index.html). For information about the endpoint specifically, see [SQL-based ingestion and multi-stage query task API](https://druid.apache.org/docs/latest/multi-stage-query/api.html).\n",
"\n",
"\n",
"The next cell does the following:\n",
"\n",
"- Includes a payload that inserts data from an external source into a table named wikipedia_api. The payload is in JSON format and included in the code directly. You can also store it in a file and provide the file. \n",
"- Saves the response to a unique variable that you can reference later to identify this ingestion task\n",
"\n", "\n",
"To learn more about SQL-based ingestion, see [SQL-based ingestion](https://druid.apache.org/docs/latest/multi-stage-query/index.html). For information about the endpoint specifically, see [SQL-based ingestion and multi-stage query task API](https://druid.apache.org/docs/latest/multi-stage-query/api.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "994ba704",
"metadata": {},
"outputs": [],
"source": [
"endpoint = druid_host + '/druid/v2/sql/task'\n",
"endpoint"
]
},
{
"cell_type": "markdown",
"id": "2168db3a",
"metadata": {
"tags": []
},
"source": [
"The example uses INSERT, but you could also use REPLACE INTO. In fact, if you have an existing datasource with the name `wikipedia_api`, you need to use REPLACE INTO instead. \n", "The example uses INSERT, but you could also use REPLACE INTO. In fact, if you have an existing datasource with the name `wikipedia_api`, you need to use REPLACE INTO instead. \n",
"\n", "\n",
"The MSQ task engine uses a task to ingest data. The response for the API includes a `taskId` and `state` for your ingestion. You can use this `taskId` to reference this task later on to get more information about it.\n", "The `/sql/task` endpoint accepts [SQL requests in the JSON-over-HTTP format](https://druid.apache.org/docs/latest/querying/sql-api.html#request-body) using the `query`, `context`, and `parameters` fields.\n",
"\n",
"The query inserts data from an external source into a table named `wikipedia_api`. \n",
"\n", "\n",
"Before you ingest the data, take a look at the query. Pay attention to two parts of it, `__time` and `PARTITIONED BY`, which relate to how Druid partitions data:\n", "Before you ingest the data, take a look at the query. Pay attention to two parts of it, `__time` and `PARTITIONED BY`, which relate to how Druid partitions data:\n",
"\n", "\n",
@ -210,60 +286,141 @@
"\n", "\n",
"To learn more, see [Partitioning](https://druid.apache.org/docs/latest/ingestion/partitioning.html).\n", "To learn more, see [Partitioning](https://druid.apache.org/docs/latest/ingestion/partitioning.html).\n",
"\n", "\n",
"The query uses `INSERT INTO`. If you have an existing datasource with the name `wikipedia_api`, use `REPLACE INTO` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90c34908",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"INSERT INTO wikipedia_api \n",
"SELECT \n",
" TIME_PARSE(\"timestamp\") AS __time,\n",
" * \n",
"FROM TABLE(EXTERN(\n",
" '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}', \n",
" '{\"type\": \"json\"}', \n",
" '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n",
" ))\n",
"PARTITIONED BY DAY\n",
"'''"
]
},
{
"cell_type": "markdown",
"id": "f7dcddd7",
"metadata": {},
"source": [
"The query is included inline here. You can also store it in a file and provide the file.\n",
"\n",
"The above showed how the Requests library can convert the response from JSON to Python. Requests can also convert the request from Python to JSON. The next cell builds up a Python map that represents the Druid `SqlRequest` object. In this case, you need the query and a context variable to set the task count to 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6e82c0a",
"metadata": {},
"outputs": [],
"source": [
"sql_request = {\n",
" 'query': sql,\n",
" 'context': {\n",
" 'maxNumTasks': 3\n",
" }\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "6aa7f230",
"metadata": {},
"source": [
"With the SQL request ready, use the the `json` parameter to the `Session` `post` method to send a `POST` request with the `sql_request` object as the payload. The result is a Requests `Response` which is saved in a variable.\n",
"\n",
"Now, run the next cell to start the ingestion." "Now, run the next cell to start the ingestion."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "362b6a87", "id": "e2939a07",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql/task\"\n", "response = session.post(endpoint, json=sql_request)"
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
"http_method = \"POST\"\n",
"\n",
"\n",
"# The query uses INSERT INTO. If you have an existing datasource with the name wikipedia_api, use REPLACE INTO instead.\n",
"payload = json.dumps({\n",
"\"query\": \"INSERT INTO wikipedia_api SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
" AS __time, * FROM TABLE \\\n",
" (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
" PARTITIONED BY DAY\",\n",
" \"context\": {\n",
" \"maxNumTasks\": 3\n",
" }\n",
"})\n",
"\n",
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"ingestion_taskId_response = response\n",
"print(f\"\\033[1mQuery\\033[0m:\\n\" + payload)\n",
"print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
"print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "c1235e99-be72-40b0-b7f9-9e860e4932d7", "id": "9ba1821f",
"metadata": { "metadata": {
"tags": [] "tags": []
}, },
"source": [ "source": [
"Extract the `taskId` value from the `taskId_response` variable so that you can reference it later:" "The MSQ task engine uses a task to ingest data. The response for the API includes a `taskId` and `state` for your ingestion. You can use this `taskId` to reference this task later on to get more information about it."
]
},
{
"cell_type": "markdown",
"id": "f9cc2e45",
"metadata": {},
"source": [
"It is good practice to ensure that the response succeeded by checking the return status. The status should be 20x. (202 means \"accepted\".) If the response is something else, such as 4xx, display `response.text` to see the error message."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "f578b9b2", "id": "01d875f7",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n", "response.status_code"
"print(f\"This is the task ID: {ingestion_taskId}\")" ]
},
{
"cell_type": "markdown",
"id": "a983270d",
"metadata": {},
"source": [
"Convert the JSON response to a Python object."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95eeb9bf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"json = response.json()\n",
"json"
]
},
{
"cell_type": "markdown",
"id": "9f9c1b6b",
"metadata": {},
"source": [
"Extract the taskId value from the taskId_response variable so that you can reference it later:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2eeb3e3f",
"metadata": {},
"outputs": [],
"source": [
"ingestion_taskId = json['taskId']\n",
"ingestion_taskId"
] ]
}, },
{ {
@ -285,46 +442,56 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "fdbab6ae", "id": "f7b65866",
"metadata": {},
"outputs": [],
"source": [
"endpoint = druid_host + f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
"endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22b08321",
"metadata": {},
"outputs": [],
"source": [
"json = session.get(endpoint).json()\n",
"json"
]
},
{
"cell_type": "markdown",
"id": "74a9b593",
"metadata": {},
"source": [
"Wait until your ingestion completes before proceeding. Depending on what else is happening in your Druid cluster and the resources available, ingestion can take some time. Pro tip: an asterisk appears next to the cell while Python runs and waits for the task."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aae5c228",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import time\n", "import time\n",
"\n", "\n",
"endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n", "ingestion_status = json['status']['status']\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n",
"http_method = \"GET\"\n",
"\n",
"\n",
"payload = {}\n",
"headers = {}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"ingestion_status = json.loads(response.text)['status']['status']\n",
"# If you only want to fetch the status once and print it, \n",
"# uncomment the print statement and comment out the if and while loops\n",
"# print(json.dumps(response.json(), indent=4))\n",
"\n",
"\n", "\n",
"if ingestion_status == \"RUNNING\":\n", "if ingestion_status == \"RUNNING\":\n",
" print(\"The ingestion is running...\")\n", " print(\"The ingestion is running...\")\n",
"\n", "\n",
"while ingestion_status != \"SUCCESS\":\n", "while ingestion_status != \"SUCCESS\":\n",
" response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", " time.sleep(5) # 5 seconds \n",
" ingestion_status = json.loads(response.text)['status']['status']\n", " json = session.get(endpoint).json()\n",
" time.sleep(15) \n", " ingestion_status = json['status']['status']\n",
" \n", " \n",
"if ingestion_status == \"SUCCESS\": \n", "if ingestion_status == \"SUCCESS\": \n",
" print(\"The ingestion is complete:\")\n", " print(\"The ingestion is complete\")\n",
" print(json.dumps(response.json(), indent=4))\n" "else:\n",
] " print(\"The ingestion task failed:\", json)"
},
{
"cell_type": "markdown",
"id": "1336ddd5-8a42-41af-8913-533316221c52",
"metadata": {},
"source": [
"Wait until your ingestion completes before proceeding. Depending on what else is happening in your Druid cluster and the resources available, ingestion can take some time."
] ]
}, },
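Rendered side by side, the polling cell above is hard to read. As a standalone sketch, the status-polling logic amounts to the following; `session` and `endpoint` are assumed to be set up as in the preceding cells, and the loop exits on any terminal state rather than spinning forever if the task fails:

```python
import time

def wait_for_ingestion(session, endpoint, poll_seconds=5):
    # Poll the task status endpoint until the task leaves the RUNNING state.
    status = session.get(endpoint).json()['status']['status']
    while status == 'RUNNING':
        time.sleep(poll_seconds)
        status = session.get(endpoint).json()['status']['status']
    return status  # terminal state, e.g. 'SUCCESS' or 'FAILED'
```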
{ {
@ -346,19 +513,22 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "959e3c9b", "id": "a482c1f0",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/coordinator/v1/datasources\"\n", "endpoint = druid_host + '/druid/coordinator/v1/datasources'\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", "endpoint"
"http_method = \"GET\"\n", ]
"\n", },
"payload = {}\n", {
"headers = {}\n", "cell_type": "code",
"\n", "execution_count": null,
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", "id": "a0a7799e",
"print(\"\\nThe response is the list of datasources available in your Druid deployment: \" + response.text)" "metadata": {},
"outputs": [],
"source": [
"session.get(endpoint).json()"
] ]
}, },
{ {
@ -376,25 +546,54 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "694900d0-891f-41bd-9b45-5ae957385244", "id": "44868ff9",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "endpoint = druid_host + '/druid/v2/sql'\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", "endpoint"
"http_method = \"POST\"\n", ]
"\n", },
"payload = json.dumps({\n", {
" \"query\": \"SELECT * FROM wikipedia_api LIMIT 3\"\n", "cell_type": "markdown",
"})\n", "id": "b2b366ad",
"headers = {'Content-Type': 'application/json'}\n", "metadata": {},
"\n", "source": [
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", "As for ingestion, define a query, then create a `SQLRequest` object as a Python `dict`."
"\n", ]
"print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n", },
"print(f\"\\nEach JSON object in the response represents a row in the {dataSourceName} datasource.\") \n", {
"print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), indent=4))\n", "cell_type": "code",
"\n" "execution_count": null,
"id": "b7c77093",
"metadata": {},
"outputs": [],
"source": [
"sql = '''\n",
"SELECT * \n",
"FROM wikipedia_api \n",
"LIMIT 3\n",
"'''\n",
"sql_request = {'query': sql}"
]
},
{
"cell_type": "markdown",
"id": "0df529d6",
"metadata": {},
"source": [
"Now run the query. The result is an array of JSON objects. Each JSON object in the response represents a row in the `wikipedia_api` datasource."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac4f95b5",
"metadata": {},
"outputs": [],
"source": [
"json = session.post(endpoint, json=sql_request).json()\n",
"json"
] ]
}, },
{ {
@ -420,28 +619,36 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "c3860d64-fba6-43bc-80e2-404f5b3b9baa", "id": "9a645287",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"\\033[1mQuery endpoint\\033[0m: {druid_host+endpoint}\")\n", "SELECT * \n",
"http_method = \"POST\"\n", "FROM wikipedia_api \n",
"\n", "WHERE __time > ? \n",
"payload = json.dumps({\n", "LIMIT 1\n",
" \"query\": \"SELECT * FROM wikipedia_api WHERE __time > ? LIMIT 1\",\n", "'''\n",
" \"context\": {\n", "sql_request = {\n",
" \"sqlQueryId\" : \"important-query\" \n", " 'query': sql,\n",
" 'context': {\n",
" 'sqlQueryId' : 'important-query'\n",
" },\n", " },\n",
" \"parameters\": [\n", " 'parameters': [\n",
" { \"type\": \"TIMESTAMP\", \"value\": \"2016-06-27\"}\n", " { 'type': 'TIMESTAMP', 'value': '2016-06-27'}\n",
" ]\n", " ]\n",
"})\n", "}"
"headers = {'Content-Type': 'application/json'}\n", ]
"\n", },
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", {
"print(\"\\033[1mQuery\\033[0m:\\n\" + payload)\n", "cell_type": "code",
"print(\"\\n\\033[1mResponse\\033[0m: \\n\" + json.dumps(response.json(), indent=4))\n" "execution_count": null,
"id": "e5cced58",
"metadata": {},
"outputs": [],
"source": [
"json = session.post(endpoint, json=sql_request).json()\n",
"json"
] ]
}, },
{ {
@ -458,10 +665,7 @@
"- [Druid SQL API](https://druid.apache.org/docs/latest/querying/sql-api.html)\n", "- [Druid SQL API](https://druid.apache.org/docs/latest/querying/sql-api.html)\n",
"- [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html)\n", "- [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html)\n",
"\n", "\n",
"You can also try out the [druid-client](https://github.com/paul-rogers/druid-client), a Python library for Druid created by Paul Rogers, a Druid contributor.\n", "You can also try out the [druid-client](https://github.com/paul-rogers/druid-client), a Python library for Druid created by Paul Rogers, a Druid contributor. A simplified version of that library is included with these tutorials. See [the Python API Tutorial](Python_API_Tutorial.ipynb) for an overview. That tutorial shows how to do the same tasks as this one, but in a simpler form: focusing on the Druid actions and not the mechanics of the REST API."
"\n",
"\n",
"\n"
] ]
} }
], ],
@ -481,7 +685,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.6" "version": "3.9.6"
}, },
"vscode": { "vscode": {
"interpreter": { "interpreter": {

View File

@ -0,0 +1,497 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Python API for Druid
`druidapi` is a Python library to interact with all aspects of your
[Apache Druid](https://druid.apache.org/) cluster.
`druidapi` picks up where the venerable [pydruid](https://github.com/druid-io/pydruid) library
left off to include full SQL support and support for many of the Druid APIs. `druidapi` is usable
in any Python environment, but is optimized for use in Jupyter, providing a complete interactive
environment which complements the UI-based Druid console. The primary use of `druidapi` at present
is to support the set of tutorial notebooks provided in the parent directory.
## Install
At present, the best way to use `druidapi` is to clone the Druid repo itself:
```bash
git clone git@github.com:apache/druid.git
```
`druidapi` is located in `examples/quickstart/jupyter-notebooks/druidapi/`
Eventually we would like to create a Python package that can be installed with `pip`. Contributions
in that area are welcome.
Dependencies are listed in `requirements.txt`.
`druidapi` works against any version of Druid. Operations that exploit newer features obviously work
only against versions of Druid that support those features.
## Getting Started
To use `druidapi`, first import the library, then connect to your cluster by providing the URL of your Router instance. The way you do this differs slightly depending on where your code runs.
### From a Tutorial Jupyter Notebook
The tutorial Jupyter notebooks in `examples/quickstart/jupyter-notebooks` reside in the same directory tree
as this library. The notebooks use the Jupyter-oriented API, which renders tables as
HTML. First, identify your Router endpoint. Use the following for a local installation:
```python
router_endpoint = 'http://localhost:8888'
```
Then, import the library and create a client for your cluster:
```python
import druidapi
druid = druidapi.jupyter_client(router_endpoint)
```
The `jupyter_client` call defines a number of CSS styles to aid in displaying tabular results. It also
provides a "display" client that renders information as HTML tables.
### From Any Other Jupyter Notebook
If you create a Jupyter notebook outside of the `jupyter-notebooks` directory then you must tell Python where
to find the `druidapi` library. (This step is temporary until `druidapi` is properly packaged.)
First, set a variable to point to the location where you cloned the Druid git repo:
```python
druid_dev = '/path/to/Druid-repo'
```
Then, add the notebooks directory to Python's module search path:
```python
import sys
sys.path.append(druid_dev + '/examples/quickstart/jupyter-notebooks/')
```
Now you can import `druidapi` and create a client as shown in the previous section.
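If the setup cell can run more than once, as often happens in notebooks, a small guard avoids appending the same path repeatedly. The repo location below is a placeholder:

```python
import sys

# Placeholder: the location of your local Druid clone
druid_dev = '/path/to/Druid-repo'
module_dir = druid_dev + '/examples/quickstart/jupyter-notebooks/'

# Append only if the directory is not already on the module search path
if module_dir not in sys.path:
    sys.path.append(module_dir)
```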
### From a Python Script
`druidapi` works in any Python script. When run outside of a Jupyter notebook, the various "display"
commands produce text output instead of HTML. The steps are similar to those above:
```python
druid_dev = '/path/to/Druid-repo'
import sys
sys.path.append(druid_dev + '/examples/quickstart/jupyter-notebooks/')
import druidapi
druid = druidapi.client(router_endpoint)
```
## Library Organization
`druidapi` organizes Druid REST operations into various "clients," each of which provides operations
for one of Druid's functional areas. Obtain a client from the `druid` client created above. For
status operations:
```python
status_client = druid.status
```
The set of clients is still under construction; at present it includes the following. The
operations within each client are also partial, covering only those used in the tutorial
notebooks. Contributions to expand the scope are welcome. Clients are available as
properties on the `druid` object created above.
* `status` - Status operations such as service health, property values, and so on. This client
is special: it works only with the Router. The Router does not proxy these calls to other nodes.
Use the `status_for()` method to get status for other nodes.
* `datasources` - Operations on datasources such as dropping a datasource.
* `tasks` - Work with Overlord tasks: status, reports, and more.
* `sql` - SQL query operations for both the interactive query engine and MSQ.
* `display` - A set of convenience operations to display results as lightly formatted tables
in either HTML (for Jupyter notebooks) or text (for other Python scripts).
## Assumed Cluster Architecture
`druidapi` assumes that you run a standard Druid cluster with a Router in front of the other nodes.
This design works well for most Druid clusters:
* Run locally, such as the various quickstart clusters.
* Remote cluster on the same network.
* Druid cluster running under Docker Compose such as that explained in the Druid documentation.
* Druid integration test clusters launched via the Druid development `it.sh` command.
* Druid clusters running under Kubernetes.

In all the Docker, Docker Compose, and Kubernetes scenarios, the Router's port (typically 8888) must be visible
to the machine running `druidapi`, perhaps via port mapping or a proxy.
The Router is then responsible for routing Druid REST requests to the various other Druid nodes,
including those not visible outside of a private Docker or Kubernetes network.
The one exception to this rule is if you want to perform a health check (i.e. the `/status` endpoint)
on a service other than the Router. These checks are _not_ proxied by the Router: you must connect to
the target service directly.
## Status Operations
When working with tutorials, a local Druid cluster, or a Druid integration test cluster, it is common
to start your cluster then immediately start performing `druidapi` operations. However, because Druid
is a distributed system, it can take some time for all the services to become ready. This seems to be
particularly true when starting a cluster with Docker Compose or Kubernetes on the local system.
Therefore, the first operation is to wait for the cluster to become ready:
```python
status_client = druid.status
status_client.wait_until_ready()
```
Without this step, your operations may mysteriously fail, and you'll wonder if you did something wrong.
Some clients retry operations multiple times in case a service is not yet ready, but for typical scripts
against a stable cluster the explicit wait above is sufficient. This step is built into the
`jupyter_client()` method to ensure notebooks provide a good experience.
If your notebook or script uses newer features, you should start by ensuring that the target Druid cluster
is of the correct version:
```python
status_client.version
```
This check will prevent frustration if the notebook is used against previous releases.
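For example, a notebook that depends on features from a particular release can fail fast with a clear message. The version string below is hypothetical; in a live notebook it would come from `status_client.version`:

```python
# Hypothetical version string; in a live notebook use status_client.version
version = '26.0.0'

# Druid version strings use the usual major.minor.patch form
major = int(version.split('.')[0])

if major < 26:
    raise RuntimeError('This notebook requires Druid 26 or later, found ' + version)
```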
Similarly, if the notebook or script uses features defined in an extension, check that the required
extension is loaded:
```python
status_client.properties['druid.extensions.loadList']
```
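A notebook can then fail fast when a required extension is missing. The property value and extension name below are hypothetical; in a live notebook the value comes from `status_client.properties`:

```python
# Hypothetical value of the 'druid.extensions.loadList' property
load_list = '["druid-hdfs-storage", "druid-multi-stage-query"]'

# The property value arrives as a string, so a substring check is often sufficient
required = 'druid-multi-stage-query'
if required not in load_list:
    raise RuntimeError('Missing required extension: ' + required)
```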
## Display Client
When run in a Jupyter notebook, it is often handy to format results for display. A special display
client performs operations _and_ formats them for display as HTML tables within the notebook.
```python
display = druid.display
```
The most common methods are:
* `sql(sql)` - Run a query and display the results as an HTML table.
* `schemas()` - Display the schemas defined in Druid.
* `tables(schema)` - Display the tables (datasources) in the given schema, `druid` by default.
* `table(name)` - Display the schema (list of columns) for the given table. The name can
  be one part (`foo`) or two parts (`INFORMATION_SCHEMA.TABLES`).
* `function(name)` - Display the arguments for a table function defined in the catalog.
The display client also has other methods to format data as a table, to display various kinds
of messages and so on.
## Interactive Queries
The original [`pydruid`](https://pythonhosted.org/pydruid/) library revolves around Druid
"native" queries. Most new applications now use SQL. `druidapi` provides two ways to run
queries, depending on whether you want to display the results (typical in a notebook), or
use the results in Python code. You can run SQL queries using the SQL client:
```python
sql_client = druid.sql
```
To obtain the results of a SQL query against the example Wikipedia table (datasource) in a "raw" form:
```python
sql = '''
SELECT
channel,
COUNT(*) AS "count"
FROM wikipedia
GROUP BY channel
ORDER BY COUNT(*) DESC
LIMIT 5
'''
sql_client.sql(sql)
```
Gives:
```text
[{'channel': '#en.wikipedia', 'count': 6650},
{'channel': '#sh.wikipedia', 'count': 3969},
{'channel': '#sv.wikipedia', 'count': 1867},
{'channel': '#ceb.wikipedia', 'count': 1808},
{'channel': '#de.wikipedia', 'count': 1357}]
```
The raw results are handy when Python code consumes the results, or for a quick check. The raw results
can also be forwarded to advanced visualization tools such as Pandas.
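Because the raw form is plain Python data, you can also post-process it directly with no extra libraries. This sketch reuses rows shaped like the sample output above:

```python
# Rows in the same shape as the raw query results shown above
rows = [
    {'channel': '#en.wikipedia', 'count': 6650},
    {'channel': '#sh.wikipedia', 'count': 3969},
    {'channel': '#sv.wikipedia', 'count': 1867},
    {'channel': '#ceb.wikipedia', 'count': 1808},
    {'channel': '#de.wikipedia', 'count': 1357},
]

# Total edits across the listed channels
total = sum(row['count'] for row in rows)

# Channel with the most edits
busiest = max(rows, key=lambda row: row['count'])['channel']
```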
For simple visualization in notebooks (or as text in Python scripts), you can use the "display" client:
```python
display = druid.display
display.sql(sql)
```
When run without HTML visualization, the above gives:
```text
channel count
#en.wikipedia 6650
#sh.wikipedia 3969
#sv.wikipedia 1867
#ceb.wikipedia 1808
#de.wikipedia 1357
```
Within Jupyter, the results are formatted as an HTML table.
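If you have only the raw rows and want similar plain-text output without the display client, a minimal formatter takes a few lines. This is a sketch, not the library's implementation:

```python
# Raw rows in the 'objects' result format
rows = [
    {'channel': '#en.wikipedia', 'count': 6650},
    {'channel': '#sh.wikipedia', 'count': 3969},
]

headers = list(rows[0].keys())
table = [headers] + [[str(row[h]) for h in headers] for row in rows]

# The width of each column is that of its widest cell
widths = [max(len(r[i]) for r in table) for i in range(len(headers))]

# Left-justify each cell and join with two spaces, one line per row
lines = ['  '.join(cell.ljust(w) for cell, w in zip(r, widths)) for r in table]
text = '\n'.join(lines)
print(text)
```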
### Advanced Queries
In addition to the SQL text, Druid also lets you specify:
* A query context
* Query parameters
* Result format options
The Druid `SqlQuery` object specifies these options. You can build up a Python equivalent:
```python
sql = '''
SELECT *
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME = ?
'''
from druidapi import consts

sql_query = {
'query': sql,
'parameters': [
{'type': consts.SQL_VARCHAR_TYPE, 'value': 'druid'}
],
'resultFormat': consts.SQL_OBJECT
}
```
However, the easier approach is to let `druidapi` handle the work for you using a SQL request:
```python
req = sql_client.sql_request(sql)
req.add_parameter('druid')
```
Either way, when you submit the query in this form, you get a SQL response back:
```python
resp = sql_client.sql_query(req)
```
The SQL response wraps the REST response. First, we ensure that the request worked:
```python
resp.ok
```
If the request failed, we can obtain the error message:
```python
resp.error_message
```
If the request succeeded, we can obtain the results in a variety of ways. The easiest is to obtain
the data as a list of Python dictionaries. This is the form shown in the "raw" example above. This works
only if you use the default ('objects') result format.
```python
resp.rows
```
You can also obtain the schema of the result:
```python
resp.schema
```
The result is a list of `ColumnSchema` objects. Get column information from the `name`, `sql_type`
and `druid_type` fields in each object.
For other formats, you can obtain the REST payload directly:
```python
resp.results
```
Use the `results` property if you requested other formats, such as CSV. The `rows` and `schema` properties
are not available for these other result formats.
The result can also format the results as a text or HTML table, depending on how you created the client:
```python
resp.show()
```
In fact, the display client `sql()` method uses the `resp.show()` method internally, which in turn uses the
`rows` and `schema` properties.
### Run a Query and Return Results
The above forms are handy for interactive use in a notebook. If you simply need to run a query and use the results
in code, do the following:
```python
rows = sql_client.sql(sql)
```
This form takes a set of arguments so that you can use Python to parameterize the query:
```python
sql = 'SELECT * FROM {}'
rows = sql_client.sql(sql, ['myTable'])
```
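The substitution here is assumed to be ordinary Python string formatting, which you can verify locally without a cluster:

```python
# Query template with a str.format() placeholder
sql = 'SELECT * FROM {}'
args = ['myTable']

# A sketch of the substitution the client is assumed to perform
full_sql = sql.format(*args)
```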
## MSQ Queries
The SQL client can also run an MSQ query. See the `sql-tutorial.ipynb` notebook for examples. First define the
query:
```python
sql = '''
INSERT INTO myTable ...
'''
```
Then launch an ingestion task:
```python
task = sql_client.task(sql)
```
To learn the Overlord task ID:
```python
task.id
```
You can use the tasks client to track the status, or let the task object do it for you:
```python
task.wait_until_done()
```
You can combine the run-and-wait operations into a single call:
```python
task = sql_client.run_task(sql)
```
A quirk of Druid is that MSQ reports task completion as soon as ingestion is done. However, it takes a
while for Druid to load the resulting segments, so you must wait for the table to become queryable:
```python
sql_client.wait_until_ready('myTable')
```
## Datasource Operations
To get information about a datasource, prefer to query the `INFORMATION_SCHEMA` tables, or use the methods
in the display client. Use the datasource client for other operations.
```python
datasources = druid.datasources
```
To delete a datasource:
```python
datasources.drop('myWiki', True)
```
The `True` argument requests "if exists" semantics so that you don't get an error if the datasource does not exist.
## REST Client
The `druidapi` is based on a simple REST client which is itself based on the Requests library. If you
need to use Druid REST APIs not yet wrapped by this library, you can use the REST client directly.
(If you find such APIs, we encourage you to add methods to the library and contribute them to Druid.)
The REST client implements the common patterns seen in the Druid REST API. You can create a client directly:
```python
from druidapi.rest import DruidRestClient
rest_client = DruidRestClient("http://localhost:8888")
```
Or, if you have already created the Druid client, you can reuse the existing REST client. This is how
the various other clients work internally.
```python
rest_client = druid.rest
```
The REST client maintains the Druid host: you provide just the specific URL tail. There are methods to get or
post JSON results. For example, to get status information:
```python
rest_client.get_json('/status')
```
A quick comparison of the three approaches (Requests, REST client, Python client):
Status:
* Requests: `session.get(druid_host + '/status').json()`
* REST client: `rest_client.get_json('/status')`
* Status client: `status_client.status()`
Health:
* Requests: `session.get(druid_host + '/status/health').json()`
* REST client: `rest_client.get_json('/status/health')`
* Status client: `status_client.is_healthy()`
Ingest data:
* Requests: See the REST tutorial.
* REST client: as the REST tutorial, but use `rest_client.post_json('/druid/v2/sql/task', sql_request)` and
`rest_client.get_json(f"/druid/indexer/v1/task/{ingestion_taskId}/status")`
* SQL client: `sql_client.run_task(sql)`, also a form for a full SQL request.
List datasources:
* Requests: `session.get(druid_host + '/druid/coordinator/v1/datasources').json()`
* REST client: `rest_client.get_json('/druid/coordinator/v1/datasources')`
* Datasources client: `ds_client.names()`
Query data, where `sql_request` is a properly-formatted `SqlRequest` dictionary:
* Requests: `session.post(druid_host + '/druid/v2/sql', json=sql_request).json()`
* REST client: `rest_client.post_json('/druid/v2/sql', sql_request)`
* SQL Client: `sql_client.show(sql)`, where `sql` is the query text
In general, you have to provide all the details for the Requests library. The REST client handles the low-level repetitious bits. The Python clients provide methods that encapsulate the specifics of the URLs and return formats.
## Constants
Druid has a large number of special constants: type names, options, etc. The consts module provides definitions for many of these:
```python
from druidapi import consts
help(consts)
```

View File

@ -0,0 +1,35 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .druid import DruidClient
def jupyter_client(endpoint) -> DruidClient:
'''
Create a Druid client configured to display results as HTML within a Jupyter notebook.
Waits for the cluster to become ready to avoid intermittent problems when using Druid.
'''
from .html import HtmlDisplayClient
druid = DruidClient(endpoint, HtmlDisplayClient())
druid.status.wait_until_ready()
return druid
def client(endpoint) -> DruidClient:
'''
Create a Druid client for use in Python scripts that uses a text-based format for
displaying results. Does not wait for the cluster to be ready: clients should call
`status().wait_until_ready()` before making other Druid calls if there is a chance
that the cluster has not yet fully started.
'''
return DruidClient(endpoint)

View File

@ -0,0 +1,143 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ALIGN_LEFT = 0
ALIGN_CENTER = 1
ALIGN_RIGHT = 2
def padded(array, width, fill):
if array and len(array) >= width:
return array
if not array:
result = []
else:
result = array.copy()
return pad(result, width, fill)
def pad(array, width, fill):
for _ in range(len(array), width):
array.append(fill)
return array
def infer_keys(data):
if type(data) is list:
data = data[0]
keys = {}
for key in data.keys():
keys[key] = key
return keys
class BaseTable:
def __init__(self):
self._headers = None
self._align = None
self._col_fmt = None
self.sample_size = 10
self._rows = None
def headers(self, headers):
self._headers = headers
def rows(self, rows):
self._rows = rows
def alignments(self, align):
self._align = align
def col_format(self, col_fmt):
self._col_fmt = col_fmt
def row_width(self, rows):
max_width = 0
min_width = None
if self._headers:
max_width = len(self._headers)
min_width = max_width
    for row in rows:
        max_width = max(max_width, len(row))
        min_width = len(row) if min_width is None else min(min_width, len(row))
    min_width = max_width if min_width is None else min_width
return (min_width, max_width)
def find_alignments(self, rows, width):
align = padded(self._align, width, None)
unknown_count = 0
for v in align:
if v is None:
unknown_count += 1
if unknown_count == 0:
return align
for row in rows:
for i in range(len(row)):
if align[i] is not None:
continue
v = row[i]
if v is None:
continue
if type(v) is str:
align[i] = ALIGN_LEFT
else:
align[i] = ALIGN_RIGHT
unknown_count -= 1
if unknown_count == 0:
return align
for i in range(width):
if align[i] is None:
align[i] = ALIGN_LEFT
return align
def pad_rows(self, rows, width):
new_rows = []
for row in rows:
new_rows.append(padded(row, width, None))
return new_rows
def pad_headers(self, width):
if not self._headers:
return None
if len(self._headers) == 0:
return None
has_none = False
for i in range(len(self._headers)):
if not self._headers[i]:
has_none = True
break
if len(self._headers) >= width and not has_none:
return self._headers
headers = self._headers.copy()
if has_none:
for i in range(len(headers)):
if not headers[i]:
headers[i] = ''
return pad(headers, width, '')
def from_object_list(self, objects, cols=None):
cols = infer_keys(objects) if not cols else cols
self._rows = []
for obj in objects:
row = []
for key in cols.keys():
row.append(obj.get(key))
self._rows.append(row)
self.headers([head for head in cols.values()])
self.alignments(self.find_alignments(self._rows, len(cols)))
def from_object(self, obj, labels=None):
labels = infer_keys(obj) if not labels else labels
self._rows = []
for key, head in labels.items():
self._rows.append([head, obj.get(key)])
self.headers(['Key', 'Value'])

View File

@ -0,0 +1,64 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import requests
from .consts import COORD_BASE
from .rest import check_error
# Catalog (new feature in Druid 26)
CATALOG_BASE = COORD_BASE + '/catalog'
REQ_CAT_SCHEMAS = CATALOG_BASE + '/schemas'
REQ_CAT_SCHEMA = REQ_CAT_SCHEMAS + '/{}'
REQ_CAT_SCHEMA_TABLES = REQ_CAT_SCHEMA + '/tables'
REQ_CAT_SCHEMA_TABLE = REQ_CAT_SCHEMA_TABLES + '/{}'
REQ_CAT_SCHEMA_TABLE_EDIT = REQ_CAT_SCHEMA_TABLE + '/edit'
class CatalogClient:
'''
Client for the Druid catalog feature that provides metadata for tables,
including both datasources and external tables.
'''
def __init__(self, rest_client):
self.client = rest_client
def post_table(self, schema, table_name, table_spec, version=None, overwrite=None):
params = {}
if version:
params['version'] = version
if overwrite is not None:
params['overwrite'] = overwrite
return self.client.post_json(REQ_CAT_SCHEMA_TABLE, table_spec, args=[schema, table_name], params=params)
def create(self, schema, table_name, table_spec):
self.post_table(schema, table_name, table_spec)
def table(self, schema, table_name):
return self.client.get_json(REQ_CAT_SCHEMA_TABLE, args=[schema, table_name])
def drop_table(self, schema, table_name, if_exists=False):
r = self.client.delete(REQ_CAT_SCHEMA_TABLE, args=[schema, table_name])
if if_exists and r.status_code == requests.codes.not_found:
return
check_error(r)
def edit_table(self, schema, table_name, action):
return self.client.post_json(REQ_CAT_SCHEMA_TABLE_EDIT, action, args=[schema, table_name])
def schema_names(self):
return self.client.get_json(REQ_CAT_SCHEMAS)
def tables_in_schema(self, schema, list_format='name'):
return self.client.get_json(REQ_CAT_SCHEMA_TABLES, args=[schema], params={'format': list_format})

View File

@ -0,0 +1,56 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
COORD_BASE = '/druid/coordinator/v1'
ROUTER_BASE = '/druid/v2'
OVERLORD_BASE = '/druid/indexer/v1'
# System schemas and table names. Note: case must match in Druid, though
# SQL itself is supposed to be case-insensitive.
SYS_SCHEMA = 'sys'
INFORMATION_SCHEMA = 'INFORMATION_SCHEMA'
DRUID_SCHEMA = 'druid'
EXT_SCHEMA = 'ext'
# Information Schema tables
SCHEMA_TABLE = INFORMATION_SCHEMA + '.SCHEMATA'
TABLES_TABLE = INFORMATION_SCHEMA + '.TABLES'
COLUMNS_TABLE = INFORMATION_SCHEMA + '.COLUMNS'
# SQL request formats
SQL_OBJECT = 'object'
SQL_ARRAY = 'array'
SQL_ARRAY_WITH_TRAILER = 'arrayWithTrailer'
SQL_CSV = 'csv'
# Type names as known to Druid and mentioned in documentation.
DRUID_STRING_TYPE = 'string'
DRUID_LONG_TYPE = 'long'
DRUID_FLOAT_TYPE = 'float'
DRUID_DOUBLE_TYPE = 'double'
DRUID_TIMESTAMP_TYPE = 'timestamp'
# SQL type names as returned from the INFORMATION_SCHEMA
SQL_VARCHAR_TYPE = 'VARCHAR'
SQL_BIGINT_TYPE = 'BIGINT'
SQL_FLOAT_TYPE = 'FLOAT'
SQL_DOUBLE_TYPE = 'DOUBLE'
SQL_TIMESTAMP_TYPE = 'TIMESTAMP'
SQL_ARRAY_TYPE = 'ARRAY'
# Task status code
RUNNING_STATE = 'RUNNING'
SUCCESS_STATE = 'SUCCESS'
FAILED_STATE = 'FAILED'

View File

@ -0,0 +1,81 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import requests, time
from .consts import COORD_BASE
from .rest import check_error
from .util import dict_get
REQ_DATASOURCES = COORD_BASE + '/datasources'
REQ_DATASOURCE = REQ_DATASOURCES + '/{}'
# Segment load status
REQ_DS_LOAD_STATUS = REQ_DATASOURCES + '/{}/loadstatus'
class DatasourceClient:
'''
Client for datasource APIs. Prefer to use SQL to query the
INFORMATION_SCHEMA to obtain information.
See https://druid.apache.org/docs/latest/operations/api-reference.html#datasources
'''
def __init__(self, rest_client):
self.rest_client = rest_client
def drop(self, ds_name, if_exists=False):
'''
Drops a data source.
Marks as unused all segments belonging to a datasource.
Marking all segments as unused is equivalent to dropping the table.
Parameters
----------
ds_name: str
    The name of the datasource to drop
if_exists: bool, default False
    When True, return quietly if the datasource does not exist rather than raising an error
Returns
-------
Returns a map of the form
{"numChangedSegments": <number>} with the number of segments in the database whose
state has been changed (that is, the segments were marked as unused) as the result
of this API call.
Reference
---------
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}`
'''
r = self.rest_client.delete(REQ_DATASOURCE, args=[ds_name])
if if_exists and r.status_code == requests.codes.not_found:
return
check_error(r)
def load_status_req(self, ds_name, params=None):
return self.rest_client.get_json(REQ_DS_LOAD_STATUS, args=[ds_name], params=params)
def load_status(self, ds_name):
return self.load_status_req(ds_name, {
'forceMetadataRefresh': 'true',
'interval': '1970-01-01/2999-01-01'})
def wait_until_ready(self, ds_name):
while True:
resp = self.load_status(ds_name)
if dict_get(resp, ds_name) == 100.0:
return
time.sleep(0.5)
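`wait_until_ready()` above polls the Coordinator every half second until the datasource reports 100% loaded. The polling pattern can be sketched in isolation; `wait_until` and `fake_load_status` below are illustrative stand-ins, not part of the library:

```python
import time

def wait_until(poll, target, interval=0.01, timeout=5.0):
    # Generic poll loop: call poll() until it returns target or the timeout expires.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poll() == target:
            return True
        time.sleep(interval)
    return False

# Stand-in for DatasourceClient.load_status(): reports ready on the third call.
calls = {'n': 0}
def fake_load_status():
    calls['n'] += 1
    return 100.0 if calls['n'] >= 3 else 50.0

print(wait_until(fake_load_status, 100.0))  # True
```

Unlike this sketch, the library version loops forever; a timeout such as the one above is a common addition when a datasource may never finish loading.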

@@ -0,0 +1,146 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import consts
class DisplayClient:
'''
Abstract base class to display various kinds of results.
'''
def __init__(self, druid=None):
# If the client is None, it must be backfilled by the caller.
# This case occurs only when creating the DruidClient to avoid
# a circular dependency.
self._druid = druid
# Basic display operations
def text(self, msg):
raise NotImplementedError()
def alert(self, msg):
raise NotImplementedError()
def error(self, msg):
raise NotImplementedError()
# Tabular formatting
def new_table(self):
raise NotImplementedError()
def show_table(self, table):
raise NotImplementedError()
def data_table(self, rows, cols=None):
'''
Display a table of data with the optional column headings.
Parameters
----------
rows: list[list]
The data to display as a list of lists, where each inner list represents one
row of data. Rows should be of the same width: ragged rows will display blank
cells. Data can be of any scalar type and is formatted correctly for that type.
cols: list[str]
Optional list of column headings.
'''
table = self.new_table()
table.rows(rows)
table.headers(cols)
self.show_table(table)
def object_list(self, objects, cols=None):
'''
Display a list of objects represented as dictionaries with optional headings.
Parameters
----------
objects: list[dict]
List of dictionaries: one dictionary for each row.
cols: dict, Default = None
A list of column headings in the form `{'key': 'label'}`
'''
table = self.new_table()
table.from_object_list(objects, cols)
self.show_table(table)
def object(self, obj, labels=None):
'''
Display a single object represented as a dictionary with optional headings.
The object is displayed in two columns: keys and values.
Parameters
----------
obj: dict
The object to display: one table row per key.
labels: list, Default = None
A list of column headings in the form `['key', 'value']`. Default headings
are used if the labels are not provided.
'''
table = self.new_table()
table.from_object(obj, labels)
self.show_table(table)
# SQL formatting
def sql(self, sql):
'''
Run a query and display the result as a table.
Parameters
----------
sql
The query as either a string or a SqlRequest object.
'''
self._druid.sql.sql_query(sql).show(display=self)
def table(self, table_name):
'''
Describe a table by returning the list of columns in the table.
Parameters
----------
table_name: str
The name of the table as either "table" or "schema.table".
If the form is "table", then the 'druid' schema is assumed.
'''
self._druid.sql._schema_query(table_name).show(display=self)
def function(self, table_name):
'''
Retrieve the list of parameters for a partial external table defined in
the Druid catalog.
Parameters
----------
table_name: str
The name of the table as either "table" or "schema.table".
If the form is "table", then the 'ext' schema is assumed.
'''
return self._druid.sql._function_args_query(table_name).show(display=self)
def schemas(self):
'''
Display the list of schemas available in Druid.
'''
self._druid.sql._schemas_query().show(display=self)
def tables(self, schema=consts.DRUID_SCHEMA):
'''
Display the list of tables in the given schema, `druid` by default.
'''
self._druid.sql._tables_query(schema).show(display=self)

@@ -0,0 +1,130 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .rest import DruidRestClient
from .status import StatusClient
from .catalog import CatalogClient
from .sql import QueryClient
from .tasks import TaskClient
from .datasource import DatasourceClient
class DruidClient:
'''
Client for a Druid cluster. Functionality is split into a number of
specialized "clients" that group many of Druid's REST API calls.
'''
def __init__(self, router_endpoint, display_client=None):
self.rest_client = DruidRestClient(router_endpoint)
self.status_client = None
self.catalog_client = None
self.sql_client = None
self.tasks_client = None
self.datasource_client = None
if display_client:
self.display_client = display_client
else:
from .text import TextDisplayClient
self.display_client = TextDisplayClient()
self.display_client._druid = self
@property
def rest(self):
'''
Returns the low-level REST client. Useful for debugging and to access REST API
calls not yet wrapped by the various function-specific clients.
If you find you need to use this, consider creating a wrapper function in Python
and contributing it to Druid via a pull request.
'''
return self.rest_client
def trace(self, enable=True):
'''
Enable or disable tracing. When enabled, the Druid client prints the
URL and payload for each REST API call. Useful for debugging, or if you want
to learn what the code does so you can replicate it in your own client.
'''
self.rest_client.enable_trace(enable)
@property
def status(self) -> StatusClient:
'''
Returns the status client for the Router service.
'''
if not self.status_client:
self.status_client = StatusClient(self.rest_client)
return self.status_client
def status_for(self, endpoint) -> StatusClient:
'''
Returns the status client for a Druid service.
Parameters
----------
endpoint: str
The URL for a Druid service.
'''
return StatusClient(DruidRestClient(endpoint), True)
@property
def catalog(self) -> CatalogClient:
'''
Returns the catalog client to interact with the Druid catalog.
'''
if not self.catalog_client:
self.catalog_client = CatalogClient(self.rest_client)
return self.catalog_client
@property
def sql(self) -> QueryClient:
'''
Returns the SQL query client to submit interactive or MSQ queries.
'''
if not self.sql_client:
self.sql_client = QueryClient(self)
return self.sql_client
@property
def tasks(self) -> TaskClient:
'''
Returns the Overlord tasks client to submit and track tasks.
'''
if not self.tasks_client:
self.tasks_client = TaskClient(self.rest_client)
return self.tasks_client
@property
def datasources(self) -> DatasourceClient:
'''
Returns the Coordinator datasources client to manipulate datasources.
Prefer to use the SQL client to query the INFORMATION_SCHEMA to obtain
information about datasources.
'''
if not self.datasource_client:
self.datasource_client = DatasourceClient(self.rest_client)
return self.datasource_client
@property
def display(self):
return self.display_client
def close(self):
self.rest_client.close()
self.rest_client = None
self.catalog_client = None
self.tasks_client = None
self.datasource_client = None
self.sql_client = None
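Each sub-client above is created lazily on first property access and cached for reuse. That pattern, reduced to a minimal standalone sketch (`LazyClient` is illustrative, not part of the library):

```python
class LazyClient:
    # Minimal sketch of the cached sub-client pattern used by DruidClient.
    def __init__(self):
        self._status_client = None

    @property
    def status(self):
        # Create the sub-client on first access; reuse the same instance after.
        if not self._status_client:
            self._status_client = object()
        return self._status_client

client = LazyClient()
print(client.status is client.status)  # True: the same cached instance
```

The lazy creation keeps `DruidClient` construction cheap: no sub-client is built until a caller actually uses that part of the API.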

@@ -0,0 +1,31 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
class ClientError(Exception):
'''
Indicates an error with usage of the Python API.
'''
def __init__(self, msg):
self.message = msg
class DruidError(Exception):
'''
Indicates that something went wrong on Druid, typically as the result of a
request that this client sent.
'''
def __init__(self, msg):
self.message = msg

@@ -0,0 +1,141 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from IPython.display import display, HTML
from html import escape
from .display import DisplayClient
from .base_table import BaseTable
STYLES = '''
<style>
.druid table {
border: 1px solid black;
border-collapse: collapse;
}
.druid th, .druid td {
padding: 4px 1em ;
text-align: left;
}
td.druid-right, th.druid-right {
text-align: right;
}
td.druid-center, th.druid-center {
text-align: center;
}
.druid .druid-left {
text-align: left;
}
.druid-alert {
font-weight: bold;
}
.druid-error {
color: red;
}
</style>
'''
def escape_for_html(s):
# Annoying: IPython treats $ as the start of LaTeX, which is cool,
# but not wanted here.
return s.replace('$', '\\$')
def html(s):
display(HTML(s))
initialized = False
alignments = ['druid-left', 'druid-center', 'druid-right']
def start_tag(tag, align):
s = '<' + tag
if align:
s += ' class="{}"'.format(alignments[align])
return s + '>'
class HtmlTable(BaseTable):
def __init__(self):
BaseTable.__init__(self)
def widths(self, widths):
self._widths = widths
def format(self) -> str:
if not self._rows and not self._headers:
return ''
_, width = self.row_width(self._rows)
headers = self.pad_headers(width)
rows = self.pad_rows(self._rows, width)
s = '<table>\n'
s += self.gen_header(headers)
s += self.gen_rows(rows)
return s + '\n</table>'
def gen_header(self, headers):
if not headers:
return ''
s = '<tr>'
for i in range(len(headers)):
s += start_tag('th', self.col_align(i)) + escape(headers[i]) + '</th>'
return s + '</tr>\n'
def gen_rows(self, rows):
html_rows = []
for row in rows:
r = '<tr>'
for i in range(len(row)):
r += start_tag('td', self.col_align(i))
cell = row[i]
value = '' if cell is None else escape(str(cell))
r += value + '</td>'
html_rows.append(r + '</tr>')
return '\n'.join(html_rows)
def col_align(self, col):
if not self._align:
return None
if col >= len(self._align):
return None
return self._align[col]
class HtmlDisplayClient(DisplayClient):
def __init__(self):
DisplayClient.__init__(self)
global initialized
if not initialized:
display(HTML(STYLES))
initialized = True
def text(self, msg):
html('<div class="druid">' + escape_for_html(msg) + '</div>')
def alert(self, msg):
html('<div class="druid-alert">' + escape_for_html(msg.replace('\n', '<br/>')) + '</div>')
def error(self, msg):
html('<div class="druid-error">ERROR: ' + escape_for_html(msg.replace('\n', '<br/>')) + '</div>')
def new_table(self):
return HtmlTable()
def show_table(self, table):
self.text(table.format())
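The alignment and escaping helpers above can be exercised without IPython. A standalone sketch mirroring `start_tag` and `escape_for_html`:

```python
alignments = ['druid-left', 'druid-center', 'druid-right']

def start_tag(tag, align):
    # Emit an opening tag, adding a CSS class when an alignment index is given.
    s = '<' + tag
    if align:
        s += ' class="{}"'.format(alignments[align])
    return s + '>'

def escape_for_html(s):
    # IPython renders $...$ as LaTeX, so literal dollar signs must be escaped.
    return s.replace('$', '\\$')

print(start_tag('td', 2))            # <td class="druid-right">
print(start_tag('th', None))         # <th>
print(escape_for_html('cost: $5'))   # cost: \$5
```

Note that index 0 (`druid-left`) is falsy, so `if align:` emits no class for left alignment, which matches the default left-aligned CSS.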

@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ------------------------------------------------------------------------
# Requirements for the druidapi library.
# See: https://pip.pypa.io/en/stable/reference/requirements-file-format/
#
# Requirements are both few and simple at present.
requests

@@ -0,0 +1,265 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import requests
from .util import dict_get
from urllib.parse import quote
from .error import ClientError
def check_error(response):
'''
Raises an HTTPError from the Requests library if the response code is neither
OK (200) nor Accepted (202).
Druid's REST API is inconsistent with how it reports errors. Some APIs return
an error as a JSON object. Others return a text message. Still others return
nothing at all. With the JSON format, sometimes the error returns an
'errorMessage' field, other times only a generic 'error' field.
This method attempts to parse these variations. If the error response JSON
matches one of the known error formats, then raises a `ClientError` with the error
message. Otherwise, raises a Requests library `HTTPError` for a generic error.
If the response includes a JSON payload, then it is returned in the json field
of the `HTTPError` object so that the client can perhaps decode it.
'''
code = response.status_code
if code == requests.codes.ok or code == requests.codes.accepted:
return
json = None
try:
json = response.json()
except Exception:
# If we can't get the JSON, raise a Requests error
response.raise_for_status()
# Druid JSON payload. Try to make sense of the error
msg = dict_get(json, 'errorMessage')
if not msg:
msg = dict_get(json, 'error')
if msg:
# We have an explanation from Druid. Raise a Client exception
raise ClientError(msg)
# Don't know what the Druid JSON is. Raise a Requests exception, but
# add on the JSON in the hopes that the caller can make use of it.
try:
response.raise_for_status()
except Exception as e:
e.json = json
raise e
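The message-extraction half of `check_error()` can be tried on sample payloads. A standalone sketch; the payloads below are illustrative, not actual Druid responses:

```python
def extract_message(json):
    # Druid errors may carry 'errorMessage', a generic 'error', or neither.
    if not json:
        return None
    msg = json.get('errorMessage')
    if not msg:
        msg = json.get('error')
    return msg

print(extract_message({'errorMessage': 'Object not found'}))  # Object not found
print(extract_message({'error': 'Unknown exception'}))        # Unknown exception
print(extract_message({'unrelated': 'field'}))                # None
```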
def build_url(endpoint, req, args=None) -> str:
'''
Returns the full URL for a REST call given the relative request API and
optional parameters to fill placeholders within the request URL.
Parameters
----------
endpoint: str
The base URL for the service.
req: str
Relative URL, with optional {} placeholders
args: list
Optional list of values to match {} placeholders in the URL.
'''
url = endpoint + req
if args:
quoted = [quote(arg) for arg in args]
url = url.format(*quoted)
return url
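For example, arguments are URL-quoted before being substituted into the `{}` placeholders. The endpoint and datasource name below are illustrative:

```python
from urllib.parse import quote

def build_url(endpoint, req, args=None) -> str:
    # Same logic as above: quote each argument, then fill the {} placeholders.
    url = endpoint + req
    if args:
        quoted = [quote(arg) for arg in args]
        url = url.format(*quoted)
    return url

url = build_url('http://localhost:8888',
                '/druid/coordinator/v1/datasources/{}', ['my table'])
print(url)  # http://localhost:8888/druid/coordinator/v1/datasources/my%20table
```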
class DruidRestClient:
'''
Wrapper around the basic Druid REST API operations using the
requests Python package. Handles the grunt work of building up
URLs, working with JSON, etc.
The REST client accepts an endpoint that represents a Druid service, typically
the Router. All requests are made to this service, which means using the service
URL as the base. That is, if the service is http://localhost:8888, then
a request for status is just '/status': the methods here build up the URL by
concatenating the service endpoint with the request URL.
'''
def __init__(self, endpoint):
self.endpoint = endpoint
self.trace = False
self.session = requests.Session()
def enable_trace(self, flag=True):
self.trace = flag
def build_url(self, req, args=None) -> str:
'''
Returns the full URL for a REST call given the relative request API and
optional parameters to fill placeholders within the request URL.
Parameters
----------
req: str
Relative URL, with optional {} placeholders
args: list
Optional list of values to match {} placeholders in the URL.
'''
return build_url(self.endpoint, req, args)
def get(self, req, args=None, params=None, require_ok=True) -> requests.Response:
'''
Generic GET request to this service.
Parameters
----------
req: str
The request URL without host, port or query string.
Example: `/status`
args: [str], default = None
Optional parameters to fill in to the URL.
Example: `/customer/{}`
params: dict, default = None
Optional map of query variables to send in
the URL. Query parameters are the name/value pairs
that appear after the `?` marker.
require_ok: bool, default = True
Whether to require an OK (200) response. If `True` and the
request returns a different response code, then `check_error()`
raises a `ClientError` or `HTTPError` exception.
Returns
-------
The `requests` `Response` object.
'''
url = self.build_url(req, args)
if self.trace:
print('GET:', url)
r = self.session.get(url, params=params)
if require_ok:
check_error(r)
return r
def get_json(self, url_tail, args=None, params=None):
'''
Generic GET request which expects a JSON response.
'''
r = self.get(url_tail, args, params)
return r.json()
def post(self, req, body, args=None, headers=None, require_ok=True) -> requests.Response:
'''
Issues a POST request for the given URL on this
node, with the given payload and optional URL query
parameters.
'''
url = self.build_url(req, args)
if self.trace:
print('POST:', url)
print('body:', body)
r = self.session.post(url, data=body, headers=headers)
if require_ok:
check_error(r)
return r
def post_json(self, req, body, args=None, headers=None, params=None) -> requests.Response:
'''
Issues a POST request for the given URL on this node, with a JSON request. Returns
the JSON response.
Parameters
----------
req: str
URL relative to the service base URL.
body: any
JSON-encodable Python object to send in the request body.
args: array[str], default = None
Arguments to include in the relative URL to replace {} markers.
headers: dict, default = None
Additional HTTP header fields to send in the request.
params: dict, default = None
Parameters to include in the URL as the `?name=value` query string.
Returns
-------
The JSON response as a Python object.
See
---
`post_only_json()` for the form that returns the response object, not JSON.
'''
r = self.post_only_json(req, body, args, headers, params)
check_error(r)
return r.json()
def post_only_json(self, req, body, args=None, headers=None, params=None) -> requests.Response:
'''
Issues a POST request for the given URL on this node, with a JSON request, returning
the Requests library `Response` object.
Parameters
----------
req: str
URL relative to the service base URL.
body: any
JSON-encodable Python object to send in the request body.
args: array[str], default = None
Arguments to include in the relative URL to replace {} markers.
headers: dict, default = None
Additional HTTP header fields to send in the request.
params: dict, default = None
Parameters to include in the URL as the `?name=value` query string.
Returns
-------
The Requests library `Response` object.
See
---
`post_json()` for the form that returns the response JSON.
'''
url = self.build_url(req, args)
if self.trace:
print('POST:', url)
print('body:', body)
return self.session.post(url, json=body, headers=headers, params=params)
def delete(self, req, args=None, params=None, headers=None):
url = self.build_url(req, args)
if self.trace:
print('DELETE:', url)
r = self.session.delete(url, params=params, headers=headers)
return r
def delete_json(self, req, args=None, params=None, headers=None):
return self.delete(req, args=args, params=params, headers=headers).json()
def close(self):
'''
Close the session. Use in scripts and tests when the system will otherwise complain
about open sockets.
'''
self.session.close()
self.session = None

@@ -0,0 +1,862 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time, requests
from . import consts
from .util import dict_get, split_table_name
from .error import DruidError, ClientError
REQ_SQL = consts.ROUTER_BASE + '/sql'
REQ_SQL_TASK = REQ_SQL + '/task'
class SqlRequest:
def __init__(self, query_client, sql):
self.query_client = query_client
self.sql = sql
self.context = None
self.params = None
self.header = False
self.format = consts.SQL_OBJECT
self.headers = None
self.types = None
self.sql_types = None
def with_format(self, result_format):
self.format = result_format
return self
def with_headers(self, sql_types=False, druidTypes=False):
self.headers = True
self.types = druidTypes
self.sql_types = sql_types
return self
def with_context(self, context):
if not self.context:
self.context = context
else:
self.context.update(context)
return self
def add_context(self, key, value):
return self.with_context({key: value})
def with_parameters(self, params):
'''
Set the array of parameters. Parameters must each be a map of 'type'/'value' pairs:
{'type': the_type, 'value': the_value}. The type must be a valid SQL type
(in upper case). See the consts module for a list.
'''
for param in params:
self.add_parameter(param)
return self
def add_parameter(self, value):
'''
Add one parameter value. Infers the type of the parameter from the Python type.
'''
if value is None:
raise ClientError('Druid does not support null parameter values')
data_type = None
value_type = type(value)
if value_type is str:
data_type = consts.SQL_VARCHAR_TYPE
elif value_type is int:
data_type = consts.SQL_BIGINT_TYPE
elif value_type is float:
data_type = consts.SQL_DOUBLE_TYPE
elif value_type is list:
data_type = consts.SQL_ARRAY_TYPE
else:
raise ClientError('Unsupported value type')
if not self.params:
self.params = []
self.params.append({'type': data_type, 'value': value})
def response_header(self):
self.header = True
return self
def request_headers(self, headers):
self.headers = headers
return self
def to_common_format(self):
self.header = False
self.sql_types = False
self.types = False
self.format = consts.SQL_OBJECT
return self
def to_request(self):
query_obj = {'query': self.sql}
if self.context:
query_obj['context'] = self.context
if self.params:
query_obj['parameters'] = self.params
if self.header:
query_obj['header'] = True
if self.format:
query_obj['resultFormat'] = self.format
if self.sql_types is not None: # Note: boolean variable
query_obj['sqlTypesHeader'] = self.sql_types
if self.types is not None: # Note: boolean variable
query_obj['typesHeader'] = self.types
return query_obj
def result_format(self):
return self.format.lower()
def run(self):
return self.query_client.sql_query(self)
def request_from_sql_query(query_client, sql_query):
try:
req = SqlRequest(query_client, sql_query['query'])
except KeyError:
raise ClientError('A SqlRequest dictionary must have \'query\' set')
req.context = sql_query.get('context')
req.params = sql_query.get('parameters')
req.header = sql_query.get('header')
req.format = sql_query.get('resultFormat')
req.format = consts.SQL_OBJECT if req.format is None else req.format
req.sql_types = sql_query.get('sqlTypesHeader')
req.types = sql_query.get('typesHeader')
return req
def parse_rows(fmt, context, results):
if fmt == consts.SQL_ARRAY_WITH_TRAILER:
rows = results['results']
elif fmt == consts.SQL_ARRAY:
rows = results
else:
return results
if not context.get('headers', False):
return rows
header_size = 1
if context.get('sqlTypesHeader', False):
header_size += 1
if context.get('typesHeader', False):
header_size += 1
return rows[header_size:]
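`parse_rows()` skips one leading row per requested header. A standalone sketch of the same arithmetic; the constant values and the sample result set are assumptions mirroring the `consts` module:

```python
SQL_ARRAY = 'array'                          # assumed value of consts.SQL_ARRAY
SQL_ARRAY_WITH_TRAILER = 'arrayWithTrailer'  # assumed value

def parse_rows(fmt, context, results):
    # Pick the row list for the format, then skip any prepended header rows.
    if fmt == SQL_ARRAY_WITH_TRAILER:
        rows = results['results']
    elif fmt == SQL_ARRAY:
        rows = results
    else:
        return results
    if not context.get('headers', False):
        return rows
    header_size = 1                          # column-name header
    if context.get('sqlTypesHeader', False):
        header_size += 1
    if context.get('typesHeader', False):
        header_size += 1
    return rows[header_size:]

results = [['name'], ['VARCHAR'], ['Alice'], ['Bob']]
ctx = {'headers': True, 'sqlTypesHeader': True}
print(parse_rows(SQL_ARRAY, ctx, results))  # [['Alice'], ['Bob']]
```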
def label_non_null_cols(results):
if not results:
return []
is_null = {}
for key in results[0].keys():
is_null[key] = True
for row in results:
for key, value in row.items():
# Hack: truthiness treats JSON nulls, empty strings and numeric 0s as null.
is_null[key] = is_null[key] and not value
return is_null
def filter_null_cols(results):
'''
Filter columns from a Druid result set by removing all null-like
columns. A column is considered null if all values for that column
are null. A value is null if it is either a JSON null, an empty
string, or a numeric 0. All rows are preserved, as is the order
of the remaining columns.
'''
if not results:
return results
is_null = label_non_null_cols(results)
revised = []
for row in results:
new_row = {}
for key, value in row.items():
if is_null[key]:
continue
new_row[key] = value
revised.append(new_row)
return revised
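`filter_null_cols()` is meant to drop columns whose every value is null-like (a JSON null, an empty string, or a numeric 0). A standalone sketch of that intended behavior, with illustrative sample rows:

```python
def filter_null_cols(results):
    # Drop columns where every value is falsy; preserve rows and column order.
    if not results:
        return results
    all_null = {key: True for key in results[0]}
    for row in results:
        for key, value in row.items():
            all_null[key] = all_null[key] and not value
    return [{k: v for k, v in row.items() if not all_null[k]} for row in results]

rows = [{'a': 1, 'b': None}, {'a': 2, 'b': ''}]
print(filter_null_cols(rows))  # [{'a': 1}, {'a': 2}]
```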
def parse_object_schema(results):
schema = []
if len(results) == 0:
return schema
row = results[0]
for k, v in row.items():
druid_type = None
sql_type = None
if type(v) is str:
druid_type = consts.DRUID_STRING_TYPE
sql_type = consts.SQL_VARCHAR_TYPE
elif type(v) is int or type(v) is float:
druid_type = consts.DRUID_LONG_TYPE
sql_type = consts.SQL_BIGINT_TYPE
schema.append(ColumnSchema(k, sql_type, druid_type))
return schema
def parse_array_schema(context, results):
schema = []
if len(results) == 0:
return schema
has_headers = context.get(consts.HEADERS_KEY, False)
if not has_headers:
return schema
has_sql_types = context.get(consts.SQL_TYPES_HEADERS_KEY, False)
has_druid_types = context.get(consts.DRUID_TYPE_HEADERS_KEY, False)
size = len(results[0])
for i in range(size):
druid_type = None
if has_druid_types:
druid_type = results[1][i]
sql_type = None
if has_sql_types:
sql_type = results[2][i]
schema.append(ColumnSchema(results[0][i], sql_type, druid_type))
return schema
def parse_schema(fmt, context, results):
if fmt == consts.SQL_OBJECT:
return parse_object_schema(results)
elif fmt == consts.SQL_ARRAY or fmt == consts.SQL_ARRAY_WITH_TRAILER:
return parse_array_schema(context, results)
else:
return []
def is_response_ok(http_response):
code = http_response.status_code
return code == requests.codes.ok or code == requests.codes.accepted
class ColumnSchema:
def __init__(self, name, sql_type, druid_type):
self.name = name
self.sql_type = sql_type
self.druid_type = druid_type
def __str__(self):
return '{{name={}, SQL type={}, Druid type={}}}'.format(self.name, self.sql_type, self.druid_type)
class SqlQueryResult:
'''
Response from a classic request/response query.
'''
def __init__(self, request, response):
self.http_response = response
self._json = None
self._rows = None
self._schema = None
self.request = request
self._error = None
self._id = None
if not is_response_ok(response):
try:
self._error = response.json()
except Exception:
self._error = response.text
if not self._error:
self._error = 'Failed with HTTP status {}'.format(response.status_code)
try:
self._id = self.http_response.headers['X-Druid-SQL-Query-Id']
except KeyError:
self._error = 'Query returned no query ID'
@property
def _druid(self):
return self.request.query_client.druid_client
@property
def result_format(self):
return self.request.result_format()
@property
def ok(self):
'''
Reports if the query succeeded.
The query rows and schema are available only if ok is True.
'''
return is_response_ok(self.http_response)
@property
def error(self):
'''
If the query fails, returns the error, if any provided by Druid.
'''
if self.ok:
return None
if self._error:
return self._error
if not self.http_response:
return { 'error': 'unknown'}
if is_response_ok(self.http_response):
return None
return {'error': 'HTTP {}'.format(self.http_response.status_code)}
@property
def error_message(self):
if self.ok:
return None
err = self.error
if not err:
return 'unknown'
if type(err) is str:
return err
msg = err.get('error')
text = err.get('errorMessage')
if not msg and not text:
return 'unknown'
if not msg:
return text
if not text:
return msg
return msg + ': ' + text
@property
def id(self):
'''
Returns the unique identifier for the query.
'''
return self._id
@property
def non_null(self):
if not self.ok:
return None
if self.result_format != consts.SQL_OBJECT:
return None
return filter_null_cols(self.rows)
@property
def as_array(self):
if self.result_format == consts.SQL_OBJECT:
rows = []
for obj in self.rows:
rows.append([v for v in obj.values()])
return rows
else:
return self.rows
@property
def json(self):
if not self.ok:
return None
if not self._json:
self._json = self.http_response.json()
return self._json
@property
def rows(self):
'''
Returns the rows of data for the query.
Druid supports many data formats. The method makes its best
attempt to map the format into an array of rows of some sort.
'''
if not self._rows:
json = self.json
if not json:
return self.http_response.text
self._rows = parse_rows(self.result_format, self.request.context, json)
return self._rows
@property
def schema(self):
'''
Returns the data schema as a list of ColumnSchema objects.
Druid supports many data formats; not all of which provide
schema information. This method makes a best effort to
extract the schema from the query results.
'''
if not self._schema:
self._schema = parse_schema(self.result_format, self.request.context, self.json)
return self._schema
def _display(self, display):
return self._druid.display if not display else display
def show(self, non_null=False, display=None):
display = self._display(display)
if not self.ok:
display.error(self.error_message)
return
data = None
if non_null:
data = self.non_null
if not data:
data = self.as_array
if not data:
display.alert('Query returned no results')
return
display.data_table(data, [c.name for c in self.schema])
def show_schema(self, display=None):
display = self._display(display)
if not self.ok:
display.error(self.error_message)
return
data = []
for c in self.schema:
data.append([c.name, c.sql_type, c.druid_type])
if not data:
display.alert('Query returned no schema')
return
display.data_table(data, ['Name', 'SQL Type', 'Druid Type'])
class QueryTaskResult:
'''
Response from an asynchronous MSQ query, which may be an ingestion or a retrieval
query. Can monitor task progress and wait for the task to complete. For a SELECT query,
obtains the rows from the task reports. There are no results for an ingestion query,
just a success/failure status.
Note that SELECT query support is preliminary. The result structure is subject to
change. Use a version of the library that matches your version of Druid for best
results with MSQ SELECT queries.
'''
def __init__(self, request, response):
self._request = request
self.http_response = response
self._status = None
self._results = None
self._details = None
self._schema = None
self._rows = None
self._reports = None
self._error = None
self._id = None
if not is_response_ok(response):
self._state = consts.FAILED_STATE
try:
self._error = response.json()
except Exception:
self._error = response.text
if not self._error:
self._error = 'Failed with HTTP status {}'.format(response.status_code)
return
# Typical response:
# {'taskId': '6f7b514a446d4edc9d26a24d4bd03ade_fd8e242b-7d93-431d-b65b-2a512116924c_bjdlojgj',
# 'state': 'RUNNING'}
self.response_obj = response.json()
self._id = self.response_obj['taskId']
self._state = self.response_obj['state']
@property
def ok(self):
'''
Reports if the query completed successfully or is still running.
Use the `succeeded` property to check if the task is done and successful.
'''
return not self._error
@property
def id(self):
return self._id
def _druid(self):
return self._request.query_client.druid_client
def _tasks(self):
return self._druid().tasks
@property
def status(self):
'''
Polls Druid for an update on the query run status.
'''
self.check_valid()
# Example:
# {'task': 'talaria-sql-w000-b373b68d-2675-4035-b4d2-7a9228edead6',
# 'status': {
# 'id': 'talaria-sql-w000-b373b68d-2675-4035-b4d2-7a9228edead6',
# 'groupId': 'talaria-sql-w000-b373b68d-2675-4035-b4d2-7a9228edead6',
# 'type': 'talaria0', 'createdTime': '2022-04-28T23:19:50.331Z',
# 'queueInsertionTime': '1970-01-01T00:00:00.000Z',
# 'statusCode': 'RUNNING', 'status': 'RUNNING', 'runnerStatusCode': 'PENDING',
# 'duration': -1, 'location': {'host': None, 'port': -1, 'tlsPort': -1},
# 'dataSource': 'w000', 'errorMsg': None}}
self._status = self._tasks().task_status(self._id)
self._state = self._status['status']['status']
if self._state == consts.FAILED_STATE:
self._error = self._status['status']['errorMsg']
return self._status
@property
def done(self):
'''
Reports whether the query is done. The query is done when the Overlord task
that runs the query completes. A completed task is one with a status of either
SUCCESS or FAILED.
'''
return self._state == consts.FAILED_STATE or self._state == consts.SUCCESS_STATE
@property
def succeeded(self):
'''
Reports if the query succeeded.
'''
return self._state == consts.SUCCESS_STATE
@property
def state(self):
'''
Reports the task state from the Overlord task.
Updated after each access of the `status` property.
'''
return self._state
@property
def error(self):
return self._error
@property
def error_message(self):
err = self.error
if not err:
return 'unknown'
if type(err) is str:
return err
msg = dict_get(err, 'error')
text = dict_get(err, 'errorMessage')
if not msg and not text:
return 'unknown'
if text:
text = text.replace('\\n', '\n')
if not msg:
return text
if not text:
return msg
return msg + ': ' + text
def join(self):
'''
Waits for the task to complete, if still running.
Returns True if the task succeeded, False if it failed.
'''
if not self.done:
self.status
while not self.done:
time.sleep(0.5)
self.status
return self.succeeded
def check_valid(self):
if not self._id:
raise ClientError('Operation is invalid on a failed query')
def wait_until_done(self):
'''
Wait for the task to complete. Raises an error if the task fails.
A caller can proceed to do something with the successful result
once this method returns without raising an error.
'''
if not self.join():
raise DruidError('Query failed: ' + self.error_message)
def wait(self):
'''
Waits for a SELECT query to finish running, then returns the rows from the query.
'''
self.wait_until_done()
return self.rows
@property
def reports(self) -> dict:
self.check_valid()
if not self._reports:
self.join()
self._reports = self._tasks().task_reports(self._id)
return self._reports
@property
def results(self):
if not self._results:
rpts = self.reports
self._results = rpts['multiStageQuery']['payload']['results']
return self._results
@property
def schema(self):
if not self._schema:
results = self.results
sig = results['signature']
sql_types = results['sqlTypeNames']
size = len(sig)
self._schema = []
for i in range(size):
self._schema.append(ColumnSchema(sig[i]['name'], sql_types[i], sig[i]['type']))
return self._schema
@property
def rows(self):
if not self._rows:
results = self.results
self._rows = results['results']
return self._rows
def _display(self, display):
return self._druid().display if not display else display
def show(self, non_null=False, display=None):
display = self._display(display)
if not self.done:
display.alert('Task has not finished running')
return
if not self.succeeded:
display.error(self.error_message)
return
data = self.rows
if non_null:
data = filter_null_cols(data)
if not data:
display.alert('Query returned no {}rows'.format("visible " if non_null else ''))
return
display.data_table(data, [c.name for c in self.schema])
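The way `QueryTaskResult` digs SELECT rows out of an MSQ task report can be sketched without a server. A minimal, self-contained example (the nested `multiStageQuery`/`payload`/`results` keys are taken from the code above; as the class docstring notes, this layout is subject to change across Druid versions):

```python
def rows_from_reports(reports):
    # Mirrors QueryTaskResult.results/rows: extract the SELECT rows from
    # the MSQ task report. Key layout may change across Druid versions.
    results = reports['multiStageQuery']['payload']['results']
    return results['results']

# A miniature report with one VARCHAR column and two rows.
sample = {'multiStageQuery': {'payload': {'results': {
    'signature': [{'name': 'page', 'type': 'STRING'}],
    'sqlTypeNames': ['VARCHAR'],
    'results': [['Main_Page'], ['Wiki']],
}}}}
print(rows_from_reports(sample))  # [['Main_Page'], ['Wiki']]
```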
class QueryClient:
def __init__(self, druid, rest_client=None):
self.druid_client = druid
self._rest_client = druid.rest_client if not rest_client else rest_client
@property
def rest_client(self):
return self._rest_client
def _prepare_query(self, request):
if not request:
raise ClientError('No query provided.')
# If the request is a dictionary, assume it is already in SqlQuery form.
query_obj = None
if type(request) == dict:
query_obj = request
request = request_from_sql_query(self, request)
elif type(request) == str:
request = self.sql_request(request)
if not request.sql:
raise ClientError('No query provided.')
if self.rest_client.trace:
print(request.sql)
if not query_obj:
query_obj = request.to_request()
return (request, query_obj)
def sql_query(self, request) -> SqlQueryResult:
'''
Submits a SQL query with control over the context, parameters and other
options. Returns a response with either a detailed error message, or
the rows and query ID.
Parameters
----------
request: str | SqlRequest | dict
If a string, then gives the SQL query to execute.
Can also be a `SqlRequest`, obtained from the
'sql_request()` method, with optional query context, query parameters or
other options.
Can also be a dictionary that represents a `SqlQuery` object. The
`SqlRequest` is a convenient wrapper to generate a `SqlQuery`.
Note that some of the Druid SqlQuery options will return data in a format
that this library cannot parse. In that case, obtain the raw payload from
the response and avoid using the rows() and schema() methods.
Returns
-------
A SqlQueryResult object that provides either the error message for a failed query,
or the results of a successful query. The object provides access to the schema and
rows if data is requested in a supported format. The default request object sets the
options to return data in the required format.
'''
request, query_obj = self._prepare_query(request)
r = self.rest_client.post_only_json(REQ_SQL, query_obj, headers=request.headers)
return SqlQueryResult(request, r)
def sql(self, sql, *args) -> list:
'''
Run a SQL query and return the results. Typically used to receive data as part
of another operation, rather than to display results to the user.
Parameters
----------
sql: str
The SQL statement with optional Python `{}` parameters.
args: list[str], Default = None
Array of values to insert into the parameters.
'''
if len(args) > 0:
sql = sql.format(*args)
resp = self.sql_query(sql)
if resp.ok:
return resp.rows
raise ClientError(resp.error_message)
def explain_sql(self, query):
'''
Runs an EXPLAIN PLAN FOR query for the given query.
Returns
-------
An object with the plan JSON parsed into Python objects:
plan: the query plan
columns: column schema
tables: dictionary of name/type pairs
'''
if not query:
raise ClientError('No query provided.')
results = self.sql('EXPLAIN PLAN FOR ' + query)
return results[0]
def sql_request(self, sql) -> SqlRequest:
'''
Creates a SqlRequest object for the given SQL query text.
'''
return SqlRequest(self, sql)
def task(self, query) -> QueryTaskResult:
'''
Submits an MSQ query. Returns a QueryTaskResult to track the task.
Parameters
----------
query
The query as either a string or a SqlRequest object.
'''
request, query_obj = self._prepare_query(query)
r = self.rest_client.post_only_json(REQ_SQL_TASK, query_obj, headers=request.headers)
return QueryTaskResult(request, r)
def run_task(self, query):
'''
Submits an MSQ query and waits for it to complete. Raises an error if the query fails.
Parameters
----------
query
The query as either a string or a SqlRequest object.
'''
resp = self.task(query)
if not resp.ok:
raise ClientError(resp.error_message)
resp.wait_until_done()
def _tables_query(self, schema):
return self.sql_query('''
SELECT TABLE_NAME AS TableName
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = '{}'
ORDER BY TABLE_NAME
'''.format(schema))
def tables(self, schema=consts.DRUID_SCHEMA):
'''
Returns a list of tables in the given schema.
Parameters
----------
schema
The schema to query, `druid` by default.
'''
return self._tables_query(schema).rows
def _schemas_query(self):
return self.sql_query('''
SELECT SCHEMA_NAME AS SchemaName
FROM INFORMATION_SCHEMA.SCHEMATA
ORDER BY SCHEMA_NAME
''')
def schemas(self):
return self._schemas_query().rows
def _schema_query(self, table_name):
parts = split_table_name(table_name, consts.DRUID_SCHEMA)
return self.sql_query('''
SELECT
ORDINAL_POSITION AS "Position",
COLUMN_NAME AS "Name",
DATA_TYPE AS "Type"
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '{}'
AND TABLE_NAME = '{}'
ORDER BY ORDINAL_POSITION
'''.format(parts[0], parts[1]))
def table_schema(self, table_name):
'''
Returns the schema of a table as an array of dictionaries of the
form {"Position": "<n>", "Name": "<name>", "Type": "<type>"}
Parameters
----------
table_name: str
The name of the table as either "table" or "schema.table".
If the form is "table", then the 'druid' schema is assumed.
'''
return self._schema_query(table_name).rows
def _function_args_query(self, table_name):
parts = split_table_name(table_name, consts.EXT_SCHEMA)
return self.sql_query('''
SELECT
ORDINAL_POSITION AS "Position",
PARAMETER_NAME AS "Parameter",
DATA_TYPE AS "Type",
IS_OPTIONAL AS "Optional"
FROM INFORMATION_SCHEMA.PARAMETERS
WHERE SCHEMA_NAME = '{}'
AND FUNCTION_NAME = '{}'
ORDER BY ORDINAL_POSITION
'''.format(parts[0], parts[1]))
def function_parameters(self, table_name):
'''
Returns the list of parameters for a partial external table defined in
the Druid catalog. Returns the parameters as an array of objects in the
form {"Position": <n>, "Parameter": "<name>", "Type": "<type>",
"Optional": True|False}
Parameters
----------
table_name str
The name of the table as either "table" or "schema.table".
If the form is "table", then the 'ext' schema is assumed.
'''
return self._function_args_query(table_name).rows
def wait_until_ready(self, table_name):
'''
Waits for a datasource to be loaded in the cluster, and to become available to SQL.
Parameters
----------
table_name str
The name of a datasource in the 'druid' schema.
'''
self.druid_client.datasources.wait_until_ready(table_name)
while True:
try:
self.sql('SELECT 1 FROM "{}" LIMIT 1'.format(table_name))
return
except Exception:
time.sleep(0.5)
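The polling loop in `QueryTaskResult.join()` above is easy to demonstrate without a Druid server. This sketch stands in a fake Overlord task-status API; `FakeTask` and the state-name constants are illustrative stand-ins for the real client and `druidapi.consts`:

```python
import time

# Illustrative stand-ins for the state names in druidapi.consts.
RUNNING_STATE = 'RUNNING'
SUCCESS_STATE = 'SUCCESS'
FAILED_STATE = 'FAILED'

class FakeTask:
    '''Stands in for the Overlord status API: each poll advances the task.'''
    def __init__(self, polls_until_done=3):
        self._polls_left = polls_until_done

    def task_status(self):
        self._polls_left -= 1
        state = SUCCESS_STATE if self._polls_left <= 0 else RUNNING_STATE
        return {'status': {'status': state, 'errorMsg': None}}

def join(task, poll_interval=0.01):
    '''Poll until the task reaches a terminal state, like QueryTaskResult.join().'''
    while True:
        state = task.task_status()['status']['status']
        if state in (SUCCESS_STATE, FAILED_STATE):
            return state == SUCCESS_STATE
        time.sleep(poll_interval)

print(join(FakeTask()))  # True: the fake task succeeds on the third poll
```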

@@ -0,0 +1,124 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
STATUS_BASE = '/status'
REQ_STATUS = STATUS_BASE
REQ_HEALTH = STATUS_BASE + '/health'
REQ_PROPERTIES = STATUS_BASE + '/properties'
REQ_IN_CLUSTER = STATUS_BASE + '/selfDiscovered/status'
ROUTER_BASE = '/druid/router/v1'
REQ_BROKERS = ROUTER_BASE + '/brokers'
class StatusClient:
'''
Client for status APIs. These APIs are available on all nodes.
If used with the Router, they report the status of just the Router.
To check the status of another node, first create a REST client for that
node:
status_client = StatusClient(DruidRestClient("<service endpoint>"))
You can find the service endpoints by querying the sys.servers table using SQL.
See https://druid.apache.org/docs/latest/operations/api-reference.html#process-information
'''
def __init__(self, rest_client, owns_client=False):
self.rest_client = rest_client
self.owns_client = owns_client
def close(self):
if self.owns_client:
self.rest_client.close()
self.rest_client = None
#-------- Common --------
@property
def status(self):
'''
Returns the Druid version, loaded extensions, memory used, total memory
and other useful information about the Druid service.
GET `/status`
'''
return self.rest_client.get_json(REQ_STATUS)
@property
def is_healthy(self) -> bool:
'''
Returns `True` if the node is healthy, `False` otherwise. Check service health
before using other Druid API methods to ensure the server is ready.
See also `wait_until_ready()`.
GET `/status/health`
'''
try:
return self.rest_client.get_json(REQ_HEALTH)
except Exception:
return False
def wait_until_ready(self):
'''
Sleeps until the node reports itself as healthy. Will run forever if the node
is down or never becomes healthy.
'''
while not self.is_healthy:
time.sleep(0.5)
@property
def properties(self) -> dict:
'''
Returns the effective set of Java properties used by the service, including
system properties and properties from the `common.runtime.properties` and
`runtime.properties` files.
GET `/status/properties`
'''
return self.rest_client.get_json(REQ_PROPERTIES)
@property
def in_cluster(self):
'''
Returns `True` if the node is visible within the cluster, `False` if not.
That is, returns the value of the `{"selfDiscovered": true/false}`
field in the response.
GET `/status/selfDiscovered/status`
'''
try:
result = self.rest_client.get_json(REQ_IN_CLUSTER)
return result.get('selfDiscovered', False)
except ConnectionError:
return False
@property
def version(self):
'''
Returns the version of the Druid server. If the server is running in an IDE, the
version will be empty.
'''
return self.status.get('version')
@property
def brokers(self):
'''
Returns the list of broker nodes known to this node. Must be called on the Router.
'''
return self.rest_client.get_json(REQ_BROKERS)
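The self-discovery check in `in_cluster` above reduces to simple dictionary handling, which can be shown standalone. The sample payload shape follows the docstring's `{"selfDiscovered": true/false}` example:

```python
def in_cluster(response_json):
    # Mirrors StatusClient.in_cluster: read selfDiscovered, defaulting to False
    # when the key is missing or the response is empty.
    if not response_json:
        return False
    return response_json.get('selfDiscovered', False)

print(in_cluster({'selfDiscovered': True}))  # True
print(in_cluster({}))                        # False
```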

@@ -0,0 +1,191 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .consts import OVERLORD_BASE
REQ_TASKS = OVERLORD_BASE + '/tasks'
REQ_POST_TASK = OVERLORD_BASE + '/task'
REQ_GET_TASK = REQ_POST_TASK + '/{}'
REQ_TASK_STATUS = REQ_GET_TASK + '/status'
REQ_TASK_REPORTS = REQ_GET_TASK + '/reports'
REQ_END_TASK = REQ_GET_TASK + '/shutdown'
REQ_END_DS_TASKS = OVERLORD_BASE + '/datasources/{}/shutdownAllTasks'
class TaskClient:
'''
Client for Overlord task-related APIs.
See https://druid.apache.org/docs/latest/operations/api-reference.html#tasks
'''
def __init__(self, rest_client):
self.client = rest_client
def tasks(self, state=None, table=None, task_type=None, max=None, created_time_interval=None):
'''
Retrieves the list of tasks.
Parameters
----------
state: str, default = None
Filter list of tasks by task state. Valid options are "running",
"complete", "waiting", and "pending". Constants are defined for
each of these in the `consts` file.
table: str, default = None
Return tasks only for one Druid table (datasource).
created_time_interval: str, Default = None
Return tasks created within the specified interval.
max: int, default = None
Maximum number of "complete" tasks to return. Only applies when state is set to "complete".
task_type: str, default = None
Filter tasks by task type.
Reference
---------
`GET /druid/indexer/v1/tasks`
'''
params = {}
if state:
params['state'] = state
if table:
params['datasource'] = table
if task_type:
params['type'] = task_type
if max is not None:
params['max'] = max
if created_time_interval:
params['createdTimeInterval'] = created_time_interval
return self.client.get_json(REQ_TASKS, params=params)
def task(self, task_id) -> dict:
'''
Retrieves the "payload" of a task.
Parameters
----------
task_id: str
The ID of the task to retrieve.
Returns
-------
The task payload as a Python dictionary.
Reference
---------
`GET /druid/indexer/v1/task/{taskId}`
'''
return self.client.get_json(REQ_GET_TASK, args=[task_id])
def task_status(self, task_id) -> dict:
'''
Retrieves the status of a task.
Parameters
----------
task_id: str
The ID of the task to retrieve.
Returns
-------
The task status as a Python dictionary. See the `consts` module for a list
of status codes.
Reference
---------
`GET /druid/indexer/v1/task/{taskId}/status`
'''
return self.client.get_json(REQ_TASK_STATUS, args=[task_id])
def task_reports(self, task_id) -> dict:
'''
Retrieves the completion report for a completed task.
Parameters
----------
task_id: str
The ID of the task to retrieve.
Returns
-------
The task reports as a Python dictionary.
Reference
---------
`GET /druid/indexer/v1/task/{taskId}/reports`
'''
return self.client.get_json(REQ_TASK_REPORTS, args=[task_id])
def submit_task(self, payload):
'''
Submits a task to the Overlord.
The response includes the ID of the submitted task.
Parameters
----------
payload: object
The task object represented as a Python dictionary.
Returns
-------
The REST response.
Reference
---------
`POST /druid/indexer/v1/task`
'''
return self.client.post_json(REQ_POST_TASK, payload)
def shut_down_task(self, task_id):
'''
Shuts down a task.
Parameters
----------
task_id: str
The ID of the task to shut down.
Returns
-------
The REST response.
Reference
---------
`POST /druid/indexer/v1/task/{taskId}/shutdown`
'''
return self.client.post_json(REQ_END_TASK, args=[task_id])
def shut_down_tasks_for(self, table):
'''
Shuts down all tasks for a table (datasource).
Parameters
----------
table: str
The name of the table (datasource).
Returns
-------
The REST response.
Reference
---------
`POST /druid/indexer/v1/datasources/{dataSource}/shutdownAllTasks`
'''
return self.client.post_json(REQ_END_DS_TASKS, args=[table])
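The parameter-building logic in `tasks()` above can be exercised without a server. A standalone sketch (`task_list_params` is a hypothetical helper name, and `max_tasks` replaces the builtin-shadowing `max` parameter):

```python
def task_list_params(state=None, table=None, task_type=None, max_tasks=None,
                     created_time_interval=None):
    # Build the query parameters that tasks() sends to GET /druid/indexer/v1/tasks,
    # including only the filters the caller actually supplied.
    params = {}
    if state:
        params['state'] = state
    if table:
        params['datasource'] = table
    if task_type:
        params['type'] = task_type
    if max_tasks is not None:
        params['max'] = max_tasks
    if created_time_interval:
        params['createdTimeInterval'] = created_time_interval
    return params

print(task_list_params(state='complete', table='wikipedia', max_tasks=10))
```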

@@ -0,0 +1,181 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .display import DisplayClient
from .base_table import pad, BaseTable
alignments = ['', '^', '>']
def simple_table(table_def):
table = []
if table_def.headers:
table.append(' '.join(table_def.format_row(table_def.headers)))
for row in table_def.rows:
table.append(' '.join(table_def.format_row(row)))
return table
def border_table(table_def):
fmt = ' | '.join(table_def.formats)
table = []
if table_def.headers:
table.append(fmt.format(*table_def.headers))
bar = ''
for i in range(table_def.width):
width = table_def.widths[i]
if i > 0:
bar += '+'
if table_def.width == 1:
pass
elif i == 0:
width += 1
elif i == table_def.width - 1:
width += 1
else:
width += 2
bar += '-' * width
table.append(bar)
for row in table_def.rows:
table.append(fmt.format(*row))
return table
class TableDef:
def __init__(self):
self.width = None
self.headers = None
self.align = None
self.formats = None
self.rows = None
self.widths = None
def find_widths(self):
self.widths = [0 for i in range(self.width)]
if self.headers:
for i in range(len(self.headers)):
self.widths[i] = len(self.headers[i])
for row in self.rows:
for i in range(len(row)):
if row[i] is not None:
self.widths[i] = max(self.widths[i], len(row[i]))
def apply_widths(self, widths):
if not widths:
return
for i in range(min(len(self.widths), len(widths))):
if widths[i] is not None:
self.widths[i] = widths[i]
def define_row_formats(self):
self.formats = []
for i in range(self.width):
f = '{{:{}{}.{}}}'.format(
alignments[self.align[i]],
self.widths[i], self.widths[i])
self.formats.append(f)
def format_header(self):
if not self.headers:
return None
return self.format_row(self.headers)
def format_row(self, data_row):
row = []
for i in range(self.width):
value = data_row[i]
if not value:
row.append(' ' * self.widths[i])
else:
row.append(self.formats[i].format(value))
return row
class TextTable(BaseTable):
def __init__(self):
BaseTable.__init__(self)
self.formatter = simple_table
self._widths = None
def with_border(self):
self.formatter = border_table
def widths(self, widths):
self._widths = widths
def compute_def(self, rows):
table_def = TableDef()
min_width, max_width = self.row_width(rows)
table_def.width = max_width
table_def.headers = self.pad_headers(max_width)
table_def.rows = self.format_rows(rows, min_width, max_width)
table_def.find_widths()
table_def.apply_widths(self._widths)
table_def.align = self.find_alignments(rows, max_width)
table_def.define_row_formats()
return table_def
def format(self):
if not self._rows:
self._rows = []
table_rows = self.formatter(self.compute_def(self._rows))
return '\n'.join(table_rows)
def format_rows(self, rows, min_width, max_width):
if not self._col_fmt:
return self.default_row_format(rows, min_width, max_width)
else:
return self.apply_row_formats(rows, max_width)
def default_row_format(self, rows, min_width, max_width):
new_rows = []
if min_width <= max_width:
rows = self.pad_rows(rows, max_width)
for row in rows:
new_row = ['' if v is None else str(v) for v in row]
new_rows.append(pad(new_row, max_width, None))
return new_rows
def apply_row_formats(self, rows, max_width):
new_rows = []
fmts = self._col_fmt
if len(fmts) < max_width:
fmts = fmts.copy()
for i in range(len(fmts), max_width):
fmts.append(lambda v: v)
for row in rows:
new_row = []
for i in range(len(row)):
new_row.append(fmts[i](row[i]))
new_rows.append(pad(new_row, max_width, None))
return new_rows
class TextDisplayClient(DisplayClient):
def __init__(self):
DisplayClient.__init__(self)
def text(self, msg):
print(msg)
def alert(self, msg):
print("Alert:", msg)
def error(self, msg):
print("ERROR:", msg)
def new_table(self):
return TextTable()
def show_table(self, table):
print(table.format())
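The width computation and left-aligned formatting that `TableDef` performs can be condensed into a standalone sketch (`format_table` is an illustrative helper, not part of the library):

```python
def format_table(headers, rows):
    # Minimal version of the simple_table path above: compute each column's
    # width from headers and cells, then left-align with str.format.
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(str(cell)))
    # '{:6.6}' pads a string to width 6 and truncates anything longer.
    fmts = ['{{:{w}.{w}}}'.format(w=w) for w in widths]
    lines = [' '.join(f.format(str(c)) for f, c in zip(fmts, headers))]
    for row in rows:
        lines.append(' '.join(f.format(str(c)) for f, c in zip(fmts, row)))
    return '\n'.join(lines)

print(format_table(['Name', 'Type'], [['__time', 'TIMESTAMP'], ['page', 'VARCHAR']]))
```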

@@ -0,0 +1,35 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .error import ClientError
def dict_get(d, key, default=None):
'''
Returns the value of key in the given dict, or the default value if
the key is not found.
'''
if not d:
return default
return d.get(key, default)
def split_table_name(table_name, default_schema):
if not table_name:
raise ClientError('Table name is required')
parts = table_name.split('.')
if len(parts) > 2:
raise ClientError('Druid supports one or two-part table names')
if len(parts) == 2:
return parts
return [default_schema, parts[0]]
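The behavior of `split_table_name` above can be demonstrated with a standalone copy (using `ValueError` in place of the library's `ClientError` so the sketch needs no imports):

```python
def split_table_name(table_name, default_schema):
    # Standalone copy of the utility above: resolve "table" or "schema.table"
    # into a [schema, table] pair.
    if not table_name:
        raise ValueError('Table name is required')
    parts = table_name.split('.')
    if len(parts) > 2:
        raise ValueError('Druid supports one or two-part table names')
    if len(parts) == 2:
        return parts
    return [default_schema, parts[0]]

print(split_table_name('wikipedia', 'druid'))      # ['druid', 'wikipedia']
print(split_table_name('ext.wikipedia', 'druid'))  # ['ext', 'wikipedia']
```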

@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ------------------------------------------------------------------------
# Requirements for the Jupyter Notebooks
# See: https://pip.pypa.io/en/stable/reference/requirements-file-format/
#
# Requirements are both few and simple at present.
requests

@@ -4,8 +4,6 @@
 "cell_type": "markdown",
 "id": "ad4e60b6",
 "metadata": {
-"deletable": true,
-"editable": true,
 "tags": []
 },
 "source": [
@@ -43,7 +41,6 @@
 "cell_type": "markdown",
 "id": "8d6bbbcb",
 "metadata": {
-"deletable": true,
 "tags": []
 },
 "source": [
@@ -57,7 +54,9 @@
 "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
 "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you havent already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
 "\n",
-"To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+"This tutorial uses the [Druid Python API](Python_API_Tutorial.ipynb) to simplify access to Druid.\n",
+"\n",
+"To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid client, and another for the SQL client used to run SQL commands."
 ]
 },
 {
@@ -69,8 +68,7 @@
 },
 "outputs": [],
 "source": [
-"import requests\n",
-"import json\n",
+"import druidapi\n",
 "\n",
 "# druid_host is the hostname and port for your Druid deployment. \n",
 "# In a distributed environment, you can point to other Druid services.\n",
@@ -78,10 +76,9 @@
 "druid_host = \"http://localhost:8888\"\n",
 "dataSourceName = \"wikipedia-sql-tutorial\"\n",
 "\n",
-"# Set basic output formatting.\n",
-"bold = '\\033[1m'\n",
-"standard = '\\033[0m'\n",
-"print(f\"{bold}Druid host{standard}: {druid_host}\")"
+"druid = druidapi.jupyter_client(druid_host)\n",
+"display = druid.display\n",
+"sql_client = druid.sql"
 ]
 },
 {
@@ -89,8 +86,6 @@
 "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
 "metadata": {},
 "source": [
-"In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
-"\n",
 "## Druid SQL statements\n",
 "\n",
 "The following are the main Druid SQL statements:\n",
@@ -112,7 +107,9 @@
 "Note the following about the query to ingest data:\n",
 "- The query uses the TIME_PARSE function to parse ISO 8601 time strings into timestamps. See the section on [timestamp values](#timestamp-values) for more information.\n",
 "- The asterisk ( * ) tells Druid to ingest all the columns.\n",
-"- The EXTERN statement lets you define the data source type and the input schema. See [Read external data with EXTERN](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#read-external-data-with-extern) for more information."
+"- The EXTERN statement lets you define the data source type and the input schema. See [Read external data with EXTERN](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#read-external-data-with-extern) for more information.\n",
+"\n",
+"The following cell defines the query, uses MSQ to ingest the data, and waits for the MSQ task to complete. You will see an asterisk `[*]` in the left margin while the task runs."
 ]
 },
 {
@@ -122,76 +119,35 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"endpoint = \"/druid/v2/sql/task\"\n",
-"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
-"http_method = \"POST\"\n",
-"\n",
-"\n",
-"payload = json.dumps({\n",
-"\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
-" AS __time, * FROM TABLE \\\n",
-" (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
-" PARTITIONED BY DAY\",\n",
-" \"context\": {\n",
-" \"maxNumTasks\": 3\n",
-" }\n",
-"})\n",
-"\n",
-"headers = {'Content-Type': 'application/json'}\n",
-"\n",
-"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+"sql = '''\n",
+"INSERT INTO \"wikipedia-sql-tutorial\" \n",
+"SELECT TIME_PARSE(\"timestamp\") AS __time, * \n",
+"FROM TABLE (EXTERN(\n",
+" '{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+" '{\"type\": \"json\"}', \n",
+" '[{\"name\": \"added\", \"type\": \"long\"}, {\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"cityName\", \"type\": \"string\"}, {\"name\": \"comment\", \"type\": \"string\"}, {\"name\": \"commentLength\", \"type\": \"long\"}, {\"name\": \"countryIsoCode\", \"type\": \"string\"}, {\"name\": \"countryName\", \"type\": \"string\"}, {\"name\": \"deleted\", \"type\": \"long\"}, {\"name\": \"delta\", \"type\": \"long\"}, {\"name\": \"deltaBucket\", \"type\": \"string\"}, {\"name\": \"diffUrl\", \"type\": \"string\"}, {\"name\": \"flags\", \"type\": \"string\"}, {\"name\": \"isAnonymous\", \"type\": \"string\"}, {\"name\": \"isMinor\", \"type\": \"string\"}, {\"name\": \"isNew\", \"type\": \"string\"}, {\"name\": \"isRobot\", \"type\": \"string\"}, {\"name\": \"isUnpatrolled\", \"type\": \"string\"}, {\"name\": \"metroCode\", \"type\": \"string\"}, {\"name\": \"namespace\", \"type\": \"string\"}, {\"name\": \"page\", \"type\": \"string\"}, {\"name\": \"regionIsoCode\", \"type\": \"string\"}, {\"name\": \"regionName\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}, {\"name\": \"user\", \"type\": \"string\"}]'\n",
+" ))\n",
+"PARTITIONED BY DAY\n",
+"'''\n",
+"sql_client.run_task(sql)"
"ingestion_taskId_response = response\n",
"ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
"print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9", "id": "a141e962",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Wait for ingestion to complete before proceeding.\n", "MSQ reports task completion as soon as ingestion is done. However, it takes a while for Druid to load the resulting segments. Wait for the table to become ready."
"To check on the status of your ingestion task, run the following cell.\n",
"It continuously fetches the status of the ingestion job until the ingestion job is complete."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "df12d12c-a067-4759-bae0-0410c24b6205", "id": "cca15307",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import time\n", "sql_client.wait_until_ready(dataSourceName)"
"\n",
"endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
"http_method = \"GET\"\n",
"\n",
"payload = {}\n",
"headers = {}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"ingestion_status = json.loads(response.text)['status']['status']\n",
"# If you only want to fetch the status once and print it, \n",
"# uncomment the print statement and comment out the if and while loops\n",
"# print(json.dumps(response.json(), indent=4))\n",
"\n",
"if ingestion_status == \"RUNNING\":\n",
" print(\"The ingestion is running...\")\n",
"\n",
"while ingestion_status != \"SUCCESS\":\n",
" response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
" ingestion_status = json.loads(response.text)['status']['status']\n",
" time.sleep(15) \n",
" \n",
"if ingestion_status == \"SUCCESS\": \n",
" print(\"The ingestion is complete:\")\n",
" print(json.dumps(response.json(), indent=4))"
] ]
}, },
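The `sql_client.wait_until_ready(dataSourceName)` call above replaces the hand-rolled status-polling loop from the REST version of this cell. A minimal sketch of such a polling helper, with the status fetch injected as a callable so the logic can be exercised without a running Druid cluster; the helper name and structure are illustrative, not the library's actual internals:

```python
import time

def wait_until_success(fetch_status, poll_seconds=1.0, timeout_seconds=300.0):
    """Poll fetch_status() until the task reports SUCCESS.

    fetch_status is any zero-argument callable returning the task status
    string, for example one that GETs /druid/indexer/v1/task/{id}/status
    and extracts response['status']['status'].
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status == "SUCCESS":
            return
        if status == "FAILED":
            raise RuntimeError("ingestion task failed")
        if time.monotonic() > deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(poll_seconds)

# Exercise the helper against a stubbed status sequence.
statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
wait_until_success(lambda: next(statuses), poll_seconds=0.01)
print("task finished")
```

Injecting the fetch function keeps the retry logic testable and makes the timeout explicit, which the original open-ended `while` loop lacked.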
{ {
@ -206,38 +162,26 @@
"\n", "\n",
"In Druid SQL, table datasources reside in the `druid` schema. This is the default schema, so table datasources can be referenced as either `druid.dataSourceName` or `dataSourceName`.\n", "In Druid SQL, table datasources reside in the `druid` schema. This is the default schema, so table datasources can be referenced as either `druid.dataSourceName` or `dataSourceName`.\n",
"\n", "\n",
"For example, run the next cell to return the rows of the column named `channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is running in Jupyter, the cells use the LIMIT clause to limit the size of the query results for display purposes." "For example, run the next cell to return the rows of the column named `channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is running in Jupyter, the cells use the LIMIT clause to limit the size of the query results for display purposes. The cell uses the built-in table formatting feature of the Python API. You can also retrieve the values as a Python object if you wish to perform additional processing."
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "91dd255a-4d55-493e-a067-4cef5c659657", "id": "6e5d8de0",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT \"channel\" FROM \"wikipedia-sql-tutorial\" LIMIT 7\n",
"http_method = \"POST\"\n", "'''\n",
"\n", "display.sql(sql)"
"payload = json.dumps({\n",
" \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" LIMIT 7\"})\n",
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\nEach JSON object in the response represents a row in the {dataSourceName} datasource.\") \n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
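Under the covers, both `display.sql` and the earlier REST-based version of this cell send a JSON body to the `/druid/v2/sql` endpoint. A minimal sketch of building that body by hand; the helper name is ours, and the optional `context` (such as `{"maxNumTasks": 3}` for MSQ tasks) mirrors the payload shown in the old version of the ingestion cell:

```python
import json

def sql_payload(query, context=None):
    """Build the JSON body for POST /druid/v2/sql (or /druid/v2/sql/task)."""
    body = {"query": query}
    if context is not None:
        body["context"] = context  # e.g. {"maxNumTasks": 3} for MSQ ingestion
    return json.dumps(body)

payload = sql_payload('SELECT "channel" FROM "wikipedia-sql-tutorial" LIMIT 7')
print(payload)
# To execute it against a cluster:
# requests.post(druid_host + "/druid/v2/sql",
#               headers={"Content-Type": "application/json"}, data=payload)
```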
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "cbeb5a63", "id": "cbeb5a63",
"metadata": { "metadata": {
"deletable": true,
"tags": [] "tags": []
}, },
"source": [ "source": [
@ -246,11 +190,13 @@
"Druid maps SQL data types onto native types at query runtime.\n", "Druid maps SQL data types onto native types at query runtime.\n",
"The following native types are supported for Druid columns:\n", "The following native types are supported for Druid columns:\n",
"\n", "\n",
"* STRING: UTF-8 encoded strings and string arrays\n", "* SQL: `VARCHAR`, Druid: `STRING`: UTF-8 encoded strings and string arrays\n",
"* LONG: 64-bit signed int\n", "* SQL: `BIGINT`, Druid: `LONG`: 64-bit signed int\n",
"* FLOAT: 32-bit float\n", "* SQL & Druid: `FLOAT`: 32-bit float\n",
"* DOUBLE: 64-bit float\n", "* SQL & Druid: `DOUBLE`: 64-bit float\n",
"* COMPLEX: represents non-standard data types, such as nested JSON, hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n", "* Druid `COMPLEX`: represents non-standard data types, such as nested JSON, hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
"\n",
"For reference on how SQL data types map onto Druid native types, see [Standard types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
"\n", "\n",
"Druid exposes table and column metadata through [INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema) tables. Run the following query to retrieve metadata for the `wikipedia-sql-tutorial` datasource. In the response body, each JSON object correlates to a column in the table.\n", "Druid exposes table and column metadata through [INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema) tables. Run the following query to retrieve metadata for the `wikipedia-sql-tutorial` datasource. In the response body, each JSON object correlates to a column in the table.\n",
"Check each object's `DATA_TYPE` property: you should see the TIMESTAMP, BIGINT, and VARCHAR SQL data types."
@ -259,25 +205,35 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "b9227d6c-1d8c-4169-b13b-a08625c4011f", "id": "c7a86e2e",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT COLUMN_NAME, DATA_TYPE \n",
"http_method = \"POST\"\n", "FROM INFORMATION_SCHEMA.COLUMNS \n",
"\n", "WHERE \"TABLE_SCHEMA\" = 'druid' AND \"TABLE_NAME\" = 'wikipedia-sql-tutorial' \n",
"payload = json.dumps({\n", "LIMIT 7\n",
" \"query\":\"SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE \\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 'wikipedia-sql-tutorial' LIMIT 7\"\n", "'''\n",
"})\n", "display.sql(sql)"
"headers = {'Content-Type': 'application/json'}\n", ]
"\n", },
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", {
"\n", "cell_type": "markdown",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n", "id": "59f41229",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))" "metadata": {},
"source": [
"This is such a common query that the display client has it built in:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ac6c410",
"metadata": {},
"outputs": [],
"source": [
"display.table(dataSourceName)"
] ]
}, },
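However the metadata query is issued, the SQL endpoint returns one JSON object per result row. A small sketch of collapsing such a response into a column-name-to-type mapping; the sample rows below are illustrative, not fetched from a cluster:

```python
import json

# Illustrative response body from /druid/v2/sql for the INFORMATION_SCHEMA query.
response_text = json.dumps([
    {"COLUMN_NAME": "__time", "DATA_TYPE": "TIMESTAMP"},
    {"COLUMN_NAME": "added", "DATA_TYPE": "BIGINT"},
    {"COLUMN_NAME": "channel", "DATA_TYPE": "VARCHAR"},
])

# One dict per row, so a comprehension turns the rows into a lookup table.
column_types = {row["COLUMN_NAME"]: row["DATA_TYPE"]
                for row in json.loads(response_text)}
print(column_types["channel"])
```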
{ {
@ -285,8 +241,6 @@
"id": "c59ca797-dd91-442b-8d02-67b711b3fcc6", "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL data types as LONG. For reference on how SQL data types map onto Druid native types, see [Standard types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
"\n",
"### Timestamp values\n", "### Timestamp values\n",
"\n", "\n",
"Druid stores timestamp values as the number of milliseconds since the Unix epoch.\n", "Druid stores timestamp values as the number of milliseconds since the Unix epoch.\n",
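Since Druid stores `__time` as milliseconds since the Unix epoch, it can be handy to convert ISO-8601 strings to epoch milliseconds when preparing filter values. A plain-Python sketch, assuming UTC for naive timestamps (which is how Druid interprets `__time`):

```python
from datetime import datetime, timezone

def iso_to_millis(iso: str) -> int:
    """Convert an ISO-8601 timestamp to milliseconds since the Unix epoch."""
    # fromisoformat (3.7+) does not accept a trailing 'Z', so normalize it.
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

# Endpoints of the TIME_IN_INTERVAL filter used later in this tutorial.
print(iso_to_millis("2016-06-27T00:05:54.560Z"))
print(iso_to_millis("2016-06-27T00:06:53Z"))
```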
@ -303,23 +257,18 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc", "id": "16c1a31a",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT channel, page\n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "WHERE TIME_IN_INTERVAL(__time, '2016-06-27T00:05:54.56/2016-06-27T00:06:53')\n",
"payload = json.dumps({\n", "GROUP BY channel, page\n",
" \"query\":\"SELECT channel, page\\nFROM \\\"wikipedia-sql-tutorial\\\" WHERE TIME_IN_INTERVAL(__time, '2016-06-27T00:05:54.56/2016-06-27T00:06:53')\\nGROUP BY channel, page\\nLIMIT 7\"\n", "LIMIT 7\n",
"})\n", "'''\n",
"headers = {'Content-Type': 'application/json'}\n", "display.sql(sql)"
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -338,7 +287,6 @@
"cell_type": "markdown", "cell_type": "markdown",
"id": "29c24856", "id": "29c24856",
"metadata": { "metadata": {
"deletable": true,
"tags": [] "tags": []
}, },
"source": [ "source": [
@ -387,25 +335,18 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "ffca2962-da31-4b9c-adbc-882e35386916", "id": "123187d3",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT channel, page, comment \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "WHERE __time >= TIMESTAMP '2015-09-12 23:33:55' \n",
"payload = json.dumps({\n", " AND namespace = 'Main' \n",
" \"query\":\"SELECT channel, page, comment FROM \\\"wikipedia-sql-tutorial\\\" WHERE __time >= TIMESTAMP '2015-09-12 23:33:55' AND namespace = 'Main' LIMIT 7\"\n", "LIMIT 7\n",
"})\n", "'''\n",
"headers = {'Content-Type': 'application/json'}\n", "display.sql(sql)"
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -432,33 +373,23 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "9afa74b9-ef9f-4b36-a7bb-88e4498a48ef", "id": "b4656c81",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT channel, page, comment \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "WHERE \"cityName\" <> '' AND \"countryIsoCode\" = 'US' \n",
"payload = json.dumps({\n", "LIMIT 7\n",
" \"query\":\"SELECT channel, page, comment FROM \\\"wikipedia-sql-tutorial\\\" WHERE \\\"cityName\\\" <> '' AND \\\"countryIsoCode\\\" = 'US' LIMIT 7\"\n", "'''\n",
"})\n", "display.sql(sql)"
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "dd24470d-25c2-4031-a711-8477d69c9e94", "id": "dd24470d-25c2-4031-a711-8477d69c9e94",
"metadata": { "metadata": {
"deletable": true,
"editable": true,
"tags": [] "tags": []
}, },
"source": [ "source": [
@ -483,25 +414,17 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "228ae0e4-355e-4b4d-8253-fb2e46715559", "id": "0127e401",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT channel, COUNT(*) AS counts \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "GROUP BY channel \n",
"payload = json.dumps({\n", "LIMIT 7\n",
" \"query\":\"SELECT channel, COUNT(*) AS counts FROM \\\"wikipedia-sql-tutorial\\\" GROUP BY channel LIMIT 7\"\n", "'''\n",
"})\n", "display.sql(sql)"
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -521,25 +444,17 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "ca4b3bae-2b02-4c90-98a5-cb7806b6e649", "id": "d3724cc3",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT cityName, countryName, COUNT(*) AS counts \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "GROUP BY cityName, countryName \n",
"payload = json.dumps({\n", "LIMIT 7\n",
" \"query\":\"SELECT cityName, countryName, COUNT(*) AS counts FROM \\\"wikipedia-sql-tutorial\\\" GROUP BY 1, 2 LIMIT 7\"\n", "'''\n",
"})\n", "display.sql(sql)"
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -580,25 +495,17 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "83300261-b0c6-4104-b42b-8b1f9d15aa56", "id": "3ed58bfd",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT comment AS \"Entry\" \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "WHERE cityName = 'Mexico City' \n",
"payload = json.dumps({\n", "LIMIT 7\n",
" \"query\":\"SELECT comment AS \\\"Entry\\\" FROM \\\"wikipedia-sql-tutorial\\\" WHERE cityName = 'Mexico City' LIMIT 7\"\n", "'''\n",
"})\n", "display.sql(sql)"
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -622,25 +529,18 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "fdb94a64-09d1-4f3d-981c-ef805f34b175", "id": "bb694442",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT channel, COUNT(*) AS \"Number of events\" \n",
"http_method = \"POST\"\n", "FROM \"wikipedia-sql-tutorial\" \n",
"\n", "GROUP BY channel \n",
"payload = json.dumps({\n", "ORDER BY \"Number of events\" ASC \n",
" \"query\":\"SELECT channel, count(*) as \\\"Number of events\\\" FROM \\\"wikipedia-sql-tutorial\\\" GROUP BY channel ORDER BY 2 ASC LIMIT 5\"\n", "LIMIT 5\n",
"})\n", "'''\n",
"headers = {'Content-Type': 'application/json'}\n", "display.sql(sql)"
"\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -665,26 +565,21 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "19a6ab44-6349-4620-82c7-00997287378a", "id": "644b0cdd",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT \n",
"http_method = \"POST\"\n", " countryName AS \"Country\", \n",
"\n", " SUM(deleted) AS deleted, \n",
"payload = json.dumps({\n", " SUM(added) AS added \n",
" \"query\":\"SELECT countryName AS \\\"Country\\\", SUM(deleted) AS deleted, SUM(added) AS added FROM \\\"wikipedia-sql-tutorial\\\" \\\n", "FROM \"wikipedia-sql-tutorial\"\n",
" WHERE countryName = 'France' GROUP BY countryName , FLOOR(__time TO HOUR) LIMIT 7\"\n", "WHERE countryName = 'France' \n",
"})\n", "GROUP BY countryName , FLOOR(__time TO HOUR) \n",
"headers = {'Content-Type': 'application/json'}\n", "LIMIT 7\n",
"\n", "'''\n",
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n", "display.sql(sql)"
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
@ -704,34 +599,26 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "091c15dd-3e65-49ce-bb31-fca716d7ca3a", "id": "b4f5d1dd",
"metadata": { "metadata": {},
"tags": []
},
"outputs": [], "outputs": [],
"source": [ "source": [
"endpoint = \"/druid/v2/sql\"\n", "sql = '''\n",
"print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n", "SELECT \n",
"http_method = \"POST\"\n", " countryName AS \"Country\", \n",
"\n", " countryIsoCode AS \"ISO\" \n",
"payload = json.dumps({\n", "FROM \"wikipedia-sql-tutorial\"\n",
" \"query\":\"SELECT countryName AS \\\"Country\\\", countryIsoCode AS \\\"ISO\\\" FROM \\\"wikipedia-sql-tutorial\\\" \\\n", "WHERE channel = '#es.wikipedia' \n",
" WHERE channel = '#es.wikipedia' GROUP BY countryName, countryIsoCode LIMIT 7\"\n", "GROUP BY countryName, countryIsoCode \n",
"})\n", "LIMIT 7\n",
"headers = {'Content-Type': 'application/json'}\n", "'''\n",
"\n", "display.sql(sql)"
"response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
"\n",
"print(f\"{bold}Query{standard}:\\n\" + payload)\n",
"print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "8fbfa1fa-2cde-46d5-8107-60bd436fb64e", "id": "8fbfa1fa-2cde-46d5-8107-60bd436fb64e",
"metadata": { "metadata": {
"deletable": true,
"editable": true,
"tags": [] "tags": []
}, },
"source": [ "source": [
@ -751,7 +638,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3", "display_name": "Python 3 (ipykernel)",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
@ -765,7 +652,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.8 (main, Nov 4 2022, 16:31:34) [Clang 12.0.5 (clang-1205.0.22.11)]" "version": "3.9.6"
}, },
"toc-autonumbering": false, "toc-autonumbering": false,
"toc-showcode": false, "toc-showcode": false,