pip install for Python Druid API (#13938)

Broken test appears unrelated to this PR

* make druidapi pip installable

* include druidapi in prerequisites

* add license to setup.py

* updates from Paul's review

* note about editable install

* Apply suggestions from code review

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* update install instructions

* found unrelated typos

* standardize install cmd with pip

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Victoria Lim 2023-03-21 11:37:39 -07:00 committed by GitHub
parent 1c7a03a47b
commit ede9903ff4
22 changed files with 188 additions and 127 deletions

.gitignore

@ -30,3 +30,6 @@ integration-tests/gen-scripts/
*.hprof *.hprof
**/.ipynb_checkpoints/ **/.ipynb_checkpoints/
*.pyc *.pyc
**/.ipython/
**/.jupyter/
**/.local/


@ -46,7 +46,7 @@
"- The `requests` package for Python. For example, you can install it with the following command:\n", "- The `requests` package for Python. For example, you can install it with the following command:\n",
"\n", "\n",
" ```bash\n", " ```bash\n",
" pip3 install requests\n", " pip install requests\n",
" ````\n", " ````\n",
"\n", "\n",
"- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n", "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",


@ -564,7 +564,7 @@
"id": "2654e72c", "id": "2654e72c",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Use the REST client if you need to make calls that are not yet wrapped by the Python API, or if you want to do something special. To illustrate the client, you can make some of the same calls as in the [Druid REST API notebook](api_tutorial.ipynb).\n", "Use the REST client if you need to make calls that are not yet wrapped by the Python API, or if you want to do something special. To illustrate the client, you can make some of the same calls as in the [Druid REST API notebook](api-tutorial.ipynb).\n",
"\n", "\n",
"The REST API maintains the Druid host: you just provide the specific URL tail. There are methods to get or post JSON results. For example, to get status information:" "The REST API maintains the Druid host: you just provide the specific URL tail. There are methods to get or post JSON results. For example, to get status information:"
] ]


@ -1,9 +1,9 @@
# Jupyter Notebook tutorials for Druid # Jupyter Notebook tutorials for Druid
If you are reading this in Jupyter, switch over to the [- START HERE -](- START HERE -.ipynb] If you are reading this in Jupyter, switch over to the [0-START-HERE](0-START-HERE.ipynb)
notebook instead. notebook instead.
<!-- This README, the "- START HERE -" notebook, and the tutorial-jupyter-index.md file in <!-- This README, the "0-START-HERE" notebook, and the tutorial-jupyter-index.md file in
docs/tutorials share a lot of the same content. If you make a change in one place, update docs/tutorials share a lot of the same content. If you make a change in one place, update
the other too. --> the other too. -->
@ -39,7 +39,7 @@ Make sure you meet the following requirements before starting the Jupyter-based
- The `requests` package for Python. For example, you can install it with the following command: - The `requests` package for Python. For example, you can install it with the following command:
```bash ```bash
pip3 install requests pip install requests
``` ```
- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid - JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
@ -49,9 +49,9 @@ Make sure you meet the following requirements before starting the Jupyter-based
```bash ```bash
# Install JupyterLab # Install JupyterLab
pip3 install jupyterlab pip install jupyterlab
# Install Jupyter Notebook # Install Jupyter Notebook
pip3 install notebook pip install notebook
``` ```
- Start Jupyter using either JupyterLab - Start Jupyter using either JupyterLab
```bash ```bash
@ -65,8 +65,15 @@ Make sure you meet the following requirements before starting the Jupyter-based
jupyter notebook --port 3001 jupyter notebook --port 3001
``` ```
- An available Druid instance. You can use the `micro-quickstart` configuration - The Python API client for Druid. Clone the Druid repo if you haven't already.
described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html). Go to your Druid source repo and install `druidapi` with the following commands:
```bash
cd examples/quickstart/jupyter-notebooks/druidapi
pip install .
```
- An available Druid instance. You can use the [quickstart deployment](https://druid.apache.org/docs/latest/tutorials/index.html).
The tutorials assume that you are using the quickstart, so no authentication or authorization The tutorials assume that you are using the quickstart, so no authentication or authorization
is expected unless explicitly mentioned. is expected unless explicitly mentioned.
@ -85,4 +92,4 @@ Make sure you meet the following requirements before starting the Jupyter-based
## Continue in Jupyter ## Continue in Jupyter
Start Jupyter (see above) and navigate to the "- START HERE -" page for more information. Start Jupyter (see above) and navigate to the "0-START-HERE" notebook for more information.


@ -63,7 +63,7 @@
"Install the [Requests](https://requests.readthedocs.io/en/latest/) library for Python before you start. For example:\n", "Install the [Requests](https://requests.readthedocs.io/en/latest/) library for Python before you start. For example:\n",
"\n", "\n",
"```bash\n", "```bash\n",
"pip3 install requests\n", "pip install requests\n",
"```\n", "```\n",
"\n", "\n",
"Please read the [Requests Quickstart](https://requests.readthedocs.io/en/latest/user/quickstart/) to gain a basic understanding of how Requests works.\n", "Please read the [Requests Quickstart](https://requests.readthedocs.io/en/latest/user/quickstart/) to gain a basic understanding of how Requests works.\n",


@ -28,6 +28,9 @@ in any Python environment, but is optimized for use in Jupyter, providing a comp
environment which complements the UI-based Druid console. The primary use of `druidapi` at present environment which complements the UI-based Druid console. The primary use of `druidapi` at present
is to support the set of tutorial notebooks provided in the parent directory. is to support the set of tutorial notebooks provided in the parent directory.
`druidapi` works against any version of Druid. Operations that make use of newer features obviously work
only against versions of Druid that support those features.
## Install ## Install
At present, the best way to use `druidapi` is to clone the Druid repo itself: At present, the best way to use `druidapi` is to clone the Druid repo itself:
@ -36,21 +39,29 @@ At present, the best way to use `druidapi` is to clone the Druid repo itself:
git clone git@github.com:apache/druid.git git clone git@github.com:apache/druid.git
``` ```
`druidapi` is located in `examples/quickstart/jupyter-notebooks/druidapi/` `druidapi` is located in `examples/quickstart/jupyter-notebooks/druidapi/`.
From this directory, install the package and its dependencies with pip using the following command:
Eventually we would like to create a Python package that can be installed with `pip`. Contributions ```
in that area are welcome. pip install .
```
Dependencies are listed in `requirements.txt`. Note that there is a second-level `druidapi` directory that contains the modules. Do not run
the install command in that subdirectory.
`druidapi` works against any version of Druid. Operations that exploit newer features obviously work Verify your installation by checking that the following command runs in Python:
only against versions of Druid that support those features.
## Getting Started ```python
import druidapi
```
The import statement should not return anything if it runs successfully.
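The same check can be scripted so that it reports a missing package instead of raising an error. The following is a minimal sketch that uses only the standard library; the helper name `is_importable` is hypothetical and not part of `druidapi`:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    # find_spec returns None when no such top-level module is on the search path.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("druidapi"):
        print("druidapi is installed")
    else:
        print("druidapi not found; run `pip install .` from the druidapi directory")
```

This only confirms that Python can find the package. It does not connect to a Druid cluster.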
## Getting started
To use `druidapi`, first import the library, then connect to your cluster by providing the URL to your Router instance. The way that is done differs a bit between consumers. To use `druidapi`, first import the library, then connect to your cluster by providing the URL to your Router instance. The way that is done differs a bit between consumers.
### From a Tutorial Jupyter Notebook ### From a tutorial Jupyter notebook
The tutorial Jupyter notebooks in `examples/quickstart/jupyter-notebooks` reside in the same directory tree The tutorial Jupyter notebooks in `examples/quickstart/jupyter-notebooks` reside in the same directory tree
as this library. We start the library using the Jupyter-oriented API which is able to render tables in as this library. We start the library using the Jupyter-oriented API which is able to render tables in
@ -70,40 +81,17 @@ druid = druidapi.jupyter_client(router_endpoint)
The `jupyter_client` call defines a number of CSS styles to aid in displaying tabular results. It also The `jupyter_client` call defines a number of CSS styles to aid in displaying tabular results. It also
provides a "display" client that renders information as HTML tables. provides a "display" client that renders information as HTML tables.
### From Any Other Juypter Notebook ### From a Python script
If you create a Jupyter notebook outside of the `jupyter-notebooks` directory then you must tell Python where
to find the `druidapi` library. (This step is temporary until `druidapi` is properly packaged.)
First, set a variable to point to the location where you cloned the Druid git repo:
```python
druid_dev = '/path/to/Druid-repo'
```
Then, add the notebooks directory to Python's module search path:
```python
import sys
sys.path.append(druid_dev + '/examples/quickstart/jupyter-notebooks/')
```
Now you can import `druidapi` and create a client as shown in the previous section.
### From a Python Script
`druidapi` works in any Python script. When run outside of a Jupyter notebook, the various "display" `druidapi` works in any Python script. When run outside of a Jupyter notebook, the various "display"
commands revert to displaying a text (not HTML) format. The steps are similar to those above: commands revert to displaying a text (not HTML) format. The steps are similar to those above:
```python ```python
druid_dev = '/path/to/Druid-repo'
import sys
sys.path.append(druid_dev + '/examples/quickstart/jupyter-notebooks/')
import druidapi import druidapi
druid = druidapi.client(router_endpoint) druid = druidapi.client(router_endpoint)
``` ```
## Library Organization ## Library organization
`druidapi` organizes Druid REST operations into various "clients," each of which provides operations `druidapi` organizes Druid REST operations into various "clients," each of which provides operations
for one of Druid's functional areas. Obtain a client from the `druid` client created above. For for one of Druid's functional areas. Obtain a client from the `druid` client created above. For
@ -127,7 +115,7 @@ available as properties on the `druid` object created above.
* `display` - A set of convenience operations to display results as lightly formatted tables * `display` - A set of convenience operations to display results as lightly formatted tables
in either HTML (for Jupyter notebooks) or text (for other Python scripts). in either HTML (for Jupyter notebooks) or text (for other Python scripts).
## Assumed Cluster Architecture ## Assumed cluster architecture
`druidapi` assumes that you run a standard Druid cluster with a Router in front of the other nodes. `druidapi` assumes that you run a standard Druid cluster with a Router in front of the other nodes.
This design works well for most Druid clusters: This design works well for most Druid clusters:
@ -148,7 +136,7 @@ The one exception to this rule is if you want to perform a health check (i.e. th
on a service other than the Router. These checks are _not_ proxied by the Router: you must connect to on a service other than the Router. These checks are _not_ proxied by the Router: you must connect to
the target service directly. the target service directly.
## Status Operations ## Status operations
When working with tutorials, a local Druid cluster, or a Druid integration test cluster, it is common When working with tutorials, a local Druid cluster, or a Druid integration test cluster, it is common
to start your cluster then immediately start performing `druidapi` operations. However, because Druid to start your cluster then immediately start performing `druidapi` operations. However, because Druid
@ -183,7 +171,7 @@ extension is loaded:
status_client.properties['druid.extensions.loadList'] status_client.properties['druid.extensions.loadList']
``` ```
## Display Client ## Display client
When run in a Jupyter notebook, it is often handy to format results for display. A special display When run in a Jupyter notebook, it is often handy to format results for display. A special display
client performs operations _and_ formats them for display as HTML tables within the notebook. client performs operations _and_ formats them for display as HTML tables within the notebook.
@ -204,7 +192,7 @@ The most common methods are:
The display client also has other methods to format data as a table, to display various kinds The display client also has other methods to format data as a table, to display various kinds
of messages and so on. of messages and so on.
## Interactive Queries ## Interactive queries
The original [`pydruid`](https://pythonhosted.org/pydruid/) library revolves around Druid The original [`pydruid`](https://pythonhosted.org/pydruid/) library revolves around Druid
"native" queries. Most new applications now use SQL. `druidapi` provides two ways to run "native" queries. Most new applications now use SQL. `druidapi` provides two ways to run
@ -264,7 +252,7 @@ channel count
Within Jupyter, the results are formatted as an HTML table. Within Jupyter, the results are formatted as an HTML table.
### Advanced Queries ### Advanced queries
In addition to the SQL text, Druid also lets you specify: In addition to the SQL text, Druid also lets you specify:
@ -350,7 +338,7 @@ resp.show()
In fact, the display client `sql()` method uses the `resp.show()` method internally, which in turn uses the In fact, the display client `sql()` method uses the `resp.show()` method internally, which in turn uses the
`rows` and `schema` properties. `rows` and `schema` properties.
### Run a Query and Return Results ### Run a query and return results
The above forms are handy for interactive use in a notebook. If you just need to run a query to use the results The above forms are handy for interactive use in a notebook. If you just need to run a query to use the results
in code, just do the following: in code, just do the following:
@ -366,7 +354,7 @@ sql = 'SELECT * FROM {}'
rows = sql_client.sql(sql, ['myTable']) rows = sql_client.sql(sql, ['myTable'])
``` ```
## MSQ Queries ## MSQ queries
The SQL client can also run an MSQ query. See the `sql-tutorial.ipynb` notebook for examples. First define the The SQL client can also run an MSQ query. See the `sql-tutorial.ipynb` notebook for examples. First define the
query: query:
@ -408,7 +396,7 @@ while for Druid to load the resulting segments, so you must wait for the table t
sql_client.wait_until_ready('myTable') sql_client.wait_until_ready('myTable')
``` ```
## Datasource Operations ## Datasource operations
To get information about a datasource, prefer to query the `INFORMATION_SCHEMA` tables, or use the methods To get information about a datasource, prefer to query the `INFORMATION_SCHEMA` tables, or use the methods
in the display client. Use the datasource client for other operations. in the display client. Use the datasource client for other operations.
@ -425,7 +413,7 @@ datasources.drop('myWiki', True)
The True argument asks for "if exists" semantics so you don't get an error if the datasource does not exist. The True argument asks for "if exists" semantics so you don't get an error if the datasource does not exist.
## REST Client ## REST client
The `druidapi` is based on a simple REST client which is itself based on the Requests library. If you The `druidapi` is based on a simple REST client which is itself based on the Requests library. If you
need to use Druid REST APIs not yet wrapped by this library, you can use the REST client directly. need to use Druid REST APIs not yet wrapped by this library, you can use the REST client directly.
@ -495,3 +483,28 @@ Druid has a large number of special constants: type names, options, etc. The con
from druidapi import consts from druidapi import consts
help(consts) help(consts)
``` ```
## Contributing
We encourage you to contribute to the `druidapi` package.
To set up an editable installation for development, run the following command
from `examples/quickstart/jupyter-notebooks/druidapi/` in your local clone of the
`apache/druid` repo:
```
pip install -e .
```
An editable installation allows you to implement and test changes iteratively
without having to reinstall the package with every change.
When you update the package, also increment the version field in `setup.py` following the
[PEP 440 semantic versioning scheme](https://peps.python.org/pep-0440/#semantic-versioning).
Use the following guidelines for incrementing the version number:
* Increment the third position for a patch or bug fix.
* Increment the second position for new features, such as adding new method wrappers.
* Increment the first position for major changes and changes that are not backwards compatible.
Submit your contribution by opening a pull request to the `apache/druid` GitHub repository.
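
The version guidelines above can be sketched as a small helper. This is an illustration only: `bump_version` is a hypothetical function, not part of `druidapi`, and it assumes a plain three-position `major.minor.patch` string such as the `0.1.0` in `setup.py`. Resetting the lower positions to zero is the usual semantic versioning convention and is an assumption here.

```python
def bump_version(version: str, part: str) -> str:
    """Increment one position of a 'major.minor.patch' version string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        # Major or backwards-incompatible change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # New feature, such as a new method wrapper: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Patch or bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("0.1.0", "patch"))
```

Pre-release or local version suffixes allowed by PEP 440 are not handled by this sketch.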


@ -13,14 +13,14 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .druid import DruidClient from druidapi.druid import DruidClient
def jupyter_client(endpoint) -> DruidClient: def jupyter_client(endpoint) -> DruidClient:
''' '''
Create a Druid client configured to display results as HTML within a Jupyter notebook. Create a Druid client configured to display results as HTML within a Jupyter notebook.
Waits for the cluster to become ready to avoid intermittent problems when using Druid. Waits for the cluster to become ready to avoid intermittent problems when using Druid.
''' '''
from .html import HtmlDisplayClient from druidapi.html_display import HtmlDisplayClient
druid = DruidClient(endpoint, HtmlDisplayClient()) druid = DruidClient(endpoint, HtmlDisplayClient())
druid.status.wait_until_ready() druid.status.wait_until_ready()
return druid return druid
@ -33,3 +33,4 @@ def client(endpoint) -> DruidClient:
that the cluster has not yet fully started. that the cluster has not yet fully started.
''' '''
return DruidClient(endpoint) return DruidClient(endpoint)


@ -14,8 +14,8 @@
# limitations under the License. # limitations under the License.
import requests import requests
from .consts import COORD_BASE from druidapi.consts import COORD_BASE
from .rest import check_error from druidapi.rest import check_error
# Catalog (new feature in Druid 26) # Catalog (new feature in Druid 26)
CATALOG_BASE = COORD_BASE + '/catalog' CATALOG_BASE = COORD_BASE + '/catalog'


@ -14,9 +14,9 @@
# limitations under the License. # limitations under the License.
import requests, time import requests, time
from .consts import COORD_BASE from druidapi.consts import COORD_BASE
from .rest import check_error from druidapi.rest import check_error
from .util import dict_get from druidapi.util import dict_get
REQ_DATASOURCES = COORD_BASE + '/datasources' REQ_DATASOURCES = COORD_BASE + '/datasources'
REQ_DATASOURCE = REQ_DATASOURCES + '/{}' REQ_DATASOURCE = REQ_DATASOURCES + '/{}'


@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from . import consts from druidapi import consts
class DisplayClient: class DisplayClient:
''' '''


@ -13,12 +13,12 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .rest import DruidRestClient from druidapi.rest import DruidRestClient
from .status import StatusClient from druidapi.status import StatusClient
from .catalog import CatalogClient from druidapi.catalog import CatalogClient
from .sql import QueryClient from druidapi.sql import QueryClient
from .tasks import TaskClient from druidapi.tasks import TaskClient
from .datasource import DatasourceClient from druidapi.datasource import DatasourceClient
class DruidClient: class DruidClient:
''' '''
@ -36,7 +36,7 @@ class DruidClient:
if display_client: if display_client:
self.display_client = display_client self.display_client = display_client
else: else:
from .text import TextDisplayClient from druidapi.text_display import TextDisplayClient
self.display_client = TextDisplayClient() self.display_client = TextDisplayClient()
self.display_client._druid = self self.display_client._druid = self


@ -15,8 +15,8 @@
from IPython.display import display, HTML from IPython.display import display, HTML
from html import escape from html import escape
from .display import DisplayClient from druidapi.display import DisplayClient
from .base_table import BaseTable from druidapi.base_table import BaseTable
STYLES = ''' STYLES = '''
<style> <style>


@ -14,9 +14,9 @@
# limitations under the License. # limitations under the License.
import requests import requests
from .util import dict_get from druidapi.util import dict_get
from urllib.parse import quote from urllib.parse import quote
from .error import ClientError from druidapi.error import ClientError
def check_error(response): def check_error(response):
''' '''
@ -52,7 +52,7 @@ def check_error(response):
# We have an explanation from Druid. Raise a Client exception # We have an explanation from Druid. Raise a Client exception
raise ClientError(msg) raise ClientError(msg)
# Don't know what the Druid JSON is. Raise a Requetss exception, but # Don't know what the Druid JSON is. Raise a Requests exception, but
# add on the JSON in the hopes that the caller can make use of it. # add on the JSON in the hopes that the caller can make use of it.
try: try:
response.raise_for_status() response.raise_for_status()


@ -14,9 +14,9 @@
# limitations under the License. # limitations under the License.
import time, requests import time, requests
from . import consts from druidapi import consts
from .util import dict_get, split_table_name from druidapi.util import dict_get, split_table_name
from .error import DruidError, ClientError from druidapi.error import DruidError, ClientError
REQ_SQL = consts.ROUTER_BASE + '/sql' REQ_SQL = consts.ROUTER_BASE + '/sql'
REQ_SQL_TASK = REQ_SQL + '/task' REQ_SQL_TASK = REQ_SQL + '/task'


@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .consts import OVERLORD_BASE from druidapi.consts import OVERLORD_BASE
REQ_TASKS = OVERLORD_BASE + '/tasks' REQ_TASKS = OVERLORD_BASE + '/tasks'
REQ_POST_TASK = OVERLORD_BASE + '/task' REQ_POST_TASK = OVERLORD_BASE + '/task'


@ -13,8 +13,8 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .display import DisplayClient from druidapi.display import DisplayClient
from .base_table import pad, BaseTable from druidapi.base_table import pad, BaseTable
alignments = ['', '^', '>'] alignments = ['', '^', '>']


@ -13,7 +13,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from .error import ClientError from druidapi.error import ClientError
def dict_get(dict, key, default=None): def dict_get(dict, key, default=None):
''' '''


@ -0,0 +1,37 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from setuptools import setup, find_packages
setup(
name='druidapi',
version='0.1.0',
description='Python API client for Apache Druid',
url='https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/druidapi',
author='Apache Druid project',
author_email='dev@druid.apache.org',
license='Apache License 2.0',
packages=find_packages(),
install_requires=['requests'],
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Developers',
'Intended Audience :: End Users/Desktop',
'License :: OSI Approved :: Apache Software License',
'Operating System :: OS Independent',
'Programming Language :: Python :: 3',
],
)
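
Once the package is installed with pip, the metadata declared in `setup()` becomes queryable from Python. The sketch below uses the standard library `importlib.metadata` module (Python 3.8 or later); the helper name `installed_version` is hypothetical. For an install built from this `setup.py`, it should report the value of the `version` field.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution has not been installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version("druidapi"))
```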