druid/examples/quickstart/jupyter-notebooks
Katya Macedo 1595653e6f
docs: add a link for the Druid SQL tutorial (#13468)
2023-02-22 09:36:13 -08:00
README.md docs: add a link for the Druid SQL tutorial (#13468) 2023-02-22 09:36:13 -08:00
api-tutorial.ipynb docs: notebook only for API tutorial (#13345) 2022-12-15 13:16:07 -08:00
sql-tutorial.ipynb docs: notebook only for SQL tutorial (#13465) 2023-02-08 20:04:53 -08:00

README.md

Jupyter Notebook tutorials for Druid

You can try out the Druid APIs using the Jupyter Notebook-based tutorials. Each tutorial provides snippets of Python code that you run to make calls against the Druid API as you work through it.

Prerequisites

Make sure you meet the following requirements before starting the Jupyter-based tutorials:

  • Python 3

  • The requests package for Python. For example, you can install it with the following command:

    pip3 install requests
    
  • JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid and Jupyter both try to use port 8888, so start Jupyter on a different port.

    • Install JupyterLab or Notebook:

      # Install JupyterLab
      pip3 install jupyterlab  
      # Install Jupyter Notebook
      pip3 install notebook
      
    • Start Jupyter:

      • JupyterLab
        # Start JupyterLab on port 3001
        jupyter lab --port 3001
        
      • Jupyter Notebook
        # Start Jupyter Notebook on port 3001
        jupyter notebook --port 3001
        
  • An available Druid instance. You can use the micro-quickstart configuration described in Quickstart (local). The tutorials assume that you are using the quickstart, so no authentication or authorization is expected unless explicitly mentioned.
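Before opening a notebook, it can help to confirm the prerequisites above from Python. The following is a minimal sketch, not part of the tutorials: it assumes the quickstart serves Druid on `http://localhost:8888`, that Jupyter was started on port 3001 as shown above, and that the `/status/health` endpoint is reachable without authentication. The function names are illustrative only.

```python
import importlib.util
import socket
import sys


def has_package(name):
    """Return True if a top-level package such as 'requests' is importable."""
    return importlib.util.find_spec(name) is not None


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False


def druid_is_healthy(base_url="http://localhost:8888"):
    """Return True if the Druid process at base_url answers its health check.

    Assumption: the quickstart default address with no authentication.
    """
    import requests  # imported here so the other checks run without it

    try:
        response = requests.get(base_url + "/status/health", timeout=5)
    except requests.RequestException:
        return False
    return response.ok


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major >= 3)
    print("requests installed:", has_package("requests"))
    print("port 8888 in use (expected once Druid runs):", port_in_use(8888))
    print("port 3001 in use (expected once Jupyter runs):", port_in_use(3001))
    if has_package("requests"):
        print("Druid healthy:", druid_is_healthy())
```

If port 8888 is already in use before you start Druid, another process (possibly Jupyter on its default port) has claimed it.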

Tutorials

The notebooks are located in the apache/druid repo. You can either clone the repo or download the notebooks you want individually.

The links that follow are the raw GitHub URLs, so you can use them to download a notebook directly, for example with wget, or manually through your web browser. If you save the file from your web browser, remove the .txt extension that the browser may append to the file name.
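As an alternative to wget, a notebook can be fetched with the Python standard library. This is a rough sketch under stated assumptions: `download_notebook`, `notebook_filename`, and `strip_txt_extension` are hypothetical helper names, and the URL you pass in must be the raw GitHub URL of the notebook you want.

```python
import posixpath
import urllib.parse
import urllib.request
from pathlib import Path


def notebook_filename(url):
    """Return the last path component of a URL, for example 'api-tutorial.ipynb'."""
    return posixpath.basename(urllib.parse.urlparse(url).path)


def download_notebook(url, dest_dir="."):
    """Download the notebook at url into dest_dir and return the local path."""
    target = Path(dest_dir) / notebook_filename(url)
    urllib.request.urlretrieve(url, str(target))
    return target


def strip_txt_extension(path):
    """Rename 'name.ipynb.txt' to 'name.ipynb'; leave any other file alone."""
    path = Path(path)
    if path.name.endswith(".ipynb.txt"):
        fixed = path.with_suffix("")  # drops only the trailing '.txt'
        path.rename(fixed)
        return fixed
    return path
```

For a file saved from the browser, calling `strip_txt_extension("api-tutorial.ipynb.txt")` renames it so that Jupyter recognizes it as a notebook.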

Contributing

If you build a Jupyter tutorial, you need to do a few things to add it to the docs in addition to saving the notebook in this directory. The process requires two PRs to the repo.

For the first PR, do the following:

  1. Clear the outputs from your notebook before you make the PR. You can use the following command:

    jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb
    
  2. Create the PR as you normally would. State in the PR description that it contains only the Jupyter notebook and that a subsequent PR will update the related pages.

  3. After this first PR is merged, grab the "raw" URL for the file from GitHub. For example, navigate to the file in the GitHub web UI and select Raw. Use this URL as the download link in the second PR.
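To double-check step 1 before opening the PR, you could scan the notebook file for leftover outputs. The sketch below assumes the standard nbformat 4 JSON layout, where each code cell has an `outputs` list and an `execution_count` field. The helper names are hypothetical and this is not a replacement for the nbconvert command.

```python
import json


def load_notebook(path):
    """Read a .ipynb file, which is plain JSON, into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cells_with_outputs(notebook):
    """Return the indexes of code cells that still have outputs or an execution count."""
    dirty = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no outputs
        if cell.get("outputs") or cell.get("execution_count") is not None:
            dirty.append(index)
    return dirty
```

An empty list from `cells_with_outputs(load_notebook("./path/to/notebook/notebookName.ipynb"))` suggests the outputs were cleared.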

For the second PR, do the following:

  1. Update the list of Tutorials on this page and in the Jupyter tutorial index page in the docs/tutorials directory.
  2. Update tutorial-jupyter-index.md and provide the URL to the raw version of the file that becomes available after the first PR is merged.