2018-12-13 14:47:20 -05:00
---
2019-08-21 00:48:59 -04:00
id: index
2022-09-17 00:58:11 -04:00
title: "Quickstart (local)"
2018-12-13 14:47:20 -05:00
---
2018-11-13 12:38:37 -05:00
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
This quickstart gets you started with Apache Druid and introduces you to Druid ingestion and query features. For this tutorial, we recommend a machine with at least 6 GB of RAM.
2022-09-06 13:36:09 -04:00
In this quickstart, you'll do the following:
- install Druid
- start up Druid services
2022-09-17 00:58:11 -04:00
- use SQL to ingest and query data
2022-09-06 13:36:09 -04:00
2022-09-17 00:58:11 -04:00
Druid supports a variety of ingestion options. Once you're done with this tutorial, refer to the
[Ingestion ](../ingestion/index.md ) page to determine which ingestion method is right for you.
2018-08-09 16:37:52 -04:00
2020-04-30 15:07:28 -04:00
## Requirements
2018-08-09 16:37:52 -04:00
2022-09-17 00:58:11 -04:00
You can follow these steps on a relatively modest machine, such as a workstation or virtual server with 16 GiB of RAM.
2022-09-06 13:36:09 -04:00
2020-04-30 15:07:28 -04:00
The software requirements for the installation machine are:
2019-05-06 22:11:13 -04:00
2022-12-14 14:45:23 -05:00
* Linux, Mac OS X, or other Unix-like OS. (Windows is not supported)
* [Java 8u92+ or Java 11 ](../operations/java.md )
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
* [Python2 or Python3 ](../operations/python.md )
2023-02-13 16:34:49 -05:00
* Perl 5
2019-05-06 22:11:13 -04:00
2022-03-04 17:15:45 -05:00
> Druid relies on the environment variables `JAVA_HOME` or `DRUID_JAVA_HOME` to find Java on the machine. You can set
2020-04-30 15:07:28 -04:00
`DRUID_JAVA_HOME` if there is more than one instance of Java. To verify Java requirements for your environment, run the
`bin/verify-java` script.
2019-05-06 22:11:13 -04:00
2022-09-17 00:58:11 -04:00
Before installing a production Druid instance, be sure to review the [security
overview](../operations/security-overview.md). In general, avoid running Druid as root user. Consider creating a
dedicated user account for running Druid.
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
## Install Druid
2018-11-02 00:47:29 -04:00
2022-09-06 13:36:09 -04:00
Download the [{{DRUIDVERSION}} release ](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz ) from Apache Druid.
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
In your terminal, extract the file and change directories to the distribution directory:
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
```bash
tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
cd apache-druid-{{DRUIDVERSION}}
```
The distribution directory contains `LICENSE` and `NOTICE` files and subdirectories for executable files, configuration files, sample data and more.
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
## Start up Druid services
2018-08-09 16:37:52 -04:00
2023-01-12 00:12:52 -05:00
Start up Druid services using the automatic single-machine configuration.
2022-09-06 13:36:09 -04:00
This configuration includes default settings that are appropriate for this tutorial, such as loading the `druid-multi-stage-query` extension by default so that you can use the MSQ task engine.
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
You can view that setting and others in the configuration files in the `conf/druid/auto` .
2019-05-06 22:11:13 -04:00
2019-09-22 20:38:55 -04:00
From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
2018-08-09 16:37:52 -04:00
```bash
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
./bin/start-druid
2018-08-09 16:37:52 -04:00
```
2022-12-22 01:21:08 -05:00
This brings up instances of ZooKeeper and the Druid services and may use up to 80% of the total available system memory. To explicitly set the total memory available to Druid, pass a value for the memory parameter, e.g. `./bin/start-druid -m 16g` or `./bin/start-druid --memory 16g` .
2018-08-09 16:37:52 -04:00
2018-08-13 14:11:32 -04:00
```bash
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
$ ./bin/start-druid
[Tue Nov 29 16:31:06 2022] Starting Apache Druid.
[Tue Nov 29 16:31:06 2022] Open http://localhost:8888/ in your browser to access the web console.
[Tue Nov 29 16:31:06 2022] Or, if you have enabled TLS, use https on port 9088.
[Tue Nov 29 16:31:06 2022] Starting services with log directory [/apache-druid-{{DRUIDVERSION}}/log].
[Tue Nov 29 16:31:06 2022] Running command[zk]: bin/run-zk conf
[Tue Nov 29 16:31:06 2022] Running command[broker]: bin/run-druid broker /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1187m -Xmx1187m -XX:MaxDirectMemorySize=791m'
[Tue Nov 29 16:31:06 2022] Running command[router]: bin/run-druid router /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms128m -Xmx128m'
[Tue Nov 29 16:31:06 2022] Running command[coordinator-overlord]: bin/run-druid coordinator-overlord /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1290m -Xmx1290m'
[Tue Nov 29 16:31:06 2022] Running command[historical]: bin/run-druid historical /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms1376m -Xmx1376m -XX:MaxDirectMemorySize=2064m'
[Tue Nov 29 16:31:06 2022] Running command[middleManager]: bin/run-druid middleManager /apache-druid-{{DRUIDVERSION}}/conf/druid/single-server/quickstart '-Xms64m -Xmx64m' '-Ddruid.worker.capacity=2 -Ddruid.indexer.runner.javaOptsArray=["-server","-Duser.timezone=UTC","-Dfile.encoding=UTF-8","-XX:+ExitOnOutOfMemoryError","-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager","-Xms256m","-Xmx256m","-XX:MaxDirectMemorySize=256m"]'
2018-08-09 16:37:52 -04:00
```
2020-04-30 15:07:28 -04:00
All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under
2022-09-06 13:36:09 -04:00
the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv` .
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance.
2020-04-30 15:07:28 -04:00
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
To stop Druid at any time, use CTRL+C in the terminal. This exits the `bin/start-druid` script and terminates all Druid processes.
2018-08-09 16:37:52 -04:00
2022-09-17 00:58:11 -04:00
## Open the web console
2018-08-09 16:37:52 -04:00
2022-09-17 00:58:11 -04:00
After the Druid services finish startup, open the [web console ](../operations/web-console.md ) at [http://localhost:8888 ](http://localhost:8888 ).
2018-08-09 16:37:52 -04:00
2022-09-17 00:58:11 -04:00
![web console ](../assets/tutorial-quickstart-01.png "web console" )
2018-09-21 17:18:31 -04:00
2022-09-17 00:58:11 -04:00
It may take a few seconds for all Druid services to finish starting, including the [Druid router ](../design/router.md ), which serves the console. If you attempt to open the web console before startup is complete, you may see errors in the browser. Wait a few moments and try again.
2018-08-09 16:37:52 -04:00
2022-09-17 00:58:11 -04:00
In this quickstart, you use the the web console to perform ingestion. The MSQ task engine specifically uses the **Query** view to edit and run SQL queries.
For a complete walkthrough of the **Query** view as it relates to the multi-stage query architecture and the MSQ task engine, see [UI walkthrough ](../operations/web-console.md ).
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
## Load data
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
The Druid distribution bundles the `wikiticker-2015-09-12-sampled.json.gz` sample dataset that you can use for testing. The sample dataset is located in the `quickstart/tutorial/` folder, accessible from the Druid root directory, and represents Wikipedia page edits for a given day.
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
Follow these steps to load the sample Wikipedia dataset:
2018-08-09 16:37:52 -04:00
2022-09-06 13:36:09 -04:00
1. In the **Query** view, click **Connect external data** .
2. Select the **Local disk** tile and enter the following values:
2020-04-30 15:07:28 -04:00
- **Base directory**: `quickstart/tutorial/`
- **File filter**: `wikiticker-2015-09-12-sampled.json.gz`
2022-09-06 13:36:09 -04:00
![Data location ](../assets/tutorial-quickstart-02.png "Data location" )
2020-04-30 15:07:28 -04:00
Entering the base directory and [wildcard file filter ](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html ) separately, as afforded by the UI, allows you to specify multiple files for ingestion at once.
2022-09-06 13:36:09 -04:00
3. Click **Connect data** .
4. On the **Parse** page, you can examine the raw data and perform the following optional actions before loading data into Druid:
- Expand a row to see the corresponding source data.
- Customize how the data is handled by selecting from the **Input format** options.
- Adjust the primary timestamp column for the data.
2020-04-30 15:07:28 -04:00
Druid requires data to have a primary timestamp column (internally stored in a column called `__time` ).
2022-09-06 13:36:09 -04:00
If your dataset doesn't have a timestamp, Druid uses the default value of `1970-01-01 00:00:00` .
![Data sample ](../assets/tutorial-quickstart-03.png "Data sample" )
5. Click **Done** . You're returned to the **Query** view that displays the newly generated query.
The query inserts the sample data into the table named `wikiticker-2015-09-12-sampled` .
< details > < summary > Show the query< / summary >
```sql
REPLACE INTO "wikiticker-2015-09-12-sampled" OVERWRITE ALL
WITH input_data AS (SELECT *
FROM TABLE(
EXTERN(
'{"type":"local","baseDir":"quickstart/tutorial/","filter":"wikiticker-2015-09-12-sampled.json.gz"}',
'{"type":"json"}',
'[{"name":"time","type":"string"},{"name":"channel","type":"string"},{"name":"cityName","type":"string"},{"name":"comment","type":"string"},{"name":"countryIsoCode","type":"string"},{"name":"countryName","type":"string"},{"name":"isAnonymous","type":"string"},{"name":"isMinor","type":"string"},{"name":"isNew","type":"string"},{"name":"isRobot","type":"string"},{"name":"isUnpatrolled","type":"string"},{"name":"metroCode","type":"long"},{"name":"namespace","type":"string"},{"name":"page","type":"string"},{"name":"regionIsoCode","type":"string"},{"name":"regionName","type":"string"},{"name":"user","type":"string"},{"name":"delta","type":"long"},{"name":"added","type":"long"},{"name":"deleted","type":"long"}]'
)
))
SELECT
TIME_PARSE("time") AS __time,
channel,
cityName,
comment,
countryIsoCode,
countryName,
isAnonymous,
isMinor,
isNew,
isRobot,
isUnpatrolled,
metroCode,
namespace,
page,
regionIsoCode,
regionName,
user,
delta,
added,
deleted
FROM input_data
PARTITIONED BY DAY
```
< / details >
2020-04-30 15:07:28 -04:00
2022-09-20 20:44:21 -04:00
6. Optionally, click **Preview** to see the general shape of the data before you ingest it.
7. Edit the first line of the query and change the default destination datasource name from `wikiticker-2015-09-12-sampled` to `wikipedia` .
8. Click **Run** to execute the query. The task may take a minute or two to complete. When done, the task displays its duration and the number of rows inserted into the table. The view is set to automatically refresh, so you don't need to refresh the browser to see the status change.
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
![Run query ](../assets/tutorial-quickstart-04.png "Run query" )
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
A successful task means that Druid data servers have picked up one or more segments.
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
## Query data
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
Once the ingestion job is complete, you can query the data.
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
In the **Query** view, run the following query to produce a list of top channels:
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
```sql
SELECT
channel,
COUNT(*)
2022-09-20 20:44:21 -04:00
FROM "wikipedia"
2022-09-06 13:36:09 -04:00
GROUP BY channel
ORDER BY COUNT(*) DESC
```
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
![Query view ](../assets/tutorial-quickstart-05.png "Query view" )
2020-04-30 15:07:28 -04:00
2022-09-06 13:36:09 -04:00
Congratulations! You've gone from downloading Druid to querying data with the MSQ task engine in just one quickstart.
2020-04-30 15:07:28 -04:00
## Next steps
2022-09-06 13:36:09 -04:00
See the following topics for more information:
2018-08-09 16:37:52 -04:00
2022-09-20 20:44:21 -04:00
* [Druid SQL overview ](../querying/sql.md ) or the [Query tutorial ](./tutorial-query.md ) to learn about how to query the data you just ingested.
2022-09-17 00:58:11 -04:00
* [Ingestion overview ](../ingestion/index.md ) to explore options for ingesting more data.
* [Tutorial: Load files using SQL ](./tutorial-msq-extern.md ) to learn how to generate a SQL query that loads external data into a Druid datasource.
* [Tutorial: Load data with native batch ingestion ](tutorial-batch-native.md ) to load and query data with Druid's native batch ingestion feature.
* [Tutorial: Load stream data from Apache Kafka ](./tutorial-kafka.md ) to load streaming data from a Kafka topic.
2022-09-06 13:36:09 -04:00
* [Extensions ](../development/extensions.md ) for details on Druid extensions.
2022-09-17 00:58:11 -04:00
Druid automated quickstart (#13365)
* Druid automated quickstart
* remove conf/druid/single-server/quickstart/_common/historical/jvm.config
* Minor changes in python script
* Add lower bound memory for some services
* Additional runtime properties for services
* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py
* File end newline
* Limit the ability to start multiple instances of a service, documentation changes
* simplify script arguments
* restore changes in medium profile
* run-druid refactor
* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging
* Remove extra quotes from mm task javaopts array
* Update logic to compute minimum memory
* simplify run-druid
* remove debug options from run-druid
* resolve the config_path provided
* comment out service specific runtime properties which are computed in the code
* simplify run-druid
* clean up docs, naming changes
* Throw ValueError exception on illegal state
* update docs
* rename args, compute_only -> compute, run_zk -> zk
* update help documentation
* update help documentation
* move task memory computation into separate method
* Add validation checks
* remove print
* Add validations
* remove start-druid bash script, rename start-druid-main
* Include tasks in lower bound memory calculation
* Fix test
* 256m instead of 256g
* caffeine cache uses 5% of heap
* ensure min task count is 2, task count is monotonic
* update configs and documentation for runtime props in conf/druid/single-server/quickstart
* Update docs
* Specify memory argument for each profile in single-server.md
* Update middleManager runtime.properties
* Move quickstart configs to conf/druid/base, add bash launch script, support python2
* Update supervise script
* rename base config directory to auto
* rename python script, changes to pass repeated args to supervise
* remove exmaples/conf/druid/base dir
* add docs
* restore changes in conf dir
* update start-druid-auto
* remove hashref for commands in supervise script
* start-druid-main java_opts array is comma separated
* update entry point script name in python script
* Update help docs
* documentation changes
* docs changes
* update docs
* add support for running indexer
* update supported services list
* update help
* Update python.md
* remove dir
* update .spelling
* Remove dependency on psutil and pathlib
* update docs
* Update get_physical_memory method
* Update help docs
* update docs
* update method to get physical memory on python
* udpate spelling
* update .spelling
* minor change
* Minor change
* memory comptuation for indexer
* update start-druid
* Update python.md
* Update single-server.md
* Update python.md
* run python3 --version to check if python is installed
* Update supervise script
* start-druid: echo message if python not found
* update anchor text
* minor change
* Update condition in supervise script
* JVM not jvm in docs
2022-12-09 14:04:02 -05:00
Remember that after stopping Druid services, you can start clean next time by deleting the `var` directory from the Druid root directory and running the `bin/start-druid` script again. You may want to do this before using other data ingestion tutorials, since they use the same Wikipedia datasource.