Druid Quickstart refactor and update (#9766)

* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* Review comments

* fixing links

* spell checking gymnastics
Author: sthetland
Date: 2020-04-30 12:07:28 -07:00 (committed by GitHub)
Parent: 39722bd064
Commit: c61365c1e0
28 changed files with 2838 additions and 2575 deletions

18 binary image files changed (4 added, 14 updated); image contents not shown. Before/after file sizes range from 800 B to 1.1 MiB.

View file: data-formats.md

@@ -174,7 +174,7 @@ The `inputFormat` to load data of ORC format. An example is:
 "expr": "$.path.to.nested"
 }
 ]
-}
+},
 "binaryAsString": false
 },
 ...
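For context, the corrected ORC `inputFormat` block reads roughly as follows once the comma is in place. This is a sketch assembled from the quoted diff lines; the field names not shown in the hunk (`type`, `flattenSpec`, `useFieldDiscovery`, `name`) follow the usual shape of a Druid `flattenSpec` and are assumptions here, so the page's actual example may differ:

```json
{
  "type": "orc",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      {
        "type": "path",
        "name": "nested",
        "expr": "$.path.to.nested"
      }
    ]
  },
  "binaryAsString": false
}
```

Without the comma after the closing brace of `flattenSpec`, the object is invalid JSON, which is what the commit message's note from Suneet refers to.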

View file: Docker tutorial

@@ -77,7 +77,7 @@ The [Druid router process](../design/router.md), which serves the [Druid console
 It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.

-From here you can follow along with the [standard tutorials](./index.md#loading-data), or elaborate on your `docker-compose.yml` to add any additional external service dependencies as necessary.
+From here you can follow along with the [Quickstart](./index.md#step-4-load-data), or elaborate on your `docker-compose.yml` to add any additional external service dependencies as necessary.

 ## Docker Memory Requirements

 If you experience any processes crashing with a 137 error code you likely don't have enough memory allocated to Docker. 6 GB may be a good place to start.

View file: Quickstart (index.md)

@@ -23,63 +23,55 @@ title: "Quickstart"
 -->

-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features.
+Following these steps, you will install Druid and load sample
+data using its native batch ingestion feature.

-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.

-## Prerequisites
-
-### Software
-
-You will need:
+## Requirements
+
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM.
+
+Druid comes with several startup configuration profiles for a range of machine sizes.
+The `micro-quickstart` configuration profile shown here is suitable for evaluating Druid. If you want to
+try out Druid's performance or scaling capabilities, you'll need a larger machine and configuration profile.
+
+The configuration profiles included with Druid range from the even smaller _Nano-Quickstart_ configuration (1 CPU, 4GB RAM)
+to the _X-Large_ configuration (64 CPU, 512GB RAM). For more information, see
+[Single server deployment](../operations/single-server.md). Alternatively, see [Clustered deployment](./cluster.md) for
+information on deploying Druid services across clustered machines.
+
+The software requirements for the installation machine are:

-* **Java 8 (8u92+) or later**
 * Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
+* Java 8, Update 92 or later (8u92+)

-> **Warning:** Druid only officially supports Java 8. Any Java version later than 8 is still experimental.
->
-> If needed, you can specify where to find Java using the environment variables `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the verify-java script.
+> Druid officially supports Java 8 only. Support for later major versions of Java is currently in experimental status.
+
+> Druid relies on the environment variables `JAVA_HOME` or `DRUID_JAVA_HOME` to find Java on the machine. You can set
+`DRUID_JAVA_HOME` if there is more than one instance of Java. To verify Java requirements for your environment, run the
+`bin/verify-java` script.

-### Hardware
-
-Druid includes several example [single-server configurations](../operations/single-server.md), along with scripts to
-start the Druid processes using these configurations.
-
-If you're running on a small machine such as a laptop for a quick evaluation, the `micro-quickstart` configuration is
-a good choice, sized for a 4CPU/16GB RAM environment.
-
-If you plan to use the single-machine deployment for further evaluation beyond the tutorials, we recommend a larger
-configuration than `micro-quickstart`.

-## Getting started
-
-[Download](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz)
-the {{DRUIDVERSION}} release.
-
-Extract Druid by running the following commands in your terminal:
-
-```bash
-tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
-cd apache-druid-{{DRUIDVERSION}}
-```
-
-In the package, you should find:
-
-* `LICENSE` and `NOTICE` files
-* `bin/*` - scripts useful for this quickstart
-* `conf/*` - example configurations for single-server and clustered setup
-* `extensions/*` - core Druid extensions
-* `hadoop-dependencies/*` - Druid Hadoop dependencies
-* `lib/*` - libraries and dependencies for core Druid
-* `quickstart/*` - configuration files, sample data, and other files for the quickstart tutorials
+## Step 1. Install Druid
+
+After confirming the [requirements](#requirements), follow these steps:
+
+1. Download
+the [{{DRUIDVERSION}} release](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz).
+2. In your terminal, extract Druid and change directories to the distribution directory:
+
+```bash
+tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
+cd apache-druid-{{DRUIDVERSION}}
+```
+
+In the directory, you'll find `LICENSE` and `NOTICE` files and subdirectories for executable files, configuration files, sample data and more.

-## Start up Druid services
-
-The following commands will assume that you are using the `micro-quickstart` single-machine configuration. If you are
-using a different configuration, the `bin` directory has equivalent scripts for each configuration, such as
-`bin/start-single-server-small`.
+## Step 2. Start up Druid services
+
+Start up Druid services using the `micro-quickstart` single-machine configuration.

 From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
@@ -87,7 +79,7 @@ From the apache-druid-{{DRUIDVERSION}} package root, run the following command:
 ./bin/start-micro-quickstart
 ```

-This will bring up instances of ZooKeeper and the Druid services, all running on the local machine, e.g.:
+This brings up instances of ZooKeeper and the Druid services:

 ```bash
 $ ./bin/start-micro-quickstart
@@ -99,96 +91,174 @@ $ ./bin/start-micro-quickstart
 [Fri May 3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```

-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.

-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance.
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and
+terminates all Druid processes.

-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+## Step 3. Open the Druid console
+
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888).

 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")

-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again.

-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-* added
-* channel
-* cityName
-* comment
-* countryIsoCode
-* countryName
-* deleted
-* delta
-* isAnonymous
-* isMinor
-* isNew
-* isRobot
-* isUnpatrolled
-* metroCode
-* namespace
-* page
-* regionIsoCode
-* regionName
-* user
-
-```json
-{
-"timestamp":"2015-09-12T20:03:45.018Z",
-"channel":"#en.wikipedia",
-"namespace":"Main",
-"page":"Spider-Man's powers and equipment",
-"user":"foobar",
-"comment":"/* Artificial web-shooters */",
-"cityName":"New York",
-"regionName":"New York",
-"regionIsoCode":"NY",
-"countryName":"United States",
-"countryIsoCode":"US",
-"isAnonymous":false,
-"isNew":false,
-"isMinor":false,
-"isRobot":false,
-"isUnpatrolled":false,
-"added":99,
-"delta":99,
-"deleted":0,
-}
-```
-
-### Data loading tutorials
-
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
-
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
-
-### Resetting cluster state
-
-If you want a clean start after stopping the services, delete the `var` directory and run the `bin/start-micro-quickstart` script again.
-
-Once every service has started, you are now ready to load data.
-
-#### Resetting Kafka
+## Step 4. Load data
+
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_,
+as we'll do here to perform batch file loading with Druid's native batch ingestion.
+
+The Druid distribution bundles sample data we can use. The sample data located in `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz`
+in the Druid root directory represents Wikipedia page edits for a given day.
+
+1. Click **Load data** from the Druid console header (![Load data](../assets/tutorial-batch-data-loader-00.png)).
+
+2. Select the **Local disk** tile and then click **Connect data**.
+
+![Data loader init](../assets/tutorial-batch-data-loader-01.png "Data loader init")
+
+3. Enter the following values:
+
+- **Base directory**: `quickstart/tutorial/`
+- **File filter**: `wikiticker-2015-09-12-sampled.json.gz`
+
+![Data location](../assets/tutorial-batch-data-loader-015.png "Data location")
+
+Entering the base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) separately, as afforded by the UI, allows you to specify multiple files for ingestion at once.
+
+4. Click **Apply**.
+
+The data loader displays the raw data, giving you a chance to verify that the data
+appears as expected.
+
+![Data loader sample](../assets/tutorial-batch-data-loader-02.png "Data loader sample")
+
+Notice that your position in the sequence of steps to load data, **Connect** in our case, appears at the top of the console, as shown below.
+You can click other steps to move forward or backward in the sequence at any time.
+
+![Load data](../assets/tutorial-batch-data-loader-12.png)
+
+5. Click **Next: Parse data**.
+
+The data loader tries to determine the parser appropriate for the data format automatically. In this case
+it identifies the data format as `json`, as shown in the **Input format** field at the bottom right.
+
+![Data loader parse data](../assets/tutorial-batch-data-loader-03.png "Data loader parse data")
+
+Feel free to select other **Input format** options to get a sense of their configuration settings
+and how Druid parses other types of data.
+
+6. With the JSON parser selected, click **Next: Parse time**. The **Parse time** settings are where you view and adjust the
+primary timestamp column for the data.
+
+![Data loader parse time](../assets/tutorial-batch-data-loader-04.png "Data loader parse time")
+
+Druid requires data to have a primary timestamp column (internally stored in a column called `__time`).
+If you do not have a timestamp in your data, select `Constant value`. In our example, the data loader
+determines that the `time` column is the only candidate that can be used as the primary time column.
+
+7. Click **Next: Transform**, **Next: Filter**, and then **Next: Configure schema**, skipping a few steps.
+You do not need to adjust transformation or filtering settings, as applying ingestion time transforms and
+filters are out of scope for this tutorial.
+
+8. The Configure schema settings are where you configure which [dimensions](../ingestion/index.md#dimensions)
+and [metrics](../ingestion/index.md#metrics) are ingested. The outcome of this configuration represents exactly how the
+data will appear in Druid after ingestion.
+
+Since our dataset is very small, you can turn off [rollup](../ingestion/index.md#rollup)
+by unsetting the **Rollup** switch and confirming the change when prompted.
+
+![Data loader schema](../assets/tutorial-batch-data-loader-05.png "Data loader schema")
+
+9. Click **Next: Partition** to configure how the data will be split into segments. In this case, choose `DAY` as
+the **Segment granularity**.
+
+![Data loader partition](../assets/tutorial-batch-data-loader-06.png "Data loader partition")
+
+Since this is a small dataset, we can have just a single segment, which is what selecting `DAY` as the
+segment granularity gives us.
+
+10. Click **Next: Tune** and **Next: Publish**.
+
+11. The Publish settings are where you specify the datasource name in Druid. Let's change the default name from
+`wikiticker-2015-09-12-sampled` to `wikipedia`.
+
+![Data loader publish](../assets/tutorial-batch-data-loader-07.png "Data loader publish")
+
+12. Click **Next: Edit spec** to review the ingestion spec we've constructed with the data loader.
+
+![Data loader spec](../assets/tutorial-batch-data-loader-08.png "Data loader spec")
+
+Feel free to go back and change settings from previous steps to see how doing so updates the spec.
+Similarly, you can edit the spec directly and see it reflected in the previous steps.
+
+> For other ways to load ingestion specs in Druid, see [Tutorial: Loading a file](./tutorial-batch.md).
+
+13. Once you are satisfied with the spec, click **Submit**.
+
+The new task for our wikipedia datasource now appears in the Ingestion view.
+
+![Tasks view](../assets/tutorial-batch-data-loader-09.png "Tasks view")
+
+The task may take a minute or two to complete. When done, the task status should be "SUCCESS", with
+the duration of the task indicated. Note that the view is set to automatically
+refresh, so you do not need to refresh the browser to see the status change.
+
+A successful task means that one or more segments have been built and are now picked up by our data servers.
+
+## Step 5. Query the data
+
+You can now see the data as a datasource in the console and try out a query, as follows:
+
+1. Click **Datasources** from the console header.
+
+If the wikipedia datasource doesn't appear, wait a few moments for the segment to finish loading. A datasource is
+queryable once it is shown to be "Fully available" in the **Availability** column.
+
+2. When the datasource is available, open the Actions menu (![Actions](../assets/datasources-action-button.png)) for that
+datasource and choose **Query with SQL**.
+
+![Datasource view](../assets/tutorial-batch-data-loader-10.png "Datasource view")
+
+> Notice the other actions you can perform for a datasource, including configuring retention rules, compaction, and more.
+
+3. Run the prepopulated query, `SELECT * FROM "wikipedia"`, to see the results.
+
+![Query view](../assets/tutorial-batch-data-loader-11.png "Query view")
+
+Congratulations! You've gone from downloading Druid to querying data in just one quickstart. See the following
+section for what to do next.
+
+## Next steps
+
+After finishing the quickstart, check out the [query tutorial](../tutorials/tutorial-query.md) to further explore
+query features in the Druid console.
+
+Alternatively, learn about other ways to ingest data in one of these tutorials:
+
+- [Loading stream data from Apache Kafka](./tutorial-kafka.md): How to load streaming data from a Kafka topic.
+- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md): How to perform a batch file load, using a remote Hadoop cluster.
+- [Writing your own ingestion spec](./tutorial-ingestion-spec.md): How to write a new ingestion spec and use it to load data.
+
+Remember that after stopping Druid services, you can start clean next time by deleting the `var` directory from the Druid root directory and
+running the `bin/start-micro-quickstart` script again. You will likely want to do this before taking other data ingestion tutorials,
+since in them you will create the same wikipedia datasource.

 If you completed [Tutorial: Loading stream data from Kafka](./tutorial-kafka.md) and wish to reset the cluster state, you should additionally clear out any Kafka state.

 Shut down the Kafka broker with CTRL-C before stopping ZooKeeper and the Druid services, and then delete the Kafka log directory at `/tmp/kafka-logs`:

 ```bash
 rm -rf /tmp/kafka-logs
 ```
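The data loader steps above ultimately produce a native batch ingestion spec. The following is a minimal sketch of what that spec might look like for this walkthrough; the exact spec the console generates (field order, the full dimension list, and tuning values) may differ, and the `index_parallel` task type, `dynamic` partitioning, and `iso` timestamp format shown here are assumptions rather than values taken from the diff:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "quickstart/tutorial/",
        "filter": "wikiticker-2015-09-12-sampled.json.gz"
      },
      "inputFormat": { "type": "json" }
    },
    "dataSchema": {
      "dataSource": "wikipedia",
      "timestampSpec": { "column": "time", "format": "iso" },
      "dimensionsSpec": {
        "dimensions": [
          "channel", "cityName", "comment", "countryIsoCode", "countryName",
          "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled",
          "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user",
          { "name": "added", "type": "long" },
          { "name": "deleted", "type": "long" },
          { "name": "delta", "type": "long" }
        ]
      },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": { "type": "dynamic" }
    }
  }
}
```

Turning off **Rollup** in the schema step corresponds to `"rollup": false`, choosing `DAY` segment granularity corresponds to `"segmentGranularity": "day"`, and the name entered in the Publish step becomes `"dataSource": "wikipedia"`.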

View file: batch ingestion tutorial (sidebar label "Loading files natively")

@@ -24,107 +24,15 @@ sidebar_label: "Loading files natively"
 -->

-This tutorial demonstrates how to perform a batch file load, using Apache Druid's native batch ingestion.
+This tutorial demonstrates how to load data into Apache Druid from a file using Apache Druid's native batch ingestion feature.

-For this tutorial, we'll assume you've already downloaded Druid as described in
-the [quickstart](index.html) using the `micro-quickstart` single-machine configuration and have it
-running on your local machine. You don't need to have loaded any data yet.
-
-A data load is initiated by submitting an *ingestion task* spec to the Druid Overlord.
-
-An ingestion spec can be written by hand or by using the "Data loader" that is built into the Druid console.
-The data loader can help you build an ingestion spec by sampling your data and and iteratively configuring various ingestion parameters.
-The data loader currently only supports native batch ingestion (support for streaming, including data stored in Apache Kafka and AWS Kinesis, is coming in future releases).
-Streaming ingestion is only available through a written ingestion spec today.
-
-We've included a sample of Wikipedia edits from September 12, 2015 to get you started.
-
-## Loading data with the data loader
-
-Navigate to [localhost:8888](http://localhost:8888) and click `Load data` in the console header.
-
-![Data loader init](../assets/tutorial-batch-data-loader-01.png "Data loader init")
-
-Select `Local disk` and click `Connect data`.
-
-![Data loader sample](../assets/tutorial-batch-data-loader-02.png "Data loader sample")
-
-Enter `quickstart/tutorial/` as the base directory and `wikiticker-2015-09-12-sampled.json.gz` as a filter.
-The separation of base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) is there if you need to ingest data from multiple files.
-
-Click `Apply` and make sure that the data you are seeing is correct.
-
-Once the data is located, you can click "Next: Parse data" to go to the next step.
-
-![Data loader parse data](../assets/tutorial-batch-data-loader-03.png "Data loader parse data")
-
-The data loader will try to automatically determine the correct parser for the data.
-In this case it will successfully determine `json`.
-Feel free to play around with different parser options to get a preview of how Druid will parse your data.
-
-With the `json` parser selected, click `Next: Parse time` to get to the step centered around determining your primary timestamp column.
-
-![Data loader parse time](../assets/tutorial-batch-data-loader-04.png "Data loader parse time")
-
-Druid's architecture requires a primary timestamp column (internally stored in a column called `__time`).
-If you do not have a timestamp in your data, select `Constant value`.
-In our example, the data loader will determine that the `time` column in our raw data is the only candidate that can be used as the primary time column.
-
-Click `Next: ...` twice to go past the `Transform` and `Filter` steps.
-You do not need to enter anything in these steps as applying ingestion time transforms and filters are out of scope for this tutorial.
-
-![Data loader schema](../assets/tutorial-batch-data-loader-05.png "Data loader schema")
-
-In the `Configure schema` step, you can configure which [dimensions](../ingestion/index.md#dimensions) and [metrics](../ingestion/index.md#metrics) will be ingested into Druid.
-This is exactly what the data will appear like in Druid once it is ingested.
-Since our dataset is very small, go ahead and turn off [`Rollup`](../ingestion/index.md#rollup) by clicking on the switch and confirming the change.
-
-Once you are satisfied with the schema, click `Next` to go to the `Partition` step where you can fine tune how the data will be partitioned into segments.
-
-![Data loader partition](../assets/tutorial-batch-data-loader-06.png "Data loader partition")
-
-Here, you can adjust how the data will be split up into segments in Druid.
-Since this is a small dataset, there are no adjustments that need to be made in this step.
-
-Clicking past the `Tune` step, to get to the publish step.
-
-![Data loader publish](../assets/tutorial-batch-data-loader-07.png "Data loader publish")
-
-The `Publish` step is where we can specify what the datasource name in Druid.
-Let's name this datasource `wikipedia`.
-
-Finally, click `Next` to review your spec.
-
-![Data loader spec](../assets/tutorial-batch-data-loader-08.png "Data loader spec")
-
-This is the spec you have constructed.
-Feel free to go back and make changes in previous steps to see how changes will update the spec.
-Similarly, you can also edit the spec directly and see it reflected in the previous steps.
-
-Once you are satisfied with the spec, click `Submit` and an ingestion task will be created.
-
-![Tasks view](../assets/tutorial-batch-data-loader-09.png "Tasks view")
-
-You will be taken to the task view with the focus on the newly created task.
-The task view is set to auto refresh, wait until your task succeeds.
-
-When a tasks succeeds it means that it built one or more segments that will now be picked up by the data servers.
-
-Navigate to the `Datasources` view from the header.
-
-![Datasource view](../assets/tutorial-batch-data-loader-10.png "Datasource view")
-
-Wait until your datasource (`wikipedia`) appears.
-This can take a few seconds as the segments are being loaded.
-
-A datasource is queryable once you see a green (fully available) circle.
-
-At this point, you can go to the `Query` view to run SQL queries against the datasource.
-
-![Query view](../assets/tutorial-batch-data-loader-11.png "Query view")
-
-Run a `SELECT * FROM "wikipedia"` query to see your results.
-
-Check out the [query tutorial](../tutorials/tutorial-query.md) to run some example queries on the newly loaded data.
+You initiate data loading in Druid by submitting an *ingestion task* spec to the Druid Overlord. You can write ingestion
+specs by hand or using the _data loader_ built into the Druid console.
+
+For this tutorial, we'll be loading the sample Wikipedia page edits data. The [Quickstart](./index.md) shows you how to use the data loader to build an ingestion spec. For production environments, it's
+likely that you'll want to automate data ingestion. This tutorial starts by showing you how to submit an ingestion spec
+directly in the Druid console, and then introduces ways to ingest batch data that lend themselves to
+automation—from the command line and from a script.

 ## Loading data with a spec (via console)

@@ -195,17 +103,17 @@ which has been configured to read the `quickstart/tutorial/wikiticker-2015-09-12
 }
 ```

-This spec will create a datasource named "wikipedia".
+This spec creates a datasource named "wikipedia".

-From the task view, click on `Submit task` and select `Raw JSON task`.
+From the Ingestion view, click the ellipses next to Tasks and choose `Submit JSON task`.

 ![Tasks view add task](../assets/tutorial-batch-submit-task-01.png "Tasks view add task")

-This will bring up the spec submission dialog where you can paste the spec above.
+This brings up the spec submission dialog where you can paste the spec above.

 ![Query view](../assets/tutorial-batch-submit-task-02.png "Query view")

-Once the spec is submitted, you can follow the same instructions as above to wait for the data to load and then query it.
+Once the spec is submitted, wait a few moments for the data to load, after which you can query it.

 ## Loading data with a spec (via command line)
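The body of that command-line section is not captured in this hunk. As a rough sketch of what command-line submission involves, an ingestion spec saved to a local file (here the hypothetical `my-wikipedia-index.json`) can be POSTed to Druid's task endpoint; the router at localhost:8888 proxies task submissions to the Overlord, or you can target the Overlord service directly:

```bash
# Submit a saved ingestion spec to Druid's task API via the router.
curl -X POST -H 'Content-Type: application/json' \
  -d @my-wikipedia-index.json \
  http://localhost:8888/druid/indexer/v1/task
```

The Druid distribution also ships a helper script, `bin/post-index-task`, that wraps this submission and polls for task completion; the full tutorial covers it.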

View file: Kafka ingestion tutorial

@@ -264,7 +264,14 @@ Please follow the [query tutorial](../tutorials/tutorial-query.md) to run some e
 ## Cleanup

-If you wish to go through any of the other ingestion tutorials, you will need to shut down the cluster and reset the cluster state by removing the contents of the `var` directory under the druid package, as the other tutorials will write to the same "wikipedia" datasource.
+To go through any of the other ingestion tutorials, you will need to shut down the cluster and reset the cluster state by removing the contents of the `var` directory in the Druid home, as the other tutorials will write to the same "wikipedia" datasource.
+
+You should additionally clear out any Kafka state. Do so by shutting down the Kafka broker with CTRL-C before stopping ZooKeeper and the Druid services, and then deleting the Kafka log directory at `/tmp/kafka-logs`:
+
+```bash
+rm -rf /tmp/kafka-logs
+```

 ## Further reading
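Putting the cleanup steps from this hunk and the Quickstart together, a full reset between tutorials looks roughly like the following. This is a sketch; it assumes Druid was started from its package root and that Kafka used the default `/tmp/kafka-logs` log directory:

```bash
# 1. Stop the Kafka broker (CTRL-C in its terminal), then stop ZooKeeper and the
#    Druid services (CTRL-C in the bin/start-micro-quickstart terminal).

# 2. Remove Kafka state.
rm -rf /tmp/kafka-logs

# 3. Remove Druid state; run from the apache-druid package root.
rm -rf var
```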

View file: spell-check word list

@@ -321,6 +321,7 @@ prepend
 prepended
 prepending
 prepends
+prepopulated
 preprocessing
 priori
 programmatically
@@ -383,6 +384,7 @@ unmergeable
 unmerged
 unparseable
 unparsed
+unsetting
 useFilterCNF
 uptime
 uris
@@ -1746,3 +1748,4 @@ UserGroupInformation
 CVE-2019-17571
 CVE-2019-12399
 CVE-2018-17196
+bin.tar.gz

View file: website documentation titles (JSON)

@@ -424,7 +424,7 @@
 "sidebar_label": "Multitenancy"
 },
 "querying/post-aggregations": {
-"title": "Postaggregations"
+"title": "Post-aggregations"
 },
 "querying/query-context": {
 "title": "Query context",

website/package-lock.json (generated, 4952 changed lines): file diff suppressed because it is too large.

View file: website package.json

@@ -14,13 +14,13 @@
 "spellcheck": "mdspell --en-us --ignore-numbers --report '../docs/**/*.md'"
 },
 "devDependencies": {
-"docusaurus": "^1.12.0",
+"docusaurus": "^1.14.4",
 "markdown-spellcheck": "^1.3.1",
-"node-sass": "^4.12.0"
+"node-sass": "^4.13.1"
 },
 "dependencies": {
-"fast-glob": "^3.0.4",
+"fast-glob": "^3.2.2",
 "fs-extra": "^8.1.0",
-"replace-in-file": "^4.1.3"
+"replace-in-file": "^4.3.1"
 }
 }

View file: website stylesheet (CSS)

@@ -83,3 +83,12 @@ footer.druid-footer {
 .navGroups > .navGroup:last-child {
   display: none; }
+
+/* testing inline images */
+article p img,
+article iframe {
+  display: inline;
+  margin-left: auto;
+  margin-right: auto;
+  max-width: 100%;
+}
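The added rule targets images embedded inline in documentation text. For instance, the new Quickstart uses this pattern (taken verbatim from the index.md diff above), where a small button image sits inside a sentence and, with this rule, renders inline rather than as a centered block:

```markdown
1. Click **Load data** from the Druid console header (![Load data](../assets/tutorial-batch-data-loader-00.png)).
```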