Update tutorials for 0.14.0-incubating (#7157)

Jonathan Wei 2019-02-27 19:50:31 -08:00 committed by Fangjin Yang
parent cacdc83cad
commit 3d247498ef
28 changed files with 125 additions and 55 deletions

17 binary image files changed (tutorial screenshots): 8 images updated and 9 added. Image contents not shown.

View File

@@ -96,13 +96,13 @@ This will bring up instances of Zookeeper and the Druid services, all running on
```bash
bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
-[Thu Jul 26 12:16:23 2018] Running command[zk], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[coordinator], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[broker], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[historical], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[overlord], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[middleManager], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[zk], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[coordinator], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[broker], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[router], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/router.log]: bin/run-druid router quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[historical], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[overlord], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[middleManager], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
```
All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-#{DRUIDVERSION} package root. Logs for the services are located at `var/sv`.
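
If you want to confirm that the services in the updated quickstart (including the new Router) came up, one option outside the console is to poll each process's `/status` endpoint. This is only a sketch, not part of the commit, and it assumes the default quickstart ports: Coordinator 8081, Broker 8082, Historical 8083, Overlord 8090, MiddleManager 8091, Router 8888.

```bash
# Sketch: poll each quickstart service's /status endpoint (assumes default ports).
for port in 8081 8082 8083 8090 8091 8888; do
  printf 'port %s: ' "$port"
  curl -fs "http://localhost:${port}/status" > /dev/null && echo up || echo down
done
```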

View File

@@ -163,16 +163,16 @@ Which will print the ID of the task if the submission was successful:
{"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
```
-To view the status of the ingestion task, go to the Overlord console:
-[http://localhost:8090/console.html](http://localhost:8090/console.html). You can refresh the console periodically, and after
-the task is successful, you should see a "SUCCESS" status for the task.
+To view the status of the ingestion task, go to the Druid Console:
+[http://localhost:8888/](http://localhost:8888). You can refresh the console periodically, and after
+the task is successful, you should see a "SUCCESS" status for the task under the [Tasks view](http://localhost:8888/unified-console.html#tasks).
-After the ingestion task finishes, the data will be loaded by Historical nodes and available for
+After the ingestion task finishes, the data will be loaded by Historical processes and available for
querying within a minute or two. You can monitor the progress of loading the data in the
-Coordinator console, by checking whether there is a datasource "wikipedia" with a blue circle
-indicating "fully available": [http://localhost:8081/#/](http://localhost:8081/#/).
+Datasources view, by checking whether there is a datasource "wikipedia" with a green circle
+indicating "fully available": [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources).
-![Coordinator console](../tutorials/img/tutorial-batch-01.png "Wikipedia 100% loaded")
+![Druid Console](../tutorials/img/tutorial-batch-01.png "Wikipedia 100% loaded")
## Further reading
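
The console is one way to watch the ingestion task; the Overlord HTTP API is another. A hedged sketch, assuming the task ID printed by `bin/post-index-task` and that the Router on port 8888 proxies the Overlord's task-status endpoint (the Overlord can also be queried directly on port 8090):

```bash
# Sketch: check ingestion task status over HTTP instead of the console.
TASK_ID="index_wikipedia_2018-06-09T21:30:32.802Z"   # use the ID returned at submit time
curl -s "http://localhost:8888/druid/indexer/v1/task/${TASK_ID}/status"
```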

View File

@@ -49,11 +49,15 @@ Please note that `maxRowsPerSegment` in the ingestion spec is set to 1000. This
It's 5000000 by default and may need to be adjusted to make your segments optimized.
</div>
-After the ingestion completes, go to http://localhost:8081/#/datasources/compaction-tutorial in a browser to view information about the new datasource in the Coordinator console.
+After the ingestion completes, go to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to see the new datasource in the Druid Console.
+![compaction-tutorial datasource](../tutorials/img/tutorial-compaction-01.png "compaction-tutorial datasource")
+Click the `51 segments` link next to "Fully Available" for the `compaction-tutorial` datasource to view information about the datasource's segments:
There will be 51 segments for this datasource, 1-3 segments per hour in the input data:
-![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+![Original segments](../tutorials/img/tutorial-compaction-02.png "Original segments")
Running a COUNT(*) query on this datasource shows that there are 39,244 rows:

@@ -71,7 +75,7 @@ Retrieved 1 row in 1.38s.
Let's now compact these 51 small segments.
-We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-final-index.json`:
+We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-keep-granularity.json`:
```json
{

@@ -96,18 +100,20 @@ In this tutorial example, only one compacted segment will be created per hour, a
Let's submit this task now:
```bash
-bin/post-index-task --file quickstart/tutorial/compaction-final-index.json
+bin/post-index-task --file quickstart/tutorial/compaction-keep-granularity.json
```
-After the task finishes, refresh the http://localhost:8081/#/datasources/compaction-tutorial page.
+After the task finishes, refresh the [segments view](http://localhost:8888/unified-console.html#segments).
The original 51 segments will eventually be marked as "unused" by the Coordinator and removed, with the new compacted segments remaining.
-By default, the Druid Coordinator will not mark segments as unused until the Coordinator process has been up for at least 15 minutes, so you may see the old segment set and the new compacted set at the same time in the Coordinator, e.g.:
+By default, the Druid Coordinator will not mark segments as unused until the Coordinator process has been up for at least 15 minutes, so you may see the old segment set and the new compacted set at the same time in the Druid Console, with 75 total segments:
-![Compacted segments intermediate state](../tutorials/img/tutorial-compaction-01.png "Compacted segments intermediate state")
-The new compacted segments have a more recent version than the original segments, so even when both sets of segments are shown by the Coordinator, queries will only read from the new compacted segments.
+![Compacted segments intermediate state 1](../tutorials/img/tutorial-compaction-03.png "Compacted segments intermediate state 1")
+![Compacted segments intermediate state 2](../tutorials/img/tutorial-compaction-04.png "Compacted segments intermediate state 2")
+The new compacted segments have a more recent version than the original segments, so even when both sets of segments are shown in the Druid Console, queries will only read from the new compacted segments.
Let's try running a COUNT(*) on `compaction-tutorial` again, where the row count should still be 39,244:

@@ -121,9 +127,47 @@ dsql> select count(*) from "compaction-tutorial";
Retrieved 1 row in 1.30s.
```
-After the Coordinator has been running for at least 15 minutes, the http://localhost:8081/#/datasources/compaction-tutorial page should show there is only 1 segment:
+After the Coordinator has been running for at least 15 minutes, the [segments view](http://localhost:8888/unified-console.html#segments) should show there are 24 segments, one per hour:
+![Compacted segments hourly granularity 1](../tutorials/img/tutorial-compaction-05.png "Compacted segments hourly granularity 1")
+![Compacted segments hourly granularity 2](../tutorials/img/tutorial-compaction-06.png "Compacted segments hourly granularity 2")
+## Compact the data with new segment granularity
+The compaction task can also produce compacted segments with a granularity different from the granularity of the input segments.
+We have included a compaction task spec that will create DAY granularity segments at `quickstart/tutorial/compaction-day-granularity.json`:
+```json
+{
+  "type": "compact",
+  "dataSource": "compaction-tutorial",
+  "interval": "2015-09-12/2015-09-13",
+  "segmentGranularity": "DAY",
+  "tuningConfig" : {
+    "type" : "index",
+    "maxRowsPerSegment" : 5000000,
+    "maxRowsInMemory" : 25000,
+    "forceExtendableShardSpecs" : true
+  }
+}
+```
+Note that `segmentGranularity` is set to `DAY` in this compaction task spec.
+Let's submit this task now:
+```bash
+bin/post-index-task --file quickstart/tutorial/compaction-day-granularity.json
+```
+It will take a bit of time before the Coordinator marks the old input segments as unused, so you may see an intermediate state with 25 total segments. Eventually, there will only be one DAY granularity segment:
+![Compacted segments day granularity 1](../tutorials/img/tutorial-compaction-07.png "Compacted segments day granularity 1")
+![Compacted segments day granularity 2](../tutorials/img/tutorial-compaction-08.png "Compacted segments day granularity 2")
-![Compacted segments final state](../tutorials/img/tutorial-compaction-02.png "Compacted segments final state")
## Further reading
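
`bin/post-index-task` is a thin wrapper around the Overlord's task-submission endpoint, so the compaction spec can also be posted directly. A sketch, not part of this commit, assuming the Overlord is listening on its default quickstart port 8090:

```bash
# Sketch: submit the compaction task straight to the Overlord instead of using the helper script.
curl -s -X POST -H 'Content-Type: application/json' \
  -d @quickstart/tutorial/compaction-keep-granularity.json \
  "http://localhost:8090/druid/indexer/v1/task"
```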

View File

@@ -41,7 +41,7 @@ Let's load this initial data:
bin/post-index-task --file quickstart/tutorial/deletion-index.json
```
-When the load finishes, open http://localhost:8081/#/datasources/deletion-tutorial in a browser.
+When the load finishes, open [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser.
## How to permanently delete data

@@ -56,15 +56,17 @@ Let's drop some segments now, first with load rules, then manually.
As with the previous retention tutorial, there are currently 24 segments in the `deletion-tutorial` datasource.
-Click the `edit rules` button with a pencil icon at the upper left corner of the page.
+click the blue pencil icon next to `Cluster default: loadForever` for the `deletion-tutorial` datasource.
-A rule configuration window will appear. Enter `tutorial` for both the user and changelog comment field.
+A rule configuration window will appear.
-Now click the `+ Add a rule` button twice.
+Now click the `+ New rule` button twice.
-In the `rule #1` box at the top, click `Load`, `Interval`, enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the interval box, and click `+ _default_tier replicant`.
+In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.
-In the `rule #2` box at the bottom, click `Drop` and `Forever`.
+In the lower rule box, select `Drop` and `forever`.
+Now click `Next` and enter `tutorial` for both the user and changelog comment field.
This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.

@@ -102,11 +104,11 @@ $ ls -l1 var/druid/segments/deletion-tutorial/
Let's manually disable a segment now. This will mark a segment as "unused", but not remove it from deep storage.
-On http://localhost:8081/#/datasources/deletion-tutorial, click one of the remaining segments on the left for full details about the segment:
+In the [segments view](http://localhost:8888/unified-console.html#segments), click the arrow on the left side of one of the remaining segments to expand the segment entry:
![Segments](../tutorials/img/tutorial-deletion-01.png "Segments")
-The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2016-06-27T14:00:00.000Z_2016-06-27T15:00:00.000Z_2018-07-27T22:57:00.110Z` for the segment of hour 14.
+The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z` for the segment of hour 14.
Let's disable the hour 14 segment by sending the following DELETE request to the Coordinator, where {SEGMENT-ID} is the full segment ID shown in the info box:
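
The hunk cuts off before the actual command, so as a reminder of the shape of that request, here is a hedged sketch of a segment-disable call against the Coordinator; the exact command belongs to the tutorial file itself and is not shown in this diff.

```bash
# Sketch: disable (mark unused) one segment via the Coordinator.
# Replace {SEGMENT-ID} with the full segment ID copied from the console.
curl -X DELETE "http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}"
```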

View File

@@ -70,6 +70,8 @@ If the supervisor was successfully created, you will get a response containing t
For more details about what's going on here, check out the
[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).
+You can view the current supervisors and tasks in the Druid Console: [http://localhost:8888/unified-console.html#tasks](http://localhost:8888/unified-console.html#tasks).
## Load data
Let's launch a console producer for our topic and send some data!
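
Besides the new Tasks view, the Overlord's supervisor API can list running supervisors, and the data itself is fed in with Kafka's console producer. A sketch only; it assumes the quickstart Overlord on port 8090, a local Kafka broker on 9092, and the tutorial's `wikipedia` topic.

```bash
# Sketch: list supervisors over HTTP, then start a console producer for the tutorial topic.
curl -s "http://localhost:8090/druid/indexer/v1/supervisor"
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia
```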

View File

@@ -41,49 +41,54 @@ The ingestion spec can be found at `quickstart/tutorial/retention-index.json`. L
bin/post-index-task --file quickstart/tutorial/retention-index.json
```
-After the ingestion completes, go to http://localhost:8081 in a browser to access the Coordinator console.
-In the Coordinator console, go to the `datasources` tab at the top of the page.
-This tab shows the available datasources and a summary of the retention rules for each datasource:
-![Summary](../tutorials/img/tutorial-retention-00.png "Summary")
-Currently there are no rules set for the `retention-tutorial` datasource. Note that there are default rules, currently set to `load Forever 2 in _default_tier`.
-This means that all data will be loaded regardless of timestamp, and each segment will be replicated to two nodes in the default tier.
+After the ingestion completes, go to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to access the Druid Console's datasource view.
+This view shows the available datasources and a summary of the retention rules for each datasource:
+![Summary](../tutorials/img/tutorial-retention-01.png "Summary")
+Currently there are no rules set for the `retention-tutorial` datasource. Note that there are default rules for the cluster: load forever with 2 replicants in `_default_tier`.
+This means that all data will be loaded regardless of timestamp, and each segment will be replicated to two Historical processes in the default tier.
In this tutorial, we will ignore the tiering and redundancy concepts for now.
-Let's click the `retention-tutorial` datasource on the left.
-The next page (http://localhost:8081/#/datasources/retention-tutorial) provides information about what segments a datasource contains. On the left, the page shows that there are 24 segments, each one containing data for a specific hour of 2015-09-12:
-![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+Let's view the segments for the `retention-tutorial` datasource by clicking the "24 Segments" link next to "Fully Available".
+The segments view ([http://localhost:8888/unified-console.html#segments](http://localhost:8888/unified-console.html#segments)) provides information about what segments a datasource contains. The page shows that there are 24 segments, each one containing data for a specific hour of 2015-09-12:
+![Original segments](../tutorials/img/tutorial-retention-02.png "Original segments")
## Set retention rules
Suppose we want to drop data for the first 12 hours of 2015-09-12 and keep data for the later 12 hours of 2015-09-12.
-Click the `edit rules` button with a pencil icon at the upper left corner of the page.
-A rule configuration window will appear. Enter `tutorial` for both the user and changelog comment field.
-Now click the `+ Add a rule` button twice.
-In the `rule #1` box at the top, click `Load`, `Interval`, enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the interval box, and click `+ _default_tier replicant`.
-In the `rule #2` box at the bottom, click `Drop` and `Forever`.
+Go to the [datasources view](http://localhost:8888/unified-console.html#datasources) and click the blue pencil icon next to `Cluster default: loadForever` for the `retention-tutorial` datasource.
+A rule configuration window will appear:
+![Rule configuration](../tutorials/img/tutorial-retention-03.png "Rule configuration")
+Now click the `+ New rule` button twice.
+In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.
+In the lower rule box, select `Drop` and `forever`.
The rules should look like this:
-![Set rules](../tutorials/img/tutorial-retention-02.png "Set rules")
+![Set rules](../tutorials/img/tutorial-retention-04.png "Set rules")
-Now click `Save all rules`, wait for a few seconds, and refresh the page.
+Now click `Next`. The rule configuration process will ask for a user name and comment, for change logging purposes. You can enter `tutorial` for both.
+Now click `Save`. You can see the new rules in the datasources view:
+![New rules](../tutorials/img/tutorial-retention-05.png "New rules")
+Give the cluster a few minutes to apply the rule change, and go to the [segments view](http://localhost:8888/unified-console.html#segments) in the Druid Console.
The segments for the first 12 hours of 2015-09-12 are now gone:
-![New segments](../tutorials/img/tutorial-retention-03.png "New segments")
+![New segments](../tutorials/img/tutorial-retention-06.png "New segments")
The resulting retention rule chain is the following:

@@ -93,7 +98,6 @@ The resulting retention rule chain is the following:
3. loadForever (default rule)
The rule chain is evaluated from top to bottom, with the default rule chain always added at the bottom.
-The tutorial rule chain we just created loads data if it is within the specified 12 hour interval.
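
The point-and-click workflow above amounts to replacing the datasource's rule chain with a `loadByInterval` rule followed by a `dropForever` rule. A hedged sketch of the equivalent call to the Coordinator rules API; the rule field names and the audit headers are assumptions based on the standard rule types, not something this commit shows.

```bash
# Sketch: set the tutorial rule chain through the Coordinator rules API.
curl -s -X POST "http://localhost:8081/druid/coordinator/v1/rules/retention-tutorial" \
  -H 'Content-Type: application/json' \
  -H 'X-Druid-Author: tutorial' -H 'X-Druid-Comment: tutorial' \
  -d '[
        {
          "type": "loadByInterval",
          "interval": "2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z",
          "tieredReplicants": {"_default_tier": 2}
        },
        {"type": "dropForever"}
      ]'
```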

View File

@@ -0,0 +1,12 @@
+{
+  "type": "compact",
+  "dataSource": "compaction-tutorial",
+  "interval": "2015-09-12/2015-09-13",
+  "segmentGranularity": "DAY",
+  "tuningConfig" : {
+    "type" : "index",
+    "maxRowsPerSegment" : 5000000,
+    "maxRowsInMemory" : 25000,
+    "forceExtendableShardSpecs" : true
+  }
+}
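
Once the DAY-granularity compaction described in the tutorial has settled, the single remaining segment can be confirmed without the console. A sketch, assuming the Coordinator's per-datasource segments listing on port 8081:

```bash
# Sketch: list the segment IDs the Coordinator currently serves for the datasource.
curl -s "http://localhost:8081/druid/coordinator/v1/datasources/compaction-tutorial/segments"
```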

View File

@@ -120,7 +120,7 @@ druid.selectors.coordinator.serviceName=druid/coordinator
#
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
-druid.emitter=logging
+druid.emitter=noop
druid.emitter.logging.logLevel=info
# Storage type of double columns

@@ -138,3 +138,8 @@ druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.
# SQL
#
druid.sql.enable=true
+#
+# Lookups
+#
+druid.lookup.enableLookupSyncOnStartup=false
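
A quick way to confirm that the emitter and lookup changes are picked up by a running process is its `/status/properties` endpoint, which dumps the resolved runtime properties. A sketch against the Coordinator, not part of this commit:

```bash
# Sketch: check the resolved properties on a running process.
curl -s "http://localhost:8081/status/properties" | tr ',' '\n' \
  | grep -E 'druid\.emitter|druid\.lookup\.enableLookupSyncOnStartup'
```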

View File

@@ -24,14 +24,14 @@ druid.plaintextPort=8091
druid.worker.capacity=3
# Task launch parameters
-druid.indexer.runner.javaOpts=-server -Xms512m -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
+druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=9
# Processing threads and buffers on Peons
-druid.indexer.fork.property.druid.processing.buffer.sizeBytes=256000000
+druid.indexer.fork.property.druid.processing.buffer.sizeBytes=201326592
druid.indexer.fork.property.druid.processing.numThreads=2
# Hadoop indexing
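
The new peon processing buffer size is simply 192 MiB written out in bytes, paired with the bump of the peon heap from 512 MB to 1 GB; a one-liner to check the arithmetic:

```bash
# 192 MiB expressed in bytes: 192 * 1024 * 1024
echo $((192 * 1024 * 1024))   # prints 201326592
```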

View File

@@ -0,0 +1 @@
+org.apache.druid.cli.Main server router
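
This one-line file is the main config for the new Router process added to the tutorial setup. With the tutorial conf it is launched the same way as the other services (see the `bin/run-druid router quickstart/tutorial/conf` supervise line earlier in this diff) and serves the unified console on port 8888:

```bash
# Start the Router by hand with the tutorial conf, mirroring the supervise entry above.
bin/run-druid router quickstart/tutorial/conf
# The unified console is then available at http://localhost:8888/unified-console.html
```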