Update tutorials for 0.14.0-incubating (#7157)

@@ -96,13 +96,13 @@ This will bring up instances of Zookeeper and the Druid services, all running on

```bash
bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
-[Thu Jul 26 12:16:23 2018] Running command[zk], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[coordinator], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[broker], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[historical], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[overlord], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
-[Thu Jul 26 12:16:23 2018] Running command[middleManager], logging to[/stage/apache-druid-#{DRUIDVERSION}/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[zk], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/zk.log]: bin/run-zk quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[coordinator], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/coordinator.log]: bin/run-druid coordinator quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[broker], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/broker.log]: bin/run-druid broker quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[router], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/router.log]: bin/run-druid router quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[historical], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/historical.log]: bin/run-druid historical quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[overlord], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/overlord.log]: bin/run-druid overlord quickstart/tutorial/conf
+[Wed Feb 27 12:46:13 2019] Running command[middleManager], logging to[/apache-druid-#{DRUIDVERSION}/var/sv/middleManager.log]: bin/run-druid middleManager quickstart/tutorial/conf
```

All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-#{DRUIDVERSION} package root. Logs for the services are located at `var/sv`.
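
If you want to follow what an individual service is doing, you can tail its log under `var/sv` (a minimal sketch using the log location described above; the Coordinator is just an example):

```bash
# Follow the Coordinator's log; substitute any other service name found under var/sv.
tail -f var/sv/coordinator.log
```
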
@@ -163,16 +163,16 @@ Which will print the ID of the task if the submission was successful:
{"task":"index_wikipedia_2018-06-09T21:30:32.802Z"}
```

-To view the status of the ingestion task, go to the Overlord console:
-[http://localhost:8090/console.html](http://localhost:8090/console.html). You can refresh the console periodically, and after
-the task is successful, you should see a "SUCCESS" status for the task.
+To view the status of the ingestion task, go to the Druid Console:
+[http://localhost:8888/](http://localhost:8888). You can refresh the console periodically, and after
+the task is successful, you should see a "SUCCESS" status for the task under the [Tasks view](http://localhost:8888/unified-console.html#tasks).
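
If you prefer the command line, the same status can be fetched from the Overlord's task API (a sketch; it assumes the default Overlord port 8090 and uses the task ID returned by the submission step):

```bash
# Replace the task ID with the one printed when you submitted the task.
curl http://localhost:8090/druid/indexer/v1/task/index_wikipedia_2018-06-09T21:30:32.802Z/status
```
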
-After the ingestion task finishes, the data will be loaded by Historical nodes and available for
+After the ingestion task finishes, the data will be loaded by Historical processes and available for
querying within a minute or two. You can monitor the progress of loading the data in the
-Coordinator console, by checking whether there is a datasource "wikipedia" with a blue circle
-indicating "fully available": [http://localhost:8081/#/](http://localhost:8081/#/).
+Datasources view, by checking whether there is a datasource "wikipedia" with a green circle
+indicating "fully available": [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources).
-![Coordinator console](../tutorials/img/tutorial-batch-01.png "Wikipedia 100% loaded")
+![Druid Console](../tutorials/img/tutorial-batch-01.png "Wikipedia 100% loaded")

## Further reading

@@ -49,11 +49,15 @@ Please note that `maxRowsPerSegment` in the ingestion spec is set to 1000. This
It's 5000000 by default and may need to be adjusted to make your segments optimized.
</div>

-After the ingestion completes, go to http://localhost:8081/#/datasources/compaction-tutorial in a browser to view information about the new datasource in the Coordinator console.
+After the ingestion completes, go to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to see the new datasource in the Druid Console.

+![compaction-tutorial datasource](../tutorials/img/tutorial-compaction-01.png "compaction-tutorial datasource")

+Click the `51 segments` link next to "Fully Available" for the `compaction-tutorial` datasource to view information about the datasource's segments:

There will be 51 segments for this datasource, 1-3 segments per hour in the input data:

-![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+![Original segments](../tutorials/img/tutorial-compaction-02.png "Original segments")

Running a COUNT(*) query on this datasource shows that there are 39,244 rows:

@@ -71,7 +75,7 @@ Retrieved 1 row in 1.38s.

Let's now compact these 51 small segments.

-We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-final-index.json`:
+We have included a compaction task spec for this tutorial datasource at `quickstart/tutorial/compaction-keep-granularity.json`:

```json
{

@@ -96,18 +100,20 @@ In this tutorial example, only one compacted segment will be created per hour, a
Let's submit this task now:

```bash
-bin/post-index-task --file quickstart/tutorial/compaction-final-index.json
+bin/post-index-task --file quickstart/tutorial/compaction-keep-granularity.json
```

-After the task finishes, refresh the http://localhost:8081/#/datasources/compaction-tutorial page.
+After the task finishes, refresh the [segments view](http://localhost:8888/unified-console.html#segments).

The original 51 segments will eventually be marked as "unused" by the Coordinator and removed, with the new compacted segments remaining.

-By default, the Druid Coordinator will not mark segments as unused until the Coordinator process has been up for at least 15 minutes, so you may see the old segment set and the new compacted set at the same time in the Coordinator, e.g.:
+By default, the Druid Coordinator will not mark segments as unused until the Coordinator process has been up for at least 15 minutes, so you may see the old segment set and the new compacted set at the same time in the Druid Console, with 75 total segments:

-![Compacted segments intermediate state](../tutorials/img/tutorial-compaction-01.png "Compacted segments intermediate state")
+![Compacted segments intermediate state 1](../tutorials/img/tutorial-compaction-03.png "Compacted segments intermediate state 1")

+![Compacted segments intermediate state 2](../tutorials/img/tutorial-compaction-04.png "Compacted segments intermediate state 2")

-The new compacted segments have a more recent version than the original segments, so even when both sets of segments are shown by the Coordinator, queries will only read from the new compacted segments.
+The new compacted segments have a more recent version than the original segments, so even when both sets of segments are shown in the Druid Console, queries will only read from the new compacted segments.

Let's try running a COUNT(*) on `compaction-tutorial` again, where the row count should still be 39,244:

@@ -121,9 +127,47 @@ dsql> select count(*) from "compaction-tutorial";
Retrieved 1 row in 1.30s.
```

-After the Coordinator has been running for at least 15 minutes, the http://localhost:8081/#/datasources/compaction-tutorial page should show there is only 1 segment:
+After the Coordinator has been running for at least 15 minutes, the [segments view](http://localhost:8888/unified-console.html#segments) should show there are 24 segments, one per hour:

+![Compacted segments hourly granularity 1](../tutorials/img/tutorial-compaction-05.png "Compacted segments hourly granularity 1")

+![Compacted segments hourly granularity 2](../tutorials/img/tutorial-compaction-06.png "Compacted segments hourly granularity 2")

+## Compact the data with new segment granularity

+The compaction task can also produce compacted segments with a granularity different from the granularity of the input segments.

+We have included a compaction task spec that will create DAY granularity segments at `quickstart/tutorial/compaction-day-granularity.json`:

+```json
+{
+  "type": "compact",
+  "dataSource": "compaction-tutorial",
+  "interval": "2015-09-12/2015-09-13",
+  "segmentGranularity": "DAY",
+  "tuningConfig" : {
+    "type" : "index",
+    "maxRowsPerSegment" : 5000000,
+    "maxRowsInMemory" : 25000,
+    "forceExtendableShardSpecs" : true
+  }
+}
+```

+Note that `segmentGranularity` is set to `DAY` in this compaction task spec.

+Let's submit this task now:

+```bash
+bin/post-index-task --file quickstart/tutorial/compaction-day-granularity.json
+```

+It will take a bit of time before the Coordinator marks the old input segments as unused, so you may see an intermediate state with 25 total segments. Eventually, there will only be one DAY granularity segment:
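
If you want to watch this from the command line instead of the console, the Coordinator's datasource API lists the segments it is currently serving (a sketch, assuming the default Coordinator port 8081):

```bash
# Lists the IDs of the segments currently served for the datasource; once the old
# segments are marked unused this should return a single DAY granularity segment.
curl http://localhost:8081/druid/coordinator/v1/datasources/compaction-tutorial/segments
```
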
+![Compacted segments day granularity 1](../tutorials/img/tutorial-compaction-07.png "Compacted segments day granularity 1")

+![Compacted segments day granularity 2](../tutorials/img/tutorial-compaction-08.png "Compacted segments day granularity 2")

-![Compacted segments final state](../tutorials/img/tutorial-compaction-02.png "Compacted segments final state")

## Further reading

@@ -41,7 +41,7 @@ Let's load this initial data:
bin/post-index-task --file quickstart/tutorial/deletion-index.json
```

-When the load finishes, open http://localhost:8081/#/datasources/deletion-tutorial in a browser.
+When the load finishes, open [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser.

## How to permanently delete data

@@ -56,15 +56,17 @@ Let's drop some segments now, first with load rules, then manually.

As with the previous retention tutorial, there are currently 24 segments in the `deletion-tutorial` datasource.

-Click the `edit rules` button with a pencil icon at the upper left corner of the page.
+Click the blue pencil icon next to `Cluster default: loadForever` for the `deletion-tutorial` datasource.

-A rule configuration window will appear. Enter `tutorial` for both the user and changelog comment field.
+A rule configuration window will appear.

-Now click the `+ Add a rule` button twice.
+Now click the `+ New rule` button twice.

-In the `rule #1` box at the top, click `Load`, `Interval`, enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the interval box, and click `+ _default_tier replicant`.
+In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.

-In the `rule #2` box at the bottom, click `Drop` and `Forever`.
+In the lower rule box, select `Drop` and `forever`.

+Now click `Next` and enter `tutorial` for both the user and changelog comment field.

This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.

@@ -102,11 +104,11 @@ $ ls -l1 var/druid/segments/deletion-tutorial/

Let's manually disable a segment now. This will mark a segment as "unused", but not remove it from deep storage.

-On http://localhost:8081/#/datasources/deletion-tutorial, click one of the remaining segments on the left for full details about the segment:
+In the [segments view](http://localhost:8888/unified-console.html#segments), click the arrow on the left side of one of the remaining segments to expand the segment entry:

![Segments](../tutorials/img/tutorial-deletion-01.png "Segments")

-The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2016-06-27T14:00:00.000Z_2016-06-27T15:00:00.000Z_2018-07-27T22:57:00.110Z` for the segment of hour 14.
+The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z` for the segment of hour 14.

Let's disable the hour 14 segment by sending the following DELETE request to the Coordinator, where {SEGMENT-ID} is the full segment ID shown in the info box:
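
The request takes roughly the following form (a sketch, assuming the default Coordinator port 8081; substitute the segment ID copied from the info box for the placeholder):

```bash
# Marks the segment as unused. Its data stays in deep storage until a kill task removes it.
curl -X DELETE "http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}"
```
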
@@ -70,6 +70,8 @@ If the supervisor was successfully created, you will get a response containing t
For more details about what's going on here, check out the
[Druid Kafka indexing service documentation](../development/extensions-core/kafka-ingestion.html).

+You can view the current supervisors and tasks in the Druid Console: [http://localhost:8888/unified-console.html#tasks](http://localhost:8888/unified-console.html#tasks).
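
The running supervisors can also be listed from the Overlord API (a sketch, assuming the default Overlord port 8090):

```bash
# Returns the IDs of all supervisors currently running on the Overlord.
curl http://localhost:8090/druid/indexer/v1/supervisor
```
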
## Load data

Let's launch a console producer for our topic and send some data!
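
As a rough sketch of that step (the Kafka installation directory, broker address, and data file path here are assumptions; the topic name matches the `wikipedia` topic used by this tutorial):

```bash
# Run from the Kafka installation directory; pipe the sample events into the topic.
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic wikipedia < {PATH_TO_SAMPLE_DATA}
```
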
@@ -41,49 +41,54 @@ The ingestion spec can be found at `quickstart/tutorial/retention-index.json`. L
bin/post-index-task --file quickstart/tutorial/retention-index.json
```

-After the ingestion completes, go to http://localhost:8081 in a browser to access the Coordinator console.
+After the ingestion completes, go to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to access the Druid Console's datasource view.

-In the Coordinator console, go to the `datasources` tab at the top of the page.

-This tab shows the available datasources and a summary of the retention rules for each datasource:
+This view shows the available datasources and a summary of the retention rules for each datasource:

-![Summary](../tutorials/img/tutorial-retention-00.png "Summary")
+![Summary](../tutorials/img/tutorial-retention-01.png "Summary")

-Currently there are no rules set for the `retention-tutorial` datasource. Note that there are default rules, currently set to `load Forever 2 in _default_tier`.
+Currently there are no rules set for the `retention-tutorial` datasource. Note that there are default rules for the cluster: load forever with 2 replicants in `_default_tier`.

-This means that all data will be loaded regardless of timestamp, and each segment will be replicated to two nodes in the default tier.
+This means that all data will be loaded regardless of timestamp, and each segment will be replicated to two Historical processes in the default tier.
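
The retention rules can also be inspected over HTTP (a sketch, assuming the default Coordinator port 8081): the Coordinator's rules endpoint returns the rule set for every datasource, including the cluster defaults.

```bash
# Returns all retention rules keyed by datasource; cluster-wide defaults appear under "_default".
curl http://localhost:8081/druid/coordinator/v1/rules
```
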
In this tutorial, we will ignore the tiering and redundancy concepts for now.

-Let's click the `retention-tutorial` datasource on the left.
+Let's view the segments for the `retention-tutorial` datasource by clicking the "24 Segments" link next to "Fully Available".

-The next page (http://localhost:8081/#/datasources/retention-tutorial) provides information about what segments a datasource contains. On the left, the page shows that there are 24 segments, each one containing data for a specific hour of 2015-09-12:
+The segments view ([http://localhost:8888/unified-console.html#segments](http://localhost:8888/unified-console.html#segments)) provides information about what segments a datasource contains. The page shows that there are 24 segments, each one containing data for a specific hour of 2015-09-12:

-![Original segments](../tutorials/img/tutorial-retention-01.png "Original segments")
+![Original segments](../tutorials/img/tutorial-retention-02.png "Original segments")

## Set retention rules

Suppose we want to drop data for the first 12 hours of 2015-09-12 and keep data for the later 12 hours of 2015-09-12.

-Click the `edit rules` button with a pencil icon at the upper left corner of the page.
+Go to the [datasources view](http://localhost:8888/unified-console.html#datasources) and click the blue pencil icon next to `Cluster default: loadForever` for the `retention-tutorial` datasource.

-A rule configuration window will appear. Enter `tutorial` for both the user and changelog comment field.
+A rule configuration window will appear:

+![Rule configuration](../tutorials/img/tutorial-retention-03.png "Rule configuration")

-Now click the `+ Add a rule` button twice.
+Now click the `+ New rule` button twice.

-In the `rule #1` box at the top, click `Load`, `Interval`, enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the interval box, and click `+ _default_tier replicant`.
+In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.

-In the `rule #2` box at the bottom, click `Drop` and `Forever`.
+In the lower rule box, select `Drop` and `forever`.

The rules should look like this:

-![Set rules](../tutorials/img/tutorial-retention-02.png "Set rules")
+![Set rules](../tutorials/img/tutorial-retention-04.png "Set rules")

-Now click `Save all rules`, wait for a few seconds, and refresh the page.
+Now click `Next`. The rule configuration process will ask for a user name and comment, for change logging purposes. You can enter `tutorial` for both.

+Now click `Save`. You can see the new rules in the datasources view:

+![New rules](../tutorials/img/tutorial-retention-05.png "New rules")

+Give the cluster a few minutes to apply the rule change, and go to the [segments view](http://localhost:8888/unified-console.html#segments) in the Druid Console.
The segments for the first 12 hours of 2015-09-12 are now gone:

-![New segments](../tutorials/img/tutorial-retention-03.png "New segments")
+![New segments](../tutorials/img/tutorial-retention-06.png "New segments")

The resulting retention rule chain is the following:

@@ -93,7 +98,6 @@ The resulting retention rule chain is the following:

3. loadForever (default rule)

The rule chain is evaluated from top to bottom, with the default rule chain always added at the bottom.

The tutorial rule chain we just created loads data if it is within the specified 12 hour interval.
@@ -0,0 +1,12 @@
+{
+  "type": "compact",
+  "dataSource": "compaction-tutorial",
+  "interval": "2015-09-12/2015-09-13",
+  "segmentGranularity": "DAY",
+  "tuningConfig" : {
+    "type" : "index",
+    "maxRowsPerSegment" : 5000000,
+    "maxRowsInMemory" : 25000,
+    "forceExtendableShardSpecs" : true
+  }
+}

@@ -120,7 +120,7 @@ druid.selectors.coordinator.serviceName=druid/coordinator
#

druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
-druid.emitter=logging
+druid.emitter=noop
druid.emitter.logging.logLevel=info

# Storage type of double columns

@@ -138,3 +138,8 @@ druid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.
# SQL
#
druid.sql.enable=true
+
+#
+# Lookups
+#
+druid.lookup.enableLookupSyncOnStartup=false

@@ -24,14 +24,14 @@ druid.plaintextPort=8091
druid.worker.capacity=3

# Task launch parameters
-druid.indexer.runner.javaOpts=-server -Xms512m -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
+druid.indexer.runner.javaOpts=-server -Xms1g -Xmx1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task

# HTTP server threads
druid.server.http.numThreads=9

# Processing threads and buffers on Peons
-druid.indexer.fork.property.druid.processing.buffer.sizeBytes=256000000
+druid.indexer.fork.property.druid.processing.buffer.sizeBytes=201326592
druid.indexer.fork.property.druid.processing.numThreads=2

# Hadoop indexing

@@ -0,0 +1 @@
+org.apache.druid.cli.Main server router