mirror of https://github.com/apache/druid.git
Update tutorial to delete data (#7577)
* Update tutorial to delete data
* update tutorial, remove old ways to drop data
* PR comments

parent e874da7cea
commit 917106985f
Binary file not shown (image; size 196 KiB before, 791 KiB after).
Binary file not shown (new image; size 787 KiB).

@@ -29,8 +29,6 @@ This tutorial demonstrates how to delete existing data.

For this tutorial, we'll assume you've already downloaded Apache Druid (incubating) as described in
the [single-machine quickstart](index.html) and have it running on your local machine.

## Load initial data

In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`.
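The spec creates hourly segments and the data covers 2015-09-12, so the datasource should start out with 24 segments, one per hour. Each segment is stored in deep storage under a directory named `<start>_<end>`, as the `ls` listings in this tutorial show. The minimal sketch below prints those 24 expected names. `hourly_segment_names` is a hypothetical helper. The following day is passed in explicitly so the script does not depend on a particular `date` implementation.

```bash
# Hypothetical helper (not part of Druid): print the deep-storage directory
# names expected for the 24 hourly segments of one day.
#   $1 = day, e.g. 2015-09-12
#   $2 = following day, e.g. 2015-09-13 (end of the hour 23 segment)
hourly_segment_names() {
  local day="$1" next_day="$2" hour=0 start end
  while [ "$hour" -lt 24 ]; do
    start=$(printf '%sT%02d:00:00.000Z' "$day" "$hour")
    if [ "$hour" -eq 23 ]; then
      # The last hour ends at midnight of the following day.
      end="${next_day}T00:00:00.000Z"
    else
      end=$(printf '%sT%02d:00:00.000Z' "$day" $((hour + 1)))
    fi
    printf '%s_%s\n' "$start" "$end"
    hour=$((hour + 1))
  done
}

# Example: hourly_segment_names 2015-09-12 2015-09-13
```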

@@ -47,30 +45,25 @@ When the load finishes, open [http://localhost:8888/unified-console.html#datasou

Permanent deletion of a Druid segment has two steps:

1. The segment must first be marked as "unused". This occurs when a user manually disables a segment through the Coordinator API.
2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage.
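Here is a hedged sketch of step 2. It assumes a kill task spec with `type`, `dataSource`, and `interval` fields; check the task documentation for your Druid version. The hypothetical helper below prints such a payload for one datasource and interval. Per step 2, the task deletes only segments that are already marked "unused", so it belongs after step 1.

```bash
# Hypothetical helper: print a kill task payload for one datasource and interval.
# Assumes the kill task spec uses the fields "type", "dataSource", and "interval".
kill_task_payload() {
  local datasource="$1"  # e.g. deletion-tutorial
  local interval="$2"    # ISO 8601 interval, e.g. 2015-09-12/2015-09-13
  printf '{ "type" : "kill", "dataSource" : "%s", "interval" : "%s" }\n' \
    "$datasource" "$interval"
}

# Example (only writes the payload; submitting it to the Overlord's task
# endpoint is a separate step, see the task API docs for your Druid version):
#   kill_task_payload deletion-tutorial 2015-09-12/2015-09-13 > kill.json
```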

Let's drop some segments now by using the Coordinator API to disable data by interval and by segment ID.

## Disable segments by interval

Let's disable segments in a specified interval. This will mark all segments in the interval as "unused", but not remove them from deep storage.

Here we disable the segments in the interval `2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z`, i.e. hours 18 and 19, since the end of an interval is exclusive:

```bash
curl -X 'POST' -H 'Content-Type:application/json' -d '{ "interval" : "2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z" }' http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/markUnused
```
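If you plan to disable several intervals, you can wrap the same request in a small shell function. This is a sketch. `interval_payload`, `mark_unused_interval`, and the `COORDINATOR` variable are hypothetical names. Only the endpoint and the payload shape come from the command above.

```bash
# Hypothetical helpers wrapping the markUnused request shown above.
# COORDINATOR defaults to the quickstart Coordinator address used in this tutorial.
COORDINATOR="${COORDINATOR:-http://localhost:8081}"

# Print the JSON body that disables every segment in [start, end);
# the end of the interval is exclusive.
interval_payload() {
  printf '{ "interval" : "%s/%s" }\n' "$1" "$2"
}

# POST that body to a datasource's markUnused endpoint.
#   $1 = datasource, $2 = interval start, $3 = interval end (exclusive)
mark_unused_interval() {
  curl -X 'POST' -H 'Content-Type:application/json' \
    -d "$(interval_payload "$2" "$3")" \
    "${COORDINATOR}/druid/coordinator/v1/datasources/$1/markUnused"
}

# Example (same request as the command above):
#   mark_unused_interval deletion-tutorial \
#     2015-09-12T18:00:00.000Z 2015-09-12T20:00:00.000Z
```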

After that command completes, you should see that the segments for hours 18 and 19 have been disabled:

![Segments 2](../tutorials/img/tutorial-deletion-02.png "Segments 2")
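You can also check the result from the command line. This sketch assumes that the Coordinator's `GET /druid/coordinator/v1/datasources/deletion-tutorial/segments` endpoint returns a JSON array of the IDs of the segments it currently considers in use. Verify this against the API reference for your version. `count_segment_ids` is a hypothetical helper that counts the IDs with `grep`, so no JSON tool is needed. After disabling hours 18 and 19, you would expect 22 of the original 24 segments. The Coordinator may take a moment to refresh its view.

```bash
# Hypothetical helper: count the IDs of one datasource in a JSON array read
# from stdin, e.g. ["deletion-tutorial_...", "deletion-tutorial_..."].
# Uses grep only, so it assumes the IDs contain no escaped quotes.
count_segment_ids() {
  local datasource="$1"
  grep -o "\"${datasource}_[^\"]*\"" | wc -l | tr -d ' '
}

# Example (assumes the endpoint described above):
#   curl -s http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments \
#     | count_segment_ids deletion-tutorial
```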

Note that the hour 18 and 19 segments are still present in deep storage:

```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
@@ -100,9 +93,9 @@ $ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z
```
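To check from a script that disabling a segment leaves its files in place, you can test for each interval's directory. This sketch assumes the local deep-storage layout shown in the listing, with one `<start>_<end>` directory per segment under `var/druid/segments/deletion-tutorial/`. `deep_storage_status` is a hypothetical helper. Run from the Druid directory at this point, it should report both hours as present. After a kill task has removed them, it would report them as missing.

```bash
# Hypothetical helper: report whether each given segment interval still has
# a directory in a local deep-storage datasource directory.
#   $1    = datasource dir, e.g. var/druid/segments/deletion-tutorial
#   $2... = interval directory names, e.g. 2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z
deep_storage_status() {
  local dir="$1" name
  shift
  for name in "$@"; do
    if [ -d "${dir}/${name}" ]; then
      printf '%s present\n' "$name"
    else
      printf '%s missing\n' "$name"
    fi
  done
}

# Example:
#   deep_storage_status var/druid/segments/deletion-tutorial \
#     2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z \
#     2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
```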

## Disable segments by segment IDs

Let's disable some segments by their segment IDs. This will again mark the segments as "unused", but not remove them from deep storage. You can find the full segment ID of a segment in the UI, as explained below.

In the [segments view](http://localhost:8888/unified-console.html#segments), click the arrow on the left side of one of the remaining segments to expand the segment entry:

@@ -110,17 +103,29 @@ In the [segments view](http://localhost:8888/unified-console.html#segments), cli

The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z` for the segment of hour 14.
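A segment ID like this one is the datasource name, the interval start, the interval end, and a version, joined with underscores. Below is a sketch of splitting one apart in the shell. It assumes that four-part form. Druid can append a partition number when an interval has more than one segment, and this hypothetical `parse_segment_id` helper does not handle that case. Because the timestamps contain no underscores, it peels fields off from the right, so a datasource name that contains underscores still parses.

```bash
# Hypothetical helper: split a segment ID of the form
#   <datasource>_<start>_<end>_<version>
# into its parts, one per line. Assumes no trailing partition number.
parse_segment_id() {
  local id="$1" rest
  local version="${id##*_}"     # text after the last underscore
  rest="${id%_*}"               # drop the version
  local end="${rest##*_}"
  rest="${rest%_*}"             # drop the interval end
  local start="${rest##*_}"
  local datasource="${rest%_*}" # whatever remains is the datasource name
  printf 'datasource=%s\nstart=%s\nend=%s\nversion=%s\n' \
    "$datasource" "$start" "$end" "$version"
}

# Example:
#   parse_segment_id deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z
```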

Let's disable the hour 13 and 14 segments by sending a POST request to the Coordinator with this payload:

```json
{
  "segmentIds":
  [
    "deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
    "deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z"
  ]
}
```

This payload JSON is provided at `quickstart/tutorial/deletion-disable-segments.json`. The last part of each segment ID is a version timestamp assigned at ingestion time, so the IDs in your cluster will likely differ from these. If they do, edit the file to use the IDs shown in your console. Then submit the POST request to the Coordinator like this:

```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/deletion-disable-segments.json http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/markUnused
```
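Instead of editing the payload by hand, you can generate it from the segment IDs shown in your console. This is a minimal sketch. `segment_ids_payload` is a hypothetical helper that prints the same `segmentIds` structure as the provided file, on a single line. It does not JSON-escape its arguments, which is fine for segment IDs like the ones in this tutorial, since they contain no quotes or backslashes.

```bash
# Hypothetical helper: print a {"segmentIds": [...]} payload for the segment
# IDs given as arguments. IDs are not JSON-escaped, so they must not contain
# quotes or backslashes.
segment_ids_payload() {
  local sep="" id
  printf '{ "segmentIds" : ['
  for id in "$@"; do
    printf '%s"%s"' "$sep" "$id"
    sep=", "
  done
  printf '] }\n'
}

# Example: write a payload for your own IDs, then POST it with the curl
# command above, pointing -d at the new (hypothetical) file.
#   segment_ids_payload <hour-13-segment-id> <hour-14-segment-id> > my-disable-segments.json
```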

After that command completes, you should see that the segments for hours 13 and 14 have been disabled:

![Segments 3](../tutorials/img/tutorial-deletion-03.png "Segments 3")

Note that the hour 13 and 14 segments are still in deep storage:

```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
```

@@ -165,12 +170,9 @@ After this task completes, you can see that the disabled segments have now been

```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z
```

@@ -0,0 +1,7 @@

```json
{
  "segmentIds":
  [
    "deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
    "deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z"
  ]
}
```