Update tutorial to delete data (#7577)

* Update tutorial to delete data

* update tutorial, remove old ways to drop data

* PR comments
Surekha 2019-05-15 14:40:06 -07:00 committed by Fangjin Yang
parent e874da7cea
commit 917106985f
4 changed files with 38 additions and 29 deletions

Binary image file (not shown): 196 KiB before, 791 KiB after.

Binary image file (not shown): 787 KiB after.

@ -29,8 +29,6 @@ This tutorial demonstrates how to delete existing data.
For this tutorial, we'll assume you've already downloaded Apache Druid (incubating) as described in
the [single-machine quickstart](index.html) and have it running on your local machine.
Completing [Tutorial: Configuring retention](../tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.
## Load initial data
In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`.
@ -47,30 +45,25 @@ When the load finishes, open [http://localhost:8888/unified-console.html#datasou
Permanent deletion of a Druid segment has two steps:
1. The segment must first be marked as "unused". This occurs when a segment is dropped by retention rules, and when a user manually disables a segment through the Coordinator API. This tutorial will cover both cases.
1. The segment must first be marked as "unused". This occurs when a user manually disables a segment through the Coordinator API.
2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage.
Let's drop some segments now, first with load rules, then manually.
Let's drop some segments now, using the Coordinator API to drop data by interval and by segment IDs.
## Drop some data with load rules
## Disable segments by interval
As with the previous retention tutorial, there are currently 24 segments in the `deletion-tutorial` datasource.
Let's disable segments in a specified interval. This will mark all segments in the interval as "unused", but not remove them from deep storage.
Let's disable the segments in the interval `2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z`, i.e. between hours 18 and 20.
Click the blue pencil icon next to `Cluster default: loadForever` for the `deletion-tutorial` datasource.
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d '{ "interval" : "2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z" }' http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/markUnused
```
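The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of the tutorial: the helper name `build_mark_unused_by_interval` is hypothetical, and the host, port, and datasource are simply copied from the curl command above. The request is built but not sent, so it can be inspected without a running cluster.

```python
import json
import urllib.request

# Values copied from the tutorial's curl command; adjust them for your cluster.
COORDINATOR = "http://localhost:8081"
DATASOURCE = "deletion-tutorial"


def build_mark_unused_by_interval(interval: str) -> urllib.request.Request:
    """Build (but do not send) the POST that marks segments in `interval` as unused."""
    url = f"{COORDINATOR}/druid/coordinator/v1/datasources/{DATASOURCE}/markUnused"
    body = json.dumps({"interval": interval}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_mark_unused_by_interval(
    "2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z"
)

# To send it against a running Coordinator, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```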
A rule configuration window will appear.
After that command completes, you should see that the segments for hours 18 and 19 have been disabled:
Now click the `+ New rule` button twice.
![Segments 2](../tutorials/img/tutorial-deletion-02.png "Segments 2")
In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.
In the lower rule box, select `Drop` and `forever`.
Now click `Next` and enter `tutorial` for both the user and changelog comment fields.
This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.
You can see that all 24 segments are still present in deep storage by listing the contents of `apache-druid-#{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:
Note that the hour 18 and 19 segments are still present in deep storage:
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
@ -100,9 +93,9 @@ $ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z
```
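To check a listing like this against a disabled interval, the directory names can be compared with the interval bounds. The sketch below is illustrative only: `dirs_within` is a hypothetical helper, it assumes directory names have the `start_end` shape shown in the listing, and it assumes a segment counts as "in the interval" when it is fully contained in it.

```python
from datetime import datetime

# Timestamp layout used in the directory names and in the interval string.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(text: str) -> datetime:
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def dirs_within(dir_names, interval):
    """Return the directory names whose start_end range lies inside `interval`."""
    lower, upper = (parse_ts(part) for part in interval.split("/"))
    selected = []
    for name in dir_names:
        start, end = (parse_ts(part) for part in name.split("_"))
        # Assumption: only fully contained segments are affected.
        if lower <= start and end <= upper:
            selected.append(name)
    return selected


# A few names in the same format as the `ls` output.
listing = [
    "2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z",
    "2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z",
    "2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z",
    "2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z",
]
disabled = dirs_within(listing, "2015-09-12T18:00:00.000Z/2015-09-12T20:00:00.000Z")
```

Here `disabled` holds the hour 18 and hour 19 directories, which remain on disk until a Kill Task removes them.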
## Manually disable a segment
## Disable segments by segment IDs
Let's manually disable a segment now. This will mark a segment as "unused", but not remove it from deep storage.
Let's disable some segments by their segment IDs. This will again mark the segments as "unused", but not remove them from deep storage. You can find the full segment ID for a segment in the UI, as explained below.
In the [segments view](http://localhost:8888/unified-console.html#segments), click the arrow on the left side of one of the remaining segments to expand the segment entry:
@ -110,17 +103,29 @@ In the [segments view](http://localhost:8888/unified-console.html#segments), cli
The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z` for the segment of hour 14.
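As a rough illustration of how such an ID breaks down, the sketch below splits it into datasource, interval start, interval end, and version. The `SegmentId` type and `parse_segment_id` helper are hypothetical, and the four-field layout is an assumption based on the example ID above; an ID with any additional trailing field would not parse correctly with this code.

```python
from typing import NamedTuple


class SegmentId(NamedTuple):
    datasource: str
    start: str
    end: str
    version: str


def parse_segment_id(segment_id: str) -> SegmentId:
    """Split an ID of the assumed form datasource_start_end_version."""
    # Split from the right so a datasource name containing "_" stays intact.
    datasource, start, end, version = segment_id.rsplit("_", 3)
    return SegmentId(datasource, start, end, version)


parsed = parse_segment_id(
    "deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z"
)
```

The version field reflects when the data was ingested, so it will likely differ between installations even for the same hour.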
Let's disable the hour 14 segment by sending the following DELETE request to the Coordinator, where {SEGMENT-ID} is the full segment ID shown in the info box:
Let's disable the hour 13 and 14 segments by sending a POST request to the Coordinator with this payload:
```bash
curl -XDELETE http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}
```json
{
"segmentIds":
[
"deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
"deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z"
]
}
```
After that command completes, you should see that the segment for hour 14 has been disabled:
This JSON payload is provided at `quickstart/tutorial/deletion-disable-segments.json`. Submit the POST request to the Coordinator like this:
![Segments 2](../tutorials/img/tutorial-deletion-02.png "Segments 2")
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/deletion-disable-segments.json http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/markUnused
```
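The segment IDs in the provided file end in a version timestamp, which will likely not match the segments in your own cluster. A small sketch for generating an equivalent payload from IDs copied out of the segments view follows; `segment_ids_payload` is a hypothetical helper, and the two IDs shown are just the ones from the tutorial file.

```python
import json


def segment_ids_payload(segment_ids):
    """Return JSON text of the form {"segmentIds": [...]} used by the request above."""
    if not segment_ids:
        raise ValueError("at least one segment ID is required")
    return json.dumps({"segmentIds": list(segment_ids)}, indent=2)


# Replace these with the full segment IDs shown in your own segments view.
payload = segment_ids_payload([
    "deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
    "deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z",
])
```

Writing `payload` to a file lets you pass it to curl with `-d @<file>` in the same way as the provided JSON file.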
Note that the hour 14 segment is still in deep storage:
After that command completes, you should see that the segments for hours 13 and 14 have been disabled:
![Segments 3](../tutorials/img/tutorial-deletion-03.png "Segments 3")
Note that the hour 13 and 14 segments are still in deep storage:
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
@ -165,12 +170,9 @@ After this task completes, you can see that the disabled segments have now been
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z
2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z

@ -0,0 +1,7 @@
{
"segmentIds":
[
"deletion-tutorial_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2019-05-01T17:38:46.961Z",
"deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-05-01T17:38:46.961Z"
]
}