---
layout: doc_page
title: "Tutorial: Deleting data"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# Tutorial: Deleting data
This tutorial demonstrates how to delete existing data.

For this tutorial, we'll assume you've already downloaded Apache Druid (incubating) as described in the [single-machine quickstart](index.html) and have it running on your local machine.

Completing [Tutorial: Configuring retention](../tutorials/tutorial-retention.html) first is highly recommended, as we will be using retention rules in this tutorial.
## Load initial data
In this tutorial, we will use the Wikipedia edits data, with an indexing spec that creates hourly segments. This spec is located at `quickstart/tutorial/deletion-index.json`, and it creates a datasource called `deletion-tutorial`.
Let's load this initial data:
```bash
bin/post-index-task --file quickstart/tutorial/deletion-index.json
```
When the load finishes, open [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser.
## How to permanently delete data
Permanent deletion of a Druid segment has two steps:
1. The segment must first be marked as "unused". This occurs when a segment is dropped by retention rules, and when a user manually disables a segment through the Coordinator API. This tutorial will cover both cases.
2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage.
Let's drop some segments now, first with load rules, then manually.
## Drop some data with load rules
As with the previous retention tutorial, there are currently 24 segments in the `deletion-tutorial` datasource.
Click the blue pencil icon next to `Cluster default: loadForever` for the `deletion-tutorial` datasource.

A rule configuration window will appear.

Now click the `+ New rule` button twice.

In the upper rule box, select `Load` and `by interval`, and then enter `2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z` in the field next to `by interval`. Replicants can remain at 2 in the `_default_tier`.

In the lower rule box, select `Drop` and `forever`.

Now click `Next` and enter `tutorial` for both the user and changelog comment field.
This will cause the first 12 segments of `deletion-tutorial` to be dropped. However, these dropped segments are not removed from deep storage.
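
The two rules configured in the console can also be written as JSON. The sketch below is a hedged illustration and not one of the files provided with this tutorial: it assumes the Coordinator's rules endpoint at `/druid/coordinator/v1/rules/deletion-tutorial` on port 8081, and the `RULES_JSON` variable and `post_rules` function are hypothetical names.

```bash
# Assumed JSON form of the two rules configured in the console above:
# load the second half of 2015-09-12 with 2 replicas, drop everything else.
RULES_JSON='[
  {
    "type": "loadByInterval",
    "interval": "2015-09-12T12:00:00.000Z/2015-09-13T00:00:00.000Z",
    "tieredReplicants": { "_default_tier": 2 }
  },
  { "type": "dropForever" }
]'

# Hypothetical helper: POST the rules to the Coordinator (assumed on localhost:8081).
post_rules() {
  curl -X POST -H 'Content-Type: application/json' \
    -d "$RULES_JSON" \
    http://localhost:8081/druid/coordinator/v1/rules/deletion-tutorial
}

# Print the rules for review; the helper above is defined but not called.
printf '%s\n' "$RULES_JSON"
```

Calling `post_rules` against a running cluster would replace the datasource's existing rules, so it is left uncalled here.
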
You can see that all 24 segments are still present in deep storage by listing the contents of `apache-druid-#{DRUIDVERSION}/var/druid/segments/deletion-tutorial`:
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
2015-09-12T02:00:00.000Z_2015-09-12T03:00:00.000Z
2015-09-12T03:00:00.000Z_2015-09-12T04:00:00.000Z
2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z
2015-09-12T05:00:00.000Z_2015-09-12T06:00:00.000Z
2015-09-12T06:00:00.000Z_2015-09-12T07:00:00.000Z
2015-09-12T07:00:00.000Z_2015-09-12T08:00:00.000Z
2015-09-12T08:00:00.000Z_2015-09-12T09:00:00.000Z
2015-09-12T09:00:00.000Z_2015-09-12T10:00:00.000Z
2015-09-12T10:00:00.000Z_2015-09-12T11:00:00.000Z
2015-09-12T11:00:00.000Z_2015-09-12T12:00:00.000Z
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z
2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z
2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z
2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z
```
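
Counting the entries is quicker than reading the listing. The following is a minimal sketch, assuming you run it from the Druid installation directory and that each hourly segment occupies one subdirectory, as in the listing above; `count_segment_dirs` is a hypothetical helper name.

```bash
# Hypothetical helper: count the immediate subdirectories of a deep storage path.
count_segment_dirs() {
  local dir="$1"
  # tr strips the padding that some wc implementations add around the number.
  find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Local deep storage location used in this tutorial.
SEGMENT_DIR="var/druid/segments/deletion-tutorial"
if [ -d "$SEGMENT_DIR" ]; then
  echo "segment directories: $(count_segment_dirs "$SEGMENT_DIR")"
fi
```

At this point in the tutorial the count should be 24, since dropping segments with load rules does not remove them from deep storage.
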
## Manually disable a segment
Let's manually disable a segment now. This will mark a segment as "unused", but not remove it from deep storage.
In the [segments view](http://localhost:8888/unified-console.html#segments), click the arrow on the left side of one of the remaining segments to expand the segment entry:

[Image: Segments]

The top of the info box shows the full segment ID, e.g. `deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z` for the segment of hour 14.

Let's disable the hour 14 segment by sending the following DELETE request to the Coordinator, where `{SEGMENT-ID}` is the full segment ID shown in the info box:
```bash
curl -XDELETE http://localhost:8081/druid/coordinator/v1/datasources/deletion-tutorial/segments/{SEGMENT-ID}
```
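
If you prefer not to paste the ID into the URL by hand, the request URL can be assembled from a shell variable. This is a sketch only: `segment_disable_url` is a hypothetical helper, and the segment ID below is the example ID from above, whose trailing version timestamp will differ on your machine.

```bash
# Hypothetical helper: build the Coordinator URL that disables one segment.
segment_disable_url() {
  local datasource="$1" segment_id="$2"
  printf 'http://localhost:8081/druid/coordinator/v1/datasources/%s/segments/%s\n' \
    "$datasource" "$segment_id"
}

# Example ID from the info box above; substitute the ID shown in your console.
SEGMENT_ID='deletion-tutorial_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2019-02-28T01:11:51.606Z'
DISABLE_URL="$(segment_disable_url deletion-tutorial "$SEGMENT_ID")"

# Review the URL first, then send it with: curl -XDELETE "$DISABLE_URL"
echo "$DISABLE_URL"
```

Printing the URL before sending it makes it easy to confirm that the datasource name and the full segment ID are in the right place.
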
After that command completes, you should see that the segment for hour 14 has been disabled:
[Image: Segments 2]
Note that the hour 14 segment is still in deep storage:
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T00:00:00.000Z_2015-09-12T01:00:00.000Z
2015-09-12T01:00:00.000Z_2015-09-12T02:00:00.000Z
2015-09-12T02:00:00.000Z_2015-09-12T03:00:00.000Z
2015-09-12T03:00:00.000Z_2015-09-12T04:00:00.000Z
2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z
2015-09-12T05:00:00.000Z_2015-09-12T06:00:00.000Z
2015-09-12T06:00:00.000Z_2015-09-12T07:00:00.000Z
2015-09-12T07:00:00.000Z_2015-09-12T08:00:00.000Z
2015-09-12T08:00:00.000Z_2015-09-12T09:00:00.000Z
2015-09-12T09:00:00.000Z_2015-09-12T10:00:00.000Z
2015-09-12T10:00:00.000Z_2015-09-12T11:00:00.000Z
2015-09-12T11:00:00.000Z_2015-09-12T12:00:00.000Z
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z
2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z
2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z
2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z
```
## Run a kill task
Now that we have disabled some segments, we can submit a Kill Task, which will delete the disabled segments from metadata and deep storage.
A Kill Task spec has been provided at `quickstart/tutorial/deletion-kill.json`. Submit this task to the Overlord with the following command:
```bash
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/deletion-kill.json http://localhost:8090/druid/indexer/v1/task
```
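
The contents of `deletion-kill.json` are not shown in this tutorial. As a hedged illustration only, a Kill Task spec names a datasource and an interval; the sketch below writes such a spec to a temporary file. The interval value and the `KILL_SPEC` variable are assumptions for illustration and may not match the provided file.

```bash
# Write an illustrative Kill Task spec to a temporary file.
# Assumption: the interval covers the whole tutorial day; only segments already
# marked "unused" inside it would be deleted.
KILL_SPEC="$(mktemp "${TMPDIR:-/tmp}/kill-spec.XXXXXX")"
cat > "$KILL_SPEC" <<'EOF'
{
  "type": "kill",
  "dataSource": "deletion-tutorial",
  "interval": "2015-09-12/2015-09-13"
}
EOF

# Show the spec; it could be submitted with the same curl command as above,
# using -d @"$KILL_SPEC" in place of the provided file.
cat "$KILL_SPEC"
```

Keeping the interval as narrow as the data you intend to remove limits what a Kill Task can delete if other segments in the datasource are later marked "unused".
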
After this task completes, you can see that the disabled segments have now been removed from deep storage:
```bash
$ ls -l1 var/druid/segments/deletion-tutorial/
2015-09-12T12:00:00.000Z_2015-09-12T13:00:00.000Z
2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z
2015-09-12T15:00:00.000Z_2015-09-12T16:00:00.000Z
2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z
2015-09-12T17:00:00.000Z_2015-09-12T18:00:00.000Z
2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z
2015-09-12T19:00:00.000Z_2015-09-12T20:00:00.000Z
2015-09-12T20:00:00.000Z_2015-09-12T21:00:00.000Z
2015-09-12T21:00:00.000Z_2015-09-12T22:00:00.000Z
2015-09-12T22:00:00.000Z_2015-09-12T23:00:00.000Z
2015-09-12T23:00:00.000Z_2015-09-13T00:00:00.000Z
```