druid/docs/content/operations/insert-segment-to-db.md

---
layout: doc_page
---
# insert-segment-to-db Tool

`insert-segment-to-db` is a tool that inserts segments into Druid metadata storage. It is intended to be used
to update the segment table in metadata storage after segments have been manually migrated from one place to another.
It can also be used to insert missing segments into Druid, or even to recover metadata storage by telling it where the
segments are stored.

Note: This tool expects the Druid cluster to be running in a "safe" mode, where there are no active tasks that could interfere
with the segments being inserted. Users can optionally bring down the cluster to make sure nothing is interfering.

For the tool to work, you must provide the metadata storage credentials and the deep storage type, either as JVM arguments
or in a `runtime.properties` file. Specifically, this tool needs to know:

`druid.metadata.storage.type`

`druid.metadata.storage.connector.connectURI`

`druid.metadata.storage.connector.user`

`druid.metadata.storage.connector.password`

`druid.storage.type`

Besides the properties above, you also need to specify the location where the segments are stored and whether you want to
update descriptor.json (`partitionNum_descriptor.json` for HDFS data storage). These two can be provided through command line arguments.

`--workingDir` (Required)

    The directory URI where segments are stored. This tool will recursively look for segments underneath this directory
    and insert/update these segments in metadata storage.
    Attention: workingDir must be a complete URI, which means it must be prefixed with scheme type. For example,
    hdfs://hostname:port/segment_directory

`--updateDescriptor` (Optional)

    If set to true, this tool will update the `loadSpec` field in `descriptor.json` (`partitionNum_descriptor.json` for HDFS data storage) if the path in `loadSpec` is different from
    where `descriptor.json` (`partitionNum_descriptor.json` for HDFS data storage) was found. Default value is `true`.
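
The complete-URI requirement on `--workingDir` can be checked before launching the tool. The following is a minimal sketch, not part of the tool itself: `has_uri_scheme` is a hypothetical helper name, and the check only looks for a leading `scheme://` prefix as described for the HDFS case above.

```shell
# Hypothetical helper (not part of Druid): succeeds if the argument
# starts with a letter and contains "://", i.e. looks like "<scheme>://...".
has_uri_scheme() {
  case "$1" in
    [A-Za-z]*://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example value taken from the option description above.
WORKING_DIR="hdfs://hostname:port/segment_directory"

if has_uri_scheme "$WORKING_DIR"; then
  echo "workingDir has a scheme: $WORKING_DIR"
else
  echo "workingDir is missing a scheme prefix: $WORKING_DIR" >&2
fi
```

This is only a rough sanity check; it does not verify that the host, port, or path actually exist.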

Note: you will also need to load different Druid extensions per the metadata and deep storage you use. For example, if you
use `mysql` as metadata storage and HDFS as deep storage, you should load `mysql-metadata-storage` and `druid-hdfs-storage`
extensions.


Example:

Suppose your metadata storage is `mysql` and you've migrated some segments to a directory in HDFS that looks
like this:

```
Directory path: /druid/storage/wikipedia

├── 2013-08-31T000000.000Z_2013-09-01T000000.000Z
│   └── 2015-10-21T22_07_57.074Z
│           ├── 0_descriptor.json
│           └── 0_index.zip
├── 2013-09-01T000000.000Z_2013-09-02T000000.000Z
│   └── 2015-10-21T22_07_57.074Z
│           ├── 0_descriptor.json
│           └── 0_index.zip
├── 2013-09-02T000000.000Z_2013-09-03T000000.000Z
│   └── 2015-10-21T22_07_57.074Z
│           ├── 0_descriptor.json
│           └── 0_index.zip
└── 2013-09-03T000000.000Z_2013-09-04T000000.000Z
    └── 2015-10-21T22_07_57.074Z
            ├── 0_descriptor.json
            └── 0_index.zip
```
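
To get a feel for what a recursive search under such a directory finds, the layout can be reproduced on a local filesystem. This is only an illustrative sketch: it uses a temporary local directory as a stand-in for HDFS and plain `find` rather than the tool's own logic. On a real cluster, `hdfs dfs -ls -R` gives a comparable recursive listing.

```shell
# Recreate the example layout in a local temporary directory (stand-in for HDFS).
ROOT="$(mktemp -d)/druid/storage/wikipedia"
for interval in \
  2013-08-31T000000.000Z_2013-09-01T000000.000Z \
  2013-09-01T000000.000Z_2013-09-02T000000.000Z \
  2013-09-02T000000.000Z_2013-09-03T000000.000Z \
  2013-09-03T000000.000Z_2013-09-04T000000.000Z
do
  version_dir="$ROOT/$interval/2015-10-21T22_07_57.074Z"
  mkdir -p "$version_dir"
  # Empty placeholder files; real segments contain actual data.
  touch "$version_dir/0_descriptor.json" "$version_dir/0_index.zip"
done

# Recursively list descriptor files. The pattern matches both
# descriptor.json and partitionNum_descriptor.json.
DESCRIPTORS="$(find "$ROOT" -type f -name '*descriptor.json' | sort)"
echo "$DESCRIPTORS"
```

Each listed descriptor corresponds to one segment in the example layout.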

To load all these segments into `mysql`, run the command below:

```
java \
-Ddruid.metadata.storage.type=mysql \
-Ddruid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid \
-Ddruid.metadata.storage.connector.user=druid \
-Ddruid.metadata.storage.connector.password=diurd \
-Ddruid.extensions.loadList=[\"mysql-metadata-storage\",\"druid-hdfs-storage\"] \
-Ddruid.storage.type=hdfs \
-cp $DRUID_CLASSPATH \
io.druid.cli.Main tools insert-segment-to-db --workingDir hdfs://host:port//druid/storage/wikipedia --updateDescriptor true
```

In this example, the `mysql` metadata storage settings and the deep storage type are provided through JVM arguments. You can
instead put all of them in a `runtime.properties` file and include it in the Druid classpath. Note that we also include `mysql-metadata-storage`
and `druid-hdfs-storage` in the extension list.
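
As a rough sketch of the `runtime.properties` alternative, the snippet below writes the same settings to a file. The directory name `insert-segment-conf` is a hypothetical choice, and the values are the ones from the example command. How the directory is added to the classpath depends on your deployment, so the invocation is shown only as a comment.

```shell
# Hypothetical config directory; adjust to your deployment.
CONF_DIR="${CONF_DIR:-./insert-segment-conf}"
mkdir -p "$CONF_DIR"

# Quoted heredoc keeps the backslash-escaped colons exactly as written.
cat > "$CONF_DIR/runtime.properties" <<'EOF'
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd
druid.extensions.loadList=["mysql-metadata-storage","druid-hdfs-storage"]
druid.storage.type=hdfs
EOF

# Assumed invocation, with the config directory prepended to the classpath:
#   java -cp "$CONF_DIR:$DRUID_CLASSPATH" io.druid.cli.Main tools insert-segment-to-db \
#     --workingDir hdfs://host:port//druid/storage/wikipedia --updateDescriptor true
```

Keeping the credentials in a file rather than on the command line also keeps them out of shell history and process listings.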

After running this command, the segments table in `mysql` should store the new location for each segment we just inserted.
Note that for segments stored in HDFS, the Druid configuration must include `core-site.xml`, as described in [Druid Docs](http://druid.io/docs/latest/tutorials/cluster.html), because the new location is stored as a relative path.

It is also possible to use `s3` as deep storage. In order to work with it, specify `s3` as deep storage type and load 
[`druid-s3-extensions`](../development/extensions-core/s3.html) as an extension.

```
java \
-Ddruid.metadata.storage.type=mysql \
-Ddruid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid \
-Ddruid.metadata.storage.connector.user=druid \
-Ddruid.metadata.storage.connector.password=diurd \
-Ddruid.extensions.loadList=[\"mysql-metadata-storage\",\"druid-s3-extensions\"] \
-Ddruid.storage.type=s3 \
-Ddruid.s3.accessKey=... \
-Ddruid.s3.secretKey=... \
-Ddruid.storage.bucket=your-bucket \
-Ddruid.storage.baseKey=druid/storage/wikipedia \
-Ddruid.storage.maxListingLength=1000 \
-cp $DRUID_CLASSPATH \
io.druid.cli.Main tools insert-segment-to-db --workingDir "druid/storage/wikipedia" --updateDescriptor true
```

Note that you can provide the location of segments with either `druid.storage.baseKey` or `--workingDir`. If both are
specified, `--workingDir` takes priority. `druid.storage.maxListingLength` sets the maximum number of entries returned
per partial listing when requesting an object listing from `s3`; it defaults to 1000.
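
The precedence between the two location settings can be illustrated with a small sketch. `effective_segment_location` is a hypothetical helper that only mirrors the rule described above; it is not the tool's actual code.

```shell
# Hypothetical helper mirroring the documented precedence:
# use --workingDir when given, otherwise fall back to druid.storage.baseKey.
effective_segment_location() {
  working_dir="$1"
  base_key="$2"
  if [ -n "$working_dir" ]; then
    printf '%s\n' "$working_dir"
  else
    printf '%s\n' "$base_key"
  fi
}

# Both set: the --workingDir value is used.
effective_segment_location "druid/storage/wikipedia" "druid/storage/other"

# Only druid.storage.baseKey set: it is used as the location.
effective_segment_location "" "druid/storage/wikipedia"
```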