---
id: dump-segment
title: "dump-segment tool"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
The DumpSegment tool can be used to dump the metadata or contents of an Apache Druid (incubating) segment for debugging purposes. Note that the
dump is not necessarily a full-fidelity translation of the segment. In particular, not all metadata is included, and
complex metric values may not be complete.

To run the tool, point it at a segment directory and provide a file for writing output:

```
java -classpath "/my/druid/lib/*" -Ddruid.extensions.loadList="[]" org.apache.druid.cli.Main \
  tools dump-segment \
  --directory /home/druid/path/to/segment/ \
  --out /home/druid/output.txt
```
### Output format
#### Data dumps
By default, or with `--dump rows`, this tool dumps rows of the segment as newline-separated JSON objects, with one
object per line, using the default serialization for each column. Normally all columns are included, but if you like,
you can limit the dump to specific columns with `--column name`.

For example, one line might look like this when pretty-printed:

```
{
  "__time": 1442018818771,
  "added": 36,
  "channel": "#en.wikipedia",
  "cityName": null,
  "comment": "added project",
  "count": 1,
  "countryIsoCode": null,
  "countryName": null,
  "deleted": 0,
  "delta": 36,
  "isAnonymous": "false",
  "isMinor": "false",
  "isNew": "false",
  "isRobot": "false",
  "isUnpatrolled": "false",
  "iuser": "00001553",
  "metroCode": null,
  "namespace": "Talk",
  "page": "Talk:Oswald Tilghman",
  "regionIsoCode": null,
  "regionName": null,
  "user": "GELongstreet"
}
```
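
Since each line of a rows dump is one standalone JSON object, the output is easy to post-process with a few lines of script. The sketch below (the `count_by` helper is ours, not part of the tool) groups dumped rows by the value of one column; the sample input is the row shown above, compacted to a single line:

```python
import json

def count_by(lines, column):
    """Count dumped rows grouped by the value of one column.

    Expects an iterable of newline-separated JSON objects, as produced
    by --dump rows.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        key = row.get(column)
        counts[key] = counts.get(key, 0) + 1
    return counts

# One compacted row borrowed from the sample above.
sample = ['{"__time": 1442018818771, "channel": "#en.wikipedia", "user": "GELongstreet"}']
print(count_by(sample, "channel"))
```

In practice you would pass the dump file itself, e.g. `count_by(open("/home/druid/output.txt"), "channel")`.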

#### Metadata dumps

With `--dump metadata`, this tool dumps metadata instead of rows. Metadata dumps generated by this tool are in the same
format as returned by the [SegmentMetadata query](../querying/segmentmetadataquery.md).

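Because the metadata dump is JSON in the SegmentMetadata result format, standard JSON tooling applies. A minimal sketch, assuming a heavily trimmed, hypothetical dump containing only a `columns` map (real output carries many more fields):

```python
import json

# A heavily trimmed, hypothetical metadata dump; real output follows the
# SegmentMetadata query result format and includes many more fields.
metadata = json.loads("""
{
  "columns": {
    "__time": {"type": "LONG"},
    "channel": {"type": "STRING"},
    "added": {"type": "LONG"}
  }
}
""")

# Map each column name to its reported type.
column_types = {name: info["type"] for name, info in metadata["columns"].items()}
print(column_types)
```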
#### Bitmap dumps

With `--dump bitmaps`, this tool dumps bitmap indexes instead of rows. Bitmap dumps generated by this tool include
dictionary-encoded string columns only. The output contains a field "bitmapSerdeFactory" describing the type of bitmaps
used in the segment, and a field "bitmaps" containing the bitmaps for each value of each column. These are base64
encoded by default, but you can also dump them as lists of row numbers with `--decompress-bitmaps`.

Normally all columns are included, but if you like, you can limit the dump to specific columns with `--column name`.

Sample output:

```
{
  "bitmapSerdeFactory": {
    "type": "concise"
  },
  "bitmaps": {
    "isRobot": {
      "false": "//aExfu+Nv3X...",
      "true": "gAl7OoRByQ..."
    }
  }
}
```
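
With `--decompress-bitmaps`, each value maps to a plain array of matching row numbers rather than a base64 string, which makes the dump directly scriptable. A sketch using a tiny, hypothetical decompressed dump (the row numbers and the `rows_for` helper are illustrative, not from the page):

```python
import json

# A tiny, hypothetical dump produced with --decompress-bitmaps: each value
# maps to an array of matching row numbers instead of a base64 string.
dump = json.loads("""
{
  "bitmapSerdeFactory": {"type": "concise"},
  "bitmaps": {
    "isRobot": {
      "false": [0, 1, 3],
      "true": [2]
    }
  }
}
""")

def rows_for(dump, column, value):
    """Row numbers whose `column` equals `value` (helper name is ours)."""
    return dump["bitmaps"][column][value]

print(rows_for(dump, "isRobot", "true"))
```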
### Command line arguments

|argument|description|required?|
|--------|-----------|---------|
|--directory file|Directory containing segment data. This could be generated by unzipping an "index.zip" from deep storage.|yes|
|--out file|File to write to, or omit to write to stdout.|no|
|--dump TYPE|Dump either 'rows' (default), 'metadata', or 'bitmaps'.|no|
|--column columnName|Column to include. Specify multiple times for multiple columns, or omit to include all columns.|no|
|--filter json|JSON-encoded [query filter](../querying/filters.md). Omit to include all rows. Only used if dumping rows.|no|
|--time-iso8601|Format __time column in ISO8601 format rather than long. Only used if dumping rows.|no|
|--decompress-bitmaps|Dump bitmaps as arrays rather than base64-encoded compressed bitmaps. Only used if dumping bitmaps.|no|
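
The `--filter` argument takes a JSON-encoded query filter, which needs careful shell quoting because of the embedded double quotes. A sketch of building that argument programmatically; the selector filter below is an illustrative assumption, and any filter from the query filter documentation should work the same way:

```python
import json
import shlex

# An example Druid query filter; selector filters match one dimension value.
query_filter = {"type": "selector", "dimension": "isRobot", "value": "true"}

# json.dumps produces the JSON text; shlex.quote protects it from the shell.
arg = "--filter " + shlex.quote(json.dumps(query_filter))
print(arg)
```

The printed string can be appended verbatim to the `tools dump-segment` command line shown earlier.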