---
layout: doc_page
---

# DumpSegment tool

The DumpSegment tool can be used to dump the metadata or contents of a segment for debugging purposes. Note that the
dump is not necessarily a full-fidelity translation of the segment. In particular, not all metadata is included, and
complex metric values may not be complete.

To run the tool, point it at a segment directory and provide a file for writing output:

```
java org.apache.druid.cli.Main tools dump-segment \
  --directory /home/druid/path/to/segment/ \
  --out /home/druid/output.txt
```

### Output format

#### Data dumps

By default, or with `--dump rows`, this tool dumps the rows of the segment as newline-separated JSON objects, with one
object per line, using the default serialization for each column. All columns are included by default; to limit the
dump to specific columns, use `--column name`.

For example, one line might look like this when pretty-printed:

```
{
  "__time": 1442018818771,
  "added": 36,
  "channel": "#en.wikipedia",
  "cityName": null,
  "comment": "added project",
  "count": 1,
  "countryIsoCode": null,
  "countryName": null,
  "deleted": 0,
  "delta": 36,
  "isAnonymous": "false",
  "isMinor": "false",
  "isNew": "false",
  "isRobot": "false",
  "isUnpatrolled": "false",
  "iuser": "00001553",
  "metroCode": null,
  "namespace": "Talk",
  "page": "Talk:Oswald Tilghman",
  "regionIsoCode": null,
  "regionName": null,
  "user": "GELongstreet"
}
```

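Because a row dump holds one JSON object per line, it can be consumed line by line rather than parsed as a single document. The following is a minimal Python sketch of that idea, not part of the tool itself: the helper name `read_row_dump` and the two sample lines are hypothetical, and the sample rows use only a subset of the columns from the example above.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_row_dump(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a row dump."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. at the end of the file
            yield json.loads(line)


# Two hypothetical dump lines (illustrative values, not real tool output).
sample_dump = [
    '{"__time": 1442018818771, "added": 36, "channel": "#en.wikipedia", "cityName": null}',
    '{"__time": 1442018820496, "added": 17, "channel": "#ca.wikipedia", "cityName": null}',
]

rows = list(read_row_dump(sample_dump))
total_added = sum(row["added"] for row in rows)
print(len(rows), total_added)  # 2 53
```

To read an actual dump, pass an open file instead of the list, for example `read_row_dump(open("/home/druid/output.txt", encoding="utf-8"))`. JSON `null` values arrive as Python `None`.
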
#### Metadata dumps

With `--dump metadata`, this tool dumps metadata instead of rows. Metadata dumps generated by this tool are in the same
format as returned by the [SegmentMetadata query](../querying/segmentmetadataquery.html).

#### Bitmap dumps

With `--dump bitmaps`, this tool dumps bitmap indexes instead of rows. Bitmap dumps generated by this tool include
dictionary-encoded string columns only. The output contains a field "bitmapSerdeFactory" describing the type of bitmaps
used in the segment, and a field "bitmaps" containing the bitmaps for each value of each column. These are
base64-encoded by default, but you can also dump them as lists of row numbers with `--decompress-bitmaps`.

All columns are included by default; to limit the dump to specific columns, use `--column name`.

Sample output:

```
{
  "bitmapSerdeFactory": {
    "type": "concise"
  },
  "bitmaps": {
    "isRobot": {
      "false": "//aExfu+Nv3X...",
      "true": "gAl7OoRByQ..."
    }
  }
}
```

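When bitmaps are dumped as lists of row numbers with `--decompress-bitmaps`, the dump can be used to count how many rows carry each value of a column. The Python sketch below assumes the output keeps the structure of the sample above, with each base64 string replaced by a JSON array of row numbers. That layout is an assumption made for illustration, and both the helper name `count_rows_per_value` and the sample data are hypothetical.

```python
import json
from typing import Dict


def count_rows_per_value(dump_text: str) -> Dict[str, Dict[str, int]]:
    """Map each column to {value: number of rows containing that value}."""
    dump = json.loads(dump_text)
    return {
        column: {value: len(row_numbers) for value, row_numbers in values.items()}
        for column, values in dump["bitmaps"].items()
    }


# Hypothetical decompressed dump: arrays of row numbers instead of base64.
sample = """
{
  "bitmapSerdeFactory": {"type": "concise"},
  "bitmaps": {
    "isRobot": {
      "false": [0, 1, 3, 4],
      "true": [2]
    }
  }
}
"""

counts = count_rows_per_value(sample)
print(counts)  # {'isRobot': {'false': 4, 'true': 1}}
```
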
### Command line arguments

|argument|description|required?|
|--------|-----------|---------|
|--directory file|Directory containing segment data. This could be generated by unzipping an "index.zip" from deep storage.|yes|
|--out file|File to write to, or omit to write to stdout.|no|
|--dump TYPE|Dump either 'rows' (default), 'metadata', or 'bitmaps'.|no|
|--column columnName|Column to include. Specify multiple times for multiple columns, or omit to include all columns.|no|
|--filter json|JSON-encoded [query filter](../querying/filters.html). Omit to include all rows. Only used if dumping rows.|no|
|--time-iso8601|Format the __time column in ISO8601 format rather than as a long. Only used if dumping rows.|no|
|--decompress-bitmaps|Dump bitmaps as arrays rather than base64-encoded compressed bitmaps. Only used if dumping bitmaps.|no|

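
The arguments above can be combined in one invocation. As a rough Python sketch, the hypothetical helper `build_dump_segment_command` below assembles an argument list from them, serializing the `--filter` value with `json.dumps` so that no manual shell quoting is needed. It writes the output flag as `--out`, as in the invocation near the top of this page, and the filter shown assumes the selector filter type from the linked filters page. The Java classpath is not shown on this page, so a real invocation will likely need additional JVM arguments.

```python
import json
from typing import Dict, List, Optional, Sequence


def build_dump_segment_command(
    directory: str,
    out: Optional[str] = None,
    dump: str = "rows",
    columns: Sequence[str] = (),
    filter_spec: Optional[Dict[str, object]] = None,
    time_iso8601: bool = False,
    decompress_bitmaps: bool = False,
) -> List[str]:
    """Build an argv list for dump-segment (hypothetical helper)."""
    cmd = [
        "java", "org.apache.druid.cli.Main", "tools", "dump-segment",
        "--directory", directory,
        "--dump", dump,
    ]
    if out is not None:  # omit to write to stdout
        cmd += ["--out", out]
    for column in columns:  # the flag is repeated once per column
        cmd += ["--column", column]
    if filter_spec is not None:  # only used when dumping rows
        cmd += ["--filter", json.dumps(filter_spec)]
    if time_iso8601:  # only used when dumping rows
        cmd.append("--time-iso8601")
    if decompress_bitmaps:  # only used when dumping bitmaps
        cmd.append("--decompress-bitmaps")
    return cmd


command = build_dump_segment_command(
    "/home/druid/path/to/segment/",
    columns=["page", "user"],
    filter_spec={"type": "selector", "dimension": "isRobot", "value": "true"},
    time_iso8601=True,
)
print(" ".join(command))
```

The resulting list can be passed directly to `subprocess.run(command)`, which avoids shell quoting issues with the JSON filter.
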