mirror of https://github.com/apache/druid.git
157 lines
5.3 KiB
Markdown
157 lines
5.3 KiB
Markdown
---
|
|
layout: doc_page
|
|
---
|
|
|
|
# Scan query
|
|
Scan query returns raw Druid rows in streaming mode.
|
|
|
|
```json
|
|
{
|
|
"queryType": "scan",
|
|
"dataSource": "wikipedia",
|
|
"resultFormat": "list",
|
|
"columns":[],
|
|
"intervals": [
|
|
"2013-01-01/2013-01-02"
|
|
],
|
|
"batchSize":20480,
|
|
"limit":5
|
|
}
|
|
```
|
|
|
|
There are several main parts to a scan query:
|
|
|
|
|property|description|required?|
|
|
|--------|-----------|---------|
|
|
|queryType|This String should always be "scan"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|
|
|dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.html) for more information.|yes|
|
|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|
|
|resultFormat|How result represented, list or compactedList or valueVector. Currently only `list` and `compactedList` are supported. Default is `list`|no|
|
|
|filter|See [Filters](../querying/filters.html)|no|
|
|
|columns|A String array of dimensions and metrics to scan. If left empty, all dimensions and metrics are returned.|no|
|
|
|batchSize|How many rows buffered before return to client. Default is `20480`|no|
|
|
|limit|How many rows to return. If not specified, all rows will be returned.|no|
|
|
|context|An additional JSON Object which can be used to specify certain flags.|no|
|
|
|
|
The format of the result when resultFormat equals to `list`:
|
|
|
|
```json
|
|
[{
|
|
"segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
|
|
"columns" : [
|
|
"timestamp",
|
|
"robot",
|
|
"namespace",
|
|
"anonymous",
|
|
"unpatrolled",
|
|
"page",
|
|
"language",
|
|
"newpage",
|
|
"user",
|
|
"count",
|
|
"added",
|
|
"delta",
|
|
"variation",
|
|
"deleted"
|
|
],
|
|
"events" : [ {
|
|
"timestamp" : "2013-01-01T00:00:00.000Z",
|
|
"robot" : "1",
|
|
"namespace" : "article",
|
|
"anonymous" : "0",
|
|
"unpatrolled" : "0",
|
|
"page" : "11._korpus_(NOVJ)",
|
|
"language" : "sl",
|
|
"newpage" : "0",
|
|
"user" : "EmausBot",
|
|
"count" : 1.0,
|
|
"added" : 39.0,
|
|
"delta" : 39.0,
|
|
"variation" : 39.0,
|
|
"deleted" : 0.0
|
|
}, {
|
|
"timestamp" : "2013-01-01T00:00:00.000Z",
|
|
"robot" : "0",
|
|
"namespace" : "article",
|
|
"anonymous" : "0",
|
|
"unpatrolled" : "0",
|
|
"page" : "112_U.S._580",
|
|
"language" : "en",
|
|
"newpage" : "1",
|
|
"user" : "MZMcBride",
|
|
"count" : 1.0,
|
|
"added" : 70.0,
|
|
"delta" : 70.0,
|
|
"variation" : 70.0,
|
|
"deleted" : 0.0
|
|
}, {
|
|
"timestamp" : "2013-01-01T00:00:00.000Z",
|
|
"robot" : "0",
|
|
"namespace" : "article",
|
|
"anonymous" : "0",
|
|
"unpatrolled" : "0",
|
|
"page" : "113_U.S._243",
|
|
"language" : "en",
|
|
"newpage" : "1",
|
|
"user" : "MZMcBride",
|
|
"count" : 1.0,
|
|
"added" : 77.0,
|
|
"delta" : 77.0,
|
|
"variation" : 77.0,
|
|
"deleted" : 0.0
|
|
}, {
|
|
"timestamp" : "2013-01-01T00:00:00.000Z",
|
|
"robot" : "0",
|
|
"namespace" : "article",
|
|
"anonymous" : "0",
|
|
"unpatrolled" : "0",
|
|
"page" : "113_U.S._73",
|
|
"language" : "en",
|
|
"newpage" : "1",
|
|
"user" : "MZMcBride",
|
|
"count" : 1.0,
|
|
"added" : 70.0,
|
|
"delta" : 70.0,
|
|
"variation" : 70.0,
|
|
"deleted" : 0.0
|
|
}, {
|
|
"timestamp" : "2013-01-01T00:00:00.000Z",
|
|
"robot" : "0",
|
|
"namespace" : "article",
|
|
"anonymous" : "0",
|
|
"unpatrolled" : "0",
|
|
"page" : "113_U.S._756",
|
|
"language" : "en",
|
|
"newpage" : "1",
|
|
"user" : "MZMcBride",
|
|
"count" : 1.0,
|
|
"added" : 68.0,
|
|
"delta" : 68.0,
|
|
"variation" : 68.0,
|
|
"deleted" : 0.0
|
|
} ]
|
|
} ]
|
|
```
|
|
|
|
The format of the result when resultFormat equals to `compactedList`:
|
|
|
|
```json
|
|
[{
|
|
"segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
|
|
"columns" : [
|
|
"timestamp", "robot", "namespace", "anonymous", "unpatrolled", "page", "language", "newpage", "user", "count", "added", "delta", "variation", "deleted"
|
|
],
|
|
"events" : [
|
|
["2013-01-01T00:00:00.000Z", "1", "article", "0", "0", "11._korpus_(NOVJ)", "sl", "0", "EmausBot", 1.0, 39.0, 39.0, 39.0, 0.0],
|
|
["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "112_U.S._580", "en", "1", "MZMcBride", 1.0, 70.0, 70.0, 70.0, 0.0],
|
|
["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._243", "en", "1", "MZMcBride", 1.0, 77.0, 77.0, 77.0, 0.0],
|
|
["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._73", "en", "1", "MZMcBride", 1.0, 70.0, 70.0, 70.0, 0.0],
|
|
["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._756", "en", "1", "MZMcBride", 1.0, 68.0, 68.0, 68.0, 0.0]
|
|
]
|
|
} ]
|
|
```
|
|
|
|
The biggest difference between select query and scan query is that, scan query doesn't retain all rows in memory before rows can be returned to client.
|
|
It will cause memory pressure if too many rows required by select query.
|
|
Scan query doesn't have this issue.
|
|
Scan query can return all rows without issuing another pagination query, which is extremely useful when query against historical or realtime node directly. |