---
layout: doc_page
---

# Scan query
Scan query returns raw Druid rows in streaming mode.

```json
 {
   "queryType": "scan",
   "dataSource": "wikipedia",
   "resultFormat": "list",
   "columns":[],
   "intervals": [
     "2013-01-01/2013-01-02"
   ],
   "batchSize":20480,
   "limit":5
 }
```

There are several main parts to a scan query:

|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be "scan"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String or Object defining the data source to query, very similar to a table in a relational database. See [DataSource](../querying/datasource.html) for more information.|yes|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|resultFormat|How the results are represented: `list`, `compactedList`, or `valueVector`. Currently only `list` and `compactedList` are supported. Default is `list`.|no|
|filter|See [Filters](../querying/filters.html)|no|
|columns|A String array of dimensions and metrics to scan. If left empty, all dimensions and metrics are returned.|no|
|batchSize|How many rows are buffered before being returned to the client. Default is `20480`.|no|
|limit|How many rows to return. If not specified, all rows will be returned.|no|
|context|An additional JSON Object which can be used to specify certain flags.|no|
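
As an illustration of how the optional properties combine, the following Python sketch assembles a scan query that restricts the columns, applies a filter, and asks for `compactedList` output. The helper name `build_scan_query` is hypothetical and not part of Druid. The selector filter is only one possible filter; see the Filters page linked above for the full set.

```python
import json

# Formats the table above lists as currently supported.
SUPPORTED_FORMATS = ("list", "compactedList")


def build_scan_query(data_source, intervals, columns=None, filter_spec=None,
                     result_format="list", batch_size=20480, limit=None):
    """Assemble a scan query body as a plain dict (hypothetical helper)."""
    if result_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported resultFormat: %r" % (result_format,))
    query = {
        "queryType": "scan",             # always "scan" for this query type
        "dataSource": data_source,
        "intervals": list(intervals),
        "resultFormat": result_format,
        "columns": list(columns or []),  # empty list means all columns
        "batchSize": batch_size,
    }
    # Optional properties are only included when the caller sets them.
    if filter_spec is not None:
        query["filter"] = filter_spec
    if limit is not None:
        query["limit"] = limit
    return query


query = build_scan_query(
    "wikipedia",
    ["2013-01-01/2013-01-02"],
    columns=["timestamp", "page", "user", "added"],
    filter_spec={"type": "selector", "dimension": "language", "value": "en"},
    result_format="compactedList",
    limit=5,
)
print(json.dumps(query, indent=2))
```

Leaving out `limit` and `filter_spec` produces a query that returns every row in the interval, which matches the defaults described in the table.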

The format of the result when `resultFormat` equals `list`:

```json
 [{
    "segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
    "columns" : [
      "timestamp",
      "robot",
      "namespace",
      "anonymous",
      "unpatrolled",
      "page",
      "language",
      "newpage",
      "user",
      "count",
      "added",
      "delta",
      "variation",
      "deleted"
    ],
    "events" : [ {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "1",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "11._korpus_(NOVJ)",
        "language" : "sl",
        "newpage" : "0",
        "user" : "EmausBot",
        "count" : 1.0,
        "added" : 39.0,
        "delta" : 39.0,
        "variation" : 39.0,
        "deleted" : 0.0
    }, {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "0",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "112_U.S._580",
        "language" : "en",
        "newpage" : "1",
        "user" : "MZMcBride",
        "count" : 1.0,
        "added" : 70.0,
        "delta" : 70.0,
        "variation" : 70.0,
        "deleted" : 0.0
    }, {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "0",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "113_U.S._243",
        "language" : "en",
        "newpage" : "1",
        "user" : "MZMcBride",
        "count" : 1.0,
        "added" : 77.0,
        "delta" : 77.0,
        "variation" : 77.0,
        "deleted" : 0.0
    }, {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "0",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "113_U.S._73",
        "language" : "en",
        "newpage" : "1",
        "user" : "MZMcBride",
        "count" : 1.0,
        "added" : 70.0,
        "delta" : 70.0,
        "variation" : 70.0,
        "deleted" : 0.0
    }, {
        "timestamp" : "2013-01-01T00:00:00.000Z",
        "robot" : "0",
        "namespace" : "article",
        "anonymous" : "0",
        "unpatrolled" : "0",
        "page" : "113_U.S._756",
        "language" : "en",
        "newpage" : "1",
        "user" : "MZMcBride",
        "count" : 1.0,
        "added" : 68.0,
        "delta" : 68.0,
        "variation" : 68.0,
        "deleted" : 0.0
    } ]
} ]
```

The format of the result when `resultFormat` equals `compactedList`:

```json
 [{
    "segmentId" : "wikipedia_editstream_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9",
    "columns" : [
      "timestamp", "robot", "namespace", "anonymous", "unpatrolled", "page", "language", "newpage", "user", "count", "added", "delta", "variation", "deleted"
    ],
    "events" : [
     ["2013-01-01T00:00:00.000Z", "1", "article", "0", "0", "11._korpus_(NOVJ)", "sl", "0", "EmausBot", 1.0, 39.0, 39.0, 39.0, 0.0],
     ["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "112_U.S._580", "en", "1", "MZMcBride", 1.0, 70.0, 70.0, 70.0, 0.0],
     ["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._243", "en", "1", "MZMcBride", 1.0, 77.0, 77.0, 77.0, 0.0],
     ["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._73", "en", "1", "MZMcBride", 1.0, 70.0, 70.0, 70.0, 0.0],
     ["2013-01-01T00:00:00.000Z", "0", "article", "0", "0", "113_U.S._756", "en", "1", "MZMcBride", 1.0, 68.0, 68.0, 68.0, 0.0]
    ]
} ]
```
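
The two formats carry the same data: in `compactedList`, each event is an array whose values follow the order of the `columns` array. A minimal Python sketch of turning a `compactedList` batch back into `list`-style events is shown here. The function name `expand_compacted` and the shortened sample batch are illustrative assumptions, not part of Druid.

```python
def expand_compacted(batch):
    """Turn one compactedList batch into list-style events (dicts)."""
    columns = batch["columns"]
    # Pair each value with the column name at the same position.
    return [dict(zip(columns, event)) for event in batch["events"]]


batch = {
    "segmentId": "example_segment",  # placeholder, not a real segment id
    "columns": ["timestamp", "page", "added"],
    "events": [
        ["2013-01-01T00:00:00.000Z", "112_U.S._580", 70.0],
        ["2013-01-01T00:00:00.000Z", "113_U.S._243", 77.0],
    ],
}

rows = expand_compacted(batch)
print(rows[0]["page"])  # prints 112_U.S._580
```

Because the column names appear once per batch instead of once per event, `compactedList` produces a smaller JSON payload for the same rows.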

The biggest difference between the select query and the scan query is that the scan query doesn't retain all rows in memory before returning them to the client.
A select query that requests too many rows causes memory pressure; the scan query doesn't have this issue.
The scan query can return all rows without issuing another pagination query, which is extremely useful when querying a historical or realtime node directly.
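
On the client side, the same idea can be sketched as a generator that walks the result batches and yields rows one at a time, so the client never holds more than one batch and needs no paging state. This is only an illustration: `iter_rows` and `fake_batches` are hypothetical names, and `fake_batches` stands in for whatever incrementally parses the HTTP response.

```python
def iter_rows(batches):
    """Yield rows one at a time from an iterable of scan result batches."""
    for batch in batches:
        columns = batch["columns"]
        for event in batch["events"]:
            # "list" events are already dicts; "compactedList" events are
            # arrays that must be paired with the batch's column names.
            yield event if isinstance(event, dict) else dict(zip(columns, event))


def fake_batches():
    """Stand-in for a streamed response: produces batches lazily.

    A real response uses a single resultFormat; both shapes appear here
    only to exercise both branches of iter_rows.
    """
    yield {"segmentId": "seg_a", "columns": ["page", "added"],
           "events": [{"page": "112_U.S._580", "added": 70.0}]}
    yield {"segmentId": "seg_b", "columns": ["page", "added"],
           "events": [["113_U.S._243", 77.0], ["113_U.S._73", 70.0]]}


# Rows are consumed as they arrive; nothing is accumulated in a list.
total_added = sum(row["added"] for row in iter_rows(fake_batches()))
print(total_added)  # prints 217.0
```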