Docusaurus2 upgrade for master (#14411)

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
317brian 2023-08-16 19:01:21 -07:00 committed by GitHub
parent 6b14dde50e
commit 6b4dda964d
138 changed files with 12352 additions and 12285 deletions


@ -166,7 +166,7 @@ jobs:
run: |
(cd website && npm install)
cd website
npm run link-lint
npm run build
npm run spellcheck
- name: web console

.gitignore vendored

@ -29,6 +29,13 @@ integration-tests/gen-scripts/
/bin/
*.hprof
**/.ipynb_checkpoints/
website/.yarn/
website/node_modules/
website/.docusaurus/
website/build/
# Local Netlify folder
.netlify
*.pyc
**/.ipython/
**/.jupyter/


@ -87,7 +87,11 @@ Use the built-in query workbench to prototype [DruidSQL](https://druid.apache.or
See the [latest documentation](https://druid.apache.org/docs/latest/) for the current official release. If you need information on a previous release, you can browse the [documentation for previous releases](https://druid.apache.org/docs/).
Make documentation and tutorials updates in [`/docs`](https://github.com/apache/druid/tree/master/docs) using [MarkDown](https://www.markdownguide.org/) and contribute them using a pull request.
Make documentation and tutorials updates in [`/docs`](https://github.com/apache/druid/tree/master/docs) using [Markdown](https://www.markdownguide.org/) or extended Markdown [(MDX)](https://mdxjs.com/). Then, open a pull request.
To build the site locally, you need Node 16.14 or higher. Install Docusaurus 2 by running `npm install` (or `yarn install`) in the `website` directory, then run `npm start` (or `yarn start`) to launch a local build of the docs.
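For example, a local build with npm might look like the following sketch (substitute the equivalent `yarn` commands if you prefer yarn; the port is Docusaurus's default and may differ in your setup):

```shell
# From the Druid repository root; requires Node 16.14 or higher.
cd website
npm install   # installs Docusaurus 2 and the other site dependencies
npm start     # serves a local build of the docs, typically at http://localhost:3000
```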
If you're looking to update non-doc pages like Use Cases, those files are in the [`druid-website-src`](https://github.com/apache/druid-website-src/tree/master) repo.
### Community


@ -370,20 +370,37 @@ $ svn commit -m 'add 0.17.0-rc3 artifacts'
### Update druid.staged.apache.org
This repo is the source of truth for the Markdown files. The Markdown files get copied to `druid-website-src` and built there as part of the release process. It's all handled by a script in that repo called `do_all_things`.
For more thorough instructions and a description of what the `do_all_things` script does, see the [`druid-website-src` README](https://github.com/apache/druid-website-src).
1. Pull https://github.com/apache/druid-website and https://github.com/apache/druid-website-src. These repositories should be in the same directory as your Druid repository, which should have the release tag checked out.
2. From druid-website, checkout branch `asf-staging`.
2. From `druid-website`, checkout branch `asf-staging`.
3. From druid-website-src, create a release branch from `master` and run `./release.sh 0.17.0 0.17.0`, replacing `0.17.0` where the first argument is the release version and 2nd argument is commit-ish. This script will:
* checkout the tag of the Druid release version
* build the docs for that version into druid-website-src
* build druid-website-src into druid-website
* stage druid-website-src and druid-website repositories to git.
4. Make a PR to the src repo (https://github.com/apache/druid-website-src) for the release branch, such as `0.17.0-docs`.
3. From `druid-website-src`, create a release branch from `master`, such as `27.0.0-docs`.
1. Update the version list in `static/js/version.js` with the version you're releasing and the release date. The highest release version goes in position 0.
1. In `scripts`, run:
5. Make another PR to the website repo (https://github.com/apache/druid-website) for the `asf-staging` branch. Once the website PR is pushed to `asf-staging`, https://druid.staged.apache.org/ will be updated near immediately with the new docs.
```shell
# Include `--skip-install` if you already have Docusaurus 2 installed in druid-website-src.
# The script assumes you use `npm`. If you use `yarn`, include `--yarn`.
python do_all_things.py -v VERSION --source /my/path/to/apache/druid
```
4. Make a PR to the src repo (https://github.com/apache/druid-website-src) for the release branch. In the changed files, you should see the following:
- In the `published_versions` directory: HTML files for `docs/VERSION`, `docs/latest`, and assorted HTML and non-HTML files
- In the `docs` directory at the root of the repo, the new Markdown files.
All these files should be part of your PR to `druid-website-src`.
<br />
Verify that the site looks correct and that the versions on the homepage and Downloads page are right. To preview the site, you can run `http-server` or something similar in `published_versions`, as shown in the sketch after this list.
5. Make a PR to the website repo (https://github.com/apache/druid-website) for the `asf-staging` branch using the contents of `published_versions` in `druid-website-src`. Once the website PR is pushed to `asf-staging`, https://druid.staged.apache.org/ will be updated near immediately with the new docs.
### Create staged Maven repo


@ -29,7 +29,9 @@ This document describes the data management API endpoints for Apache Druid. This
While segments may be enabled by issuing POST requests for the datasources, the Coordinator may again disable segments if they match any configured [drop rules](../operations/rule-configuration.md#drop-rules). Even if segments are enabled by these APIs, you must configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. If an indexing or kill task runs at the same time these APIs are invoked, the behavior is undefined. Some segments might be killed and others might be enabled. It's also possible that all segments might be disabled, but the indexing task can still read data from those segments and succeed.
> Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
:::info
Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
:::
`POST /druid/coordinator/v1/datasources/{dataSourceName}`
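As an illustrative sketch only (the `ROUTER_IP:ROUTER_PORT` placeholder and the `wikipedia` datasource name are assumptions borrowed from other pages, not values defined here):

```shell
# Mark all segments of the `wikipedia` datasource as used, enabling the datasource.
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia"
```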


@ -3,8 +3,12 @@ id: json-querying-api
title: JSON querying API
sidebar_label: JSON querying
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@ -33,7 +37,7 @@ In this topic, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server
Submits a JSON-based native query. The body of the request is the native query itself.
Druid supports different types of queries for different use cases. All queries require the following properties:
* `queryType`: A string representing the type of query. Druid supports the following native query types: `timeseries`, `topN`, `groupBy`, `timeBoundaries`, `segmentMetadata`, `datasourceMetadata`, `scan`, and `search`.
* `queryType`: A string representing the type of query. Druid supports the following native query types: `timeseries`, `topN`, `groupBy`, `timeBoundary`, `segmentMetadata`, `datasourceMetadata`, `scan`, and `search`.
* `dataSource`: A string or object defining the source of data to query. The most common value is the name of the datasource to query. For more information, see [Datasources](../querying/datasource.md).
For additional properties based on your query type or use case, see [available native queries](../querying/querying.md#available-queries).
@ -49,13 +53,16 @@ For additional properties based on your query type or use case, see [available n
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="1" label="200 SUCCESS">
*Successfully submitted query*
<!--400 BAD REQUEST-->
*Successfully submitted query*
</TabItem>
<TabItem value="2" label="400 BAD REQUEST">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
@ -69,17 +76,19 @@ For additional properties based on your query type or use case, see [available n
```
For more information on possible error messages, see [query execution failures](../querying/querying.md#query-execution-failures).
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
---
### Example query: `topN`
The following example shows a `topN` query. The query analyzes the `social_media` datasource to return the top five users from the `username` dimension with the highest number of views from the `views` metric.
The following example shows a `topN` query. The query analyzes the `social_media` datasource to return the top five users from the `username` dimension with the highest number of views from the `views` metric.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="3" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2?pretty=null" \
@ -103,7 +112,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2?pretty=null" \
]
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="4" label="HTTP">
```HTTP
POST /druid/v2?pretty=null HTTP/1.1
@ -131,7 +142,8 @@ Content-Length: 336
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Example response: `topN`
@ -179,9 +191,10 @@ In this query:
* The `upvoteToPostRatio` is a post-aggregation of the `upvoteSum` and the `postCount`, divided to calculate the ratio.
* The result is sorted based on the `upvoteToPostRatio` in descending order.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="5" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2" \
@ -217,7 +230,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2" \
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="6" label="HTTP">
```HTTP
POST /druid/v2?pretty=null HTTP/1.1
@ -256,12 +271,14 @@ Content-Length: 817
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Example response: `groupBy`
<details>
<summary>Click to show sample response</summary>
```json
[
{
@ -280,7 +297,7 @@ Content-Length: 817
## Get segment information for query
Retrieves an array that contains objects with segment information, including the server locations associated with the query provided in the request body.
Retrieves an array that contains objects with segment information, including the server locations associated with the query provided in the request body.
### URL
@ -292,13 +309,16 @@ Retrieves an array that contains objects with segment information, including the
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="7" label="200 SUCCESS">
*Successfully retrieved segment information*
<!--400 BAD REQUEST-->
*Successfully retrieved segment information*
</TabItem>
<TabItem value="8" label="400 BAD REQUEST">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
@ -312,15 +332,17 @@ Retrieves an array that contains objects with segment information, including the
```
For more information on possible error messages, see [query execution failures](../querying/querying.md#query-execution-failures).
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
---
### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="9" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/candidates" \
@ -345,7 +367,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/candidates" \
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="10" label="HTTP">
```HTTP
POST /druid/v2/candidates HTTP/1.1
@ -374,7 +398,8 @@ Content-Length: 336
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -895,4 +920,4 @@ Content-Length: 336
}
]
```
</details>
</details>


@ -99,8 +99,10 @@ If no used segments are found for the given inputs, this API returns `204 No Con
## Metadata store information
> Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
> [`sys.segments`](../querying/sql-metadata-tables.md#segments-table) table.
:::info
Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
[`sys.segments`](../querying/sql-metadata-tables.md#segments-table) table.
:::
`GET /druid/coordinator/v1/metadata/segments`
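For illustration, a request to this endpoint might look like the following sketch (the host placeholder mirrors the style of the other API examples and is an assumption here):

```shell
# List all used segments tracked in the metadata store.
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/metadata/segments"
```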
@ -279,10 +281,12 @@ This section documents the API endpoints for the processes that reside on Query
Note that all _interval_ URL parameters are ISO 8601 strings delimited by a `_` instead of a `/`
as in `2016-06-27_2016-06-28`.
> Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
> [`INFORMATION_SCHEMA.TABLES`](../querying/sql-metadata-tables.md#tables-table),
> [`INFORMATION_SCHEMA.COLUMNS`](../querying/sql-metadata-tables.md#columns-table), and
> [`sys.segments`](../querying/sql-metadata-tables.md#segments-table) tables.
:::info
Note: Much of this information is available in a simpler, easier-to-use form through the Druid SQL
[`INFORMATION_SCHEMA.TABLES`](../querying/sql-metadata-tables.md#tables-table),
[`INFORMATION_SCHEMA.COLUMNS`](../querying/sql-metadata-tables.md#columns-table), and
[`sys.segments`](../querying/sql-metadata-tables.md#segments-table) tables.
:::
`GET /druid/v2/datasources`
@ -296,17 +300,21 @@ If no interval is specified, a default interval spanning a configurable period b
`GET /druid/v2/datasources/{dataSourceName}/dimensions`
> This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead
> which provides more comprehensive information and supports all dataSource types including streaming dataSources. It's also encouraged to use [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md)
> if you're using SQL.
>
:::info
This API is deprecated and will be removed in future releases. Use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead; it provides more comprehensive information and supports all dataSource types, including streaming dataSources. If you're using SQL, you can also use the [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md).
:::
Returns the dimensions of the datasource.
`GET /druid/v2/datasources/{dataSourceName}/metrics`
> This API is deprecated and will be removed in future releases. Please use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead
> which provides more comprehensive information and supports all dataSource types including streaming dataSources. It's also encouraged to use [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md)
> if you're using SQL.
:::info
This API is deprecated and will be removed in future releases. Use [SegmentMetadataQuery](../querying/segmentmetadataquery.md) instead; it provides more comprehensive information and supports all dataSource types, including streaming dataSources. If you're using SQL, you can also use the [INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md).
:::
Returns the metrics of the datasource.


@ -3,8 +3,12 @@ id: retention-rules-api
title: Retention rules API
sidebar_label: Retention rules
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@ -48,7 +52,7 @@ Note that this endpoint returns an HTTP `200 OK` even if the datasource does not
### Header parameters
The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history.
The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history.
* `X-Druid-Author` (optional)
* Type: String
@ -59,13 +63,15 @@ The endpoint supports a set of optional header parameters to populate the `autho
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="1" label="200 SUCCESS">
*Successfully updated retention rules for specified datasource*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully updated retention rules for specified datasource*
</TabItem>
</Tabs>
---
@ -73,9 +79,10 @@ The endpoint supports a set of optional header parameters to populate the `autho
The following example sets a set of broadcast, load, and drop retention rules for the `kttm1` datasource.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="2" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/kttm1" \
@ -100,7 +107,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/kttm1" \
]'
```
<!--HTTP-->
</TabItem>
<TabItem value="3" label="HTTP">
```HTTP
POST /druid/coordinator/v1/rules/kttm1 HTTP/1.1
@ -128,7 +137,8 @@ Content-Length: 273
]
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -150,7 +160,7 @@ This request overwrites any existing rules for all datasources. To remove defaul
### Header parameters
The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history.
The endpoint supports a set of optional header parameters to populate the `author` and `comment` fields in the `auditInfo` property for audit history.
* `X-Druid-Author` (optional)
* Type: String
@ -161,17 +171,21 @@ The endpoint supports a set of optional header parameters to populate the `autho
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="4" label="200 SUCCESS">
*Successfully updated default retention rules*
<!--500 SERVER ERROR-->
*Successfully updated default retention rules*
*Error with request body*
</TabItem>
<TabItem value="5" label="500 SERVER ERROR">
<!--END_DOCUSAURUS_CODE_TABS-->
*Error with request body*
</TabItem>
</Tabs>
---
@ -179,9 +193,10 @@ The endpoint supports a set of optional header parameters to populate the `autho
The following example updates the default retention rule for all datasources with a `loadByInterval` rule.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="6" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/_default" \
@ -196,7 +211,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/_default" \
]'
```
<!--HTTP-->
</TabItem>
<TabItem value="7" label="HTTP">
```HTTP
POST /druid/coordinator/v1/rules/_default HTTP/1.1
@ -214,7 +231,8 @@ Content-Length: 205
]
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -230,34 +248,40 @@ Retrieves all current retention rules in the cluster including the default reten
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="8" label="200 SUCCESS">
*Successfully retrieved retention rules*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved retention rules*
</TabItem>
</Tabs>
---
### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="9" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules"
```
<!--HTTP-->
</TabItem>
<TabItem value="10" label="HTTP">
```HTTP
GET /druid/coordinator/v1/rules HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -302,13 +326,15 @@ Note that this endpoint returns an HTTP `200 OK` message code even if the dataso
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="11" label="200 SUCCESS">
*Successfully retrieved retention rules*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved retention rules*
</TabItem>
</Tabs>
---
@ -316,22 +342,26 @@ Note that this endpoint returns an HTTP `200 OK` message code even if the dataso
The following example retrieves the custom retention rules and default retention rules for datasource with the name `social_media`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="12" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/social_media?full=null"
```
<!--HTTP-->
</TabItem>
<TabItem value="13" label="HTTP">
```HTTP
GET /druid/coordinator/v1/rules/social_media?full=null HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -383,21 +413,27 @@ Note that the following query parameters cannot be chained.
### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="14" label="200 SUCCESS">
*Successfully retrieved audit history*
<!--400 BAD REQUEST-->
*Successfully retrieved audit history*
*Request in the incorrect format*
</TabItem>
<TabItem value="15" label="400 BAD REQUEST">
<!--404 NOT FOUND-->
*`count` query parameter too large*
*Request in the incorrect format*
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
<TabItem value="16" label="404 NOT FOUND">
*`count` query parameter too large*
</TabItem>
</Tabs>
---
@ -405,22 +441,26 @@ Note that the following query parameters cannot be chained.
The following example retrieves the audit history for all datasources from `2023-07-13` to `2023-07-19`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="17" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/rules/history?interval=2023-07-13%2F2023-07-19"
```
<!--HTTP-->
</TabItem>
<TabItem value="18" label="HTTP">
```HTTP
GET /druid/coordinator/v1/rules/history?interval=2023-07-13/2023-07-19 HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
### Sample response
@ -519,4 +559,4 @@ Host: http://ROUTER_IP:ROUTER_PORT
}
]
```
</details>
</details>

File diff suppressed because it is too large.


@ -3,8 +3,12 @@ id: sql-api
title: Druid SQL API
sidebar_label: Druid SQL
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@ -23,9 +27,12 @@ sidebar_label: Druid SQL
~ under the License.
-->
Apache Druid supports two query languages: [Druid SQL](../querying/sql.md) and [native queries](../querying/querying.md). This topic describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
This document describes the SQL language.
:::
In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments.
In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments.
## Query from Historicals
@ -33,7 +40,7 @@ In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router s
Submits a SQL-based query in the JSON request body. Returns a JSON object with the query results and optional metadata for the results. You can also use this endpoint to query [metadata tables](../querying/sql-metadata-tables.md).
Each query has an associated SQL query ID. You can set this ID manually using the SQL context parameter `sqlQueryId`. If not set, Druid automatically generates `sqlQueryId` and returns it in the response header for `X-Druid-SQL-Query-Id`. Note that you need the `sqlQueryId` to [cancel a query](#cancel-a-query) endpoint.
Each query has an associated SQL query ID. You can set this ID manually using the SQL context parameter `sqlQueryId`. If not set, Druid automatically generates `sqlQueryId` and returns it in the `X-Druid-SQL-Query-Id` response header. Note that you need the `sqlQueryId` to use the [cancel a query](#cancel-a-query) endpoint.
#### URL
@ -48,10 +55,10 @@ The request body takes the following properties:
* `object`: Returns a JSON array of JSON objects with the HTTP header `Content-Type: application/json`.
* `array`: Returns a JSON array of JSON arrays with the HTTP header `Content-Type: application/json`.
* `objectLines`: Returns newline-delimited JSON objects with a trailing blank line. Returns the HTTP header `Content-Type: text/plain`.
* `arrayLines`: Returns newline-delimited JSON arrays with a trailing blank line. Returns the HTTP header `Content-Type: text/plain`.
* `csv`: Returns a comma-separated values with one row per line and a trailing blank line. Returns the HTTP header `Content-Type: text/csv`.
* `arrayLines`: Returns newline-delimited JSON arrays with a trailing blank line. Returns the HTTP header `Content-Type: text/plain`.
* `csv`: Returns comma-separated values with one row per line and a trailing blank line. Returns the HTTP header `Content-Type: text/csv`.
* `header`: Boolean value that determines whether to return information on column names. When set to `true`, Druid returns the column names as the first row of the results. To also get information on the column types, set `typesHeader` or `sqlTypesHeader` to `true`. For a comparative overview of data formats and configurations for the header, see the [Query output format](#query-output-format) table.
* `typesHeader`: Adds Druid runtime type information in the header. Requires `header` to be set to `true`. Complex types, like sketches, will be reported as `COMPLEX<typeName>` if a particular complex type name is known for that field, or as `COMPLEX` if the particular type name is unknown or mixed.
* `typesHeader`: Adds Druid runtime type information in the header. Requires `header` to be set to `true`. Complex types, like sketches, will be reported as `COMPLEX<typeName>` if a particular complex type name is known for that field, or as `COMPLEX` if the particular type name is unknown or mixed.
* `sqlTypesHeader`: Adds SQL type information in the header. Requires `header` to be set to `true`.
* `context`: JSON object containing optional [SQL query context parameters](../querying/sql-query-context.md), such as to set the query ID, time zone, and whether to use an approximation algorithm for distinct count.
* `parameters`: List of query parameters for parameterized queries. Each parameter in the array should be a JSON object containing the parameter's SQL data type and parameter value. For a list of supported SQL types, see [Data types](../querying/sql-data-types.md).
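As a sketch of how `resultFormat` and `parameters` fit together (the datasource, columns, and values below are illustrative, not defined on this page):

```shell
# Parameterized Druid SQL query that returns newline-delimited JSON objects.
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
  "query": "SELECT __time, channel FROM wikipedia WHERE user = ? LIMIT 5",
  "resultFormat": "objectLines",
  "parameters": [
    { "type": "VARCHAR", "value": "BlueMoon2662" }
  ]
}'
```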
@ -68,15 +75,18 @@ The request body takes the following properties:
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="1" label="200 SUCCESS">
*Successfully submitted query*
<!--400 BAD REQUEST-->
*Successfully submitted query*
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
</TabItem>
<TabItem value="2" label="400 BAD REQUEST">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
```json
{
@ -86,9 +96,11 @@ The request body takes the following properties:
"host": "The host on which the error occurred."
}
```
<!--500 INTERNAL SERVER ERROR-->
</TabItem>
<TabItem value="3" label="500 INTERNAL SERVER ERROR">
*Request not sent due to unexpected conditions. Returns a JSON object detailing the error with the following format:*
*Request not sent due to unexpected conditions. Returns a JSON object detailing the error with the following format:*
```json
{
@ -99,7 +111,8 @@ The request body takes the following properties:
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Older versions of Druid that support the `typesHeader` and `sqlTypesHeader` parameters return the HTTP header `X-Druid-SQL-Header-Included: yes` when you set `header` to `true`. Druid returns the HTTP response header for compatibility, regardless of whether `typesHeader` and `sqlTypesHeader` are set.
@ -110,9 +123,10 @@ Older versions of Druid that support the `typesHeader` and `sqlTypesHeader` par
The following example retrieves all rows in the `wikipedia` datasource where the `user` is `BlueMoon2662`. The query is assigned the ID `request01` using the `sqlQueryId` context parameter. The optional properties `header`, `typesHeader`, and `sqlTypesHeader` are set to `true` to include type information to the response.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="4" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
@ -126,7 +140,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="5" label="HTTP">
```HTTP
POST /druid/v2/sql HTTP/1.1
@ -143,7 +159,8 @@ Content-Length: 192
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -266,7 +283,7 @@ Cancels a query on the Router or the Broker with the associated `sqlQueryId`. Th
When you cancel a query, Druid handles the cancellation in a best-effort manner. Druid immediately marks the query as canceled and aborts the query execution as soon as possible. However, the query may continue running for a short time after you make the cancellation request.
Cancellation requests require READ permission on all resources used in the SQL query.
Cancellation requests require READ permission on all resources used in the SQL query.
#### URL
@ -274,21 +291,27 @@ Cancellation requests require READ permission on all resources used in the SQL q
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="6" label="202 SUCCESS">
<!--202 SUCCESS-->
*Successfully deleted query*
<!--403 FORBIDDEN-->
</TabItem>
<TabItem value="7" label="403 FORBIDDEN">
*Authorization failure*
<!--404 NOT FOUND-->
*Authorization failure*
*Invalid `sqlQueryId` or query was completed before cancellation request*
</TabItem>
<TabItem value="8" label="404 NOT FOUND">
<!--END_DOCUSAURUS_CODE_TABS-->
*Invalid `sqlQueryId` or query was completed before cancellation request*
</TabItem>
</Tabs>
---
@ -296,22 +319,26 @@ Cancellation requests require READ permission on all resources used in the SQL q
The following example cancels a request with the set query ID `request01`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="9" label="cURL">
<!--cURL-->
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/request01"
```
<!--HTTP-->
</TabItem>
<TabItem value="10" label="HTTP">
```HTTP
DELETE /druid/v2/sql/request01 HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -342,7 +369,7 @@ The following table shows examples of how Druid returns the column names and dat
> Query from deep storage is an [experimental feature](../development/experimental.md).
You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules.
You can use the `sql/statements` endpoint to query segments that exist only in deep storage and are not loaded onto your Historical processes as determined by your load rules.
Note that at least one segment of a datasource must be available on a Historical process so that the Broker can plan your query. A quick way to check if this is true is whether or not a datasource is visible in the Druid console.
@ -359,13 +386,13 @@ Note that at least part of a datasource must be available on a Historical proces
<code class="postAPI">POST</code> <code>/druid/v2/sql/statements</code>
#### Request body
#### Request body
Generally, the `sql` and `sql/statements` endpoints support the same request body fields with minor differences. For general information about the available fields, see [Submit a query to the `sql` endpoint](#submit-a-query).
Keep the following in mind when submitting queries to the `sql/statements` endpoint:
- There are additional context parameters for `sql/statements`:
- There are additional context parameters for `sql/statements`:
- `executionMode` determines how query results are fetched. Druid currently only supports `ASYNC`. You must manually retrieve your results after the query completes.
- `selectDestination` determines where final results get written. By default, results are written to task reports. Set this parameter to `durableStorage` to instruct Druid to write the results from SELECT queries to durable storage, which allows you to fetch larger result sets. Note that this requires you to have [durable storage for MSQ enabled](../operations/durable-storage.md).
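For example, a request that writes SELECT results to durable storage might look like this sketch (the host placeholder and query are illustrative, and durable storage must already be configured):

```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \
--header 'Content-Type: application/json' \
--data '{
  "query": "SELECT * FROM wikipedia LIMIT 10",
  "context": {
    "executionMode": "ASYNC",
    "selectDestination": "durableStorage"
  }
}'
```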
@ -374,15 +401,18 @@ Keep the following in mind when submitting queries to the `sql/statements` endpo
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="1" label="200 SUCCESS">
*Successfully queried from deep storage*
<!--400 BAD REQUEST-->
*Successfully queried from deep storage*
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
</TabItem>
<TabItem value="2" label="400 BAD REQUEST">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
```json
{
@ -391,21 +421,23 @@ Keep the following in mind when submitting queries to the `sql/statements` endpo
"host": "The host on which the error occurred.",
"errorCode": "Well-defined error code.",
"persona": "Role or persona associated with the error.",
"category": "Classification of the error.",
"category": "Classification of the error.",
"errorMessage": "Summary of the encountered issue with expanded information.",
"context": "Additional context about the error."
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="3" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \
@ -414,11 +446,13 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements" \
"query": "SELECT * FROM wikipedia WHERE user='\''BlueMoon2662'\''",
"context": {
"executionMode":"ASYNC"
}
}
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="4" label="HTTP">
```HTTP
POST /druid/v2/sql/statements HTTP/1.1
@ -430,11 +464,12 @@ Content-Length: 134
"query": "SELECT * FROM wikipedia WHERE user='BlueMoon2662'",
"context": {
"executionMode":"ASYNC"
}
}
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -569,49 +604,57 @@ Retrieves information about the query associated with the given query ID. The re
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="5" label="200 SUCCESS">
*Successfully retrieved query status*
<!--400 BAD REQUEST-->
*Successfully retrieved query status*
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
</TabItem>
<TabItem value="6" label="400 BAD REQUEST">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
```json
{
"error": "Summary of the encountered error.",
"errorCode": "Well-defined error code.",
"persona": "Role or persona associated with the error.",
"category": "Classification of the error.",
"category": "Classification of the error.",
"errorMessage": "Summary of the encountered issue with expanded information.",
"context": "Additional context about the error."
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample request
The following example retrieves the status of a query with specified ID `query-9b93f6f7-ab0e-48f5-986a-3520f84f0804`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="7" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804"
```
<!--HTTP-->
</TabItem>
<TabItem value="8" label="HTTP">
```HTTP
GET /druid/v2/sql/statements/query-9b93f6f7-ab0e-48f5-986a-3520f84f0804 HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -789,47 +832,55 @@ When getting query results, keep the following in mind:
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="9" label="200 SUCCESS">
*Successfully retrieved query results*
<!--400 BAD REQUEST-->
*Successfully retrieved query results*
*Query in progress. Returns a JSON object detailing the error with the following format:*
</TabItem>
<TabItem value="10" label="400 BAD REQUEST">
*Query in progress. Returns a JSON object detailing the error with the following format:*
```json
{
"error": "Summary of the encountered error.",
"errorCode": "Well-defined error code.",
"persona": "Role or persona associated with the error.",
"category": "Classification of the error.",
"category": "Classification of the error.",
"errorMessage": "Summary of the encountered issue with expanded information.",
"context": "Additional context about the error."
}
```
<!--404 NOT FOUND-->
</TabItem>
<TabItem value="11" label="404 NOT FOUND">
*Query not found, failed or canceled*
<!--500 SERVER ERROR-->
</TabItem>
<TabItem value="12" label="500 SERVER ERROR">
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
*Error thrown due to bad query. Returns a JSON object detailing the error with the following format:*
```json
{
"error": "Summary of the encountered error.",
"errorCode": "Well-defined error code.",
"persona": "Role or persona associated with the error.",
"category": "Classification of the error.",
"category": "Classification of the error.",
"errorMessage": "Summary of the encountered issue with expanded information.",
"context": "Additional context about the error."
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
---
@ -837,22 +888,26 @@ When getting query results, keep the following in mind:
The following example retrieves the status of a query with specified ID `query-f3bca219-173d-44d4-bdc7-5002e910352f`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="13" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results"
```
<!--HTTP-->
</TabItem>
<TabItem value="14" label="HTTP">
```HTTP
GET /druid/v2/sql/statements/query-f3bca219-173d-44d4-bdc7-5002e910352f/results HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -1087,7 +1142,7 @@ Host: http://ROUTER_IP:ROUTER_PORT
### Cancel a query
Cancels a running or accepted query.
Cancels a running or accepted query.
#### URL
@ -1095,32 +1150,38 @@ Cancels a running or accepted query.
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 OK-->
<TabItem value="15" label="200 OK">
*A no op operation since the query is not in a state to be cancelled*
<!--202 ACCEPTED-->
*A no-op operation since the query is not in a state to be canceled*
*Successfully accepted query for cancellation*
</TabItem>
<TabItem value="16" label="202 ACCEPTED">
<!--404 SERVER ERROR-->
*Invalid query ID. Returns a JSON object detailing the error with the following format:*
*Successfully accepted query for cancellation*
</TabItem>
<TabItem value="17" label="404 SERVER ERROR">
*Invalid query ID. Returns a JSON object detailing the error with the following format:*
```json
{
"error": "Summary of the encountered error.",
"errorCode": "Well-defined error code.",
"persona": "Role or persona associated with the error.",
"category": "Classification of the error.",
"category": "Classification of the error.",
"errorMessage": "Summary of the encountered issue with expanded information.",
"context": "Additional context about the error."
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
---
@ -1128,22 +1189,26 @@ Cancels a running or accepted query.
The following example cancels a query with specified ID `query-945c9633-2fa2-49ab-80ae-8221c38c024da`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="18" label="cURL">
<!--cURL-->
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da"
```
<!--HTTP-->
</TabItem>
<TabItem value="19" label="HTTP">
```HTTP
DELETE /druid/v2/sql/statements/query-945c9633-2fa2-49ab-80ae-8221c38c024da HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response


@ -3,6 +3,8 @@ id: sql-ingestion-api
title: SQL-based ingestion API
sidebar_label: SQL-based ingestion
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
@ -23,9 +25,11 @@ sidebar_label: SQL-based ingestion
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
The **Query** view in the web console provides a friendly experience for the multi-stage query task engine (MSQ task
engine) and multi-stage query architecture. We recommend using the web console if you do not need a programmatic
@ -52,9 +56,10 @@ As an experimental feature, this endpoint also accepts SELECT queries. SELECT qu
by the controller, and written into the [task report](#get-the-report-for-a-query-task) as an array of arrays. The
behavior and result format of plain SELECT queries (without INSERT or REPLACE) are subject to change.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="1" label="HTTP">
<!--HTTP-->
```
POST /druid/v2/sql/task
@ -69,7 +74,10 @@ POST /druid/v2/sql/task
}
```
<!--curl-->
</TabItem>
<TabItem value="2" label="curl">
```bash
# Make sure you replace `username`, `password`, `your-instance`, and `port` with the values for your deployment.
@ -83,7 +91,10 @@ curl --location --request POST 'https://<username>:<password>@<your-instance>:<p
}'
```
<!--Python-->
</TabItem>
<TabItem value="3" label="Python">
```python
import json
@ -108,7 +119,9 @@ print(response.text)
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Response
@ -132,22 +145,29 @@ You can retrieve status of a query to see if it is still running, completed succ
#### Request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="4" label="HTTP">
<!--HTTP-->
```
GET /druid/indexer/v1/task/<taskId>/status
```
<!--curl-->
</TabItem>
<TabItem value="5" label="curl">
```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/status'
```
<!--Python-->
</TabItem>
<TabItem value="6" label="Python">
```python
import requests
@ -163,7 +183,9 @@ response = requests.get(url, headers=headers, data=payload, auth=('USER', 'PASSW
print(response.text)
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Response
@ -210,22 +232,29 @@ For an explanation of the fields in a report, see [Report response fields](#repo
#### Request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="7" label="HTTP">
<!--HTTP-->
```
GET /druid/indexer/v1/task/<taskId>/reports
```
<!--curl-->
</TabItem>
<TabItem value="8" label="curl">
```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request GET 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/reports'
```
<!--Python-->
</TabItem>
<TabItem value="9" label="Python">
```python
import requests
@ -238,7 +267,9 @@ response = requests.get(url, headers=headers, auth=('USER', 'PASSWORD'))
print(response.text)
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Response
@ -513,7 +544,7 @@ The response shows an example report for a query.
"0": 1,
"1": 1,
"2": 1
},
},
"totalMergersForUltimateLevel": 1,
"progressDigest": 1
}
@ -589,22 +620,29 @@ The following table describes the response fields when you retrieve a report for
#### Request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="10" label="HTTP">
<!--HTTP-->
```
POST /druid/indexer/v1/task/<taskId>/shutdown
```
<!--curl-->
</TabItem>
<TabItem value="11" label="curl">
```bash
# Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
curl --location --request POST 'https://<username>:<password>@<your-instance>:<port>/druid/indexer/v1/task/<taskId>/shutdown'
```
<!--Python-->
</TabItem>
<TabItem value="12" label="Python">
```python
import requests
@ -620,7 +658,9 @@ response = requests.post(url, headers=headers, data=payload, auth=('USER', 'PASS
print(response.text)
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Response


@ -23,8 +23,10 @@ sidebar_label: SQL JDBC driver
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](../querying/querying.md).
This document describes the SQL language.
:::
You can make [Druid SQL](../querying/sql.md) queries using the [Avatica JDBC driver](https://calcite.apache.org/avatica/downloads/).
@ -86,7 +88,9 @@ improvements for larger result sets. To use it apply the following connection UR
String url = "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica-protobuf/;transparent_reconnect=true;serialization=protobuf";
```
> The protobuf endpoint is also known to work with the official [Golang Avatica driver](https://github.com/apache/calcite-avatica-go)
:::info
The protobuf endpoint is also known to work with the official [Golang Avatica driver](https://github.com/apache/calcite-avatica-go)
:::
Table metadata is available over JDBC using `connection.getMetaData()` or by querying the
[INFORMATION_SCHEMA tables](../querying/sql-metadata-tables.md). For an example of this, see [Get the metadata for a datasource](#get-the-metadata-for-a-datasource).
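If you want to check the same metadata without going through JDBC, here is a hedged sketch using the SQL HTTP endpoint described earlier in these API docs (the datasource name is illustrative):

```shell
# Query INFORMATION_SCHEMA over the SQL API instead of JDBC metadata calls.
curl "http://ROUTER_IP:ROUTER_PORT/druid/v2/sql" \
--header 'Content-Type: application/json' \
--data '{
  "query": "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '\''wikipedia'\''"
}'
```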


@ -3,8 +3,12 @@ id: supervisor-api
title: Supervisor API
sidebar_label: Supervisors
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@ -50,40 +54,46 @@ Returns an array of strings representing the names of active supervisors. If the
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="1" label="200 SUCCESS">
*Successfully retrieved array of active supervisor IDs*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved array of active supervisor IDs*
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="2" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor"
```
<!--HTTP-->
</TabItem>
<TabItem value="3" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Click to show sample response</summary>
```json
[
"wikipedia_stream",
@ -102,40 +112,46 @@ Retrieves an array of active supervisor objects. If there are no active supervis
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="4" label="200 SUCCESS">
*Successfully retrieved supervisor objects*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved supervisor objects*
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="5" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?full=null"
```
<!--HTTP-->
</TabItem>
<TabItem value="6" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor?full=null HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Click to show sample response</summary>
```json
[
{
@ -764,40 +780,46 @@ Retrieves an array of objects representing active supervisors and their current
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="7" label="200 SUCCESS">
*Successfully retrieved supervisor state objects*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved supervisor state objects*
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="8" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor?state=true"
```
<!--HTTP-->
</TabItem>
<TabItem value="9" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor?state=true HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Click to show sample response</summary>
```json
[
{
@ -829,17 +851,21 @@ Retrieves the specification for a single supervisor. The returned specification
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="10" label="200 SUCCESS">
*Successfully retrieved supervisor spec*
<!--404 NOT FOUND-->
*Successfully retrieved supervisor spec*
*Invalid supervisor ID*
</TabItem>
<TabItem value="11" label="404 NOT FOUND">
<!--END_DOCUSAURUS_CODE_TABS-->
*Invalid supervisor ID*
</TabItem>
</Tabs>
---
@ -847,22 +873,26 @@ Retrieves the specification for a single supervisor. The returned specification
The following example shows how to retrieve the specification of a supervisor with the name `wikipedia_stream`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="12" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/wikipedia_stream"
```
<!--HTTP-->
</TabItem>
<TabItem value="13" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor/wikipedia_stream HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -1187,17 +1217,21 @@ For additional information about the status report, see the topic for each strea
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="14" label="200 SUCCESS">
*Successfully retrieved supervisor status*
<!--404 NOT FOUND-->
*Successfully retrieved supervisor status*
*Invalid supervisor ID*
</TabItem>
<TabItem value="15" label="404 NOT FOUND">
<!--END_DOCUSAURUS_CODE_TABS-->
*Invalid supervisor ID*
</TabItem>
</Tabs>
---
@ -1205,22 +1239,26 @@ For additional information about the status report, see the topic for each strea
The following example shows how to retrieve the status of a supervisor with the name `social_media`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="16" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/status"
```
<!--HTTP-->
</TabItem>
<TabItem value="17" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor/social_media/status HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -1287,34 +1325,40 @@ Retrieve an audit history of specs for all supervisors.
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="18" label="200 SUCCESS">
*Successfully retrieved audit history*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully retrieved audit history*
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="19" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/history"
```
<!--HTTP-->
</TabItem>
<TabItem value="20" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor/history HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -1642,17 +1686,21 @@ Retrieves an audit history of specs for a single supervisor.
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="21" label="200 SUCCESS">
*Successfully retrieved supervisor audit history*
<!--404 NOT FOUND-->
*Successfully retrieved supervisor audit history*
*Invalid supervisor ID*
</TabItem>
<TabItem value="22" label="404 NOT FOUND">
<!--END_DOCUSAURUS_CODE_TABS-->
*Invalid supervisor ID*
</TabItem>
</Tabs>
---
@ -1660,22 +1708,26 @@ Retrieves an audit history of specs for a single supervisor.
The following example shows how to retrieve the audit history of a supervisor with the name `wikipedia_stream`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="23" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/wikipedia_stream/history"
```
<!--HTTP-->
</TabItem>
<TabItem value="24" label="HTTP">
```HTTP
GET /druid/indexer/v1/supervisor/wikipedia_stream/history HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -1994,7 +2046,7 @@ Host: http://ROUTER_IP:ROUTER_PORT
### Create or update a supervisor
Creates a new supervisor or updates an existing one for the same datasource with a new schema and configuration.
Creates a new supervisor or updates an existing one for the same datasource with a new schema and configuration.
You can define a supervisor spec for [Apache Kafka](../development/extensions-core/kafka-ingestion.md#define-a-supervisor-spec) or [Amazon Kinesis](../development/extensions-core/kinesis-ingestion.md#supervisor-spec) streaming ingestion methods. Once created, the supervisor persists in the metadata database.
@ -2006,27 +2058,32 @@ When you call this endpoint on an existing supervisor for the same datasource, t
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="25" label="200 SUCCESS">
*Successfully created a new supervisor or updated an existing supervisor*
<!--415 UNSUPPORTED MEDIA TYPE-->
*Successfully created a new supervisor or updated an existing supervisor*
*Request body content type is not in JSON format*
</TabItem>
<TabItem value="26" label="415 UNSUPPORTED MEDIA TYPE">
<!--END_DOCUSAURUS_CODE_TABS-->
*Request body content type is not in JSON format*
</TabItem>
</Tabs>
---
#### Sample request
The following example uses JSON input format to create a supervisor spec for Kafka with a `social_media` datasource and `social_media` topic.
The following example uses JSON input format to create a supervisor spec for Kafka with a `social_media` datasource and `social_media` topic.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="27" label="cURL">
<!--cURL-->
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor" \
@ -2083,7 +2140,9 @@ curl "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor" \
}'
```
<!--HTTP-->
</TabItem>
<TabItem value="28" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor HTTP/1.1
@ -2143,7 +2202,8 @@ Content-Length: 1359
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -2166,21 +2226,27 @@ Suspends a single running supervisor. Returns the updated supervisor spec, where
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="29" label="200 SUCCESS">
*Successfully shut down supervisor*
<!--400 BAD REQUEST-->
*Successfully shut down supervisor*
*Supervisor already suspended*
</TabItem>
<TabItem value="30" label="400 BAD REQUEST">
<!--404 NOT FOUND-->
*Invalid supervisor ID*
*Supervisor already suspended*
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
<TabItem value="31" label="404 NOT FOUND">
*Invalid supervisor ID*
</TabItem>
</Tabs>
---
@ -2188,22 +2254,26 @@ Suspends a single running supervisor. Returns the updated supervisor spec, where
The following example shows how to suspend a running supervisor with the name `social_media`.
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="32" label="cURL">
<!--cURL-->
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/suspend"
```
<!--HTTP-->
</TabItem>
<TabItem value="33" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/social_media/suspend HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
#### Sample response
@ -2522,34 +2592,40 @@ Suspends all supervisors. Note that this endpoint returns an HTTP `200 Success`
#### Responses
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<!--200 SUCCESS-->
<TabItem value="34" label="200 SUCCESS">
*Successfully suspended all supervisors*
<!--END_DOCUSAURUS_CODE_TABS-->
*Successfully suspended all supervisors*
</TabItem>
</Tabs>
---
#### Sample request
<!--DOCUSAURUS_CODE_TABS-->
<Tabs>
<TabItem value="35" label="cURL">
<!--cURL-->
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/suspendAll"
```
</TabItem>
<TabItem value="36" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/suspendAll HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -2573,21 +2649,27 @@ Resumes indexing tasks for a supervisor. Returns an updated supervisor spec with
#### Responses
<Tabs>
<TabItem value="37" label="200 SUCCESS">

*Successfully resumed supervisor*

</TabItem>
<TabItem value="38" label="400 BAD REQUEST">

*Supervisor already running*

</TabItem>
<TabItem value="39" label="404 NOT FOUND">

*Invalid supervisor ID*

</TabItem>
</Tabs>
---
@ -2595,22 +2677,26 @@ Resumes indexing tasks for a supervisor. Returns an updated supervisor spec with
The following example resumes a previously suspended supervisor with name `social_media`.
<Tabs>
<TabItem value="40" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/resume"
```
</TabItem>
<TabItem value="41" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/social_media/resume HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -2930,34 +3016,40 @@ Resumes all supervisors. Note that this endpoint returns an HTTP `200 Success` c
#### Responses
<Tabs>
<TabItem value="42" label="200 SUCCESS">

*Successfully resumed all supervisors*

</TabItem>
</Tabs>
---
#### Sample request
<Tabs>
<TabItem value="43" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/resumeAll"
```
</TabItem>
<TabItem value="44" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/resumeAll HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -2975,7 +3067,7 @@ Host: http://ROUTER_IP:ROUTER_PORT
Resets the specified supervisor. This endpoint clears stored offsets in Kafka or sequence numbers in Kinesis, prompting the supervisor to resume data reading. The supervisor will start from the earliest or latest available position, depending on the platform (offsets in Kafka or sequence numbers in Kinesis). It kills and recreates active tasks to read from valid positions.
Use this endpoint to recover from a stopped state due to missing offsets in Kafka or sequence numbers in Kinesis. Use this endpoint with caution as it may result in skipped messages and lead to data loss or duplicate data.
#### URL
@ -2983,40 +3075,48 @@ Use this endpoint to recover from a stopped state due to missing offsets in Kafk
#### Responses
<Tabs>
<TabItem value="45" label="200 SUCCESS">

*Successfully reset supervisor*

</TabItem>
<TabItem value="46" label="404 NOT FOUND">

*Invalid supervisor ID*

</TabItem>
</Tabs>
---
#### Sample request
The following example shows how to reset a supervisor with the name `social_media`.
<Tabs>
<TabItem value="47" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/reset"
```
</TabItem>
<TabItem value="48" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/social_media/reset HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -3032,48 +3132,56 @@ Host: http://ROUTER_IP:ROUTER_PORT
### Terminate a supervisor
Terminates a supervisor and its associated indexing tasks, triggering the publishing of their segments. When terminated, a tombstone marker is placed in the database to prevent reloading on restart.
The terminated supervisor still exists in the metadata store and its history can be retrieved.
#### URL
<code class="postAPI">POST</code> <code>/druid/indexer/v1/supervisor/:supervisorId/terminate</code>
#### Responses
<Tabs>
<TabItem value="49" label="200 SUCCESS">

*Successfully terminated a supervisor*

</TabItem>
<TabItem value="50" label="404 NOT FOUND">

*Invalid supervisor ID or supervisor not running*

</TabItem>
</Tabs>
---
#### Sample request
<Tabs>
<TabItem value="51" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/social_media/terminate"
```
</TabItem>
<TabItem value="52" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/social_media/terminate HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -3097,34 +3205,40 @@ Terminates all supervisors. Terminated supervisors still exist in the metadata s
#### Responses
<Tabs>
<TabItem value="53" label="200 SUCCESS">

*Successfully terminated all supervisors*

</TabItem>
</Tabs>
---
#### Sample request
<Tabs>
<TabItem value="54" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/indexer/v1/supervisor/terminateAll"
```
</TabItem>
<TabItem value="55" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor/terminateAll HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
@ -3140,8 +3254,8 @@ Host: http://ROUTER_IP:ROUTER_PORT
### Shut down a supervisor
Shuts down a supervisor. This endpoint is deprecated and will be removed in future releases. Use the equivalent [terminate](#terminate-a-supervisor) endpoint instead.
#### URL
<code class="postAPI">POST</code> <code>/druid/indexer/v1/supervisor/:supervisorId/shutdown</code>

File diff suppressed because it is too large.


@ -67,7 +67,9 @@ Core extensions are maintained by Druid committers.
## Community extensions
:::info
Community extensions are not maintained by Druid committers, although we accept patches from community members using these extensions. They may not have been as extensively tested as the core extensions.
:::
A number of community members have contributed their own extensions to Druid that are not packaged with the default Druid tarball.
If you'd like to take on maintenance for a community extension, please post on [dev@druid.apache.org](https://lists.apache.org/list.html?dev@druid.apache.org) to let us know!
@ -123,12 +125,16 @@ druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]
These extensions are located in the `extensions` directory of the distribution.
:::info
Druid bundles two sets of configurations: one for the [quickstart](../tutorials/index.md) and
one for a [clustered configuration](../tutorials/cluster.md). Make sure you are updating the correct
`common.runtime.properties` for your setup.
:::
:::info
Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions
on how to install this library, see the [MySQL extension page](../development/extensions-core/mysql.md).
:::
### Loading community extensions
@ -151,10 +157,14 @@ java \
You only have to install the extension once. Then, add `"druid-example-extension"` to
`druid.extensions.loadList` in common.runtime.properties to instruct Druid to load the extension.
:::info
Please make sure all the Extensions related configuration properties listed [here](../configuration/index.md#extensions) are set correctly.
:::
:::info
The Maven `groupId` for almost every [community extension](../configuration/extensions.md#community-extensions) is `org.apache.druid.extensions.contrib`. The `artifactId` is the name
of the extension, and the version is the latest Druid stable version.
:::
### Loading extensions from the classpath


@ -775,7 +775,9 @@ the following properties.
|--------|-----------|-------|
|`druid.javascript.enabled`|Set to "true" to enable JavaScript functionality. This affects the JavaScript parser, filter, extractionFn, aggregator, post-aggregator, router strategy, and worker selection strategy.|false|
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
### Double Column storage
@ -973,7 +975,9 @@ Issuing a GET request at the same URL will return the spec that is currently in
The `smartSegmentLoading` mode simplifies Coordinator configuration for segment loading and balancing.
If you enable this mode, do not provide values for the properties in the table below as the Coordinator computes them automatically.
Druid computes the values to optimize Coordinator performance, based on the current state of the cluster.
:::info
If you enable `smartSegmentLoading` mode, Druid ignores any value you provide for the following properties.
:::
|Property|Computed value|Description|
|--------|--------------|-----------|
@ -1397,7 +1401,9 @@ Example: a function that sends batch_index_task to workers 10.0.0.1 and 10.0.0.2
}
```
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
###### affinityConfig
@ -1996,9 +2002,11 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.planner.maxNumericInFilters`|Max limit for the amount of numeric values that can be compared for a string type dimension when the entire SQL WHERE clause of a query translates to an [OR](../querying/filters.md#or) of [Bound filter](../querying/filters.md#bound-filter). By default, Druid does not restrict the amount of numeric Bound Filters on String columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of `maxNumericInFilters` should instead rewrite their queries to use strings in the `WHERE` clause instead of numbers. For example, rewrite `WHERE someString IN (123, 456)` as `WHERE someString IN ('123', '456')`. If this value is disabled, `maxNumericInFilters` set through query context is ignored.|`-1` (disabled)|
|`druid.sql.approxCountDistinct.function`|Implementation to use for the [`APPROX_COUNT_DISTINCT` function](../querying/sql-aggregations.md). Without extensions loaded, the only valid value is `APPROX_COUNT_DISTINCT_BUILTIN` (a HyperLogLog, or HLL, based implementation). If the [DataSketches extension](../development/extensions-core/datasketches-extension.md) is loaded, this can also be `APPROX_COUNT_DISTINCT_DS_HLL` (alternative HLL implementation) or `APPROX_COUNT_DISTINCT_DS_THETA`.<br /><br />Theta sketches use significantly more memory than HLL sketches, so you should prefer one of the two HLL implementations.|APPROX_COUNT_DISTINCT_BUILTIN|
:::info
Previous versions of Druid had properties named `druid.sql.planner.maxQueryCount` and `druid.sql.planner.maxSemiJoinRowsInMemory`.
These properties are no longer available. Since Druid 0.18.0, you can use `druid.server.http.maxSubqueryRows` to control the maximum
number of rows permitted across all subqueries.
:::
#### Broker Caching
@ -2017,8 +2025,10 @@ You can optionally only configure caching to be enabled on the Broker by setting
See [cache configuration](#cache-configuration) for how to configure cache settings.
:::info
Note: Even if cache is enabled, segment-level cache does not work on Brokers for [groupBy v2](../querying/groupbyquery.md#strategies) queries.
See [Differences between v1 and v2](../querying/groupbyquery.md#differences-between-v1-and-v2) and [Query caching](../querying/caching.md) for more information.
:::
#### Segment Discovery
|Property|Possible Values|Description|Default|
@ -2053,7 +2063,9 @@ for both Broker and Historical processes, when defined in the common properties
#### Local Cache
:::info
DEPRECATED: Use caffeine (default as of v0.12.0) instead
:::
The local cache is deprecated in favor of the Caffeine cache, and may be removed in a future version of Druid. The Caffeine cache affords significantly better performance and control over eviction behavior compared to `local` cache, and is recommended in any situation where you are using JRE 8u60 or higher.


@ -108,12 +108,14 @@ The following example log4j2.xml is based upon the micro quickstart:
Peons always output logs to standard output. Middle Managers redirect task logs from standard output to
[long-term storage](index.md#log-long-term-storage).
:::info
NOTE:
Druid shares the log4j configuration file among all services, including task peon processes.
However, you must define a console appender in the logger for your peon processes.
If you don't define a console appender, Druid creates and configures a new console appender
that retains the log level, such as `info` or `warn`, but does not retain any other appender
configuration, including non-console ones.
:::
## Log directory
The included log4j2.xml configuration for Druid and ZooKeeper writes logs to the `log` directory at the root of the distribution.


@ -174,7 +174,9 @@ The following auto-compaction configuration compacts existing `HOUR` segments in
}
```
:::info
Auto-compaction skips datasources containing ALL granularity segments when the target granularity is different.
:::
### Update partitioning scheme


@ -61,7 +61,9 @@ See [Setting up a manual compaction task](#setting-up-manual-compaction) for mor
During compaction, Druid overwrites the original set of segments with the compacted set. Druid also locks the segments for the time interval being compacted to ensure data consistency. By default, compaction tasks do not modify the underlying data. You can configure the compaction task to change the query granularity or add or remove dimensions in the compaction task. This means that the only changes to query results should be the result of intentional, not automatic, changes.
You can set `dropExisting` in `ioConfig` to "true" in the compaction task to configure Druid to replace all existing segments fully contained by the interval. See the suggestion for reindexing with finer granularity under [Implementation considerations](../ingestion/native-batch.md#implementation-considerations) for an example.
:::info
WARNING: `dropExisting` in `ioConfig` is a beta feature.
:::
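For illustration, a minimal compaction task `ioConfig` with `dropExisting` enabled might look like the following sketch. The datasource name and interval are placeholders, and the rest of the task spec is omitted.

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2020-01-01/2020-02-01"
    },
    "dropExisting": true
  }
}
```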
If an ingestion task needs to write data to a segment for a time interval locked for compaction, by default the ingestion task supersedes the compaction task and the compaction task fails without finishing. For manual compaction tasks, you can adjust the input spec interval to avoid conflicts between ingestion and compaction. For automatic compaction, you can set the `skipOffsetFromLatest` key to adjust the auto-compaction starting point from the current time to reduce the chance of conflicts between ingestion and compaction.
Another option is to set the compaction task to higher priority than the ingestion task.
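As a rough sketch, an auto-compaction configuration that keeps compaction away from the most recent two hours of data might set `skipOffsetFromLatest` as follows. The datasource name and offset value are placeholders, and other configuration fields are omitted.

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "PT2H"
}
```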
@ -79,7 +81,9 @@ For example consider two overlapping segments: segment "A" for the interval 01/0
Unless you modify the query granularity in the [`granularitySpec`](#compaction-granularity-spec), Druid retains the query granularity for the compacted segments. If segments have different query granularities before compaction, Druid chooses the finest level of granularity for the resulting compacted segment. For example if a compaction task combines two segments, one with day query granularity and one with minute query granularity, the resulting segment uses minute query granularity.
:::info
In Apache Druid 0.21.0 and prior, Druid sets the granularity for compacted segments to the default granularity of `NONE` regardless of the query granularity of the original segments.
:::
If you configure query granularity in compaction to go from a finer granularity like month to a coarser query granularity like year, then Druid overshadows the original segment with coarser granularity. Because the new segments have a coarser granularity, running a kill task to remove the overshadowed segments for those intervals will cause you to permanently lose the finer granularity data.
@ -130,11 +134,15 @@ To perform a manual compaction, you submit a compaction task. Compaction tasks m
|`granularitySpec`|When set, the compaction task uses the specified `granularitySpec` rather than generating one from existing segments. See [Compaction `granularitySpec`](#compaction-granularity-spec) for details.|No|
|`context`|[Task context](../ingestion/tasks.md#context)|No|
:::info
Note: Use `granularitySpec` over `segmentGranularity` and only set one of these values. If you specify different values for these in the same compaction spec, the task fails.
:::
To control the number of result segments per time chunk, you can set [`maxRowsPerSegment`](../ingestion/native-batch.md#partitionsspec) or [`numShards`](../ingestion/../ingestion/native-batch.md#tuningconfig).
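For example, a manual compaction task that sets a `granularitySpec` and caps segment size through a dynamic `partitionsSpec` might look roughly like the following sketch. The datasource, interval, granularities, and row limit are placeholders.

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2020-01-01/2021-01-01"
    }
  },
  "granularitySpec": {
    "segmentGranularity": "MONTH",
    "queryGranularity": "DAY"
  },
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```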
:::info
You can run multiple compaction tasks in parallel. For example, if you want to compact the data for a year, you are not limited to running a single task for the entire year. You can run 12 compaction tasks with month-long intervals.
:::
A compaction task internally generates an `index` or `index_parallel` task spec for performing compaction work with some fixed parameters. For example, its `inputSource` is always the [`druid` input source](../ingestion/input-sources.md), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default.


@ -43,7 +43,9 @@ is ongoing for a particular time range of a datasource, new ingestions for that
other time ranges proceed as normal. Read-only queries also proceed as normal, using the pre-existing version of the
data.
:::info
Druid does not support single-record updates by primary key.
:::
## Reindex


@ -141,9 +141,11 @@ This is to avoid conflicts between compaction tasks and realtime tasks.
Note that realtime tasks have a higher priority than compaction tasks by default. Realtime tasks will revoke the locks of compaction tasks if their intervals overlap, resulting in the termination of the compaction task.
For more information, see [Avoid conflicts with ingestion](../data-management/automatic-compaction.md#avoid-conflicts-with-ingestion).
:::info
This policy currently cannot handle the situation when there are a lot of small segments which have the same interval,
and their total size exceeds [`inputSegmentSizeBytes`](../configuration/index.md#automatic-compaction-dynamic-configuration).
If it finds such segments, it simply skips them.
:::
### FAQ


@ -22,8 +22,10 @@ title: "Indexer Process"
~ under the License.
-->
:::info
The Indexer is an optional and [experimental](../development/experimental.md) feature.
Its memory management system is still under development and will be significantly enhanced in later releases.
:::
The Apache Druid Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process.


@ -38,7 +38,9 @@ Derby is the default metadata store for Druid, however, it is not suitable for p
[MySQL](../development/extensions-core/mysql.md) and [PostgreSQL](../development/extensions-core/postgresql.md) are more production suitable metadata stores.
See [Metadata storage configuration](../configuration/index.md#metadata-storage) for the default configuration settings.
:::info
We also recommend you set up a high availability environment because there is no way to restore lost metadata.
:::
## Available metadata stores
@ -46,7 +48,9 @@ Druid supports Derby, MySQL, and PostgreSQL for storing metadata.
### Derby
:::info
For production clusters, consider using MySQL or PostgreSQL instead of Derby.
:::
Configure metadata storage with Derby by setting the following properties in your Druid configuration.


@ -134,7 +134,9 @@ Allows defining arbitrary routing rules using a JavaScript function. The functio
}
```
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
### Routing of SQL queries using strategies


@ -29,7 +29,9 @@ Apache Druid uses [Apache ZooKeeper](http://zookeeper.apache.org/) (ZK) for mana
Apache Druid supports ZooKeeper versions 3.5.x and above.
:::info
Note: Starting with Apache Druid 0.22.0, support for ZooKeeper 3.4.x has been removed
:::
## ZooKeeper Operations


@ -31,7 +31,9 @@ The `druid-histogram` extension provides an approximate histogram aggregator and
## Approximate Histogram aggregator (Deprecated)
:::info
The Approximate Histogram aggregator is deprecated. Please use [DataSketches Quantiles](../extensions-core/datasketches-quantiles.md) instead which provides a superior distribution-independent algorithm with formal error guarantees.
:::
This aggregator is based on
[http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)


@ -47,7 +47,9 @@ For additional sketch types supported in Druid, see [DataSketches extension](dat
|`round`|Round off values to whole numbers. Only affects query-time behavior and is ignored at ingestion-time.|no, defaults to `false`|
|`shouldFinalize`|Return the final double type representing the estimate rather than the intermediate sketch type itself. In addition to controlling the finalization of this aggregator, you can control whether all aggregators are finalized with the query context parameters [`finalize`](../../querying/query-context.md) and [`sqlFinalizeOuterSketches`](../../querying/sql-query-context.md).|no, defaults to `true`|
:::info
The default `lgK` value has proven to be sufficient for most use cases; expect only very negligible improvements in accuracy with `lgK` values over `16` in normal circumstances.
:::
### HLLSketchBuild aggregator
@ -65,20 +67,22 @@ For additional sketch types supported in Druid, see [DataSketches extension](dat
The `HLLSketchBuild` aggregator builds an HLL sketch object from the specified input column. When used during ingestion, Druid stores pre-generated HLL sketch objects in the datasource instead of the raw data from the input column.
When applied at query time on an existing dimension, you can use the resulting column as an intermediate dimension by the [post-aggregators](#post-aggregators).
:::info
It is very common to use `HLLSketchBuild` in combination with [rollup](../../ingestion/rollup.md) to create a [metric](../../ingestion/ingestion-spec.md#metricsspec) on high-cardinality columns. In this example, a metric called `userid_hll` is included in the `metricsSpec`. This will perform a HLL sketch on the `userid` field at ingestion time, allowing for highly-performant approximate `COUNT DISTINCT` query operations and improving roll-up ratios when `userid` is then left out of the `dimensionsSpec`.
```
"metricsSpec": [
{
"type": "HLLSketchBuild",
"name": "userid_hll",
"fieldName": "userid",
"lgK": 12,
"tgtHllType": "HLL_4"
}
]
```
:::
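As a loose illustration of the query-time usage described above, a native timeseries query could build a sketch from an existing `userid` dimension as in the following sketch. The datasource, interval, and column names are placeholders.

```json
{
  "queryType": "timeseries",
  "dataSource": "social_media",
  "granularity": "all",
  "intervals": ["2023-01-01/2023-02-01"],
  "aggregations": [
    {
      "type": "HLLSketchBuild",
      "name": "userid_hll",
      "fieldName": "userid",
      "lgK": 12,
      "tgtHllType": "HLL_4"
    }
  ]
}
```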
### HLLSketchMerge aggregator


@ -30,10 +30,12 @@ This module can be used side to side with other lookup module like the global ca
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list.
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#installing-the-mysql-connector-library) or [MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
Copy or symlink the downloaded file to `extensions/druid-lookups-cached-single` under the distribution root directory.
:::
## Architecture
Generally speaking, this module can be divided into two main components: the data fetcher layer and the caching layer.


@ -26,7 +26,9 @@ This Apache Druid extension adds an Authorizer which implements access control f
Make sure to [include](../../configuration/extensions.md#loading-extensions) `druid-ranger-security` in the extensions load list.
:::info
The latest release of Apache Ranger is at the time of writing version 2.0. This version has a dependency on `log4j 1.2.17` which has a vulnerability if you configure it to use a `SocketServer` (CVE-2019-17571). Next to that, it also includes Kafka 2.0.0 which has 2 known vulnerabilities (CVE-2019-12399, CVE-2018-17196). Kafka can be used by the audit component in Ranger, but is not required.
:::
## Configuration
@ -67,7 +69,9 @@ druid.escalator.internalClientPassword=password2
druid.escalator.authorizerName=ranger
```
:::info
Contrary to the documentation of `druid-basic-auth` Ranger does not automatically provision a highly privileged system user, you will need to do this yourself. This system user in the case of `druid-basic-auth` is named `druid_system` and for the escalator it is configurable, as shown above. Make sure to take note of these user names and configure `READ` access to `state:STATE` and to `config:security` in your ranger policies, otherwise system services will not work properly.
:::
#### Properties to configure the extension in Apache Druid
|Property|Description|Default|required|
@ -92,7 +96,9 @@ You should get back `json` describing the service definition you just added. You
When installing a new Druid service in Apache Ranger for the first time, Ranger will provision the policies to allow the administrative user `read/write` access to all properties and data sources. You might want to limit this. Do not forget to add the correct policies for the `druid_system` user and the `internalClientUserName` of the escalator.
:::info
Loading new data sources requires `write` access to the `datasource` prior to the loading itself. So if you want to create a datasource `wikipedia` you are required to have an `allow` policy inside Apache Ranger before trying to load the spec.
:::
## Usage


@ -81,14 +81,18 @@ This topic contains configuration reference information for the Apache Kafka sup
## Idle Supervisor Configuration
:::info
Note that Idle state transitioning is currently designated as experimental.
:::
| Property | Description | Required |
| ------------- | ------------- | ------------- |
| `enabled` | If `true`, Kafka supervisor will become idle if there is no data on input stream/topic for some time. | no (default == false) |
| `inactiveAfterMillis` | Supervisor is marked as idle if all existing data has been read from input topic and no new data has been published for `inactiveAfterMillis` milliseconds. | no (default == `600_000`) |
:::info
When the supervisor enters the idle state, no new tasks will be launched subsequent to the completion of the currently executing tasks. This strategy may lead to reduced costs for cluster operators while using topics that get sporadic data.
:::
The following example demonstrates a supervisor spec with the `lagBased` autoScaler and idle configuration enabled:
```json


@ -3,8 +3,12 @@ id: kinesis-ingestion
title: "Amazon Kinesis ingestion"
sidebar_label: "Amazon Kinesis"
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
@ -58,9 +62,10 @@ supervisor creates a new set of tasks. In this way, the supervisors persist acro
The following example shows how to submit a supervisor spec for a stream with the name `KinesisStream`.
In this example, `http://SERVICE_IP:SERVICE_PORT` is a placeholder for the server address of deployment and the service port.
<Tabs>
<TabItem value="1" label="cURL">
```shell
curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \
-H "Content-Type: application/json" \
@ -135,7 +140,9 @@ curl -X POST "http://SERVICE_IP:SERVICE_PORT/druid/indexer/v1/supervisor" \
}
}'
```
</TabItem>
<TabItem value="2" label="HTTP">
```HTTP
POST /druid/indexer/v1/supervisor
HTTP/1.1
@ -213,7 +220,8 @@ Content-Type: application/json
}
}
```
</TabItem>
</Tabs>
## Supervisor I/O configuration


@ -25,8 +25,10 @@ title: "Globally Cached Lookups"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-global` in the extensions load list.
## Configuration
:::info
Static configuration is no longer supported. Lookups can be configured through
[dynamic configuration](../../querying/lookups.md#configuration).
:::
Globally cached lookups are appropriate for lookups which are not possible to pass at query time due to their size,
or are not desired to be passed at query time because the data is to reside in and be handled by the Druid servers,
@ -369,11 +371,13 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol
}
```
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#installing-the-mysql-connector-library) or [MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
The connector JAR should reside in the classpath of Druid's main class loader.
To add the connector JAR to the classpath, you can copy the downloaded file to `lib/` under the distribution root directory. Alternatively, create a symbolic link to the connector in the `lib` directory.
:::
## Introspection


@ -25,8 +25,10 @@ title: "MySQL Metadata Store"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
:::info
The MySQL extension requires the MySQL Connector/J library or MariaDB Connector/J library, neither of which are included in the Druid distribution.
Refer to the following section for instructions on how to install this library.
:::
## Installing the MySQL connector library
@ -76,7 +78,7 @@ This extension also supports using MariaDB server, https://mariadb.org/download/
Connect to MySQL from the machine where it is installed.
```bash
mysql -u root
```
Paste the following snippet into the mysql prompt:


@ -47,12 +47,14 @@ This algorithm was proven to be numerically stable by J.L. Barlow in
"Error analysis of a pairwise summation algorithm to compute sample variance"
Numer. Math, 58 (1991) pp. 583--590
:::info
As with all [aggregators](../../querying/sql-aggregations.md), the order of operations across segments is
non-deterministic. This means that if this aggregator operates with an input type of "float" or "double", the result
of the aggregation may not be precisely the same across multiple runs of the query.
To produce consistent results, round the variance to a fixed number of decimal places so that the results are
precisely the same across query runs.
:::
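One way to do the rounding described above is with an expression post-aggregator, sketched below with placeholder aggregator and column names; this is an illustration rather than the only approach.

```json
{
  "aggregations": [
    { "type": "variance", "name": "var_value", "fieldName": "latency" }
  ],
  "postAggregations": [
    { "type": "expression", "name": "var_rounded", "expression": "round(var_value, 4)" }
  ]
}
```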
### Pre-aggregating variance at ingestion time


@ -77,7 +77,9 @@ parsing data is less efficient than writing a native Java parser or using an ext
You can use the `inputFormat` field to specify the data format for your input data.
:::info
`inputFormat` doesn't support all data formats or ingestion methods supported by Druid.
:::
Especially if you want to use the Hadoop ingestion, you still need to use the [Parser](#parser).
If your data is formatted in some format not listed in this section, please consider using the Parser instead.
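For instance, an `ioConfig` that reads newline-delimited JSON from a local directory might declare the input format roughly as in the following sketch; the input source type and paths are placeholders.

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "local",
    "baseDir": "/data/events",
    "filter": "*.json"
  },
  "inputFormat": {
    "type": "json"
  }
}
```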
@ -167,7 +169,9 @@ For example:
### ORC
To use the ORC input format, load the Druid Orc extension ( [`druid-orc-extensions`](../development/extensions-core/orc.md)).
:::info
To upgrade from versions earlier than 0.15.0 to 0.15.0 or newer, read [Migration from 'contrib' extension](../development/extensions-core/orc.md#migration-from-contrib-extension).
:::
Configure the ORC `inputFormat` to load ORC data as follows:
@ -289,9 +293,11 @@ If `type` is not included, the avroBytesDecoder defaults to `schema_repo`.
###### Inline Schema Based Avro Bytes Decoder
:::info
The "schema_inline" decoder reads Avro records using a fixed schema and does not support schema migration. If you
may need to migrate schemas in the future, consider one of the other decoders, all of which use a message header that
allows the parser to identify the proper Avro schema for reading records.
:::
This decoder can be used if all the input events can be read using the same schema. In this case, specify the schema in the input task JSON itself, as described below.
@ -503,7 +509,9 @@ For example:
### Protobuf
:::info
You need to include the [`druid-protobuf-extensions`](../development/extensions-core/protobuf.md) as an extension to use the Protobuf input format.
:::
Configure the Protobuf `inputFormat` to load Protobuf data as follows:
@ -686,9 +694,11 @@ Each entry in the `fields` list can have the following components:
## Parser
:::info
The Parser is deprecated for [native batch tasks](./native-batch.md), [Kafka indexing service](../development/extensions-core/kafka-ingestion.md),
and [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md).
Consider using the [input format](#input-format) instead for these types of ingestion.
:::
This section lists all default and core extension parsers.
For community extension parsers, please see our [community extensions list](../configuration/extensions.md#community-extensions).
@ -705,9 +715,13 @@ Each line can be further parsed using [`parseSpec`](#parsespec).
### Avro Hadoop Parser
:::info
You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Hadoop Parser.
:::
:::info
See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid
:::
This parser is for [Hadoop batch ingestion](./hadoop.md).
The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"org.apache.druid.data.input.avro.AvroValueInputFormat"`.
@ -764,10 +778,14 @@ For example, using Avro Hadoop parser with custom reader's schema file:
### ORC Hadoop Parser
:::info
You need to include the [`druid-orc-extensions`](../development/extensions-core/orc.md) as an extension to use the ORC Hadoop Parser.
:::
:::info
If you are considering upgrading from earlier than 0.15.0 to 0.15.0 or a higher version,
please read [Migration from 'contrib' extension](../development/extensions-core/orc.md#migration-from-contrib-extension) carefully.
:::
This parser is for [Hadoop batch ingestion](./hadoop.md).
The `inputFormat` of `inputSpec` in `ioConfig` must be set to `"org.apache.orc.mapreduce.OrcInputFormat"`.
@ -1005,7 +1023,9 @@ setting `"mapreduce.job.user.classpath.first": "true"`, then this will not be an
### Parquet Hadoop Parser
:::info
You need to include the [`druid-parquet-extensions`](../development/extensions-core/parquet.md) as an extension to use the Parquet Hadoop Parser.
:::
The Parquet Hadoop parser is for [Hadoop batch ingestion](./hadoop.md) and parses Parquet files directly.
The `inputFormat` of `inputSpec` in `ioConfig` must be set to `org.apache.druid.data.input.parquet.DruidParquetInputFormat`.
@ -1147,12 +1167,16 @@ However, the Parquet Avro Hadoop Parser was the original basis for supporting th
### Parquet Avro Hadoop Parser
:::info
Consider using the [Parquet Hadoop Parser](#parquet-hadoop-parser) over this parser to ingest
Parquet files. See [Parquet Hadoop Parser vs Parquet Avro Hadoop Parser](#parquet-hadoop-parser-vs-parquet-avro-hadoop-parser)
for the differences between those parsers.
:::
:::info
You need to include both the [`druid-parquet-extensions`](../development/extensions-core/parquet.md)
and [`druid-avro-extensions`](../development/extensions-core/avro.md) as extensions to use the Parquet Avro Hadoop Parser.
:::
The Parquet Avro Hadoop Parser is for [Hadoop batch ingestion](./hadoop.md).
This parser first converts the Parquet data into Avro records, and then parses them to ingest into Druid.
@ -1234,9 +1258,13 @@ an explicitly defined [format](http://www.joda.org/joda-time/apidocs/org/joda/ti
### Avro Stream Parser
:::info
You need to include the [`druid-avro-extensions`](../development/extensions-core/avro.md) as an extension to use the Avro Stream Parser.
:::
:::info
See the [Avro Types](../development/extensions-core/avro.md#avro-types) section for how Avro types are handled in Druid
:::
This parser is for [stream ingestion](./index.md#streaming) and reads Avro data from a stream directly.
@ -1276,7 +1304,9 @@ For example, using Avro stream parser with schema repo Avro bytes decoder:
### Protobuf Parser
:::info
You need to include the [`druid-protobuf-extensions`](../development/extensions-core/protobuf.md) as an extension to use the Protobuf Parser.
:::
This parser is for [stream ingestion](./index.md#streaming) and reads Protocol buffer data from a stream directly.
@ -1430,9 +1460,11 @@ Multiple Instances:
## ParseSpec
:::info
The Parser is deprecated for [native batch tasks](./native-batch.md), [Kafka indexing service](../development/extensions-core/kafka-ingestion.md),
and [Kinesis indexing service](../development/extensions-core/kinesis-ingestion.md).
Consider using the [input format](#input-format) instead for these types of ingestion.
:::
ParseSpecs serve two purposes:
@ -1468,7 +1500,9 @@ Sample spec:
### JSON Lowercase ParseSpec
:::info
The _jsonLowercase_ parser is deprecated and may be removed in a future version of Druid.
:::
This is a special variation of the JSON ParseSpec that lower cases all the column names in the incoming JSON data. This parseSpec is required if you are updating to Druid 0.7.x from Druid 0.6.x, are directly ingesting JSON with mixed case column names, do not have any ETL in place to lower case those column names, and would like to make queries that include the data you created using 0.6.x and 0.7.x.
@ -1608,7 +1642,9 @@ columns names ("column_1", "column2", ... "column_n") will be assigned. Ensure t
Note with the JavaScript parser that data must be fully parsed and returned as a `{key:value}` format in the JS logic.
This means any flattening or parsing multi-dimensional values must be done here.
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
### TimeAndDims ParseSpec


@ -102,8 +102,10 @@ available in Druid's [web console](../operations/web-console.md). Druid's visual
## `dataSchema`
:::info
The `dataSchema` spec has been changed in 0.17.0. The new spec is supported by all ingestion methods
except for _Hadoop_ ingestion. See the [Legacy `dataSchema` spec](#legacy-dataschema-spec) for the old spec.
:::
The `dataSchema` is a holder for the following components:
@ -166,10 +168,12 @@ configuring the [primary timestamp](./schema-model.md#primary-timestamp). An exa
}
```
:::info
Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
your ingestion spec.
:::
A `timestampSpec` can have the following components:
@ -212,10 +216,12 @@ The following `dimensionsSpec` example uses schema auto-discovery (`"useSchemaDi
```
:::info
Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
your ingestion spec.
:::
A `dimensionsSpec` can have the following components:
@ -248,7 +254,9 @@ Druid will interpret a `dimensionsSpec` in two possible ways: _normal_ or _schem
Normal interpretation occurs when either `dimensions` or `spatialDimensions` is non-empty. In this case, the combination of the two lists will be taken as the set of dimensions to be ingested, and the list of `dimensionExclusions` will be ignored.
:::info
The following description of schemaless refers to string-based schemaless where Druid treats dimensions it discovers as strings. We recommend you use schema auto-discovery instead where Druid infers the type for the dimension. For more information, see [`dimensionsSpec`](#dimensionsspec).
:::
Schemaless interpretation occurs when both `dimensions` and `spatialDimensions` are empty or null. In this case, the set of dimensions is determined in the following way:
@ -262,8 +270,10 @@ Schemaless interpretation occurs when both `dimensions` and `spatialDimensions`
Additionally, if you have empty columns that you want to include in the string-based schemaless ingestion, you'll need to include the context parameter `storeEmptyColumns` and set it to `true`.
:::info
Note: Fields generated by a [`transformSpec`](#transformspec) are not currently considered candidates for
schemaless dimension interpretation.
:::
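Putting these pieces together, a string-based schemaless ingestion task might leave both dimension lists empty and set `storeEmptyColumns` in the task context, roughly as in this sketch; all other fields are omitted and the task type is only an example.

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dimensionsSpec": {
        "dimensions": [],
        "spatialDimensions": []
      }
    }
  },
  "context": {
    "storeEmptyColumns": true
  }
}
```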
### `metricsSpec`
@ -281,11 +291,13 @@ An example `metricsSpec` is:
]
```
:::info
Generally, when [rollup](./rollup.md) is disabled, you should have an empty `metricsSpec` (because without rollup,
Druid does not do any ingestion-time aggregation, so there is little reason to include an ingestion-time aggregator). However,
in some cases, it can still make sense to define metrics: for example, if you want to create a complex column as a way of
pre-computing part of an [approximate aggregation](../querying/aggregations.md#approximate-aggregations), this can only
be done by defining a metric in a `metricsSpec`.
:::
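For example, even with rollup disabled, a `metricsSpec` that pre-computes a Theta sketch column might look like the following sketch (it assumes the `druid-datasketches` extension is loaded and that a `userId` input column exists):

```json
[
  { "type": "thetaSketch", "name": "userIdSketch", "fieldName": "userId" }
]
```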
### `granularitySpec`
@ -340,10 +352,12 @@ records during ingestion time. It is optional. An example `transformSpec` is:
}
```
> Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
> and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
> your ingestion spec.
:::info
Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
your ingestion spec.
:::
#### Transforms
@ -369,10 +383,12 @@ Druid currently includes one kind of built-in transform, the expression transfor
The `expression` is a [Druid query expression](../querying/math-expr.md).
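For example, an expression transform that upper-cases an assumed `country` input column might look like this sketch:

```json
{ "type": "expression", "name": "countryUpper", "expression": "upper(country)" }
```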
> Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
> and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
> your ingestion spec.
:::info
Conceptually, after input data records are read, Druid applies ingestion spec components in a particular order:
first [`flattenSpec`](data-formats.md#flattenspec) (if any), then [`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
and finally [`dimensionsSpec`](#dimensionsspec) and [`metricsSpec`](#metricsspec). Keep this in mind when writing
your ingestion spec.
:::
#### Filter
@ -382,8 +398,10 @@ ingested. Any of Druid's standard [query filters](../querying/filters.md) can be
### Legacy `dataSchema` spec
> The `dataSchema` spec has been changed in 0.17.0. The new spec is supported by all ingestion methods
:::info
The `dataSchema` spec has been changed in 0.17.0. The new spec is supported by all ingestion methods
except for _Hadoop_ ingestion. See [`dataSchema`](#dataschema) for the new spec.
:::
The legacy `dataSchema` spec includes the following two components in addition to the ones listed in the [`dataSchema`](#dataschema) section above.
@ -506,7 +524,9 @@ Front coding is an experimental feature starting in version 25.0. Front coding i
You can enable front coding with all types of ingestion. For information on defining an `indexSpec` in a query context, see [SQL-based ingestion reference](../multi-stage-query/reference.md#context-parameters).
> Front coding was originally introduced in Druid 25.0, and an improved 'version 1' was introduced in Druid 26.0, with typically faster read speed and smaller storage size. The current recommendation is to enable it in a staging environment and fully test your use case before using in production. By default, segments created with front coding enabled in Druid 26.0 are backwards compatible with Druid 25.0, but those created with Druid 26.0 or 25.0 are not compatible with Druid versions older than 25.0. If using front coding in Druid 25.0 and upgrading to Druid 26.0, the `formatVersion` defaults to `0` to keep writing out the older format to enable seamless downgrades to Druid 25.0, and then later is recommended to be changed to `1` once determined that rollback is not necessary.
:::info
Front coding was originally introduced in Druid 25.0, and an improved 'version 1' with typically faster read speed and smaller storage size was introduced in Druid 26.0. The current recommendation is to enable it in a staging environment and fully test your use case before using it in production. By default, segments created with front coding enabled in Druid 26.0 are backwards compatible with Druid 25.0, but segments created with front coding in either Druid 25.0 or 26.0 are not compatible with Druid versions older than 25.0. If you use front coding in Druid 25.0 and upgrade to Druid 26.0, `formatVersion` defaults to `0` so that Druid keeps writing the older format and seamless downgrades to Druid 25.0 remain possible. Once you determine that a rollback is not necessary, change `formatVersion` to `1`.
:::
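As a sketch, enabling front coding in the `indexSpec` of a `tuningConfig` might look like the following (the `bucketSize` value is an assumption; tune it for your data):

```json
{
  "indexSpec": {
    "stringDictionaryEncoding": {
      "type": "frontCoded",
      "bucketSize": 4,
      "formatVersion": 1
    }
  }
}
```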
Beyond these properties, each ingestion method has its own specific tuning properties. See the documentation for each
[ingestion method](./index.md#ingestion-methods) for details.

View File

@ -29,7 +29,9 @@ For general information on native batch indexing and parallel task indexing, see
## S3 input source
> You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source.
:::info
You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source.
:::
The S3 input source reads objects directly from S3. You can specify either:
- a list of S3 URI strings
@ -206,11 +208,15 @@ Properties Object:
|assumeRoleArn|AWS ARN of the role to assume [see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html). **assumeRoleArn** can be used either with the ingestion spec AWS credentials or with the default S3 credentials|None|no|
|assumeRoleExternalId|A unique identifier that might be required when you assume a role in another account [see](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html)|None|no|
> **Note:** If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
:::info
**Note:** If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credentials provider chain](../development/extensions-core/s3.md#s3-authentication-methods) is used.
:::
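Putting this together, a minimal S3 input source that relies on the default credentials provider chain might look like the following sketch (the bucket and prefix are assumptions):

```json
{
  "type": "s3",
  "prefixes": ["s3://your-bucket/some-prefix/"]
}
```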
## Google Cloud Storage input source
> You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the Google Cloud Storage input source.
:::info
You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the Google Cloud Storage input source.
:::
The Google Cloud Storage input source supports reading objects directly
from Google Cloud Storage. Objects can be specified as a list of Google
@ -294,7 +300,9 @@ Google Cloud Storage object:
## Azure input source
> You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
:::info
You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
:::
The Azure input source reads objects directly from Azure Blob store or Azure Data Lake sources. You can
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](./native-batch.md) indexing and each worker task reads one chunk of the split data.
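For example, a minimal Azure input source might look like the following sketch (the container and blob path are assumptions):

```json
{
  "type": "azure",
  "uris": ["azure://your-container/some-prefix/file.json"]
}
```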
@ -375,7 +383,9 @@ The `objects` property is:
## HDFS input source
> You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFS input source.
:::info
You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFS input source.
:::
The HDFS input source supports reading files directly
from HDFS storage. File paths can be specified as an HDFS URI string or a list
@ -462,9 +472,11 @@ in `druid.ingestion.hdfs.allowedProtocols`. See [HDFS input source security conf
The HTTP input source supports reading files directly from remote sites via HTTP.
> **Security notes:** Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can specify an input source referring to any location that the Druid process can access. For example, using `http` input source, users may have access to internal network servers.
>
> The `http` input source is not limited to the HTTP or HTTPS protocols. It uses the Java URI class that supports HTTP, HTTPS, FTP, file, and jar protocols by default.
:::info
**Security notes:** Ingestion tasks run under the operating system account that runs the Druid processes, for example the Indexer, Middle Manager, and Peon. This means any user who can submit an ingestion task can specify an input source referring to any location that the Druid process can access. For example, using `http` input source, users may have access to internal network servers.
The `http` input source is not limited to the HTTP or HTTPS protocols. It uses the Java URI class that supports HTTP, HTTPS, FTP, file, and jar protocols by default.
:::
For more information about security best practices, see [Security overview](../operations/security-overview.md#best-practices).
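For example, a minimal HTTP input source might look like the following sketch (the URI is an assumption):

```json
{
  "type": "http",
  "uris": ["https://example.com/exports/data.json.gz"]
}
```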
@ -690,10 +702,12 @@ rolled-up datasource `wikipedia_rollup` by grouping on hour, "countryName", and
}
```
> Note: Older versions (0.19 and earlier) did not respect the timestampSpec when using the Druid input source. If you
> have ingestion specs that rely on this and cannot rewrite them, set
> [`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`](../configuration/index.md#indexer-general-configuration)
> to `true` to enable a compatibility mode where the timestampSpec is ignored.
:::info
Note: Older versions (0.19 and earlier) did not respect the timestampSpec when using the Druid input source. If you
have ingestion specs that rely on this and cannot rewrite them, set
[`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`](../configuration/index.md#indexer-general-configuration)
to `true` to enable a compatibility mode where the timestampSpec is ignored.
:::
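For reference, a minimal Druid input source might look like the following sketch (the datasource name and interval are assumptions):

```json
{
  "type": "druid",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02"
}
```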
## SQL input source
@ -796,7 +810,9 @@ The following is an example of a Combining input source spec:
## Iceberg input source
> To use the Iceberg input source, add the `druid-iceberg-extensions` extension.
:::info
To use the Iceberg input source, add the `druid-iceberg-extensions` extension.
:::
You use the Iceberg input source to read data stored in the Iceberg table format. For a given table, the input source scans up to the latest Iceberg snapshot from the configured Hive catalog. Druid ingests the underlying live data files using the existing input source formats.

View File

@ -23,7 +23,9 @@ sidebar_label: "Firehose (deprecated)"
~ under the License.
-->
> Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](../operations/migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
:::info
Firehose ingestion is deprecated. See [Migrate from firehose to input source ingestion](../operations/migrate-from-firehose-ingestion.md) for instructions on migrating from firehose ingestion to using native batch ingestion input sources.
:::
There are several firehoses readily available in Druid; some are meant for examples, while others can be used directly in a production environment.

View File

@ -23,8 +23,10 @@ sidebar_label: "JSON-based batch (simple)"
~ under the License.
-->
> This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion
> methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
:::info
This page describes native batch ingestion using [ingestion specs](ingestion-spec.md). Refer to the [ingestion
methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
:::
The simple task ([task type](tasks.md) `index`) executes single-threaded as a single task within the indexing service. For parallel, scalable options consider using [`index_parallel` tasks](./native-batch.md) or [SQL-based batch ingestion](../multi-stage-query/index.md).

View File

@ -23,7 +23,9 @@ sidebar_label: JSON-based batch
~ under the License.
-->
> This page describes JSON-based batch ingestion using [ingestion specs](ingestion-spec.md). For SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, see [SQL-based ingestion](../multi-stage-query/index.md). Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
:::info
This page describes JSON-based batch ingestion using [ingestion specs](ingestion-spec.md). For SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md) extension, see [SQL-based ingestion](../multi-stage-query/index.md). Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which ingestion method is right for you.
:::
Apache Druid supports the following types of native batch indexing tasks:
- Parallel task indexing (`index_parallel`) that can run multiple indexing tasks concurrently. Parallel task works well for production ingestion tasks.
@ -344,7 +346,9 @@ In hash partitioning, the partition function is used to compute hash of partitio
#### Single-dimension range partitioning
> Single dimension range partitioning is not supported in the sequential mode of the `index_parallel` task type.
:::info
Single dimension range partitioning is not supported in the sequential mode of the `index_parallel` task type.
:::
Range partitioning has [several benefits](#benefits-of-range-partitioning) related to storage footprint and query
performance.
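A single-dimension range `partitionsSpec` might look like the following sketch (the dimension name and row target are assumptions):

```json
{
  "type": "single_dim",
  "partitionDimension": "country",
  "targetRowsPerSegment": 5000000
}
```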
@ -388,13 +392,17 @@ the time chunk and the value of `partitionDimension`; each worker task reads the
falling in the same partition of the same range from multiple MiddleManager/Indexer processes and merges
them to create the final segments. Finally, they push the final segments to the deep storage.
> Because the task with single-dimension range partitioning makes two passes over the input
> in `partial dimension distribution` and `partial segment generation` phases,
> the task may fail if the input changes in between the two passes.
:::info
Because the task with single-dimension range partitioning makes two passes over the input
in `partial dimension distribution` and `partial segment generation` phases,
the task may fail if the input changes in between the two passes.
:::
#### Multi-dimension range partitioning
> Multi-dimension range partitioning is not supported in the sequential mode of the `index_parallel` task type.
:::info
Multi-dimension range partitioning is not supported in the sequential mode of the `index_parallel` task type.
:::
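A multi-dimension range `partitionsSpec` might look like the following sketch (the dimension names and row target are assumptions):

```json
{
  "type": "range",
  "partitionDimensions": ["country", "city"],
  "targetRowsPerSegment": 5000000
}
```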
Range partitioning has [several benefits](#benefits-of-range-partitioning) related to storage footprint and query
performance. Multi-dimension range partitioning improves over single-dimension range partitioning by allowing

View File

@ -249,7 +249,9 @@ Druid can infer the schema for your data in one of two ways:
#### Type-aware schema discovery
> Note that using type-aware schema discovery can impact downstream BI tools depending on how they handle ARRAY typed columns.
:::info
Note that using type-aware schema discovery can impact downstream BI tools depending on how they handle ARRAY typed columns.
:::
You can have Druid infer the schema and types for your data partially or fully by setting `dimensionsSpec.useSchemaDiscovery` to `true` and defining some or no dimensions in the dimensions list.
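For example, to let Druid discover and type all columns, a `dimensionsSpec` might look like the following sketch:

```json
{
  "useSchemaDiscovery": true,
  "dimensions": []
}
```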

View File

@ -304,7 +304,9 @@ For example, a Kafka indexing task and a compaction task can always write segmen
This is because a Kafka indexing task always appends new segments, while a compaction task always overwrites existing segments.
The segments created with the segment locking have the _same_ major version and a _higher_ minor version.
> The segment locking is still experimental. It could have unknown bugs which potentially lead to incorrect query results.
:::info
Segment locking is still experimental. It could have unknown bugs that potentially lead to incorrect query results.
:::
To enable segment locking, you may need to set `forceTimeChunkLock` to `false` in the [task context](#context).
Once `forceTimeChunkLock` is unset, the task will choose a proper lock type to use automatically.
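For example, a task context that opts into segment locking might look like the following sketch:

```json
{
  "context": {
    "forceTimeChunkLock": false
  }
}
```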
@ -415,11 +417,15 @@ If you don't see the log file in long-term storage, it means either:
You can check the middleManager / indexer logs locally to see if there was a push failure. If there was not, check the Overlord's own process logs to see why the task failed before it started.
> If you are running the indexing service in remote mode, the task logs must be stored in S3, Azure Blob Store, Google Cloud Storage or HDFS.
:::info
If you are running the indexing service in remote mode, the task logs must be stored in S3, Azure Blob Store, Google Cloud Storage or HDFS.
:::
You can configure retention periods for logs in milliseconds by setting `druid.indexer.logs.kill` properties in [configuration](../configuration/index.md#task-logging). The Overlord will then automatically manage task logs in log directories along with entries in task-related metadata storage tables.
> Automatic log file deletion typically works based on the log file's 'modified' timestamp in the back-end store. Large clock skews between Druid processes and the long-term store might result in unintended behavior.
:::info
Automatic log file deletion typically works based on the log file's 'modified' timestamp in the back-end store. Large clock skews between Druid processes and the long-term store might result in unintended behavior.
:::
## Configuring task storage sizes

View File

@ -23,9 +23,11 @@ sidebar_label: "Key concepts"
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
## Multi-stage query task engine

View File

@ -23,9 +23,11 @@ sidebar_label: Examples
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
These example queries show you some of the things you can do when modifying queries for your use case. Copy the example queries into the **Query** view of the web console and run them to see what they do.

View File

@ -24,9 +24,11 @@ description: Introduces multi-stage query architecture and its task engine
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
Apache Druid supports SQL-based ingestion using the bundled [`druid-multi-stage-query` extension](#load-the-extension).
This extension adds a [multi-stage query task engine for SQL](concepts.md#multi-stage-query-task-engine) that allows running SQL

View File

@ -23,9 +23,11 @@ sidebar_label: Known issues
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
## Multi-stage query task runtime

View File

@ -23,9 +23,11 @@ sidebar_label: Reference
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
## SQL reference

View File

@ -23,9 +23,11 @@ sidebar_label: Security
~ under the License.
-->
> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.
:::info
This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
ingestion method is right for you.
:::
All authenticated users can use the multi-stage query task engine (MSQ task engine) through the UI and API if the
extension is loaded. However, without additional permissions, users are not able to issue queries that read or write
@ -46,7 +48,9 @@ users with access to the Overlord API can perform some actions even if they didn
retrieving status or canceling a query. For more information about the Overlord API and the task API, see [APIs for
SQL-based ingestion](../api-reference/sql-ingestion-api.md).
> Keep in mind that any user with access to Overlord APIs can submit `query_controller` tasks with only the WRITE DATASOURCE permission.
:::info
Keep in mind that any user with access to Overlord APIs can submit `query_controller` tasks with only the WRITE DATASOURCE permission.
:::
Depending on what a user is trying to do, they might also need the following permissions:

View File

@ -63,7 +63,9 @@ memberOf: cn=mygroup,ou=groups,dc=example,dc=com
You use this information to map the LDAP group to Druid roles in a later step.
> Druid uses the `memberOf` attribute to determine a group's membership using LDAP. If your LDAP server implementation doesn't include this attribute, you must complete some additional steps when you [map LDAP groups to Druid roles](#map-ldap-groups-to-druid-roles).
:::info
Druid uses the `memberOf` attribute to determine a group's membership using LDAP. If your LDAP server implementation doesn't include this attribute, you must complete some additional steps when you [map LDAP groups to Druid roles](#map-ldap-groups-to-druid-roles).
:::
## Configure Druid for LDAP authentication
@ -105,7 +107,9 @@ In the example below, the LDAP user is `internal@example.com`.
- `userAttribute`: The user search attribute.
- `internal@example.com` is the LDAP user you created in step 1. In the example it serves as both the internal client user and the initial admin user.
> In the above example, the [Druid escalator](../development/extensions-core/druid-basic-security.md#escalator) and LDAP initial admin user are set to the same user - `internal@example.com`. If the escalator is set to a different user, you must follow steps 4 and 5 to create the group mapping and allocate initial roles before the rest of the cluster can function.
:::info
In the above example, the [Druid escalator](../development/extensions-core/druid-basic-security.md#escalator) and LDAP initial admin user are set to the same user - `internal@example.com`. If the escalator is set to a different user, you must follow steps 4 and 5 to create the group mapping and allocate initial roles before the rest of the cluster can function.
:::
4. Save your group mapping to a JSON file. An example file `groupmap.json` looks like this:

View File

@ -421,8 +421,10 @@ Enabling process termination on out-of-memory errors is useful as well, since th
-XX:HeapDumpPath=/var/logs/druid/historical.hprof
-XX:MaxDirectMemorySize=1g
```
> Please note that the flag settings above represent sample, general guidelines only. Be careful to use values appropriate
:::info
Please note that the flag settings above represent sample, general guidelines only. Be careful to use values appropriate
for your specific scenario and be sure to test any changes in staging environments.
:::
The `ExitOnOutOfMemoryError` flag is only supported starting with JDK 8u92. For older versions, you can use `-XX:OnOutOfMemoryError='kill -9 %p'` instead.

View File

@ -71,7 +71,9 @@ If you want to skip the details, check out the [example](#example) for configuri
<a name="kill-task"></a>
### Segment records and segments in deep storage (kill task)
> The kill task is the only configuration in this topic that affects actual data in deep storage and not simply metadata or logs.
:::info
The kill task is the only configuration in this topic that affects actual data in deep storage and not simply metadata or logs.
:::
Segment records and segments in deep storage become eligible for deletion when both of the following conditions hold:
@ -118,8 +120,10 @@ Rule cleanup uses the following configuration:
Druid retains all compaction configuration records by default, which should be suitable for most use cases.
If you create and delete short-lived datasources with high frequency, and you set auto compaction configuration on those datasources, then consider turning on automated cleanup of compaction configuration records.
> With automated cleanup of compaction configuration records, if you create a compaction configuration for some datasource before the datasource exists, for example if initial ingestion is still ongoing, Druid may remove the compaction configuration.
:::info
With automated cleanup of compaction configuration records, if you create a compaction configuration for some datasource before the datasource exists, for example if initial ingestion is still ongoing, Druid may remove the compaction configuration.
To prevent the configuration from being prematurely removed, wait for the datasource to be created before applying the compaction configuration to the datasource.
:::
Unlike other metadata records, compaction configuration records do not have a retention period set by `durationToRetain`. Druid deletes compaction configuration records at every cleanup cycle for inactive datasources, that is, datasources that have no segments, either used or unused.
@ -130,7 +134,9 @@ Compaction configuration cleanup uses the following configuration:
- `druid.coordinator.kill.compaction.period`: Defines the frequency in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601#Durations) for the cleanup job to check for and delete eligible compaction configuration records. Defaults to `P1D`.
>If you already have an extremely large compaction configuration, you may not be able to delete compaction configuration due to size limits with the audit log. In this case you can set `druid.audit.manager.maxPayloadSizeBytes` and `druid.audit.manager.skipNullField` to avoid the auditing issue. See [Audit logging](../configuration/index.md#audit-logging).
:::info
If you already have an extremely large compaction configuration, you may not be able to delete compaction configuration due to size limits with the audit log. In this case you can set `druid.audit.manager.maxPayloadSizeBytes` and `druid.audit.manager.skipNullField` to avoid the auditing issue. See [Audit logging](../configuration/index.md#audit-logging).
:::
### Datasource records created by supervisors

View File

@ -1,48 +0,0 @@
---
id: getting-started
title: "Getting started with Apache Druid"
---
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
## Overview
If you are new to Druid, we recommend reading the [Design Overview](../design/index.md) and the [Ingestion Overview](../ingestion/index.md) first for a basic understanding of Druid.
## Single-server Quickstart and Tutorials
To get started with running Druid, the simplest and quickest way is to try the [single-server quickstart and tutorials](../tutorials/index.md).
## Deploying a Druid cluster
If you wish to jump straight to deploying Druid as a cluster, or if you have an existing single-server deployment that you wish to migrate to a clustered deployment, please see the [Clustered Deployment Guide](../tutorials/cluster.md).
## Operating Druid
The [configuration reference](../configuration/index.md) describes all of Druid's configuration properties.
The [API reference](../api-reference/api-reference.md) describes the APIs available on each Druid process.
The [basic cluster tuning guide](../operations/basic-cluster-tuning.md) is an introductory guide for tuning your Druid cluster.
## Need help with Druid?
If you have questions about using Druid, please reach out to the [Druid user mailing list or other community channels](https://druid.apache.org/community/)!

View File

@ -35,7 +35,9 @@ All Druid metrics share a common set of fields:
Metrics may have additional dimensions beyond those listed above.
> Most metric values reset each emission period, as specified in `druid.monitoring.emissionPeriod`.
:::info
Most metric values reset each emission period, as specified in `druid.monitoring.emissionPeriod`.
:::
## Query metrics

View File

@ -60,7 +60,9 @@ Set the following query laning properties in the `broker/runtime.properties` fil
* `druid.query.scheduler.laning.strategy` The strategy used to assign queries to lanes.
You can use the built-in [“high/low” laning strategy](../configuration/index.md#highlow-laning-strategy), or [define your own laning strategy manually](../configuration/index.md#manual-laning-strategy).
* `druid.query.scheduler.numThreads` The total number of queries that can be served per Broker. We recommend setting this value to 1-2 less than `druid.server.http.numThreads`.
> The query scheduler by default does not limit the number of queries that a Broker can serve. Setting this property to a bounded number limits the thread count. If the allocated threads are all occupied, any incoming query, including interactive queries, will be rejected with an HTTP 429 status code.
:::info
The query scheduler by default does not limit the number of queries that a Broker can serve. Setting this property to a bounded number limits the thread count. If the allocated threads are all occupied, any incoming query, including interactive queries, will be rejected with an HTTP 429 status code.
:::
### Lane-specific properties

View File

@ -134,6 +134,8 @@ Note that if you specify `--defaultVersion`, you don't have to put version infor
java -classpath "/my/druid/lib/*" org.apache.druid.cli.Main tools pull-deps --defaultVersion {{DRUIDVERSION}} --clean -c org.apache.druid.extensions:mysql-metadata-storage -h org.apache.hadoop:hadoop-client:2.3.0 -h org.apache.hadoop:hadoop-client:2.4.0
```
> Please note to use the pull-deps tool you must know the Maven groupId, artifactId, and version of your extension.
>
> For Druid community extensions listed [here](../configuration/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
:::info
Please note that to use the pull-deps tool, you must know the Maven groupId, artifactId, and version of your extension.
For Druid community extensions listed [here](../configuration/extensions.md), the groupId is "org.apache.druid.extensions.contrib" and the artifactId is the name of the extension.
:::

View File

@ -95,7 +95,9 @@ curl --location --request GET 'http://localhost:8888/druid/coordinator/v1/rules'
The rules API accepts an array of rules as JSON objects. The JSON object you send in the API request for each rule is specific to the rules types outlined below.
> You must pass the entire array of rules, in your desired order, with each API request. Each POST request to the rules API overwrites the existing rules for the specified datasource.
:::info
You must pass the entire array of rules, in your desired order, with each API request. Each POST request to the rules API overwrites the existing rules for the specified datasource.
:::
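For example, a rules array that keeps the most recent month of data on the default tier and drops everything older might look like the following sketch (the period and tier name are assumptions; adjust them for your cluster):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": { "_default_tier": 2 }
  },
  { "type": "dropForever" }
]
```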
The order of rules is very important. The Coordinator reads rules in the order in which they appear in the rules list. For example, in the following screenshot the Coordinator evaluates data against rule 1, then rule 2, then rule 3:

View File

@ -32,8 +32,9 @@ By default, security features in Druid are disabled, which simplifies the initia
The following recommendations apply to the Druid cluster setup:
* Run Druid as an unprivileged Unix user. Do not run Druid as the root user.
> **WARNING!** \
Druid administrators have the same OS permissions as the Unix user account running Druid. See [Authentication and authorization model](security-user-auth.md#authentication-and-authorization-model). If the Druid process is running under the OS root user account, then Druid administrators can read or write all files that the root account has access to, including sensitive files such as `/etc/passwd`.
:::caution
Druid administrators have the same OS permissions as the Unix user account running Druid. See [Authentication and authorization model](security-user-auth.md#authentication-and-authorization-model). If the Druid process is running under the OS root user account, then Druid administrators can read or write all files that the root account has access to, including sensitive files such as `/etc/passwd`.
:::
* Enable authentication to the Druid cluster for production environments and other environments that can be accessed by untrusted networks.
* Enable authorization and do not expose the web console without authorization enabled. If authorization is not enabled, any user that has access to the web console has the same privileges as the operating system user that runs the web console process.
* Grant users the minimum permissions necessary to perform their functions. For instance, do not allow users who only need to query data to write to data sources or view state.
@ -82,7 +83,9 @@ keytool -import -file public.cert -alias druid -keystore truststore.jks
Druid uses Jetty as its embedded web server. See [Configuring SSL/TLS KeyStores
](https://www.eclipse.org/jetty/documentation/jetty-11/operations-guide/index.html#og-keystore) from the Jetty documentation.
> WARNING: Do not use self-signed certificates for production environments. Instead, rely on your current public key infrastructure to generate and distribute trusted keys.
:::caution
Do not use self-signed certificates for production environments. Instead, rely on your current public key infrastructure to generate and distribute trusted keys.
:::
### Update Druid TLS configurations
Edit `common.runtime.properties` for all Druid services on all nodes. Add or update the following TLS options. Restart the cluster when you are finished.
@ -194,15 +197,19 @@ The following diagram depicts the authorization model, and the relationship betw
The following steps walk through a sample setup procedure:
> The default Coordinator API port is 8081 for non-TLS connections and 8281 for secured connections.
:::info
The default Coordinator API port is 8081 for non-TLS connections and 8281 for secured connections.
:::
1. Create a user by issuing a POST request to `druid-ext/basic-security/authentication/db/MyBasicMetadataAuthenticator/users/<USERNAME>`.
Replace `<USERNAME>` with the *new* username you are trying to create. For example:
```bash
curl -u admin:password1 -XPOST https://my-coordinator-ip:8281/druid-ext/basic-security/authentication/db/MyBasicMetadataAuthenticator/users/myname
```
> If you have TLS enabled, be sure to adjust the curl command accordingly. For example, if your Druid servers use self-signed certificates,
you may choose to include the `insecure` curl option to forgo certificate checking for the curl command.
:::info
If you have TLS enabled, be sure to adjust the curl command accordingly. For example, if your Druid servers use self-signed certificates,
you may choose to include the `--insecure` curl option to forgo certificate checking for the curl command.
:::
2. Add a credential for the user by issuing a POST request to `druid-ext/basic-security/authentication/db/MyBasicMetadataAuthenticator/users/<USERNAME>/credentials`. For example:
```bash
@ -244,7 +251,9 @@ The following steps walk through a sample setup procedure:
}
]
```
> Note: Druid treats the resource name as a regular expression (regex). You can use a specific datasource name or regex to grant permissions for multiple datasources at a time.
:::info
Note: Druid treats the resource name as a regular expression (regex). You can use a specific datasource name or regex to grant permissions for multiple datasources at a time.
:::
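For example, a permission that grants read access to every datasource whose name starts with `wikipedia_` might look like the following sketch:

```json
{
  "resource": {
    "type": "DATASOURCE",
    "name": "wikipedia_.*"
  },
  "action": "READ"
}
```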
## Configuring an LDAP authenticator
@ -263,7 +272,9 @@ From the innermost layer:
1. Druid processes have the same access to the local files granted to the specified system user running the process.
2. The Druid ingestion system can create new processes to execute tasks. Those tasks inherit the user of their parent process. This means that any user authorized to submit an ingestion task can use the ingestion task permissions to read or write any local files or external resources that the Druid process has access to.
> Note: Only grant the `DATASOURCE WRITE` to trusted users because they can act as the Druid process.
:::info
Note: Only grant the `DATASOURCE WRITE` to trusted users because they can act as the Druid process.
:::
Within the cluster:
1. Druid assumes it operates on an isolated, protected network where no reachable IP within the network is under adversary control. When you implement Druid, take care to setup firewalls and other security measures to secure both inbound and outbound connections.

View File

@ -51,12 +51,14 @@ You may need to consider the followings to optimize your segments.
doesn't match with the "number of rows per segment", please consider optimizing
number of rows per segment rather than this value.
> The above recommendation works in general, but the optimal setting can
> vary based on your workload. For example, if most of your queries
> are heavy and take a long time to process each row, you may want to make
> segments smaller so that the query processing can be more parallelized.
> If you still see some performance issue after optimizing segment size,
> you may need to find the optimal settings for your workload.
:::info
The above recommendation works in general, but the optimal setting can
vary based on your workload. For example, if most of your queries
are heavy and take a long time to process each row, you may want to make
segments smaller so that the query processing can be more parallelized.
If you still see performance issues after optimizing segment size,
you may need to find the optimal settings for your workload.
:::
There are several ways to check whether compaction is necessary. One way
is to use the [System Schema](../querying/sql-metadata-tables.md#system-schema). The

View File

@ -35,10 +35,12 @@ Access the web console at the following address:
http://<ROUTER_IP>:<ROUTER_PORT>
```
> **Security note:** Without [Druid user permissions](../operations/security-overview.md) configured, any user of the
:::info
**Security note:** Without [Druid user permissions](../operations/security-overview.md) configured, any user of the
API or web console has effectively the same level of access to local files and network services as the user under which
Druid runs. It is a best practice to avoid running Druid as the root user, and to use Druid permissions or network
firewalls to restrict which users have access to potentially sensitive resources.
:::
This topic presents the high-level features and functionality of the web console.

View File

@ -22,10 +22,12 @@ title: "Aggregations"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about aggregators available in SQL, refer to the
> [SQL documentation](sql-aggregations.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about aggregators available in SQL, refer to the
[SQL documentation](sql-aggregations.md).
:::
You can use aggregations:
- in the ingestion spec during ingestion to summarize data before it enters Apache Druid.
@ -399,8 +401,10 @@ Compared to the Theta sketch, the HLL sketch does not support set operations and
#### Cardinality, hyperUnique
> For new use cases, we recommend evaluating [DataSketches Theta Sketch](../development/extensions-core/datasketches-theta.md) or [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.md) instead.
> The DataSketches aggregators are generally able to offer more flexibility and better accuracy than the classic Druid `cardinality` and `hyperUnique` aggregators.
:::info
For new use cases, we recommend evaluating [DataSketches Theta Sketch](../development/extensions-core/datasketches-theta.md) or [DataSketches HLL Sketch](../development/extensions-core/datasketches-hll.md) instead.
The DataSketches aggregators are generally able to offer more flexibility and better accuracy than the classic Druid `cardinality` and `hyperUnique` aggregators.
:::
The [Cardinality and HyperUnique](../querying/hll-old.md) aggregators are older aggregator implementations available by default in Druid that also provide distinct count estimates using the HyperLogLog algorithm. The newer DataSketches Theta and HLL extension-provided aggregators described above have superior accuracy and performance and are recommended instead.
@ -442,9 +446,11 @@ We do not recommend the fixed buckets histogram for general use, as its usefulne
#### Approximate Histogram (deprecated)
> The Approximate Histogram aggregator is deprecated.
> There are a number of other quantile estimation algorithms that offer better performance, accuracy, and memory footprint.
> We recommend using [DataSketches Quantiles](../development/extensions-core/datasketches-quantiles.md) instead.
:::info
The Approximate Histogram aggregator is deprecated.
There are a number of other quantile estimation algorithms that offer better performance, accuracy, and memory footprint.
We recommend using [DataSketches Quantiles](../development/extensions-core/datasketches-quantiles.md) instead.
:::
The [Approximate Histogram](../development/extensions-core/approximate-histograms.md) extension-provided aggregator also provides quantile estimates and histogram approximations, based on [http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf).
@ -568,7 +574,9 @@ JavaScript functions are expected to return floating-point values.
}
```
> JavaScript functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::info
JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
## Miscellaneous aggregations
@ -637,4 +645,3 @@ possible output of the aggregator is:
As the example illustrates, you can think of the output number as an unsigned _n_ bit number where _n_ is the number of dimensions passed to the aggregator.
Druid sets the bit at position X for the number to 0 if the sub-grouping includes a dimension at position X in the aggregator input. Otherwise, Druid sets this bit to 1.

View File

@ -45,9 +45,11 @@ Druid supports two types of query caching:
Druid invalidates any cache the moment any underlying data changes, to avoid returning stale results. This is especially important for `table` datasources that have highly variable underlying data segments, including real-time data segments.
> **Druid can store cache data on the local JVM heap or in an external distributed key/value store (e.g. memcached)**
>
> The default is a local cache based upon [Caffeine](https://github.com/ben-manes/caffeine). The default maximum cache storage size is the minimum of 1 GiB / ten percent of maximum runtime memory for the JVM, with no cache expiration. See [Cache configuration](../configuration/index.md#cache-configuration) for information on how to configure cache storage. When using caffeine, the cache is inside the JVM heap and is directly measurable. Heap usage will grow up to the maximum configured size, and then the least recently used segment results will be evicted and replaced with newer results.
:::info
**Druid can store cache data on the local JVM heap or in an external distributed key/value store (e.g. memcached)**
The default is a local cache based upon [Caffeine](https://github.com/ben-manes/caffeine). The default maximum cache storage size is the minimum of 1 GiB / ten percent of maximum runtime memory for the JVM, with no cache expiration. See [Cache configuration](../configuration/index.md#cache-configuration) for information on how to configure cache storage. When using caffeine, the cache is inside the JVM heap and is directly measurable. Heap usage will grow up to the maximum configured size, and then the least recently used segment results will be evicted and replaced with newer results.
:::
### Per-segment caching

View File

@ -3,6 +3,10 @@ id: datasource
title: "Datasources"
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
@ -34,12 +38,15 @@ responses.
### `table`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="1" label="SQL">
```sql
SELECT column1, column2 FROM "druid"."dataSourceName"
```
<!--Native-->
</TabItem>
<TabItem value="2" label="Native">
```json
{
"queryType": "scan",
@ -48,7 +55,8 @@ SELECT column1, column2 FROM "druid"."dataSourceName"
"intervals": ["0000/3000"]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
The table datasource is the most common type. This is the kind of datasource you get when you perform
[data ingestion](../ingestion/index.md). They are split up into segments, distributed around the cluster,
@ -72,12 +80,15 @@ To see a list of all table datasources, use the SQL query
### `lookup`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="3" label="SQL">
```sql
SELECT k, v FROM lookup.countries
```
<!--Native-->
</TabItem>
<TabItem value="4" label="Native">
```json
{
"queryType": "scan",
@ -89,7 +100,8 @@ SELECT k, v FROM lookup.countries
"intervals": ["0000/3000"]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Lookup datasources correspond to Druid's key-value [lookup](lookups.md) objects. In [Druid SQL](sql.md#from),
they reside in the `lookup` schema. They are preloaded in memory on all servers, so they can be accessed rapidly.
@ -101,19 +113,22 @@ both are always strings.
To see a list of all lookup datasources, use the SQL query
`SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'lookup'`.
> Performance tip: Lookups can be joined with a base table either using an explicit [join](#join), or by using the
> SQL [`LOOKUP` function](sql-scalar.md#string-functions).
> However, the join operator must evaluate the condition on each row, whereas the
> `LOOKUP` function can defer evaluation until after an aggregation phase. This means that the `LOOKUP` function is
> usually faster than joining to a lookup datasource.
:::info
Performance tip: Lookups can be joined with a base table either using an explicit [join](#join), or by using the
SQL [`LOOKUP` function](sql-scalar.md#string-functions).
However, the join operator must evaluate the condition on each row, whereas the
`LOOKUP` function can defer evaluation until after an aggregation phase. This means that the `LOOKUP` function is
usually faster than joining to a lookup datasource.
:::
Refer to the [Query execution](query-execution.md#table) page for more details on how queries are executed when you
use table datasources.
### `union`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="5" label="SQL">
```sql
SELECT column1, column2
FROM (
@ -124,7 +139,9 @@ FROM (
SELECT column1, column2 FROM table3
)
```
<!--Native-->
</TabItem>
<TabItem value="6" label="Native">
```json
{
"queryType": "scan",
@ -136,7 +153,8 @@ FROM (
"intervals": ["0000/3000"]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Unions allow you to treat two or more tables as a single datasource. In SQL, this is done with the UNION ALL operator
applied directly to tables, called a ["table-level union"](sql.md#table-level). In native queries, this is done with a
@ -158,8 +176,9 @@ use union datasources.
### `inline`
<!--DOCUSAURUS_CODE_TABS-->
<!--Native-->
<Tabs>
<TabItem value="7" label="Native">
```json
{
"queryType": "scan",
@ -175,7 +194,8 @@ use union datasources.
"intervals": ["0000/3000"]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Inline datasources allow you to query a small amount of data that is embedded in the query itself. They are useful when
you want to write a query on a small amount of data without loading it first. They are also useful as inputs into a
@ -193,8 +213,9 @@ use inline datasources.
### `query`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="8" label="SQL">
```sql
-- Uses a subquery to count hits per page, then takes the average.
SELECT
@ -202,7 +223,9 @@ SELECT
FROM
(SELECT page, COUNT(*) AS hits FROM site_traffic GROUP BY page)
```
<!--Native-->
</TabItem>
<TabItem value="9" label="Native">
```json
{
"queryType": "timeseries",
@ -230,7 +253,8 @@ FROM
]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Query datasources allow you to issue subqueries. In native queries, they can appear anywhere that accepts a
`dataSource` (except underneath a `union`). In SQL, they can appear in the following places, always surrounded by parentheses:
@ -239,15 +263,18 @@ Query datasources allow you to issue subqueries. In native queries, they can app
- As inputs to a JOIN: `<table-or-subquery-1> t1 INNER JOIN <table-or-subquery-2> t2 ON t1.<col1> = t2.<col2>`.
- In the WHERE clause: `WHERE <column> { IN | NOT IN } (<subquery>)`. These are translated to joins by the SQL planner.
> Performance tip: In most cases, subquery results are fully buffered in memory on the Broker and then further
> processing occurs on the Broker itself. This means that subqueries with large result sets can cause performance
> bottlenecks or run into memory usage limits on the Broker. See the [Query execution](query-execution.md#query)
> page for more details on how subqueries are executed and what limits will apply.
:::info
Performance tip: In most cases, subquery results are fully buffered in memory on the Broker and then further
processing occurs on the Broker itself. This means that subqueries with large result sets can cause performance
bottlenecks or run into memory usage limits on the Broker. See the [Query execution](query-execution.md#query)
page for more details on how subqueries are executed and what limits will apply.
:::
### `join`
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="10" label="SQL">
```sql
-- Joins "sales" with "countries" (using "store" as the join key) to get sales by country.
SELECT
@ -259,7 +286,9 @@ FROM
GROUP BY
countries.v
```
<!--Native-->
</TabItem>
<TabItem value="11" label="Native">
```json
{
"queryType": "groupBy",
@ -284,7 +313,8 @@ GROUP BY
]
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
Join datasources allow you to do a SQL-style join of two datasources. Stacking joins on top of each other allows
you to join arbitrarily many datasources.
@ -352,9 +382,9 @@ perform best if `d.field` is a string.
4. As of Druid {{DRUIDVERSION}}, the join operator must evaluate the condition for each row. In the future, we expect
to implement both early and deferred condition evaluation, which should improve performance considerably for
common use cases.
5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
Join's children). Druid only supports pushing predicates into the join if they originated from
above the join. Hence, the location of predicates and filters in your Druid SQL is very important.
5. Currently, Druid does not support pushing down predicates (condition and filter) past a Join (i.e. into
Join's children). Druid only supports pushing predicates into the join if they originated from
above the join. Hence, the location of predicates and filters in your Druid SQL is very important.
Also, as a result of this, comma joins should be avoided.
#### Future work for joins
@ -371,21 +401,23 @@ future versions:
### `unnest`
> The unnest datasource is [experimental](../development/experimental.md). Its API and behavior are subject
> to change in future releases. It is not recommended to use this feature in production at this time.
:::info
The unnest datasource is [experimental](../development/experimental.md). Its API and behavior are subject
to change in future releases. It is not recommended to use this feature in production at this time.
:::
Use the `unnest` datasource to unnest a column with multiple values in an array.
For example, suppose you have a source column that looks like this:
| Nested |
| -- |
| Nested |
| -- |
| [a, b] |
| [c, d] |
| [e, [f,g]] |
When you use the `unnest` datasource, the unnested column looks like this:
| Unnested |
| Unnested |
| -- |
| a |
| b |

View File

@ -23,9 +23,11 @@ sidebar_label: "DatasourceMetadata"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type that is only available in the native language.
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type that is only available in the native language.
:::
Data Source Metadata queries return metadata information for a dataSource. These queries return information about:

View File

@ -23,10 +23,12 @@ sidebar_label: "Dimensions"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql-scalar.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about functions available in SQL, refer to the
[SQL documentation](sql-scalar.md).
:::
The following JSON fields can be used in a query to operate on dimension values.
@ -344,7 +346,9 @@ Example for the `__time` dimension:
}
```
> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
### Registered lookup extraction function

View File

@ -23,10 +23,12 @@ sidebar_label: "Filters"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about aggregators available in SQL, refer to the
> [SQL documentation](sql-scalar.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about aggregators available in SQL, refer to the
[SQL documentation](sql-scalar.md).
:::
A filter is a JSON object indicating which rows of data should be included in the computation for a query. It's essentially the equivalent of the WHERE clause in SQL.
Filters are commonly applied on dimensions, but can be applied on aggregated metrics, for example, see [Filtered aggregator](./aggregations.md#filtered-aggregator) and [Having filters](./having.md).
@ -614,12 +616,16 @@ The JavaScript filter matches a dimension against the specified JavaScript funct
}
```
> JavaScript-based functionality is disabled by default. Refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
## Extraction filter
> The extraction filter is now deprecated. Use the selector filter with an extraction function instead.
:::info
The extraction filter is now deprecated. The selector filter with an extraction function specified
provides identical functionality and should be used instead.
:::
Extraction filter matches a dimension using a specific [extraction function](./dimensionspecs.md#extraction-functions).
The following filter matches the values for which the extraction function has a transformation entry `input_key=output_value` where

View File

@ -22,8 +22,10 @@ title: "Spatial filters"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](../querying/sql.md) and [native queries](../querying/querying.md).
> This document describes a feature that is only available in the native language.
:::info
Apache Druid supports two query languages: [Druid SQL](../querying/sql.md) and [native queries](../querying/querying.md).
This document describes a feature that is only available in the native language.
:::
Apache Druid supports filtering spatially indexed columns based on an origin and a bound.

View File

@ -23,10 +23,12 @@ sidebar_label: "Granularities"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about time functions available in SQL, refer to the
> [SQL documentation](sql-scalar.md#date-and-time-functions).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about time functions available in SQL, refer to the
[SQL documentation](sql-scalar.md#date-and-time-functions).
:::
Granularity determines how to bucket data across the time dimension, or how to aggregate data by hour, day, minute, etc.
@ -59,7 +61,9 @@ Druid supports the following granularity strings:
The minimum and maximum granularities are `none` and `all`, described as follows:
* `all` buckets everything into a single bucket.
* `none` does not mean zero bucketing. It buckets data to millisecond granularity—the granularity of the internal index. You can think of `none` as equivalent to `millisecond`.
> Do not use `none` in a [timeseries query](../querying/timeseriesquery.md); Druid fills empty interior time buckets with zeroes, meaning the output will contain results for every single millisecond in the requested interval.
:::info
Do not use `none` in a [timeseries query](../querying/timeseriesquery.md); Druid fills empty interior time buckets with zeroes, meaning the output will contain results for every single millisecond in the requested interval.
:::
Avoid using the `week` granularity for partitioning at ingestion time, because weeks don't align neatly with months and years, making it difficult to partition by coarser granularities later.
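As a SQL-side sketch of the same idea (assuming the quickstart `wikipedia` datasource), bucketing by month with `TIME_FLOOR` keeps the data aligned with coarser calendar granularities:

```sql
-- Months roll up cleanly into quarters and years, unlike weeks.
SELECT
  TIME_FLOOR(__time, 'P1M') AS month_start,
  COUNT(*) AS row_count
FROM wikipedia
GROUP BY 1
```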

View File

@ -23,17 +23,21 @@ sidebar_label: "GroupBy"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type in the native language. For information about when Druid SQL will use this query type, refer to the
> [SQL documentation](sql-translation.md#query-types).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type in the native language. For information about when Druid SQL will use this query type, refer to the
[SQL documentation](sql-translation.md#query-types).
:::
These types of Apache Druid queries take a groupBy query object and return an array of JSON objects where each object represents a
grouping asked for by the query.
> Note: If you are doing aggregations with time as your only grouping, or an ordered groupBy over a single dimension,
> consider [Timeseries](timeseriesquery.md) and [TopN](topnquery.md) queries as well as
> groupBy. Their performance may be better in some cases. See [Alternatives](#alternatives) below for more details.
:::info
Note: If you are doing aggregations with time as your only grouping, or an ordered groupBy over a single dimension,
consider [Timeseries](timeseriesquery.md) and [TopN](topnquery.md) queries as well as
groupBy. Their performance may be better in some cases. See [Alternatives](#alternatives) below for more details.
:::
An example groupBy query object is shown below:
@ -227,9 +231,11 @@ The response for the query above would look something like:
]
```
> Notice that dimensions that are not included in an individual subtotalsSpec grouping are returned with a `null` value. This response format represents a behavior change as of Apache Druid 0.18.0.
> In release 0.17.0 and earlier, such dimensions were entirely excluded from the result. If you were relying on this old behavior to determine whether a particular dimension was not part of
> a subtotal grouping, you can now use [Grouping aggregator](aggregations.md#grouping-aggregator) instead.
:::info
Notice that dimensions that are not included in an individual subtotalsSpec grouping are returned with a `null` value. This response format represents a behavior change as of Apache Druid 0.18.0.
In release 0.17.0 and earlier, such dimensions were entirely excluded from the result. If you were relying on this old behavior to determine whether a particular dimension was not part of
a subtotal grouping, you can now use [Grouping aggregator](aggregations.md#grouping-aggregator) instead.
:::
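For example, a minimal sketch (with hypothetical columns `dim1` and `dim2` on a datasource `my_datasource`) that uses the grouping aggregator to tell which columns were part of each subtotal row:

```sql
-- GROUPING() returns a bitmask whose set bits mark the listed columns that
-- are aggregated away (not part of the current subtotal grouping).
SELECT
  dim1,
  dim2,
  COUNT(*) AS cnt,
  GROUPING(dim1, dim2) AS grouping_id
FROM my_datasource
GROUP BY GROUPING SETS ((dim1, dim2), (dim1), ())
```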
## Implementation details

View File

@ -22,10 +22,12 @@ title: "Having filters (groupBy)"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql-scalar.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about functions available in SQL, refer to the
[SQL documentation](sql-scalar.md).
:::
A having clause is a JSON object identifying which rows from a groupBy query should be returned, by specifying conditions on aggregated values.
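The SQL counterpart is an ordinary HAVING clause, which serves the same purpose as a native having spec; a sketch against the quickstart `wikipedia` datasource:

```sql
-- Rows are grouped by channel first; the HAVING condition then filters the
-- aggregated groups, just as a native having spec would.
SELECT channel, COUNT(*) AS edits
FROM wikipedia
GROUP BY channel
HAVING COUNT(*) > 100
```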

View File

@ -22,9 +22,11 @@ title: "Sorting and limiting (groupBy)"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about sorting in SQL, refer to the [SQL documentation](sql.md#order-by).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about sorting in SQL, refer to the [SQL documentation](sql.md#order-by).
:::
The limitSpec field provides the functionality to sort and limit the set of results from a groupBy query. If you group by a single dimension and are ordering by a single metric, we highly recommend using [TopN Queries](../querying/topnquery.md) instead. The performance will be substantially better. Available options are:

View File

@ -109,9 +109,11 @@ But this one is not, since both "2" and "3" map to the same value:
To tell Druid that your lookup is injective, you must specify `"injective" : true` in the lookup configuration. Druid
will not detect this automatically.
> Currently, the injective lookup optimization is not triggered when lookups are inputs to a
> [join datasource](datasource.md#join). It is only used when lookup functions are used directly, without the join
> operator.
:::info
Currently, the injective lookup optimization is not triggered when lookups are inputs to a
[join datasource](datasource.md#join). It is only used when lookup functions are used directly, without the join
operator.
:::
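As an illustration (assuming a hypothetical `sales` datasource and a registered lookup named `store_to_country`), calling the lookup function directly keeps the optimization available:

```sql
-- Using LOOKUP() directly, rather than joining against lookup.store_to_country,
-- lets Druid apply the injective optimization when the lookup is declared injective.
SELECT
  LOOKUP(store, 'store_to_country') AS country,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY LOOKUP(store, 'store_to_country')
```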
Dynamic Configuration
---------------------

View File

@ -22,9 +22,11 @@ title: "Expressions"
~ under the License.
-->
> Apache Druid supports two query languages: [native queries](../querying/querying.md) and [Druid SQL](../querying/sql.md).
> This document describes the native language. For information about functions available in SQL, refer to the
> [SQL documentation](../querying/sql-scalar.md).
:::info
Apache Druid supports two query languages: [native queries](../querying/querying.md) and [Druid SQL](../querying/sql.md).
This document describes the native language. For information about functions available in SQL, refer to the
[SQL documentation](../querying/sql-scalar.md).
:::
Expressions are used in various places in the native query language, including
[virtual columns](../querying/virtual-columns.md) and [join conditions](../querying/datasource.md#join). They are

View File

@ -72,7 +72,9 @@ The following sections describe filtering and grouping behavior based on the fol
{"timestamp": "2011-01-14T00:00:00.000Z", "tags": ["t5","t6","t7"]} #row3
{"timestamp": "2011-01-14T00:00:00.000Z", "tags": []} #row4
```
> Be sure to remove the comments before trying out the sample data.
:::info
Be sure to remove the comments before trying out the sample data.
:::
### Filtering

View File

@ -3,6 +3,10 @@ id: nested-columns
title: "Nested columns"
sidebar_label: Nested columns
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
@ -33,7 +37,7 @@ Druid supports directly ingesting nested data with the following formats: JSON,
## Example nested data
The examples in this topic use the JSON data in [`nested_example_data.json`](https://static.imply.io/data/nested_example_data.json). The file contains a simple facsimile of an order tracking and shipping table.
The examples in this topic use the JSON data in [`nested_example_data.json`](https://static.imply.io/data/nested_example_data.json). The file contains a simple facsimile of an order tracking and shipping table.
When pretty-printed, a sample row in `nested_example_data` looks like this:
@ -124,7 +128,7 @@ For example, the following ingestion spec instructs Druid to ingest `shipTo` and
### Transform data during batch ingestion
You can use the [SQL JSON functions](./sql-json-functions.md) to transform nested data and reference the transformed data in your ingestion spec.
You can use the [SQL JSON functions](./sql-json-functions.md) to transform nested data and reference the transformed data in your ingestion spec.
To do this, define the output name and expression in the `transforms` list in the `transformSpec` object of your ingestion spec.
@ -341,8 +345,9 @@ For example, consider the following deserialized row of the sample data set:
The following examples demonstrate how to ingest the `shipTo` and `details` columns both as string type and as `COMPLEX<json>` in the `shipTo_parsed` and `details_parsed` columns.
<!--DOCUSAURUS_CODE_TABS-->
<!--SQL-->
<Tabs>
<TabItem value="1" label="SQL">
```
REPLACE INTO deserialized_example OVERWRITE ALL
WITH source AS (SELECT * FROM TABLE(
@ -358,12 +363,14 @@ SELECT
"department",
"shipTo",
"details",
PARSE_JSON("shipTo") as "shipTo_parsed",
PARSE_JSON("shipTo") as "shipTo_parsed",
PARSE_JSON("details") as "details_parsed"
FROM source
PARTITIONED BY DAY
```
<!--Native batch-->
</TabItem>
<TabItem value="2" label="Native batch">
```
{
"type": "index_parallel",
@ -423,7 +430,8 @@ PARTITIONED BY DAY
}
}
```
<!--END_DOCUSAURUS_CODE_TABS-->
</TabItem>
</Tabs>
## Querying nested columns
@ -475,13 +483,15 @@ Example query results:
### Extracting nested data elements
The `JSON_VALUE` function is specially optimized to provide native Druid level performance when processing nested literal values, as if they were flattened, traditional, Druid column types. It does this by reading from the specialized nested columns and indexes that are built and stored in JSON objects when Druid creates segments.
The `JSON_VALUE` function is specially optimized to provide native Druid level performance when processing nested literal values, as if they were flattened, traditional, Druid column types. It does this by reading from the specialized nested columns and indexes that are built and stored in JSON objects when Druid creates segments.
Some operations using `JSON_VALUE` run faster than those using native Druid columns. For example, filtering numeric types uses the indexes built for nested numeric columns, which are not available for Druid DOUBLE, FLOAT, or LONG columns.
`JSON_VALUE` only returns literal types. Any paths that reference JSON objects or array types return null.
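For instance, in this sketch (using the `shipTo_parsed` column from the earlier ingestion example; the `$.countryCode` and `$.address` paths are illustrative, not taken from the sample data), the object-valued path returns NULL from `JSON_VALUE`, so `JSON_QUERY` is used to retrieve it:

```sql
SELECT
  JSON_VALUE(shipTo_parsed, '$.countryCode') AS country_code,  -- literal value
  JSON_VALUE(shipTo_parsed, '$.address')     AS address_null,  -- object path, returns NULL
  JSON_QUERY(shipTo_parsed, '$.address')     AS address_json   -- returns the JSON object
FROM deserialized_example
```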
> To achieve the best possible performance, use the `JSON_VALUE` function whenever you query JSON objects.
:::info
To achieve the best possible performance, use the `JSON_VALUE` function whenever you query JSON objects.
:::
#### Example query: Extract nested data elements
@ -561,7 +571,7 @@ Example query results:
### Transforming JSON object data
In addition to `JSON_VALUE`, Druid offers a number of operators that focus on transforming JSON object data:
In addition to `JSON_VALUE`, Druid offers a number of operators that focus on transforming JSON object data:
- `JSON_QUERY`
- `JSON_OBJECT`

View File

@ -22,10 +22,12 @@ title: "Post-aggregations"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql-aggregations.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about functions available in SQL, refer to the
[SQL documentation](sql-aggregations.md).
:::
Post-aggregations are specifications of processing that should happen on aggregated values as they come out of Apache Druid. If you include a post aggregation as part of a query, make sure to include all aggregators the post-aggregator requires.
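The SQL-side analogue is arithmetic on aggregated values, which plays the same role as native post-aggregations; a sketch against the quickstart `wikipedia` datasource:

```sql
-- SUM(added) and SUM(deleted) are aggregators; the addition is applied to the
-- aggregated values afterwards, in the same spirit as a post-aggregation.
SELECT
  channel,
  SUM(added) + SUM(deleted) AS total_changed
FROM wikipedia
GROUP BY channel
```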
@ -147,7 +149,9 @@ Example JavaScript aggregator:
}
```
> JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::info
JavaScript-based functionality is disabled by default. Please refer to the Druid [JavaScript programming guide](../development/javascript.md) for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
:::
### HyperUnique Cardinality post-aggregator

View File

@ -22,10 +22,12 @@ title: "Query execution"
~ under the License.
-->
> This document describes how Druid executes [native queries](querying.md), but since [Druid SQL](sql.md) queries
> are translated to native queries, this document applies to the SQL runtime as well. Refer to the SQL
> [Query translation](sql-translation.md) page for information about how SQL queries are translated to native
> queries.
:::info
This document describes how Druid executes [native queries](querying.md), but since [Druid SQL](sql.md) queries
are translated to native queries, this document applies to the SQL runtime as well. Refer to the SQL
[Query translation](sql-translation.md) page for information about how SQL queries are translated to native
queries.
:::
Druid's approach to query execution varies depending on the kind of [datasource](datasource.md) you are querying.

View File

@ -23,10 +23,12 @@ title: "Native queries"
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the
> native query language. For information about how Druid SQL chooses which native query types to use when
> it runs a SQL query, refer to the [SQL documentation](sql-translation.md#query-types).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the
native query language. For information about how Druid SQL chooses which native query types to use when
it runs a SQL query, refer to the [SQL documentation](sql-translation.md#query-types).
:::
Native queries in Druid are JSON objects and are typically issued to the Broker or Router processes. Queries can be
posted like this:
@ -35,7 +37,9 @@ posted like this:
curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
```
> Replace `<queryable_host>:<port>` with the appropriate address and port for your system. For example, if running the quickstart configuration, replace `<queryable_host>:<port>` with localhost:8888.
:::info
Replace `<queryable_host>:<port>` with the appropriate address and port for your system. For example, if running the quickstart configuration, replace `<queryable_host>:<port>` with localhost:8888.
:::
You can also enter them directly in the web console's Query view. Simply pasting a native query into the console switches the editor into JSON mode.
@ -50,7 +54,9 @@ The Content-Type/Accept Headers can also take 'application/x-jackson-smile'.
curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @<query_json_file>
```
> If the Accept header is not provided, it defaults to the value of 'Content-Type' header.
:::info
If the Accept header is not provided, it defaults to the value of 'Content-Type' header.
:::
Druid's native query is relatively low level, mapping closely to how computations are performed internally. Druid queries
are designed to be lightweight and complete very quickly. This means that for more complex analysis, or to build

View File

@ -23,10 +23,12 @@ sidebar_label: "Scan"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type in the native language. For information about when Druid SQL will use this query type, refer to the
> [SQL documentation](sql-translation.md#query-types).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type in the native language. For information about when Druid SQL will use this query type, refer to the
[SQL documentation](sql-translation.md#query-types).
:::
The Scan query returns raw Apache Druid rows in streaming mode.

View File

@ -23,9 +23,11 @@ sidebar_label: "Search"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type that is only available in the native language.
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type that is only available in the native language.
:::
A search query returns dimension values that match the search specification.

View File

@ -23,10 +23,12 @@ sidebar_label: "SegmentMetadata"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type that is only available in the native language. However, Druid SQL contains similar functionality in
> its [metadata tables](sql-metadata-tables.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type that is only available in the native language. However, Druid SQL contains similar functionality in
its [metadata tables](sql-metadata-tables.md).
:::
Segment metadata queries return per-segment information about:

View File

@ -22,10 +22,12 @@ title: "String comparators"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql-scalar.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about functions available in SQL, refer to the
[SQL documentation](sql-scalar.md).
:::
These sorting orders are used by the [TopNMetricSpec](./topnmetricspec.md), [SearchQuery](./searchquery.md), GroupByQuery's [LimitSpec](./limitspec.md), and [BoundFilter](./filters.md#bound-filter).

View File

@ -30,8 +30,10 @@ sidebar_label: "Aggregation functions"
patterns in this markdown file and parse it to TypeScript file for web console
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
You can use aggregation functions in the SELECT clause of any [Druid SQL](./sql.md) query.
@ -56,12 +58,14 @@ always return 0 as the initial value.
In the aggregation functions supported by Druid, only `COUNT`, `ARRAY_AGG`, and `STRING_AGG` accept the DISTINCT keyword.
> The order of aggregation operations across segments is not deterministic. This means that non-commutative aggregation
> functions can produce inconsistent results across the same query.
>
> Functions that operate on an input type of "float" or "double" may also see these differences in aggregation
> results across multiple query runs because of this. If precisely the same value is desired across multiple query runs,
> consider using the `ROUND` function to smooth out the inconsistencies between queries.
:::info
The order of aggregation operations across segments is not deterministic. This means that non-commutative aggregation
functions can produce inconsistent results across the same query.
Functions that operate on an input type of "float" or "double" may also see these differences in aggregation
results across multiple query runs because of this. If precisely the same value is desired across multiple query runs,
consider using the `ROUND` function to smooth out the inconsistencies between queries.
:::
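For example, a small sketch against the quickstart `wikipedia` datasource that combines the DISTINCT-capable aggregators with `ROUND`:

```sql
-- COUNT and STRING_AGG are among the few aggregators that accept DISTINCT;
-- ROUND smooths out small floating-point differences between query runs.
SELECT
  COUNT(DISTINCT channel)                AS channels,
  STRING_AGG(DISTINCT countryName, ', ') AS countries,
  ROUND(AVG(delta), 2)                   AS avg_delta
FROM wikipedia
```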
|Function|Notes|Default|
|--------|-----|-------|

View File

@ -31,8 +31,10 @@ sidebar_label: "Array functions"
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
This page describes the operations you can perform on arrays using [Druid SQL](./sql.md). See [`ARRAY` data type documentation](./sql-data-types.md#arrays) for additional details.

View File

@ -23,8 +23,10 @@ sidebar_label: "SQL data types"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
Druid associates each column with a specific data type. This topic describes supported data types in [Druid SQL](./sql.md).
@ -84,10 +86,12 @@ You can treat multi-value string dimensions as arrays using special
Grouping by multi-value dimensions observes the native Druid multi-value aggregation behavior, which is similar to an implicit SQL UNNEST. See [Grouping](multi-value-dimensions.md#grouping) for more information.
> Because the SQL planner treats multi-value dimensions as VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For instance, expressions involving multi-value dimensions may be incorrectly optimized by the Druid SQL planner. For example, `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
:::info
Because the SQL planner treats multi-value dimensions as VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For instance, expressions involving multi-value dimensions may be incorrectly optimized by the Druid SQL planner. For example, `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
`false`, even though it is possible for a single row to have both `'a'` and `'b'` as values for `multi_val_dim`.
>
> The SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide nearly all possible native functionality.
The SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide nearly all possible native functionality.
:::
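If you need rows that contain both values, one option is the multi-value string functions; a minimal sketch (with a hypothetical datasource `my_datasource`):

```sql
-- Each MV_CONTAINS call checks membership in the multi-value dimension, so a
-- row holding both 'a' and 'b' matches, unlike multi_val_dim = 'a' AND
-- multi_val_dim = 'b', which the planner folds to false.
SELECT COUNT(*) AS matching_rows
FROM my_datasource
WHERE MV_CONTAINS(multi_val_dim, 'a') AND MV_CONTAINS(multi_val_dim, 'b')
```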
## Arrays
@ -113,9 +117,11 @@ distinguish between empty and null rows. An empty row will never appear natively
but any multi-value function which manipulates the array form of the value may produce an empty array, which is handled
separately while processing.
> Do not mix the usage of multi-value functions and normal scalar functions within the same expression, as the planner will be unable
> to determine how to properly process the value given its ambiguous usage. A multi-value string must be treated consistently within
> an expression.
:::info
Do not mix the usage of multi-value functions and normal scalar functions within the same expression, as the planner will be unable
to determine how to properly process the value given its ambiguous usage. A multi-value string must be treated consistently within
an expression.
:::
When converted to ARRAY or used with [array functions](./sql-array-functions.md), multi-value strings behave as standard SQL arrays and can no longer
be manipulated with non-array functions.

View File

@ -23,8 +23,10 @@ sidebar_label: "All functions"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
This page provides a reference of all Druid SQL functions in alphabetical order.

View File

@ -23,8 +23,10 @@ sidebar_label: "SQL metadata tables"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
Druid Brokers infer table and column metadata for each datasource from segments loaded in the cluster, and use this to
@ -47,8 +49,10 @@ FROM INFORMATION_SCHEMA.COLUMNS
WHERE "TABLE_SCHEMA" = 'druid' AND "TABLE_NAME" = 'foo'
```
> Note: INFORMATION_SCHEMA tables do not currently support Druid-specific functions like `TIME_PARSE` and
> `APPROX_QUANTILE_DS`. Only standard SQL functions can be used.
:::info
Note: INFORMATION_SCHEMA tables do not currently support Druid-specific functions like `TIME_PARSE` and
`APPROX_QUANTILE_DS`. Only standard SQL functions can be used.
:::
### SCHEMATA table
`INFORMATION_SCHEMA.SCHEMATA` provides a list of all known schemas, which include `druid` for standard [Druid Table datasources](datasource.md#table), `lookup` for [Lookups](datasource.md#lookup), `sys` for the virtual [System metadata tables](#system-schema), and `INFORMATION_SCHEMA` for these virtual tables. Tables are allowed to have the same name across different schemas, so the schema may be included in an SQL statement to distinguish them, e.g. `lookup.table` vs `druid.table`.
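For example, to list the available schemas:

```sql
-- Returns one row per schema: druid, lookup, sys, INFORMATION_SCHEMA, and so on.
SELECT "SCHEMA_NAME"
FROM INFORMATION_SCHEMA.SCHEMATA
```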
@ -130,8 +134,10 @@ WHERE "IS_AGGREGATOR" = 'YES'
The "sys" schema provides visibility into Druid segments, servers and tasks.
> Note: "sys" tables do not currently support Druid-specific functions like `TIME_PARSE` and
> `APPROX_QUANTILE_DS`. Only standard SQL functions can be used.
:::info
Note: "sys" tables do not currently support Druid-specific functions like `TIME_PARSE` and
`APPROX_QUANTILE_DS`. Only standard SQL functions can be used.
:::
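For instance, a quick sketch that sticks to standard SQL functions and summarizes segment sizes per datasource:

```sql
-- Uses only standard SQL functions, per the note above.
SELECT "datasource", COUNT(*) AS segments, SUM("size") AS total_bytes
FROM sys.segments
GROUP BY "datasource"
ORDER BY total_bytes DESC
```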
### SEGMENTS table

View File

@ -31,8 +31,10 @@ sidebar_label: "Multi-value string functions"
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
Druid supports string dimensions containing multiple values.
This page describes the operations you can perform on multi-value string dimensions using [Druid SQL](./sql.md).

View File

@ -31,8 +31,10 @@ sidebar_label: "Operators"
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here.

View File

@ -23,8 +23,10 @@ sidebar_label: "SQL query context"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
Druid supports query context parameters which affect [SQL query](./sql.md) planning.
See [Query context](query-context.md) for general query context parameters for all query types.

View File

@ -31,8 +31,10 @@ sidebar_label: "Scalar functions"
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
[Druid SQL](./sql.md) includes scalar functions that include numeric and string functions, IP address functions, Sketch functions, and more, as described on this page.

View File

@ -23,8 +23,10 @@ sidebar_label: "SQL query translation"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the Druid SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the Druid SQL language.
:::
Druid uses [Apache Calcite](https://calcite.apache.org/) to parse and plan SQL queries.
Druid translates SQL statements into its [native JSON-based query language](querying.md).
@ -782,7 +784,9 @@ Refer to the [Query execution](query-execution.md#join) page for information abo
Subqueries in SQL are generally translated to native query datasources. Refer to the
[Query execution](query-execution.md#query) page for information about how subqueries are executed.
> Note: Subqueries in the WHERE clause, like `WHERE col1 IN (SELECT foo FROM ...)` are translated to inner joins.
:::info
Note: Subqueries in the WHERE clause, like `WHERE col1 IN (SELECT foo FROM ...)` are translated to inner joins.
:::
## Approximations

View File

@ -23,8 +23,10 @@ sidebar_label: "Overview and syntax"
~ under the License.
-->
> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
> This document describes the SQL language.
:::info
Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
This document describes the SQL language.
:::
You can query data in Druid datasources using Druid SQL. Druid translates SQL queries into its [native query language](querying.md). To learn about translation and how to get the best performance from Druid SQL, see [SQL query translation](sql-translation.md).
@ -85,8 +87,10 @@ documentation.
## UNNEST
> The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject
> to change in future releases. It is not recommended to use this feature in production at this time.
:::info
The UNNEST SQL function is [experimental](../development/experimental.md). Its API and behavior are subject
to change in future releases. It is not recommended to use this feature in production at this time.
:::
The UNNEST clause unnests array values. It's the SQL equivalent to the [unnest datasource](./datasource.md#unnest). The source for UNNEST can be an array or an input that's been transformed into an array, such as with helper functions like MV_TO_ARRAY or ARRAY.
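A minimal sketch (assuming a hypothetical datasource `my_datasource` with a multi-value string column `tags`; because the feature is experimental, some releases may also require enabling it in the query context):

```sql
-- Produces one output row per element of the multi-value "tags" column.
SELECT tag
FROM "my_datasource",
  UNNEST(MV_TO_ARRAY("tags")) AS unnested(tag)
```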
@ -220,7 +224,9 @@ UNION ALL
SELECT COUNT(*) FROM tbl WHERE my_column = 'value2'
```
> With top-level queries, you can't apply GROUP BY, ORDER BY, or any other operator to the results of a UNION ALL.
:::info
With top-level queries, you can't apply GROUP BY, ORDER BY, or any other operator to the results of a UNION ALL.
:::
### Table-level
@ -250,8 +256,10 @@ Add "EXPLAIN PLAN FOR" to the beginning of any query to get information about ho
the query will not actually be executed. Refer to the [Query translation](sql-translation.md#interpreting-explain-plan-output)
documentation for more information on the output of EXPLAIN PLAN.
> For the legacy plan, be careful when interpreting EXPLAIN PLAN output, and use [request logging](../configuration/index.md#request-logging) if in doubt.
:::info
For the legacy plan, be careful when interpreting EXPLAIN PLAN output, and use [request logging](../configuration/index.md#request-logging) if in doubt.
Request logs show the exact native query that will be run. Alternatively, to see the native query plan, set `useNativeQueryExplain` to true in the query context.
:::
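For example (using the quickstart `wikipedia` datasource), prefixing a query with EXPLAIN PLAN FOR returns the plan without running the SELECT:

```sql
-- Returns the plan (with useNativeQueryExplain, the native query JSON)
-- instead of query results.
EXPLAIN PLAN FOR
SELECT COUNT(*) AS edits
FROM wikipedia
WHERE channel = '#en.wikipedia'
```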
## Identifiers and literals

View File

@ -23,9 +23,11 @@ sidebar_label: "TimeBoundary"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type that is only available in the native language.
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type that is only available in the native language.
:::
Time boundary queries return the earliest and latest data points of a data set. The grammar is:

View File

@ -23,10 +23,12 @@ sidebar_label: "Timeseries"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type in the native language. For information about when Druid SQL will use this query type, refer to the
> [SQL documentation](sql-translation.md#query-types).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type in the native language. For information about when Druid SQL will use this query type, refer to the
[SQL documentation](sql-translation.md#query-types).
:::
These types of queries take a timeseries query object and return an array of JSON objects where each object represents a value asked for by the timeseries query.
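As a rough guide, a Druid SQL query that groups only on a floored `__time` expression (sketched here against the quickstart `wikipedia` datasource) is the shape that typically plans as a native timeseries query:

```sql
SELECT
  TIME_FLOOR(__time, 'PT1H') AS "hour",
  SUM(added) AS added_total
FROM wikipedia
GROUP BY 1
ORDER BY 1
```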

View File

@ -22,9 +22,11 @@ title: "Sorting (topN)"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about sorting in SQL, refer to the [SQL documentation](sql.md#order-by).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about sorting in SQL, refer to the [SQL documentation](sql.md#order-by).
:::
In Apache Druid, the topN metric spec specifies how topN values should be sorted.

View File

@ -23,10 +23,12 @@ sidebar_label: "TopN"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes a query
> type in the native language. For information about when Druid SQL will use this query type, refer to the
> [SQL documentation](sql-translation.md#query-types).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes a query
type in the native language. For information about when Druid SQL will use this query type, refer to the
[SQL documentation](sql-translation.md#query-types).
:::
Apache Druid TopN queries return a sorted set of results for the values in a given dimension according to some criteria. Conceptually, they can be thought of as an approximate [GroupByQuery](../querying/groupbyquery.md) over a single dimension with an [Ordering](../querying/limitspec.md) spec. TopNs are much faster and more resource-efficient than GroupBys for this use case. These types of queries take a topN query object and return an array of JSON objects where each object represents a value asked for by the topN query.
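For comparison, the Druid SQL shape that typically plans as an (approximate) topN query groups on one dimension and orders by an aggregate with a LIMIT; a sketch against the quickstart `wikipedia` datasource:

```sql
SELECT channel, COUNT(*) AS edits
FROM wikipedia
GROUP BY channel
ORDER BY edits DESC
LIMIT 10
```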

View File

@ -22,10 +22,12 @@ title: "Virtual columns"
~ under the License.
-->
> Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
> This document describes the native
> language. For information about functions available in SQL, refer to the
> [SQL documentation](sql-scalar.md).
:::info
Apache Druid supports two query languages: [Druid SQL](sql.md) and [native queries](querying.md).
This document describes the native
language. For information about functions available in SQL, refer to the
[SQL documentation](sql-scalar.md).
:::
Virtual columns are queryable column "views" created from a set of columns during a query.

View File

@ -136,8 +136,10 @@ We recommend running your favorite Linux distribution. You will also need
* [Java 8u92+, 11, or 17](../operations/java.md)
* Python 2 or Python 3
> If needed, you can specify where to find Java using the environment variables
> `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the `bin/verify-java` script.
:::info
If needed, you can specify where to find Java using the environment variables
`DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the `bin/verify-java` script.
:::
For information about installing Java, see the documentation for your OS package manager. If your Ubuntu-based OS does not have a recent enough version of Java, WebUpd8 offers [packages for those
OSes](http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html).
@ -409,8 +411,10 @@ inbound connections on the following:
- 8082 (Broker)
- 8088 (Router, if used)
> In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,
> rather than on the Master server.
:::info
In production, we recommend deploying ZooKeeper and your metadata store on their own dedicated hardware,
rather than on the Master server.
:::
## Start Master Server
@ -439,7 +443,9 @@ can start the Master server processes together with ZK using:
bin/start-cluster-master-with-zk-server
```
> In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
:::info
In production, we also recommend running a ZooKeeper cluster on its own dedicated hardware.
:::
## Start Data Server
@ -453,8 +459,10 @@ bin/start-cluster-data-server
You can add more Data servers as needed.
> For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.
> This also allows you take advantage of Druid's built-in MiddleManager autoscaling facility.
:::info
For clusters with complex resource allocation needs, you can break apart Historicals and MiddleManagers and scale the components individually.
This also allows you to take advantage of Druid's built-in MiddleManager autoscaling facility.
:::
## Start Query Server

View File

@ -150,7 +150,10 @@ You can now see the data as a datasource in the console and try out a query, as
![Datasource view](../assets/tutorial-batch-data-loader-10.png "Datasource view")
> Notice the other actions you can perform for a datasource, including configuring retention rules, compaction, and more.
:::info
Notice the other actions you can perform for a datasource, including configuring retention rules, compaction, and more.
:::
3. Run the prepopulated query, `SELECT * FROM "wikipedia"` to see the results.
![Query view](../assets/tutorial-batch-data-loader-11.png "Query view")

View File

@ -51,7 +51,9 @@ Submit the spec as follows to create a datasource called `compaction-tutorial`:
bin/post-index-task --file quickstart/tutorial/compaction-init-index.json --url http://localhost:8081
```
> `maxRowsPerSegment` in the tutorial ingestion spec is set to 1000 to generate multiple segments per hour for demonstration purposes. Do not use this spec in production.
:::info
`maxRowsPerSegment` in the tutorial ingestion spec is set to 1000 to generate multiple segments per hour for demonstration purposes. Do not use this spec in production.
:::
After the ingestion completes, navigate to [http://localhost:8888/unified-console.html#datasources](http://localhost:8888/unified-console.html#datasources) in a browser to see the new datasource in the web console.

Some files were not shown because too many files have changed in this diff Show More