From 4b2f1adecf6332fe7c41c068c6cdc78d2fbf4551 Mon Sep 17 00:00:00 2001
From: Gian Merlino
Date: Sat, 17 Sep 2022 10:47:57 -0700
Subject: [PATCH] Docs: Clarify the situation with SELECT. (#13109)

---
 docs/multi-stage-query/api.md      | 28 ++++++++++++++--------------
 docs/multi-stage-query/concepts.md | 11 ++++++-----
 docs/multi-stage-query/index.md    |  9 +++++----
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/docs/multi-stage-query/api.md b/docs/multi-stage-query/api.md
index 8532ea6817c..96e30e0ea72 100644
--- a/docs/multi-stage-query/api.md
+++ b/docs/multi-stage-query/api.md
@@ -42,15 +42,15 @@ You submit queries to the MSQ task engine using the `POST /druid/v2/sql/task/` e
 
 #### Request
 
-Currently, the MSQ task engine ignores the provided values of `resultFormat`, `header`,
-`typesHeader`, and `sqlTypesHeader`. SQL SELECT queries write out their results into the task report (in the `multiStageQuery.payload.results.results` key) formatted as if `resultFormat` is an `array`.
+The SQL task endpoint accepts [SQL requests in the JSON-over-HTTP form](../querying/sql-api.md#request-body) using the
+`query`, `context`, and `parameters` fields, but ignoring the `resultFormat`, `header`, `typesHeader`, and
+`sqlTypesHeader` fields.
 
-For task queries similar to the [example queries](./examples.md), you need to escape characters such as quotation marks (") if you use something like `curl`.
-You don't need to escape characters if you use a method that can parse JSON seamlessly, such as Python.
-The Python example in this topic escapes quotation marks although it's not required.
+This endpoint accepts [INSERT](reference.md#insert) and [REPLACE](reference.md#replace) statements.
 
-The following example is the same query that you submit when you complete [Convert a JSON ingestion
-spec](../tutorials/tutorial-msq-convert-spec.md) where you insert data into a table named `wikipedia`.
+As an experimental feature, this endpoint also accepts SELECT queries. SELECT query results are collected from workers
+by the controller, and written into the [task report](#get-the-report-for-a-query-task) as an array of arrays. The
+behavior and result format of plain SELECT queries (without INSERT or REPLACE) is subject to change.
@@ -199,9 +199,12 @@ A report provides detailed information about a query task, including things like
 
 Keep the following in mind when using the task API to view reports:
 
-- For SELECT queries, the report includes the results. At this time, if you want to view results for SELECT queries, you need to retrieve them as a generic map from the report and extract the results.
-- The task report stores query details for controller tasks.
-- If you encounter `500 Server Error` or `404 Not Found` errors, the task may be in the process of starting up or shutting down.
+- The task report for an entire job is associated with the `query_controller` task. The `query_worker` tasks do not have
+  their own reports; their information is incorporated into the controller report.
+- The task report API may report `404 Not Found` temporarily while the task is in the process of starting up.
+- As an experimental feature, the SQL task engine supports running SELECT queries. SELECT query results are written into
+the `multiStageQuery.payload.results.results` task report key as an array of arrays. The behavior and result format of plain
+SELECT queries (without INSERT or REPLACE) is subject to change.
 
 For an explanation of the fields in a report, see [Report response fields](#report-response-fields).
 
@@ -230,11 +233,8 @@ import requests
 # Make sure you replace `username`, `password`, `your-instance`, `port`, and `taskId` with the values for your deployment.
 url = "https://:@:/druid/indexer/v1/task//reports"
 
-payload={}
 headers = {}
-
-response = requests.request("GET", url, headers=headers, data=payload)
-
+response = requests.request("GET", url, headers=headers)
 print(response.text)
 ```
 
diff --git a/docs/multi-stage-query/concepts.md b/docs/multi-stage-query/concepts.md
index ea65fd76de7..5d12a9927bf 100644
--- a/docs/multi-stage-query/concepts.md
+++ b/docs/multi-stage-query/concepts.md
@@ -29,14 +29,15 @@ sidebar_label: "Key concepts"
 
 ## SQL task engine
 
-The `druid-multi-stage-query` extension adds a multi-stage query (MSQ) task engine that executes SQL SELECT,
-[INSERT](reference.md#insert), and [REPLACE](reference.md#replace) statements as batch tasks in the indexing service,
-which execute on [Middle Managers](../design/architecture.md#druid-services). INSERT and REPLACE tasks publish
+The `druid-multi-stage-query` extension adds a multi-stage query (MSQ) task engine that executes SQL statements as batch
+tasks in the indexing service, which execute on [Middle Managers](../design/architecture.md#druid-services).
+[INSERT](reference.md#insert) and [REPLACE](reference.md#replace) tasks publish
 [segments](../design/architecture.md#datasources-and-segments) just like [all other forms of batch
 ingestion](../ingestion/index.md#batch). Each query occupies at least two task slots while running: one controller task,
-and at least one worker task.
+and at least one worker task. As an experimental feature, the MSQ task engine also supports running SELECT queries as
+batch tasks. The behavior and result format of plain SELECT (without INSERT or REPLACE) is subject to change.
 
-You can execute queries using the MSQ task engine through the **Query** view in the [web
+You can execute SQL statements using the MSQ task engine through the **Query** view in the [web
 console](../operations/web-console.md) or through the [`/druid/v2/sql/task` API](api.md).
 
 For more details on how SQL queries are executed using the MSQ task engine, see [multi-stage query
diff --git a/docs/multi-stage-query/index.md b/docs/multi-stage-query/index.md
index d97de6dd633..64130aa03c8 100644
--- a/docs/multi-stage-query/index.md
+++ b/docs/multi-stage-query/index.md
@@ -30,11 +30,12 @@ description: Introduces multi-stage query architecture and its task engine
 
 Apache Druid supports SQL-based ingestion using the bundled [`druid-multi-stage-query` extension](#load-the-extension).
 This extension adds a [multi-stage query task engine for SQL](concepts.md#sql-task-engine) that allows running SQL
-[INSERT](concepts.md#insert) and [REPLACE](concepts.md#replace) statements as batch tasks.
+[INSERT](concepts.md#insert) and [REPLACE](concepts.md#replace) statements as batch tasks. As an experimental feature,
+the task engine also supports running SELECT queries as batch tasks.
 
-Nearly all SELECT capabilities are available for `INSERT ... SELECT` and `REPLACE ... SELECT` queries, with certain
-exceptions listed on the [Known issues](./known-issues.md#select) page. This allows great flexibility to apply
-transformations, filters, JOINs, aggregations, and so on while ingesting data. This also allows in-database
+Nearly all SELECT capabilities are available in the SQL task engine, with certain exceptions listed on the [Known
+issues](./known-issues.md#select) page. This allows great flexibility to apply transformations, filters, JOINs,
+aggregations, and so on as part of `INSERT ... SELECT` and `REPLACE ... SELECT` statements. This also allows in-database
 transformation: creating new tables based on queries of other tables.
 
 ## Vocabulary