Doc fixes for query from deep storage and MSQ (#15313)

Minor updates to the documentation.

    Added prerequisites.
    Removed a known issue in MSQ since its no longer valid.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
This commit is contained in:
Karan Kumar 2023-11-03 10:52:20 +05:30 committed by GitHub
parent 9576fd3141
commit 5036af6fb3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 6 additions and 4 deletions

View File

@ -42,10 +42,6 @@ an [UnknownError](./reference.md#error_UnknownError) with a message including "N
- `GROUPING SETS` are not implemented. Queries using these features return a
[QueryNotSupported](reference.md#error_QueryNotSupported) error.
- For some `COUNT DISTINCT` queries, you'll encounter a [QueryNotSupported](reference.md#error_QueryNotSupported) error
that includes `Must not have 'subtotalsSpec'` as one of its causes. This is caused by the planner attempting to use
`GROUPING SET`s, which are not implemented.
- The numeric varieties of the `EARLIEST` and `LATEST` aggregators do not work properly. Attempting to use the numeric
varieties of these aggregators lead to an error like
`java.lang.ClassCastException: class java.lang.Double cannot be cast to class org.apache.druid.collections.SerializablePair`.

View File

@ -24,6 +24,10 @@ title: "Query from deep storage"
Druid can query segments that are only stored in deep storage. Running a query from deep storage is slower than running queries from segments that are loaded on Historical processes, but it's a great tool for data that you either access infrequently or where the low latency results that typical Druid queries provide is not necessary. Queries from deep storage can increase the surface area of data available to query without requiring you to scale your Historical processes to accommodate more segments.
## Prerequisites
Query from deep storage requires the Multi-stage query (MSQ) task engine. Load the extension for it if you don't already have it enabled before you begin. See [enable MSQ](../multi-stage-query/index.md#load-the-extension) for more information.
## Keep segments in deep storage only
Any data you ingest into Druid is already stored in deep storage, so you don't need to perform any additional configuration from that perspective. However, to take advantage of the cost savings that querying from deep storage provides, make sure not all your segments get loaded onto Historical processes.

View File

@ -31,6 +31,8 @@ To run the queries in this tutorial, replace `ROUTER:PORT` with the location of
For more general information, see [Query from deep storage](../querying/query-from-deep-storage.md).
If you are trying this feature on an existing cluster, make sure query from deep storage [prerequisites](../querying/query-from-deep-storage.md#prerequisites) are met.
## Load example data
Use the **Load data** wizard or the following SQL query to ingest the `wikipedia` sample datasource bundled with Druid. If you use the wizard, make sure you change the partitioning to be by hour.