<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
# gRPC Query Extension for Druid
This extension provides a gRPC API for SQL and Native queries.
Druid uses REST as its RPC protocol. Druid has a large variety of REST operations
including query, ingest jobs, monitoring, configuration and many more. Although
REST is a universally supported RPC format, it is not the only one in use. This
extension allows gRPC-based clients to issue SQL and native queries.
Druid is optimized for high-concurrency, low-complexity queries that return a
small result set (a few thousand rows at most). The small-query focus allows
Druid to offer a simple, stateless request/response REST API. This gRPC API
follows that Druid pattern: it is optimized for simple queries and follows
Druid's request/response model. APIs such as JDBC can handle larger results
because they are stateful: a client can request pages of results using multiple
API calls. This API does not support paging: the entire result set is returned
in the response, resulting in an API which is fast for small queries, and not
suitable for larger result sets.
## Use Cases
The gRPC query extension can be used in two ways, depending on the selected
result format.
### CSV or JSON Response Format
The simplest way to use the gRPC extension is to send a query request that
uses CSV or JSON as the return format. The client simply pulls the results
from the response and does something useful with them. For the CSV format,
headers can be created from the column metadata in the response message.
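
As a minimal sketch of the header step, assume the column names have already been read from the response's column metadata into a list and that the CSV payload arrives as UTF-8 bytes. The class and method names below are hypothetical; the actual message fields are defined in `query.proto`.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client-side helper: none of these names come from the extension.
final class CsvWithHeader {

  // Quote a column name (RFC 4180 style) when it contains a comma, quote, or line break.
  static String quote(String name) {
    if (name.contains(",") || name.contains("\"") || name.contains("\n") || name.contains("\r")) {
      return "\"" + name.replace("\"", "\"\"") + "\"";
    }
    return name;
  }

  // Prepend a header line, built from the column names, to the CSV payload.
  static String withHeader(List<String> columnNames, byte[] csvPayload) {
    String header = columnNames.stream()
        .map(CsvWithHeader::quote)
        .collect(Collectors.joining(","));
    return header + "\n" + new String(csvPayload, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Stand-ins for values that would be read from the response message.
    List<String> columns = List.of("dim1", "cnt", "a,b");
    byte[] payload = "abc,1,x\ndef,2,y\n".getBytes(StandardCharsets.UTF_8);
    System.out.print(withHeader(columns, payload));
  }
}
```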
### Protobuf Response Format
Some applications want to use Protobuf as the result format. In this case,
the extension returns the rows, encoded as Protobuf messages, as the binary payload of the query
response. This works for an application which uses a fixed set of queries, each
of which is carefully designed to power one application, say a dashboard. The
(simplified) message flow is:
```text
+-----------+ query -> +-------+
| Dashboard | -- gRPC --> | Druid |
+-----------+ <- data +-------+
```
In practice, there may be multiple proxy layers: one on the application side, and
the Router on the Druid side.
The dashboard displays a fixed set of reports and charts. Each of those sends a
well-defined query specified as part of the application. The returned data is thus
both well-known and fixed for each query. The set of queries is fixed by the contents
of the dashboard. That is, this is not an ad-hoc query use case.
Because the queries are locked down, and are part of the application, the set of valid
result sets is also well known and locked down. Given this well-controlled use case, it
is possible to use a pre-defined Protobuf message to represent the results of each distinct
query. (Protobuf is a compiled format: the solution works only because the set of messages
is well known. It would not work for the ad-hoc case in which each query has a different
result set schema.)
To be very clear: the application has a fixed set of queries to be sent to Druid via gRPC.
For each query, there is a fixed Protobuf response format defined by the application.
No other queries, aside from this well-known set, will be sent to the gRPC endpoint using
the Protobuf response format. If the set of queries is not well-defined, use the CSV
or JSON response format instead.
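
One way an application might enforce the "fixed set of queries" rule is a small catalog that pairs each query with its response message class. This is only a sketch of the idea on the application side: the class, the query names, and the `com.example.dashboard.*` message names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical application-side catalog; none of these names are part of the extension.
final class DashboardQueries {

  // One entry per dashboard query: the SQL text and the Protobuf message class for its rows.
  static final class Entry {
    final String sql;
    final String protobufMessageName;

    Entry(String sql, String protobufMessageName) {
      this.sql = sql;
      this.protobufMessageName = protobufMessageName;
    }
  }

  private static final Map<String, Entry> CATALOG = new LinkedHashMap<>();

  static {
    // Placeholder class names standing in for the application's compiled Protobuf messages.
    CATALOG.put("daily_counts", new Entry(
        "SELECT dim1, cnt FROM foo",
        "com.example.dashboard.DailyCounts"));
    CATALOG.put("metric_totals", new Entry(
        "SELECT dim2, m2 FROM foo",
        "com.example.dashboard.MetricTotals"));
  }

  // Reject anything outside the fixed set rather than sending it with the Protobuf format.
  static Entry lookup(String name) {
    Entry entry = CATALOG.get(name);
    if (entry == null) {
      throw new IllegalArgumentException("Not a known dashboard query: " + name);
    }
    return entry;
  }

  public static void main(String[] args) {
    Entry entry = lookup("daily_counts");
    System.out.println(entry.sql + " -> " + entry.protobufMessageName);
  }
}
```

Queries that fail the lookup are candidates for the CSV or JSON response format instead.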
## Installation
The gRPC query extension is a "contrib" extension and is not installed by default when
you install Druid. Instead, you must install it manually.
In development, you can build Druid with all the "contrib" extensions. When building
Druid, include the `-P bundle-contrib-exts` option in addition to the `-P dist` option:
```bash
mvn package -Pdist,bundle-contrib-exts ...
```
In production, follow the [Druid documentation](https://druid.apache.org/docs/latest/development/extensions.html).
To enable the extension, add the following to the load list in
`_common/common.runtime.properties`:
```text
druid.extensions.loadList=[..., "grpc-query"]
```
Adding the extension to the load list automatically enables the extension,
but only in the Broker.
If you use the Protobuf response format, bundle up your Protobuf classes
into a jar file, and place that jar file in the
`$DRUID_HOME/extensions/grpc-query` directory. The Protobuf classes will
appear on the class path and will be available from the `grpc-query`
extension.
### Configuration
Enable and configure the extension in `broker/runtime.properties`:
```text
druid.grpcQuery.port=50051
```
The default port is 50051 (preliminary).
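
After restarting the Broker, a quick sanity check is to confirm that something is listening on the configured port. The sketch below uses only the Java standard library; the class name and the default host are assumptions. It verifies TCP reachability only, not that the listener actually speaks gRPC.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the extension.
final class GrpcPortCheck {

  // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
  static boolean isListening(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String host = args.length > 0 ? args[0] : "localhost"; // assumed Broker host
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 50051; // value of druid.grpcQuery.port
    System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
  }
}
```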
## Usage
See the `src/main/proto/query.proto` file in the `grpc-query` project for the request and
response message formats. The request message format closely follows the REST JSON message
format. The response is optimized for gRPC: it contains an error (if the request fails),
or the result schema and result data as a binary payload. You can query the gRPC endpoint
with any gRPC client.
Although both Druid SQL and Druid itself support a `float` data type, that type is not
usable in a Protobuf response object. Internally Druid converts all `float` values to
`double`. As a result, the Protobuf response object supports only the `double` type.
An attempt to use `float` will lead to a runtime error when processing the query.
Use the `double` type instead.
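
One practical consequence for clients: a value that started life as a `float` keeps its `float` precision when carried in a `double` field. The sketch below shows standard Java widening behavior only. It is not the extension's conversion code, and the class name is hypothetical.

```java
// Illustrates float-to-double widening in plain Java.
final class FloatWidening {

  // Widening is exact: the double holds the float's binary value, not the decimal literal's.
  static double widen(float value) {
    return value;
  }

  public static void main(String[] args) {
    System.out.println(widen(0.5f));        // prints 0.5 (exactly representable)
    System.out.println(widen(0.1f));        // prints 0.10000000149011612
    System.out.println(widen(0.1f) == 0.1); // prints false
  }
}
```

Clients that compare returned values against decimal literals may want to compare within a tolerance, or narrow the value back to `float` first.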
Sample request:
```java
QueryRequest.newBuilder()
    .setQuery("SELECT * FROM foo")
    .setResultFormat(QueryResultFormat.CSV)
    .setQueryType(QueryOuterClass.QueryType.SQL)
    .build();
```
When using Protobuf response format, bundle up your Protobuf classes
into a jar file, and place that jar file in the
`$DRUID_HOME/extensions/grpc-query` directory.
Specify the response Protobuf message name in the request.
```java
QueryRequest.newBuilder()
    // The quotes around the "date" alias must be escaped inside the Java string.
    .setQuery("SELECT dim1, dim2, dim3, cnt, m1, m2, unique_dim1, __time AS \"date\" FROM foo")
    .setQueryType(QueryOuterClass.QueryType.SQL)
    .setProtobufMessageName(QueryResult.class.getName())
    .setResultFormat(QueryResultFormat.PROTOBUF_INLINE)
    .build();
```

Response message:

```protobuf
message QueryResult {
  string dim1 = 1;
  string dim2 = 2;
  string dim3 = 3;
  int64 cnt = 4;
  // Declared as double: as noted above, float fields fail at runtime.
  double m1 = 5;
  double m2 = 6;
  bytes unique_dim1 = 7;
  google.protobuf.Timestamp date = 8;
}
```
## Security
The extension supports both "anonymous" and basic authorization. Anonymous is the mode
for an out-of-the-box Druid: no authorization needed. The extension does not yet support
other security extensions: each needs its own specific integration.
Clients that use basic authentication must include a set of credentials. See
`BasicCredentials` for a typical implementation and `BasicAuthTest` for how to
configure the credentials in the client.
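
For orientation, the standard HTTP Basic scheme (RFC 7617) encodes the credentials as shown in the sketch below. This assumes the extension expects that standard scheme. How the value is attached to a gRPC call is shown in `BasicCredentials`, which remains the reference. The class name and credentials here are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing only the standard Basic encoding.
final class BasicAuthHeader {

  // Builds the value of an "Authorization" header: "Basic " + base64("user:password").
  static String value(String user, String password) {
    String pair = user + ":" + password;
    return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Placeholder credentials for illustration only.
    System.out.println(value("admin", "pw"));
  }
}
```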
## Implementation Notes
This project contains several components:
* Guice module and associated server initialization code.
* Netty-based gRPC server.
* A "driver" that performs the actual query and generates the results.
## Debugging
Debugging of the gRPC extension requires extra care due to the nuances of loading
classes from an extension.
### Running in a Server
Druid extensions are designed to run in the Druid server. The gRPC extension is
loaded only in the Druid broker using the configuration described above. If something
fails during startup, the Broker will crash. Consult the Broker logs to determine
what went wrong. Startup failures are typically due to required jars not being installed
as part of the extension. Check the `pom.xml` file to track down what's missing.
Failures can also occur when running a query. Such failures will result in a failure
response and should result in a log entry in the Broker log file. Use the log entry
to sort out what went wrong.
You can also attach a debugger to the running process. You'll have to enable the debugger
in the server by adding the required parameters to the Broker's `jvm.config` file.
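
A typical line to add to `jvm.config` is the standard JDWP agent option. The port (5005) is an arbitrary choice for this example. `suspend=n` lets the Broker start without waiting for the debugger; use `suspend=y` to debug startup failures.

```text
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
```

The `*:` prefix in the address applies to Java 9 and later. On Java 8, use `address=5005`.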
### Debugging using Unit Tests
To debug the functionality of the extension, your best bet is to debug in the context
of a unit test. Druid provides a special test-only SQL stack with a few pre-defined
datasources. See the various `CalciteQueryTest` classes to see what these are. You can
also query Druid's various system tables. See `GrpcQueryTest` for a simple "starter"
unit test that configures the server and uses an in-process client to send requests.
Most unit testing can be done without the gRPC server, by calling the `QueryDriver`
class directly. That is, if the goal is to work with the code that takes a request, runs
a query, and produces a response, then the driver is the key and the server is just a
bit of extra complexity. See the `DriverTest` class for an example unit test.
### Debugging in a Server in an IDE
We would like to be able to debug the gRPC extension, within the Broker, in an IDE.
As it turns out, doing so breaks Druid's class loader mechanisms in ways that are both
hard to understand and hard to work around. When run in a server, Java creates an instance
of `GrpcQueryModule` using the extension's class loader. Java then uses that same class
loader to load other classes in the extension, including those here and those in the
shaded gRPC jar file.
However, when run in an IDE, if this project is on the class path, then the `GrpcQueryModule`
class will be loaded from the "App" class loader. This works fine: it causes the other
classes of this module to also be loaded from the class path. However, once execution
calls into gRPC, Java will use the App class loader, not the extension class loader, and
will fail to find some of the classes, resulting in Java exceptions. Worse, in some cases,
Java may load the same class from both class loaders. To Java, these are not the same
classes, and you will get mysterious errors as a result.
For now, the lesson is: don't try to debug the extension in the Broker in the IDE. Use
one of the above options instead.
For reference (and in case we figure out a solution to the class loader conflict),
the way to debug the Broker in an IDE is the following:
* Build your branch. Use the `-P bundle-contrib-exts` flag in addition to `-P dist`, as described
above.
* Create an install from the distribution produced above.
* Use the `single-server/micro-quickstart` config for debugging.
* Configure the installation using the steps above.
* Modify the Supervisor config for your config to comment out the line that launches
the broker. Use the hash (`#`) character to comment out the line.
* In your IDE, define a launch configuration for the Broker.
  * The launch command is `server broker`.
  * Add the following JVM arguments:
    ```text
    --add-exports java.base/jdk.internal.perf=ALL-UNNAMED
    --add-exports jdk.management/com.sun.management.internal=ALL-UNNAMED
    ```
  * Define `grpc-query` as a project dependency. (This is for Eclipse; IntelliJ may differ.)
  * Configure the class path to include the common and Broker properties files.
* Launch the micro-quickstart cluster.
* Launch the Broker in your IDE.
### gRPC Logging
Debugging of the gRPC stack is difficult since the shaded jar loses source attachments.
Logging helps. gRPC logging is not enabled via Druid's logging system. Instead, [create
the following `logging.properties` file](https://stackoverflow.com/questions/50243717/grpc-logger-level):
```text
handlers=java.util.logging.ConsoleHandler
io.grpc.level=FINE
java.util.logging.ConsoleHandler.level=FINE
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
```
Then, pass the following on the command line:
```text
-Djava.util.logging.config.file=logging.properties
```
Adjust the path to the file depending on where you put the file.
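
When a command-line flag is inconvenient, for example in a unit test, roughly the same effect can be had programmatically with `java.util.logging`. This is a sketch under the assumption that the relevant loggers live under the `io.grpc` name used in the properties file above. The class name is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical test helper, not part of the extension.
final class GrpcLogging {

  // Keep a strong reference: java.util.logging holds loggers weakly, so an
  // unreferenced logger can be garbage collected along with its level setting.
  private static final Logger GRPC_LOGGER = Logger.getLogger("io.grpc");

  // Call once; each call attaches another handler.
  static Logger enableFine() {
    GRPC_LOGGER.setLevel(Level.FINE);
    // Handlers filter by their own level (INFO by default for ConsoleHandler),
    // so attach one that lets FINE records through.
    ConsoleHandler handler = new ConsoleHandler();
    handler.setLevel(Level.FINE);
    GRPC_LOGGER.addHandler(handler);
    return GRPC_LOGGER;
  }

  public static void main(String[] args) {
    Logger logger = enableFine();
    logger.fine("gRPC logging enabled at FINE");
  }
}
```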
## Acknowledgements
This is not the first project to have created a gRPC API for Druid. Others include:
* [[Proposal] define a RPC protocol for querying data, support apache Arrow as data
exchange interface](https://github.com/apache/druid/issues/3891)
* [gRPC Druid extension PoC](https://github.com/ndolgov/gruid)
* [Druid gRPC-json server extension](https://github.com/apache/druid/pull/6798)
Full credit goes to those who have gone this way before.
Note that the class loader solution used by the two code bases above turned out
to not be needed. See the notes above about the class loader issues.