mirror of https://github.com/apache/druid.git
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs
* fix paths
* Update protobuf.md
* Update protobuf.md
This commit is contained in:

parent 53bb45fc9a
commit b55657cc26
@@ -30,16 +30,16 @@ for [stream ingestion](../../ingestion/index.md#streaming). See corresponding do

## Example: Load Protobuf messages from Kafka

This example demonstrates how to load Protobuf messages from Kafka. Please read the [Load from Kafka tutorial](../../tutorials/tutorial-kafka.md) first, and see the [Kafka Indexing Service](./kafka-ingestion.md) documentation for more details.

The files used in this example are found at [`./examples/quickstart/protobuf` in your Druid directory](https://github.com/apache/druid/tree/master/examples/quickstart/protobuf).

For this example:

- Kafka broker host is `localhost:9092`
- Kafka topic is `metrics_pb`
- Datasource name is `metrics-protobuf`

Here is a JSON example of the 'metrics' data used in this example:

```json
{

@@ -56,7 +56,7 @@ Here is the metrics JSON example.

### Proto file

The corresponding proto file for our 'metrics' dataset looks like this.

```
syntax = "proto3";

@@ -74,28 +74,27 @@ message Metrics {

### Descriptor file

Next, we use the `protoc` Protobuf compiler to generate the descriptor file and save it as `metrics.desc`. The descriptor file must be either in the classpath or reachable by URL. In this example the descriptor file was saved at `/tmp/metrics.desc`; the same file is also available in the example files. From your Druid install directory:

```
protoc -o /tmp/metrics.desc ./quickstart/protobuf/metrics.proto
```
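To verify that the descriptor was generated correctly, it can be parsed with the Protobuf Python bindings. This is a minimal sketch, assuming the `protobuf` pip package is installed and the descriptor is at `/tmp/metrics.desc`:

```python
#!/usr/bin/env python
# Minimal descriptor sanity check (assumes `pip install protobuf`).
from google.protobuf import descriptor_pb2

# `protoc -o` writes a serialized FileDescriptorSet.
fds = descriptor_pb2.FileDescriptorSet()
with open('/tmp/metrics.desc', 'rb') as f:
    fds.ParseFromString(f.read())

# Print each fully qualified message type; one of these should match
# the `protoMessageType` used in the supervisor spec below.
for proto_file in fds.file:
    for message in proto_file.message_type:
        print('%s.%s' % (proto_file.package, message.name))
```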

## Create Kafka Supervisor

Below is the complete Supervisor spec JSON to be submitted to the Overlord. Make sure these keys are properly configured for successful ingestion.

Important supervisor properties:

- `descriptor` for the descriptor file URL
- `protoMessageType` from the proto definition
- `parser` should have `type` set to `protobuf`, but note that the `format` of the `parseSpec` must be `json`

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "metrics-protobuf",
    "parser": {
      "type": "protobuf",
      "descriptor": "file:///tmp/metrics.desc",

@@ -165,18 +164,55 @@ Please make sure these keys are properly configured for successful ingestion.

}
```
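Once saved to a file (here the hypothetical `metrics-protobuf-supervisor.json`), the spec can be submitted to the Overlord over HTTP. A sketch using the `requests` module, assuming the Overlord is listening on `localhost:8090`:

```python
#!/usr/bin/env python
# Submit the supervisor spec to the Overlord (assumes `pip install requests`
# and an Overlord at localhost:8090; the spec filename is hypothetical).
import json

import requests

with open('metrics-protobuf-supervisor.json') as f:
    spec = json.load(f)

resp = requests.post('http://localhost:8090/druid/indexer/v1/supervisor', json=spec)
resp.raise_for_status()
print(resp.json())  # the Overlord echoes the supervisor id on success
```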

## Adding Protobuf messages to Kafka

If necessary, from your Kafka installation directory run the following command to create the Kafka topic:

```
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic metrics_pb
```
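Newer versions of `kafka-python` also expose an admin client, so the same topic can be created from Python if you prefer; the values below mirror the shell command above:

```python
#!/usr/bin/env python
# Create the topic with kafka-python's admin client instead of the shell
# script (assumes kafka-python >= 1.4 and a broker at localhost:9092).
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers='localhost:9092')
admin.create_topics([NewTopic(name='metrics_pb', num_partitions=1, replication_factor=1)])
admin.close()
```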

This example script requires the `protobuf` and `kafka-python` modules. With the topic in place, messages can be inserted by running the following command from your Druid installation directory:

```
./bin/generate-example-metrics | ./quickstart/protobuf/pb_publisher.py
```

You can confirm that data has been inserted into your Kafka topic using the following command from your Kafka installation directory:

```
./bin/kafka-console-consumer.sh --zookeeper localhost --topic metrics_pb
```
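The same check can be done with the `kafka-python` module that the publisher script already requires; this sketch assumes the generated `metrics_pb2.py` is importable and a broker at `localhost:9092`:

```python
#!/usr/bin/env python
# Read messages back from the topic and decode them with the generated
# binding (assumes metrics_pb2.py is on the Python path).
from kafka import KafkaConsumer
from metrics_pb2 import Metrics

consumer = KafkaConsumer(
    'metrics_pb',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',   # start from the beginning of the topic
    consumer_timeout_ms=5000,       # stop iterating once the topic is drained
)

for message in consumer:
    metrics = Metrics()
    metrics.ParseFromString(message.value)
    print(metrics)
```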

which should print messages like this:

```
millisecondsGETR"2017-04-06T03:23:56Z*2002/list:request/latencyBwww1.example.com
```

If the supervisor you created in the previous step is running, the indexing tasks should begin consuming the messages, and the data will soon be available for querying in Druid.
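Once segments are published, you can sanity-check the datasource with a Druid SQL query. A sketch using the `requests` module against the Router's SQL endpoint (the quickstart default is `localhost:8888`):

```python
#!/usr/bin/env python
# Count ingested rows via Druid SQL (assumes `pip install requests` and a
# Router at the quickstart default of localhost:8888).
import requests

resp = requests.post(
    'http://localhost:8888/druid/v2/sql',
    json={'query': 'SELECT COUNT(*) AS cnt FROM "metrics-protobuf"'},
)
resp.raise_for_status()
print(resp.json())
```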

## Generating the example files

The files provided in the example quickstart can be generated in the following manner, starting with only `metrics.proto`.

### `metrics.desc`

The descriptor file is generated using the `protoc` Protobuf compiler. Given a `.proto` file, a `.desc` file can be generated like so:

```
protoc -o metrics.desc metrics.proto
```

### `metrics_pb2.py`

`metrics_pb2.py` is also generated with `protoc`:

```
protoc -o metrics.desc metrics.proto --python_out=.
```
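The generated binding can be exercised on its own to confirm that records round-trip through serialization; note that the field names below (`unit`, `http_method`, `value`) are assumptions based on the example data, so check them against your `metrics.proto`:

```python
#!/usr/bin/env python
# Round-trip one record through the generated binding. The field names
# here are assumptions from the example data; adjust to match metrics.proto.
from metrics_pb2 import Metrics

m = Metrics()
m.unit = 'milliseconds'
m.http_method = 'GET'
m.value = 200

data = m.SerializeToString()  # wire-format bytes, as published to Kafka
parsed = Metrics()
parsed.ParseFromString(data)
print(parsed)
```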

### `pb_publisher.py`

After `metrics_pb2.py` is generated, another script can be constructed to parse JSON data, convert it to Protobuf, and produce messages to a Kafka topic:

```python
#!/usr/bin/env python

@@ -187,32 +223,17 @@ import json

from kafka import KafkaProducer
from metrics_pb2 import Metrics


producer = KafkaProducer(bootstrap_servers='localhost:9092')
topic = 'metrics_pb'

for row in iter(sys.stdin):
    d = json.loads(row)
    metrics = Metrics()
    for k, v in d.items():
        setattr(metrics, k, v)
    pb = metrics.SerializeToString()
    producer.send(topic, pb)

producer.flush()
```

@@ -32,3 +32,5 @@ for row in iter(sys.stdin):

        setattr(metrics, k, v)
    pb = metrics.SerializeToString()
    producer.send(topic, pb)

producer.flush()

@@ -109,7 +109,7 @@

      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>