mirror of https://github.com/apache/druid.git
docs: Refresh docs for SQL input source (#17031)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit is contained in: parent 9696f0b37c, commit 2e2f3cf66a

@@ -31,9 +31,9 @@ This module can be used side to side with other lookup module like the global ca
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list.

:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
To use JDBC, you must add your database client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#installing-the-mysql-connector-library) or [MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#install-mysql-connectorj) or [MariaDB](./mysql.md#install-mariadb-connectorj) connector libraries.
Copy or symlink the downloaded file to `extensions/druid-lookups-cached-single` under the distribution root directory.
:::

@@ -1,6 +1,6 @@
---
id: mysql
title: "MySQL Metadata Store"
title: "MySQL metadata store"
---

<!--
@@ -25,41 +25,58 @@ title: "MySQL Metadata Store"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.

:::info
The MySQL extension requires the MySQL Connector/J library or MariaDB Connector/J library, neither of which is included in the Druid distribution.
Refer to the following section for instructions on how to install this library.
:::
With the MySQL extension, you can use MySQL as a metadata store or ingest from a MySQL database.

## Installing the MySQL connector library
The extension requires a connector library that's not included with Druid.
See the [Prerequisites](#prerequisites) for installation instructions.

This extension can use Oracle's MySQL JDBC driver which is not included in the Druid distribution. You must
install it separately. There are a few ways to obtain this library:
## Prerequisites

- It can be downloaded from the MySQL site at: https://dev.mysql.com/downloads/connector/j/
- It can be fetched from Maven Central at: https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar
- It may be available through your package manager, e.g. as `libmysql-java` on APT for a Debian-based OS
To use the MySQL extension, you need to install one of the following libraries:
* [MySQL Connector/J](#install-mysql-connectorj)
* [MariaDB Connector/J](#install-mariadb-connectorj)

This fetches the MySQL connector JAR file with a name like `mysql-connector-j-8.2.0.jar`.
### Install MySQL Connector/J

Copy or symlink this file inside the folder `extensions/mysql-metadata-storage` under the distribution root directory.
The MySQL extension uses Oracle's MySQL JDBC driver.
The current version of Druid uses version 8.2.0.
Other versions may not work with this extension.

## Alternative: Installing the MariaDB connector library
You can download the library from one of the following sources:

This extension also supports using the MariaDB connector jar, though it is also not included in the Druid distribution, so you must install it separately.
- [MySQL website](https://dev.mysql.com/downloads/connector/j/)
  Visit the archives page to access older product versions.
- [Maven Central (direct download)](https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar)
- Your package manager. For example, `libmysql-java` on APT for a Debian-based OS.

- Download from the MariaDB site: https://mariadb.com/downloads/connector
- Download from Maven Central: https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar
The download includes the MySQL connector JAR file with a name like `mysql-connector-j-8.2.0.jar`.
Copy or create a symbolic link to this file inside the `lib` folder in the distribution root directory.

This fetches the MariaDB connector JAR file with a name like `mariadb-java-client-2.7.3.jar`.
### Install MariaDB Connector/J

Copy or symlink this file to `extensions/mysql-metadata-storage` under the distribution root directory.
This extension also supports using the MariaDB connector jar.
The current version of Druid uses version 2.7.3.
Other versions may not work with this extension.

You can download the library from one of the following sources:

- [MariaDB website](https://mariadb.com/downloads/connectors/connectors-data-access/java8-connector)
  Click **Show All Files** to access older product versions.
- [Maven Central (direct download)](https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar)

The download includes the MariaDB connector JAR file with a name like `mariadb-java-client-2.7.3.jar`.
Copy or create a symbolic link to this file inside the `lib` folder in the distribution root directory.

To configure the `mysql-metadata-storage` extension to use the MariaDB connector library instead of MySQL, set `druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.

Depending on the MariaDB client library version, the connector supports both `jdbc:mysql:` and `jdbc:mariadb:` connection URIs. However, the parameters to configure the connection vary between implementations, so be sure to [check the documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings) for details.
The protocol of the connection string is `jdbc:mysql:` or `jdbc:mariadb:`,
depending on your specific version of the MariaDB client library.
For more information on the parameters to configure a connection,
[see the MariaDB documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings)
for your connector version.
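
As an illustration, a `common.runtime.properties` fragment that points the extension at the MariaDB driver might look like the following sketch; the host, port, database name, and credentials are hypothetical:

```
# Use the mysql-metadata-storage extension with the MariaDB driver
druid.metadata.storage.type=mysql
druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver
# Hypothetical connection details; adjust host, port, database, and credentials
druid.metadata.storage.connector.connectURI=jdbc:mariadb://localhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=password
```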

## Setting up MySQL
## Set up MySQL

To avoid issues with upgrades that require schema changes to a large metadata table, consider a MySQL version that supports instant ADD COLUMN semantics. For example, MySQL 8.
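
As a brief illustration of instant ADD COLUMN (the table and column here are hypothetical; Druid manages its own metadata schema):

```sql
-- Hypothetical example: MySQL 8 can add this column without rebuilding the table
ALTER TABLE my_large_table
  ADD COLUMN note VARCHAR(255),
  ALGORITHM = INSTANT;
```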

@@ -90,7 +107,7 @@ This extension also supports using MariaDB server, https://mariadb.org/download/
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;

-- create a druid user
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'password';

-- grant the user all the permissions on the database we just created
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'localhost';

@@ -111,10 +128,11 @@ This extension also supports using MariaDB server, https://mariadb.org/download/
If using the MariaDB connector library, set `druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.

## Encrypting MySQL connections
This extension provides support for encrypting MySQL connections. To get more information about encrypting MySQL connections using TLS/SSL in general, please refer to this [guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).
## Encrypt MySQL connections

## Configuration
This extension provides support for encrypting MySQL connections. To get more information about encrypting MySQL connections using TLS/SSL in general, please refer to this [guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).

## Configuration properties

|Property|Description|Default|Required|
|--------|-----------|-------|--------|

@@ -129,7 +147,10 @@ If using the MariaDB connector library, set `druid.metadata.mysql.driver.driverC
|`druid.metadata.mysql.ssl.enabledSSLCipherSuites`|Overrides the existing cipher suites with these cipher suites.|none|no|
|`druid.metadata.mysql.ssl.enabledTLSProtocols`|Overrides the TLS protocols with these protocols.|none|no|
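
As a sketch of how these properties fit together in `common.runtime.properties` (the `useSSL` property and all values shown are assumptions for illustration, not a definitive configuration):

```
# Hypothetical TLS settings for the MySQL metadata connection
druid.metadata.mysql.ssl.useSSL=true
druid.metadata.mysql.ssl.enabledTLSProtocols=["TLSv1.2"]
druid.metadata.mysql.ssl.enabledSSLCipherSuites=["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"]
```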

### MySQL InputSource
## MySQL input source

The MySQL extension provides an implementation of an SQL input source to ingest data into Druid from a MySQL database.
For more information on the input source parameters, see [SQL input source](../../ingestion/input-sources.md#sql-input-source).

```json
{
```
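
The sample spec is cut off in this capture. A minimal sketch of a MySQL-backed SQL input source spec, with hypothetical connection details and query:

```json
{
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "sqls": ["SELECT * FROM some_table"]
}
```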
@@ -1,6 +1,6 @@
---
id: postgresql
title: "PostgreSQL Metadata Store"
title: "PostgreSQL metadata store"
---

<!--
@@ -25,7 +25,9 @@ title: "PostgreSQL Metadata Store"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.

## Setting up PostgreSQL
With the PostgreSQL extension, you can use PostgreSQL as a metadata store or ingest from a PostgreSQL database.

## Set up PostgreSQL

To avoid issues with upgrades that require schema changes to a large metadata table, consider a PostgreSQL version that supports instant ADD COLUMN semantics.

@@ -69,7 +71,7 @@ To avoid issues with upgrades that require schema changes to a large metadata ta
druid.metadata.storage.connector.password=diurd
```

## Configuration
## Configuration properties

In most cases, the configuration options map directly to the [postgres JDBC connection options](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database).

@@ -87,9 +89,10 @@ In most cases, the configuration options map directly to the [postgres JDBC conn
| `druid.metadata.postgres.ssl.sslPasswordCallback` | The classname of the SSL password provider. | none | no |
| `druid.metadata.postgres.dbTableSchema` | The PostgreSQL schema that contains the Druid metadata tables. | `public` | no |

### PostgreSQL InputSource
## PostgreSQL input source

The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/input-sources.md) which can be used to ingest data into Druid from a PostgreSQL database.
The PostgreSQL extension provides an implementation of an SQL input source to ingest data into Druid from a PostgreSQL database.
For more information on the input source parameters, see [SQL input source](../../ingestion/input-sources.md#sql-input-source).

```json
{
```
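
The sample spec is cut off in this capture as well. A minimal sketch for PostgreSQL, with hypothetical connection details:

```json
{
  "type": "sql",
  "database": {
    "type": "postgresql",
    "connectorConfig": {
      "connectURI": "jdbc:postgresql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "sqls": ["SELECT * FROM some_table"]
}
```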
@@ -29,10 +29,8 @@ For general information on native batch indexing and parallel task indexing, see
## S3 input source

:::info

You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source.

:::info Required extension
To use the S3 input source, load the extension [`druid-s3-extensions`](../development/extensions-core/s3.md) in your `common.runtime.properties` file.
:::

The S3 input source reads objects directly from S3. You can specify either:

@@ -41,7 +39,7 @@ The S3 input source reads objects directly from S3. You can specify either:
* a list of S3 location prefixes that attempts to list the contents and ingest
  all objects contained within the locations.

The S3 input source is splittable. Therefore, you can use it with the [Parallel task](./native-batch.md). Each worker task of `index_parallel` reads one or multiple objects.
The S3 input source is splittable. Therefore, you can use it with the [parallel task](./native-batch.md). Each worker task of `index_parallel` reads one or multiple objects.

Sample specs:
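
The sample specs are truncated in this capture. A minimal sketch of the `uris` form, with hypothetical bucket and object names:

```json
{
  "type": "s3",
  "uris": ["s3://my-bucket/file1.json", "s3://my-bucket/file2.json"]
}
```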

@@ -219,16 +217,14 @@ If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credential
## Google Cloud Storage input source

:::info

You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the Google Cloud Storage input source.

:::info Required extension
To use the Google Cloud Storage input source, load the extension [`druid-google-extensions`](../development/extensions-core/google.md) in your `common.runtime.properties` file.
:::

The Google Cloud Storage input source supports reading objects directly
from Google Cloud Storage. Objects can be specified as a list of Google
Cloud Storage URI strings. The Google Cloud Storage input source is splittable
and can be used by the [Parallel task](./native-batch.md), where each worker task of `index_parallel` will read
and can be used by the [parallel task](./native-batch.md), where each worker task of `index_parallel` will read
one or multiple objects.

Sample specs:
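
The sample specs are truncated here as well. A minimal sketch with hypothetical bucket and object names:

```json
{
  "type": "google",
  "uris": ["gs://my-bucket/file1.json", "gs://my-bucket/file2.json"]
}
```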

@@ -307,14 +303,12 @@ Google Cloud Storage object:
## Azure input source

:::info

You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.

:::info Required extension
To use the Azure input source, load the extension [`druid-azure-extensions`](../development/extensions-core/azure.md) in your `common.runtime.properties` file.
:::

The Azure input source (that uses the type `azureStorage`) reads objects directly from Azure Blob store or Azure Data Lake sources. You can
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](./native-batch.md) indexing and each worker task reads one chunk of the split data.
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [parallel task](./native-batch.md) indexing and each worker task reads one chunk of the split data.

The `azureStorage` input source is a new schema for Azure input sources that allows you to specify which storage account files should be ingested from. We recommend that you update any specs that use the old `azure` schema to use the new `azureStorage` schema. The new schema provides more functionality than the older `azure` schema.
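
As a minimal sketch of the `azureStorage` schema (the storage account, container, and blob names are hypothetical):

```json
{
  "type": "azureStorage",
  "uris": ["azureStorage://storageAccount/container/prefix/file1.json"]
}
```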

@@ -491,15 +485,13 @@ The `objects` property is:
## HDFS input source

:::info

You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFS input source.

:::info Required extension
To use the HDFS input source, load the extension [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) in your `common.runtime.properties` file.
:::

The HDFS input source supports reading files directly
from HDFS storage. File paths can be specified as an HDFS URI string or a list
of HDFS URI strings. The HDFS input source is splittable and can be used by the [Parallel task](./native-batch.md),
of HDFS URI strings. The HDFS input source is splittable and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read one or multiple files.

Sample specs:
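
The sample specs are truncated in this capture. A minimal sketch with a hypothetical NameNode and path:

```json
{
  "type": "hdfs",
  "paths": "hdfs://namenode_host:8020/foo/bar/*.json"
}
```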

@@ -593,7 +585,7 @@ The `http` input source is not limited to the HTTP or HTTPS protocols. It uses t
For more information about security best practices, see [Security overview](../operations/security-overview.md#best-practices).

The HTTP input source is _splittable_ and can be used by the [Parallel task](./native-batch.md),
The HTTP input source is _splittable_ and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read only one file. This input source does not support Split Hint Spec.

Sample specs:
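
The sample specs are truncated in this capture. A minimal sketch with hypothetical URIs:

```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"]
}
```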

@@ -701,7 +693,7 @@ Sample spec:
The Local input source supports reading files directly from local storage,
and is mainly intended for proof-of-concept testing.
The Local input source is _splittable_ and can be used by the [Parallel task](./native-batch.md),
The Local input source is _splittable_ and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read one or multiple files.

Sample spec:
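
The sample spec is truncated in this capture. A minimal sketch with a hypothetical base directory:

```json
{
  "type": "local",
  "baseDir": "/data/directory",
  "filter": "*.json"
}
```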

@@ -736,7 +728,7 @@ Sample spec:
The Druid input source supports reading data directly from existing Druid segments,
potentially using a new schema and changing the name, dimensions, metrics, rollup, etc. of the segment.
The Druid input source is _splittable_ and can be used by the [Parallel task](./native-batch.md).
The Druid input source is _splittable_ and can be used by the [parallel task](./native-batch.md).
This input source has a fixed input format for reading from Druid segments;
no `inputFormat` field needs to be specified in the ingestion spec when using this input source.
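
A minimal sketch of a Druid input source entry (the datasource name and interval are hypothetical):

```json
{
  "type": "druid",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02"
}
```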

@@ -833,17 +825,29 @@ For more information on the `maxNumConcurrentSubTasks` field, see [Implementatio
## SQL input source

:::info Required extension
To use the SQL input source, you must load the appropriate extension in your `common.runtime.properties` file.
* To connect to MySQL, load the extension [`mysql-metadata-storage`](../development/extensions-core/mysql.md).
* To connect to PostgreSQL, load the extension [`postgresql-metadata-storage`](../development/extensions-core/postgresql.md).

The MySQL extension requires a JDBC driver.
For more information, see [Install MySQL Connector/J](../development/extensions-core/mysql.md#install-mysql-connectorj).
:::

The SQL input source is used to read data directly from an RDBMS.
The SQL input source is _splittable_ and can be used by the [Parallel task](./native-batch.md), where each worker task will read from one SQL query from the list of queries.
You can _split_ the ingestion tasks for a SQL input source. When you use the [parallel task](./native-batch.md) type, each worker task reads from one SQL query from the list of queries.
This input source does not support Split Hint Spec.
Since this input source has a fixed input format for reading events, no `inputFormat` field needs to be specified in the ingestion spec when using this input source.
Please refer to the Recommended practices section below before using this input source.

The SQL input source has a fixed input format for reading events.
Don't specify `inputFormat` when using this input source.

Refer to the [recommended practices](#recommended-practices) before using this input source.

|Property|Description|Required|
|--------|-----------|---------|
|type|Set the value to `sql`.|Yes|
|database|Specifies the database connection details. The database type corresponds to the extension that supplies the `connectorConfig` support. The specified extension must be loaded into Druid:<br/><br/><ul><li>[mysql-metadata-storage](../development/extensions-core/mysql.md) for `mysql`</li><li> [postgresql-metadata-storage](../development/extensions-core/postgresql.md) extension for `postgresql`.</li></ul><br/><br/>You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes|
|foldCase|Toggle case folding of database column names. This may be enabled in cases where the database returns case insensitive column names in query results.|No|
|database|Specifies the database connection details. The database type corresponds to the extension that supplies the `connectorConfig` support.<br/><br/>You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes|
|foldCase|Boolean to toggle case folding of database column names. For example, to ingest a database column named `Entry_Date` as `entry_date`, set `foldCase` to true and include `entry_date` in the [`dimensionsSpec`](ingestion-spec.md#dimensionsspec).|No|
|sqls|List of SQL queries where each SQL query would retrieve the data to be indexed.|Yes|

The following is an example of an SQL input source spec:
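
The example itself is truncated in this capture. A sketch consistent with the parameter table above, with hypothetical connection details, that also shows `foldCase`:

```json
{
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "foldCase": true,
  "sqls": ["SELECT * FROM table1", "SELECT * FROM table2"]
}
```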

@@ -887,7 +891,7 @@ Compared to the other native batch input sources, SQL input source behaves diffe
The Combining input source lets you read data from multiple input sources.
It identifies the splits from delegate input sources and uses a worker task to process each split.
Use the Combining input source only if all the delegates are splittable and can be used by the [Parallel task](./native-batch.md).
Each delegate input source must be splittable and compatible with the [parallel task type](./native-batch.md).

Similar to other input sources, the Combining input source supports a single `inputFormat`.
Delegate input sources that require an `inputFormat` must have the same format for input data.
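
As a minimal sketch of a Combining input source with two splittable delegates that share one input format (the paths and URIs are hypothetical):

```json
{
  "type": "combining",
  "delegates": [
    {
      "type": "local",
      "baseDir": "/data/directory",
      "filter": "*.json"
    },
    {
      "type": "http",
      "uris": ["http://example.com/uri1"]
    }
  ]
}
```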

@@ -931,10 +935,8 @@ The following is an example of a Combining input source spec:
## Iceberg input source

:::info

To use the Iceberg input source, load the extension [`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md).

:::info Required extension
To use the Iceberg input source, load the extension [`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md) in your `common.runtime.properties` file.
:::

You use the Iceberg input source to read data stored in the Iceberg table format. For a given table, the input source scans up to the latest Iceberg snapshot from the configured Hive catalog. Druid ingests the underlying live data files using the existing input source formats.
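
As a rough sketch only; the field names below are assumptions based on the extension's documentation and may not match your version:

```json
{
  "type": "iceberg",
  "tableName": "logs",
  "namespace": "webapp",
  "icebergCatalog": {
    "type": "hive",
    "warehousePath": "hdfs://warehouse/path",
    "catalogUri": "thrift://hive-metastore:9083"
  },
  "warehouseSource": {
    "type": "hdfs"
  }
}
```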

@@ -1138,10 +1140,8 @@ This input source provides the following filters: `and`, `equals`, `interval`, a
## Delta Lake input source

:::info

To use the Delta Lake input source, load the extension [`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md).

:::info Required extension
To use the Delta Lake input source, load the extension [`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md) in your `common.runtime.properties` file.
:::

You can use the Delta input source to read data stored in a Delta Lake table. For a given table, the input source scans

@@ -377,7 +377,7 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](../development/extensions-core/mysql.md#installing-the-mysql-connector-library) or [MariaDB](../development/extensions-core/mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
See the MySQL extension documentation for instructions to obtain [MySQL](../development/extensions-core/mysql.md#install-mysql-connectorj) or [MariaDB](../development/extensions-core/mysql.md#install-mariadb-connectorj) connector libraries.
The connector JAR should reside in the classpath of Druid's main class loader.
To add the connector JAR to the classpath, you can copy the downloaded file to `lib/` under the distribution root directory. Alternatively, create a symbolic link to the connector in the `lib` directory.
:::