docs: Refresh docs for SQL input source (#17031)

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Victoria Lim 2024-09-16 15:52:37 -07:00 committed by GitHub
parent 9696f0b37c
commit 2e2f3cf66a
5 changed files with 95 additions and 71 deletions

View File

@ -31,9 +31,9 @@ This module can be used side to side with other lookup module like the global ca
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list.
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
To use JDBC, you must add your database client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#installing-the-mysql-connector-library) or [MariaDB](./mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#install-mysql-connectorj) or [MariaDB](./mysql.md#install-mariadb-connectorj) connector libraries.
Copy or symlink the downloaded file to `extensions/druid-lookups-cached-single` under the distribution root directory.
:::

View File

@ -1,6 +1,6 @@
---
id: mysql
title: "MySQL Metadata Store"
title: "MySQL metadata store"
---
<!--
@ -25,41 +25,58 @@ title: "MySQL Metadata Store"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `mysql-metadata-storage` in the extensions load list.
:::info
The MySQL extension requires the MySQL Connector/J library or the MariaDB Connector/J library, neither of which is included in the Druid distribution.
Refer to the following section for instructions on how to install this library.
:::
With the MySQL extension, you can use MySQL as a metadata store or ingest from a MySQL database.
## Installing the MySQL connector library
The extension requires a connector library that's not included with Druid.
See the [Prerequisites](#prerequisites) for installation instructions.
This extension can use Oracle's MySQL JDBC driver, which is not included in the Druid distribution. You must
install it separately. There are a few ways to obtain this library:
## Prerequisites
- It can be downloaded from the MySQL site at: https://dev.mysql.com/downloads/connector/j/
- It can be fetched from Maven Central at: https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar
- It may be available through your package manager, e.g. as `libmysql-java` on APT for a Debian-based OS
To use the MySQL extension, you need to install one of the following libraries:
* [MySQL Connector/J](#install-mysql-connectorj)
* [MariaDB Connector/J](#install-mariadb-connectorj)
This fetches the MySQL connector JAR file with a name like `mysql-connector-j-8.2.0.jar`.
### Install MySQL Connector/J
Copy or symlink this file inside the folder `extensions/mysql-metadata-storage` under the distribution root directory.
The MySQL extension uses Oracle's MySQL JDBC driver.
The current version of Druid uses version 8.2.0.
Other versions may not work with this extension.
## Alternative: Installing the MariaDB connector library
You can download the library from one of the following sources:
This extension also supports using the MariaDB connector jar, though it is also not included in the Druid distribution, so you must install it separately.
- [MySQL website](https://dev.mysql.com/downloads/connector/j/)
Visit the archives page to access older product versions.
- [Maven Central (direct download)](https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.2.0/mysql-connector-j-8.2.0.jar)
- Your package manager. For example, `libmysql-java` on APT for a Debian-based OS.
- Download from the MariaDB site: https://mariadb.com/downloads/connector
- Download from Maven Central: https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar
The download includes the MySQL connector JAR file with a name like `mysql-connector-j-8.2.0.jar`.
Copy or create a symbolic link to this file inside the `lib` folder in the distribution root directory.
This fetches the MariaDB connector JAR file with a name like `mariadb-java-client-2.7.3.jar`.
### Install MariaDB Connector/J
Copy or symlink this file to `extensions/mysql-metadata-storage` under the distribution root directory.
This extension also supports using the MariaDB connector JAR.
The current version of Druid uses version 2.7.3.
Other versions may not work with this extension.
You can download the library from one of the following sources:
- [MariaDB website](https://mariadb.com/downloads/connectors/connectors-data-access/java8-connector)
Click **Show All Files** to access older product versions.
- [Maven Central (direct download)](https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.3/mariadb-java-client-2.7.3.jar)
The download includes the MariaDB connector JAR file with a name like `mariadb-java-client-2.7.3.jar`.
Copy or create a symbolic link to this file inside the `lib` folder in the distribution root directory.
To configure the `mysql-metadata-storage` extension to use the MariaDB connector library instead of MySQL, set `druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.
Depending on the MariaDB client library version, the connector supports both `jdbc:mysql:` and `jdbc:mariadb:` connection URIs. However, the parameters to configure the connection vary between implementations, so be sure to [check the documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings) for details.
The protocol of the connection string is `jdbc:mysql:` or `jdbc:mariadb:`,
depending on your specific version of the MariaDB client library.
For more information on the parameters to configure a connection,
[see the MariaDB documentation](https://mariadb.com/kb/en/about-mariadb-connector-j/#connection-strings)
for your connector version.
## Setting up MySQL
## Set up MySQL
To avoid issues with upgrades that require schema changes to a large metadata table, consider a MySQL version that supports instant ADD COLUMN semantics. For example, MySQL 8.
@ -90,7 +107,7 @@ This extension also supports using MariaDB server, https://mariadb.org/download/
CREATE DATABASE druid DEFAULT CHARACTER SET utf8mb4;
-- create a druid user
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'diurd';
CREATE USER 'druid'@'localhost' IDENTIFIED BY 'password';
-- grant the user all the permissions on the database we just created
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'localhost';
@ -111,10 +128,11 @@ This extension also supports using MariaDB server, https://mariadb.org/download/
If using the MariaDB connector library, set `druid.metadata.mysql.driver.driverClassName=org.mariadb.jdbc.Driver`.
## Encrypting MySQL connections
This extension provides support for encrypting MySQL connections. For more information about encrypting MySQL connections using TLS/SSL in general, refer to this [guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).
## Encrypt MySQL connections
## Configuration
This extension provides support for encrypting MySQL connections. For more information about encrypting MySQL connections using TLS/SSL in general, refer to this [guide](https://dev.mysql.com/doc/refman/5.7/en/using-encrypted-connections.html).
## Configuration properties
|Property|Description|Default|Required|
|--------|-----------|-------|--------|
@ -129,7 +147,10 @@ If using the MariaDB connector library, set `druid.metadata.mysql.driver.driverC
|`druid.metadata.mysql.ssl.enabledSSLCipherSuites`|Overrides the existing cipher suites with these cipher suites.|none|no|
|`druid.metadata.mysql.ssl.enabledTLSProtocols`|Overrides the TLS protocols with these protocols.|none|no|
### MySQL InputSource
## MySQL input source
The MySQL extension provides an implementation of an SQL input source to ingest data into Druid from a MySQL database.
For more information on the input source parameters, see [SQL input source](../../ingestion/input-sources.md#sql-input-source).

The following abridged spec is a representative sketch of a parallel task that reads from MySQL; the connection details, query, and `dataSchema` are placeholders to adapt to your deployment:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "some_datasource",
      "timestampSpec": { "column": "ts", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["user", "added", "deleted"] }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "sql",
        "database": {
          "type": "mysql",
          "connectorConfig": {
            "connectURI": "jdbc:mysql://some-rds-host.us-west-1.amazonaws.com:3306/druid",
            "user": "admin",
            "password": "secret"
          }
        },
        "sqls": ["SELECT * FROM some_table"]
      }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

View File

@ -1,6 +1,6 @@
---
id: postgresql
title: "PostgreSQL Metadata Store"
title: "PostgreSQL metadata store"
---
<!--
@ -25,7 +25,9 @@ title: "PostgreSQL Metadata Store"
To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `postgresql-metadata-storage` in the extensions load list.
## Setting up PostgreSQL
With the PostgreSQL extension, you can use PostgreSQL as a metadata store or ingest from a PostgreSQL database.
## Set up PostgreSQL
To avoid issues with upgrades that require schema changes to a large metadata table, consider a PostgreSQL version that supports instant ADD COLUMN semantics.
@ -69,7 +71,7 @@ To avoid issues with upgrades that require schema changes to a large metadata ta
druid.metadata.storage.connector.password=diurd
```
## Configuration
## Configuration properties
In most cases, the configuration options map directly to the [postgres JDBC connection options](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database).
@ -87,9 +89,10 @@ In most cases, the configuration options map directly to the [postgres JDBC conn
| `druid.metadata.postgres.ssl.sslPasswordCallback` | The classname of the SSL password provider. | none | no |
| `druid.metadata.postgres.dbTableSchema` | The Druid metadata table schema. | `public` | no |
### PostgreSQL InputSource
## PostgreSQL input source
The PostgreSQL extension provides an implementation of an [SQL input source](../../ingestion/input-sources.md) which can be used to ingest data into Druid from a PostgreSQL database.
The PostgreSQL extension provides an implementation of an SQL input source to ingest data into Druid from a PostgreSQL database.
For more information on the input source parameters, see [SQL input source](../../ingestion/input-sources.md#sql-input-source).

The following abridged spec is a representative sketch of a parallel task that reads from PostgreSQL; the connection details, query, and `dataSchema` are placeholders to adapt to your deployment:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "some_datasource",
      "timestampSpec": { "column": "ts", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["user", "added", "deleted"] }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://some-rds-host.us-west-1.amazonaws.com:5432/druid",
            "user": "admin",
            "password": "secret"
          }
        },
        "sqls": ["SELECT * FROM some_table"]
      }
    },
    "tuningConfig": { "type": "index_parallel" }
  }
}
```

View File

@ -29,10 +29,8 @@ For general information on native batch indexing and parallel task indexing, see
## S3 input source
:::info
You need to include the [`druid-s3-extensions`](../development/extensions-core/s3.md) as an extension to use the S3 input source.
:::info Required extension
To use the S3 input source, load the extension [`druid-s3-extensions`](../development/extensions-core/s3.md) in your `common.runtime.properties` file.
:::
The S3 input source reads objects directly from S3. You can specify either:
@ -41,7 +39,7 @@ The S3 input source reads objects directly from S3. You can specify either:
* a list of S3 location prefixes; the input source attempts to list the contents and ingest
all objects contained within the locations.
The S3 input source is splittable. Therefore, you can use it with the [Parallel task](./native-batch.md). Each worker task of `index_parallel` reads one or multiple objects.
The S3 input source is splittable. Therefore, you can use it with the [parallel task](./native-batch.md). Each worker task of `index_parallel` reads one or multiple objects.
Sample specs:
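As a hedged sketch (bucket and object names are illustrative), the `inputSource` portion of an `ioConfig` that reads two explicit URIs might look like the following:

```json
{
  "type": "s3",
  "uris": ["s3://foo/bar/file.json", "s3://bar/foo/file2.json"]
}
```

To ingest everything under given locations instead, replace `uris` with a `prefixes` list.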
@ -219,16 +217,14 @@ If `accessKeyId` and `secretAccessKey` are not given, the default [S3 credential
## Google Cloud Storage input source
:::info
You need to include the [`druid-google-extensions`](../development/extensions-core/google.md) as an extension to use the Google Cloud Storage input source.
:::info Required extension
To use the Google Cloud Storage input source, load the extension [`druid-google-extensions`](../development/extensions-core/google.md) in your `common.runtime.properties` file.
:::
The Google Cloud Storage input source supports reading objects directly
from Google Cloud Storage. Objects can be specified as a list of Google
Cloud Storage URI strings. The Google Cloud Storage input source is splittable
and can be used by the [Parallel task](./native-batch.md), where each worker task of `index_parallel` will read
and can be used by the [parallel task](./native-batch.md), where each worker task of `index_parallel` will read
one or multiple objects.
Sample specs:
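For instance, a minimal `inputSource` sketch using explicit URIs (bucket and object names are illustrative):

```json
{
  "type": "google",
  "uris": ["gs://foo/bar/file.json", "gs://bar/foo/file2.json"]
}
```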
@ -307,14 +303,12 @@ Google Cloud Storage object:
## Azure input source
:::info
You need to include the [`druid-azure-extensions`](../development/extensions-core/azure.md) as an extension to use the Azure input source.
:::info Required extension
To use the Azure input source, load the extension [`druid-azure-extensions`](../development/extensions-core/azure.md) in your `common.runtime.properties` file.
:::
The Azure input source (which uses the type `azureStorage`) reads objects directly from Azure Blob Storage or Azure Data Lake sources. You can
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [Parallel task](./native-batch.md) indexing and each worker task reads one chunk of the split data.
specify objects as a list of file URI strings or prefixes. You can split the Azure input source for use with [parallel task](./native-batch.md) indexing and each worker task reads one chunk of the split data.
The `azureStorage` input source is a new schema for Azure input sources that allows you to specify the storage account from which to ingest files. We recommend that you update any specs that use the old `azure` schema to the new `azureStorage` schema, which provides more functionality.
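As an illustrative sketch, an `azureStorage` input source listing explicit URIs (the storage account, container, and paths are placeholders) might look like the following:

```json
{
  "type": "azureStorage",
  "uris": ["azureStorage://storageAccount/container/prefix1/file.json", "azureStorage://storageAccount/container/prefix2/file2.json"]
}
```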
@ -491,15 +485,13 @@ The `objects` property is:
## HDFS input source
:::info
You need to include the [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) as an extension to use the HDFS input source.
:::info Required extension
To use the HDFS input source, load the extension [`druid-hdfs-storage`](../development/extensions-core/hdfs.md) in your `common.runtime.properties` file.
:::
The HDFS input source supports reading files directly
from HDFS storage. File paths can be specified as an HDFS URI string or a list
of HDFS URI strings. The HDFS input source is splittable and can be used by the [Parallel task](./native-batch.md),
of HDFS URI strings. The HDFS input source is splittable and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read one or multiple files.
Sample specs:
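For example, a minimal `inputSource` sketch (the namenode host and path are placeholders; `paths` also accepts a list of URI strings):

```json
{
  "type": "hdfs",
  "paths": "hdfs://namenode_host/foo/bar/*.json"
}
```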
@ -593,7 +585,7 @@ The `http` input source is not limited to the HTTP or HTTPS protocols. It uses t
For more information about security best practices, see [Security overview](../operations/security-overview.md#best-practices).
The HTTP input source is _splittable_ and can be used by the [Parallel task](./native-batch.md),
The HTTP input source is _splittable_ and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read only one file. This input source does not support Split Hint Spec.
Sample specs:
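For example, a minimal `inputSource` sketch with two illustrative URIs:

```json
{
  "type": "http",
  "uris": ["http://example.com/uri1", "http://example2.com/uri2"]
}
```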
@ -701,7 +693,7 @@ Sample spec:
The Local input source supports reading files directly from local storage,
and is mainly intended for proof-of-concept testing.
The Local input source is _splittable_ and can be used by the [Parallel task](./native-batch.md),
The Local input source is _splittable_ and can be used by the [parallel task](./native-batch.md),
where each worker task of `index_parallel` will read one or multiple files.
Sample spec:
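For example, a minimal `inputSource` sketch (the base directory and filter are illustrative):

```json
{
  "type": "local",
  "filter": "*.csv",
  "baseDir": "/data/directory"
}
```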
@ -736,7 +728,7 @@ Sample spec:
The Druid input source supports reading data directly from existing Druid segments,
potentially using a new schema and changing the name, dimensions, metrics, rollup, and so on, of the segment.
The Druid input source is _splittable_ and can be used by the [Parallel task](./native-batch.md).
The Druid input source is _splittable_ and can be used by the [parallel task](./native-batch.md).
This input source has a fixed input format for reading from Druid segments;
no `inputFormat` field needs to be specified in the ingestion spec when using this input source.
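For example, a minimal `inputSource` sketch that rereads one day of an existing `wikipedia` datasource:

```json
{
  "type": "druid",
  "dataSource": "wikipedia",
  "interval": "2013-01-01/2013-01-02"
}
```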
@ -833,17 +825,29 @@ For more information on the `maxNumConcurrentSubTasks` field, see [Implementatio
## SQL input source
:::info Required extension
To use the SQL input source, you must load the appropriate extension in your `common.runtime.properties` file.
* To connect to MySQL, load the extension [`mysql-metadata-storage`](../development/extensions-core/mysql.md).
* To connect to PostgreSQL, load the extension [`postgresql-metadata-storage`](../development/extensions-core/postgresql.md).
The MySQL extension requires a JDBC driver.
For more information, see [Prerequisites](../development/extensions-core/mysql.md#prerequisites) in the MySQL extension documentation.
:::
The SQL input source is used to read data directly from an RDBMS.
The SQL input source is _splittable_ and can be used by the [Parallel task](./native-batch.md), where each worker task will read from one SQL query from the list of queries.
You can _split_ the ingestion tasks for a SQL input source. When you use the [parallel task](./native-batch.md) type, each worker task reads from one SQL query in the list of queries.
This input source does not support Split Hint Spec.
Since this input source has a fixed input format for reading events, no `inputFormat` field needs to be specified in the ingestion spec when using this input source.
Please refer to the Recommended practices section below before using this input source.
The SQL input source has a fixed input format for reading events.
Don't specify `inputFormat` when using this input source.
Refer to the [recommended practices](#recommended-practices) before using this input source.
|Property|Description|Required|
|--------|-----------|---------|
|type|Set the value to `sql`.|Yes|
|database|Specifies the database connection details. The database type corresponds to the extension that supplies the `connectorConfig` support. The specified extension must be loaded into Druid:<br/><br/><ul><li>[mysql-metadata-storage](../development/extensions-core/mysql.md) for `mysql`</li><li> [postgresql-metadata-storage](../development/extensions-core/postgresql.md) extension for `postgresql`.</li></ul><br/><br/>You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes|
|foldCase|Toggle case folding of database column names. This may be enabled in cases where the database returns case insensitive column names in query results.|No|
|database|Specifies the database connection details. The database type corresponds to the extension that supplies the `connectorConfig` support.<br/><br/>You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../configuration/index.md#jdbc-connections-to-external-databases) for more details.|Yes|
|foldCase|Boolean to toggle case folding of database column names. For example, to ingest a database column named `Entry_Date` as `entry_date`, set `foldCase` to true and include `entry_date` in the [`dimensionsSpec`](ingestion-spec.md#dimensionsspec).|No|
|sqls|List of SQL queries, where each query retrieves the data to be indexed.|Yes|
The following is an example of an SQL input source spec:
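The sketch below is abridged to the `inputSource` object; the connection details and queries are placeholders, and the fields follow the property table above:

```json
{
  "type": "sql",
  "database": {
    "type": "mysql",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://host:port/schema",
      "user": "user",
      "password": "password"
    }
  },
  "foldCase": true,
  "sqls": [
    "SELECT * FROM table1 WHERE timestamp BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 11:59:59'",
    "SELECT * FROM table2 WHERE timestamp BETWEEN '2013-01-01 00:00:00' AND '2013-01-01 11:59:59'"
  ]
}
```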
@ -887,7 +891,7 @@ Compared to the other native batch input sources, SQL input source behaves diffe
The Combining input source lets you read data from multiple input sources.
It identifies the splits from delegate input sources and uses a worker task to process each split.
Use the Combining input source only if all the delegates are splittable and can be used by the [Parallel task](./native-batch.md).
Each delegate input source must be splittable and compatible with the [parallel task type](./native-batch.md).
Similar to other input sources, the Combining input source supports a single `inputFormat`.
Delegate input sources that require an `inputFormat` must have the same format for input data.
@ -931,10 +935,8 @@ The following is an example of a Combining input source spec:
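A hedged sketch of a `combining` input source with a local-file delegate and a Druid-segment delegate (paths and datasource names are illustrative):

```json
{
  "type": "combining",
  "delegates": [
    {
      "type": "local",
      "filter": "*.csv",
      "baseDir": "/data/directory"
    },
    {
      "type": "druid",
      "dataSource": "wikipedia",
      "interval": "2013-01-01/2013-01-02"
    }
  ]
}
```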
## Iceberg input source
:::info
To use the Iceberg input source, load the extension [`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md).
:::info Required extension
To use the Iceberg input source, load the extension [`druid-iceberg-extensions`](../development/extensions-contrib/iceberg.md) in your `common.runtime.properties` file.
:::
You use the Iceberg input source to read data stored in the Iceberg table format. For a given table, the input source scans up to the latest Iceberg snapshot from the configured Hive catalog. Druid ingests the underlying live data files using the existing input source formats.
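As a rough, hedged sketch, an `iceberg` input source reading from a local catalog and warehouse might look like the following (the table name, namespace, and paths are placeholders):

```json
{
  "type": "iceberg",
  "tableName": "iceberg_table",
  "namespace": "iceberg_namespace",
  "icebergCatalog": {
    "type": "local",
    "warehousePath": "/tmp/warehouse"
  },
  "warehouseSource": {
    "type": "local"
  }
}
```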
@ -1138,10 +1140,8 @@ This input source provides the following filters: `and`, `equals`, `interval`, a
## Delta Lake input source
:::info
To use the Delta Lake input source, load the extension [`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md).
:::info Required extension
To use the Delta Lake input source, load the extension [`druid-deltalake-extensions`](../development/extensions-contrib/delta-lake.md) in your `common.runtime.properties` file.
:::
You can use the Delta input source to read data stored in a Delta Lake table. For a given table, the input source scans the latest snapshot of the configured table and ingests the underlying data files.
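A minimal illustrative sketch (the table path is a placeholder):

```json
{
  "type": "delta",
  "tablePath": "/delta-table/directory"
}
```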

View File

@ -377,7 +377,7 @@ The JDBC lookups will poll a database to populate its local cache. If the `tsCol
:::info
If using JDBC, you will need to add your database's client JAR files to the extension's directory.
For Postgres, the connector JAR is already included.
See the MySQL extension documentation for instructions to obtain [MySQL](../development/extensions-core/mysql.md#installing-the-mysql-connector-library) or [MariaDB](../development/extensions-core/mysql.md#alternative-installing-the-mariadb-connector-library) connector libraries.
See the MySQL extension documentation for instructions to obtain [MySQL](../development/extensions-core/mysql.md#install-mysql-connectorj) or [MariaDB](../development/extensions-core/mysql.md#install-mariadb-connectorj) connector libraries.
The connector JAR should reside in the classpath of Druid's main class loader.
To add the connector JAR to the classpath, you can copy the downloaded file to `lib/` under the distribution root directory. Alternatively, create a symbolic link to the connector in the `lib` directory.
:::
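As a hedged illustration of the JDBC lookups this note refers to, a globally cached extraction namespace might be configured like the following (the table, column names, and credentials are placeholders; see the surrounding section for the authoritative property list):

```json
{
  "type": "cachedNamespace",
  "extractionNamespace": {
    "type": "jdbc",
    "connectorConfig": {
      "connectURI": "jdbc:mysql://localhost:3306/druid",
      "user": "druid",
      "password": "diurd"
    },
    "table": "lookupTable",
    "keyColumn": "theKeyColumn",
    "valueColumn": "theValueColumn",
    "tsColumn": "timeColumn",
    "pollPeriod": 600000
  }
}
```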