---
id: reference
title: SQL-based ingestion reference
sidebar_label: Reference
---

> This page describes SQL-based batch ingestion using the [`druid-multi-stage-query`](../multi-stage-query/index.md)
> extension, new in Druid 24.0. Refer to the [ingestion methods](../ingestion/index.md#batch) table to determine which
> ingestion method is right for you.

## SQL reference

This topic is a reference guide for the multi-stage query architecture in Apache Druid. For examples of real-world
usage, refer to the [Examples](examples.md) page.

### `EXTERN`

Use the `EXTERN` function to read external data.

Function format:

```sql
SELECT
 <column>
FROM TABLE(
  EXTERN(
    '<Druid input source>',
    '<Druid input format>',
    '<row signature>'
  )
)
```

`EXTERN` consists of the following parts:

1. Any [Druid input source](../ingestion/native-batch-input-source.md) as a JSON-encoded string.
2. Any [Druid input format](../ingestion/data-formats.md) as a JSON-encoded string.
3. A row signature, as a JSON-encoded array of column descriptors. Each column descriptor must have a `name` and a
   `type`. The type can be `string`, `long`, `double`, or `float`. This row signature is used to map the external data
   into the SQL layer.

For more information, see [Read external data with EXTERN](concepts.md#extern).

### `HTTP`, `INLINE` and `LOCALFILES`

While `EXTERN` allows you to specify an external table using JSON, other table functions allow you to describe the
external table using SQL syntax. Each function works for one specific kind of input source. You provide properties
using SQL named arguments. The row signature is given using the Druid SQL `EXTEND` keyword, with SQL syntax and types.

Function format:

```sql
SELECT
 <column>
FROM TABLE(
  http(
    userName => 'bob',
    password => 'secret',
    uris => 'http://foo.com/bar.csv',
    format => 'csv'
  )
) EXTEND (x VARCHAR, y VARCHAR, z BIGINT)
```

Note that the `EXTEND` keyword is optional. The following is equally valid (and perhaps more convenient):

```sql
SELECT
 <column>
FROM TABLE(
  http(
    userName => 'bob',
    password => 'secret',
    uris => 'http://foo.com/bar.csv',
    format => 'csv'
  )
) (x VARCHAR, y VARCHAR, z BIGINT)
```

The set of table functions and formats is preliminary in this release.

#### `HTTP`

The `HTTP` table function represents the `HttpInputSource` class in Druid, which allows you to read from an HTTP
server. The function accepts the following arguments:

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `userName` | Basic authentication user name | `httpAuthenticationUsername` | No |
| `password` | Basic authentication password | `httpAuthenticationPassword` | No |
| `passwordEnvVar` | Environment variable that contains the basic authentication password | `httpAuthenticationPassword` | No |
| `uris` | Comma-separated list of URIs to read. | `uris` | Yes |

#### `INLINE`

The `INLINE` table function represents the `InlineInputSource` class in Druid, which provides data directly in the
table function. The function accepts the following arguments:

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `data` | Text lines of inline data. Separate lines with a newline. | `data` | Yes |

#### `LOCALFILES`

The `LOCALFILES` table function represents the `LocalInputSource` class in Druid, which reads files from the file
system of the node running Druid. This is most useful for single-node installations.
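As a quick sketch of the syntax, a query that reads a single local CSV file might look like the following. This
assumes the function is invoked as `localfiles`, by analogy with `http` above; the file path and columns are
hypothetical. The arguments it accepts are listed next.

```sql
SELECT
  x,
  y,
  z
FROM TABLE(
  localfiles(
    files => '/tmp/sample.csv',  -- hypothetical path to a file on the Druid node
    format => 'csv'              -- same format names as for EXTERN
  )
) (x VARCHAR, y VARCHAR, z BIGINT)  -- row signature, since MSQ cannot infer a schema from headers
```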
The function accepts the following arguments:

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `baseDir` | Directory to read from. | `baseDir` | No |
| `filter` | Filter pattern to read. Example: `*.csv`. | `filter` | No |
| `files` | Comma-separated list of files to read. | `files` | No |

You must provide either the `baseDir` or the list of `files`. You can provide both, in which case the files are
assumed to be relative to the `baseDir`. If you provide a `filter`, you must provide the `baseDir`.

#### Table Function Format

Each of the table functions above requires that you specify a format.

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `format` | The input format, using the same names as for `EXTERN`. | `inputFormat.type` | Yes |

#### CSV Format

Use the `csv` format to read from CSV. This choice selects the Druid `CsvInputFormat` class.

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `listDelimiter` | The delimiter to use for fields that represent a list of strings. | `listDelimiter` | No |
| `skipRows` | The number of rows to skip at the start of the file. Default is 0. | `skipHeaderRows` | No |

MSQ does not have the ability to infer schema from a CSV file, so the `findColumnsFromHeader` property is unavailable.
Instead, columns are given using the `EXTEND` syntax described above.

#### Delimited Text Format

Use the `tsv` format to read from an arbitrary delimited (CSV-like) file such as tab-delimited, pipe-delimited, etc.
This choice selects the Druid `DelimitedInputFormat` class.

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `delimiter` | The delimiter which separates fields. | `delimiter` | Yes |
| `listDelimiter` | The delimiter to use for fields that represent a list of strings. | `listDelimiter` | No |
| `skipRows` | The number of rows to skip at the start of the file. Default is 0. | `skipHeaderRows` | No |

As noted above, MSQ cannot infer schema using headers. Use `EXTEND` instead.

#### JSON Format

Use the `json` format to read from a JSON input source. This choice selects the Druid `JsonInputFormat` class.

| Name | Description | JSON equivalent | Required |
| ---- | ----------- | --------------- | -------- |
| `keepNulls` | Whether to keep null values. Defaults to `false`. | `keepNullColumns` | No |

### `INSERT`

Use the `INSERT` statement to insert data.

Unlike standard SQL, `INSERT` loads data into the target table according to column name, not position. If necessary,
use `AS` in your `SELECT` column list to assign the correct names. Do not rely on their positions within the SELECT
clause.

Statement format:

```sql
INSERT INTO <table name>
< SELECT query >
PARTITIONED BY
```
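For instance, a complete `INSERT` statement might look like the sketch below. The target table, source URI, and
columns are hypothetical, and `PARTITIONED BY` takes a time granularity such as `DAY`.

```sql
INSERT INTO hypothetical_table
SELECT
  TIME_PARSE("timestamp") AS __time,  -- map the source timestamp to Druid's __time column
  x,
  y
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["http://example.com/data.json"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "x", "type": "string"}, {"name": "y", "type": "long"}]'
  )
)
PARTITIONED BY DAY
```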