Add PPL and SQL section (#1111)

* Merge pull request #1 from Yury-Fridlyand/dev-update-sql-relevance-docs

Update SQL plugin relevance functions documentation.

Co-authored-by: MaxKsyunz <maxk@bitquilltech.com>
Signed-off-by: Yury Fridlyand <yuryf@bitquilltech.com>

* Address PR feedback.

Signed-off-by: Yury Fridlyand <yuryf@bitquilltech.com>

* Address PR feedback by @joshuali925.

Signed-off-by: Yury Fridlyand <yuryf@bitquilltech.com>

* Remove PPL page from Observability Plugin. Add link to Observability page. Make some simple formatting changes

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Reword paragraph

Signed-off-by: Naarcha-AWS <naarcha@amazon.com>

* Adds SQL and PPL API and other SQL plugin changes

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Formatting changes

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Incorporates editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

Signed-off-by: Yury Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Naarcha-AWS <naarcha@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Yury Fridlyand <yuryf@bitquilltech.com>
Co-authored-by: MaxKsyunz <maxk@bitquilltech.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Naarcha-AWS 2022-09-26 09:28:00 -07:00 committed by GitHub
parent a6e47e02f5
commit c69f860bfe
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
41 changed files with 2097 additions and 1357 deletions

@ -10,7 +10,7 @@ has_toc: false
ML Commons for OpenSearch eases the development of machine learning features by providing a set of common machine learning (ML) algorithms through transport and REST API calls. These calls choose the right nodes and resources for each ML request and monitor ML tasks to ensure uptime. This allows you to leverage existing open-source ML algorithms and reduce the effort required to develop new ML features.
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [AD]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#ad) and [kmeans]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/commands#kmeans) Piped Processing Language (PPL) commands.
Interaction with the ML Commons plugin occurs through either the [REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api) or [`ad`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#ad) and [`kmeans`]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/functions#kmeans) Piped Processing Language (PPL) commands.
Models [trained]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#train-model) through the ML Commons plugin support model-based algorithms such as kmeans. After you've trained a model to the point where it meets your precision requirements, you can apply the model to [predict]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api#predict) new data safely.

@ -18,7 +18,7 @@ To get started, select the Menu button on the upper left corner of the OpenSearc
2. Enter a name for your application and optionally add a description.
3. Do at least one of the following:
- Use [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) to specify the base query.
- Use [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index) to specify the base query.
You can't change the base query after the application is created.
{: .note }
@ -31,7 +31,7 @@ You can't change the base query after the application is created.
### Create a visualization
1. Choose the **Log Events** tab.
1. Use [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) to build upon your base query.
1. Use [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index) to build upon your base query.
1. Choose the **Visualizations** tab to see your visualizations.
1. Expand the **Save** dropdown menu, enter a name for your visualization, then choose **Save**.

@ -6,7 +6,7 @@ nav_order: 10
# Event analytics
Event analytics in observability is where you can use [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) (PPL) queries to build and view different visualizations of your data.
Event analytics in Observability is where you can use [Piped Processing Language]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index) (PPL) queries to build and view different visualizations of your data.
## Get started with event analytics
@ -24,7 +24,7 @@ source = opensearch_dashboards_sample_data_logs | fields host | stats count()
By default, Dashboards shows results from the last 15 minutes of your data. To see data from a different timeframe, use the date and time selector.
For more information about building PPL queries, see [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index).
For more information about building PPL queries, see [Piped Processing Language]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index).
## Save a visualization

@ -5,7 +5,6 @@ nav_order: 1
has_children: false
redirect_from:
- /observability-plugin/
- /observability-plugin/
---
# About Observability
@ -16,7 +15,7 @@ Observability is collection of plugins and applications that let you visualize d
Your experience of exploring data might differ, but if you're new to exploring data to create visualizations, we recommend trying a workflow like the following:
1. Explore data over a certain timeframe using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index).
1. Explore data within a certain timeframe using [Piped Processing Language]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index).
2. Use [event analytics]({{site.url}}{{site.baseurl}}/observability-plugin/event-analytics) to turn data-driven events into visualizations.
![Sample Event Analytics View]({{site.url}}{{site.baseurl}}/images/event-analytics.png)
3. Create [operational panels]({{site.url}}{{site.baseurl}}/observability-plugin/operational-panels) and add visualizations to compare data the way you like.

@ -6,7 +6,7 @@ nav_order: 30
# Operational panels
Operational panels in OpenSearch Dashboards are collections of visualizations generated using [Piped Processing Language]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index) (PPL) queries.
Operational panels in OpenSearch Dashboards are collections of visualizations generated using [Piped Processing Language]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index) (PPL) queries.
## Get started with operational panels

@ -1,36 +0,0 @@
---
layout: default
title: Data Types
parent: Piped processing language
nav_order: 6
---
# Data types
The following table shows the data types supported by the PPL plugin and how each one maps to OpenSearch and SQL data types:
PPL Type | OpenSearch Type | SQL Type
:--- | :--- | :---
boolean | boolean | BOOLEAN
byte | byte | TINYINT
byte | short | SMALLINT
integer | integer | INTEGER
long | long | BIGINT
float | float | REAL
float | half_float | FLOAT
float | scaled_float | DOUBLE
double | double | DOUBLE
string | keyword | VARCHAR
text | text | VARCHAR
timestamp | date | TIMESTAMP
ip | ip | VARCHAR
binary | binary | VARBINARY
struct | object | STRUCT
array | nested | STRUCT
In addition to this list, the PPL plugin also supports the `datetime` type, though it doesn't have a corresponding OpenSearch mapping.
To use a function on a type without a corresponding mapping, you must explicitly convert the value to a type that has one.
The PPL plugin supports all SQL date and time types. To learn more, see [SQL Data Types]({{site.url}}{{site.baseurl}}/search-plugins/sql/datatypes/).

@ -1,24 +0,0 @@
---
layout: default
title: Endpoint
parent: Piped processing language
nav_order: 1
---
# Endpoint
Introduced 1.0
{: .label .label-purple }
To send a query request to the PPL plugin, use an HTTP POST request.
We recommend POST because it doesn't have a length limit and it allows you to pass other parameters to the plugin for additional functionality.
Use the `_explain` endpoint for query translation and troubleshooting.
## Request Format
To use the PPL plugin with your own applications, send requests to `_plugins/_ppl`, with your query in the request body:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl \
-d '{"query" : "source=accounts | fields firstname, lastname"}'
```
```
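For applications that call the endpoint directly, the following minimal Python sketch builds the same request as the `curl` command above using only the standard library. The host `http://localhost:9200` is an assumption for a local cluster without the security plugin; a secured cluster would also need HTTPS and credentials. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_ppl_request(query: str, host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a POST request to the PPL endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/_plugins/_ppl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ppl_request("source=accounts | fields firstname, lastname")
# Against a running cluster, you could send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```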

@ -1,10 +0,0 @@
---
layout: default
title: Functions
parent: Piped processing language
nav_order: 10
---
# Functions
The PPL plugin supports all SQL functions. To learn more, see [SQL Functions]({{site.url}}{{site.baseurl}}/search-plugins/sql/functions/).

@ -1,71 +0,0 @@
---
layout: default
title: Protocol
parent: Piped processing language
nav_order: 2
---
# Protocol
The PPL plugin provides responses in JDBC format. The JDBC format is widely used because it provides schema information and additional functionality such as pagination. Besides the JDBC driver, various clients can benefit from the detailed, well-formatted response.
## Response Format
The body of the HTTP POST request contains the PPL query and can include additional fields:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl \
-d '{"query" : "source=accounts | fields firstname, lastname"}'
```
The following example shows a normal response, where `schema` lists each field's name and type and `datarows` contains the result set:
```json
{
"schema": [
{
"name": "firstname",
"type": "string"
},
{
"name": "lastname",
"type": "string"
}
],
"datarows": [
[
"Amber",
"Duke"
],
[
"Hattie",
"Bond"
],
[
"Nanette",
"Bates"
],
[
"Dale",
"Adams"
]
],
"total": 4,
"size": 4
}
```
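Because `schema` and each entry in `datarows` are parallel arrays, a client can zip them into one record per row. A minimal Python sketch with a hypothetical `rows_to_records` helper, using part of the sample response above:

```python
from typing import Dict, List


def rows_to_records(response: Dict) -> List[Dict]:
    """Pair each value in `datarows` with the column name from `schema`."""
    columns = [col["name"] for col in response["schema"]]
    return [dict(zip(columns, row)) for row in response["datarows"]]


sample = {
    "schema": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
    ],
    "datarows": [["Amber", "Duke"], ["Hattie", "Bond"]],
    "total": 2,
    "size": 2,
}
print(rows_to_records(sample))
# [{'firstname': 'Amber', 'lastname': 'Duke'}, {'firstname': 'Hattie', 'lastname': 'Bond'}]
```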
If an error occurs, the response contains the error message and its cause instead:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl \
-d '{"query" : "source=unknown | fields firstname, lastname"}'
{
"error": {
"reason": "Error occurred in OpenSearch engine: no such index [unknown]",
"details": "org.opensearch.index.IndexNotFoundException: no such index [unknown]\nFor more details, please send request for Json format to see the raw response from opensearch engine.",
"type": "IndexNotFoundException"
},
"status": 404
}
```
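A client can distinguish an error response from a normal one by checking for the top-level `error` object. A minimal Python sketch; `PPLQueryError` and `raise_for_ppl_error` are hypothetical names, not part of any OpenSearch client:

```python
class PPLQueryError(Exception):
    """Hypothetical exception type for a failed PPL query."""


def raise_for_ppl_error(response: dict) -> dict:
    """Return a successful response unchanged; raise if it is an error response."""
    error = response.get("error")
    if error is not None:
        raise PPLQueryError(
            f"{error.get('type')} (HTTP {response.get('status')}): {error.get('reason')}"
        )
    return response


error_response = {
    "error": {
        "reason": "Error occurred in OpenSearch engine: no such index [unknown]",
        "type": "IndexNotFoundException",
    },
    "status": 404,
}
try:
    raise_for_ppl_error(error_response)
except PPLQueryError as exc:
    print(exc)
```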

@ -1,49 +0,0 @@
---
layout: default
title: Settings
parent: Piped processing language
nav_order: 3
---
# Settings
The PPL plugin adds a few settings to the standard OpenSearch cluster settings. Most are dynamic, so you can change the default behavior of the plugin without restarting your cluster.
You can update these settings like any other cluster setting:
```json
PUT _cluster/settings
{
"transient": {
"plugins": {
"ppl": {
"enabled": "false"
}
}
}
}
```
Similarly, you can update the settings by sending a request to the plugin settings endpoint, `_plugins/_query/settings`:
```json
PUT _plugins/_query/settings
{
"transient": {
"plugins": {
"ppl": {
"enabled": "false"
}
}
}
}
```
Requests to `_plugins/_ppl` include index names in the request body, so they have the same access policy considerations as the `bulk`, `mget`, and `msearch` operations. If you set the `rest.action.multi.allow_explicit_index` parameter to `false`, the PPL plugin is disabled.
You can specify the settings shown in the following table:
Setting | Description | Default
:--- | :--- | :---
`plugins.ppl.enabled` | Change to `false` to disable the PPL component. | True
`plugins.query.memory_limit` | Set heap memory usage limit. If a query crosses this limit, it's terminated. | 85%
`plugins.query.size_limit` | Set the maximum number of results that you want to see. This impacts the accuracy of aggregation operations. For example, if you have 1000 documents in an index, by default, only 200 documents are extracted from the index for aggregation. | 200
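The table uses dotted setting names, while the request bodies above nest the same names as JSON objects. The following Python sketch flattens a nested settings body into dotted names to show the correspondence; `flatten_settings` is a hypothetical helper for illustration, not part of any OpenSearch client.

```python
def flatten_settings(nested: dict, prefix: str = "") -> dict:
    """Turn {"plugins": {"ppl": {"enabled": ...}}} into {"plugins.ppl.enabled": ...}."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten_settings(value, name))
        else:
            flat[name] = value
    return flat


body = {"transient": {"plugins": {"ppl": {"enabled": "false"}}}}
print(flatten_settings(body["transient"]))  # {'plugins.ppl.enabled': 'false'}
```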

@ -262,4 +262,4 @@ You can use wildcards to delete more than one data stream.
We recommend deleting data from a data stream using an ISM policy.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/) and [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) and [PPL]({{site.url}}{{site.baseurl}}/observability-plugin/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions on the data stream name.
You can also use [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/index/), [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/), and [PPL]({{site.url}}{{site.baseurl}}/search-plugins/sql/ppl/index/) to query your data stream directly. You can also use the security plugin to define granular permissions for the data stream name.

@ -1,52 +1,91 @@
---
layout: default
title: Aggregation Functions
title: Aggregate Functions
parent: SQL
nav_order: 11
---
# Aggregation functions
# Aggregate functions
Aggregate functions use the `GROUP BY` clause to group sets of values into subsets.
OpenSearch supports the following aggregate functions:
Function | Description
:--- | :---
AVG | Returns the average of the results.
COUNT | Returns the number of results.
SUM | Returns the sum of the results.
MIN | Returns the minimum of the results.
MAX | Returns the maximum of the results.
VAR_POP or VARIANCE | Returns the population variance of the results after discarding nulls.
VAR_SAMP | Returns the sample variance of the results after discarding nulls.
STD or STDDEV | Returns the population standard deviation of the results. Returns 0 when it has only one row of results.
STDDEV_POP | Returns the population standard deviation of the results.
STDDEV_SAMP | Returns the sample standard deviation of the results. Returns null when it has only one row of results.
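The difference between the population and sample variants can be checked with Python's standard `statistics` module, using the ages from the sample `accounts` documents that follow. Pairing each Python function with a SQL function is an analogy for illustration only; the plugin computes these aggregations itself.

```python
import statistics

# Ages from the four sample `accounts` documents (Amber, Hattie, Nanette, Dale).
ages = [32, 36, 28, 33]

print(statistics.pvariance(ages))  # population variance, analogous to VAR_POP
print(statistics.variance(ages))   # sample variance, analogous to VAR_SAMP
print(statistics.pstdev(ages))     # population standard deviation, analogous to STDDEV_POP
print(statistics.stdev(ages))      # sample standard deviation, analogous to STDDEV_SAMP

# With a single row, the population standard deviation is 0, but the sample
# standard deviation is undefined (the table says STDDEV_SAMP returns null).
print(statistics.pstdev([32]))  # 0.0
try:
    statistics.stdev([32])
except statistics.StatisticsError:
    print("sample standard deviation is undefined for a single value")
```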
The examples below reference an `accounts` table. You can try out the examples by indexing the following documents into OpenSearch using the bulk index operation:
```json
PUT accounts/_bulk?refresh
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL","acct_open_date":"2008-01-23"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN","acct_open_date":"2008-06-07"}
{"index":{"_id":"13"}}
{"account_number":13,"balance":32838,"firstname":"Nanette","lastname":"Bates","age":28,"gender":"F","address":"789 Madison Street","employer":"Quility","email":"nanettebates@quility.com","city":"Nogal","state":"VA","acct_open_date":"2010-04-11"}
{"index":{"_id":"18"}}
{"account_number":18,"balance":4180,"firstname":"Dale","lastname":"Adams","age":33,"gender":"M","address":"467 Hutchinson Court","email":"daleadams@boink.com","city":"Orick","state":"MD","acct_open_date":"2022-11-05"}
```
## Group By
You can use an identifier, an ordinal, or an expression in the `GROUP BY` clause.
### Identifier
The following query returns the gender and average age of customers in the `accounts` index and groups the results by gender:
```sql
SELECT gender, sum(age) FROM accounts GROUP BY gender;
SELECT gender, avg(age) FROM accounts GROUP BY gender;
```
| gender | sum (age)
| gender | avg(age)
:--- | :---
F | 28 |
M | 101 |
F | 28.0 |
M | 33.666666666666664 |
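You can reproduce the `avg(age)` values above from the four sample documents with a short Python sketch of `GROUP BY` plus `AVG`. It only mimics the result; it says nothing about how OpenSearch evaluates the query.

```python
from collections import defaultdict

# (gender, age) pairs from the sample accounts documents indexed above.
accounts = [("M", 32), ("M", 36), ("F", 28), ("M", 33)]


def avg_by(rows):
    """Group (key, value) pairs by key and average the values, like GROUP BY + AVG."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running sum, row count]
    for key, value in rows:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}


print(avg_by(accounts))  # {'F': 28.0, 'M': 33.666666666666664}
```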
### Ordinal
The following query returns the gender and average age of customers in the `accounts` index. It groups the results by the first column of the result set, which in this case is `gender`:
```sql
SELECT gender, sum(age) FROM accounts GROUP BY 1;
SELECT gender, avg(age) FROM accounts GROUP BY 1;
```
| gender | avg(age)
:--- | :---
F | 28 |
M | 101 |
F | 28.0 |
M | 33.666666666666664 |
### Expression
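The following query returns the absolute value of each account number and the average age of customers in the `accounts` index, grouping the results by the absolute value of the account number: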
```sql
SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number);
SELECT abs(account_number), avg(age) FROM accounts GROUP BY abs(account_number);
```
| abs(account_number) | sum (age)
| abs(account_number) | avg(age)
:--- | :---
| 1 | 32 |
| 13 | 28 |
| 18 | 33 |
| 6 | 36 |
| 1 | 32.0 |
| 13 | 28.0 |
| 18 | 33.0 |
| 6 | 36.0 |
## Aggregation

@ -1,23 +1,24 @@
---
layout: default
title: SQL CLI
parent: SQL
nav_order: 2
title: SQL and PPL CLI
parent: SQL and PPL
nav_order: 3
---
# SQL CLI
# SQL and PPL CLI
SQL CLI is a stand-alone Python application that you can launch with the `opensearchsql` command.
The SQL and PPL command line interface (CLI) is a standalone Python application that you can launch with the `opensearchsql` command.
Install the SQL plugin to your OpenSearch instance, run the CLI using MacOS or Linux, and connect to any valid OpenSearch end-point.
To use the SQL and PPL CLI, install the SQL plugin on your OpenSearch instance, run the CLI using MacOS or Linux, and connect to any valid OpenSearch endpoint.
![SQL CLI]({{site.url}}{{site.baseurl}}/images/cli.gif)
## Features
SQL CLI has the following features:
The SQL and PPL CLI has the following features:
- Multi-line input
- PPL support
- Autocomplete for SQL syntax and index names
- Syntax highlighting
- Formatted output:
@ -33,26 +34,16 @@ SQL CLI has the following features:
Launch your local OpenSearch instance and make sure you have the SQL plugin installed.
To install the SQL CLI:
1. We suggest you install and activate a python3 virtual environment to avoid changing your local environment:
```
pip install virtualenv
virtualenv venv
cd venv
source ./bin/activate
```
2. Install the CLI:
```
1. Install the CLI:
```console
pip3 install opensearchsql
```
The SQL and PPL CLI only works with Python 3.
{: .note }
3. To launch the CLI, run:
```
2. To launch the CLI, run:
```console
opensearchsql https://localhost:9200 --username admin --password admin
```
By default, the `opensearchsql` command connects to http://localhost:9200.
@ -71,25 +62,41 @@ For a list of all available configurations, see [clirc](https://github.com/opens
## Using the CLI
1. Save the sample [accounts test data](https://github.com/opensearch-project/sql/blob/main/doctest/test_data/accounts.json) file.
1. Index the sample data:
```console
curl -H "Content-Type: application/x-ndjson" -X POST https://localhost:9200/accounts/_bulk -u 'admin:admin' --insecure --data-binary "@accounts.json"
```
1. Run the CLI tool. If your cluster runs with the default security settings, use the following command:
```console
opensearchsql --username admin --password admin https://localhost:9200
```
If your cluster runs without security, run:
```console
opensearchsql
```
1. Run a sample SQL command:
```
2. Run a sample SQL command:
```sql
SELECT * FROM accounts;
```
By default, you see a maximum output of 200 rows. To show more results, add a `LIMIT` clause with the desired value.
To exit the CLI tool, press **Ctrl+D**.
{: .tip }
## Using the CLI with PPL
1. Run the CLI by specifying the query language:
```console
opensearchsql -l ppl <params>
```
2. Execute a PPL query:
```sql
source=accounts | fields firstname, lastname
```
## Query options
Run a single query with the following options:
Run a single query with the following command line options:
- `--help`: Help page for options
- `-q`: Followed by a single query
- `-f`: Specify JDBC or raw format output
- `-v`: Display data vertically
@ -97,6 +104,7 @@ Run a single query with the following options:
## CLI options
- `--help`: Help page for options
- `-l`: Query language option. Available options are `sql` and `ppl`. Default is `sql`
- `-p`: Always use pager to display output
- `--clirc`: Provide path for the configuration file

@ -1,8 +1,8 @@
---
layout: default
title: Data Types
parent: SQL
nav_order: 73
parent: SQL and PPL
nav_order: 7
---
# Data types

@ -1,226 +0,0 @@
---
layout: default
title: Endpoint
parent: SQL
nav_order: 13
---
# Endpoint
Introduced 1.0
{: .label .label-purple }
To send a query request to the SQL plugin, you can use either a request parameter in an HTTP GET request or the request body of an HTTP POST request. We recommend POST because it doesn't have a length limit and it allows you to pass other parameters to the plugin for additional functionality, such as prepared statements. The explain endpoint is also frequently used for query translation and troubleshooting.
## GET
### Description
You can send an HTTP GET request with your query embedded in a URL parameter.
### Example
SQL query:
```console
curl -H 'Content-Type: application/json' -X GET 'localhost:9200/_plugins/_sql?sql=SELECT%20*%20FROM%20accounts'
```
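Because the query travels as a URL parameter, spaces and other reserved characters should be percent-encoded. A minimal Python sketch using `urllib.parse.quote`; the local host URL is an assumption. Note that `quote` encodes `*` as `%2A`, which decodes back to `*`.

```python
from urllib.parse import parse_qs, quote, urlsplit


def sql_get_url(sql: str, host: str = "http://localhost:9200") -> str:
    """Build a GET URL for the SQL endpoint with the statement percent-encoded."""
    # safe="" encodes everything except letters, digits, and "_.-~".
    return f"{host}/_plugins/_sql?sql={quote(sql, safe='')}"


url = sql_get_url("SELECT * FROM accounts")
print(url)  # http://localhost:9200/_plugins/_sql?sql=SELECT%20%2A%20FROM%20accounts

# Decoding the query string recovers the original statement.
print(parse_qs(urlsplit(url).query)["sql"][0])  # SELECT * FROM accounts
```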
## POST
### Description
You can also send an HTTP POST request with your query in the request body.
### Example
SQL query:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"query" : "SELECT * FROM accounts"
}'
```
## Explain
### Description
To translate your query, send it to the explain endpoint. The explain output is the equivalent OpenSearch domain-specific language (DSL) query in JSON format. You can copy and paste it into your console to run it against OpenSearch directly.
### Example
Explain query:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql/_explain -d '{
"query" : "SELECT firstname, lastname FROM accounts WHERE age > 20"
}'
```
Explain:
```json
{
"from": 0,
"size": 200,
"query": {
"bool": {
"filter": [{
"bool": {
"must": [{
"range": {
"age": {
"from": 20,
"to": null,
"include_lower": false,
"include_upper": true,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"firstname",
"lastname"
],
"excludes": []
}
}
```
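The `range` clause in the explain output uses the older `from`/`to`/`include_lower`/`include_upper` form. The hypothetical helper below rewrites such a clause into the `gt`/`gte`/`lt`/`lte` operators you might write by hand, which shows that this output corresponds to `age > 20`. It is an illustration, not a function provided by OpenSearch.

```python
def legacy_range_to_operators(params: dict) -> dict:
    """Rewrite a legacy range clause (from/to/include_lower/include_upper)
    into the equivalent gt/gte/lt/lte form. Null bounds are dropped."""
    ops = {}
    if params.get("from") is not None:
        # include_lower defaults to true, meaning an inclusive lower bound.
        ops["gte" if params.get("include_lower", True) else "gt"] = params["from"]
    if params.get("to") is not None:
        ops["lte" if params.get("include_upper", True) else "lt"] = params["to"]
    return ops


explain_range = {"from": 20, "to": None, "include_lower": False, "include_upper": True, "boost": 1.0}
print(legacy_range_to_operators(explain_range))  # {'gt': 20}
```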
## Cursor
### Description
To get back a paginated response, use the `fetch_size` parameter. The value of `fetch_size` should be greater than 0. The default value is 1,000. A value of 0 falls back to a non-paginated response.
The `fetch_size` parameter is only supported for the JDBC response format.
{: .note }
### Example
SQL query:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"fetch_size" : 5,
"query" : "SELECT firstname, lastname FROM accounts WHERE age > 20 ORDER BY state ASC"
}'
```
Result set:
```json
{
"schema": [
{
"name": "firstname",
"type": "text"
},
{
"name": "lastname",
"type": "text"
}
],
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9",
"total": 956,
"datarows": [
[
"Cherry",
"Carey"
],
[
"Lindsey",
"Hawkins"
],
[
"Sargent",
"Powers"
],
[
"Campos",
"Olsen"
],
[
"Savannah",
"Kirby"
]
],
"size": 5,
"status": 200
}
```
To fetch subsequent pages, use the `cursor` from the previous response:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9"
}'
```
Each subsequent response contains only the `datarows` field, with `fetch_size` records, and the `cursor` field.
The last page contains only `datarows` and no `cursor`.
If nested fields are flattened, `datarows` can contain more than `fetch_size` records.
```json
{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMabcde12345",
"datarows": [
[
"Abbey",
"Karen"
],
[
"Chen",
"Ken"
],
[
"Ani",
"Jade"
],
[
"Peng",
"Hu"
],
[
"John",
"Doe"
]
]
}
```
The `cursor` context is automatically cleared on the last page.
To explicitly clear the cursor context, use the `_plugins/_sql/close` endpoint:
```console
curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql/close -d '{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9"
}'
```
#### Sample response
```json
{"succeeded":true}
```
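The following Python sketch walks the cursor protocol described above: it sends the first query with `fetch_size`, keeps posting the returned `cursor` until a page arrives without one, and calls `_plugins/_sql/close` only if the caller stops before the last page. The `post` callable and the cursor values are hypothetical stand-ins so that the sketch runs without a cluster; in practice `post` would wrap an HTTP client.

```python
from typing import Callable, Iterator, List


def iter_pages(post: Callable[[str, dict], dict], query: str, fetch_size: int = 5) -> Iterator[List[list]]:
    """Yield one page of `datarows` at a time, following `cursor` to the last page.

    `post(path, body)` is a hypothetical transport: it sends `body` as JSON to
    `path` on the cluster and returns the decoded JSON response.
    """
    response = post("/_plugins/_sql", {"fetch_size": fetch_size, "query": query})
    cursor = response.get("cursor")
    try:
        while True:
            yield response["datarows"]
            if cursor is None:
                return  # Last page: the plugin clears the cursor context itself.
            response = post("/_plugins/_sql", {"cursor": cursor})
            cursor = response.get("cursor")
    finally:
        # A cursor is still set here only if iteration stopped before the
        # last page, so release the context explicitly.
        if cursor is not None:
            post("/_plugins/_sql/close", {"cursor": cursor})


# Usage with an in-memory stand-in for the cluster (hypothetical cursor IDs).
PAGES = {
    None: {"datarows": [["Cherry", "Carey"], ["Lindsey", "Hawkins"]], "cursor": "c1"},
    "c1": {"datarows": [["Abbey", "Karen"]]},
}
calls = []


def fake_post(path: str, body: dict) -> dict:
    calls.append((path, body))
    if path.endswith("/close"):
        return {"succeeded": True}
    return PAGES[body.get("cursor")]


for page in iter_pages(fake_post, "SELECT firstname, lastname FROM accounts"):
    print(page)
```

Using a generator with `try`/`finally` means an early `break` or `close()` on the iterator still releases the server-side cursor.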

@ -0,0 +1,490 @@
---
layout: default
title: Full-Text Search
parent: SQL and PPL
nav_order: 11
---
# Full-text search
Use SQL and PPL queries for full-text search. The SQL plugin supports a subset of the full-text queries available in OpenSearch.
To learn about full-text queries in OpenSearch, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/).
## Match
Use the `MATCH` function to search documents that match a `string`, `number`, `date`, or `boolean` value for a given field.
### Syntax
```sql
match(field_expression, query_expression[, option=<option_value>]*)
```
You can specify the following options in any order:
- `analyzer`
- `auto_generate_synonyms_phrase`
- `fuzziness`
- `max_expansions`
- `prefix_length`
- `fuzzy_transpositions`
- `fuzzy_rewrite`
- `lenient`
- `operator`
- `minimum_should_match`
- `zero_terms_query`
- `boost`
For parameter descriptions and supported values, see the `match` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#match).
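As a rough illustration of how the function arguments line up with the `match` query DSL in the examples that follow, the hypothetical Python helper below builds the DSL body from a field, a query, and keyword options. It is a sketch of the correspondence shown in those examples, not the plugin's actual translation code.

```python
def match_to_dsl(field: str, query: str, **options) -> dict:
    """Build the `match` query body that match(field, query, option=value, ...) corresponds to."""
    if not options:
        # Without options, the short form {"match": {field: query}} is enough.
        return {"match": {field: query}}
    return {"match": {field: {"query": query, **options}}}


print(match_to_dsl("message", "this is a test", operator="and"))
# {'match': {'message': {'query': 'this is a test', 'operator': 'and'}}}
```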
### Example 1: Search the `message` field for the text "this is a test":
```json
GET my_index/_search
{
"query": {
"match": {
"message": "this is a test"
}
}
}
```
*SQL query:*
```sql
SELECT message FROM my_index WHERE match(message, "this is a test")
```
*PPL query:*
```ppl
SOURCE=my_index | WHERE match(message, "this is a test") | FIELDS message
```
### Example 2: Search the `message` field with the `operator` parameter:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "this is a test",
"operator": "and"
}
}
}
}
```
*SQL query:*
```sql
SELECT message FROM my_index WHERE match(message, "this is a test", operator='and')
```
*PPL query:*
```ppl
SOURCE=my_index | WHERE match(message, "this is a test", operator='and') | FIELDS message
```
### Example 3: Search the `message` field with the `operator` and `zero_terms_query` parameters:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"operator": "and",
"zero_terms_query": "all"
}
}
}
}
```
*SQL query:*
```sql
SELECT message FROM my_index WHERE match(message, "to be or not to be", operator='and', zero_terms_query='all')
```
*PPL query:*
```ppl
SOURCE=my_index | WHERE match(message, "to be or not to be", operator='and', zero_terms_query='all') | FIELDS message
```
## Multi-match
To search for text in multiple fields, use the `MULTI_MATCH` function. This function maps to the `multi_match` query in OpenSearch and returns the documents that match a provided text, number, date, or Boolean value in a given field or fields.
### Syntax
The `MULTI_MATCH` function lets you *boost* certain fields using the **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. You can specify field names in double quotes, in single quotes, in backticks, or unquoted. To search all fields, use a star (``"*"``); the star must be quoted.
```sql
multi_match([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and follows the field name. You can separate it from the field name with a caret (`^`) or with whitespace, as in the following examples:
```sql
multi_match(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
multi_match(["*"], ...)
```
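In the query DSL, field boosts are written as `field^weight` strings in the `fields` array. The hypothetical Python helper below renders `(field, weight)` pairs in that form, which may help when comparing a `MULTI_MATCH` call with the equivalent REST request. It is an illustration only; the plugin performs its own translation.

```python
from typing import List, Optional, Tuple


def fields_to_dsl(fields: List[Tuple[str, Optional[float]]]) -> List[str]:
    """Render (field, weight) pairs as the "field^weight" strings used in the
    `fields` array of a query DSL body. A weight of None means no boost."""
    return [name if weight is None else f"{name}^{weight}" for name, weight in fields]


print(fields_to_dsl([("Tags", 2), ("Title", 3.4), ("Body", None), ("Comments", 0.3)]))
# ['Tags^2', 'Title^3.4', 'Body', 'Comments^0.3']
```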
You can specify the following options for `MULTI_MATCH` in any order:
- `analyzer`
- `auto_generate_synonyms_phrase`
- `cutoff_frequency`
- `fuzziness`
- `fuzzy_transpositions`
- `lenient`
- `max_expansions`
- `minimum_should_match`
- `operator`
- `prefix_length`
- `tie_breaker`
- `type`
- `slop`
- `zero_terms_query`
- `boost`
For parameter descriptions and supported values, see the `multi_match` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#multi-match).
### Example: Search for `Dale` in either the `firstname` or `lastname` field:
The following REST API search request:
```json
GET accounts/_search
{
"query": {
"multi_match": {
"query": "Dale",
"fields": [ "*name" ]
}
}
}
```
can be expressed in *SQL* using the `multi_match` function:
```sql
SELECT firstname, lastname
FROM accounts
WHERE multi_match(['*name'], 'Dale')
```
or using the `multi_match` function in *PPL*:
```sql
SOURCE=accounts | WHERE multi_match(['*name'], 'Dale') | fields firstname, lastname
```
| firstname | lastname
:--- | :---
Dale | Adams
## Query string
To split text based on operators, use the `QUERY_STRING` function. The `QUERY_STRING` function supports logical connectives, wildcard, regex, and proximity search.
This function maps to the `query_string` query in OpenSearch and returns the documents that match a provided text, number, date, or Boolean value in a given field or fields.
### Syntax
The `QUERY_STRING` function has syntax similar to `MULTI_MATCH` and lets you *boost* certain fields using the **^** character. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields. You can specify field names in double quotes, in single quotes, in backticks, or unquoted. To search all fields, use a star (``"*"``); the star must be quoted.
```sql
query_string([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and follows the field name. You can separate it from the field name with a caret (`^`) or with whitespace, as in the following examples:
```sql
query_string(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
query_string(["*"], ...)
```
You can specify the following options for `QUERY_STRING` in any order:
- `analyzer`
- `allow_leading_wildcard`
- `analyze_wildcard`
- `auto_generate_synonyms_phrase_query`
- `boost`
- `default_operator`
- `enable_position_increments`
- `fuzziness`
- `fuzzy_rewrite`
- `escape`
- `fuzzy_max_expansions`
- `fuzzy_prefix_length`
- `fuzzy_transpositions`
- `lenient`
- `max_determinized_states`
- `minimum_should_match`
- `quote_analyzer`
- `phrase_slop`
- `quote_field_suffix`
- `rewrite`
- `type`
- `tie_breaker`
- `time_zone`
For parameter descriptions and supported values, see the `query_string` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#query-string).
### Example of using `query_string` in SQL and PPL queries:
The following REST API search request:
```json
GET accounts/_search
{
"query": {
"query_string": {
"query": "Lane Street",
"fields": [ "address" ]
}
}
}
```
can be expressed in *SQL*:
```sql
SELECT account_number, address
FROM accounts
WHERE query_string(['address'], 'Lane Street', default_operator='OR')
```
or in *PPL*:
```sql
SOURCE=accounts | WHERE query_string(['address'], 'Lane Street', default_operator='OR') | fields account_number, address
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
6 | 671 Bristol Street
13 | 789 Madison Street
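To see why accounts 1, 6, and 13 are returned, the following Python sketch approximates `default_operator='OR'` over the sample addresses: a document matches if it shares at least one term with the query. It assumes simple lowercase, whitespace tokenization, which is a simplification of OpenSearch's standard analyzer.

```python
# Addresses from the sample accounts documents, keyed by account number.
addresses = {
    1: "880 Holmes Lane",
    6: "671 Bristol Street",
    13: "789 Madison Street",
    18: "467 Hutchinson Court",
}


def matches_any(text: str, query: str) -> bool:
    """Approximate OR semantics: True if any lowercased query term appears in the text."""
    return bool(set(text.lower().split()) & set(query.lower().split()))


print([n for n, addr in addresses.items() if matches_any(addr, "Lane Street")])  # [1, 6, 13]
```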
## Match phrase
To search for exact phrases, use the `MATCHPHRASE` or `MATCH_PHRASE` function.
### Syntax
```sql
matchphrasequery(field_expression, query_expression)
matchphrase(field_expression, query_expression[, option=<option_value>]*)
match_phrase(field_expression, query_expression[, option=<option_value>]*)
```
The `MATCHPHRASE`/`MATCH_PHRASE` functions let you specify the following options in any order:
- `analyzer`
- `slop`
- `zero_terms_query`
- `boost`
For parameter descriptions and supported values, see the `match_phrase` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#match-phrase).
### Example of using `match_phrase` in SQL and PPL queries:
The REST API search request
```json
GET accounts/_search
{
"query": {
"match_phrase": {
"address": {
"query": "880 Holmes Lane"
}
}
}
}
```
is equivalent to the following *SQL* query
```sql
SELECT account_number, address
FROM accounts
WHERE match_phrase(address, '880 Holmes Lane')
```
or the following *PPL* query
```sql
SOURCE=accounts | WHERE match_phrase(address, '880 Holmes Lane') | FIELDS account_number, address
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
## Simple query string
The `simple_query_string` function maps to the `simple_query_string` query in OpenSearch. It returns documents that match the provided text, number, date, or Boolean value in the given field or fields.
The **^** lets you *boost* certain fields. Boosts are multipliers that weigh matches in one field more heavily than matches in other fields.
### Syntax
You can specify field names in double quotes, in single quotes, surrounded by backticks, or unquoted. To search all fields, use the star (`"*"`). The star must be quoted.
```sql
simple_query_string([field_expression+], query_expression[, option=<option_value>]*)
```
The weight is optional and is specified after the field name. You can separate it from the field name with the caret character (`^`) or with whitespace, as shown in the following examples:
```sql
simple_query_string(["Tags" ^ 2, 'Title' 3.4, `Body`, Comments ^ 0.3], ...)
simple_query_string(["*"], ...)
```
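The following Python sketch shows how the weighted field list above corresponds to the `field^weight` strings that the `fields` array of the query DSL accepts. The regular expression and the `to_dsl_field` name are assumptions made for illustration. They approximate the plugin's field grammar rather than reproduce it.

```python
import re

# A field name, optionally wrapped in double quotes, single quotes, or backticks,
# followed by an optional weight separated by a caret or by whitespace.
FIELD_SPEC = re.compile(
    r"""\s*(?P<quote>["'`]?)(?P<name>[^"'`\s^]+)(?P=quote)"""
    r"""\s*(?:\^?\s*(?P<weight>\d+(?:\.\d+)?))?\s*"""
)


def to_dsl_field(spec: str) -> str:
    """Convert one SQL/PPL field spec, such as "Tags" ^ 2, to DSL form (Tags^2)."""
    match = FIELD_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"cannot parse field spec: {spec!r}")
    name, weight = match.group("name"), match.group("weight")
    return f"{name}^{weight}" if weight else name


specs = ['"Tags" ^ 2', "'Title' 3.4", "`Body`", "Comments ^ 0.3", '"*"']
print([to_dsl_field(s) for s in specs])  # ['Tags^2', 'Title^3.4', 'Body', 'Comments^0.3', '*']
```

Fields without a weight pass through unchanged, so `"*"` becomes the plain `*` wildcard entry.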
You can specify the following options for `SIMPLE_QUERY_STRING` in any order:
- `analyze_wildcard`
- `analyzer`
- `auto_generate_synonyms_phrase_query`
- `boost`
- `default_operator`
- `flags`
- `fuzzy_max_expansions`
- `fuzzy_prefix_length`
- `fuzzy_transpositions`
- `lenient`
- `minimum_should_match`
- `quote_field_suffix`
For parameter descriptions and supported values, see the `simple_query_string` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#simple-query-string).
### Example of using `simple_query_string` in SQL and PPL queries:
The REST API search request
```json
GET accounts/_search
{
"query": {
"simple_query_string": {
"query": "Lane Street",
"fields": [ "address" ]
}
}
}
```
is equivalent to the following *SQL* query
```sql
SELECT account_number, address
FROM accounts
WHERE simple_query_string(['address'], 'Lane Street', default_operator='OR')
```
or the following *PPL* query
```sql
SOURCE=accounts | WHERE simple_query_string(['address'], 'Lane Street', default_operator='OR') | fields account_number, address
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
6 | 671 Bristol Street
13 | 789 Madison Street
## Match phrase prefix
To search for phrases by a given prefix, use the `MATCH_PHRASE_PREFIX` function. It builds a prefix query out of the last term in the query string.
### Syntax
```sql
match_phrase_prefix(field_expression, query_expression[, option=<option_value>]*)
```
The `MATCH_PHRASE_PREFIX` function lets you specify the following options in any order:
- `analyzer`
- `slop`
- `max_expansions`
- `zero_terms_query`
- `boost`
For parameter descriptions and supported values, see the `match_phrase_prefix` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#match-phrase-prefix).
### Example of using `match_phrase_prefix` in SQL and PPL queries:
The REST API search request
```json
GET books/_search
{
"query": {
"match_phrase_prefix": {
"author": {
"query": "Alexander Mil"
}
}
}
}
```
is equivalent to the following *SQL* query
```sql
SELECT author, title
FROM books
WHERE match_phrase_prefix(author, 'Alexander Mil')
```
or the following *PPL* query
```sql
source=books | where match_phrase_prefix(author, 'Alexander Mil') | fields author, title
```
| author | title
:--- | :---
Alan Alexander Milne | The House at Pooh Corner
Alan Alexander Milne | Winnie-the-Pooh
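A simplified Python model of the matching rule explains why both Milne titles match. Every query term except the last must appear as a consecutive phrase, and the last term only needs to be a prefix of the next token. The sketch makes two simplifying assumptions. It replaces the real analyzer with lowercase whitespace tokenization, and it ignores `slop` and `max_expansions`.

```python
def phrase_prefix_match(text: str, query: str) -> bool:
    """Simplified match_phrase_prefix: exact phrase, with the last term as a prefix."""
    doc = text.lower().split()          # stand-in for the field's analyzer
    terms = query.lower().split()
    if not terms:
        return False
    *head, last = terms
    # Try every position where the leading terms plus one more token fit.
    for start in range(len(doc) - len(head)):
        if doc[start:start + len(head)] == head and doc[start + len(head)].startswith(last):
            return True
    return False


print(phrase_prefix_match("Alan Alexander Milne", "Alexander Mil"))  # True
print(phrase_prefix_match("Alexander Pope", "Alexander Mil"))        # False
```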
## Match boolean prefix
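Use the `match_bool_prefix` function to search for documents that match the terms of the query text in a given field. The function matches every term except the last one as a whole term and treats the last term as a prefix.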
### Syntax
```sql
match_bool_prefix(field_expression, query_expression[, option=<option_value>]*)
```
The `MATCH_BOOL_PREFIX` function lets you specify the following options in any order:
- `minimum_should_match`
- `fuzziness`
- `prefix_length`
- `max_expansions`
- `fuzzy_transpositions`
- `fuzzy_rewrite`
- `boost`
- `analyzer`
- `operator`
For parameter descriptions and supported values, see the `match_bool_prefix` query [documentation]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/#match-boolean-prefix).
### Example of using `match_bool_prefix` in SQL and PPL queries:
The REST API search request
```json
GET accounts/_search
{
"query": {
"match_bool_prefix": {
"address": {
"query": "Bristol Stre"
}
}
}
}
```
is equivalent to the following *SQL* query
```sql
SELECT firstname, address
FROM accounts
WHERE match_bool_prefix(address, 'Bristol Stre')
```
or the following *PPL* query
```sql
source=accounts | where match_bool_prefix(address, 'Bristol Stre') | fields firstname, address
```
| firstname | address
:--- | :---
Hattie | 671 Bristol Street
Nanette | 789 Madison Street
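The second row can be surprising. A simplified Python model shows why it matches:

- The query text is split into terms.
- Every term except the last must match a whole token.
- The last term matches any token that starts with it.
- With the default `or` operator, a single matching clause is enough.

`789 Madison Street` therefore matches only through the `stre` prefix. The sketch assumes lowercase whitespace tokenization in place of the real analyzer and ignores the fuzziness options.

```python
def bool_prefix_match(text: str, query: str, operator: str = "or") -> bool:
    """Simplified match_bool_prefix: whole terms plus a prefix for the last term."""
    doc = text.lower().split()          # stand-in for the field's analyzer
    terms = query.lower().split()
    if not terms:
        return False
    *whole_terms, prefix = terms
    clauses = [term in doc for term in whole_terms]
    clauses.append(any(token.startswith(prefix) for token in doc))
    # "or" needs one matching clause; "and" needs all of them.
    return all(clauses) if operator == "and" else any(clauses)


for address in ["880 Holmes Lane", "671 Bristol Street", "789 Madison Street"]:
    print(address, bool_prefix_match(address, "Bristol Stre"))
```

In this sketch, passing `operator="and"` drops the Madison Street row, because `bristol` no longer has to match only optionally.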

@ -1,7 +1,7 @@
---
layout: default
title: Functions
parent: SQL
parent: SQL and PPL
nav_order: 10
---
@ -10,9 +10,9 @@ nav_order: 10
You must enable fielddata in the document mapping for most string functions to work properly.
The specification shows the return type of the function with a generic type `T` as the argument.
For example, `abs(number T) -> T` means that the function `abs` accepts a numerical argument of type `T`, which could be any sub-type of the `number` type, and it returns the actual type of `T` as the return type.
For example, `abs(number T) -> T` means that the function `abs` accepts a numerical argument of type `T`, which could be any subtype of the `number` type, and it returns the actual type of `T` as the return type.
The SQL plugin supports the following functions.
The SQL plugin supports the following common functions shared across the SQL and PPL languages.
## Mathematical
@ -131,3 +131,7 @@ Function | Specification | Example
if | `if(boolean, es_type, es_type) -> es_type` | `SELECT if(false, 0, 1) FROM my-index LIMIT 1`, `SELECT if(true, 0, 1) FROM my-index LIMIT 1`
ifnull | `ifnull(es_type, es_type) -> es_type` | `SELECT ifnull('hello', 1) FROM my-index LIMIT 1`, `SELECT ifnull(null, 1) FROM my-index LIMIT 1`
isnull | `isnull(es_type) -> integer` | `SELECT isnull(null) FROM my-index LIMIT 1`, `SELECT isnull(1) FROM my-index LIMIT 1`
## Relevance-based search (full-text search)
These functions are only available in the `WHERE` clause. For their descriptions and usage examples in SQL and PPL, see [Full-text search]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/).

@ -1,8 +1,8 @@
---
layout: default
title: Identifiers
parent: Piped processing language
nav_order: 7
parent: SQL and PPL
nav_order: 6
---
@ -28,7 +28,7 @@ For regular identifiers, you can use the name without any back tick or escape ch
In this example, `source`, `fields`, `account_number`, `firstname`, and `lastname` are all identifiers. Out of these, the `source` field is a reserved identifier.
```sql
source=accounts | fields account_number, firstname, lastname;
SELECT account_number, firstname, lastname FROM accounts;
```
| account_number | firstname | lastname |

@ -1,6 +1,6 @@
---
layout: default
title: SQL
title: SQL and PPL
nav_order: 38
has_children: true
has_toc: false
@ -8,69 +8,10 @@ redirect_from:
- /search-plugins/sql/
---
# SQL
# SQL and PPL
OpenSearch SQL lets you write queries in SQL rather than the [OpenSearch query domain-specific language (DSL)]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text). If you're already familiar with SQL and don't want to learn the query DSL, this feature is a great option.
## Workbench
The easiest way to get familiar with the SQL plugin is to use **Query Workbench** in OpenSearch Dashboards to test various queries. To learn more, see [Workbench]({{site.url}}{{site.baseurl}}/search-plugins/sql/workbench/).
![OpenSearch Dashboards SQL UI plugin]({{site.url}}{{site.baseurl}}/images/sql.png)
## REST API
To use the SQL plugin with your own applications, send requests to `_plugins/_sql`:
```json
POST _plugins/_sql
{
"query": "SELECT * FROM my-index LIMIT 50"
}
```
Here's how core SQL concepts map to OpenSearch:
SQL | OpenSearch
:--- | :---
Table | Index
Row | Document
Column | Field
You can query multiple indices by listing them or using wildcards:
```json
POST _plugins/_sql
{
"query": "SELECT * FROM my-index1,myindex2,myindex3 LIMIT 50"
}
POST _plugins/_sql
{
"query": "SELECT * FROM my-index* LIMIT 50"
}
```
For a sample [curl](https://curl.haxx.se/) command, try:
```bash
curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{"query": "SELECT * FROM opensearch_dashboards_sample_data_flights LIMIT 10"}'
```
By default, queries return data in JDBC format, but you can also return data in standard OpenSearch JSON, CSV, or raw formats:
```json
POST _plugins/_sql?format=json|csv|raw
{
"query": "SELECT * FROM my-index LIMIT 50"
}
```
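If you call the endpoint from application code instead of curl, you send the same `POST` request with a JSON body. The following Python sketch only builds the request, using the standard library's `urllib.request`. The `sql_request` helper and the default host are assumptions. Actually sending the request would also require the credentials and TLS settings that the curl example passes with `-u` and `-k`.

```python
import json
import urllib.request
from typing import Optional

SUPPORTED_FORMATS = (None, "json", "csv", "raw")  # None keeps the default JDBC format


def sql_request(query: str, host: str = "https://localhost:9200",
                fmt: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to the SQL endpoint."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    url = f"{host}/_plugins/_sql"
    if fmt is not None:
        url += f"?format={fmt}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


req = sql_request("SELECT * FROM my-index LIMIT 50", fmt="csv")
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen` together with an authentication handler and an SSL context that suit your cluster.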
See the rest of this guide for detailed information on request parameters, settings, supported operations, tools, and more.
## Contributing
To get involved and help us improve the SQL plugin, see the [development guide](https://github.com/opensearch-project/sql/blob/main/DEVELOPER_GUIDE.rst) for instructions on setting up your development environment and building the project.

@ -1,61 +1,17 @@
---
layout: default
title: Limitations
parent: SQL
nav_order: 18
parent: SQL and PPL
nav_order: 99
---
# Limitations
The SQL plugin has the following limitations:
## SELECT FROM WHERE
### Select literal is not supported
The select literal expression is not supported. For example, `Select 1` is not supported.
### Where clause does not support arithmetic operations
The `WHERE` clause does not support expressions. For example, `SELECT FlightNum FROM opensearch_dashboards_sample_data_flights where (AvgTicketPrice + 100) <= 1000` is not supported.
### Aggregation over expression is not supported
You can only apply aggregation on fields, aggregations can't accept an expression as a parameter. For example, `avg(log(age))` is not supported.
### Conflict type in multiple index query
Queries that use a wildcard index pattern fail if the matched indexes contain the same field with conflicting mappings.
For example, if you have two indices with field `a`:
```
POST conflict_index_1/_doc/
{
"a": {
"b": 1
}
}
POST conflict_index_2/_doc/
{
"a": {
"b": 1,
"c": 2
}
}
```
Then, the query fails because of the field mapping conflict. The query `SELECT * FROM conflict_index*` also fails for the same reason.
```sql
Error occurred in OpenSearch engine: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] ",
"details": "com.amazon.opensearch.sql.rewriter.matchtoterm.VerificationException: Different mappings are not allowed for the same field[a]: found [{properties:{b:{type:long},c:{type:long}}}] and [{properties:{b:{type:long}}}] \nFor more details, please send request for Json format to see the raw response from opensearch engine.",
"type": "VerificationException
```
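Before you run a wildcard query, you can check for such conflicts yourself. The following Python sketch compares the `properties` of each index in a `GET <index-pattern>/_mapping` response and reports top-level fields whose mappings differ. It is a simplified check that approximates the plugin's verification rather than reproducing it. The `conflicting_fields` name is hypothetical.

```python
from typing import Any, Dict, Set


def conflicting_fields(mapping_response: Dict[str, Any]) -> Set[str]:
    """Return top-level fields that are mapped differently across indexes."""
    first_seen: Dict[str, Any] = {}
    conflicts: Set[str] = set()
    for index_body in mapping_response.values():
        properties = index_body.get("mappings", {}).get("properties", {})
        for field, mapping in properties.items():
            if field in first_seen and first_seen[field] != mapping:
                conflicts.add(field)
            first_seen.setdefault(field, mapping)
    return conflicts


response = {
    "conflict_index_1": {"mappings": {"properties": {
        "a": {"properties": {"b": {"type": "long"}}}}}},
    "conflict_index_2": {"mappings": {"properties": {
        "a": {"properties": {"b": {"type": "long"}, "c": {"type": "long"}}}}}},
}
print(conflicting_fields(response))  # {'a'}
```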
## Aggregation over expression is not supported
You can only apply aggregation to fields. Aggregations cannot accept an expression as a parameter. For example, `avg(log(age))` is not supported.
## Subquery in the FROM clause
@ -76,10 +32,10 @@ But, if the outer query has `GROUP BY` or `ORDER BY`, then it's not supported.
The `join` query does not support aggregations on the joined result.
For example, `SELECT depo.name, avg(empo.age) FROM empo JOIN depo WHERE empo.id == depo.id GROUP BY depo.name` is not supported.
## Pagination only supports basic queries
The pagination query enables you to get back paginated responses.
Currently, pagination supports only basic queries. For example, the following query returns data with a cursor ID:
```json
@ -116,3 +72,23 @@ The response in JDBC format with cursor id.
```
Queries that use aggregations or `join` do not currently support pagination.
## Query processing engines
The SQL plugin has two query processing engines, `V1` and `V2`. Most features are supported by both engines, but only the new engine is actively being developed. A query is first executed on the `V2` engine and falls back to the `V1` engine if `V2` fails. If the query also uses features that `V1` does not support, the fallback fails too and the query returns an error response.
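The fallback order can be modeled in a few lines of Python. This is a conceptual sketch only. `v2` and `v1` are hypothetical callables standing in for the two engines, and `QueryError` is a hypothetical exception type. The plugin's actual dispatch logic is internal.

```python
from typing import Callable


class QueryError(Exception):
    """Hypothetical error raised by an engine that cannot run a query."""


Engine = Callable[[str], str]


def run_with_fallback(query: str, v2: Engine, v1: Engine) -> str:
    """Try the V2 engine first; fall back to V1 only if V2 fails."""
    try:
        return v2(query)
    except QueryError as v2_error:
        try:
            return v1(query)
        except QueryError as v1_error:
            # Both engines failed, so the caller receives an error response.
            raise QueryError(f"V2: {v2_error}; V1: {v1_error}") from v1_error


def v2_engine(query: str) -> str:
    raise QueryError("uses a V1-only feature")


def v1_engine(query: str) -> str:
    return f"V1 result for {query!r}"


print(run_with_fallback("SELECT 1", v2_engine, v1_engine))
```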
### V1 engine limitations
* Selecting a literal expression without a `FROM` clause is not supported. For example, `SELECT 1` is not supported.
* The `WHERE` clause does not support expressions. For example, `SELECT FlightNum FROM opensearch_dashboards_sample_data_flights where (AvgTicketPrice + 100) <= 1000` is not supported.
* Most [relevance search functions]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/) are implemented in the `V2` engine only.
The `V2` engine executes such queries successfully unless they also use `V1`-specific functions. You are unlikely to encounter these limitations.
### V2 engine limitations
* The [cursor feature](#pagination-only-supports-basic-queries) is supported by the `V1` engine only.
For support of `cursor`/`pagination` in the `V2` engine, track [GitHub issue #656](https://github.com/opensearch-project/sql/issues/656).
* The `V2` engine does not track query execution time, so slow queries are not reported.
* The `V2` query engine not only runs queries in the OpenSearch engine but also supports post-processing for complicated queries. Accordingly, the explain output is no longer pure OpenSearch domain-specific language (DSL) but also includes query plan information from the `V2` query engine.
* The `V2` engine does not support [`SCORE_QUERY`]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/functions#score-query) and [`WILDCARD_QUERY`]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql/functions#wildcard-query) functions.

@ -1,8 +1,8 @@
---
layout: default
title: Monitoring
parent: SQL
nav_order: 15
parent: SQL and PPL
nav_order: 95
---
# Monitoring

@ -1,69 +1,14 @@
---
layout: default
title: Commands
parent: Piped processing language
nav_order: 4
parent: PPL - Piped Processing Language
grand_parent: SQL and PPL
nav_order: 2
---
# Commands
Start a PPL query with a `search` command to reference a table to search from. You can have the commands that follow in any order.
In the following example, the `search` command refers to an `accounts` index as the source, then uses `fields` and `where` commands for the conditions:
```sql
search source=accounts
| where age > 18
| fields firstname, lastname
```
In the below examples, we represent required arguments in angle brackets `< >` and optional arguments in square brackets `[ ]`.
{: .note }
## search
Use the `search` command to retrieve a document from an index. You can only use the `search` command as the first command in the PPL query.
### Syntax
```sql
search source=<index> [boolean-expression]
```
Field | Description | Required
:--- | :--- |:---
`search` | Specify search keywords. | Yes
`index` | Specify which index to query from. | No
`bool-expression` | Specify an expression that evaluates to a boolean value. | No
*Example 1*: Get all documents
To get all documents from the `accounts` index:
```sql
search source=accounts;
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke
| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates
| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams
*Example 2*: Get documents that match a condition
To get all documents from the `accounts` index that have either `account_number` equal to 1 or have `gender` as `F`:
```sql
search source=accounts account_number=1 or gender="F";
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |
`PPL` supports all [common `SQL` functions]({{site.url}}{{site.baseurl}}/search-plugins/sql/functions/), including [relevance search]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/). It also introduces a few additional functions, called *commands*, that are available only in `PPL`.
## dedup
@ -82,7 +27,7 @@ Field | Description | Type | Required | Default
`consecutive` | If true, remove only consecutive events with duplicate combinations of values. | `Boolean` | No | False
`field-list` | Specify a comma-delimited field list. At least one field is required. | `String` or comma-separated list of strings | Yes | -
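The following Python sketch models the `dedup` options on the four sample accounts used throughout this page. It is a simplified illustration, not the plugin's implementation. The `count` and `keep_empty` parameters are assumed to correspond to the `dedup N` form and to the empty-field behavior shown in the examples that follow.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def dedup(docs: List[Doc], fields: List[str], count: int = 1,
          keep_empty: bool = False, consecutive: bool = False) -> List[Doc]:
    """Simplified model of PPL dedup semantics (illustration only)."""
    kept_per_key: Dict[tuple, int] = {}
    prev_key, run_length = None, 0
    result = []
    for doc in docs:
        key = tuple(doc.get(field) for field in fields)
        if any(value is None for value in key):
            # Documents with an empty dedup field are dropped unless keep_empty is set.
            if keep_empty:
                result.append(doc)
            continue
        if consecutive:
            # Only runs of adjacent duplicates are collapsed.
            run_length = run_length + 1 if key == prev_key else 1
            prev_key = key
            if run_length <= count:
                result.append(doc)
        else:
            kept_per_key[key] = kept_per_key.get(key, 0) + 1
            if kept_per_key[key] <= count:
                result.append(doc)
    return result


accounts = [
    {"account_number": 1, "gender": "M", "email": "amberduke@pyrami.com"},
    {"account_number": 6, "gender": "M", "email": "hattiebond@netagy.com"},
    {"account_number": 13, "gender": "F", "email": None},
    {"account_number": 18, "gender": "M", "email": "daleadams@boink.com"},
]
print([d["account_number"] for d in dedup(accounts, ["gender"])])  # [1, 13]
```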
*Example 1*: Dedup by one field
**Example 1: Dedup by one field**
To remove duplicate documents with the same gender:
@ -96,7 +41,7 @@ search source=accounts | dedup gender | fields account_number, gender;
13 | F
*Example 2*: Keep two duplicate documents
**Example 2: Keep two duplicate documents**
To keep two duplicate documents with the same gender:
@ -110,7 +55,7 @@ search source=accounts | dedup 2 gender | fields account_number, gender;
6 | M
13 | F
*Example 3*: Keep or ignore an empty field by default
**Example 3: Keep or ignore an empty field by default**
To keep two duplicate documents with a `null` field value:
@ -137,7 +82,7 @@ search source=accounts | dedup email | fields account_number, email;
6 | hattiebond@netagy.com
18 | daleadams@boink.com
*Example 4*: Dedup of consecutive documents
**Example 4: Dedup of consecutive documents**
To remove duplicates of consecutive documents:
@ -170,7 +115,7 @@ Field | Description | Required
`field` | If a field name does not exist, a new field is added. If the field name already exists, it's overwritten. | Yes
`expression` | Specify any supported expression. | Yes
*Example 1*: Create a new field
**Example 1: Create a new field**
To create a new `doubleAge` field for each document, where `doubleAge` is `age` multiplied by 2:
@ -200,7 +145,7 @@ search source=accounts | eval age = age + 1 | fields age;
| 29
| 34
*Example 3*: Create a new field with a field defined with the `eval` command
**Example 3: Create a new field with a field defined with the `eval` command**
To create a new field `ddAge` that is `doubleAge` multiplied by 2, where `doubleAge` is itself defined in the same `eval` command:
@ -235,7 +180,7 @@ Field | Description | Required | Default
`index` | Plus (+) keeps only fields specified in the field list. Minus (-) removes all fields specified in the field list. | No | +
`field list` | Specify a comma-delimited list of fields. | Yes | No default
*Example 1*: Select specified fields from result
**Example 1: Select specified fields from result**
To get `account_number`, `firstname`, and `lastname` fields from a search result:
@ -250,7 +195,7 @@ search source=accounts | fields account_number, firstname, lastname;
| 13 | Nanette | Bates
| 18 | Dale | Adams
*Example 2*: Remove specified fields from a search result
**Example 2: Remove specified fields from a search result**
To remove the `account_number` field from the search results:
@ -283,7 +228,7 @@ regular-expression | The regular expression used to extract new fields from the
The regular expression is matched against the whole text of the field in each document using the Java regex engine. Each named capture group in the expression becomes a new `STRING` field.
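The following Python sketch models this behavior. The whole value must match the pattern, and each named group becomes an output field. Note that Java writes a named group as `(?<host>...)`, while this sketch uses Python's `(?P<host>...)` spelling. The `ppl_parse` helper is hypothetical. Returning empty strings for a null or non-matching value is an assumption, based on the null-field note in the next example.

```python
import re
from typing import Dict, Optional


def ppl_parse(value: Optional[str], pattern: str) -> Dict[str, str]:
    """Simplified model of PPL parse: each named group becomes a new string field."""
    regex = re.compile(pattern)
    names = list(regex.groupindex)
    # Assumption: a null value or a non-matching value yields empty strings.
    match = regex.fullmatch(value) if value is not None else None
    if match is None:
        return {name: "" for name in names}
    return {name: match.group(name) or "" for name in names}


print(ppl_parse("amberduke@pyrami.com", r".+@(?P<host>.+)"))  # {'host': 'pyrami.com'}
print(ppl_parse(None, r".+@(?P<host>.+)"))                    # {'host': ''}
```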
*Example 1*: Create new field
**Example 1: Create new field**
The following example shows how to create a new field `host` for each document. `host` is the hostname after `@` in the `email` field. Parsing a null field returns an empty string.
@ -315,7 +260,7 @@ fetched rows / total rows = 4/4
| Madison Street
| Hutchinson Court
*Example 3*: Filter and sort be casted parsed field
**Example 3: Filter and sort by a cast parsed field**
The following example shows how to filter for street numbers higher than 500 in the `address` field and sort them.
@ -354,7 +299,7 @@ Field | Description | Required
`source-field` | The name of the field that you want to rename. | Yes
`target-field` | The name you want to rename to. | Yes
*Example 1*: Rename one field
**Example 1: Rename one field**
Rename the `account_number` field as `an`:
@ -369,7 +314,7 @@ search source=accounts | rename account_number as an | fields an;
| 13
| 18
*Example 2*: Rename multiple fields
**Example 2: Rename multiple fields**
Rename the `account_number` field as `an` and `employer` as `emp`:
@ -404,7 +349,7 @@ Field | Description | Required | Default
`[+|-]` | Use plus (+) to sort in ascending order and minus (-) to sort in descending order. | No | Ascending order
`sort-field` | Specify the field that you want to sort by. | Yes | -
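The following Python sketch mirrors these semantics under stated simplifications:

- Keys are `(field, descending)` pairs, listed from most to least significant.
- `count=0` returns every document.
- Null handling is ignored.

The sketch relies on Python's stable sort by applying the least significant key first. The `ppl_sort` name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def ppl_sort(docs: List[Doc], keys: List[Tuple[str, bool]], count: int = 0) -> List[Doc]:
    """Sort by (field, descending) pairs, most significant first; count=0 returns all."""
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last produces a multi-key ordering.
    for field, descending in reversed(keys):
        result.sort(key=lambda doc: doc[field], reverse=descending)
    return result if count == 0 else result[:count]


accounts = [
    {"account_number": 1, "gender": "M", "age": 32},
    {"account_number": 6, "gender": "M", "age": 36},
    {"account_number": 13, "gender": "F", "age": 28},
    {"account_number": 18, "gender": "M", "age": 33},
]
print([d["account_number"] for d in ppl_sort(accounts, [("age", False)])])  # [13, 1, 18, 6]
```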
*Example 1*: Sort by one field
**Example 1: Sort by one field**
To sort all documents by the `age` field in ascending order:
@ -419,7 +364,7 @@ search source=accounts | sort age | fields account_number, age;
| 18 | 33
| 6 | 36
*Example 2*: Sort by one field and return all results
**Example 2: Sort by one field and return all results**
To sort all documents by the `age` field in ascending order and specify count as 0 to get back all results:
@ -434,7 +379,7 @@ search source=accounts | sort 0 age | fields account_number, age;
| 18 | 33
| 6 | 36
*Example 3*: Sort by one field in descending order
**Example 3: Sort by one field in descending order**
To sort all documents by the `age` field in descending order:
@ -449,7 +394,7 @@ search source=accounts | sort - age | fields account_number, age;
| 1 | 32
| 13 | 28
*Example 4*: Specify the number of sorted documents to return
**Example 4: Specify the number of sorted documents to return**
To sort all documents by the `age` field in ascending order and specify count as 2 to get back two results:
@ -462,7 +407,7 @@ search source=accounts | sort 2 age | fields account_number, age;
| 13 | 28
| 1 | 32
*Example 5*: Sort by multiple fields
**Example 5: Sort by multiple fields**
To sort all documents by the `gender` field in ascending order and `age` field in descending order:
@ -503,7 +448,7 @@ Field | Description | Required | Default
`aggregation` | Specify a statistical aggregation function. The argument of this function must be a field. | Yes | 1000
`by-clause` | Specify one or more fields to group the results by. If not specified, the `stats` command returns only one row, which is the aggregation over the entire result set. | No | -
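The following Python sketch computes the same averages as the examples that follow, using the ages of the four sample accounts. Without a group, all documents fall into one group, keyed `None` here. With `by`, the sketch returns one average per group value. This is an illustration of the arithmetic only. The `stats_avg` name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def stats_avg(docs: List[Dict[str, Any]], field: str, by: Optional[str] = None) -> Dict[Any, float]:
    """Average of `field`, per value of `by` (or one group keyed None)."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for doc in docs:
        groups[doc[by] if by is not None else None].append(doc[field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


accounts = [
    {"gender": "M", "age": 32},
    {"gender": "M", "age": 36},
    {"gender": "F", "age": 28},
    {"gender": "M", "age": 33},
]
print(stats_avg(accounts, "age"))                # {None: 32.25}
print(stats_avg(accounts, "age", by="gender"))   # {'M': 33.666666666666664, 'F': 28.0}
```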
*Example 1*: Calculate the average value of a field
**Example 1: Calculate the average value of a field**
To calculate the average `age` of all documents:
@ -515,7 +460,7 @@ search source=accounts | stats avg(age);
:--- |
| 32.25
*Example 2*: Calculate the average value of a field by group
**Example 2: Calculate the average value of a field by group**
To calculate the average age grouped by gender:
@ -528,7 +473,7 @@ search source=accounts | stats avg(age) by gender;
| F | 28.0
| M | 33.666666666666664
*Example 3*: Calculate the average and sum of a field by group
**Example 3: Calculate the average and sum of a field by group**
To calculate the average and sum of age grouped by gender:
@ -541,7 +486,7 @@ search source=accounts | stats avg(age), sum(age) by gender;
| F | 28 | 28
| M | 33.666666666666664 | 101
*Example 4*: Calculate the maximum value of a field
**Example 4: Calculate the maximum value of a field**
To calculate the maximum age:
@ -553,7 +498,7 @@ search source=accounts | stats max(age);
:--- |
| 36
*Example 5*: Calculate the maximum and minimum value of a field by group
**Example 5: Calculate the maximum and minimum value of a field by group**
To calculate the maximum and minimum age values grouped by gender:
@ -580,7 +525,7 @@ Field | Description | Required
:--- | :--- |:---
`bool-expression` | An expression that evaluates to a boolean value. | No
*Example 1*: Filter result set with a condition
**Example: Filter result set with a condition**
To get all documents from the `accounts` index where `account_number` is 1 or gender is `F`:
@ -607,7 +552,7 @@ Field | Description | Required | Default
:--- | :--- | :--- | :---
`N` | Specify the number of results to return. | No | 10
*Example 1*: Get the first 10 results
**Example 1: Get the first 10 results**
To get the first 10 results:
@ -621,7 +566,7 @@ search source=accounts | fields firstname, age | head;
| Hattie | 36
| Nanette | 28
*Example 2*: Get the first N results
**Example 2: Get the first N results**
To get the first two results:
@ -654,7 +599,7 @@ Field | Description | Required
`field-list` | Specify a comma-delimited list of field names. | No
`by-clause` | Specify one or more fields to group the results by. | No
*Example 1*: Find the least common values in a field
**Example 1: Find the least common values in a field**
To find the least common values of gender:
@ -667,7 +612,7 @@ search source=accounts | rare gender;
| F
| M
*Example 2*: Find the least common values grouped by gender
**Example 2: Find the least common values grouped by gender**
To find the least common age grouped by gender:
@ -701,7 +646,7 @@ Field | Description | Default
`field-list` | Specify a comma-delimited list of field names. | -
`by-clause` | Specify one or more fields to group the results by. | -
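Both `top` and `rare` rank field values by how often they occur, in opposite directions. The following Python sketch shows that ranking with `collections.Counter` on the genders of the four sample accounts. It leaves out grouping and any default limit, and the function names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def top_values(docs: List[Dict[str, Any]], field: str, n: Optional[int] = None) -> List[Any]:
    """Most common values first (a simplified model of `top`)."""
    return [value for value, _ in Counter(d[field] for d in docs).most_common(n)]


def rare_values(docs: List[Dict[str, Any]], field: str) -> List[Any]:
    """Least common values first (a simplified model of `rare`)."""
    counts = Counter(d[field] for d in docs)
    return sorted(counts, key=counts.__getitem__)


accounts = [{"gender": g} for g in ["M", "M", "F", "M"]]
print(top_values(accounts, "gender"))     # ['M', 'F']
print(top_values(accounts, "gender", 1))  # ['M']
print(rare_values(accounts, "gender"))    # ['F', 'M']
```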
*Example 1*: Find the most common values in a field
**Example 1: Find the most common values in a field**
To find the most common genders:
@ -714,7 +659,7 @@ search source=accounts | top gender;
| M
| F
*Example 2*: Find the most common value in a field
**Example 2: Find the most common value in a field**
To find the most common gender:
@ -726,7 +671,7 @@ search source=accounts | top 1 gender;
:--- |
| M
*Example 2*: Find the most common values grouped by gender
**Example 3: Find the most common values grouped by gender**
To find the most common age grouped by gender:
@ -743,100 +688,11 @@ search source=accounts | top 1 age by gender;
The `top` command is not rewritten to OpenSearch DSL; it is executed only on the coordinating node.
## match
Use the `match` command to search documents that match a `string`, `number`, `date`, or `boolean` value for a given field.
### Syntax
```sql
match(field_expression, query_expression[, option=<option_value>]*)
```
You can specify the following options:
- `analyzer`
- `auto_generate_synonyms_phrase_query`
- `fuzziness`
- `max_expansions`
- `prefix_length`
- `fuzzy_transpositions`
- `fuzzy_rewrite`
- `lenient`
- `operator`
- `minimum_should_match`
- `zero_terms_query`
- `boost`
*Example 1*: Search the `message` field:
```json
GET my_index/_search
{
"query": {
"match": {
"message": "this is a test"
}
}
}
```
PPL query:
```sql
search source=my_index | where match(message, "this is a test")
```
*Example 2*: Search the `message` field with the `operator` parameter:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "this is a test",
"operator": "and"
}
}
}
}
```
PPL query:
```sql
search source=my_index | where match(message, "this is a test", operator=and)
```
*Example 3*: Search the `message` field with the `operator` and `zero_terms_query` parameters:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"operator": "and",
"zero_terms_query": "all"
}
}
}
}
```
PPL query:
```sql
search source=my_index | where match(message, "to be or not to be", operator=and, zero_terms_query=all)
```
## ad
The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed in time RCF for processing time-series data and batch RCF for processing non-time-series data.
The `ad` command applies the Random Cut Forest (RCF) algorithm in the [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed in time RCF for processing time-series data and batch RCF for processing non-time-series data.
### Fixed In Time RCF For Time-series Data Command Syntax
### Syntax: Fixed In Time RCF For Time-series Data Command
```sql
ad <shingle_size> <time_decay> <time_field>
@ -848,7 +704,7 @@ Field | Description | Required
`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
`time_field` | Specifies the time field for RCF to use as time-series data. Must be either a long value, such as the timestamp in milliseconds, or a string value in "yyyy-MM-dd HH:mm:ss" format. | Yes
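If your data stores time as a "yyyy-MM-dd HH:mm:ss" string but you want a long millisecond timestamp, the conversion looks like the following Python sketch. The sketch assumes the strings are in UTC. The string "2014-07-01 00:00:00" maps to 1404172800000, which is the `timestamp` value in the first example's output.

```python
from datetime import datetime, timezone


def to_epoch_millis(value: str) -> int:
    """Convert a "yyyy-MM-dd HH:mm:ss" string to epoch milliseconds, assuming UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000


print(to_epoch_millis("2014-07-01 00:00:00"))  # 1404172800000
```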
### Batch RCF for Non-time-series Data Command Syntax
### Syntax: Batch RCF for Non-time-series Data Command
```sql
ad <shingle_size> <time_decay>
@ -859,7 +715,7 @@ Field | Description | Required
`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
*Example 1*: Detecting events in New York City from taxi ridership data with time-series data
**Example 1: Detecting events in New York City from taxi ridership data with time-series data**
The following example trains an RCF model and uses it to detect anomalies in the time-series ridership data.
@ -873,7 +729,7 @@ value | timestamp | score | anomaly_grade
:--- | :--- |:--- | :---
10844.0 | 1404172800000 | 0.0 | 0.0
*Example 2*: Detecting events in New York City from taxi ridership data with non-time-series data
**Example 2: Detecting events in New York City from taxi ridership data with non-time-series data**
PPL query:
@ -889,7 +745,7 @@ value | score | anomalous
The kmeans command applies the ML Commons plugin's kmeans algorithm to the provided PPL command's search results.
## Syntax
### Syntax
```sql
kmeans <cluster-number>
@ -897,7 +753,7 @@ kmeans <cluster-number>
For `cluster-number`, enter the number of clusters you want to group your data points into.
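The following minimal k-means sketch in Python (Lloyd's algorithm) gives some intuition for what `cluster-number` controls. It is not the ML Commons implementation and makes three simplifications:

- Initialization simply takes the first k points.
- Distances are squared Euclidean.
- There is no feature scaling.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label for each point (Lloyd's algorithm)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


print(kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))  # [0, 0, 1, 1]
```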
*Example*
**Example: Group Iris data**
The following example shows how to classify three Iris species (Iris setosa, Iris virginica, and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.

@ -1,14 +1,17 @@
---
layout: default
title: Piped processing language
nav_order: 40
title: PPL &ndash; Piped Processing Language
parent: SQL and PPL
nav_order: 5
has_children: true
has_toc: false
redirect_from:
- /search-plugins/ppl/
- /search-plugins/sql/ppl
- /search-plugins/ppl
- /observability-plugin/ppl
---
# Piped Processing Language
# PPL &ndash; Piped Processing Language
Piped Processing Language (PPL) is a query language that lets you use pipe (`|`) syntax to explore, discover, and query data stored in OpenSearch.
@ -42,7 +45,7 @@ Go to **Query Workbench** and select **PPL**.
The following example returns `firstname` and `lastname` fields for documents in an `accounts` index with `age` greater than 18:
```json
```sql
search source=accounts
| where age > 18
| fields firstname, lastname

@ -0,0 +1,71 @@
---
layout: default
title: Syntax
parent: PPL - Piped Processing Language
grand_parent: SQL and PPL
nav_order: 1
---
# PPL syntax
Every PPL query starts with the `search` command. It specifies the index to search and retrieve documents from. Subsequent commands can follow in any order.
Currently, `PPL` supports only one `search` command, which can be omitted to simplify the query.
{: .note}
## Syntax
```sql
search source=<index> [boolean-expression]
source=<index> [boolean-expression]
```
Field | Description | Required
:--- | :--- |:---
`search` | Specifies the search command. Can be omitted. | No
`index` | Specifies the index to query. | Yes
`boolean-expression` | Specifies an expression that evaluates to a Boolean value. | No
## Examples
**Example 1: Search through accounts index**
In the following example, the `search` command refers to an `accounts` index as the source and uses `fields` and `where` commands for the conditions:
```sql
search source=accounts
| where age > 18
| fields firstname, lastname
```
In PPL syntax descriptions, angle brackets `< >` enclose required arguments and square brackets `[ ]` enclose optional arguments.
{: .note }
**Example 2: Get all documents**
To get all documents from the `accounts` index, specify it as the `source`:
```sql
search source=accounts;
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke
| 6 | Hattie | 671 Bristol Street | 5686 | M | Dante | Netagy | TN | 36 | hattiebond@netagy.com | Bond
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates
| 18 | Dale | 467 Hutchinson Court | 4180 | M | Orick | null | MD | 33 | daleadams@boink.com | Adams
**Example 3: Get documents that match a condition**
To get all documents from the `accounts` index that either have `account_number` equal to 1 or have `gender` as `F`, use the following query:
```sql
search source=accounts account_number=1 or gender="F";
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
:--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :---
| 1 | Amber | 880 Holmes Lane | 39225 | M | Brogan | Pyrami | IL | 32 | amberduke@pyrami.com | Duke |
| 13 | Nanette | 789 Madison Street | 32838 | F | Nogal | Quility | VA | 28 | null | Bates |

@ -1,331 +0,0 @@
---
layout: default
title: Protocol
parent: SQL
nav_order: 14
---
# Protocol
The SQL plugin provides multiple response formats for different
purposes, while the request format is the same for all of them. Among
them, the JDBC format is widely used because it provides schema
information and additional functionality, such as pagination. Besides
the JDBC driver, various clients can benefit from the detailed and
well-formatted response.
## Request Format
### Description
The body of the HTTP POST request can include a few other fields in
addition to the SQL query.
### Example 1
Use `filter` to add more conditions to
OpenSearch DSL directly.
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"query" : "SELECT firstname, lastname, balance FROM accounts",
"filter" : {
"range" : {
"balance" : {
"lt" : 10000
}
}
}
}'
```
Explain:
```json
{
"from": 0,
"size": 200,
"query": {
"bool": {
"filter": [{
"bool": {
"filter": [{
"range": {
"balance": {
"from": null,
"to": 10000,
"include_lower": true,
"include_upper": false,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"firstname",
"lastname",
"balance"
],
"excludes": []
}
}
```
### Example 2
Use `parameters` to pass actual parameter values
to a prepared SQL query.
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"query": "SELECT * FROM accounts WHERE age = ?",
"parameters": [{
"type": "integer",
"value": 30
}]
}'
```
Explain:
```json
{
"from": 0,
"size": 200,
"query": {
"bool": {
"filter": [{
"bool": {
"must": [{
"term": {
"age": {
"value": 30,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
```
## JDBC Format
### Description
By default, the plugin returns the standard JDBC format. This format
is provided for the JDBC driver and for clients that need both the
schema and the result set to be well formatted.
### Example 1
The following is an example of a normal response. The
`schema` includes the field names and their types,
and `datarows` includes the result set.
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2"
}'
```
Result set:
```json
{
"schema": [{
"name": "firstname",
"type": "text"
},
{
"name": "lastname",
"type": "text"
},
{
"name": "age",
"type": "long"
}
],
"total": 4,
"datarows": [
[
"Nanette",
"Bates",
28
],
[
"Amber",
"Duke",
32
]
],
"size": 2,
"status": 200
}
```
### Example 2
If an error occurs, the error message and its cause are returned
instead.
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql -d '{
"query" : "SELECT unknown FROM accounts"
}'
```
Result set:
```json
{
"error": {
"reason": "Invalid SQL query",
"details": "Field [unknown] cannot be found or used here.",
"type": "SemanticAnalysisException"
},
"status": 400
}
```
## OpenSearch DSL
### Description
The `json` format returns the original response from OpenSearch in
JSON. Because this is the native response from OpenSearch, extra
effort is needed to parse and interpret it.
### Example
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql?format=json -d '{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2"
}'
```
Result set:
```json
{
"_shards": {
"total": 5,
"failed": 0,
"successful": 5,
"skipped": 0
},
"hits": {
"hits": [{
"_index": "accounts",
"_type": "account",
"_source": {
"firstname": "Nanette",
"age": 28,
"lastname": "Bates"
},
"_id": "13",
"sort": [
28
],
"_score": null
},
{
"_index": "accounts",
"_type": "account",
"_source": {
"firstname": "Amber",
"age": 32,
"lastname": "Duke"
},
"_id": "1",
"sort": [
32
],
"_score": null
}
],
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null
},
"took": 100,
"timed_out": false
}
```
## CSV Format
### Description
You can also use the CSV format to download the result set as CSV.
### Example
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql?format=csv -d '{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age"
}'
```
Result set:
```text
firstname,lastname,age
Nanette,Bates,28
Amber,Duke,32
Dale,Adams,33
Hattie,Bond,36
```
## Raw Format
### Description
Additionally, you can use the raw format to pipe the result to other
command-line tools for post-processing.
### Example
SQL query:
```console
>> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_sql?format=raw -d '{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age"
}'
```
Result set:
```text
Nanette|Bates|28
Amber|Duke|32
Dale|Adams|33
Hattie|Bond|36
```

@ -0,0 +1,283 @@
---
layout: default
title: Response formats
parent: SQL and PPL
nav_order: 2
---
# Response formats
The SQL plugin provides the `jdbc`, `csv`, `raw`, and `json` response formats that are useful for different purposes. The `jdbc` format is widely used because it provides the schema information and adds more functionality, such as pagination. Besides the JDBC driver, various clients can benefit from a detailed and well-formatted response.
## JDBC format
By default, the SQL plugin returns the response in the standard JDBC format. This format is provided for the JDBC driver and clients that need both the schema and the result set to be well formatted.
#### Sample request
The following query does not specify the response format, so the format is set to `jdbc`:
```json
POST _plugins/_sql
{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2"
}
```
#### Sample response
In the response, the `schema` contains the field names and types, and the `datarows` field contains the result set:
```json
{
"schema": [{
"name": "firstname",
"type": "text"
},
{
"name": "lastname",
"type": "text"
},
{
"name": "age",
"type": "long"
}
],
"total": 4,
"datarows": [
[
"Nanette",
"Bates",
28
],
[
"Amber",
"Duke",
32
]
],
"size": 2,
"status": 200
}
```
If an error of any type occurs, OpenSearch returns the error message.
The following query searches for a non-existent field `unknown`:
```json
POST /_plugins/_sql
{
"query" : "SELECT unknown FROM accounts"
}
```
The response contains the error message and the cause of the error:
```json
{
"error": {
"reason": "Invalid SQL query",
"details": "Field [unknown] cannot be found or used here.",
"type": "SemanticAnalysisException"
},
"status": 400
}
```
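If you consume the `jdbc` format in your own client, you can pair each row in `datarows` with the column names in `schema`, after first checking for an `error` object. The following Python sketch assumes that the response body has already been parsed into a dictionary. `jdbc_to_records` is a hypothetical helper and is not part of the SQL plugin or any OpenSearch client.

```python
from typing import Any


def jdbc_to_records(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a parsed `jdbc` response into a list of {column: value} dicts."""
    if "error" in response:
        # Error responses carry an `error` object instead of `schema`/`datarows`.
        err = response["error"]
        raise RuntimeError(f"{err.get('type')}: {err.get('details')}")
    # Values in each data row follow the order of the columns in `schema`.
    columns = [column["name"] for column in response["schema"]]
    return [dict(zip(columns, row)) for row in response["datarows"]]


sample = {
    "schema": [
        {"name": "firstname", "type": "text"},
        {"name": "lastname", "type": "text"},
        {"name": "age", "type": "long"},
    ],
    "total": 4,
    "datarows": [["Nanette", "Bates", 28], ["Amber", "Duke", 32]],
    "size": 2,
    "status": 200,
}

for record in jdbc_to_records(sample):
    print(record)
```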
## OpenSearch DSL JSON format
If you set the format to `json`, the original OpenSearch response is returned in JSON format. Because this is the native response from OpenSearch, extra effort is needed to parse and interpret it.
#### Sample request
The following query sets the response format to `json`:
```json
POST _plugins/_sql?format=json
{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age LIMIT 2"
}
```
#### Sample response
The response is the original response from OpenSearch:
```json
{
"_shards": {
"total": 5,
"failed": 0,
"successful": 5,
"skipped": 0
},
"hits": {
"hits": [{
"_index": "accounts",
"_type": "account",
"_source": {
"firstname": "Nanette",
"age": 28,
"lastname": "Bates"
},
"_id": "13",
"sort": [
28
],
"_score": null
},
{
"_index": "accounts",
"_type": "account",
"_source": {
"firstname": "Amber",
"age": 32,
"lastname": "Duke"
},
"_id": "1",
"sort": [
32
],
"_score": null
}
],
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null
},
"took": 100,
"timed_out": false
}
```
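To extract documents from the native response, you read the `_source` of each entry in `hits.hits`. The key order in `_source` (`firstname`, `age`, `lastname` in the preceding response) does not follow the `SELECT` list, so look values up by name. The following Python sketch assumes that the response has already been parsed. `hits_to_sources` is a hypothetical helper.

```python
from typing import Any


def hits_to_sources(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the `_source` object of every hit in a native search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the response above.
sample = {
    "hits": {
        "hits": [
            {"_index": "accounts", "_id": "13",
             "_source": {"firstname": "Nanette", "age": 28, "lastname": "Bates"}},
            {"_index": "accounts", "_id": "1",
             "_source": {"firstname": "Amber", "age": 32, "lastname": "Duke"}},
        ],
        "total": {"value": 4, "relation": "eq"},
    }
}

for source in hits_to_sources(sample):
    # Look fields up by name: `_source` key order does not follow the SELECT list.
    print(source["firstname"], source["lastname"], source["age"])
```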
## CSV format
You can also request results in CSV format.
#### Sample request
```json
POST /_plugins/_sql?format=csv
{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age"
}
```
#### Sample response
```text
firstname,lastname,age
Nanette,Bates,28
Amber,Duke,32
Dale,Adams,33
Hattie,Bond,36
```
### Sanitizing results in CSV format
By default, OpenSearch sanitizes header cells (field names) and data cells (field contents) according to the following rules:
- If a cell starts with `+`, `-`, `=`, or `@`, the sanitizer inserts a single quote (`'`) at the start of the cell.
- If a cell contains one or more commas (`,`), the sanitizer surrounds the cell with double quotes (`"`).
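The two rules fit in a few lines of code. The following Python sketch illustrates the documented rules and is not the plugin's implementation. It assumes that the prefix rule is applied before the comma rule, and it does not handle cells that already contain double quotes. The function names are hypothetical.

```python
FORMULA_PREFIXES = ("+", "-", "=", "@")


def sanitize_csv_cell(cell: str) -> str:
    """Apply the two documented CSV sanitizing rules to one cell."""
    if cell.startswith(FORMULA_PREFIXES):
        # Rule 1: prefix a single quote, a common way to stop spreadsheet
        # applications from evaluating the cell as a formula.
        cell = "'" + cell
    if "," in cell:
        # Rule 2: wrap cells that contain the delimiter in double quotes.
        cell = f'"{cell}"'
    return cell


def sanitize_csv_row(cells: list[str]) -> str:
    """Join sanitized cells with commas."""
    return ",".join(sanitize_csv_cell(cell) for cell in cells)


print(sanitize_csv_row(["+firstname", "=lastname", "address"]))
print(sanitize_csv_row(["-Hattie", "@Bond", "671 Bristol Street, Dente, TN"]))
```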
### Example
The following query indexes a document with cells that either start with special characters or contain commas:
```json
PUT /userdata/_doc/1?refresh=true
{
"+firstname": "-Hattie",
"=lastname": "@Bond",
"address": "671 Bristol Street, Dente, TN"
}
```
You can use the query below to request results in CSV format:
```json
POST /_plugins/_sql?format=csv
{
"query" : "SELECT * FROM userdata"
}
```
In the response, cells that start with special characters are prefixed with `'`. The cell that has commas is surrounded with quotation marks:
```text
'+firstname,'=lastname,address
'-Hattie,'@Bond,"671 Bristol Street, Dente, TN"
```
To skip sanitizing, set the `sanitize` query parameter to false:
```json
POST /_plugins/_sql?format=csv&sanitize=false
{
"query" : "SELECT * FROM userdata"
}
```
The response contains the results in the original CSV format:
```text
=lastname,address,+firstname
@Bond,"671 Bristol Street, Dente, TN",-Hattie
```
## Raw format
You can use the raw format to pipe the results to other command-line tools for post-processing.
#### Sample request
```json
POST /_plugins/_sql?format=raw
{
"query" : "SELECT firstname, lastname, age FROM accounts ORDER BY age"
}
```
#### Sample response
```text
Nanette|Bates|28
Amber|Duke|32
Dale|Adams|33
Hattie|Bond|36
```
By default, OpenSearch sanitizes results in `raw` format according to the following rule:
- If a data cell contains one or more pipe characters (`|`), the sanitizer surrounds the cell with double quotes.
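The following Python sketch models the `raw` rule. It joins cells with `|` and quotes any cell that contains the delimiter. This is an illustration, not the plugin's code, and it assumes that cells never contain double quotes. `format_raw_row` is a hypothetical helper.

```python
def sanitize_raw_cell(cell: str) -> str:
    """Wrap a cell in double quotes if it contains the `|` delimiter."""
    return f'"{cell}"' if "|" in cell else cell


def format_raw_row(cells: list[str]) -> str:
    """Join sanitized cells with `|`, as in the `raw` format."""
    return "|".join(sanitize_raw_cell(cell) for cell in cells)


print(format_raw_row(["Nanette", "Bates", "28"]))
print(format_raw_row(["671 Bristol Street| Dente| TN", "Bond|", "|Hattie"]))
```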
### Example
The following query indexes a document with pipe characters (`|`) in its fields:
```json
PUT /userdata/_doc/1?refresh=true
{
"+firstname": "|Hattie",
"=lastname": "Bond|",
"|address": "671 Bristol Street| Dente| TN"
}
```
You can use the query below to request results in `raw` format:
```json
POST /_plugins/_sql?format=raw
{
"query" : "SELECT * FROM userdata"
}
```
The query returns cells with the `|` character surrounded by quotation marks:
```text
"|address"|=lastname|+firstname
"671 Bristol Street| Dente| TN"|"Bond|"|"|Hattie"
```
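Cells that contain `|` are wrapped in double quotes, so a naive `split("|")` would break such rows apart. One way to parse `raw` output in Python is the standard `csv` module, with `|` as the delimiter and `"` as the quote character. This sketch assumes that the output does not escape embedded double quotes, a case the rule above does not cover.

```python
import csv
import io

raw_output = (
    '"|address"|=lastname|+firstname\n'
    '"671 Bristol Street| Dente| TN"|"Bond|"|"|Hattie"\n'
)

# `|` separates cells; `"` marks cells whose contents include `|`.
reader = csv.reader(io.StringIO(raw_output), delimiter="|", quotechar='"')
header, *rows = list(reader)
print(header)
print(rows)
```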

@ -1,13 +1,15 @@
---
layout: default
title: Settings
parent: SQL and PPL
nav_order: 77
---
# Settings
The SQL plugin adds a few settings to the standard OpenSearch cluster settings. Most are dynamic, so you can change the default behavior of the plugin without restarting your cluster.
It is possible to independently disable processing of `PPL` or `SQL` queries.
You can update these settings like any other cluster setting:
@ -20,7 +22,23 @@ PUT _cluster/settings
}
```
Alternatively, you can use the following request format:
```json
PUT _cluster/settings
{
"transient": {
"plugins": {
"ppl": {
"enabled": "false"
}
}
}
}
```
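The nested request body is equivalent to writing the setting name as a single dotted key, such as `"plugins.ppl.enabled"`. The following Python sketch shows how a dotted name maps onto the nested form. `nested_settings` is a hypothetical helper, and its `transient` default only mirrors the preceding example.

```python
import json
from typing import Any


def nested_settings(dotted_key: str, value: Any, scope: str = "transient") -> dict[str, Any]:
    """Expand a dotted setting name such as "plugins.ppl.enabled" into nested objects."""
    *parents, leaf = dotted_key.split(".")
    inner: dict[str, Any] = {leaf: value}
    # Wrap the leaf from the innermost parent outward.
    for part in reversed(parents):
        inner = {part: inner}
    return {scope: inner}


flat = {"transient": {"plugins.ppl.enabled": "false"}}
nested = nested_settings("plugins.ppl.enabled", "false")
print(json.dumps(flat))
print(json.dumps(nested))
```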
Similarly, you can update the settings by sending a request to the `_plugins/_query/settings` endpoint:
```json
PUT _plugins/_query/settings
{
@ -30,10 +48,31 @@ PUT _plugins/_query/settings
}
```
Alternatively, you can use the following request format:
```json
PUT _plugins/_query/settings
{
"transient": {
"plugins": {
"ppl": {
"enabled": "false"
}
}
}
}
```
Requests to the `_plugins/_ppl` and `_plugins/_sql` endpoints include index names in the request body, so they have the same access policy considerations as the `bulk`, `mget`, and `msearch` operations. Setting the `rest.action.multi.allow_explicit_index` parameter to `false` disables both the `SQL` and `PPL` endpoints.
{: .note}
## Available settings
Setting | Default | Description
:--- | :--- | :---
`plugins.sql.enabled` | True | Change to `false` to disable `SQL` support in the plugin.
`plugins.ppl.enabled` | True | Change to `false` to disable `PPL` support in the plugin.
`plugins.sql.slowlog` | 2 seconds | Configures the time limit (in seconds) for slow queries. The plugin logs slow queries as `Slow query: elapsed=xxx (ms)` in `opensearch.log`.
`plugins.sql.cursor.keep_alive` | 1 minute | Configures how long the cursor context is kept open. Cursor contexts are resource intensive, so we recommend a low value.
`plugins.query.memory_limit` | 85% | Configures the heap memory usage limit for the circuit breaker of the query engine.
`plugins.query.size_limit` | 200 | Sets the default number of documents that the query engine fetches from an index in OpenSearch.

@ -1,205 +0,0 @@
---
layout: default
title: Full-Text Search
parent: SQL
nav_order: 8
---
# Full-text search
Use SQL commands for full-text search. The SQL plugin supports a subset of the full-text queries available in OpenSearch.
To learn about full-text queries in OpenSearch, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/).
## Match
Use the `match` command to search documents that match a `string`, `number`, `date`, or `boolean` value for a given field.
### Syntax
```sql
match(field_expression, query_expression[, option=<option_value>]*)
```
You can specify the following options:
- `analyzer`
- `auto_generate_synonyms_phrase`
- `fuzziness`
- `max_expansions`
- `prefix_length`
- `fuzzy_transpositions`
- `fuzzy_rewrite`
- `lenient`
- `operator`
- `minimum_should_match`
- `zero_terms_query`
- `boost`
*Example 1*: Search the `message` field:
```json
GET my_index/_search
{
"query": {
"match": {
"message": "this is a test"
}
}
}
```
SQL query:
```sql
SELECT message FROM my_index WHERE match(message, "this is a test")
```
*Example 2*: Search the `message` field with the `operator` parameter:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "this is a test",
"operator": "and"
}
}
}
}
```
SQL query:
```sql
SELECT message FROM my_index WHERE match(message, "this is a test", operator=and)
```
*Example 3*: Search the `message` field with the `operator` and `zero_terms_query` parameters:
```json
GET my_index/_search
{
"query": {
"match": {
"message": {
"query": "to be or not to be",
"operator": "and",
"zero_terms_query": "all"
}
}
}
}
```
SQL query:
```sql
SELECT message FROM my_index WHERE match(message, "to be or not to be", operator=and, zero_terms_query=all)
```
To search for text in a single field, use `MATCHQUERY` or `MATCH_QUERY` functions.
Pass in your search query and the field name that you want to search against.
```sql
SELECT account_number, address
FROM accounts
WHERE MATCH_QUERY(address, 'Holmes')
```
Alternate syntax:
```sql
SELECT account_number, address
FROM accounts
WHERE address = MATCH_QUERY('Holmes')
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
## Multi match
To search for text in multiple fields, use `MULTI_MATCH`, `MULTIMATCH`, or `MULTIMATCHQUERY` functions.
For example, search for `Dale` in either the `firstname` or `lastname` fields:
```sql
SELECT firstname, lastname
FROM accounts
WHERE MULTI_MATCH('query'='Dale', 'fields'='*name')
```
| firstname | lastname
:--- | :---
Dale | Adams
## Query string
To split text based on operators, use the `QUERY` function.
```sql
SELECT account_number, address
FROM accounts
WHERE QUERY('address:Lane OR address:Street')
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
6 | 671 Bristol Street
13 | 789 Madison Street
The `QUERY` function supports logical connectives, wildcard, regex, and proximity search.
## Match phrase
To search for exact phrases, use `MATCHPHRASE`, `MATCH_PHRASE`, or `MATCHPHRASEQUERY` functions.
```sql
SELECT account_number, address
FROM accounts
WHERE MATCH_PHRASE(address, '880 Holmes Lane')
```
| account_number | address
:--- | :---
1 | 880 Holmes Lane
## Score query
To return a relevance score along with every matching document, use `SCORE`, `SCOREQUERY`, or `SCORE_QUERY` functions.
Pass in one or two arguments. The first is the `MATCH_QUERY` expression. The second is an optional floating-point number that boosts the score (default value is 1.0).
```sql
SELECT account_number, address, _score
FROM accounts
WHERE SCORE(MATCH_QUERY(address, 'Lane'), 0.5) OR
SCORE(MATCH_QUERY(address, 'Street'), 100)
ORDER BY _score
```
| account_number | address | score
:--- | :--- | :---
1 | 880 Holmes Lane | 0.5
6 | 671 Bristol Street | 100
13 | 789 Madison Street | 100

@ -0,0 +1,521 @@
---
layout: default
title: SQL/PPL API
parent: SQL and PPL
nav_order: 1
---
# SQL/PPL API
Use the SQL and PPL API to send queries to the SQL plugin. Use the `_sql` endpoint to send queries in SQL, and the `_ppl` endpoint to send queries in PPL. For both of these, you can also use the `_explain` endpoint to translate your query into [OpenSearch domain-specific language]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/) (DSL) or to troubleshoot errors.
---
#### Table of contents
- TOC
{:toc}
---
## Query API
Introduced 1.0
{: .label .label-purple }
Sends an SQL/PPL query to the SQL plugin. You can pass the format for the response as a query parameter.
### Query parameters
Parameter | Data Type | Description
:--- | :--- | :---
[format]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/) | String | The format for the response. The `_sql` endpoint supports `jdbc`, `csv`, `raw`, and `json` formats. The `_ppl` endpoint supports `jdbc`, `csv`, and `raw` formats. Default is `jdbc`.
sanitize | Boolean | Specifies whether to escape special characters in the results. See [Response formats]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats/) for more information. Default is `true`.
### Request fields
Field | Data Type | Description
:--- | :--- | :---
query | String | The query to be executed. Required.
[filter](#filtering-results) | JSON object | The filter for the results. Optional.
[fetch_size](#paginating-results) | Integer | The number of results to return in one response. Used for paginating results. Default is 1,000. Optional. Only supported for the `jdbc` response format.
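To call the endpoint from code, you send a POST request with a JSON body and pass `format` and `sanitize` as query parameters. The following Python sketch uses only the standard library to build, but not send, such a request. It assumes an unsecured cluster at `http://localhost:9200`; a cluster with the Security plugin enabled also needs HTTPS and credentials. `build_query_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request

# Response formats that each endpoint accepts, per the query parameters table above.
SUPPORTED_FORMATS = {
    "sql": {"jdbc", "csv", "raw", "json"},
    "ppl": {"jdbc", "csv", "raw"},
}


def build_query_request(base_url: str, language: str, query: str,
                        fmt: str = "jdbc",
                        sanitize: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request for the `_sql` or `_ppl` endpoint."""
    if fmt not in SUPPORTED_FORMATS.get(language, set()):
        raise ValueError(f"format {fmt!r} is not supported for {language!r}")
    params = urllib.parse.urlencode(
        {"format": fmt, "sanitize": "true" if sanitize else "false"})
    return urllib.request.Request(
        f"{base_url}/_plugins/_{language}?{params}",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("http://localhost:9200", "ppl",
                          "source=accounts | fields firstname, lastname",
                          fmt="csv")
print(req.get_method(), req.full_url)
# To send it to a running cluster: urllib.request.urlopen(req).read()
```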
#### Sample request
```json
POST /_plugins/_sql
{
"query" : "SELECT * FROM accounts"
}
```
#### Sample response
The response contains the schema and the results:
```json
{
"schema": [
{
"name": "account_number",
"type": "long"
},
{
"name": "firstname",
"type": "text"
},
{
"name": "address",
"type": "text"
},
{
"name": "balance",
"type": "long"
},
{
"name": "gender",
"type": "text"
},
{
"name": "city",
"type": "text"
},
{
"name": "employer",
"type": "text"
},
{
"name": "state",
"type": "text"
},
{
"name": "age",
"type": "long"
},
{
"name": "email",
"type": "text"
},
{
"name": "lastname",
"type": "text"
}
],
"datarows": [
[
1,
"Amber",
"880 Holmes Lane",
39225,
"M",
"Brogan",
"Pyrami",
"IL",
32,
"amberduke@pyrami.com",
"Duke"
],
[
6,
"Hattie",
"671 Bristol Street",
5686,
"M",
"Dante",
"Netagy",
"TN",
36,
"hattiebond@netagy.com",
"Bond"
],
[
13,
"Nanette",
"789 Madison Street",
32838,
"F",
"Nogal",
"Quility",
"VA",
28,
"nanettebates@quility.com",
"Bates"
],
[
18,
"Dale",
"467 Hutchinson Court",
4180,
"M",
"Orick",
null,
"MD",
33,
"daleadams@boink.com",
"Adams"
]
],
"total": 4,
"size": 4,
"status": 200
}
```
### Response fields
Field | Data Type | Description
:--- | :--- | :---
schema | Array | Specifies the field names and types for all fields.
datarows | 2D array | An array of results. Each result represents one matching row (document).
total | Integer | The total number of rows (documents) that match the query.
size | Integer | The number of results returned in this response.
status | Integer | The HTTP response status that OpenSearch returns after running the query.
## Explain API
The SQL plugin has an `explain` feature that shows how a query is executed against OpenSearch, which is useful for debugging and development. A POST request to the `_plugins/_sql/_explain` or `_plugins/_ppl/_explain` endpoint returns [OpenSearch domain-specific language]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/) (DSL) in JSON format, explaining the query.
You can run the explain API operation either from the command line using `curl` or in the Dashboards console, as in the following example.
#### Sample explain request for an SQL query
```json
POST _plugins/_sql/_explain
{
"query": "SELECT firstname, lastname FROM accounts WHERE age > 20"
}
```
#### Sample SQL query explain response
```json
{
"root": {
"name": "ProjectOperator",
"description": {
"fields": "[firstname, lastname]"
},
"children": [
{
"name": "OpenSearchIndexScan",
"description": {
"request": """OpenSearchQueryRequest(indexName=accounts, sourceBuilder={"from":0,"size":200,"timeout":"1m","query":{"range":{"age":{"from":20,"to":null,"include_lower":false,"include_upper":true,"boost":1.0}}},"_source":{"includes":["firstname","lastname"],"excludes":[]},"sort":[{"_doc":{"order":"asc"}}]}, searchDone=false)"""
},
"children": []
}
]
}
}
```
#### Sample explain request for a PPL query
```json
POST _plugins/_ppl/_explain
{
"query" : "source=accounts | fields firstname, lastname"
}
```
#### Sample PPL query explain response
```json
{
"root": {
"name": "ProjectOperator",
"description": {
"fields": "[firstname, lastname]"
},
"children": [
{
"name": "OpenSearchIndexScan",
"description": {
"request": """OpenSearchQueryRequest(indexName=accounts, sourceBuilder={"from":0,"size":200,"timeout":"1m","_source":{"includes":["firstname","lastname"],"excludes":[]}}, searchDone=false)"""
},
"children": []
}
]
}
}
```
For queries that require post-processing, the `explain` response includes a query plan in addition to the OpenSearch DSL. For queries that don't require post-processing, the response contains the complete DSL.
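The `root` of an explain response is a tree in which each operator has a `name`, a `description`, and a list of `children`. The following Python sketch flattens that tree into an indented list of operator names, which can make larger plans easier to scan. `operator_names` is a hypothetical helper, and the sample is a trimmed copy of the SQL explain response above.

```python
from typing import Any


def operator_names(node: dict[str, Any], depth: int = 0) -> list[str]:
    """Flatten an explain plan into indented operator names, parents first."""
    names = ["  " * depth + node["name"]]
    for child in node.get("children", []):
        names.extend(operator_names(child, depth + 1))
    return names


# Trimmed copy of the SQL explain response above (descriptions omitted).
explain = {
    "root": {
        "name": "ProjectOperator",
        "children": [{"name": "OpenSearchIndexScan", "children": []}],
    }
}

print("\n".join(operator_names(explain["root"])))
```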
## Paginating results
To get back a paginated response, use the `fetch_size` parameter. The value of `fetch_size` should be greater than 0. The default value is 1,000. A value of 0 will fall back to a non-paginated response.
The `fetch_size` parameter is only supported for the `jdbc` response format.
{: .note }
### Example
The following request contains an SQL query and specifies to return five results at a time:
```json
POST _plugins/_sql/
{
"fetch_size" : 5,
"query" : "SELECT firstname, lastname FROM accounts WHERE age > 20 ORDER BY state ASC"
}
```
The response contains all the fields that a query without `fetch_size` would contain, and a `cursor` field that is used to retrieve subsequent pages of results:
```json
{
"schema": [
{
"name": "firstname",
"type": "text"
},
{
"name": "lastname",
"type": "text"
}
],
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9",
"total": 956,
"datarows": [
[
"Cherry",
"Carey"
],
[
"Lindsey",
"Hawkins"
],
[
"Sargent",
"Powers"
],
[
"Campos",
"Olsen"
],
[
"Savannah",
"Kirby"
]
],
"size": 5,
"status": 200
}
```
To fetch subsequent pages, use the `cursor` from the previous response:
```json
POST /_plugins/_sql
{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9"
}
```
The next response contains only the `datarows` of the results and a new `cursor`.
```json
{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMabcde12345",
"datarows": [
[
"Abbey",
"Karen"
],
[
"Chen",
"Ken"
],
[
"Ani",
"Jade"
],
[
"Peng",
"Hu"
],
[
"John",
"Doe"
]
]
}
```
The `datarows` can have more than the `fetch_size` number of records in case nested fields are flattened.
{: .note }
The last page of results has only `datarows` and no `cursor`. The `cursor` context is automatically cleared on the last page.
To explicitly clear the cursor context, use the `_plugins/_sql/close` endpoint operation:
```json
POST /_plugins/_sql/close
{
"cursor": "d:eyJhIjp7fSwicyI6IkRYRjFaWEo1UVc1a1JtVjBZMmdCQUFBQUFBQUFBQU1XZWpkdFRFRkZUMlpTZEZkeFdsWnJkRlZoYnpaeVVRPT0iLCJjIjpbeyJuYW1lIjoiZmlyc3RuYW1lIiwidHlwZSI6InRleHQifSx7Im5hbWUiOiJsYXN0bmFtZSIsInR5cGUiOiJ0ZXh0In1dLCJmIjo1LCJpIjoiYWNjb3VudHMiLCJsIjo5NTF9"
}
```
The response is an acknowledgement from OpenSearch:
```json
{"succeeded":true}
```
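These pagination steps combine into a short client loop. The first request includes `fetch_size`, and each later request sends the returned `cursor` until a response arrives without one. The following Python sketch takes the HTTP transport as a `post(path, body)` function so that it does not depend on any client library. `fetch_all_rows`, `fake_post`, and the canned cursor values are hypothetical. If you stop reading early, send the last cursor to `_plugins/_sql/close` as shown above.

```python
from typing import Any, Callable

# Hypothetical transport: sends `body` as JSON to `path`, returns the parsed JSON reply.
PostFn = Callable[[str, dict[str, Any]], dict[str, Any]]


def fetch_all_rows(post: PostFn, query: str, fetch_size: int = 5) -> list[list[Any]]:
    """Collect every data row of a paginated `jdbc` query by following cursors."""
    page = post("/_plugins/_sql", {"query": query, "fetch_size": fetch_size})
    rows = list(page["datarows"])
    cursor = page.get("cursor")
    while cursor:
        # Subsequent requests carry only the cursor from the previous page.
        page = post("/_plugins/_sql", {"cursor": cursor})
        rows.extend(page["datarows"])
        cursor = page.get("cursor")  # absent on the last page
    return rows


# Stand-in transport that serves three trimmed, canned pages keyed by cursor.
PAGES = {
    None: {"datarows": [["Cherry", "Carey"], ["Lindsey", "Hawkins"]], "cursor": "c1"},
    "c1": {"datarows": [["Abbey", "Karen"]], "cursor": "c2"},
    "c2": {"datarows": [["John", "Doe"]]},
}


def fake_post(path: str, body: dict[str, Any]) -> dict[str, Any]:
    return PAGES[body.get("cursor")]


print(fetch_all_rows(fake_post, "SELECT firstname, lastname FROM accounts"))
```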
## Filtering results
You can use the `filter` parameter to add more conditions to the OpenSearch DSL directly.
The following SQL query returns the names and account balances of all customers. The results are then filtered to contain only those customers with less than $10,000 balance.
```json
POST /_plugins/_sql/
{
"query" : "SELECT firstname, lastname, balance FROM accounts",
"filter" : {
"range" : {
"balance" : {
"lt" : 10000
}
}
}
}
```
The response contains the matching results:
```json
{
"schema": [
{
"name": "firstname",
"type": "text"
},
{
"name": "lastname",
"type": "text"
},
{
"name": "balance",
"type": "long"
}
],
"total": 2,
"datarows": [
[
"Hattie",
"Bond",
5686
],
[
"Dale",
"Adams",
4180
]
],
"size": 2,
"status": 200
}
```
You can use the Explain API to see how this query is executed against OpenSearch:
```json
POST /_plugins/_sql/_explain
{
"query" : "SELECT firstname, lastname, balance FROM accounts",
"filter" : {
"range" : {
"balance" : {
"lt" : 10000
}
}
}
}
```
The response contains the Boolean query in OpenSearch DSL that corresponds to the query above:
```json
{
"from": 0,
"size": 200,
"query": {
"bool": {
"filter": [{
"bool": {
"filter": [{
"range": {
"balance": {
"from": null,
"to": 10000,
"include_lower": true,
"include_upper": false,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"_source": {
"includes": [
"firstname",
"lastname",
"balance"
],
"excludes": []
}
}
```
## Using parameters
You can use the `parameters` field to pass parameter values to a prepared SQL query.
The following explain operation uses an SQL query with an `age` parameter:
```json
POST /_plugins/_sql/_explain
{
"query": "SELECT * FROM accounts WHERE age = ?",
"parameters": [{
"type": "integer",
"value": 30
}]
}
```
The response contains the Boolean query in OpenSearch DSL that corresponds to the SQL query above:
```json
{
"from": 0,
"size": 200,
"query": {
"bool": {
"filter": [{
"bool": {
"must": [{
"term": {
"age": {
"value": 30,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
}
}
```

@ -0,0 +1,149 @@
---
layout: default
title: Aggregation Functions
parent: SQL
grand_parent: SQL and PPL
nav_order: 11
---
# Aggregation functions
Aggregate functions use the `GROUP BY` clause to group sets of values into subsets.
## Group By
The `GROUP BY` clause accepts an identifier, an ordinal, or an expression.
### Identifier
```sql
SELECT gender, sum(age) FROM accounts GROUP BY gender;
```
| gender | sum (age)
:--- | :---
F | 28 |
M | 101 |
### Ordinal
```sql
SELECT gender, sum(age) FROM accounts GROUP BY 1;
```
| gender | sum (age)
:--- | :---
F | 28 |
M | 101 |
### Expression
```sql
SELECT abs(account_number), sum(age) FROM accounts GROUP BY abs(account_number);
```
| abs(account_number) | sum (age)
:--- | :---
| 1 | 32 |
| 13 | 28 |
| 18 | 33 |
| 6 | 36 |
## Aggregation
You can use an aggregation in the `SELECT` list, as an argument of an expression, or with an expression as its argument.
### Select
```sql
SELECT gender, sum(age) FROM accounts GROUP BY gender;
```
| gender | sum (age)
:--- | :---
F | 28 |
M | 101 |
### Argument
```sql
SELECT gender, sum(age) * 2 as sum2 FROM accounts GROUP BY gender;
```
| gender | sum2
:--- | :---
F | 56 |
M | 202 |
### Expression
```sql
SELECT gender, sum(age * 2) as sum2 FROM accounts GROUP BY gender;
```
| gender | sum2
:--- | :---
F | 56 |
M | 202 |
### COUNT
The `COUNT` function accepts arguments such as `*` or a literal like `1`.
The meaning of these different forms is as follows:
- `COUNT(field)` - Counts only the input rows in which the given field (or expression) is not null or missing.
- `COUNT(*)` - Counts all input rows.
- `COUNT(1)` (same as `COUNT(*)`) - Counts any non-null literal.
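The forms give different results whenever a field is null. The following Python sketch mimics the documented semantics over the sample `accounts` rows, where Dale's `employer` is null. `count_field` and `count_star` are hypothetical helpers. Treating a missing key the same as null is an assumption based on the "null or missing" wording.

```python
from typing import Any

# The four sample accounts; Dale's employer is null.
rows: list[dict[str, Any]] = [
    {"firstname": "Amber", "employer": "Pyrami"},
    {"firstname": "Hattie", "employer": "Netagy"},
    {"firstname": "Nanette", "employer": "Quility"},
    {"firstname": "Dale", "employer": None},
]


def count_field(rows: list[dict[str, Any]], field: str) -> int:
    """COUNT(field): count rows where the field is neither null nor missing."""
    # dict.get returns None for a missing key, so missing and null are treated alike.
    return sum(1 for row in rows if row.get(field) is not None)


def count_star(rows: list[dict[str, Any]]) -> int:
    """COUNT(*) and COUNT(1): count every input row."""
    return len(rows)


print(count_field(rows, "employer"), count_star(rows))  # 3 4
```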
## Having
Use the `HAVING` clause to filter out aggregated values.
### HAVING with GROUP BY
You can use aggregate expressions or their aliases defined in the `SELECT` clause in a `HAVING` condition.
We recommend putting non-aggregate conditions in the `WHERE` clause, although you can also use them in a `HAVING` clause.
The aggregations in a `HAVING` clause don't have to be the same as those in the select list. As an extension to the SQL standard, you're not restricted to using identifiers only in the `GROUP BY` list.
For example:
```sql
SELECT gender, sum(age)
FROM accounts
GROUP BY gender
HAVING sum(age) > 100;
```
| gender | sum (age)
:--- | :---
M | 101 |
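Conceptually, `HAVING` filters groups after they are aggregated, whereas `WHERE` filters individual rows before aggregation. The following Python sketch reproduces the preceding query over the four sample accounts, with ages taken from the sample data, to show that ordering. It illustrates the semantics only, not how the SQL plugin executes the query.

```python
from collections import defaultdict

accounts = [
    {"firstname": "Amber", "gender": "M", "age": 32},
    {"firstname": "Hattie", "gender": "M", "age": 36},
    {"firstname": "Nanette", "gender": "F", "age": 28},
    {"firstname": "Dale", "gender": "M", "age": 33},
]

# GROUP BY gender with sum(age): aggregate every row into its group first...
sums: dict[str, int] = defaultdict(int)
for row in accounts:
    sums[row["gender"]] += row["age"]

# ...then HAVING sum(age) > 100 filters the aggregated groups, not the input rows.
having = {gender: total for gender, total in sums.items() if total > 100}
print(having)  # {'M': 101}
```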
Here's another example for using an alias in a `HAVING` condition.
```sql
SELECT gender, sum(age) AS s
FROM accounts
GROUP BY gender
HAVING s > 100;
```
| gender | s
:--- | :---
M | 101 |
If an identifier is ambiguous, for example, if it is present both as a select alias and as an index field, the alias takes precedence: the identifier is replaced with the expression aliased in the `SELECT` clause.
### HAVING without GROUP BY
You can use a `HAVING` clause without the `GROUP BY` clause. This is useful because aggregations are not supported in a `WHERE` clause:
```sql
SELECT 'Total of age > 100'
FROM accounts
HAVING sum(age) > 100;
```
| Total of age > 100 |
:--- |
Total of age > 100 |

@ -2,6 +2,7 @@
layout: default
title: Basic Queries
parent: SQL
grand_parent: SQL and PPL
nav_order: 5
---

@ -2,6 +2,7 @@
layout: default
title: Complex Queries
parent: SQL
grand_parent: SQL and PPL
nav_order: 6
---

@ -2,6 +2,7 @@
layout: default
title: Delete
parent: SQL
grand_parent: SQL and PPL
nav_order: 12
---

@ -0,0 +1,225 @@
---
layout: default
title: Functions
parent: SQL
grand_parent: SQL and PPL
nav_order: 7
---
# Functions
The SQL language supports all SQL plugin [common functions]({{site.url}}{{site.baseurl}}/search-plugins/sql/functions/), including [relevance search]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text/), but also introduces a few function synonyms, which are available in SQL only.
These synonyms are provided by the `V1` engine. For more information, see [Limitations]({{site.url}}{{site.baseurl}}/search-plugins/sql/limitation).
## Match query
The `MATCHQUERY` and `MATCH_QUERY` functions are synonyms for the [`MATCH`]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text#match) relevance function. They don't accept additional arguments but provide an alternate syntax.
### Syntax
To use `matchquery` or `match_query`, pass the name of the field that you want to search and your search query:
```sql
match_query(field_expression, query_expression[, option=<option_value>]*)
matchquery(field_expression, query_expression[, option=<option_value>]*)
field_expression = match_query(query_expression[, option=<option_value>]*)
field_expression = matchquery(query_expression[, option=<option_value>]*)
```
You can specify the following options in any order:
- `analyzer`
- `boost`
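
The following sketch shows `match_query` with both options. It assumes that options are passed in the `option=<option_value>` form shown in the syntax block; the analyzer name `standard` and the boost value `2.0` are illustrative choices, not defaults.

```sql
-- Sketch: search the address field for "Holmes", analyzing the query
-- with the standard analyzer and doubling the relevance boost.
SELECT account_number, address
FROM accounts
WHERE match_query(address, 'Holmes', analyzer='standard', boost=2.0)
```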
### Example
You can use `MATCHQUERY` to replace `MATCH`:
```sql
SELECT account_number, address
FROM accounts
WHERE MATCHQUERY(address, 'Holmes')
```
Alternatively, you can use `MATCH_QUERY` to replace `MATCH`:
```sql
SELECT account_number, address
FROM accounts
WHERE address = MATCH_QUERY('Holmes')
```
The results contain documents in which the address contains "Holmes":
| account_number | address
:--- | :---
1 | 880 Holmes Lane
## Multi-match
There are three synonyms for [`MULTI_MATCH`]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text#multi-match), each with a slightly different syntax. They accept a query string and a fields list with weights. They can also accept additional optional parameters.
### Syntax
```sql
multimatch('query'=query_expression[, 'fields'=field_expression][, option=<option_value>]*)
multi_match('query'=query_expression[, 'fields'=field_expression][, option=<option_value>]*)
multimatchquery('query'=query_expression[, 'fields'=field_expression][, option=<option_value>]*)
```
The `fields` parameter is optional and can contain a single field or a comma-separated list of fields (whitespace characters are not allowed). The weight for each field is optional and is specified after the field name, separated from it by a caret (`^`) with no whitespace.
### Example
The following queries show the `fields` parameter of a multi-match query with a field list and with a single field:
```sql
multi_match('fields' = "Tags^2,Title^3.4,Body,Comments^0.3", ...)
multi_match('fields' = "Title", ...)
```
You can specify the following options in any order:
- `analyzer`
- `boost`
- `slop`
- `type`
- `tie_breaker`
- `operator`
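
Putting the pieces together, a complete multi-match query might look like the following sketch. It assumes that options use the `option=<option_value>` form shown in the syntax block and that the `accounts` index has `address` and `firstname` fields; the weight `2` and the `operator` value are illustrative.

```sql
-- Sketch: search two fields, weighting matches in address twice as heavily,
-- and require all query terms to match (operator='and').
SELECT account_number, address
FROM accounts
WHERE multi_match('query'='Holmes Lane', 'fields'='address^2,firstname', operator='and')
```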
## Query string
The `QUERY` function is a synonym for [`QUERY_STRING`]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text#query-string).
### Syntax
```sql
query('query'=query_expression[, 'fields'=field_expression][, option=<option_value>]*)
```
The `fields` parameter is optional and can contain a single field or a comma-separated list of fields (whitespace characters are not allowed). The weight for each field is optional and is specified after the field name, separated from it by a caret (`^`) with no whitespace.
### Example
The following queries show the `fields` parameter of a `query` function with a field list and with a single field:
```sql
query('fields' = "Tags^2,Title^3.4,Body,Comments^0.3", ...)
query('fields' = "Tags", ...)
```
You can specify the following options in any order:
- `analyzer`
- `boost`
- `slop`
- `default_field`
### Example of using `query_string` in SQL queries
The following is a sample REST API search request in OpenSearch DSL.
```json
GET accounts/_search
{
  "query": {
    "query_string": {
      "query": "Lane Street",
      "fields": [ "address" ]
    }
  }
}
```
The request above is equivalent to the following `query` function:
```sql
SELECT account_number, address
FROM accounts
WHERE query('address:Lane OR address:Street')
```
The results contain addresses that contain "Lane" or "Street":
| account_number | address
:--- | :---
1 | 880 Holmes Lane
6 | 671 Bristol Street
13 | 789 Madison Street
## Match phrase
The `MATCHPHRASEQUERY` function is a synonym for [`MATCH_PHRASE`]({{site.url}}{{site.baseurl}}/search-plugins/sql/full-text#match-phrase).
### Syntax
```sql
matchphrasequery(query_expression, field_expression[, option=<option_value>]*)
```
You can specify the following options in any order:
- `analyzer`
- `boost`
- `slop`
## Score query
To return a relevance score along with every matching document, use the `SCORE`, `SCOREQUERY`, or `SCORE_QUERY` functions.
### Syntax
The `SCORE` function expects two arguments. The first argument is the [`MATCH_QUERY`](#match-query) expression. The second argument is an optional floating-point number to boost the score (the default value is 1.0):
```sql
SCORE(match_query_expression, score)
SCOREQUERY(match_query_expression, score)
SCORE_QUERY(match_query_expression, score)
```
### Example
The following example uses the `SCORE` function to boost the documents' scores:
```sql
SELECT account_number, address, _score
FROM accounts
WHERE SCORE(MATCH_QUERY(address, 'Lane'), 0.5) OR
SCORE(MATCH_QUERY(address, 'Street'), 100)
ORDER BY _score
```
The results contain matches with corresponding scores:
| account_number | address | score
:--- | :--- | :---
1 | 880 Holmes Lane | 0.5
6 | 671 Bristol Street | 100
13 | 789 Madison Street | 100
## Wildcard query
To search for documents that match a wildcard pattern, use the `WILDCARDQUERY` or `WILDCARD_QUERY` functions.
### Syntax
```sql
wildcardquery(field_expression, query_expression[, boost=<value>])
wildcard_query(field_expression, query_expression[, boost=<value>])
```
### Example
The following example uses a wildcard query:
```sql
SELECT account_number, address
FROM accounts
WHERE wildcard_query(address, '*Holmes*');
```
The results contain documents that match the wildcard expression:
| account_number | address
:--- | :---
1 | 880 Holmes Lane
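
The syntax above also accepts an optional `boost` argument. The following sketch assumes that `boost` takes a floating-point value, as in the `SCORE` function, and returns `_score` so that the effect of the boost is visible; the pattern and the value `2.0` are illustrative.

```sql
-- Sketch: boost the relevance of documents whose address matches *Street*
-- and list the highest-scoring documents first.
SELECT account_number, address, _score
FROM accounts
WHERE wildcard_query(address, '*Street*', boost=2.0)
ORDER BY _score DESC;
```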

@ -0,0 +1,76 @@
---
layout: default
title: SQL
parent: SQL and PPL
nav_order: 4
has_children: true
has_toc: false
redirect_from:
- /search-plugins/sql/sql
---
# SQL
## Workbench
The easiest way to get familiar with the SQL plugin is to use **Query Workbench** in OpenSearch Dashboards to test various queries. To learn more, see [Workbench]({{site.url}}{{site.baseurl}}/search-plugins/sql/workbench/).
![OpenSearch Dashboards SQL UI plugin]({{site.url}}{{site.baseurl}}/images/sql.png)
## SQL and OpenSearch terminology
Here's how core SQL concepts map to OpenSearch:
SQL | OpenSearch
:--- | :---
Table | Index
Row | Document
Column | Field
## REST API
For a complete REST API reference for the SQL plugin, see [SQL/PPL API]({{site.url}}{{site.baseurl}}/search-plugins/sql/sql-ppl-api).
To use the SQL plugin with your own applications, send requests to the `_plugins/_sql` endpoint:
```json
POST _plugins/_sql
{
"query": "SELECT * FROM my-index LIMIT 50"
}
```
You can query multiple indexes by using a comma-separated list:
```json
POST _plugins/_sql
{
"query": "SELECT * FROM my-index1,my-index2,my-index3 LIMIT 50"
}
```
You can also specify an index pattern with a wildcard expression:
```json
POST _plugins/_sql
{
"query": "SELECT * FROM my-index* LIMIT 50"
}
```
To run the preceding query from the command line, use the [curl](https://curl.haxx.se/) command:
```bash
curl -XPOST https://localhost:9200/_plugins/_sql -u 'admin:admin' -k -H 'Content-Type: application/json' -d '{"query": "SELECT * FROM my-index* LIMIT 50"}'
```
You can specify the [response format]({{site.url}}{{site.baseurl}}/search-plugins/sql/response-formats) as JDBC, standard OpenSearch JSON, CSV, or raw. By default, queries return data in JDBC format. The following query sets the format to JSON:
```json
POST _plugins/_sql?format=json
{
"query": "SELECT * FROM my-index LIMIT 50"
}
```
See the rest of this guide for more information about request parameters, settings, supported operations, and tools.

@ -2,6 +2,7 @@
layout: default
title: JDBC Driver
parent: SQL
grand_parent: SQL and PPL
nav_order: 71
---
@ -10,3 +11,7 @@ nav_order: 71
The Java Database Connectivity (JDBC) driver lets you integrate OpenSearch with your favorite business intelligence (BI) applications.
For information on downloading and using the JAR file, see [the SQL repository on GitHub](https://github.com/opensearch-project/sql/tree/master/sql-jdbc).
## Connecting to Tableau
To connect to Tableau, follow the detailed instructions in the [GitHub repository](https://github.com/opensearch-project/sql/blob/main/bi-connectors/TableauConnector/README.md).

@ -2,6 +2,7 @@
layout: default
title: Metadata Queries
parent: SQL
grand_parent: SQL and PPL
nav_order: 9
---

@ -2,6 +2,7 @@
layout: default
title: ODBC Driver
parent: SQL
grand_parent: SQL and PPL
nav_order: 72
---
@ -9,9 +10,7 @@ nav_order: 72
The Open Database Connectivity (ODBC) driver is a read-only ODBC driver for Windows and macOS that lets you connect business intelligence (BI) and data visualization applications like [Tableau](https://github.com/opensearch-project/sql/blob/main/sql-odbc/docs/user/tableau_support.md), [Microsoft Excel](https://github.com/opensearch-project/sql/blob/main/sql-odbc/docs/user/microsoft_excel_support.md), and [Power BI](https://github.com/opensearch-project/sql/blob/main/sql-odbc/docs/user/power_bi_support.md) to the SQL plugin.
For information on downloading and using the driver, see [the SQL repository on GitHub](https://github.com/opensearch-project/sql/tree/main/sql-odbc).
{% comment %}
For information on downloading and using the driver, see [the SQL repository on GitHub](https://github.com/opensearch-project/sql/tree/main/sql-odbc).
## Specifications
@ -23,8 +22,8 @@ The following operating systems are supported:
Operating System | Version
:--- | :---
Windows | Windows 10
macOS | Catalina 10.15.4 and Mojave 10.14.6
Windows | Windows 10, Windows 11
macOS | Catalina 10.15.4, Mojave 10.14.6, Big Sur 11.6.7, Monterey 12.4
## Concepts
@ -46,13 +45,13 @@ To install the driver, download the bundled distribution installer from [here](h
The installer is unsigned and shows a security dialog. Choose **More info** and **Run anyway**.
1. Choose **Next** to proceed with the installation.
2. Choose **Next** to proceed with the installation.
1. Accept the agreement, and choose **Next**.
3. Accept the agreement, and choose **Next**.
1. The installer comes bundled with documentation and useful resources files to connect with various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Next**.
4. The installer comes bundled with documentation and useful resource files to connect to various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Next**.
1. Choose **Install** and **Finish**.
5. Choose **Install** and **Finish**.
The following connection information is set up as part of the default DSN:
@ -73,13 +72,13 @@ Before installing the ODBC Driver on macOS, install the iODBC Driver Manager.
The installer is unsigned and shows a security dialog. Right-click on the installer and choose **Open**.
1. Choose **Continue** several times to proceed with the installation.
2. Choose **Continue** several times to proceed with the installation.
1. Choose the **Destination** to install the driver files.
3. Choose the **Destination** to install the driver files.
1. The installer comes bundled with documentation and useful resources files to connect with various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Continue**.
4. The installer comes bundled with documentation and useful resource files to connect to various BI tools (for example, a `.tdc` file for Tableau). You can choose to keep or remove these resources. Choose **Continue**.
1. Choose **Install** and **Close**.
5. Choose **Install** and **Close**.
Currently, the DSN is not set up as part of the installation and needs to be configured manually. First, open `iODBC Administrator`:
@ -90,19 +89,19 @@ sudo /Applications/iODBC/iODBC\ Administrator64.app/Contents/MacOS/iODBC\ Admini
This command gives the application permissions to save the driver and DSN configurations.
1. Choose the **ODBC Drivers** tab.
1. Choose **Add a Driver** and fill in the following details:
2. Choose **Add a Driver** and fill in the following details:
- **Description of the Driver**: Enter the driver name that you used for the ODBC connection (for example, OpenSearch SQL ODBC Driver).
- **Driver File Name**: Enter the path to the driver file (default: `<driver-install-dir>/bin/libopensearchsqlodbc.dylib`).
- **Setup File Name**: Enter the path to the setup file (default: `<driver-install-dir>/bin/libopensearchsqlodbc.dylib`).
1. Choose the user driver.
1. Choose **OK** to save the options.
1. Choose the **User DSN** tab.
1. Select **Add**.
1. Choose the driver that you added above.
1. For **Data Source Name (DSN)**, enter the name of the DSN used to store connection options (for example, OpenSearch SQL ODBC DSN).
1. For **Comment**, add an optional comment.
1. Add key-value pairs by using the `+` button. We recommend the following options for a default local OpenSearch installation:
3. Choose the user driver.
4. Choose **OK** to save the options.
5. Choose the **User DSN** tab.
6. Select **Add**.
7. Choose the driver that you added above.
8. For **Data Source Name (DSN)**, enter the name of the DSN used to store connection options (for example, OpenSearch SQL ODBC DSN).
9. For **Comment**, add an optional comment.
10. Add key-value pairs by using the `+` button. We recommend the following options for a default local OpenSearch installation:
- **Host**: `localhost` - OpenSearch server endpoint
- **Port**: `9200` - The server port
- **Auth**: `NONE` - The authentication mode
@ -111,8 +110,8 @@ This command gives the application permissions to save the driver and DSN config
- **ResponseTimeout**: `10` - The number of seconds to wait for a response from the server
- **UseSSL**: `0` - Do not use SSL for connections
1. Choose **OK** to save the DSN configuration.
1. Choose **OK** to exit the iODBC Administrator.
11. Choose **OK** to save the DSN configuration.
12. Choose **OK** to exit the iODBC Administrator.
## Customizing the ODBC driver
@ -166,13 +165,12 @@ Option | Description | Type | Default
Option | Description | Type | Default
:--- | :--- | :--- | :---
`LogLevel` | Severity level for driver logs. | one of `ES_OFF`, `ES_FATAL`, `ES_ERROR`, `ES_INFO`, `ES_DEBUG`, `ES_TRACE`, `ES_ALL` | `ES_WARNING`
`LogLevel` | Severity level for driver logs. | `LOG_OFF`, `LOG_FATAL`, `LOG_ERROR`, `LOG_INFO`, `LOG_DEBUG`, `LOG_TRACE`, or `LOG_ALL` | `LOG_WARNING`
`LogOutput` | Location for storing driver logs. | `string` | `WIN: C:\`, `MAC: /tmp`
You need administrative privileges to change the logging options.
{: .note }
## Connecting to Tableau
Prerequisites:
@ -183,13 +181,14 @@ Pre-requisites:
1. Start Tableau. Under the **Connect** section, go to **To a Server** and choose **Other Databases (ODBC)**.
1. In the **DSN drop-down**, select the OpenSearch DSN you set up in the previous set of steps. The options you added will be automatically filled into the **Connection Attributes**.
2. In the **DSN drop-down**, select the OpenSearch DSN you set up in the previous set of steps. The options you added will be automatically filled in under the **Connection Attributes**.
1. Select **Sign In**. After a few seconds, Tableau connects to your OpenSearch server. Once connected, you will directed to **Datasource** window. The **Database** will be already populated with name of the OpenSearch cluster.
3. Select **Sign In**. After a few seconds, Tableau connects to your OpenSearch server. Once connected, you will be directed to the **Datasource** window. The **Database** will already be populated with the name of the OpenSearch cluster.
To list all the indices, click the search icon under **Table**.
1. Start playing with data by dragging table to connection area. Choose **Update Now** or **Automatically Update** to populate table data.
4. Start experimenting with data by dragging the table to the connection area. Choose **Update Now** or **Automatically Update** to populate the table data.
See more detailed instructions in the [GitHub repository](https://github.com/opensearch-project/sql/blob/main/sql-odbc/docs/user/tableau_support.md).
### Troubleshooting
@ -203,4 +202,6 @@ This is most likely due to OpenSearch server not running on **host** and **post*
Confirm that **host** and **port** are correct and that the OpenSearch server is running with the OpenSearch SQL plugin.
Also make sure that the `.tdc` file downloaded with the installer is copied to the `<user_home_directory>/Documents/My Tableau Repository/Datasources` directory.
{% endcomment %}
## Connecting to Microsoft Power BI
Follow the [installation instructions](https://github.com/opensearch-project/sql/blob/main/bi-connectors/PowerBIConnector/README.md) and the [configuration instructions](https://github.com/opensearch-project/sql/blob/main/bi-connectors/PowerBIConnector/power_bi_support.md) published in the GitHub repository.

@ -2,7 +2,8 @@
layout: default
title: JSON Support
parent: SQL
nav_order: 7
grand_parent: SQL and PPL
nav_order: 8
---
# JSON Support

@ -1,8 +1,8 @@
---
layout: default
title: Troubleshooting
parent: SQL
nav_order: 17
parent: SQL and PPL
nav_order: 88
---
# Troubleshooting

@ -2,6 +2,7 @@
layout: default
title: Query Workbench
parent: SQL
grand_parent: SQL and PPL
nav_order: 1
---