SQL: Doc on syntax (identifiers in particular) (#38662)

Add section on syntax, identifiers and literals and on single vs double quotes. (cherry picked from commit aafdb598082e451f36294bd174d0887a276d8c7f)
2025-03-25 01:19:02 +00:00 · 2019-02-15 15:24:03 +02:00 · 2019-02-15 15:24:03 +02:00 · 9357c17288
commit 9357c17288
parent eb7da4b90d
10 changed files with 269 additions and 16 deletions
--- a/docs/reference/sql/appendix/syntax-reserved.asciidoc
+++ b/docs/reference/sql/appendix/syntax-reserved.asciidoc
@ -5,7 +5,7 @@

 Table with reserved keywords that need to be quoted. Also provide an example to make it more obvious.

-The following table lists all of the keywords that are reserved in Presto,
+The following table lists all of the keywords that are reserved in {es-sql},
 along with their status in the SQL standard. These reserved keywords must
 be quoted (using double quotes) in order to be used as an identifier, for example:

@ -31,43 +31,65 @@ s|SQL-92
 |`BETWEEN`                    |reserved      |reserved
 |`BY`                         |reserved      |reserved
 |`CAST`                       |reserved      |reserved
+|`CATALOG`                    |reserved      |reserved
+|`CONVERT`                    |reserved      |reserved
+|`CURRENT_DATE`               |reserved      |reserved
+|`CURRENT_TIMESTAMP`          |reserved      |reserved
+|`DAY`                        |reserved      |reserved
+|`DAYS`                       |              |
 |`DESC`                       |reserved      |reserved
 |`DESCRIBE`                   |reserved      |reserved
 |`DISTINCT`                   |reserved      |reserved
+|`ESCAPE`                     |reserved      |reserved
 |`EXISTS`                     |reserved      |reserved
 |`EXPLAIN`                    |reserved      |reserved
 |`EXTRACT`                    |reserved      |reserved
 |`FALSE`                      |reserved      |reserved
+|`FIRST`                      |reserved      |reserved
 |`FROM`                       |reserved      |reserved
 |`FULL`                       |reserved      |reserved
 |`GROUP`                      |reserved      |reserved
 |`HAVING`                     |reserved      |reserved
+|`HOUR`                       |reserved      |reserved
+|`HOURS`                      |              |
 |`IN`                         |reserved      |reserved
 |`INNER`                      |reserved      |reserved
+|`INTERVAL`                   |reserved      |reserved
 |`IS`                         |reserved      |reserved
 |`JOIN`                       |reserved      |reserved
 |`LEFT`                       |reserved      |reserved
 |`LIKE`                       |reserved      |reserved
 |`LIMIT`                      |reserved      |reserved
 |`MATCH`                      |reserved      |reserved
+|`MINUTE`                     |reserved      |reserved
+|`MINUTES`                    |              |
+|`MONTH`                      |reserved      |reserved
 |`NATURAL`                    |reserved      |reserved
-|`NO`                         |reserved      |reserved
 |`NOT`                        |reserved      |reserved
 |`NULL`                       |reserved      |reserved
+|`NULLS`                      |              |
 |`ON`                         |reserved      |reserved
 |`OR`                         |reserved      |reserved
 |`ORDER`                      |reserved      |reserved
 |`OUTER`                      |reserved      |reserved
 |`RIGHT`                      |reserved      |reserved
+|`RLIKE`                      |              |
+|`QUERY`                      |              |
+|`SECOND`                     |reserved      |reserved
+|`SECONDS`                    |              |
 |`SELECT`                     |reserved      |reserved
 |`SESSION`                    |              |reserved
 |`TABLE`                      |reserved      |reserved
+|`TABLES`                     |              |
 |`THEN`                       |reserved      |reserved
 |`TO`                         |reserved      |reserved
 |`TRUE`                       |reserved      |reserved
+|`TYPE`                       |              |
 |`USING`                      |reserved      |reserved
 |`WHEN`                       |reserved      |reserved
 |`WHERE`                      |reserved      |reserved
 |`WITH`                       |reserved      |reserved
+|`YEAR`                       |reserved      |reserved
+|`YEARS`                      |              |

 |===
--- a/docs/reference/sql/index.asciidoc
+++ b/docs/reference/sql/index.asciidoc
@ -12,9 +12,13 @@
 [partintro]
 --

-X-Pack includes a SQL feature to execute SQL against Elasticsearch
+X-Pack includes a SQL feature to execute SQL queries against {es}
 indices and return results in tabular format.

+The following chapters cover everything from usage to syntax and drivers.
+Experienced users or those in a hurry might want to jump directly to
+the list of SQL <<sql-commands, commands>> and <<sql-functions, functions>>.
+
 <<sql-overview, Overview>>::
    Overview of {es-sql} and its features.
 <<sql-getting-started, Getting Started>>::
@ -22,22 +26,19 @@ indices and return results in tabular format.
 <<sql-concepts, Concepts and Terminology>>::
    Language conventions across SQL and {es}.
 <<sql-security,Security>>::
-    Securing {es-sql} and {es}.
+    Secure {es-sql} and {es}.
 <<sql-rest,REST API>>::
-    Accepts SQL in a JSON document, executes it, and returns the
-    results.
+    Execute SQL in JSON format over REST.
 <<sql-translate,Translate API>>::
-    Accepts SQL in a JSON document and translates it into a native
-    Elasticsearch query and returns that.
+    Translate SQL in JSON format to a native {es} query.
 <<sql-cli,CLI>>::
-    Command-line application that connects to {es} to execute
-    SQL and print tabular results.
+    Command-line application for executing SQL against {es}.
 <<sql-jdbc,JDBC>>::
-    A JDBC driver for {es}.
+    JDBC driver for {es}.
 <<sql-odbc,ODBC>>::
-    An ODBC driver for {es}.
+    ODBC driver for {es}.
 <<sql-client-apps,Client Applications>>::
-    Documentation for configuring various SQL/BI tools with {es-sql}.
+    Set up various SQL/BI tools with {es-sql}.
 <<sql-spec,SQL Language>>::
    Overview of the {es-sql} language, such as supported data types, commands and
    syntax.
--- a/docs/reference/sql/language/index.asciidoc
+++ b/docs/reference/sql/language/index.asciidoc
@ -3,12 +3,14 @@
 [[sql-spec]]
 == SQL Language

-This chapter describes the SQL semantics supported in X-Pack namely:
+This chapter describes the supported SQL syntax and semantics, namely:

-<<sql-data-types>>:: Data types
+<<sql-lexical-structure>>:: Lexical structure
 <<sql-commands>>:: Commands
+<<sql-data-types>>:: Data types
 <<sql-index-patterns>>:: Index patterns

+include::syntax/lexic/index.asciidoc[]
+include::syntax/commands/index.asciidoc[]
 include::data-types.asciidoc[]
-include::syntax/index.asciidoc[]
 include::index-patterns.asciidoc[]
--- a/docs/reference/sql/language/syntax/commands/describe-table.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/describe-table.asciidoc
--- a/docs/reference/sql/language/syntax/commands/index.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/index.asciidoc
--- a/docs/reference/sql/language/syntax/commands/select.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/select.asciidoc
--- a/docs/reference/sql/language/syntax/commands/show-columns.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/show-columns.asciidoc
--- a/docs/reference/sql/language/syntax/commands/show-functions.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/show-functions.asciidoc
--- a/docs/reference/sql/language/syntax/commands/show-tables.asciidoc
+++ b/docs/reference/sql/language/syntax/commands/show-tables.asciidoc
--- a/docs/reference/sql/language/syntax/lexic/index.asciidoc
+++ b/docs/reference/sql/language/syntax/lexic/index.asciidoc
@ -0,0 +1,228 @@
+[role="xpack"]
+[testenv="basic"]
+[[sql-lexical-structure]]
+== Lexical Structure
+
+This section covers the major lexical structure of SQL, which, for the most part, resembles that of ANSI SQL; hence low-level details are not discussed in depth.
+
+{es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of the input stream.
+
+A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (such as a space or tab), though in some cases where there is no ambiguity (typically due to a character symbol) the whitespace is not needed; for readability, however, omitting it should be avoided.
+
+[[sql-syntax-keywords]]
+[float]
+=== Key Words
+
+Take the following example:
+
+[source, sql]
+----
+SELECT * FROM table
+----
+
+This query has four tokens: `SELECT`, `\*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM`, are __key words__, meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_, meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc.
+
+As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>.
+Do note that key words are case-insensitive, meaning the previous example can be written as:
+
+[source, sql]
+----
+select * fRoM table;
+----
+
+Identifiers, however, are not: as {es} is case sensitive, {es-sql} uses the received value verbatim.
+
+To help differentiate between the two, throughout the documentation the SQL key words are upper-cased, a convention we find increases readability and thus recommend to others.
+
+[[sql-syntax-identifiers]]
+[float]
+=== Identifiers
+
+Identifiers can be of two types: __quoted__ and __unquoted__:
+
+[source, sql]
+----
+SELECT ip_address FROM "hosts-*"
+----
+
+This query has two identifiers, `ip_address` and `hosts-\*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words, it can be used verbatim. `hosts-*`, on the other hand, cannot, as it clashes with `-` (minus operation) and `*`; hence the double quotes.
+
+Another example:
+
+[source, sql]
+----
+SELECT "from" FROM "<logstash-{now/d}>"
+----
+
+The first identifier, `from`, needs to be quoted as otherwise it clashes with the `FROM` key word (which is case insensitive and thus can be written as `from`), while the second identifier, which uses {es} <<date-math-index-names>>, would otherwise have confused the parser.
+
+Hence, in general, and *especially* when dealing with user input, it is *highly* recommended to use quotes for identifiers. It adds minimal overhead to your queries and in return offers clarity and disambiguation.
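
The quoting recommendation above can be sketched on the client side. This is a minimal illustration in Python, not part of {es-sql}: `quote_identifier` is a hypothetical helper name, and because this section does not say how a double quote inside an identifier is handled, the sketch simply rejects such names rather than guessing at an escape rule.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes so it cannot clash with key words.

    Hypothetical helper: names containing a double quote are rejected,
    since the escaping rule for them is not covered in this section.
    """
    if not name:
        raise ValueError("identifier must not be empty")
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


def build_select(column: str, table: str) -> str:
    """Assemble a SELECT statement with both identifiers quoted."""
    return "SELECT " + quote_identifier(column) + " FROM " + quote_identifier(table)
```

For instance, `build_select("from", "hosts-*")` yields `SELECT "from" FROM "hosts-*"`, the shape used in the examples above.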
+
+[[sql-syntax-literals]]
+[float]
+=== Literals (Constants)
+
+{es-sql} supports two kinds of __implicitly-typed__ literals: strings and numbers.
+
+[[sql-syntax-string-literals]]
+[float]
+==== String Literals
+
+A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`. 
+To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`. 
+
+NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`).
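
As an illustration of the escaping rule above, here is a small Python sketch that builds a string literal by repeating each single quote. The helper name `string_literal` is hypothetical, and the sketch covers only the single-quote rule described here.

```python
def string_literal(value: str) -> str:
    """Render a Python string as a SQL string literal.

    Each single quote in the value is repeated (' becomes ''), and the
    result is bounded by single quotes, as described above.
    """
    return "'" + value.replace("'", "''") + "'"
```

With this helper, `string_literal("Captain EO's Voyage")` produces `'Captain EO''s Voyage'`, matching the example in the text.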
+
+[[sql-syntax-numeric-literals]]
+[float]
+==== Numeric Literals
+
+Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`:
+
+[source, sql]
+----
+1969    -- integer notation
+3.14    -- decimal notation
+.1234   -- decimal notation starting with decimal point
+4E5     -- scientific notation (with exponent marker)
+1.2e-3  -- scientific notation with decimal point
+----
+
+Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit; otherwise their type is `long` (or `BIGINT` in ANSI SQL types).
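
The typing rule above can be mimicked with a short Python sketch. It is an approximation under stated assumptions: `literal_type` is a hypothetical name, "fits" is taken to mean the signed 32-bit range, and exponent notation is rejected because the rule above speaks only of the decimal point.

```python
# Assumption: `integer` means a signed 32-bit value.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def literal_type(text: str) -> str:
    """Guess the type of a numeric literal given as text."""
    if "e" in text.lower():
        # Exponent forms are deliberately left out of this sketch.
        raise ValueError("exponent notation is not covered by this sketch")
    if "." in text:
        float(text)  # raises ValueError if the text is not a valid number
        return "double"
    value = int(text)  # raises ValueError if the text is not a whole number
    return "integer" if INT_MIN <= value <= INT_MAX else "long"
```

Under these assumptions, `1969` is classified as `integer`, `3.14` and `.1234` as `double`, and `3000000000` as `long`.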
+
+[[sql-syntax-generic-literals]]
+[float]
+==== Generic Literals
+
+When dealing with a literal of an arbitrary type, one creates the object by casting, typically, its string representation to the desired type. This can be achieved through the dedicated <<sql-operators-cast, cast operator>> and <<sql-functions-type-conversion, functions>>:
+
+[source, sql]
+----
+123::LONG                                   -- cast 123 to a LONG
+CAST('1969-05-13T12:34:56' AS TIMESTAMP)    -- cast the given string to datetime
+CONVERT('10.0.0.1', IP)                     -- cast '10.0.0.1' to an IP    
+----
+
+Do note that {es-sql} provides functions that out of the box return popular literals (like `E()`) or provide dedicated parsing for certain strings.
+
+[[sql-syntax-single-vs-double-quotes]]
+[float]
+=== Single vs Double Quotes
+
+It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meanings and *cannot* be used interchangeably.
+Single quotes are used to declare a <<sql-syntax-string-literals, string literal>> while double quotes are used for <<sql-syntax-identifiers, identifiers>>.
+
+To wit:
+
+[source, sql]
+----
+SELECT "first_name" <1>
+  FROM "musicians"  <1>
+ WHERE "last_name"  <1>
+     = 'Carroll'    <2>
+----
+
+<1> Double quotes `"` used for column and table identifiers
+<2> Single quotes `'` used for a string literal
+
+[[sql-syntax-special-chars]]
+[float]
+=== Special characters
+
+A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:
+
+
+[cols="^m,^15"]
+
+|===
+
+s|Char
+s|Description
+
+|* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can also be used as an argument to some aggregate functions.
+|, | Commas are used to enumerate the elements of a list.
+|. | Used in numeric constants or to separate identifier qualifiers (catalog, table, column names, etc.).
+|()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence.
+|===
+
+[[sql-syntax-operators]]
+[float]
+=== Operators
+
+Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parentheses need to be used to enforce a different precedence.
+
+The following table indicates the supported operators and their precedence (highest to lowest):
+
+[cols="^2m,^,^3"]
+
+|===
+
+s|Operator/Element
+s|Associativity
+s|Description
+
+|.
+|left
+|qualifier separator
+
+|::
+|left
+|PostgreSQL-style type cast
+
+|+ - 
+|right
+|unary plus and minus (numeric literal sign)
+
+|* / %
+|left
+|multiplication, division, modulo
+
+|+ -
+|left
+|addition, subtraction
+
+|BETWEEN IN LIKE
+|
+|range containment, string matching
+
+|< > <= >= = <=> <> !=
+|
+|comparison
+
+|NOT
+|right
+|logical negation
+
+|AND
+|left
+|logical conjunction
+
+|OR
+|left
+|logical disjunction
+
+|===
+
+
+[[sql-syntax-comments]]
+[float]
+=== Comments
+
+{es-sql} allows comments, which are sequences of characters ignored by the parser.
+
+Two styles are supported:
+
+Single Line:: Comments start with a double dash `--` and continue until the end of the line.
+Multi line:: Comments that start with `/\*` and end with `*/` (also known as C-style). 
+
+
+[source, sql]
+----
+-- single line comment
+/* multi
+   line
+   comment
+   that supports /* nested comments */
+   */
+----
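
To make the two comment styles concrete, the following Python sketch removes them from a query string while tracking nesting depth for the C-style form. It illustrates the rules above and is not the {es-sql} parser: `strip_sql_comments` is a hypothetical name, and the sketch does not recognize string literals or quoted identifiers, so comment markers inside quotes would be stripped as well.

```python
def strip_sql_comments(sql: str) -> str:
    """Remove `--` line comments and nested `/* ... */` comments."""
    out = []
    depth = 0  # current nesting level of /* ... */ comments
    i = 0
    n = len(sql)
    while i < n:
        two = sql[i:i + 2]
        if two == "/*":
            depth += 1
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: drop the character
        elif two == "--":
            # Skip to the end of the line but keep the newline itself.
            end = sql.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(sql[i])
            i += 1
    return "".join(out)
```

For example, `strip_sql_comments("SELECT 1 -- one\n/* a /* b */ c */FROM t")` returns `"SELECT 1 \nFROM t"`.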
+