OpenSearch

Commit Graph

Author	SHA1	Message	Date
Marios Trivyzas	f958e9abdc	SQL: Implement scripting inside aggs (#55241 ) (#55371 ) Implement the use of scalar functions inside aggregate functions. This allows for complex expressions inside aggregations, with or without GROUP BY as well as with or without a HAVING clause. e.g.: ``` SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b FROM test GROUP BY b HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5 ``` Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as this is currently not implemented on the Elasticsearch side. Fixes: #29980 Fixes: #36865 Fixes: #37271 (cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)	2020-04-17 12:41:22 +02:00
Marios Trivyzas	d9fd6fc90c	SQL: [Docs] Fix typo Add missing closing "`" Follows: `c2e0552537`	2020-02-12 21:50:57 +01:00
Marios Trivyzas	c2e0552537	SQL: [Docs] Add limitation for sorting on aggs (#52210 ) Add a section to point out that when ordering by an aggregate only plain aggregate functions are allowed, no scalars/operators can be used on top of them. Fixes: #52204 (cherry picked from commit 78a1185549ff7f3229fd2d036567eb2a4f2cf230)	2020-02-12 12:56:06 +01:00
Marios Trivyzas	f373020349	SQL: Fix ORDER BY YEAR() function (#51562 ) Previously, if YEAR() was used as an ORDER BY argument without being wrapped with another scalar (e.g. YEAR(birth_date) + 10), no script ordering was used; instead, the underlying field (e.g. birth_date) was used as a performance optimisation. This works correctly if YEAR() is the only ORDER BY arg, but if further args are used as tie breakers for the ordering, wrong results are produced. This is because two rows with different birth_date values in the same year are not tied, as the underlying ordering is on birth_date and not on YEAR(birth_date), so the following ORDER BY args are ignored. Remove this optimisation for YEAR() to avoid incorrect results in such cases. As a consequence another bug is revealed: scalar functions on top of nested fields produce scripted sorting/filtering which is not yet supported. In such cases no error was thrown but instead all values for such nested fields were null and were passed to the script implementing the sorting/filtering, producing incorrect results. Detect such cases and throw a validation exception. Fixes: #51224 (cherry picked from commit f41efd6753dc3650a7eabb3e07b02b3b32c5704c)	2020-01-30 15:29:36 +01:00
Bogdan Pintea	8c2ab8bb72	SQL:Docs: add the PIVOT clause to SELECT section (#49129 ) The PR adds the documentation on the PIVOT clause. (cherry picked from commit a55b36065e6496c44b6e3191296931d477a8e5f5)	2019-11-20 18:21:06 +01:00
Andrei Stefan	2633d11eb7	Switch from using docvalue_fields to extracting values from _source (#44062 ) (#44804 ) * Switch from using docvalue_fields to extracting values from _source where applicable. Doing this means parsing the _source and handling number parsing the same way Elasticsearch does when it indexes a document. * This also introduces a minor limitation: alias-type fields that are NOT part of a tree of sub-fields can no longer be retrieved. The field_caps API doesn't shed any light on whether a field is an alias, and at _source parsing time there is no way to know if a root field is an alias or not. Fields of the type "a.b.c.alias" can be extracted from docvalue_fields, only if the field they point to can be extracted from docvalue_fields. Also, not all fields in a hierarchy of fields can be evaluated as being an alias. (cherry picked from commit 8bf8a055e38f00df5f49c8d97f632f69d6e00c2c)	2019-07-25 10:02:41 +03:00
Marios Trivyzas	9cd89c3453	SQL: Increase hard limit for sorting on aggregates (#43220 ) To be consistent with the `search.max_buckets` default setting, set the hard limit of the PriorityQueue used for in memory sorting, when sorting on an aggregate function, to 10000. Fixes: #43168 (cherry picked from commit 079e012fdea68ea0a7daae078359495047e9c407)	2019-06-14 13:51:38 +02:00
Igor Motov	70ea3cf847	SQL: Add initial geo support (#42031 ) (#42135 ) Adds an initial limited implementation of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort concentrates on the SQL option, AKA ISO 19125-2. Queries that are supported as a result of this initial implementation: Metadata commands - `DESCRIBE table` - returns the correct column types `GEOMETRY` for geo shapes and geo points. - `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions - `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly. Returning geoshapes and geopoints from elasticsearch - `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console. - `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;` - returns the geoshapes and geopoints in their WKT representation; Using geopoints to elasticsearch - The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these functions are translated into scripts. These functions can be used in the SELECT clause for both geopoints and geoshapes. - `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText('POINT(1 2)'), point) < 10;` - returns all records for which `point` is located within 10m from the `POINT(1 2)`. In this case the WHERE clause is translated into a range query. Limitations: Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values. Relates to #29872 Backport of #42031	2019-05-14 18:57:12 -05:00
Marios Trivyzas	d5b0badeb7	SQL: Remove CircuitBreaker from parser (#41835 ) The CircuitBreaker was introduced as a means of preventing a `StackOverflowException` during the build of the AST by the parser. The ANTLR4 grammar causes a weird behaviour for a Parser Listener. The `enterEveryRule()` method is often called with a different parsing context than the respective `exitEveryRule()`. This makes it difficult to keep track of the tree's depth, and a custom Map was used in an attempt to match the contexts as they are encountered during `enter` and during `exit` of the rules. This approach had 2 important drawbacks: 1. It's hard to maintain this custom Map as the grammar changes. 2. The CircuitBreaker could often lead to false positives which caused valid queries to return an Exception and prevented them from executing. So, this completely removes the CircuitBreaker, which is replaced by a simple handling of the `StackOverflowException` Fixes: #41471 (cherry picked from commit 1559a8e2dbd729138b52e89b7e80264c9f4ad1e7)	2019-05-07 23:25:37 +03:00
James Rodewig	53702efddd	[DOCS] Add anchors for Asciidoctor migration (#41648 )	2019-04-30 10:20:17 -04:00
Marios Trivyzas	899ed2bf81	SQL: Introduce SQL TIME data type (#39802 ) Support ANSI SQL's TIME type by introducing a runtime-only ES SQL time type. Closes: #38174 (cherry picked from commit 046ccd4cf0a251b2a3ddff6b072ab539a6711900)	2019-04-01 23:57:27 +02:00
Costin Leau	61f49af497	SQL: Spec tests now use classpath discovery (#40388 ) To avoid having to specify each spec by hand (which can miss specs to be added), the test infrastructure now performs classpath discovery so that each added spec is automatically considered. Relates #40358 (cherry picked from commit d0f60b4425c731509aa8ca765d55f563f866ef90)	2019-03-25 15:22:52 +02:00
Costin Leau	076a68007c	SQL: Add multi_value_field_leniency inside FieldHitExtractor (#40113 ) For cases where fields can have multi values, allow the behavior to be customized through a dedicated configuration field. By default this will be enabled on the drivers so that existing datasets work instead of throwing an exception. For regular SQL usage, the behavior is false so that the user is aware of the underlying data. Fix #39700 (cherry picked from commit 2b351571961f172fd59290ee079126bbd081ceaf)	2019-03-18 14:56:03 +02:00
Andrei Stefan	ba44f28340	SQL: ignore UNSUPPORTED fields for JDBC and ODBC modes in 'SYS COLUMNS' (#39518 ) * SYS COLUMNS will skip UNSUPPORTED field types in ODBC and JDBC, as well. NESTED and OBJECT types were already skipped in ODBC mode, now they are skipped in JDBC mode, as well. (cherry picked from commit 9e0df64b2d36c9069dfa506570468f0522c86417)	2019-03-01 15:26:31 +02:00
Costin Leau	5b112b1d9d	SQL: remove beta marker from documentation (#38661 ) (cherry picked from commit fb6e7a30c9eed1e8b83496aaf1efe7e2288f9dd8)	2019-02-10 00:09:58 +02:00
Costin Leau	783c9ed372	SQL: Allow sorting of groups by aggregates (#38042 ) Introduce client-side sorting of groups based on aggregate functions. To allow this, the Analyzer has been extended to push down to underlying Aggregate, aggregate function and the Querier has been extended to identify the case and consume the results in order and sort them based on the given columns. The underlying QueryContainer has been slightly modified to allow a view of the underlying values being extracted as the columns used for sorting might not be requested by the user. The PR also adds minor tweaks, mainly related to tree output. Close #35118	2019-02-02 01:38:25 +02:00
Marios Trivyzas	19dccf8f3e	SQL: [Docs] Add limitation for aggregate functions on scalars (#38186 ) Currently aggregate functions can operate only directly on fields. They cannot be used on top of scalar functions as painless scripting is currently not supported.	2019-02-01 16:13:51 +02:00
Marios Trivyzas	4710a7472f	SQL: Implement FIRST/LAST aggregate functions (#37936 ) FIRST and LAST can be used with one argument and work similarly to MIN and MAX but they are implemented using a Top Hits aggregation and therefore can also operate on keyword fields. When a second argument is provided then they return the first/last value of the first arg when its values are ordered ascending/descending (respectively) by the values of the second argument. Currently because of the usage of a Top Hits aggregation FIRST and LAST cannot be used in the HAVING clause of a GROUP BY query to filter on the results of the aggregation. Closes: #35639	2019-01-31 16:33:05 +02:00
Andrei Stefan	39a072389c	SQL: add sub-selects to the Limitations page (#37012 )	2019-01-07 10:08:51 +02:00
Andrei Stefan	09fa827adc	SQL: documentation improvements and updates (#36918 ) * Added Limitations page * Made the aggregations page follow the common template for functions * Modified all tables to have the first row's cells content centered * Polishing in other various sections	2018-12-21 23:25:54 +02:00

20 Commits