The related issue regarding aggregation queries where some literals
are also selected together with aggregate function has been fixed
with #49570. Add integration tests to verify the behavior.
Relates to: #41411
(cherry picked from commit 9f414a8d05c75e1a9f8250084f6dcd634d5d78d8)
Add unit and integration tests where literals are SELECTed
in combination with GROUP BY and possibly aggregate functions.
Relates to #41411 and #34583
which have been fixed.
(cherry picked from commit b97f1ca12675d6ea4772c60578922fe1cc2409ee)
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
Previously, if YEAR() was used as and ORDER BY argument without being
wrapped with another scalar (e.g. YEAR(birth_date) + 10), no script
ordering was used but instead the underlying field (e.g. birth_date)
was used instead as a performance optimisation. This works correctly if
YEAR() is the only ORDER BY arg but if further args are used as tie
breakers for the ordering wrong results are produced. This is because
2 rows with the different birth_date but on the same year are not tied
as the underlying ordering is on birth_date and not on the
YEAR(birth_date), and the following ORDER BY args are ignored.
Remove this optimisation for YEAR() to avoid incorrect results in
such cases.
As a consequence another bug is revealed: scalar functions on top
of nested fields produce scripted sorting/filtering which is not yet
supported. In such cases no error was thrown but instead all values for
such nested fields were null and were passed to the script implementing
the sorting/filtering, producing incorrect results.
Detect such cases and throw a validation exception.
Fixes: #51224
(cherry picked from commit f41efd6753dc3650a7eabb3e07b02b3b32c5704c)
In the SQL with SSL tests, we need to find the interfaces that are up,
are loopback devices, or have a loopback address. If we check if the
device is up first, we can run into situations where the device is a
virtual ethernet device that might have disappeared between us seeing
the device, and checking if it is up. By first checking if the device is
a loopback device or it has a loopback address, then we can avoid
checking if the device is up except for loopback devices and therefore
we can avoid the disappearing virtual ethernet device problem.
* Introduce reusable QL plugin for SQL and EQL (#50815)
Extract reusable functionality from SQL into its own dedicated project QL.
Implemented as a plugin, it provides common components across SQL and the upcoming EQL.
While this commit is fairly large, for the most part it's just a big file move from sql package to the newly introduced ql.
(cherry picked from commit ec1ac0d463bfa12a02c8174afbcdd6984345e8b4)
* SQL: Fix incomplete registration of geo NamedWritables
(cherry picked from commit e295763686f9592976e551e504fdad1d2a3a566d)
* QL: Extend NodeSubclass to read classes from jars (#50866)
As the test classes are spread across more than one project, the Gradle
classpath contains not just folders but also jars.
This commit allows the test class to explore the archive content and
load matching classes from said source.
(cherry picked from commit 25ad74928afcbf286dc58f7d430491b0af662f04)
* QL: Remove implicit conversion inside Literal (#50962)
Literal constructor makes an implicit conversion for each value given
which turns out has some subtle side-effects.
Improve MathProcessors to preserve numeric type where possible
Fix bug on issue compatibility between date and intervals
Preserve the source when folding inside the Optimizer
(cherry picked from commit 9b73e225b0aa07a23859550fb117bae571a2b672)
* QL: Refactor DataType for pluggability (#51328)
Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced
As part of the process:
- static type conversion in QL package (such as Literal) has been
removed
- several utility classes have been broken into base (QL) and extended
(SQL) parts based on type awareness
- operators (+,-,/,*) are
- due to extensibility, serialization of arithmetic operation has been
slightly changed and pushed down to the operator executor itself
(cherry picked from commit aebda81b30e1563b877a8896309fd50633e0b663)
* Compilation fixes for 7.x
* REST PreparedStatement-like query parameters are now supported in the form of an array of non-object, non-array values where ES SQL parser will try to infer the data type of the value being passed as parameter.
(cherry picked from commit 45b8bf619aecb1c03d7bc0cf06928dcc36005a66)
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
Hide the `.async-search-*` in Security by making it a restricted index namespace.
The namespace is hard-coded.
To grant privileges on restricted indices, one must explicitly toggle the
`allow_restricted_indices` flag in the indices permission in the role definition.
As is the case with any other index, if a certain user lacks all permissions for an
index, that index is effectively nonexistent for that user.
To recap, Attributes form the properties of a derived table.
Each LogicalPlan has Attributes as output since each one can be part of
a query and as such its result are sent to its consumer.
This change essentially removes the name id comparison so any changes
applied to existing expressions should work as long as the said
expressions are semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.
By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent a reference from getting out of sync with its source due to
optimizations.
Essentially going forward there are only 3 types of NamedExpressions:
Alias - user define (implicit or explicit) name
FieldAttribute - field backed by Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute. Typically the Attribute of an Alias.
* Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.
* Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.
* Side-effect, simplify the rules as the state for InnerAggs doesn't
have to be contained anymore.
* Improve ResolveMissingRef rule to handle references to named
non-singular expression tree against the same expression used up the
tree.
#49693 backport to 7.x
(cherry picked from commit 5d095e2173bcbf120f534a6f2a584185a7879b57)
Some extended testing with MS-SQL server and H2 (which agree on
results) revealed bugs in the implementation of WEEK related extraction
and diff functions.
Non-iso WEEK seems to be broken since #48209 because
of the replacement of Calendar and the change in the ISO rules.
ISO_WEEK failed for some edge cases around the January 1st.
DATE_DIFF was previously based on non-iso WEEK extraction which seems
not to be the case.
Fixes: #49376
(cherry picked from commit 54fe7f57289c46bb0905b1418f51a00e8c581560)
Grouping By YEAR() is translated to a histogram aggregation, but
previously if there was a scalar function invloved (e.g.:
`YEAR(date + INTERVAL 2 YEARS)`), there was no proper script created
and the histogram was applied on a field with name: `date + INTERVAL 2 YEARS`
which doesn't make sense, and resulted in null result.
Check the underlying field of YEAR() and if it's a function call
`asScript()` to properly get the painless script on which the histogram
is applied.
Fixes: #49386
(cherry picked from commit 93c37abc943d00d3a14ba08435d118a6d48874c7)
Add TRUNC as alias to already implemented TRUNCATE
numeric function which is the flavour supported by
Oracle and PostgreSQL.
Relates to: #41195
(cherry picked from commit f2aa7f0779bc5cce40cc0c1f5e5cf1a5bb7d84f0)
Previously, CaseProcessor was pre-calculating (called `process()`)
on all the building elements of a CASE/IIF expression, not only the
conditions involved but also the results, as well as the final else result.
In case one of those results had an erroneous calculation
(e.g.: division by zero) this was executed and resulted in
an Exception to be thrown, even if this result was not used because of
the condition guarding it. e.g.:
```
SELECT CASE myField1 = 0 THEN NULL ELSE myField2 / myField1 END
FROM test;
```
Fixes: #49388
(cherry picked from commit dbd169afc98686cae1bc72024fad0ca32b272efd)
Previously, DATEDIFF for minutes and hours was doing a
rounding calculation using all the time fields (secs, msecs/micros/nanos).
Instead it should first truncate the 2 dates to the respective field (mins or hours)
zeroing out all the more detailed time fields and then make the subtraction.
(cherry picked from commit 124cd18e20429e19d52fd8dc383827ea5132d428)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.
Closes#42042
* Introduce binary_format request parameter (binary.format for JDBC) to disable binary
communication between clients (jdbc/odbc) and server.
* for CLI - "binary" command line parameter (or -b) is introduced. Default value is "true".
* binary communication (cbor) is enabled by default
* disabling request parameter introduced for debugging purposes only
(cherry picked from commit f96a5ca61cb9fad9ed59357320af20e669348ce7)
Fix an issue that arises from the use of ExpressionIds as keys in a lookup map
that helps the QueryTranslator to identify the grouping columns. The issue is
that the same expression in different parts of the query (SELECT clause and GROUP BY clause)
ends up with different ExpressionIds so the lookup fails. So, instead of ExpressionIds
use the hashCode() of NamedExpression.
Fixes: #41159Fixes: #40001Fixes: #40240Fixes: #33361Fixes: #46316Fixes: #36074Fixes: #34543Fixes: #37044Fixes: #42041
(cherry picked from commit 3c38ea555984fcd2c6bf9e39d0f47a01b09e7c48)
This commit simplifies and standardizes our usage of the Gradle Shadow
plugin to conform more to plugin conventions. The custom "bundle" plugin
has been removed as it's not necessary and performs the same function
as the Shadow plugin's default behavior with existing configurations.
Additionally, this removes unnecessary creation of a "nodeps" artifact,
which is unnecessary because by default project dependencies will in
fact use the non-shadowed JAR unless explicitly depending on the
"shadow" configuration.
Finally, we've cleaned up the logic used for unit testing, so we are
now correctly testing against the shadow JAR when the plugin is applied.
This better represents a real-world scenario for consumers and provides
better test coverage for incorrectly declared dependencies.
(cherry picked from commit 3698131109c7e78bdd3a3340707e1c7b4740d310)
Previously, the safety check for the 2nd argument of the DateAddProcessor was
restricting it to Integer which was wrong since we allow all non-rational
numbers, so it's changed to a Number check as it's done in other cases.
Enhanced some tests regarding the check for an integer (non-rational
argument).
(cherry picked from commit 0516b6eaf5eb98fa5bd087c3fece80139a6b118e)
* Convert RunTask to use testclusers, remove ClusterFormationTasks
This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.
With this we can now remove ClusterFormationTasks.
DATE_PART(<datetime unit>, <date/datetime>) is a function that allows
the user to extract the specified unit from a date/datetime field
similar to the EXTRACT (<datetime unit> FROM <date/datetime>) but
with different names and aliases for the units and it also provides more
options like `DATE_PART('tzoffset', datetimeField)`.
Implemented following the SQL server's spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/datepart-transact-sql?view=sql-server-2017
with the difference that the <datetime unit> argument is either a
literal single quoted string or gets a value from a table field, whereas
in SQL server keywords are used (unquoted identifiers) and it's not
possible to use a value coming for a table column.
Closes: #46372
(cherry picked from commit ead743d3579eb753fd314d4a58fae205e465d72e)
Add examples of failures for both sql and csv integeration
tests and instructions on how to mute them.
(cherry picked from commit 591bba46516d770f5fc95a4c536dd7448b74dd49)
When an integration test fails before the assertion of the results it's
missing information, like the file name and the line in the file where
the test resides.
(cherry picked from commit 683dc7213311d13c81e06829e08f3f9f80ebf73a)
Previously, if a column (field, scalar, alias) appeared more than once in the
SELECT list, the value was returned only once (1st appearance) in each row.
Fixes: #41811
(cherry picked from commit 097ea36581a751605fc4f2088319d954ce35b5d1)
To be on the safe side in terms of use cases also add the alias
DATETRUNC to the DATE_TRUNC function.
Follows: #46473
(cherry picked from commit 9ac223cb1fc66486f86e218fa785a32b61e9bacc)
In some cases, the fetch size affects the way the groups are returned
causing the last page to go beyond the limit. Add dedicated check to
prevent extra data from being returned.
Fix#47002
(cherry picked from commit f4c29646f097bbd29855300342823ef4cef61c05)
Enables support for Cartesian geometries shape type. We still need to
decide how to handle the distance function since it is currently using
the haversine distance formula and returns results in meters, which
doesn't make any sense for Cartesian geometries.
Closes#46412
Relates to #43644
Add initial PIVOT support for transforming a regular table into a
statistics table around an arbitrary pivoting column:
SELECT * FROM
(SELECT languages, country, salary, FROM mp)
PIVOT (AVG(salary) FOR countries IN ('NL', 'DE', 'ES', 'RO', 'US'))
In the current implementation PIVOT allows only one aggregation however
this restriction is likely to be lifted in the future.
Also not all aggregations are working, in particular MatrixStats are not yet supported.
(cherry picked from commit d91263746a222915c570d4a662ec48c1d6b4f583)
Since the `resolveAllDependencies` task resolves all the congfigurations
it can find, this was not caught by our testing, but it's required to be
configuraed specifically.
We should probably cut-over to the new configurations at some point to
avoid problems like this.
Closeselastic/infra#14580
When encountering only indices with empty mapping, the IndexResolver
throws an exception as it expects to find at least one entry.
This commit fixes this case so that an empty mapping is returned.
Fix#46757
(cherry picked from commit 5f4f5807acb93b5fab36718c092c328977a396b6)
Handle queries with implicit GROUP BY where the aggregation is not in
the projection/SELECT but inside the filter/HAVING such as:
SELECT 1 FROM x HAVING COUNT(*) > 0
The engine now properly identifies the case and handles it accordingly.
Fix#37051
(cherry picked from commit fa53ca05d8219c27079b50b4a5b7aeb220c7cde2)