The SimplifyConditional rule is removing NULL literals from those
functions to simplify their evaluation. This happens in the Optimizer
and a new instance of the conditional function is generated. Previously,
the dataType was not set properly (defaulted to DataType.NULL) for
those new instances and since the resolveType() wasn't called again
it resulted in returning always null.
E.g.:
SELECT COALESCE(null, 'foo', null, 'bar')
COALESCE(null, 'foo', null, 'bar')
-----------------
null
This issue was not visible before because the tests always used an alias
for the conditional function which caused the resolveType() to be
called which sets the dataType properly.
E.g.:
SELECT COALESCE(null, 'foo', null, 'bar') as c
c
-----------------
foo
(cherry picked from commit c39980a65dd593363f1d8d1b038b26cb0ce02aaf)
Fix bug in predicate subtraction that caused the evaluation to be
skipped on the first mismatch instead of evaluating the whole list. In
some cases this caused not only an incorrect result but one that kept on
growing causing the engine to bail
Fix#40835
(cherry picked from commit bd2b33d6eaca616a5acd846204e2d12f905854d4)
* Handle the scenario where assertLogs() is not called from a test method
but the audit rolling file rolls over.
* Use a local boolean variable instead of the static one to account for
assertBusy() code block possibly being called multiple times and having
different execution paths.
(cherry picked from commit 6f642196cbab90079c610097befc794746170df1)
CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP can be used as SQL keywords
(without parentheses) and therefore there is a special rule in the
grammar to accommodate this.
Previously, this rule was also catching the parenthesised version of those functions too,
not allowing the {fn <functionName>()} to be used. E.g.:
{fn current_time(2)} or {fn current_timestamp()}
Now, the grammar rule catches only the keyword versions and all the parenthesised
versions go through the normal function resolution. As a consequence the validation
of the precision is moved from the parser lever (ExpressionBuilder) to the function
implementations.
Fixes: #41240
(cherry picked from commit bfbc9f140144b5a35aa29008b58bf58074419853)
When specifying a limit over an agg sorting, the limit will be pushed
down to the grouping which affects the custom sorting. This commit fixes
that and restricts the limit only to sorting.
Fix#40984
(cherry picked from commit da3726528d9011b05c0677ece6d11558994eccd9)
Although the translation rule was implemented in the `Optimizer`,
the rule was not added in the list of rules to be executed.
Relates to #41195
Follows #37936
(cherry picked from commit f426a339b77af6008d41cc000c9199fe384e9269)
Yet another improvement to SYS TABLES on differentiating between table
types specified as '%' and '' while maintaining legacy support for null
Fix#40775
(cherry picked from commit 6dbca5edd335eb1da8e7825389a15e5fe45397d4)
As empty string has a certain meaning, the JDBC driver returns an empty
set instead for better client compatibility.
Fix#41028
(cherry picked from commit 4cbafa585b7a514eb6c156606dd516324cd3980a)
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)
This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.
(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)
* Fix forking JVM runner
* Don't bump shadow plugin version
Properly treat '%' as a wildcard for catalog filtering instead of doing
a straight string match.
Table filtering now considers aliases as well.
Add escaping char for LIKE queries with user defined params
Fix monotony of ORDINAL_POSITION
Add integration test for SYS COLUMNS - currently running only inside
single_node since the cluster name is test dependent.
Add pattern unescaping for index names
Fix#40582
(cherry picked from commit 8e61b77d3f849661b7175544f471119042fe9551)
Move verification of arguments for Conditional functions and IN
from `Verifier` to the `resolveType()` method of the functions.
(cherry picked from commit 241644aac57baee1eb128b993ee410c7d08172a5)
* Avoid sharing source directories as it breaks intellij
* Subprojects share main project output classes directory
* Fix jar hell
* Fix sql security with ssl integ tests
* Relax dependency ordering rule so we don't explode on cycles
Changed the JDBC metadata to return empty results sets instead of
throwing SQLFeatureNotSupported as it seems a more safer/compatible
approach for consumers.
Fix#40533
(cherry picked from commit ef2d2527c2b5140556fd477e7ff6ea36966684da)
- Remove superfluous methods that are already
defined in superclasses.
- Improve tests for null folding on conditionals
(cherry picked from commit 67f9404f5004362e569353d1e950ffe5d7a9ab6e)
After `TIME` SQL data type is introduced, implement
`CURRENT_TIME/CURTIME` functions similarly to CURRENT_TIMESTAMP
that return the system's current time (only, without the date part).
Closes: #40468
(cherry picked from commit 9feede781409d0e264ce45951a25b28ff129b187)
TimeProcessor didn't implement `getWriteableName()` so the one from
the parent was used which returned the `NAME` of the parent. This
caused `TimeProcessor` objects to be deserialised into
DateTimeProcessor.
Moreover, added a restriction to run the TIME related integration tests
only in UTC timezone.
Fixes: #40717
(cherry picked from commit cfea348bec20e547df72c415cccd85279accb767)
A full format for a DATETIME would be:
`2019-03-30T10:20:30.123+10:00` which is 29 chars long.
For DATE a full format would be: `2019-03-30T00:00:00.000+10:00`
which is also 29 chars long.
(cherry picked from commit 6be83964ed025528778bca8d35692762e166983b)
Support ANSI SQL's TIME type by introductin a runtime-only
ES SQL time type.
Closes: #38174
(cherry picked from commit 046ccd4cf0a251b2a3ddff6b072ab539a6711900)
* Have LIKE and RLIKE only use term-level queries (wildcard and regexp respectively). They
are already working only with exact fields, thus be in-line with how
SQL works in general (what you index is what you search on).
(cherry picked from commit 1bba887d481b49db231a1442922f1813952dcc67)
Enable some Ignored integration tests for issues/features that
have already been resolved/implemented.
(cherry picked from commit c23580f477ffc61c5701e14a91006db7bf21a8d4)
Previously, an expression like `10 + 2::long` would be interpreted
as `CAST(10 + 2 AS LONG)` instead of `10 + CAST(2 AS LONG)`.
(cherry picked from commit e34cc2f38b1477e78788ee377938f42cc47187c7)
SYS TABLES meta command has been improved to better adhere to the ODBC
spec in particular with regards to the handling of enumerations (and
the differences between '%', null and ''(empty string))
Fix#40348
(cherry picked from commit e3070615000228c283d17ce8d182b44f1450a5d5)
Previously, `getTime(colIdx/colLabel)` and
`getObject(colIdx/colLabel, java.sql.Time.class)` methods were computing
the time from a `ZonedDateTime` by applying day in millis modulo on the epoch millis
of the `ZonedDateTime` object. This is wrong as we need to keep the time
related fields at the timezone of the `ZonedDateTime` object and just
set the date info to the epoch date (01/01/1970).
Additionally fixes a testing issue as the original timezone id is converted
to an offset string when parsing the response from the server.
* Document MATCH and QUERY function predicates.
* Polish the functions pages and add a list of functions to the main Functions & Operators page.
(cherry picked from commit 4cec0ae1b962ec7ea011a290aec72740386eb808)
To avoid having to specify each spec by hand (which can miss specs to be
added), the test infrastructure now performs classpath discovery so that
each spec added, is automatically considered.
Relates #40358
(cherry picked from commit d0f60b4425c731509aa8ca765d55f563f866ef90)
Previously, `getDate(int columnIdx)/getDate(String columnLabel)` and
were using legacy`java.util.Calendar` instead of the the `java.time.*`
classes to reset to the start of day. This resulted in different results
for certain timestamps and timezones when calling
`getDate(col)` vs`getObject(col, java.sql.Date)`
Now only the methods (that must be implemented due to the JDBC spec)
`getDate(int columnIdx, Calendar cal)/getDate(String columnLabel, Calendar cal)`
are still using the `java.util.Calendar` for those conversion.
The same change was applied to
`getTime(int columnIdx)/getTime(String columnLabel)`
and
`getTimestamp(int columnIdx)/getTimestamp(String columnLabel)`
Fixes: #40289
(cherry picked from commit 44560671f18397e0c58e3647732880fcb73a5034)
Previously metric aggregations on date fields would return a double
which caused errors when trying to apply scalar functions on top, e.g.:
```
SELECT YEAR(MAX(date)) FROM test
```
Fixes: #40376
(cherry-picked from commit 41d0a038467fbdbbf67fd9bfdf27623451cae63a)
* Refactor RegexMatch to support both LIKE and RLIKE
* Add integration tests for RLIKE
* Polish the rest of tests
(cherry picked from commit 7562d6eeeb77c04794002649fe726f4b3a9a398b)
Upgrade JLine to 3.10.0
Switch to using JLine granular jars instead of the uber-one
Remove Jansi dependency (due to errors in closing streams)
Pin JNA dependency to our own artifact
Fix#40239
(cherry picked from commit 9afa65fa80111f3b68c13373c7b6db13c11dde31)
Extend CAST to support all data types notations (whether SQL or ES
specific)
Fix#40282
(cherry picked from commit eb2ee8a344da946920598839a5db76c8bb9bc3fe)
* Rewrite Round and Truncate functions to have a slightly different
approach to handling the optional parameter in the constructor. Until now
the optional parameter was considered 0 if the value was missing and the
constructor was filling in this value. The current solution is to have
the optional parameter as null right until the actual calculation is done.
(cherry picked from commit 3e314f8fa4cb322e67949e80857561ce51268726)
* Define a equals method for Like function so that the pattern used
is considered in the equality check. Whenever the functions are resolved
this check should be used.
(cherry picked from commit 4e5d5af58a140573b8ee19d57c7839db7b779e3b)
Previously, calling getDate()/getTime()/getTimestamp() and getObject()
with the corresponding java.sql class on a column of SQL DATE type from
the JDBC result set would throw an Exception.
Previously, when a trival plain `SELECT` or a trivial `SELECT` with
aggregations has also an `ORDER BY` or a `LIMIT` or both, then the
optimization to convert it to a `LocalRelation` was skipped resulting
in exception thrown. E.g.::
```
SELECT 'foo' FROM test LIMIT 10
```
or
```
SELECT 'foo' FROM test GROUP BY 1 ORDER BY 1
```
Fixes: #40211
When selecting columns of ES type `date` (SQL's DATETIME) the
`FieldHitExtractor` was not using the timezone of the client session
but always resorted to UTC. The same behaviour (UTC only) was
encountered also for grouping keys (`CompositeKeyExtractor`) and
for First/Last functions on dates (`TopHitsAggExtractor`).
Fixes: #40152
* Take into consideration aliases that can be used as aggregates
and in the ORDER BY element so that the groupings are re-ordered inside
the composite aggregation according to the ORDER BY ordering.
(cherry picked from commit 110c0b90b9cf2e9344ab3f412cfa8f8cd94ad71f)
For cases where fields can have multi values, allow the behavior to be
customized through a dedicated configuration field.
By default this will be enabled on the drivers so that existing datasets
work instead of throwing an exception.
For regular SQL usage, the behavior is false so that the user is aware
of the underlying data.
Fix#39700
(cherry picked from commit 2b351571961f172fd59290ee079126bbd081ceaf)
Since other classes besides intervals can be serialized as part of
the Cursor, the getNamedWritables method should be moved from Intervals
to a more generic class Literals.
Relates to #39973
Previously, JDBC's REST call to the server was always sending UTC
instead of the timezone passed through connection string/properties.
Moreover the conversion to java.sql.Date was problematic as a
calculation on the epoch millis was used to set the time to 00:00:00.000
and the timezone info was lost. This caused the resulting java.sql.Date
object which is always using the JVM's timezone (no matter what timezone
setting is used in the connection string/properties) to be wrongly created.
Fixes: #39915
When a query is translated into script terms agg where key has a date
type, it should generate a terms agg with value_type long instead of
date, otherwise the key gets formatted as a string, which confuses
hit extractor.
Fixes#37042
Painless allows ZonedDateTime objects to be passed natively to scripts
which creates problematic translate queries as the ZonedDateTime is
passed as a string instead.
Wrap this with a dedicated method to perform the conversion.
Fix#39877
(cherry picked from commit 4957cad5bda77257d10430ac102e93f5e062148a)
Enhance ConstantProcessor to properly serialize complex objects
(Intervals) that have their own custom serialization/deserialization
mechanism
Fix#39875
(cherry picked from commit ed8a1f9340673e69a44ea7a89679cadb4762e43d)
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
MIN/MAX on strings are supported and are implemented with
TopAggs FIRST/LAST respectively, but they cannot operate on
`text` fields without underlying `keyword` fields => inexact.
Follows: #39427
Fix bug in IndexResolver that caused conflicts in multi-field types to
be ignored up (causing the query to fail later on due to mapping
conflicts).
The issue was caused by the multi-field which forced the parent creation
before checking its validity across mappings
Fix#39547
(cherry picked from commit 4e4fe289f90b9b5eae09072d54903701a3128696)
Queries that require counting of all hits (COUNT(*) on implicit
group by), now enable accurate hit tracking.
Fix#37971
(cherry picked from commit 265b637cf6df08986a890b8b5daf012c2b0c1699)
* SYS COLUMNS will skip UNSUPPORTED field types in ODBC and JDBC, as well.
NESTED and OBJECT types were already skipped in ODBC mode, now they are
skipped in JDBC mode, as well.
(cherry picked from commit 9e0df64b2d36c9069dfa506570468f0522c86417)
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.
Closes: #38501Fixes: #35203
Backport of #39350
Contains the following:
* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
Previously, if a text field had an underlying keyword field
the latter was not used instead of the text leading to wrong
results returned by queries filtering with LIKE/RLIKE.
Fixes: #39442
* Add "columnar" option for REST requests (but be lenient for non-"plain"
modes) for json, yaml, smile and cbor formats.
* Updated documentation
(cherry picked from commit 5b7e0de237fb514d14a61a347bc669d4b4adbe56)
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.
In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.
Backport of: #39337
This defaults to "true" (current behavior) and will throw an exception
if there is a property that cannot be recognized. If "false", it will
ignore anything unrecognizable.
(cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)
`<expression>::<dataType>` is a simplified altenative syntax to
`CAST(<expression> AS <dataType> which exists in PostgreSQL and
provides an improved user experience and possibly more compact
SQL queries.
Fixes: #38717
Improve verifier to disallow grouping over grouping functions (e.g.
HISTOGRAM over HISTOGRAM).
Close#38308
(cherry picked from commit 4e9b1cfd4df38c652bba36b4b4b538ce7c714b6e)
Constant numbers (of any form: integers, decimals, negatives,
scientific) and strings shouldn't increase the depth counters
as they don't contribute to the increment of the stack depth.
Fixes: #38571
- Add resolution to the exact keyword field (if exists) for text fields.
- Add proper verification and error message if underlying keyword
doesn'texist.
- Move check for field attribute in the comparison list to the
`resolveType()` method of `IN`.
Fixes: #38424
Aliases defined in SELECT (Project or Aggregate) are now resolved in the
following WHERE clause. The Analyzer has been enhanced to identify this
rule and replace the field accordingly.
Close#29983
Since introduction of data types that don't have a corresponding type
in ES the `esType` is error-prone when used for `unmappedType()` calls.
Moreover since the renaming of `DATE` to `DATETIME` and the introduction
of an actual date-only `DATE` the `esType` would return `datetime` which
is not a valid type for ES mapping.
Fixes: #38051
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).
This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:
```
PUT index?include_type_name=false
{
"mappings": {
"_doc": {
"properties": { ... }
}
}
}
```
There is a similar issue for both 'put templates' and 'put mappings' requests
as well.
This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
Instead of throwing an exception, use an unresolved attribute to pass
the message to the Verifier.
Additionally improve the parser to save the extended source for the
Aggregate and OrderBy.
Close#38208
* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.
Introduce client-side sorting of groups based on aggregate
functions. To allow this, the Analyzer has been extended to push down
to underlying Aggregate, aggregate function and the Querier has been
extended to identify the case and consume the results in order and sort
them based on the given columns.
The underlying QueryContainer has been slightly modified to allow a view
of the underlying values being extracted as the columns used for sorting
might not be requested by the user.
The PR also adds minor tweaks, mainly related to tree output.
Close#35118
FIRST and LAST can be used with one argument and work similarly to MIN
and MAX but they are implemented using a Top Hits aggregation and
therefore can also operate on keyword fields. When a second argument is
provided then they return the first/last value of the first arg when its
values are ordered ascending/descending (respectively) by the values of
the second argument. Currently because of the usage of a Top Hits
aggregation FIRST and LAST cannot be used in the HAVING clause of a
GROUP BY query to filter on the results of the aggregation.
Closes: #35639
* Added SSL configuration options tests
Removed the allow.self.signed option from the documentation since we allow
by default self signed certificates as well.
* Added more tests