For cases where fields can have multi values, allow the behavior to be
customized through a dedicated configuration field.
By default this will be enabled on the drivers so that existing datasets
work instead of throwing an exception.
For regular SQL usage, the behavior is false so that the user is aware
of the underlying data.
Fix#39700
(cherry picked from commit 2b351571961f172fd59290ee079126bbd081ceaf)
Previously, JDBC's REST call to the server was always sending UTC
instead of the timezone passed through connection string/properties.
Moreover the conversion to java.sql.Date was problematic as a
calculation on the epoch millis was used to set the time to 00:00:00.000
and the timezone info was lost. This caused the resulting java.sql.Date
object which is always using the JVM's timezone (no matter what timezone
setting is used in the connection string/properties) to be wrongly created.
Fixes: #39915
When a query is translated into script terms agg where key has a date
type, it should generate a terms agg with value_type long instead of
date, otherwise the key gets formatted as a string, which confuses
hit extractor.
Fixes#37042
Enhance ConstantProcessor to properly serialize complex objects
(Intervals) that have their own custom serialization/deserialization
mechanism
Fix#39875
(cherry picked from commit ed8a1f9340673e69a44ea7a89679cadb4762e43d)
* Bundle java in distributions
Setting up a jdk is currently a required external step when installing
elasticsearch. This is particularly problematic for the rpm/deb packages
as installing a jdk in the same package installation command does not
guarantee any order, so must be done in separate steps. Additionally,
JAVA_HOME must be set and often causes problems in selecting a correct
jdk when, for example, the system java is an older unsupported version.
This commit bundles platform specific openjdks into each distribution.
In addition to eliminating the issues above, it also presents future
possible improvements like using jlink to build jdk images only
containing modules that elasticsearch uses.
closes#31845
Queries that require counting of all hits (COUNT(*) on implicit
group by), now enable accurate hit tracking.
Fix#37971
(cherry picked from commit 265b637cf6df08986a890b8b5daf012c2b0c1699)
For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.
Closes: #38501Fixes: #35203
Previously, if a text field had an underlying keyword field
the latter was not used instead of the text leading to wrong
results returned by queries filtering with LIKE/RLIKE.
Fixes: #39442
* Add "columnar" option for REST requests (but be lenient for non-"plain"
modes) for json, yaml, smile and cbor formats.
* Updated documentation
(cherry picked from commit 5b7e0de237fb514d14a61a347bc669d4b4adbe56)
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.
In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.
Backport of: #39337
`<expression>::<dataType>` is a simplified altenative syntax to
`CAST(<expression> AS <dataType> which exists in PostgreSQL and
provides an improved user experience and possibly more compact
SQL queries.
Fixes: #38717
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).
This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:
```
PUT index?include_type_name=false
{
"mappings": {
"_doc": {
"properties": { ... }
}
}
}
```
There is a similar issue for both 'put templates' and 'put mappings' requests
as well.
This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.
Introduce client-side sorting of groups based on aggregate
functions. To allow this, the Analyzer has been extended to push down
to underlying Aggregate, aggregate function and the Querier has been
extended to identify the case and consume the results in order and sort
them based on the given columns.
The underlying QueryContainer has been slightly modified to allow a view
of the underlying values being extracted as the columns used for sorting
might not be requested by the user.
The PR also adds minor tweaks, mainly related to tree output.
Close#35118
FIRST and LAST can be used with one argument and work similarly to MIN
and MAX but they are implemented using a Top Hits aggregation and
therefore can also operate on keyword fields. When a second argument is
provided then they return the first/last value of the first arg when its
values are ordered ascending/descending (respectively) by the values of
the second argument. Currently because of the usage of a Top Hits
aggregation FIRST and LAST cannot be used in the HAVING clause of a
GROUP BY query to filter on the results of the aggregation.
Closes: #35639
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.
This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
When the arguements of PERCENTILE and PERCENTILE_RANK can be folded,
the `ConstantFolding` rule kicks in and calls the `replaceChildren()`
method on `InnerAggregate` which is created from the aggregation rules
of the `Optimizerz. `InnerAggregate` in turn, cannot implement the method
as the logic of creating a new `InnerAggregate` instance from a list of
`Expression`s resides in the Optimizer. So, instead, `ConstantFolding`
should be applied before any of the aggregations related rules.
Fixes: #37099
This commit removes the Index Audit Output type, following its deprecation
in 6.7 by 8765a31d4e6770. It also adds the migration notice (settings notice).
In general, the problem with the index audit output is that event indexing
can be slower than the rate with which audit events are generated,
especially during the daily rollovers or the rolling cluster upgrades.
In this situation audit events will be lost which is a terrible failure situation
for an audit system.
Besides of the settings under the `xpack.security.audit.index` namespace, the
`xpack.security.audit.outputs` setting has also been deprecated and will be
removed in 7. Although explicitly configuring the logfile output does not touch
any deprecation bits, this setting is made redundant in 7 so this PR deprecates
it as well.
Relates #29881
* Add separate CLI Mode
* Use the correct Mode for cursor close requests
* Renamed CliFormatter and have different formatting behavior for CLI and "text" format.
* SQL: Rename SQL data type DATE to DATETIME
SQL data type DATE has only the date part (e.g.: 2019-01-14)
without any time information. Previously the SQL type DATE was
referring to the ES DATE which contains also the time part along
with TZ information. To conform with SQL data types the data type
`DATE` is renamed to `DATETIME`, since it includes also the time,
as a new runtime SQL `DATE` data type will be introduced down the road,
which only contains the date part and meets the SQL standard.
Closes: #36440
* Address comments
When reporting metadata, several clients have issues with the 'ALIAS'
type. To improve compatibility and be consistent with the ANSI SQL
expectations and because they are similar, aliases targets are now
reported as views.
Close#37422
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes#29797
Improve error messages by returning the original SQL statement
declaration instead of trying to reproduce it as the casing and
whitespaces are not preserved accurately leading to small
differences.
Close#37161
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
Field of types aliases that have dots in name are returned without a
hierarchy by field_caps, as oppose to the mapping api or field with
concrete types, which in turn breaks IndexResolver.
This commit fixes this by creating the backing hierarchy similar to the
mapping api.
Close#37224
* provide overriden `hashCode` and toString methods to account for `DISTINCT`
* change the analyzer for scenarios where `COUNT <field_name>` and `COUNT DISTINCT` have different paths
* defined a new `filter` aggregation encapsulating an `exists` query to filter out null or missing values
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one
Improve error message returned to the client when an SQL statement
cannot be translated to a ES query DSL. Cases:
1. WHERE clause evaluates to FALSE => No results returned
1. Missing FROM clause => Local execution, e.g.: SELECT SIN(PI())
3. Special SQL command => Only valid of SQL iface, e.g.: SHOW TABLES
Fixes: #37040
* Added Limitations page
* Made the aggregations page follow the common template for functions
* Modified all tables to have the first row's cells content centered
* Polishing in other various sections
Allow scripts to correctly reference grouping functions
Fix bug in translation of date/time functions mixed with histograms.
Enhance Verifier to prevent histograms being nested inside other
functions inside GROUP BY (as it implies double grouping)
Extend Histogram docs
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
When trying to analyse a HAVING condition, we crate a temp Aggregate
Plan which wasn't created correctly (missing the expressions from the
SELECT clause) and as a result the ordinal (1, 2, etc) in the GROUP BY
couldn't be resolved correctly.
Also after successfully analysing the HAVING condition, still the
original plan was returned.
Fixes: #36059
Introduce Histogram grouping function for bucketing/grouping data based
on a given range. Both date and numeric histograms are supported using
the appropriate range declaration (numbers vs intervals).
SELECT HISTOGRAM(number, 50) AS h FROM index GROUP BY h
SELECT HISTOGRAM(date, INTERVAL 1 YEAR) AS h FROM index GROUP BY h
In addition add multiply operator for Intervals
Add docs for intervals and histogram
Fix#36509
Add CURRENT_TIMESTAMP as keyword as well function alongside NOW()
These return the current date/time for the given query, computed when
the statement reaches the server. For completeness, CURRENT_TIMESTAMP
also accepts precision as an optional parameter.
Fix#36534
Add missing `formatTemplate()` for conditional functions which
resulted in incomplete painless script. Moreover the specific
return type of Object in the painless signatures resulted in
casting exceptions when conditional functions are used in the
ORDER BY.
Fixes: #36631
The getters and setters for useDisMax() have been deprecated since at least 6.0,
also there hasn't been any reference to the query parameter in the
documentation. Removing it from the builder and tests and replacing it with
`tieBreaker(1.0f)` where necessary.
* Renamed DAY_OF_WEEK and WEEK_OF_YEAR functions to their ISO version and
added the same functions with different functionality.
* Rewritten the datetime functions documentation to follow the format of the other
functions documentation pages.
* This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.
- REST API docs
- HLRC docs and doc tests
- Handle REST actions with deprecation warnings
- Changed endpoints in rest-api-spec and relevant file names
Move classes under the same package to avoid internal classes being
exposed to the outside. Remove public visibility outside 3 classes:
EsDriver, EsDataSource and EsTypes.
The driver only has one package, namely org.elasticsearch.xpack.sql.jdbc
Use Es prefix for classes to ease name conflict and indicate their
destination
Fix#35437
Add GREATEST(expr1, expr2, ... exprN) and LEAST(expr1, expr2, exprN)
functions which are in the family of CONDITIONAL functions.
Implementation follows PostgreSQL behaviour, so the functions return
`NULL` when all of their arguments evaluate to `NULL`.
Renamed `CoalescePipe` and `CoalesceProcessor` to `ConditionalPipe` and
`ConditionalProcessor` respectively, to be able to reuse them for
`Greatest` and `Least` evaluations. To achieve that `ConditionalOperation`
has been added to differentiate between the functionalities at execution
time.
Closes: #35878
This operator handles nulls in different way than the normal `=`.
If one of the operants is `null` and the other not it returns `false`.
If both operants are `null` it returns `true`. Therefore in contrary to
`=`, which returns `null` if at least one of the operants is `null`, this one
never returns `null` as a result.
Closes: #35871
Introduce INTERVAL as a DataType
Add INTERVAL to the grammar which supports the standard SQL declaration
(without precision):
> INTERVAL '1 23:45:01.123456789' DAY TO SECOND
but also number for single unit intervals:
> INTERVAL 1 YEAR
as well as the plurals of the units:
> INTERVAL 2 YEARS
Interval are internally supported as just another Literal being backed
by java.time.Period and java.time.Duration
Move JDBC away from JDBCType enum to SQLType interface
Refactor DataType by moving it into server core and adding dedicated (and
much simpler) JDBC driver type
Improve internal JDBC conversion by normalizing on the DataType
Rename JDBC columnInfo to JdbcColumnInfo to differentiate between it and
the SQL ColumnInfo
Fix#29990
Add `IsNull` node in parser to simplify expressions so that `<value> IS NULL` is
no longer translated internally to `NOT(<value> IS NOT NULL)`
Replace `IsNotNullProcessor` with `CheckNullProcessor` to encapsulate both
isNull and isNotNull functionality.
Closes: #34876Fixes: #35171
Override `process()` in `BinaryLogicProcessor` which doesn't immediately
return null if left or right argument is null, which is the behaviour of
`process()` of the parent class `BinaryProcessor`.
Also, add more tests for `AND` and `OR` in SELECT clause with literal.
Fixes: #35240
Add NotEquals node in parser to simplify expressions so that <value1> != <value2> is
no longer translated internally to NOT(<value1> = <value2>)
Closes: #35210Fixes: #35233
Moves `x-pack/qa/sql/*` into `x-pack/plugin/sql/qa` to make it simpler
to run all of the sql tests. This lines up with how newer projects like
cross cluster replication are testing themselves.