For functions: move checks for `text` fields without underlying `keyword`
fields or with many of them (ambiguity) to the type resolution stage.
For Order By/Group By: move checks to the `Verifier` to catch early
before `QueryTranslator` or execution.
Closes: #38501Fixes: #35203
Backport of #39350
Contains the following:
* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
Previously, if a text field had an underlying keyword field
the latter was not used instead of the text leading to wrong
results returned by queries filtering with LIKE/RLIKE.
Fixes: #39442
* Add "columnar" option for REST requests (but be lenient for non-"plain"
modes) for json, yaml, smile and cbor formats.
* Updated documentation
(cherry picked from commit 5b7e0de237fb514d14a61a347bc669d4b4adbe56)
This changes the name of the internal security index to ".security-7",
but supports indices that were upgraded from earlier versions and use
the ".security-6" name.
In all cases, both ".security-6" and ".security-7" are considered to
be restricted index names regardless of which name is actually in use
on the cluster.
Backport of: #39337
This defaults to "true" (current behavior) and will throw an exception
if there is a property that cannot be recognized. If "false", it will
ignore anything unrecognizable.
(cherry picked from commit 38fbf9792bcf4fe66bb3f17589e5fe6d29748d07)
`<expression>::<dataType>` is a simplified altenative syntax to
`CAST(<expression> AS <dataType> which exists in PostgreSQL and
provides an improved user experience and possibly more compact
SQL queries.
Fixes: #38717
Improve verifier to disallow grouping over grouping functions (e.g.
HISTOGRAM over HISTOGRAM).
Close#38308
(cherry picked from commit 4e9b1cfd4df38c652bba36b4b4b538ce7c714b6e)
Constant numbers (of any form: integers, decimals, negatives,
scientific) and strings shouldn't increase the depth counters
as they don't contribute to the increment of the stack depth.
Fixes: #38571
- Add resolution to the exact keyword field (if exists) for text fields.
- Add proper verification and error message if underlying keyword
doesn'texist.
- Move check for field attribute in the comparison list to the
`resolveType()` method of `IN`.
Fixes: #38424
Aliases defined in SELECT (Project or Aggregate) are now resolved in the
following WHERE clause. The Analyzer has been enhanced to identify this
rule and replace the field accordingly.
Close#29983
Since introduction of data types that don't have a corresponding type
in ES the `esType` is error-prone when used for `unmappedType()` calls.
Moreover since the renaming of `DATE` to `DATETIME` and the introduction
of an actual date-only `DATE` the `esType` would return `datetime` which
is not a valid type for ES mapping.
Fixes: #38051
`CreateIndexRequest#source(Map<String, Object>, ... )`, which is used when
deserializing index creation requests, accidentally accepts mappings that are
nested twice under the type key (as described in the bug report #38266).
This in turn causes us to be too lenient in parsing typeless mappings. In
particular, we accept the following index creation request, even though it
should not contain the type key `_doc`:
```
PUT index?include_type_name=false
{
"mappings": {
"_doc": {
"properties": { ... }
}
}
}
```
There is a similar issue for both 'put templates' and 'put mappings' requests
as well.
This PR makes the minimal changes to detect and reject these typed mappings in
requests. It does not address #38266 generally, or attempt a larger refactor
around types in these server-side requests, as I think this should be done at a
later time.
Instead of throwing an exception, use an unresolved attribute to pass
the message to the Verifier.
Additionally improve the parser to save the extended source for the
Aggregate and OrderBy.
Close#38208
* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.
Introduce client-side sorting of groups based on aggregate
functions. To allow this, the Analyzer has been extended to push down
to underlying Aggregate, aggregate function and the Querier has been
extended to identify the case and consume the results in order and sort
them based on the given columns.
The underlying QueryContainer has been slightly modified to allow a view
of the underlying values being extracted as the columns used for sorting
might not be requested by the user.
The PR also adds minor tweaks, mainly related to tree output.
Close#35118
FIRST and LAST can be used with one argument and work similarly to MIN
and MAX but they are implemented using a Top Hits aggregation and
therefore can also operate on keyword fields. When a second argument is
provided then they return the first/last value of the first arg when its
values are ordered ascending/descending (respectively) by the values of
the second argument. Currently because of the usage of a Top Hits
aggregation FIRST and LAST cannot be used in the HAVING clause of a
GROUP BY query to filter on the results of the aggregation.
Closes: #35639
* Added SSL configuration options tests
Removed the allow.self.signed option from the documentation since we allow
by default self signed certificates as well.
* Added more tests
Doc-value fields now return a value that is based on the mappings rather than
the script implementation by default.
This deprecates the special `use_field_mapping` docvalue format which was added
in #29639 only to ease the transition to 7.x and it is not necessary anymore in
7.0.
* Exit batch files explictly using ERRORLEVEL
This makes sure the exit code is preserved when calling the batch
files from different contexts other than DOS
Fixes#29582
This also fixes specific error codes being masked by an explict
exit /b 1
causing the useful exitcodes from ExitCodes to be lost.
* fix line breaks for calling cli to match the bash scripts
* indent size of bash files is 2, make sure editorconfig does the same for bat files
* update indenting to match bash files
* update elasticsearch-keystore.bat indenting
* Update elasticsearch-node.bat to exit outside of endlocal
* Remove empty statements
There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.
* Change test, slightly more verbose but less confusing
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents, requests that match more than this value will set the `"total.relation"`
to `"gte"` (e.g. greater than or equals) and the `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted, they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I choose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.
Closes#33028
When the arguements of PERCENTILE and PERCENTILE_RANK can be folded,
the `ConstantFolding` rule kicks in and calls the `replaceChildren()`
method on `InnerAggregate` which is created from the aggregation rules
of the `Optimizerz. `InnerAggregate` in turn, cannot implement the method
as the logic of creating a new `InnerAggregate` instance from a list of
`Expression`s resides in the Optimizer. So, instead, `ConstantFolding`
should be applied before any of the aggregations related rules.
Fixes: #37099
This commit removes the Index Audit Output type, following its deprecation
in 6.7 by 8765a31d4e6770. It also adds the migration notice (settings notice).
In general, the problem with the index audit output is that event indexing
can be slower than the rate with which audit events are generated,
especially during the daily rollovers or the rolling cluster upgrades.
In this situation audit events will be lost which is a terrible failure situation
for an audit system.
Besides of the settings under the `xpack.security.audit.index` namespace, the
`xpack.security.audit.outputs` setting has also been deprecated and will be
removed in 7. Although explicitly configuring the logfile output does not touch
any deprecation bits, this setting is made redundant in 7 so this PR deprecates
it as well.
Relates #29881
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.
The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.
Relates #27330
This adds a set of helper classes to determine if an agg "has a value".
This is needed because InternalAggs represent "empty" in different
manners according to convention. Some use `NaN`, `+/- Inf`, `0.0`, etc.
A user can pass the Internal agg type to one of these helper methods
and it will report if the agg contains a value or not, which allows the
user to differentiate "empty" from a real `NaN`.
These helpers are best-effort in some cases. For example, several
pipeline aggs share a single return class but use different conventions
to mark "empty", so the helper uses the loosest definition that applies
to all the aggs that use the class.
Sums in particular are unreliable. The InternalSum simply returns 0.0
if the agg is empty (which is correct, no values == sum of zero). But this
also means the helper cannot differentiate from "empty" and `+1 + -1`.
* Add separate CLI Mode
* Use the correct Mode for cursor close requests
* Renamed CliFormatter and have different formatting behavior for CLI and "text" format.
Throws an exception if hit extractor tries to retrieve unsupported
object. For example, selecting "a" from `{"a": {"b": "c"}}` now throws
an exception instead of returning null.
Relates to #37364
* SQL: Rename SQL data type DATE to DATETIME
SQL data type DATE has only the date part (e.g.: 2019-01-14)
without any time information. Previously the SQL type DATE was
referring to the ES DATE which contains also the time part along
with TZ information. To conform with SQL data types the data type
`DATE` is renamed to `DATETIME`, since it includes also the time,
as a new runtime SQL `DATE` data type will be introduced down the road,
which only contains the date part and meets the SQL standard.
Closes: #36440
* Address comments
When reporting metadata, several clients have issues with the 'ALIAS'
type. To improve compatibility and be consistent with the ANSI SQL
expectations and because they are similar, aliases targets are now
reported as views.
Close#37422
Adjust FieldExtractor to handle fields which contain `.` in their
name, regardless where they fall in, in the document hierarchy. E.g.:
```
{
"a.b": "Elastic Search"
}
{
"a": {
"b.c": "Elastic Search"
}
}
{
"a.b": {
"c": {
"d.e" : "Elastic Search"
}
}
}
```
Fixes: #37128
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same appraoch
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes#29797
Improve error messages by returning the original SQL statement
declaration instead of trying to reproduce it as the casing and
whitespaces are not preserved accurately leading to small
differences.
Close#37161
Since `full` can be common as a field name or part of a field name
(e.g.: `full.name` or `name.full`), it's nice if it's not a reserved
keyword of the grammar so a user can use it without resorting to quotes.
Fixes: #37376
SqlPlugin cannot have more than one public constructor, so for the testing
purposes the `getLicenseState()` should be overriden.
Fixes: #37191
Co-authored-by: Michael Basnight <mbasnight@gmail.com>
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
Field of types aliases that have dots in name are returned without a
hierarchy by field_caps, as oppose to the mapping api or field with
concrete types, which in turn breaks IndexResolver.
This commit fixes this by creating the backing hierarchy similar to the
mapping api.
Close#37224
* provide overriden `hashCode` and toString methods to account for `DISTINCT`
* change the analyzer for scenarios where `COUNT <field_name>` and `COUNT DISTINCT` have different paths
* defined a new `filter` aggregation encapsulating an `exists` query to filter out null or missing values
* Use `_count` aggregation value only for not-DISTINCT COUNT function calls
* COUNT DISTINCT will use the _exact_ version of a field (the `keyword` sub-field for example), if there is one
Improve error message returned to the client when an SQL statement
cannot be translated to a ES query DSL. Cases:
1. WHERE clause evaluates to FALSE => No results returned
1. Missing FROM clause => Local execution, e.g.: SELECT SIN(PI())
3. Special SQL command => Only valid of SQL iface, e.g.: SHOW TABLES
Fixes: #37040
Logical operators OR and AND as well as conditional functions
(COALESCE, LEAST, GREATEST, etc.) cannot be folded to NULL if one
of their children is NULL as is the case for most of the functions.
Therefore, their nullable() implementation cannot return true. On
the other hand they cannot return false as if they're wrapped within
an IS NULL or IS NOT NULL expression, the expression will be folded
to false and true respectively leading to wrong results.
Change the signature of nullable() method and add a third value UKNOWN
to handle these cases.
Fixes: #35872
Improve parsing to save the source for each token alongside the location
of each Node/Expression for accurate reproducibility of an expression
name and source
Fix#36894
Enhance error message for the case that the 2nd argument of PERCENTILE
and PERCENTILE_RANK is not a foldable, as it doesn't make sense to have
a dynamic value coming from a field.
Fixes: #36903
* Added Limitations page
* Made the aggregations page follow the common template for functions
* Modified all tables to have the first row's cells content centered
* Polishing in other various sections
Allow scripts to correctly reference grouping functions
Fix bug in translation of date/time functions mixed with histograms.
Enhance Verifier to prevent histograms being nested inside other
functions inside GROUP BY (as it implies double grouping)
Extend Histogram docs
* This change is to account for different system clock implementations
or different Java versions (for Java 8, milliseconds precision is used;
for Java 9+ a system specific clock implementation is used which can
have greater precision than what we need here).
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
When a filter is evaluated to false then it becomes a LocalRelation
with an EmptyExecutable. The LocalRelation in turn, becomes a
LocalExec and the the SkipQueryIfFoldingProjection was wrongly
converting it to a SingletonExecutable. Moreover made a change, so
that the queries without FROM clause, which are supposed to return a
single row, to become a LocalRelation with a SingletonExecutable
instead of EmptyExecutable to avoid mixing up with the ones operating
on a table but with a filter that evaluates to false.
Fixes: #35980
Lucene 7.6 uses a smaller encoding for LatLonShape. This commit forks the LatLonShape classes to Elasticsearch's local lucene package. These classes will be removed on the release of Lucene 7.6.
Fix grammar so that each element inside the list of values for IN
is a valueExpression and not a more generic expression. Introduce a
mapping for context names as some rules in the grammar are exited with
a different rule from the one they entered.This helps so that the decrement
of depth counts in the Parser's CircuitBreakerListener works correctly.
For the list of values for IN, don't count the
PrimaryExpressionContext as this is not visited on exitRule() due to
the peculiarity in our gramamr with the predicate and predicated.
Fixes: #36592
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix#36039Fix#36584
As the internals have moved to java.time, the usage of TimeZone itself
should be minimized as it creates issues when being converted to ZoneId
Protocol wise the two are mostly identical so consumer should not see
any difference.
Note that terminology wise, inside the docs, the public API and inside
the protocol timeZone will continue to be used as it's more widely
understood as oppose to zoneId which is an implementation detail
specific to the JVM
Fix#36535
When trying to analyse a HAVING condition, we crate a temp Aggregate
Plan which wasn't created correctly (missing the expressions from the
SELECT clause) and as a result the ordinal (1, 2, etc) in the GROUP BY
couldn't be resolved correctly.
Also after successfully analysing the HAVING condition, still the
original plan was returned.
Fixes: #36059
Introduce Histogram grouping function for bucketing/grouping data based
on a given range. Both date and numeric histograms are supported using
the appropriate range declaration (numbers vs intervals).
SELECT HISTOGRAM(number, 50) AS h FROM index GROUP BY h
SELECT HISTOGRAM(date, INTERVAL 1 YEAR) AS h FROM index GROUP BY h
In addition add multiply operator for Intervals
Add docs for intervals and histogram
Fix#36509
Add CURRENT_TIMESTAMP as keyword as well function alongside NOW()
These return the current date/time for the given query, computed when
the statement reaches the server. For completeness, CURRENT_TIMESTAMP
also accepts precision as an optional parameter.
Fix#36534
Add missing `formatTemplate()` for conditional functions which
resulted in incomplete painless script. Moreover the specific
return type of Object in the painless signatures resulted in
casting exceptions when conditional functions are used in the
ORDER BY.
Fixes: #36631
Previously, Math.floorMod was used for integers and longs
which has different logic for negative numbers. Also, the
priority of data types check was wrong as if one of the args
is double the evaluation should be with doubles, then for floats,
then longs and finally integers.
Fixes: #36364
The getters and setters for useDisMax() have been deprecated since at least 6.0,
also there hasn't been any reference to the query parameter in the
documentation. Removing it from the builder and tests and replacing it with
`tieBreaker(1.0f)` where necessary.
Includes the following:
* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this
has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
* Renamed DAY_OF_WEEK and WEEK_OF_YEAR functions to their ISO version and
added the same functions with different functionality.
* Rewritten the datetime functions documentation to follow the format of the other
functions documentation pages.
* This commit is part of our plan to deprecate and ultimately remove the use of _xpack in the REST APIs.
- REST API docs
- HLRC docs and doc tests
- Handle REST actions with deprecation warnings
- Changed endpoints in rest-api-spec and relevant file names
Previously, we used a CamelCase to CAMEL_CASE transformation to get the
primary name of a function from its class name which led to some issues
since there are functions that we don't want to be registered this way
(e.g.: IFNULL). To simplify the logic and avoid and "magic"
transformations in the FunctionRegistry a primary name must be provided
explicitely for each function.
The same change is applied for the function resolution (when a function
is used in an SQL statement). There is no CamelCase to CAMEL_CASE
transformation but only upper-casing is applied (fuNcTiOn -> FUNCTION).
Includes:
LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates
Relates #36286
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
When the grouping key of a GROUP BY is a painless script (functions are
involved), the data type of the key was incorrect in certain cases
(Boolean, IP, Date). This resulted in returning wrong data type for this
columns in the query results. E.g.:
```
SELECT COUNT(*), a > 10 AS a FROM t GROUP BY a
```
Fixes: #35662
Move classes under the same package to avoid internal classes being
exposed to the outside. Remove public visibility outside 3 classes:
EsDriver, EsDataSource and EsTypes.
The driver only has one package, namely org.elasticsearch.xpack.sql.jdbc
Use Es prefix for classes to ease name conflict and indicate their
destination
Fix#35437
This change removes the deprecated useDisMax() and useAllFields() methods from
the QueryStringQueryBuilder and related tests. The disMax parameter has already
been a no-op since 6.0 and also the useAllFields has been deprecated since 6.0
and there is a direct replacement via defaultField.
`SIGN` and `RADIANS` where wrongly overriding `mathFunction()`.
Converted `mathFunction()` to private in `MathFunction` since it
shouldn't be overriden, as it uses the assigned `MathOperation`
to get the funtion name for painless scripts.
Fixes: #35654
Add special verifier rule to check that the arguments of conditional
functions are of the same or compatible types. This way the user gets
a descriptive error message with line number and column indicating
where is the offending argument.
Closes: #35907
This commit adds back bundling of all deps of the sql jdbc jar. This was
lost in a refactoring of how the shadow plugin is handled for the entire
elasticsearch project.
Add GREATEST(expr1, expr2, ... exprN) and LEAST(expr1, expr2, exprN)
functions which are in the family of CONDITIONAL functions.
Implementation follows PostgreSQL behaviour, so the functions return
`NULL` when all of their arguments evaluate to `NULL`.
Renamed `CoalescePipe` and `CoalesceProcessor` to `ConditionalPipe` and
`ConditionalProcessor` respectively, to be able to reuse them for
`Greatest` and `Least` evaluations. To achieve that `ConditionalOperation`
has been added to differentiate between the functionalities at execution
time.
Closes: #35878
Due to some unresolvable type conflict between the expected definition
in JDBC vs ODBC, the driver mode is now passed over so that certain
command can change their results accordingly (in this case SYS COLUMNS)
Fix#35376
This operator handles nulls in different way than the normal `=`.
If one of the operants is `null` and the other not it returns `false`.
If both operants are `null` it returns `true`. Therefore in contrary to
`=`, which returns `null` if at least one of the operants is `null`, this one
never returns `null` as a result.
Closes: #35871
Move away from performing eager, fail-fast validation of mismatched
mapping to a lazy evaluation based on the fields actually used in the
query. This allows queries to run on the parts of the indices that
"work" which is not just convenient but also a necessity for large
mappings (like logging) where alignment is hard/impossible to achieve.
Fix#35659
Introduce INTERVAL as a DataType
Add INTERVAL to the grammar which supports the standard SQL declaration
(without precision):
> INTERVAL '1 23:45:01.123456789' DAY TO SECOND
but also number for single unit intervals:
> INTERVAL 1 YEAR
as well as the plurals of the units:
> INTERVAL 2 YEARS
Interval are internally supported as just another Literal being backed
by java.time.Period and java.time.Duration
Move JDBC away from JDBCType enum to SQLType interface
Refactor DataType by moving it into server core and adding dedicated (and
much simpler) JDBC driver type
Improve internal JDBC conversion by normalizing on the DataType
Rename JDBC columnInfo to JdbcColumnInfo to differentiate between it and
the SQL ColumnInfo
Fix#29990
This parameter in the `query_string` query was deprecated in 6.0 and ignored
since then. Its API methods and remaining uses can be removed in the upcoming
major version.
Relates to #35734