Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.
Relates to PR: #38373
Relates to issue #24422
Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
* EQL: Use In from QL
* EQL: Add more In tests
* EQL: Test In duplicates
* EQL: Add test for In mixed types
* EQL: Copy In translation to QL
* SQL: Use InComparisons from QL
* EQL: Remove boost checks from QueryFolderOkTests
* QL: Add TranslatorHandler.convert
Improve separation of scripting between EQL and SQL by delegating common
methods to QL. The context detection is determined based on the package
to avoid having repetitive class hierarchies.
The Painless whitelists have been improved so that the declaring class
is used instead of the inherited one.
Relates #53688
(cherry picked from commit 6d46033e736c64ac9255c5d6964600d2a931430a)
EQL: Add Substring function with Python semantics (#53688)
Does not reuse substring from SQL due to the difference in semantics and
the accepted arguments.
Currently it is missing full integration tests as, due to the usage of
scripting, requires an actual integration test against a proper cluster
(and likely its own QA project).
(cherry picked from commit f58680bad33d5ce4139157a69a4d9f5f286bc3c4)
This commit introduces aarch64 packaging, including bundling an aarch64
JDK distribution. We had to make some interesting choices here:
- ML binaries are not compiled for aarch64, so for now we disable ML on
aarch64
- depending on underlying page sizes, we have to disable class data
sharing
Fix NPE when `null` is passed as a parameter for a parameterized
pattern of LIKE/RLIKE. e.g.: `field LIKE ?` params=[null]`
Check for null pattern in LIKE/RLIKE as for RLIKE (RegexpQuery) we
get an IllegalArgumentExpression from Lucence but for LIKE
(WildcardQuery) we get an NPE.
Fixes: #53557
(cherry picked from commit ec3481ed13254ecdec32acf7a0fafd536ec77aff)
Add missing asScript() implementation for LIKE/RLIKE expressions.
When LIKE/RLIKE are used for example in GROUP BY or are wrapped with
scalar functions in a WHERE clause, the translation must produce a
painless script which will be executed to implement the correct
behaviour and previously this was completely missing, and as a
consquence wrong results were silently (no error) returned.
Fixes: #53486
(cherry picked from commit eaa8ead6742a8e7dcf343bcbaff8de031550fd77)
This commit adds a new request object field, "version", containing the version of the requesting client. This parameter is now accepted - and for certain clients required - by the server and the request is validated against it. Currently server's and client's versions still need to be equal in order for the request to be accepted. Relaxing this check is going to be part of future work.
On the clients' side, the only check remaining is to ensure that the peer server is supporting version backwards compatibility (i.e. is on, or newer than a certain release).
(cherry picked from commit a8f413a20fb023bec83af0de1211a2936a7f558c)
Add a unit test to verify that the optimization of expression
(e.g. COALESCE) is applied to all instances of the expression:
SELECT, WHERE, GROUP BY and HAVING.
Relates to #35270
(cherry picked from commit 2ceedc7f2019fad92cd86679af1a9c6fa594aa8d)
Set size/displaySize to 45 which is the maximum string for
an IP (v6), since IPs are returned as strings.
Fixes: #52762
(cherry picked from commit 815f01747a4d54a274ca248af6fc08e5ea0728c1)
* Move In, InPipe and InProcessor out of SQL to the common QL project.
* Move tests classes to the QL project.
* Create SQL dedicated In class to handle SQL specific data types.
* Update SQL classes to use the InPipe and InProcessor QL classes.
* Extract common Foldables methods in QL project.
* Be more explicit when folding and converting a foldable value, by
removing most of the code inside Foldables class.
(cherry picked from commit 7425042f86f66df8c207c5e96f9b9848bda2b4c3)
The sql-cli script sources x-pack-env, but it does so assuming the
current directory is ES_HOME. This commit alters the source command to
use ES_HOME which is available after running elasticsearch-env.
closes#47803
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
Backport of #52596
Refactor the code to allow contextual parameterization of dateFormat and
name.
Separate aggs/query implementation though there's room for improvement
in the future
(cherry picked from commit e086f81b688875b33d01e4504ce7377031c8cf28)
Translate to an agg query even if only literals are selected,
so that the correct number of rows is returned (number of buckets).
Fix issue with key only in GROUP BY (not in select) and WHERE clause:
Resolve aggregates and groupings based on the child plan which holds
the info info for all the fields of the underlying table.
Fixes: #41951Fixes: #41413
(cherry picked from commit 45b85809678b34a448639a420b97e25436ae851f)
This commit makes the names of fetch subphases more consistent:
* Now the names end in just 'Phase', whereas before some ended in
'FetchSubPhase'. This matches the query subphases like AggregationPhase.
* Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what
they do.
This commit removes the need for DeprecatedRoute and ReplacedRoute to
have an instance of a DeprecationLogger. Instead the RestController now
has a DeprecationLogger that will be used for all deprecated and
replaced route messages.
Relates #51950
Backport of #52278
* Add more checks around parameter conversions
This commit adds two necessary verifications on received parameters:
- it checks the validity of the parameter's data type: if the declared
data type is resolved to an ES or Java type;
- it checks if the returned converter is non-null (i.e. a conversion is
possible) and generates an appropriate exception otherwise.
(cherry picked from commit eda30ac9c69383165324328c599ace39ac064342)
* Extract common optimizer tests (#52169)
(cherry picked from commit e5ad72bc22e9ec0686ab582195f0032efcb880bf)
* Hook in the optimizer rules (#52172)
(cherry picked from commit 1f90d8cc56052fbf2af604e72f9f5ca73f5e75d5)
Previously, in the in-memory sorting module
`LocalAggregationSorterListener` only the aggregate functions where used
(grabbed by the `sortingColumns`). As a consequence, if the ORDER BY
was also using columns of the GROUP BY clause, (especially in the case
of higher priority - before the aggregate functions) wrong results were
produced. E.g.:
```
SELECT gender, MAX(salary) AS max FROM test_emp
GROUP BY gender
ORDER BY gender, max
```
Add all columns of the ORDER BY to the `sortingColumns` so that the
`LocalAggregationSorterListener` can use the correct comparators in
the underlying PriorityQueue used to implement the in-memory sorting.
Fixes: #50355
(cherry picked from commit be680af11c823292c2d115bff01658f7b75abd76)
Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.
Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.
Fixes: #51258
(cherry picked from commit 8f7afbdeb9295999b48a6c36db5b31cbe0cee432)
Make the parsing of date more lenient
- as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}`
- cast a string to a date: `CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)`
Closes: #49379
(cherry picked from commit 5863b27500d5e7f6cdd8c6c62b09b84e53ca724a)
This fixes:
- the parsing of milliseconds in intervals: everything past the . used to be converted as-is to milliseconds, with no normalisation of the unit; thus, a value of .23 ended up as 23 millis in the interval, instead of 230.
- the printing of a trailing .0, in case the interval lacks the fractional part;
- tests generating a random millisecond value used to simply print it in the string about to be evaluated without a necessary front-filling of 0[s], where the amount was below 100/10.
(The combination of first and last issues above, plus statistical "luck" made the incorrect handling pass the tests.)
(cherry picked from commit 4de8c64f63ee37c1bcfdb9b9d3a07d09be243222)
Allow also whitespace ` ` (together with `T`) as a separator between
date and time parts of the timestamp string. E.g.:
```
{ts '2020-02-08 12.10.45'}
```
or
```
{ts '2020-02-08T12.10.45'}
```
Fixes: #46069
(cherry picked from commit 07c977023fb8ceab5991c359a6cbfe07beaad9bb)
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
Add unit and integration tests where literals are SELECTed
in combination with GROUP BY and possibly aggregate functions.
Relates to #41411 and #34583
which have been fixed.
(cherry picked from commit b97f1ca12675d6ea4772c60578922fe1cc2409ee)
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.
Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
* Optimize not-equalities in con-/disjunctions
This commit adds optimisations of not-equalities in conjunctions and
disjunctions:
* for conjunctions, the not-equality can be optimized away when applied
together with a range or inequality, in case the not-equality point
falls outside the domain of the later condition; if its on the boarder,
it will modify the bound, to simply exclude the equality, if present;
otherwise no optimisation can be applied;
* for disjunctions, the not-equals could filter away the ranges and
inequalities, unless these include an equality on the bound, in which
case the entire condition becomes always true, but this would influence
the score() function, so it's been omitted;
* fix aggregations of inequalities in ranges
This commit fixes the loop that aggregates inequalities into ranges:
- it won't advance the outer loop index in case of a merge, since the
current element is removed;
- it will break the inner loop, since comparision against the element
selected in the outer loop can't continue, as it had been removed.
(cherry picked from commit 789724ac2cc726de603849b4eeb8194da7528bcc)
Previously, if YEAR() was used as and ORDER BY argument without being
wrapped with another scalar (e.g. YEAR(birth_date) + 10), no script
ordering was used but instead the underlying field (e.g. birth_date)
was used instead as a performance optimisation. This works correctly if
YEAR() is the only ORDER BY arg but if further args are used as tie
breakers for the ordering wrong results are produced. This is because
2 rows with the different birth_date but on the same year are not tied
as the underlying ordering is on birth_date and not on the
YEAR(birth_date), and the following ORDER BY args are ignored.
Remove this optimisation for YEAR() to avoid incorrect results in
such cases.
As a consequence another bug is revealed: scalar functions on top
of nested fields produce scripted sorting/filtering which is not yet
supported. In such cases no error was thrown but instead all values for
such nested fields were null and were passed to the script implementing
the sorting/filtering, producing incorrect results.
Detect such cases and throw a validation exception.
Fixes: #51224
(cherry picked from commit f41efd6753dc3650a7eabb3e07b02b3b32c5704c)
Add a verification that full-text search functions `MATCH()` and `QUERY()`
are not allowed in the SELECT clause, so that a nice error message is
returned to the user early instead of an "ugly" exception.
Fixes: #47446
* Introduce reusable QL plugin for SQL and EQL (#50815)
Extract reusable functionality from SQL into its own dedicated project QL.
Implemented as a plugin, it provides common components across SQL and the upcoming EQL.
While this commit is fairly large, for the most part it's just a big file move from sql package to the newly introduced ql.
(cherry picked from commit ec1ac0d463bfa12a02c8174afbcdd6984345e8b4)
* SQL: Fix incomplete registration of geo NamedWritables
(cherry picked from commit e295763686f9592976e551e504fdad1d2a3a566d)
* QL: Extend NodeSubclass to read classes from jars (#50866)
As the test classes are spread across more than one project, the Gradle
classpath contains not just folders but also jars.
This commit allows the test class to explore the archive content and
load matching classes from said source.
(cherry picked from commit 25ad74928afcbf286dc58f7d430491b0af662f04)
* QL: Remove implicit conversion inside Literal (#50962)
Literal constructor makes an implicit conversion for each value given
which turns out has some subtle side-effects.
Improve MathProcessors to preserve numeric type where possible
Fix bug on issue compatibility between date and intervals
Preserve the source when folding inside the Optimizer
(cherry picked from commit 9b73e225b0aa07a23859550fb117bae571a2b672)
* QL: Refactor DataType for pluggability (#51328)
Change DataType from enum to class
Break DataType enums into QL (default) and SQL types
Make data type conversion pluggable so that new types can be introduced
As part of the process:
- static type conversion in QL package (such as Literal) has been
removed
- several utility classes have been broken into base (QL) and extended
(SQL) parts based on type awareness
- operators (+,-,/,*) are
- due to extensibility, serialization of arithmetic operation has been
slightly changed and pushed down to the operator executor itself
(cherry picked from commit aebda81b30e1563b877a8896309fd50633e0b663)
* Compilation fixes for 7.x
The hierarchy of fields/sub-fields under a field that is of an
unsupported data type will be marked as unsupported as well. Until this
change, the behavior was to set the unsupported data type field's
hierarchy as empty.
Example, considering the following hierarchy of fields/sub-fields
a -> b -> c -> d, if b would be of type "foo", then b, c and d will
be marked as unsupported.
(cherry picked from commit 7adb286c4c485b9e781f88b0a2f98cab9ec5b7e2)
* Extend the optimizations for equalities
This commit supplements the optimisations of equalities in conjunctions
and disjunctions:
* for conjunctions, the existing optimizations with ranges are extended
with not-equalities and inequalities; these lead to a fast resolution,
the conjunction either being evaluate to a FALSE, or the non-equality
conditions being dropped as superfluous;
* optimisations for disjunctions are added to be applied against ranges,
inequalities and not-equalities; these lead to disjunction either
becoming TRUE or the equality being dropped, either as superfluous or
merged into a range/inequality.
* Adress review notes
* Fix the bug around wrongly optimizing 'a=2 OR a!=?', which only yields
TRUE for same values in equality and inequality.
* Var renamings, code style adjustments, comments corrections.
* Address further review comments. Extend optim.
- fix a few code comments;
- extend the Equals OR NotEquals optimitsation (a=2 OR a!=5 -> a!=5);
- extend the Equals OR Range optimisation on limits equality (a=2 OR
2<=a<5 -> 2<=a<5);
- in case an equality is being removed in a conjunction, the rest of
possible optimisations to test is now skipped.
* rename one var for better legiblity
- s/rmEqual/removeEquals
(cherry picked from commit 62e7c6a010f10cd7893ee5c99bad8b8d2a693436)
* SQL: Optimisation fixes for conjunction merges
This commit fixes the following issues around the way comparisions are
merged with ranges in conjunctions:
* the decision to include the equality of the lower limit is corrected;
* the selection of the upper limit is corrected to use the upper bound
of the range;
* the list of terms in the conjunction is sorted to have the ranges at
the bottom; this allows subsequent binary comarisions to find compatible
ranges and potentially be merged away. The end guarantee being that the
optimisation takes place irrespective of the order of the conjunction
terms in the statement.
Some comments are also corrected.
* adress review observation on anon. comparator
Replace anonymous comparator of split AND Expressions with a lambda.
(cherry picked from commit 9828cb143a41f1bda1219541f3a8fdc03bf6dd14)
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.
In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner
arrays or objects,
- the metadata can't have more than 5 keys in total.
Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.
Here is how the meta might look like in mappings:
```json
{
"properties": {
"latency": {
"type": "long",
"meta": {
"unit": "ms"
}
}
}
}
```
And then in the field capabilities response:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms" ]
}
}
}
}
```
When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms", "ns" ]
}
}
}
}
```
Closes#33267
* Adds JavaDoc to `AbstractWireTestCase` and
`AbstractWireSerializingTestCase` so it is more obvious you should prefer
the latter if you have a choice
* Moves the `instanceReader` method out of `AbstractWireTestCase` becaue
it is no longer used.
* Marks a bunch of methods final so it is more obvious which classes are
for what.
* Cleans up the side effects of the above.