Add a new gradle module under eql/qa which runs and validates a set of
queries over a 4m event dataset (restored from a snapshot residing in a
gcs bucket). The results are providing by running the exact set of queries
with Python EQL against the same dataset.
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
(cherry picked from commit 1cf789e5fcfb0f364f665bfaac021e24a4c2f556)
Co-authored-by: Mark Vieira <portugee@gmail.com>
This commit fixes two issues in dealing with bool fields in EQL:
- avoid simplifications of field == true expressions
- adding comparison to clauses on fields missing logic (where bool)
Fix#63693
(cherry picked from commit d10a5d0e842bbd4e0031834de948ceb24da3872b)
(cherry picked from commit 0227da3a275c7f22ff524d99d53e1a79146f9e28)
* Allow all indices options variants
Irrespective of allow_no_indices value, throw VerificationException when
there is no index validated
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Do not filter by tiebreaker while searching sequence matches as
it's not monotonic and thus can filter out valid data.
Add handling for data 'near' the boundary that has the same timestamp
but different tie-breaker and thus can be just outside the window.
Fix#62781
Relates #63215
(cherry picked from commit 36f834600d4d9ded0fb7b1440274b2e597733770)
(cherry picked from commit 72a2ce825f3bfd13f87423ba7f3c739ea64c57f6)
Since match (for matching regex) is not currently in use remove it for
now.
Close#63263
(cherry picked from commit 6abd531cf457f3c5686f59709647bed3276e3c6b)
Make EQL case sensitive by default and adapt some of the string functions
Remove the case sensitive option from Between string function
Add case_insensitive option to term and wildcard queries usage
(cherry picked from commit 7550e0664c8c2f1f13519036c759b1e76345551f)
Add a unit test to verify that an identifier surrounded with backquotes
is not a valid syntax for eventType value, as eventType is
schemantically a string literal and not a field identifier.
Follows: #63169
(cherry picked from commit ff12c1340b3890ac52251f31259fa9a719d9eacc)
Ignore by default unavailable indices (same as ES) and verify that
allowNoIndices is set to false since at least one index is required
for validating the query.
Fix#62986
(cherry picked from commit fd75ac27223cd1b699b8d9c311dc401a39f9e0c8)
Do not filter by tiebreaker while searching sequence matches as
it's not monotonic and thus can filter out valid data.
Fix#62781
(cherry picked from commit 4d62198df70f3b70f8b6e7730e888057652c18a8)
Point-In-Time queries cannot be ran on individual indices but on all.
Thus all PIT queries move their index from the request level to a filter
so this condition is fulfilled while keeping the query scoped
accordingly.
Fix#63132
(cherry picked from commit c8eb4f724d5dcc0fcc172c6219ecfbc1dc1fbbae)
Event type is actually a string value for event.category which can
contain any kind of characters, or start with a digit, which currently
is not supported, so we introduce the possibility to be able to use the
usual syntax of " and """ for strings and raw strings.
Make the grammar a bit cleaner by using the identifier only where it's
actually an identifier in terms of query scemantics.
Fixes: #62933
(cherry picked from commit 306e1d76da3db652db57f11f847705b3995609ff)
Use triple double quotes enclosing a string literal to interpret it
as unescaped, in order to use `?` for marking query params and avoid
user confusion. `?` also usually implies regex expressions.
Any character inside the `"""` beginning-closing markings is considered
raw and the only thing that is not permitted is the `"""` sequence itself.
If a user wants to use that, needs to resort to the normal `"` string literal
and use proper escaping.
Relates to #61659
(cherry picked from commit d87c2ca2eacab5552bca1e520d33cf71da40bcfd)
Introduce : operator for doing case insensitive string comparisons.
Recognizes "*" for wildcard matches in string literals.
Restricted only to string types.
Relates #62941
(cherry picked from commit 201e577e65f26a9b958a6197fe6c7268da39de29)
Since `fork` is not used, is undocumented in Python EQL and
there is no plan at the moment to implement it in the future,
removing it from the grammar. User will get parsing exceptions
instead of higher level messages about unsupported features
which can lead to wrong expectations.
(cherry picked from commit f6a0f8f01c1b1893bab86629d1de73e9f9dae8dc)
This fixes a bug introduced when moving from mget to ids query. While
mget returns all the ids given, id query is a search query and thus by
default returns only 10 documents.
The fix correctly sets the expected size so all the information is
returned inside the response.
Fix#63030
(cherry picked from commit 09ba85548a0142a1fe8376efea9cc4e7764a207c)
Previously, backquote couldn't not be used inside an escaped identifier,
e.g.:
```
`my`identifier` = "some_value"
```
was not allowed. Introduce escaping of the backquote by using a
double backquote:
```
`my``identifier` = "some_value"
```
(cherry picked from commit 49514121486f42a58674b3e5901de4021fda5c15)
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.
Fix#62131
(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
Fix query creation inside sequences with any queries due to lacking a
clause to combine, which lead to an invalid request being created.
Fix#62967
(cherry picked from commit ff59d8823919a6e70928816e5c3687308ebde33f)
Replace common Like and RLike queries that match all characters with
IsNotNull (exists) queries
Fix#62585
(cherry picked from commit 4c23fad0468a9edd7325b06c6a96f7af37625dbf)
Eclipse was confused for two reasons:
1. `:x-pack:plugin` depended on itself.
2. `ql`, `sql`, and `eql` couldn't see some methods.
I fixed problem 1 by only adding the "depends on itself" configuration
outside of eclipse. I fixed problem 2 by making a `test` sub-project in
`ql` that contains test utilities and depending on those where possible.
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.
Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4 which can lead to confusing
results. Since such expressions don't make so much change for EQL
filters we disallow them in the parser to prevent unexpected results
from their bad usage.
Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax. (counter example would be MySQL which interprets them as we did
before with leftmost priority).
Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.
Fix#62494
(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
The usage of single quotes to wrap a string literal is forbidden
and an error encouraging the user to user double quotes is returned.
Tests are properly adjusted.
Relates to #61659
(cherry picked from commit 8be400b77370bf4cf68c89f492c2d235f3cce43c)
To preserve the PIT semantics, the retrieval of results has moved from
using multi-get to using an idsQuery.
(cherry picked from commit 1c2362fcf2be62ce568b3772924abce7331ef23c)
Use the newly introduced PIT API to have a consistent view of the data
while doing sequence matching, which involves multiple calls, aka
repeatable reads and thus avoid race conditions or any in-flight updates
on the data.
(cherry picked from commit daa72fc3c71fd36afb55278021ff6bbc591ef148)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
Since join keys are common across all queries in a Join/Sequence, any
constraint applied on one query needs to be obeyed but all the other
queries.
This PR enhances the optimizer to propagate such constraints across
all queries so they get pushed down to the actual generated ES queries.
Fix#58937
(cherry picked from commit 4afa5debc199c132c07015bfae17952c40a21e5d)
* The query client uses an array of indices instead of the comma separated
version of the indices names
(cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)
The current implementation of the filter pipe is incomplete hence why
it got reverted. Note this is not a complete revert as some of the
improvements of said commit (such as the PostAnalyzer) are useful in
general.
Relates #61805
(cherry picked from commit 7a7eb66f7d39586c3a3bc00dce49e6c47a23b46a)
Backport of #61904 to 7.x branch.
The eql search api redirects to the search api. For this reason the eql
search api could work with concrete data stream names. However if security
is enabled and a data stream name snippet with a wildcard was used then
it could not resolve this expressions. This is because the EqlSearchRequest
class didn't overwrite the `includeDataStreams()` method. This pr fixes this,
so that the security layer can properly expand data stream name wildcard
expressions for the eql search api.
This commit also moves the eql data stream test to xpack rest tests,
so that the test runs with security enabled. This is required to reproduce
the bug.
Closes#60828
For 1/2 the plugins in x-pack, the integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.
This includes the following projects:
async-search, autoscaling, ccr, enrich, eql, frozen-indicies,
data-streams, graph, ilm, mapper-constant-keyword, mapper-flattened, ml
A few of the more specialized qa projects within these plugins
have not been changed with this PR due to additional complexity which should
be addressed separately.
A follow up PR will address the remaining x-pack plugins (this PR is big enough as-is).
related: #61802
related: #56841
related: #59939
related: #55896
Allow filtering through a pipe, across events and sequences.
Filter pipes are pushed down to base queries.
For now filtering after limit (head/tail) is forbidden as the
semantics are still up for debate.
Fix#59763
(cherry picked from commit 80569a388b76cecb5f55037fe989c8b6f140761b)
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.
As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).
Fix#59764Fix#59779
Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)