[DOCS] EQL: Document how sequence queries handle matches (#65794) (#65887)

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
James Rodewig 2020-12-04 09:57:08 -05:00 committed by GitHub
parent a5e65beab2
commit 793eb48502


@@ -741,3 +741,147 @@ three double quotes (`"""`) instead.
*** {eql-ref}/pipes.html#sort[`sort`]
*** {eql-ref}/pipes.html#unique[`unique`]
*** {eql-ref}/pipes.html#unique-count[`unique_count`]
[discrete]
[[eql-how-sequence-queries-handle-matches]]
==== How sequence queries handle matches
<<eql-sequences,Sequence queries>> don't find all potential matches for a
sequence. Finding every potential match would be too slow and costly for large
event data sets.
Instead, a sequence query handles pending sequence matches as a
{wikipedia}/Finite-state_machine[state machine]:
* Each event item in the sequence query is a state in the machine.
* Only one pending sequence can be in each state at a time.
* If a pending sequence enters a state that already holds another pending
sequence, the more recent sequence overwrites the older one.
* If the query includes <<eql-by-keyword,`by` fields>>, the query uses a
separate state machine for each unique `by` field value.
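The rules above can be sketched in a few lines of Python. This is a minimal, hypothetical model built only from the rules in this section, not {es}'s implementation. The `SequenceMachine` class, its `feed` method, and the string events are assumptions made for illustration. The sketch covers a single `by` value; a query with `by` fields would keep one such machine per unique value.

```python
class SequenceMachine:
    """Hypothetical single-key model of the rules above (not the real engine)."""

    def __init__(self, predicates):
        # One predicate per event item in the sequence query.
        self.predicates = predicates
        # One slot per non-final state; a slot holds at most one pending sequence.
        self.slots = [None] * (len(predicates) - 1)
        self.completed = []

    def feed(self, event):
        last = len(self.predicates) - 1
        # Check later items first so a single event cannot advance twice.
        for i in range(last, -1, -1):
            if not self.predicates[i](event):
                continue
            if i == 0:
                # A new pending sequence overwrites any older one in the first state.
                self.slots[0] = [event]
            elif self.slots[i - 1] is not None:
                sequence = self.slots[i - 1] + [event]
                self.slots[i - 1] = None  # the sequence leaves its previous state
                if i == last:
                    self.completed.append(sequence)
                else:
                    self.slots[i] = sequence  # overwrites an older sequence, if any


# Two-item sequence: an "a" event followed by a "b" event.
machine = SequenceMachine([
    lambda e: e.startswith("a"),
    lambda e: e.startswith("b"),
])
for event in ["a1", "a2", "b1", "b2"]:
    machine.feed(event)

print(machine.completed)  # [['a2', 'b1']]
```

Of the four potential matches (`a1`/`b1`, `a1`/`b2`, `a2`/`b1`, `a2`/`b2`), the sketch reports only one: `a2` overwrites `a1` in the first state, and `b2` finds that state already empty.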
.*Example*
[%collapsible]
====
A data set contains the following `process` events in ascending chronological
order:
[source,js]
----
{ "index" : { "_id" : "1" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "2" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "3" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "4" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "5" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "6" } }
{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "7" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
{ "index" : { "_id" : "8" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
{ "index" : { "_id" : "9" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
{ "index" : { "_id" : "10" } }
{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
{ "index" : { "_id" : "11" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
----
// NOTCONSOLE
An EQL sequence query searches the data set:
[source,eql]
----
sequence by user.name
[process where process.name == "attrib"]
[process where process.name == "bash"]
[process where process.name == "cat"]
----
The query's event items correspond to the following states:
* State A: `[process where process.name == "attrib"]`
* State B: `[process where process.name == "bash"]`
* Complete: `[process where process.name == "cat"]`
To find matching sequences, the query uses a separate state machine for each
unique `user.name` value. Pending sequence matches move through each machine's
states as follows:
[source,txt]
----
{ "index" : { "_id" : "1" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [1] in state A for the "root" user.
//
// root: A=[1]
{ "index" : { "_id" : "2" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [2] in state A for "root", overwriting sequence [1].
//
// root: A=[2]
{ "index" : { "_id" : "3" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
// Nothing happens. The "elkbee" user has no pending sequence to move from
// state A to state B.
{ "index" : { "_id" : "4" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
// Sequence [2] moves out of state A for "root".
// State B for "root" now contains [2, 4].
// State A for "root" is now empty.
//
// root: A=[]
// root: B=[2, 4]
{ "index" : { "_id" : "5" } }
{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
// Nothing happens. State A is empty for "root".
{ "index" : { "_id" : "6" } }
{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
// Creates sequence [6] in state A for "elkbee".
//
// elkbee: A=[6]
{ "index" : { "_id" : "7" } }
{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
// Creates sequence [7] in state A for "root".
// Sequence [2, 4] remains in state B for "root".
//
// root: A=[7]
// root: B=[2, 4]
{ "index" : { "_id" : "8" } }
{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
// Sequence [6] moves to state B for "elkbee" and becomes [6, 8].
// State A for "elkbee" is now empty.
//
// elkbee: A=[]
// elkbee: B=[6, 8]
{ "index" : { "_id" : "9" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
// Sequence [2, 4, 9] is complete for "root".
// State B for "root" is now empty.
// Sequence [7] remains in state A for "root".
//
// root: A=[7]
// root: B=[]
{ "index" : { "_id" : "10" } }
{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
// Sequence [6, 8, 10] is complete for "elkbee".
// States A and B for "elkbee" are now empty.
//
// elkbee: A=[]
// elkbee: B=[]
{ "index" : { "_id" : "11" } }
{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
// Nothing happens. State B for "root" is empty.
----
====
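The walkthrough in this example can be replayed with a short Python script. This is a simplified sketch under stated assumptions, not {es}'s implementation: the `replay` function, the tuple layout of `EVENTS`, and the snapshot format are hypothetical, and the script ignores features such as `maxspan`, `until`, and tiebreakers. It relies on each sample event matching exactly one event item.

```python
from collections import defaultdict

# process.name matched by state A, state B, and Complete, in that order.
ITEMS = ["attrib", "bash", "cat"]

# (_id, user.name, process.name) in ascending chronological order.
EVENTS = [
    (1, "root", "attrib"),
    (2, "root", "attrib"),
    (3, "elkbee", "bash"),
    (4, "root", "bash"),
    (5, "root", "bash"),
    (6, "elkbee", "attrib"),
    (7, "root", "attrib"),
    (8, "elkbee", "bash"),
    (9, "root", "cat"),
    (10, "elkbee", "cat"),
    (11, "root", "cat"),
]


def replay(events, items):
    """Return completed sequences and a per-event snapshot of [A, B]."""
    last = len(items) - 1
    # One machine per user.name; index 0 is state A, index 1 is state B.
    machines = defaultdict(lambda: [None] * last)
    completed, snapshots = [], {}
    for event_id, user, process in events:
        slots = machines[user]
        step = items.index(process)  # assumes every event matches one item
        if step == 0:
            slots[0] = [event_id]  # a newer sequence overwrites the older one
        elif slots[step - 1] is not None:
            sequence = slots[step - 1] + [event_id]
            slots[step - 1] = None  # the sequence leaves its previous state
            if step == last:
                completed.append(sequence)
            else:
                slots[step] = sequence
        snapshots[event_id] = list(slots)  # this user's [A, B] after the event
    return completed, snapshots


completed, snapshots = replay(EVENTS, ITEMS)
print(completed)     # [[2, 4, 9], [6, 8, 10]]
print(snapshots[7])  # [[7], [2, 4]]
```

Each snapshot holds the states for the user of that event only, so `snapshots[7]` corresponds to the `root: A=[7]` and `root: B=[2, 4]` annotations after event 7.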