2016-12-21 13:02:47 -05:00
|
|
|
[[search-aggregations-bucket-adjacency-matrix-aggregation]]
|
|
|
|
=== Adjacency Matrix Aggregation
|
|
|
|
|
2020-08-17 11:27:04 -04:00
|
|
|
A bucket aggregation returning a form of {wikipedia}/Adjacency_matrix[adjacency matrix].
|
2016-12-21 13:02:47 -05:00
|
|
|
The request provides a collection of named filter expressions, similar to the `filters` aggregation
|
|
|
|
request.
|
|
|
|
Each bucket in the response represents a non-empty cell in the matrix of intersecting filters.
|
|
|
|
|
|
|
|
Given filters named `A`, `B` and `C` the response would return buckets with the following names:
|
|
|
|
|
|
|
|
|
|
|
|
[options="header"]
|
|
|
|
|=======================
|
|
|
|
| h|A h|B h|C
|
|
|
|
h|A |A |A&B |A&C
|
|
|
|
h|B | |B |B&C
|
|
|
|
h|C | | |C
|
|
|
|
|=======================
|
|
|
|
|
|
|
|
The intersecting buckets e.g `A&C` are labelled using a combination of the two filter names separated by
|
|
|
|
the ampersand character. Note that the response does not also include a "C&A" bucket as this would be the
|
|
|
|
same set of documents as "A&C". The matrix is said to be _symmetric_ so we only return half of it. To do this we sort
|
|
|
|
the filter name strings and always use the lowest of a pair as the value to the left of the "&" separator.
|
|
|
|
|
|
|
|
An alternative `separator` parameter can be passed in the request if clients wish to use a separator string
|
|
|
|
other than the default of the ampersand.
|
|
|
|
|
|
|
|
|
|
|
|
Example:
|
|
|
|
|
2019-09-05 10:11:25 -04:00
|
|
|
[source,console]
|
2016-12-21 13:02:47 -05:00
|
|
|
--------------------------------------------------
|
2019-01-23 03:46:28 -05:00
|
|
|
PUT /emails/_bulk?refresh
|
2016-12-21 13:02:47 -05:00
|
|
|
{ "index" : { "_id" : 1 } }
|
|
|
|
{ "accounts" : ["hillary", "sidney"]}
|
|
|
|
{ "index" : { "_id" : 2 } }
|
|
|
|
{ "accounts" : ["hillary", "donald"]}
|
|
|
|
{ "index" : { "_id" : 3 } }
|
|
|
|
{ "accounts" : ["vladimir", "donald"]}
|
|
|
|
|
2017-12-14 11:47:53 -05:00
|
|
|
GET emails/_search
|
2016-12-21 13:02:47 -05:00
|
|
|
{
|
|
|
|
"size": 0,
|
|
|
|
"aggs" : {
|
|
|
|
"interactions" : {
|
|
|
|
"adjacency_matrix" : {
|
|
|
|
"filters" : {
|
|
|
|
"grpA" : { "terms" : { "accounts" : ["hillary", "sidney"] }},
|
|
|
|
"grpB" : { "terms" : { "accounts" : ["donald", "mitt"] }},
|
|
|
|
"grpC" : { "terms" : { "accounts" : ["vladimir", "nigel"] }}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
|
|
|
|
In the above example, we analyse email messages to see which groups of individuals
|
|
|
|
have exchanged messages.
|
|
|
|
We will get counts for each group individually and also a count of messages for pairs
|
|
|
|
of groups that have recorded interactions.
|
|
|
|
|
|
|
|
Response:
|
|
|
|
|
2019-09-06 16:09:09 -04:00
|
|
|
[source,console-result]
|
2016-12-21 13:02:47 -05:00
|
|
|
--------------------------------------------------
|
|
|
|
{
|
|
|
|
"took": 9,
|
|
|
|
"timed_out": false,
|
|
|
|
"_shards": ...,
|
|
|
|
"hits": ...,
|
|
|
|
"aggregations": {
|
|
|
|
"interactions": {
|
|
|
|
"buckets": [
|
|
|
|
{
|
|
|
|
"key":"grpA",
|
|
|
|
"doc_count": 2
|
|
|
|
},
|
|
|
|
{
|
|
|
|
"key":"grpA&grpB",
|
|
|
|
"doc_count": 1
|
|
|
|
},
|
|
|
|
{
|
|
|
|
"key":"grpB",
|
|
|
|
"doc_count": 2
|
|
|
|
},
|
|
|
|
{
|
|
|
|
"key":"grpB&grpC",
|
|
|
|
"doc_count": 1
|
|
|
|
},
|
|
|
|
{
|
|
|
|
"key":"grpC",
|
|
|
|
"doc_count": 1
|
|
|
|
}
|
|
|
|
]
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// TESTRESPONSE[s/"took": 9/"took": $body.took/]
|
|
|
|
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
|
|
|
|
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
|
|
|
|
|
|
|
|
==== Usage
|
|
|
|
On its own this aggregation can provide all of the data required to create an undirected weighted graph.
|
|
|
|
However, when used with child aggregations such as a `date_histogram` the results can provide the
|
2020-08-17 11:27:04 -04:00
|
|
|
additional levels of data required to perform {wikipedia}/Dynamic_network_analysis[dynamic network analysis]
|
2016-12-21 13:02:47 -05:00
|
|
|
where examining interactions _over time_ becomes important.
|
|
|
|
|
|
|
|
==== Limitations
|
|
|
|
For N filters the matrix of buckets produced can be N²/2 and so there is a default maximum
|
2019-09-06 08:59:47 -04:00
|
|
|
imposed of 100 filters . This setting can be changed using the `index.max_adjacency_matrix_filters` index-level setting
|
2020-07-07 10:28:20 -04:00
|
|
|
(note this setting is deprecated and will be repaced with `indices.query.bool.max_clause_count` in 8.0+).
|