[role="xpack"]
[[graph-explore]]
=== Explore API

The graph explore API ...

==== Request

`POST <index>/_xpack/graph/_explore`

==== Description

After an initial search, users typically want to review the results using a
graph visualization tool like the one in the Kibana Graph UI. They
frequently then select one or more vertices of interest and ask to load more
vertices that may be connected to their current selection. In graph-speak,
this operation is often called _spidering_ or _spidering out_.

To spider out, you typically need to define two things:

* The set of vertices from which you would like to spider
* The set of vertices you already have in your workspace which you want to
avoid seeing again in results

When passed to the graph explore API, these two pieces of information
ensure that you are returned new vertices that can be attached to the existing
selection.

The `include` and `exclude` clauses provide the essential features that enable
clients to progressively build up a picture of related information in their
workspace. The `include` clause defines the set of start points from
which users wish to spider. Include clauses can also be used to limit the end
points users wish to reach, thereby "filling in" some of the missing links
between existing vertices in their client-side workspace. The `exclude` clause
can be used to prevent the graph explore API from returning vertices already
visible in a client's workspace, or to list undesirable vertices that the client
has blacklisted from their workspace and never wants to see returned.
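
As a minimal sketch of spidering out, the following request starts from a single vertex and excludes terms that are already in the workspace. The `clicklogs` index, the `product` and `query.raw` fields, and all term values are hypothetical names chosen for illustration. No seed query is supplied because the `include` clause already identifies the start points.

[source,js]
--------------------------------------------------
# Hypothetical index, fields and terms; substitute your own.
POST clicklogs/_xpack/graph/_explore
{
  "vertices": [
    {
      "field": "product",
      "include": [ "1854873" ]
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "query.raw",
        "exclude": [ "midi keyboard", "midi", "synth" ]
      }
    ]
  }
}
--------------------------------------------------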

//==== Path Parameters

//==== Query Parameters

==== Request Body

connections::
TBD. A list of fields is provided.
query:::
TBD. Optionally, a "guiding query" can be used to guide the API as it
explores connected terms.
vertices:::
TBD.

NOTE: Further "connections" can be nested inside the "connections" object to
continue exploring out the relationships in the data. Each level of nesting is
commonly referred to as a "hop" and proximity in a graph is often thought of in
terms of "hop depth".
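
The nesting described in the note might look like the following sketch, which explores two hops out from a seed query. The `clicklogs` index and the `product` and `query.raw` fields are hypothetical and used only for illustration.

[source,js]
--------------------------------------------------
# Hypothetical index and field names; substitute your own.
POST clicklogs/_xpack/graph/_explore
{
  "query": {
    "match": { "query.raw": "midi" }
  },
  "vertices": [
    { "field": "product" }
  ],
  "connections": {
    "vertices": [
      { "field": "query.raw" }
    ],
    "connections": {
      "vertices": [
        { "field": "product" }
      ]
    }
  }
}
--------------------------------------------------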

controls::
TBD.
use_significance:::
TBD. The `use_significance` flag defaults to true and is used to filter
associated terms to only those that are significantly associated with the
query. The algorithm used to calculate significance is explained in the
documentation for the
{ref}/search-aggregations-bucket-significantterms-aggregation.html[significant_terms aggregation].

sample_size:::
TBD. Each "hop" considers a sample of the best-matching documents on each
shard (default is 100 documents). Using samples has the dual benefit of
keeping exploration focused on meaningfully-connected terms and improving
the speed of execution. Very small values (less than 50) may not provide
sufficient weight-of-evidence to identify significant connections between
terms, while very large sample sizes may dilute the quality of the results and
slow down execution.

timeout:::
TBD. A `timeout` setting, expressed in milliseconds, after which
exploration is halted and the results gathered so far are returned. This is
a best-effort approach to termination, so it may overrun if, for example, a long
pause is encountered while fielddata is loaded for a field.

sample_diversity:::
TBD. To prevent the sample of top-matching documents from being dominated by a
single source of results, it is sometimes necessary to request diversity in
the sample. You do this by selecting a single-value field and setting a maximum
number of documents per value for that field. For example, in a store's click
log you might require that there are no more than 500 click documents from any
one department. This can help the exploration consider products from the
electronics, book and video departments, whereas without diversification the
results may be entirely dominated by the electronics department.
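
The controls above could be combined as in the following sketch. The `clicklogs` index and the field names are hypothetical, and the numeric values are arbitrary. The `field` and `max_docs_per_value` keys inside `sample_diversity` are not spelled out in the text above, so treat them as an assumption to verify against your version of the API.

[source,js]
--------------------------------------------------
# Hypothetical index and fields. The sample_diversity key names are an assumption.
POST clicklogs/_xpack/graph/_explore
{
  "query": {
    "match": { "query.raw": "midi" }
  },
  "controls": {
    "use_significance": true,
    "sample_size": 2000,
    "timeout": 2000,
    "sample_diversity": {
      "field": "category.raw",
      "max_docs_per_value": 500
    }
  },
  "vertices": [
    { "field": "product" }
  ],
  "connections": {
    "vertices": [
      { "field": "query.raw" }
    ]
  }
}
--------------------------------------------------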

query::
TBD. A query is used to "seed" the exploration. Any of the usual {es} query
syntax can be used here to identify the documents of interest.

vertices::
TBD. A list of fields is provided.
exclude:::
TBD. The `exclude` clause avoids returning specific terms.
field::: TBD
include:::
TBD. Lists the start points from which to spider, as an `include`
array of the terms of interest. If you have an `include` clause,
there is no need to define a seed query, because the API implicitly queries for
documents that contain any of the terms listed in the include clauses.
Instead of passing plain strings in this array, it is also possible to pass
objects with `term` and `boost` values to boost matches on certain terms
over others.
size:::
TBD. Controls the maximum number of vertex terms returned for each
field. The default is 5.
min_doc_count:::
TBD. This property acts as a certainty threshold: how many documents
have to contain a pair of terms before the connection is considered
useful? The default is 3.
shard_min_doc_count:::
TBD. An advanced setting: how many documents on a shard have
to contain a pair of terms before the connection is returned for global
consideration? The default is 2.
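
To illustrate the vertex settings above, the following sketch boosts one start term over another and tightens the thresholds on the connected vertices. The `clicklogs` index, the field names, the term values, and the chosen numbers are all hypothetical.

[source,js]
--------------------------------------------------
# Hypothetical index, fields, terms and thresholds; substitute your own.
POST clicklogs/_xpack/graph/_explore
{
  "vertices": [
    {
      "field": "product",
      "include": [
        { "term": "1854873", "boost": 2 },
        { "term": "1854874", "boost": 1 }
      ]
    }
  ],
  "connections": {
    "vertices": [
      {
        "field": "query.raw",
        "size": 5,
        "min_doc_count": 10,
        "shard_min_doc_count": 3
      }
    ]
  }
}
--------------------------------------------------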

//==== Authorization

////
==== Examples

TO-DO: Add link to example in Kibana Guide

////