From 3b8ae56b39c6b19400a61074f70096d6e55895d6 Mon Sep 17 00:00:00 2001
From: Joel Bernstein
Date: Wed, 8 Jul 2020 14:18:38 -0400
Subject: [PATCH] Ref Guide: Add Streaming Expression documentation for 8.6 release

---
 .../src/stream-source-reference.adoc | 63 ++++++++++++++++++-
 1 file changed, 60 insertions(+), 3 deletions(-)

diff --git a/solr/solr-ref-guide/src/stream-source-reference.adoc b/solr/solr-ref-guide/src/stream-source-reference.adoc
index 2c3d02f3ed6..1203023b29f 100644
--- a/solr/solr-ref-guide/src/stream-source-reference.adoc
+++ b/solr/solr-ref-guide/src/stream-source-reference.adoc
@@ -108,6 +108,50 @@ jdbc(
 )
 ----
 
+== drill
+
+The `drill` function is designed to support efficient high cardinality aggregation. The `drill`
+function sends a request to the `export` handler in a specific collection that includes a Streaming
+Expression, which the `export` handler applies to the sorted result set. The `export` handler then emits the aggregated tuples.
+The `drill` function reads and emits the aggregated tuples from each shard, maintaining the sort order,
+but does not merge the aggregations. Streaming Expression functions can be wrapped around the `drill` function to
+merge the aggregates.
+
+=== drill Parameters
+
+* `collection`: (Mandatory) The collection being searched.
+* `q`: (Mandatory) The query to perform on the Solr index.
+* `fl`: (Mandatory) The list of fields to return.
+* `sort`: (Mandatory) The sort criteria.
+* `expr`: The streaming expression sent to the `export` handler, which applies it to the sorted
+result set. The `input()` function provides the stream of sorted tuples from the export handler (see examples below).
+
+=== drill Syntax
+
+Example 1: Basic drill syntax
+
+[source,text]
+----
+drill(articles,
+      q="abstract:water",
+      fl="author",
+      sort="author asc",
+      rollup(input(), over="author", count(*)))
+----
+
+Example 2: A `rollup` wrapped around the `drill` function to sum the counts emitted from each shard.
+
+[source,text]
+----
+rollup(drill(articles,
+             q="abstract:water",
+             fl="author",
+             sort="author asc",
+             rollup(input(), over="author", count(*))),
+       over="author",
+       sum(count(*)))
+----
+
 == echo
 
 The `echo` function returns a single Tuple echoing its text parameter. `Echo` is the simplest stream source designed to provide text
@@ -135,7 +179,8 @@ The `facet` function provides aggregations that are rolled up over buckets. Unde
 * `overfetch`: (Default 150) Over-fetching is used to provide accurate aggregations over high cardinality fields.
 * `method`: The JSON facet API aggregation method.
 * `bucketSizeLimit`: Sets the absolute number of rows to fetch. This is incompatible with rows, offset and overfetch. This value is applied to each dimension. '-1' will fetch all the buckets.
-* `metrics`: List of metrics to compute for the buckets. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`.
+* `metrics`: List of metrics to compute for the buckets. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`, `per(col, 50)`. The `per` metric calculates a percentile
+for a numeric column and can be specified multiple times in the same facet function.
 
 === facet Syntax
 
@@ -156,6 +201,8 @@ facet(collection1,
       max(a_f),
       avg(a_i),
       avg(a_f),
+      per(a_f, 50),
+      per(a_f, 75),
       count(*))
 ----
 
@@ -179,6 +226,8 @@ facet(collection1,
       max(a_f),
       avg(a_i),
       avg(a_f),
+      per(a_f, 50),
+      per(a_f, 75),
       count(*))
 ----
 
@@ -431,7 +480,9 @@ The `stats` function gathers simple aggregations for a search result set. The st
 
 * `collection`: (Mandatory) Collection the stats will be aggregated from.
 * `q`: (Mandatory) The query to build the aggregations from.
-* `metrics`: (Mandatory) The metrics to include in the result tuple. Current supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)` and `count(*)`
+* `metrics`: (Mandatory) The metrics to include in the result tuple. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`, `per(col, 50)`. The `per` metric calculates a percentile
+for a numeric column and can be specified multiple times in the same stats function.
+
 
 === stats Syntax
 
@@ -447,6 +498,8 @@ stats(collection1,
       max(a_f),
       avg(a_i),
       avg(a_f),
+      per(a_f, 50),
+      per(a_f, 75),
       count(*))
 ----
 
@@ -464,7 +517,9 @@ JSON Facet API as its high performance aggregation engine.
 * `end`: (Mandatory) The end of the time series expressed in Solr date or date math syntax.
 * `gap`: (Mandatory) The time gap between time series aggregation points expressed in Solr date math syntax.
 * `format`: (Optional) Date template to format the date field in the output tuples. Formatting is performed by Java's SimpleDateFormat class.
-* `metrics`: (Mandatory) The metrics to include in the result tuple. Current supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)` and `count(*)`
+* `metrics`: (Mandatory) The metrics to include in the result tuple. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`, `per(col, 50)`. The `per` metric calculates a percentile
+for a numeric column and can be specified multiple times in the same timeseries function.
+
 
 === timeseries Syntax
 
@@ -482,6 +537,8 @@ timeseries(collection1,
            max(a_f),
            avg(a_i),
            avg(a_f),
+           per(a_f, 50),
+           per(a_f, 75),
            count(*))
 ----
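
Reviewer note (not part of the patch): streaming expressions such as the new `drill` examples are executed by sending them to a collection's `/stream` handler. A minimal sketch of how Example 2 might be invoked with curl, assuming a collection named `articles` on a local node (host, port, and collection name are placeholders):

[source,bash]
----
# Placeholder host/port/collection; adjust for your cluster.
curl --data-urlencode 'expr=rollup(drill(articles,
                                   q="abstract:water",
                                   fl="author",
                                   sort="author asc",
                                   rollup(input(), over="author", count(*))),
                             over="author",
                             sum(count(*)))' \
     http://localhost:8983/solr/articles/stream
----

The outer `rollup` in the expression is what merges the per-shard counts emitted by `drill` into a single count per author.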