mirror of https://github.com/apache/lucene.git
SOLR-12913: Add new facet expression and pivot docs
This commit is contained in:
parent ff1df8a15c
commit 531b16633a

@@ -130,8 +130,12 @@ The `facet` function provides aggregations that are rolled up over buckets. Unde
* `collection`: (Mandatory) Collection the facets will be aggregated from.
* `q`: (Mandatory) The query to build the aggregations from.
* `buckets`: (Mandatory) Comma separated list of fields to rollup over. The comma separated list represents the dimensions in a multi-dimensional rollup.
* `bucketSorts`: (Mandatory) Comma separated list of sorts to apply to each dimension in the buckets parameter. Sorts can be on the computed metrics or on the bucket values.
* `rows`: (Default 10) The number of rows to return. '-1' will return all rows.
* `offset`: (Default 0) The offset in the result set to start from.
* `overfetch`: (Default 150) Over-fetching is used to provide accurate aggregations over high cardinality fields.
* `method`: The JSON facet API aggregation method.
* `bucketSizeLimit`: Sets the absolute number of rows to fetch. This is incompatible with `rows`, `offset` and `overfetch`. This value is applied to each dimension. '-1' will fetch all the buckets. See the sketch following this list for how it differs from `rows`.
* `metrics`: List of metrics to compute for the buckets. Currently supported metrics are `sum(col)`, `avg(col)`, `min(col)`, `max(col)`, `count(*)`.

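As a brief illustration of that difference (the field names here are just examples), the first call below pages the top 100 buckets with `rows`, while the second fetches an absolute number of buckets with `bucketSizeLimit` and therefore cannot also use `rows`, `offset`, or `overfetch`:

[source,text]
----
facet(collection1, q="*:*", buckets="a_s", bucketSorts="count(*) desc", rows=100, count(*))

facet(collection1, q="*:*", buckets="a_s", bucketSorts="count(*) desc", bucketSizeLimit=100, count(*))
----
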
=== facet Syntax

@@ -144,7 +148,7 @@ facet(collection1,
q="*:*",
|
||||
buckets="a_s",
|
||||
bucketSorts="sum(a_i) desc",
|
||||
bucketSizeLimit=100,
|
||||
rows=100,
|
||||
sum(a_i),
|
||||
sum(a_f),
|
||||
min(a_i),
|
||||
|

@@ -166,7 +170,8 @@ facet(collection1,
q="*:*",
|
||||
buckets="year_i, month_i, day_i",
|
||||
bucketSorts="year_i desc, month_i desc, day_i desc",
|
||||
bucketSizeLimit=100,
|
||||
rows=10,
|
||||
offset=20,
|
||||
sum(a_i),
|
||||
sum(a_f),
|
||||
min(a_i),
|
||||
|

@@ -179,6 +184,7 @@ facet(collection1,
----

The example above shows a facet function with rollups over three buckets, where the buckets are returned in descending order by bucket value.
The `rows` parameter limits the output to 10 rows and the `offset` parameter skips the first 20 rows of the result set.

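As a minimal sketch of paging (reusing the same fields as the example above), the next 10 buckets could be fetched by advancing the offset by the page size:

[source,text]
----
facet(collection1,
      q="*:*",
      buckets="year_i, month_i, day_i",
      bucketSorts="year_i desc, month_i desc, day_i desc",
      rows=10,
      offset=30,
      sum(a_i))
----
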
== features

@@ -31,6 +31,12 @@ to vectorize and analyze the results sets.

Below are some of the key stream sources:

* *`facet`*: Multi-dimensional aggregations are a powerful tool for generating
co-occurrence counts for categorical data. The `facet` function uses the JSON facet API
under the covers to provide fast, distributed, multi-dimensional aggregations. With math expressions
the aggregated results can be pivoted into a co-occurrence matrix which can be mined for
correlations and hidden similarities within the data.

* *`random`*: Random sampling is widely used in statistics, probability and machine learning.
The `random` function returns a random sample of search results that match a
query. The random samples can be vectorized and operated on by math expressions and the results

@@ -242,6 +248,80 @@ When this expression is sent to the `/stream` handler it responds with:
}
----

== Facet Co-Occurrence Matrices

The `facet` function can be used to quickly perform multi-dimensional aggregations of categorical data from
records stored in a SolrCloud collection. These multi-dimensional aggregations can represent co-occurrence
counts for the values in the dimensions. The `pivot` function can be used to move two-dimensional
aggregations into a co-occurrence matrix. The co-occurrence matrix can then be clustered or analyzed for
correlations to learn about the hidden connections within the data.

In the example below the `facet` expression is used to generate a two-dimensional faceted aggregation.
The first dimension is the US State that a car was purchased in and the second dimension is the car model.
The two-dimensional facet generates the co-occurrence counts for the number of times a particular car model
was purchased in a particular state.

[source,text]
----
facet(collection1, q="*:*", buckets="state, model", bucketSorts="count(*) desc", rows=5, count(*))
----

When this expression is sent to the `/stream` handler it responds with:

[source,json]
----
{
  "result-set": {
    "docs": [
      {
        "state": "NY",
        "model": "camry",
        "count(*)": 13342
      },
      {
        "state": "NJ",
        "model": "accord",
        "count(*)": 13002
      },
      {
        "state": "NY",
        "model": "civic",
        "count(*)": 12901
      },
      {
        "state": "CA",
        "model": "focus",
        "count(*)": 12892
      },
      {
        "state": "TX",
        "model": "f150",
        "count(*)": 12871
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 171
      }
    ]
  }
}
----
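
Conceptually, these aggregation tuples can be arranged into a matrix whose rows are the states and whose columns are the car models. The sketch below is purely illustrative: it fills in only the five counts returned above and marks every other cell, whose count is not shown, with a dot:

[source,text]
----
        accord    camry    civic     f150    focus
NY           .    13342    12901        .        .
NJ       13002        .        .        .        .
CA           .        .        .        .    12892
TX           .        .        .    12871        .
----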

The `pivot` function can be used to move the facet results into a co-occurrence matrix. In the example below
the `pivot` function is used to create a matrix where the rows of the matrix are the US States (state) and the
columns of the matrix are the car models (model). The values in the matrix are the co-occurrence counts (count(*))
from the facet results. Once the co-occurrence matrix has been created the US States can be clustered
by car model, or the matrix can be transposed and car models can be clustered by the US States
where they were bought.

[source,text]
----
let(a=facet(collection1, q="*:*", buckets="state, model", bucketSorts="count(*) desc", rows="-1", count(*)),
    b=pivot(a, state, model, count(*)),
    c=kmeans(b, 7))
----
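
To cluster the car models by the US States where they were bought, as mentioned above, the same matrix can be transposed before clustering. This is only a sketch; it assumes the `transpose` matrix function and keeps the illustrative cluster count of 7:

[source,text]
----
let(a=facet(collection1, q="*:*", buckets="state, model", bucketSorts="count(*) desc", rows="-1", count(*)),
    b=pivot(a, state, model, count(*)),
    c=kmeans(transpose(b), 7))
----
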
== Latitude / Longitude Vectors

The `latlonVectors` function wraps a list of tuples and parses a lat/lon location field into