mirror of https://github.com/apache/lucene.git
SOLR-11144: Add Analytics Component docs to the Ref Guide
parent
e915078707
commit
6428ddb10e
|
@ -0,0 +1,91 @@
|
|||
= Analytics Expression Sources
|
||||
:page-tocclass: right
|
||||
// Licensed to the Apache Software Foundation (ASF) under one
|
||||
// or more contributor license agreements. See the NOTICE file
|
||||
// distributed with this work for additional information
|
||||
// regarding copyright ownership. The ASF licenses this file
|
||||
// to you under the Apache License, Version 2.0 (the
|
||||
// "License"); you may not use this file except in compliance
|
||||
// with the License. You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing,
|
||||
// software distributed under the License is distributed on an
|
||||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
// KIND, either express or implied. See the License for the
|
||||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
Expression sources are the source of the data being aggregated in <<analytics.adoc#expressions,analytics expressions>>.
|
||||
|
||||
These sources can be either Solr fields indexed with docValues, or constants.
|
||||
|
||||
== Supported Field Types
|
||||
|
||||
The following <<field-types-included-with-solr.adoc#field-types-included-with-solr, Solr field types>> are supported.
|
||||
Fields of these types can be either multi-valued or single-valued.
|
||||
|
||||
All fields used in analytics expressions *must* have <<docvalues.adoc#docvalues,docValues>> enabled.
|
||||
|
||||
|
||||
// Since Trie* fields are deprecated as of 7.0, we should consider removing Trie* fields from this list...
|
||||
|
||||
[horizontal]
|
||||
String::
|
||||
StrField
|
||||
Boolean::
|
||||
BoolField
|
||||
Integer::
|
||||
TrieIntField +
|
||||
IntPointField
|
||||
Long::
|
||||
TrieLongField +
|
||||
LongPointField
|
||||
Float::
|
||||
TrieFloatField +
|
||||
FloatPointField
|
||||
Double::
|
||||
TrieDoubleField +
|
||||
DoublePointField
|
||||
Date::
|
||||
TrieDateField +
|
||||
DatePointField
|
||||
|
||||
.Multi-valued Field De-duplication
|
||||
[WARNING]
|
||||
====
|
||||
All multi-valued field types, except for PointFields, are de-duplicated, meaning duplicate values for the same field are removed during indexing.
|
||||
In order to save duplicates, you must use PointField types.
|
||||
====
|
||||
|
||||
== Constants
|
||||
|
||||
Constants can be included in expressions to use alongside fields and functions. The available constants are shown below.
|
||||
Constants do not need to be wrapped in a function to define them; they can be used exactly like fields in an expression.
|
||||
|
||||
=== Strings
|
||||
|
||||
There are two possible ways of specifying constant strings, as shown below.
|
||||
|
||||
* Surrounded by double quotes. Inside the quotes, both `"` and `\` must be escaped with a `\` character.
|
||||
+
|
||||
`"Inside of 'double' \\ \"quotes\""` \=> `Inside of 'double' \ "quotes"`
|
||||
* Surrounded by single quotes. Inside the quotes, both `'` and `\` must be escaped with a `\` character.
|
||||
+
|
||||
`'Inside of "single" \\ \'quotes\''` \=> `Inside of "single" \ 'quotes'`
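
The two escaping rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the parser Solr uses: the helper name `unquote_constant` is hypothetical, and treating a backslash as making the next character literal is an assumption.

```python
def unquote_constant(literal: str) -> str:
    """Return the value of a quoted analytics string constant.

    Hypothetical helper that models only the escaping rules described above.
    """
    if len(literal) < 2 or literal[0] not in ('"', "'") or literal[-1] != literal[0]:
        raise ValueError("constant must be wrapped in matching quotes")
    quote = literal[0]
    out = []
    chars = iter(literal[1:-1])
    for ch in chars:
        if ch == "\\":
            # Assumption: a backslash makes the next character literal.
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of constant")
            out.append(nxt)
        elif ch == quote:
            raise ValueError("unescaped quote inside constant")
        else:
            out.append(ch)
    return "".join(out)


# Raw strings keep the backslashes exactly as they would be typed in a request.
double_quoted = r'''"Inside of 'double' \\ \"quotes\""'''
single_quoted = r"""'Inside of "single" \\ \'quotes\''"""
print(unquote_constant(double_quoted))
print(unquote_constant(single_quoted))
```

Only the quote character that delimits the constant needs escaping, which is why the other kind of quote passes through unchanged in each example.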
|
||||
|
||||
=== Dates
|
||||
|
||||
Dates can be specified in the same way as they are in Solr queries. Just use ISO-8601 format.
|
||||
For more information, refer to the <<working-with-dates.adoc#working-with-dates,Working with Dates>> section.
|
||||
|
||||
* `2017-07-17T19:35:08Z`
|
||||
|
||||
=== Numeric
|
||||
|
||||
Any non-decimal number will be read as an integer, or as a long if it is too large for an integer. All decimal numbers will be read as doubles.
|
||||
|
||||
* `-123421`: Integer
|
||||
* `800000000000`: Long
|
||||
* `230.34`: Double
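
As a rough illustration of the typing rule above, the following Python sketch classifies a numeric literal. It assumes that "too large for an integer" means outside the signed 32-bit range; the function name `constant_type` is hypothetical and this is not Solr's parser.

```python
# Assumption: Integer is a signed 32-bit value, as in Java.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def constant_type(literal: str) -> str:
    """Classify a numeric constant as 'Integer', 'Long', or 'Double'."""
    if "." in literal:
        float(literal)  # raises ValueError if the literal is malformed
        return "Double"
    value = int(literal)
    return "Integer" if INT_MIN <= value <= INT_MAX else "Long"


for literal in ("-123421", "800000000000", "230.34"):
    print(literal, "->", constant_type(literal))
```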
|
|
@ -0,0 +1,360 @@
|
|||
= Analytics Mapping Functions
|
||||
:page-tocclass: right
|
||||
// Licensed to the Apache Software Foundation (ASF) under one
|
||||
// or more contributor license agreements. See the NOTICE file
|
||||
// distributed with this work for additional information
|
||||
// regarding copyright ownership. The ASF licenses this file
|
||||
// to you under the Apache License, Version 2.0 (the
|
||||
// "License"); you may not use this file except in compliance
|
||||
// with the License. You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing,
|
||||
// software distributed under the License is distributed on an
|
||||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
// KIND, either express or implied. See the License for the
|
||||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
Mapping functions map values for each Solr Document or Reduction.
|
||||
|
||||
Below is a list of all mapping functions provided by the Analytics Component.
|
||||
These mappings can be chained together to implement more complex functionality.
|
||||
|
||||
== Numeric Functions
|
||||
|
||||
=== Negation
|
||||
Negates the result of a numeric expression.
|
||||
|
||||
`neg(<_Numeric_ T>)` \=> `<T>`::
|
||||
* `neg(10.53)` \=> `-10.53`
|
||||
* `neg([1, -4])` \=> `[-1, 4]`
|
||||
|
||||
=== Absolute Value
|
||||
Returns the absolute value of the numeric expression.
|
||||
|
||||
`abs(< _Numeric_ T >)` \=> `< T >`::
|
||||
* `abs(-10.53)` \=> `10.53`
|
||||
* `abs([1, -4])` \=> `[1, 4]`
|
||||
|
||||
[[analytics-round]]
|
||||
=== Round
|
||||
Rounds the numeric expression to the nearest `Integer` or `Long` value.
|
||||
|
||||
`round(< _Float_ >)` \=> `< _Int_ >`::
|
||||
`round(< _Double_ >)` \=> `< _Long_ >`::
|
||||
* `round(-1.5)` \=> `-1`
|
||||
* `round([1.75, 100.34])` \=> `[2, 100]`
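
The `round(-1.5)` example is consistent with ties being rounded toward positive infinity. The Python sketch below models that behaviour under that assumption; note that Python's built-in `round` rounds ties to the nearest even number and would give a different answer. The name `round_half_up` is hypothetical.

```python
import math


def round_half_up(value: float) -> int:
    # Ties go toward positive infinity, so -1.5 becomes -1 rather than -2.
    return math.floor(value + 0.5)


print(round_half_up(-1.5))
print([round_half_up(v) for v in (1.75, 100.34)])
print(round(-1.5))  # Python's built-in rounds ties to even
```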
|
||||
|
||||
=== Ceiling
|
||||
Rounds the numeric expression to the nearest `Integer` or `Long` value that is greater than or equal to the original value.
|
||||
|
||||
`ceil(< _Float_ >)` \=> `< _Int_ >`::
|
||||
`ceil(< _Double_ >)` \=> `< _Long_ >`::
|
||||
* `ceil(5.01)` \=> `6`
|
||||
* `ceil([-4.999, 6.99])` \=> `[-4, 7]`
|
||||
|
||||
[[analytics-floor]]
|
||||
=== Floor
|
||||
Rounds the numeric expression to the nearest `Integer` or `Long` value that is less than or equal to the original value.
|
||||
|
||||
`floor(< _Float_ >)` \=> `< _Int_ >`::
|
||||
`floor(< _Double_ >)` \=> `< _Long_ >`::
|
||||
* `floor(5.75)` \=> `5`
|
||||
* `floor([-4.001, 6.01])` \=> `[-5, 6]`
|
||||
|
||||
=== Addition
|
||||
Adds the values of the numeric expressions.
|
||||
|
||||
`add(< _Multi Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `add([1, -4])` \=> `-3.0`
|
||||
`add(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `add(3.5, [1, -4])` \=> `[4.5, -0.5]`
|
||||
`add(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `add([1, -4], 3.5)` \=> `[4.5, -0.5]`
|
||||
`add(< _Single Double_ >, ...)` \=> `< _Single Double_ >`::
|
||||
* `add(3.5, 100, -27.6)` \=> `75.9`
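
The four `add` signatures differ only in how single-valued and multi-valued arguments are combined. A minimal Python model, using a `list` for a multi-valued expression and a plain number for a single-valued one (both modelling choices are assumptions made for illustration):

```python
from typing import List, Union

Value = Union[float, List[float]]


def add(*args: Value) -> Value:
    """Model the four documented `add` signatures (illustrative only)."""
    multi = [isinstance(a, list) for a in args]
    if len(args) == 1 and multi[0]:
        return float(sum(args[0]))                 # add(multi) -> single
    if len(args) == 2 and multi == [False, True]:
        return [args[0] + v for v in args[1]]      # add(single, multi) -> multi
    if len(args) == 2 and multi == [True, False]:
        return [v + args[1] for v in args[0]]      # add(multi, single) -> multi
    if args and not any(multi):
        return float(sum(args))                    # add(single, ...) -> single
    raise TypeError("unsupported mix of single and multi-valued arguments")


print(add([1, -4]))
print(add(3.5, [1, -4]))
print(add([1, -4], 3.5))
print(add(3.5, 100, -27.6))
```

Combinations outside the four listed signatures, such as two multi-valued arguments, are rejected in this sketch.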
|
||||
|
||||
=== Subtraction
|
||||
Subtracts the values of the numeric expressions.
|
||||
|
||||
`sub(< _Single Double_ >, < _Single Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `sub(3.5, 100)` \=> `-96.5`
|
||||
`sub(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `sub(3.5, [1, -4])` \=> `[2.5, 7.5]`
|
||||
`sub(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `sub([1, -4], 3.5)` \=> `[-2.5, -7.5]`
|
||||
|
||||
=== Multiplication
|
||||
Multiplies the values of the numeric expressions.
|
||||
|
||||
`mult(< _Multi Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `mult([1, -4])` \=> `-4.0`
|
||||
`mult(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `mult(3.5, [1, -4])` \=> `[3.5, -14.0]`
|
||||
`mult(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `mult([1, -4], 3.5)` \=> `[3.5, -14.0]`
|
||||
`mult(< _Single Double_ >, ...)` \=> `< _Single Double_ >`::
|
||||
* `mult(3.5, 100, -27.6)` \=> `-9660`
|
||||
|
||||
=== Division
|
||||
Divides the values of the numeric expressions.
|
||||
|
||||
`div(< _Single Double_ >, < _Single Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `div(3.5, 100)` \=> `.035`
|
||||
`div(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `div(3.5, [1, -4])` \=> `[3.5, -0.875]`
|
||||
`div(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `div([1, -4], 25)` \=> `[0.04, -0.16]`
|
||||
|
||||
=== Power
|
||||
Takes one numeric expression to the power of another.
|
||||
|
||||
*NOTE:* The square root function `sqrt(< _Double_ >)` can be used as shorthand for `pow(< _Double_ >, .5)`
|
||||
|
||||
`pow(< _Single Double_ >, < _Single Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `pow(2, 4)` \=> `16.0`
|
||||
`pow(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `pow(16, [-1, 0])` \=> `[0.0625, 1]`
|
||||
`pow(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `pow([1, 16], .25)` \=> `[1.0, 2.0]`
|
||||
|
||||
=== Logarithm
|
||||
Takes the logarithm of a numeric expression, with an optional second numeric expression as the base.
|
||||
If only one expression is given, the natural log is used.
|
||||
|
||||
`log(< _Double_ >)` \=> `< _Double_ >`::
|
||||
* `log(5)` \=> `1.6094...`
|
||||
* `log([1.0, 100.34])` \=> `[0.0, 4.6085...]`
|
||||
`log(< _Single Double_ >, < _Single Double_ >)` \=> `< _Single Double_ >`::
|
||||
* `log(2, 4)` \=> `0.5`
|
||||
`log(< _Single Double_ >, < _Multi Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `log(16, [2, 4])` \=> `[4, 2]`
|
||||
`log(< _Multi Double_ >, < _Single Double_ >)` \=> `< _Multi Double_ >`::
|
||||
* `log([81, 3], 9)` \=> `[2.0, 0.5]`
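
In the examples above, the value comes first and the base second. A small Python sketch of that argument order, offered as an illustration rather than as Solr's implementation:

```python
import math
from typing import Optional


def log(value: float, base: Optional[float] = None) -> float:
    """Natural log when no base is given, otherwise the log of `value` in `base`."""
    return math.log(value) if base is None else math.log(value, base)


print(log(5))                          # natural log
print(log(2, 4))                       # log base 4 of 2
print([log(16, b) for b in (2, 4)])    # single value, multi-valued base
print([log(v, 9) for v in (81, 3)])    # multi-valued value, single base
```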
|
||||
|
||||
== Logic
|
||||
|
||||
[[analytics-logic-neg]]
|
||||
=== Negation
|
||||
Negates the result of a boolean expression.
|
||||
|
||||
`neg(< _Bool_ >)` \=> `< _Bool_>`::
|
||||
* `neg(F)` \=> `T`
|
||||
* `neg([F, T])` \=> `[T, F]`
|
||||
|
||||
[[analytics-and]]
|
||||
=== And
|
||||
ANDs the values of the boolean expressions.
|
||||
|
||||
`and(< _Multi Bool_ >)` \=> `< _Single Bool_ >`::
|
||||
* `and([T, F, T])` \=> `F`
|
||||
`and(< _Single Bool_ >, < _Multi Bool_ >)` \=> `< _Multi Bool_ >`::
|
||||
* `and(F, [T, T])` \=> `[F, F]`
|
||||
`and(< _Multi Bool_ >, < _Single Bool_ >)` \=> `< _Multi Bool_ >`::
|
||||
* `and([F, T], T)` \=> `[F, T]`
|
||||
`and(< _Single Bool_ >, ...)` \=> `< _Single Bool_ >`::
|
||||
* `and(T, T, T)` \=> `T`
|
||||
|
||||
[[analytics-or]]
|
||||
=== Or
|
||||
ORs the values of the boolean expressions.
|
||||
|
||||
`or(< _Multi Bool_ >)` \=> `< _Single Bool_ >`::
|
||||
* `or([T, F, T])` \=> `T`
|
||||
`or(< _Single Bool_ >, < _Multi Bool_ >)` \=> `< _Multi Bool_ >`::
|
||||
* `or(F, [F, T])` \=> `[F, T]`
|
||||
`or(< _Multi Bool_ >, < _Single Bool_ >)` \=> `< _Multi Bool_ >`::
|
||||
* `or([F, T], T)` \=> `[T, T]`
|
||||
`or(< _Single Bool_ >, ...)` \=> `< _Single Bool_ >`::
|
||||
* `or(F, F, F)` \=> `F`
|
||||
|
||||
=== Exists
|
||||
Checks whether any value(s) exist for the expression.
|
||||
|
||||
`exists( T )` \=> `< _Single Bool_ >`::
|
||||
* `exists([1, 2, 3])` \=> `T`
|
||||
* `exists([])` \=> `F`
|
||||
* `exists(_empty_)` \=> `F`
|
||||
* `exists('abc')` \=> `T`
|
||||
|
||||
== Comparison
|
||||
|
||||
=== Equality
|
||||
Checks whether two expressions' values are equal. The parameters must be the same type, after implicit casting.
|
||||
|
||||
`equal(< _Single_ T >, < _Single_ T >)` \=> `< _Single Bool_ >`::
|
||||
* `equal(F, F)` \=> `T`
|
||||
`equal(< _Single_ T >, < _Multi_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `equal("a", ["a", "ab"])` \=> `[T, F]`
|
||||
`equal(< _Multi_ T >, < _Single_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `equal([1.5, -3.0], -3)` \=> `[F, T]`
|
||||
|
||||
=== Greater Than
|
||||
Checks whether a numeric or `Date` expression's values are greater than another expression's values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
|
||||
`gt(< _Single Numeric/Date_ T >, < _Single_ T >)` \=> `< _Single Bool_ >`::
|
||||
* `gt(1800-01-02, 1799-12-20)` \=> `T`
|
||||
`gt(< _Single Numeric/Date_ T >, < _Multi_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `gt(30.756, [30, 100])` \=> `[T, F]`
|
||||
`gt(< _Multi Numeric/Date_ T >, < _Single_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `gt([30, 75.6], 30)` \=> `[F, T]`
|
||||
|
||||
=== Greater Than or Equals
|
||||
Checks whether a numeric or `Date` expression's values are greater than or equal to another expression's values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
|
||||
`gte(< _Single Numeric/Date_ T >, < _Single_ T >)` \=> `< _Single Bool_ >`::
|
||||
* `gte(1800-01-02, 1799-12-20)` \=> `T`
|
||||
`gte(< _Single Numeric/Date_ T >, < _Multi_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `gte(30.756, [30, 100])` \=> `[T, F]`
|
||||
`gte(< _Multi Numeric/Date_ T >, < _Single_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `gte([30, 75.6], 30)` \=> `[T, T]`
|
||||
|
||||
=== Less Than
|
||||
Checks whether a numeric or `Date` expression's values are less than another expression's values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
|
||||
`lt(< _Single Numeric/Date_ T >, < _Single_ T >)` \=> `< _Single Bool_ >`::
|
||||
* `lt(1800-01-02, 1799-12-20)` \=> `F`
|
||||
`lt(< _Single Numeric/Date_ T >, < _Multi_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `lt(30.756, [30, 100])` \=> `[F, T]`
|
||||
`lt(< _Multi Numeric/Date_ T >, < _Single_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `lt([30, 75.6], 30)` \=> `[F, F]`
|
||||
|
||||
=== Less Than or Equals
|
||||
Checks whether a numeric or `Date` expression's values are less than or equal to another expression's values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
|
||||
`lte(< _Single Numeric/Date_ T >, < _Single_ T >)` \=> `< _Single Bool_ >`::
|
||||
* `lte(1800-01-02, 1799-12-20)` \=> `F`
|
||||
`lte(< _Single Numeric/Date_ T >, < _Multi_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `lte(30.756, [30, 100])` \=> `[F, T]`
|
||||
`lte(< _Multi Numeric/Date_ T >, < _Single_ T >)` \=> `< _Multi Bool_ >`::
|
||||
* `lte([30, 75.6], 30)` \=> `[T, F]`
|
||||
|
||||
[[analytics-top]]
|
||||
=== Top
|
||||
Returns the maximum of the numeric, `Date` or `String` expression(s)' values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
(Currently the only type not compatible is `Boolean`, which will be converted to a `String` implicitly in order to compile the expression.)
|
||||
|
||||
`top(< _Multi_ T >)` \=> `< _Single_ T >`::
|
||||
* `top([30, 400, -10, 0])` \=> `400`
|
||||
`top(< _Single_ T >, ...)` \=> `< _Single_ T >`::
|
||||
* `top("a", 1, "d")` \=> `"d"`
|
||||
|
||||
=== Bottom
|
||||
Returns the minimum of the numeric, `Date` or `String` expression(s)' values.
|
||||
The parameters must be the same type, after implicit casting.
|
||||
(Currently the only type not compatible is `Boolean`, which will be converted to a `String` implicitly in order to compile the expression.)
|
||||
|
||||
`bottom(< _Multi_ T >)` \=> `< _Single_ T >`::
|
||||
* `bottom([30, 400, -10, 0])` \=> `-10`
|
||||
`bottom(< _Single_ T >, ...)` \=> `< _Single_ T >`::
|
||||
* `bottom("a", 1, "d")` \=> `"1"`
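
The mixed-type examples for `top` and `bottom` suggest that, when the arguments do not share a numeric type, every value is cast to a string and compared lexicographically. The Python sketch below models that reading; the casting rule is an assumption drawn from the examples, dates are not modelled, and a multi-valued argument can be passed by unpacking a list.

```python
def _comparable(values):
    # Assumption: all-numeric arguments are compared as numbers;
    # anything else is cast to a string first.
    if all(isinstance(v, (int, float)) for v in values):
        return list(values)
    return [str(v) for v in values]


def top(*values):
    return max(_comparable(values))


def bottom(*values):
    return min(_comparable(values))


print(top(*[30, 400, -10, 0]), bottom(*[30, 400, -10, 0]))
print(top("a", 1, "d"), bottom("a", 1, "d"))
```

The digit `1` sorts before the letters once it has been cast to a string, which is why `bottom` returns `"1"` in the mixed example.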
|
||||
|
||||
== Conditional
|
||||
|
||||
[[analytics-if]]
|
||||
=== If
|
||||
Returns the value(s) of the `THEN` or `ELSE` expressions depending on whether the boolean conditional expression's value is `true` or `false`.
|
||||
The `THEN` and `ELSE` expressions must be of the same type and cardinality after implicit casting is done.
|
||||
|
||||
`if(< _Single Bool_>, < T >, < T >)` \=> `< T >`::
|
||||
* `if(true, "abc", [1,2])` \=> `["abc"]`
|
||||
* `if(false, "abc", 123)` \=> `"123"`
|
||||
|
||||
=== Replace
|
||||
Replace all values from the 1^st^ expression that are equal to the value of the 2^nd^ expression with the value of the 3^rd^ expression.
|
||||
All parameters must be the same type after implicit casting is done.
|
||||
|
||||
`replace(< T >, < _Single_ T >, < _Single_ T >)` \=> `< T >`::
|
||||
* `replace([1,3], 3, "4")` \=> `["1", "4"]`
|
||||
* `replace("abc", "abc", 18)` \=> `"18"`
|
||||
* `replace("abc", 1, "def")` \=> `"abc"`
|
||||
|
||||
=== Fill Missing
|
||||
If the 1^st^ expression does not have values, fill it with the values for the 2^nd^ expression.
|
||||
Both expressions must be of the same type and cardinality after implicit casting is done.
|
||||
|
||||
`fill_missing(< T >, < T >)` \=> `< T >`::
|
||||
* `fill_missing([], 3)` \=> `[3]`
|
||||
* `fill_missing(_empty_, "abc")` \=> `"abc"`
|
||||
* `fill_missing("abc", [1])` \=> `["abc"]`
|
||||
|
||||
=== Remove
|
||||
Remove all occurrences of the 2^nd^ expression's value from the values of the 1^st^ expression.
|
||||
Both expressions must be of the same type after implicit casting is done.
|
||||
|
||||
`remove(< T >, < _Single_ T >)` \=> `< T >`::
|
||||
* `remove([1,2,3,2], 2)` \=> `[1, 3]`
|
||||
* `remove("1", 1)` \=> `_empty_`
|
||||
* `remove(1, "abc")` \=> `"1"`
|
||||
|
||||
=== Filter
|
||||
Return the values of the 1^st^ expression if the value of the 2^nd^ expression is `true`, otherwise return no values.
|
||||
|
||||
`filter(< T >, < _Single Boolean_ >)` \=> `< T >`::
|
||||
* `filter([1,2,3], true)` \=> `[1,2,3]`
|
||||
* `filter([1,2,3], false)` \=> `[]`
|
||||
* `filter("abc", false)` \=> `_empty_`
|
||||
* `filter("abc", true)` \=> `"abc"`
|
||||
|
||||
== Date
|
||||
|
||||
=== Date Parse
|
||||
Explicitly converts the values of a `String` or `Long` expression into `Dates`.
|
||||
|
||||
`date(< _String_ >)` \=> `< _Date_ >`::
|
||||
* `date('1800-01-02')` \=> `1800-01-02T00:00:00Z`
|
||||
* `date(['1800-01-02', '2016-05-23'])` \=> `[1800-01-02T..., 2016-05-23T...]`
|
||||
`date(< _Long_ >)` \=> `< _Date_ >`::
|
||||
* `date(1232343246648)` \=> `2009-01-19T05:34:06Z`
|
||||
* `date([1232343246648, 223234324664])` \=> `[2009-01-19T..., 1977-01-27T...]`
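
The `Long` examples are consistent with the value being milliseconds since the Unix epoch, in UTC. Under that assumption, the conversion can be reproduced in Python (the helper name `date_from_millis` is hypothetical):

```python
from datetime import datetime, timezone


def date_from_millis(millis: int) -> str:
    """Format epoch milliseconds as an ISO-8601 UTC timestamp (second precision)."""
    moment = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(date_from_millis(1232343246648))  # 2009-01-19T05:34:06Z
print(date_from_millis(223234324664))
```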
|
||||
|
||||
[[analytics-date-math]]
|
||||
=== Date Math
|
||||
Compute the given date math strings for the values of a `Date` expression. The date math strings *must* be <<analytics-expression-sources.adoc#strings, constant>>.
|
||||
|
||||
`date_math(< _Date_ >, < _Constant String_ >...)` \=> `< _Date_ >`::
|
||||
* `date_math(1800-04-15, '+1DAY', '-1MONTH')` \=> `1800-03-16`
|
||||
* `date_math([1800-04-15,2016-05-24], '+1DAY', '-1MONTH')` \=> `[1800-03-16, 2016-04-25]`
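
To make the order of operations in the examples concrete, here is a deliberately small Python sketch that applies date math strings from left to right. It models only the `DAY` and `MONTH` units; clamping to the last day of a shorter month is an assumption, and the function names are hypothetical. It is not a substitute for Solr's date math parser.

```python
import calendar
from datetime import date, timedelta


def shift(day: date, step: str) -> date:
    """Apply one step such as '+1DAY' or '-1MONTH' (only these units are modelled)."""
    sign = -1 if step[0] == "-" else 1
    amount = sign * int("".join(ch for ch in step if ch.isdigit()))
    unit = step.lstrip("+-0123456789").rstrip("S")
    if unit == "DAY":
        return day + timedelta(days=amount)
    if unit == "MONTH":
        months = day.year * 12 + (day.month - 1) + amount
        year, month_index = divmod(months, 12)
        last_day = calendar.monthrange(year, month_index + 1)[1]
        # Assumption: clamp to the end of the month when the day does not exist.
        return day.replace(year=year, month=month_index + 1, day=min(day.day, last_day))
    raise ValueError(f"unit not modelled in this sketch: {unit}")


def date_math(day: date, *steps: str) -> date:
    for step in steps:  # steps are applied left to right
        day = shift(day, step)
    return day


print(date_math(date(1800, 4, 15), "+1DAY", "-1MONTH"))  # 1800-03-16
print(date_math(date(2016, 5, 24), "+1DAY", "-1MONTH"))  # 2016-04-25
```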
|
||||
|
||||
== String
|
||||
|
||||
=== Explicit Casting
|
||||
Explicitly casts the expression to a `String` expression.
|
||||
|
||||
`string(< _String_ >)` \=> `< _String_ >`::
|
||||
* `string(1)` \=> `'1'`
|
||||
* `string([1.5, -2.0])` \=> `['1.5', '-2.0']`
|
||||
|
||||
=== Concatenation
|
||||
Concatenates the values of the `String` expression(s) together.
|
||||
|
||||
`concat(< _Multi String_ >)` \=> `< _Single String_ >`::
|
||||
* `concat(['a','b','c'])` \=> `'abc'`
|
||||
`concat(< _Single String_ >, < _Multi String_ >)` \=> `< _Multi String_ >`::
|
||||
* `concat(1, ['a','b','c'])` \=> `['1a','1b','1c']`
|
||||
`concat(< _Multi String_ >, < _Single String_ >)` \=> `< _Multi String_ >`::
|
||||
* `concat(['a','b','c'], 1)` \=> `['a1','b1','c1']`
|
||||
`concat(< _Single String_ >...)` \=> `< _Single String_ >`::
|
||||
* `concat('a','b','c')` \=> `'abc'`
|
||||
* `concat('a',_empty_,'c')` \=> `'ac'` +
|
||||
_Empty values are ignored_
|
||||
|
||||
=== Separated Concatenation
|
||||
Concatenates the values of the `String` expression(s) together using the given <<analytics-expression-sources.adoc#strings, constant string>> value as a separator.
|
||||
|
||||
`concat_sep(< _Constant String_ >, < _Multi String_ >)` \=> `< _Single String_ >`::
|
||||
* `concat_sep('-', ['a','b'])` \=> `'a-b'`
|
||||
`concat_sep(< _Constant String_ >, < _Single String_ >, < _Multi String_ >)` \=> `< _Multi String_ >`::
|
||||
* `concat_sep(2,1,['a','b'])` \=> `['12a','12b']`
|
||||
`concat_sep(< _Constant String_ >, < _Multi String_ >, < _Single String_ >)` \=> `< _Multi String_ >`::
|
||||
* `concat_sep(2,['a','b'],1)` \=> `['a21','b21']`
|
||||
`concat_sep(< _Constant String_ >, < _Single String_ >...)` \=> `< _Single String_ >`::
* `concat_sep('-','a',2,3)` \=> `'a-2-3'`
|
||||
* `concat_sep(';','a',_empty_,'c')` \=> `'a;c'` +
|
||||
_Empty values are ignored_
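
The single-valued form of `concat_sep`, including the rule that empty values are ignored, can be sketched in Python as follows. Representing an empty value as `None` and casting every argument with `str` are modelling assumptions.

```python
def concat_sep(separator, *values):
    # Empty values (modelled as None) are skipped; everything else is cast to a string.
    return str(separator).join(str(v) for v in values if v is not None)


print(concat_sep("-", "a", 2, 3))        # a-2-3
print(concat_sep(";", "a", None, "c"))   # a;c
print(concat_sep("-", *["a", "b"]))      # a-b
```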
|
|
@ -0,0 +1,120 @@
|
|||
= Analytics Reduction Functions
|
||||
:page-tocclass: right
|
||||
:page-toclevels: 2
|
||||
// Licensed to the Apache Software Foundation (ASF) under one
|
||||
// or more contributor license agreements. See the NOTICE file
|
||||
// distributed with this work for additional information
|
||||
// regarding copyright ownership. The ASF licenses this file
|
||||
// to you under the Apache License, Version 2.0 (the
|
||||
// "License"); you may not use this file except in compliance
|
||||
// with the License. You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing,
|
||||
// software distributed under the License is distributed on an
|
||||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
// KIND, either express or implied. See the License for the
|
||||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
Reduction functions reduce the values of <<analytics-expression-sources.adoc#analytics-expression-sources,sources>>
|
||||
and/or unreduced <<analytics-mapping-functions.adoc#analytics-mapping-functions,mapping functions>>
|
||||
for every Solr Document to a single value.
|
||||
|
||||
Below is a list of all reduction functions provided by the Analytics Component.
|
||||
These can be combined using mapping functions to implement more complex functionality.
|
||||
|
||||
== Counting Reductions
|
||||
|
||||
=== Count
|
||||
The number of existing values for an expression. For single-valued expressions, this is equivalent to `doc_count`.
|
||||
If no expression is given, the number of matching documents is returned.
|
||||
|
||||
`count()` \=> `< _Single Long_ >`
|
||||
`count(< T >)` \=> `< _Single Long_ >`
|
||||
|
||||
=== Doc Count
|
||||
The number of documents for which an expression has existing values. For single-valued expressions, this is equivalent to `count`.
|
||||
If no expression is given, the number of matching documents is returned.
|
||||
|
||||
`doc_count()` \=> `< _Single Long_ >`
|
||||
|
||||
`doc_count(< T >)` \=> `< _Single Long_ >`
|
||||
|
||||
=== Missing
|
||||
The number of documents for which an expression has no existing value.
|
||||
|
||||
`missing(< T >)` \=> `< _Single Long_ >`
|
||||
|
||||
[[analytics-unique]]
|
||||
=== Unique
|
||||
The number of unique values for an expression. This function accepts `Numeric`, `Date` and `String` expressions.
|
||||
|
||||
`unique(< T >)` \=> `< _Single Long_ >`
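
The difference between the four counting reductions is easiest to see on a multi-valued field. The Python sketch below uses a made-up field `tags` and three made-up documents; it illustrates the definitions above and is not how Solr computes them.

```python
# Hypothetical documents with a multi-valued field named "tags".
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["a"]},
    {"tags": []},  # no value for the field
]


def values_of(doc, field):
    return doc.get(field, [])


def count(docs, field):      # number of existing values
    return sum(len(values_of(d, field)) for d in docs)


def doc_count(docs, field):  # documents with at least one value
    return sum(1 for d in docs if values_of(d, field))


def missing(docs, field):    # documents with no value
    return sum(1 for d in docs if not values_of(d, field))


def unique(docs, field):     # distinct values across all documents
    return len({v for d in docs for v in values_of(d, field)})


print(count(docs, "tags"), doc_count(docs, "tags"), missing(docs, "tags"), unique(docs, "tags"))
# prints: 3 2 1 2
```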
|
||||
|
||||
== Math Reductions
|
||||
|
||||
=== Sum
|
||||
Returns the sum of all values for the expression.
|
||||
|
||||
`sum(< _Double_ >)` \=> `< _Single Double_ >`
|
||||
|
||||
=== Variance
|
||||
Returns the variance of all values for the expression.
|
||||
|
||||
`variance(< _Double_ >)` \=> `< _Single Double_ >`
|
||||
|
||||
=== Standard Deviation
|
||||
Returns the standard deviation of all values for the expression.
|
||||
|
||||
`stddev(< _Double_ >)` \=> `< _Single Double_ >`
|
||||
|
||||
=== Mean
|
||||
Returns the arithmetic mean of all values for the expression.
|
||||
|
||||
`mean(< _Double_ >)` \=> `< _Single Double_ >`
|
||||
|
||||
=== Weighted Mean
|
||||
Returns the arithmetic mean of all values for the second expression weighted by the values of the first expression.
|
||||
|
||||
`wmean(< _Double_ >, < _Double_ >)` \=> `< _Single Double_ >`
|
||||
|
||||
NOTE: The expressions must satisfy the rules for `mult` function parameters.
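
Assuming the usual definition of a weighted mean (the sum of weight times value, divided by the sum of the weights), and following the parameter order described above with the weights first, a single-valued case can be sketched in Python. The sample numbers are made up.

```python
def wmean(weights, values):
    """Weighted mean with one weight and one value per document (illustrative)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# Two documents: weights 1 and 3, values 10 and 20.
print(wmean([1, 3], [10, 20]))  # 17.5
```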
|
||||
|
||||
== Ordering Reductions
|
||||
|
||||
=== Minimum
|
||||
Returns the minimum value for the expression. This function accepts `Numeric`, `Date` and `String` expressions.
|
||||
|
||||
`min(< T >)` \=> `< _Single_ T >`
|
||||
|
||||
=== Maximum
|
||||
Returns the maximum value for the expression. This function accepts `Numeric`, `Date` and `String` expressions.
|
||||
|
||||
`max(< T >)` \=> `< _Single_ T >`
|
||||
|
||||
=== Median
|
||||
Returns the median of all values for the expression. This function accepts `Numeric` and `Date` expressions.
|
||||
|
||||
`median(< T >)` \=> `< _Single_ T >`
|
||||
|
||||
=== Percentile
|
||||
Calculates the given percentile of all values for the expression.
|
||||
This function accepts `Numeric`, `Date` and `String` expressions for the 2^nd^ parameter.
|
||||
|
||||
The percentile, given as the 1^st^ parameter, must be a <<analytics-expression-sources.adoc#numeric,constant double>> in the range [0, 100).
|
||||
|
||||
`percentile(<Constant Double>, < T >)` \=> `< _Single_ T >`
|
||||
|
||||
=== Ordinal
|
||||
Calculates the given ordinal of all values for the expression.
|
||||
This function accepts `Numeric`, `Date` and `String` expressions for the 2^nd^ parameter.
|
||||
The ordinal, given as the 1^st^ parameter, must be a <<analytics-expression-sources.adoc#numeric,constant integer>>.
|
||||
*0 is not accepted as an ordinal value.*
|
||||
|
||||
If the ordinal is positive, the returned value will be the _n_^th^ smallest value.
|
||||
|
||||
If the ordinal is negative, the returned value will be the _n_^th^ largest value.
|
||||
|
||||
`ordinal(<Constant Int>, < T >)` \=> `< _Single_ T >`
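
The ordinal rules above translate directly into a short Python sketch. How Solr behaves when the ordinal exceeds the number of values is not stated here, so this sketch simply lets Python raise an `IndexError` in that case.

```python
def ordinal(n, values):
    if n == 0:
        raise ValueError("0 is not accepted as an ordinal value")
    ordered = sorted(values)
    # Positive n counts from the smallest value, negative n from the largest.
    return ordered[n - 1] if n > 0 else ordered[n]


sample = [30, 400, -10, 0]
print(ordinal(1, sample))   # -10, the smallest value
print(ordinal(2, sample))   # 0, the 2nd smallest value
print(ordinal(-1, sample))  # 400, the largest value
```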
|
|
@ -0,0 +1,819 @@
|
|||
= Analytics Component
|
||||
:page-children: analytics-expression-sources, analytics-mapping-functions, analytics-reduction-functions
|
||||
:page-tocclass: right
|
||||
:page-toclevels: 2
|
||||
// Licensed to the Apache Software Foundation (ASF) under one
|
||||
// or more contributor license agreements. See the NOTICE file
|
||||
// distributed with this work for additional information
|
||||
// regarding copyright ownership. The ASF licenses this file
|
||||
// to you under the Apache License, Version 2.0 (the
|
||||
// "License"); you may not use this file except in compliance
|
||||
// with the License. You may obtain a copy of the License at
|
||||
//
|
||||
// http://www.apache.org/licenses/LICENSE-2.0
|
||||
//
|
||||
// Unless required by applicable law or agreed to in writing,
|
||||
// software distributed under the License is distributed on an
|
||||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
// KIND, either express or implied. See the License for the
|
||||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
The Analytics Component allows users to calculate complex statistical aggregations over result sets.
|
||||
|
||||
The component enables interacting with data in a variety of ways, both through a diverse set of analytics functions as well as powerful faceting functionality.
|
||||
The standard facets are supported within the analytics component with additions that leverage its analytical capabilities.
|
||||
|
||||
== Analytics Configuration
|
||||
|
||||
The Analytics Component is in a contrib module, so it must be enabled in the `solrconfig.xml` of each collection where you would like to use it.
|
||||
|
||||
Since the Analytics framework is a _search component_, it must be declared as such and added to the search handler.
|
||||
|
||||
For distributed analytics requests over cloud collections, the component uses the `AnalyticsHandler` strictly for inter-shard communication.
|
||||
The Analytics Handler should not be used by users to submit analytics requests.
|
||||
|
||||
To configure Solr to use the Analytics Component, the first step is to add a `lib` directive so Solr loads the Analytics Component classes (for more about the `lib` directive, see <<lib-directives-in-solrconfig.adoc#lib-directives-in-solrconfig, Lib Directives in SolrConfig>>). In the section of `solrconfig.xml` where the default `lib` directives are, add a line:
|
||||
|
||||
[source,xml]
|
||||
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-analytics-\d.*\.jar" />
|
||||
|
||||
Next you need to enable the request handler and search component. Add the following lines to `solrconfig.xml`, near the definitions for other request handlers:
|
||||
|
||||
[source,xml]
|
||||
.solrconfig.xml
|
||||
----
|
||||
<!-- To handle user requests -->
|
||||
<searchComponent name="analytics" class="org.apache.solr.handler.component.AnalyticsComponent" />
|
||||
|
||||
<requestHandler name="/select" class="solr.SearchHandler">
|
||||
<arr name="last-components">
|
||||
<str>analytics</str>
|
||||
</arr>
|
||||
</requestHandler>
|
||||
|
||||
<!-- For inter-shard communication during distributed requests -->
|
||||
<requestHandler name="/analytics" class="org.apache.solr.handler.AnalyticsHandler" />
|
||||
----
|
||||
|
||||
For these changes to take effect, restart Solr or reload the core or collection.
|
||||
|
||||
== Request Syntax
|
||||
|
||||
An Analytics request is passed to Solr with the parameter `analytics` in a request sent to the
|
||||
<<requesthandlers-and-searchcomponents-in-solrconfig.adoc#searchhandlers,Search Handler>>.
|
||||
Since the analytics request is sent inside of a search handler request, it will compute results based on the result set determined by the search handler.
|
||||
|
||||
For example, this curl command encodes and POSTs a simple analytics request to the search handler:
|
||||
|
||||
[source,bash]
|
||||
----
|
||||
curl --data-urlencode 'analytics={
|
||||
"expressions" : {
|
||||
"revenue" : "sum(mult(price,quantity))"
|
||||
}
|
||||
}' \
"http://localhost:8983/solr/sales/select?q=*:*&wt=json&rows=0"
|
||||
----
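
The same request can also be assembled from Python's standard library. The sketch below only builds the URL and the URL-encoded POST body, reusing the host, collection name, and expression from the curl example; actually sending it is left as a comment because it needs a running Solr instance.

```python
import json
from urllib.parse import parse_qs, urlencode

analytics = {"expressions": {"revenue": "sum(mult(price,quantity))"}}

# Query parameters go in the URL, the analytics request goes in the POST body.
url = "http://localhost:8983/solr/sales/select?" + urlencode(
    {"q": "*:*", "wt": "json", "rows": 0}
)
body = urlencode({"analytics": json.dumps(analytics)}).encode("utf-8")

# To send the request against a running Solr:
#   import urllib.request
#   urllib.request.urlopen(urllib.request.Request(url, data=body))
print(url)
print(body.decode("utf-8"))
```

Passing `data` to `urllib.request.Request` makes it a POST, which mirrors what `curl --data-urlencode` does.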
|
||||
|
||||
There are 3 main parts of any analytics request:
|
||||
|
||||
Expressions::
|
||||
A list of calculations to perform over the entire result set. Expressions aggregate the search results into a single value to return.
|
||||
This list is entirely independent of the expressions defined in each of the groupings. Find out more about them in the section <<Expressions>>.
|
||||
|
||||
Functions::
|
||||
One or more <<variable-functions, Variable Functions>> to be used throughout the rest of the request. These are essentially lambda functions and can be combined in a number of ways.
|
||||
These functions can be used in the expressions defined in `expressions` as well as in `groupings`.
|
||||
|
||||
Groupings::
|
||||
The list of <<groupings-and-facets, Groupings>> to calculate in addition to the expressions.
|
||||
Groupings hold a set of facets and a list of expressions to compute over those facets.
|
||||
The expressions defined in a grouping are only calculated over the facets defined in that grouping.
|
||||
|
||||
[NOTE]
|
||||
.Optional Parameters
|
||||
Either the `expressions` or the `groupings` parameter must be present in the request, or else there will be no analytics to compute.
|
||||
The `functions` parameter is always optional.
|
||||
|
||||
[source,json]
|
||||
.Example Analytics Request
|
||||
----
|
||||
{
|
||||
"functions": {
|
||||
"sale()": "mult(price,quantity)"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_sale" : "max(sale())",
|
||||
"med_sale" : "median(sale())"
|
||||
},
|
||||
"groupings" : {
|
||||
"sales" : {
|
||||
"expressions" : {
|
||||
"stddev_sale" : "stddev(sale())",
|
||||
"min_price" : "min(price)",
|
||||
"max_quantity" : "max(quantity)"
|
||||
},
|
||||
"facets" : {
|
||||
"category" : {
|
||||
"type" : "value",
|
||||
"expression" : "fill_missing(category, 'No Category')",
|
||||
"sort" : {
|
||||
"criteria" : [
|
||||
{
|
||||
"type" : "expression",
|
||||
"expression" : "min_price",
|
||||
"direction" : "ascending"
|
||||
},
|
||||
{
|
||||
"type" : "facetvalue",
|
||||
"direction" : "descending"
|
||||
}
|
||||
],
|
||||
"limit" : 10
|
||||
}
|
||||
},
|
||||
"temps" : {
|
||||
"type" : "query",
|
||||
"queries" : {
|
||||
"hot" : "temp:[90 TO *]",
|
||||
"cold" : "temp:[* TO 50]"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
== Expressions
|
||||
|
||||
Expressions are the way to request pieces of information from the analytics component. These are the statistical expressions that you want computed and returned in your response.
|
||||
|
||||
=== Constructing an Expression
|
||||
|
||||
==== Expression Components
|
||||
|
||||
An expression is built using fields, constants, mapping functions and reduction functions. The ways that these can be defined are described below.
|
||||
|
||||
Sources::
|
||||
* Constants: The values defined in the expression.
|
||||
The supported constant types are described in the <<analytics-expression-sources.adoc#constants, Analytics Expression Source Reference>>.
|
||||
|
||||
* Fields: Solr fields that are read from the index.
|
||||
The supported fields are listed in the <<analytics-expression-sources.adoc#supported-field-types, Analytics Expression Source Reference>>.
|
||||
|
||||
Mapping Functions::
|
||||
Mapping functions map values for each Solr Document or Reduction.
|
||||
The provided mapping functions are detailed in the <<analytics-mapping-functions.adoc#analytics-mapping-functions,Analytics Mapping Function Reference>>.
|
||||
|
||||
* Unreduced Mapping: Mapping a Field with another Field or Constant returns a value for every Solr Document.
|
||||
Unreduced mapping functions can take fields, constants as well as other unreduced mapping functions as input.
|
||||
|
||||
* Reduced Mapping: Mapping a Reduction Function with another Reduction Function or Constant returns a single value.
|
||||
|
||||
Reduction Functions::
|
||||
Functions that reduce the values of sources and/or unreduced mapping functions for every Solr Document to a single value.
|
||||
The provided reduction functions are detailed in the <<analytics-reduction-functions.adoc#analytics-reduction-functions,Analytics Reduction Function Reference>>.
|
||||
|
||||
==== Component Ordering
|
||||
|
||||
The expression components must be used in the following order to create valid expressions.
|
||||
|
||||
. Reduced Mapping Function
|
||||
.. Constants
|
||||
.. Reduction Function
|
||||
... Sources
|
||||
... Unreduced Mapping Function
|
||||
.... Sources
|
||||
.... Unreduced Mapping Function
|
||||
.. Reduced Mapping Function
|
||||
. Reduction Function
|
||||
|
||||
This ordering is based on the following rules:
|
||||
|
||||
* No reduction function can be an argument of another reduction function.
|
||||
Since all reduction is done together in one step, one reduction function cannot rely on the result of another.
|
||||
* No fields can be left unreduced, since the analytics component cannot return a list of values for an expression (one for every document).
|
||||
Every expression must be reduced to a single value.
|
||||
* Mapping functions are not required when building an expression; however, as many nested mappings as needed can be used.
|
||||
* Nested mapping functions must be the same type, so either both must be unreduced or both must be reduced.
|
||||
A reduced mapping function cannot take an unreduced mapping function as a parameter and vice versa.
|
||||
|
||||
==== Example Construction
|
||||
|
||||
With the above definitions and ordering, an example expression can be broken up into its components:
|
||||
|
||||
[source,bash]
|
||||
div(sum(a,fill_missing(b,0)),add(10.5,count(mult(a,c))))
|
||||
|
||||
As a whole, this is a reduced mapping function. The `div` function is a reduced mapping function since it is a <<analytics-mapping-functions.adoc#division,provided mapping function>> and has reduced arguments.
|
||||
|
||||
If we break down the expression further:
|
||||
|
||||
* `sum(a,fill_missing(b,0))`: Reduction Function +
|
||||
`sum` is a <<analytics-reduction-functions.adoc#sum,provided reduction function>>.
|
||||
** `a`: Field
|
||||
** `fill_missing(b,0)`: Unreduced Mapping Function +
|
||||
`fill_missing` is an unreduced mapping function since it is a <<analytics-mapping-functions.adoc#fill-missing,provided mapping function>> and has a field argument.
|
||||
*** `b`: Field
|
||||
*** `0`: Constant
|
||||
|
||||
* `add(10.5,count(mult(a,c)))`: Reduced Mapping Function +
|
||||
`add` is a reduced mapping function since it is a <<analytics-mapping-functions.adoc#addition,provided mapping function>> and has a reduction function argument.
|
||||
** `10.5`: Constant
|
||||
** `count(mult(a,c))`: Reduction Function +
|
||||
`count` is a <<analytics-reduction-functions.adoc#count,provided reduction function>>.
|
||||
*** `mult(a,c)`: Unreduced Mapping Function +
|
||||
`mult` is an unreduced mapping function since it is a <<analytics-mapping-functions.adoc#multiplication,provided mapping function>> and has two field arguments.
|
||||
**** `a`: Field
|
||||
**** `c`: Field
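
The breakdown above can be reproduced mechanically. The following Python sketch is not part of Solr; it is a rough model of the ordering rules. The names `REDUCTIONS`, `MAPPINGS`, `split_args`, `is_constant` and `classify` are hypothetical, the two function sets cover only the functions used in this example, and any bare identifier is assumed to be a field.

```python
# Illustrative model only; not the Solr expression parser.
# REDUCTIONS and MAPPINGS are assumed sets covering just this example.
REDUCTIONS = {"sum", "count"}
MAPPINGS = {"div", "add", "mult", "fill_missing"}
REDUCED_KINDS = ("Reduction Function", "Reduced Mapping Function")
UNREDUCED_KINDS = ("Field", "Unreduced Mapping Function")


def split_args(text):
    """Split a comma-separated argument string at nesting depth zero."""
    args, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(text[start:i].strip())
            start = i + 1
    args.append(text[start:].strip())
    return args


def is_constant(token):
    """Treat quoted strings and numbers as constants."""
    if token[:1] in ("'", '"'):
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def classify(expr):
    """Return the component kind of expr, enforcing the ordering rules."""
    expr = expr.strip()
    if "(" not in expr:
        return "Constant" if is_constant(expr) else "Field"
    name, rest = expr.split("(", 1)
    kinds = [classify(arg) for arg in split_args(rest[:-1])]
    reduced = [k for k in kinds if k in REDUCED_KINDS]
    if name in REDUCTIONS:
        if reduced:
            raise ValueError("a reduction function cannot take reduced arguments")
        return "Reduction Function"
    if name in MAPPINGS:
        if not reduced:
            return "Unreduced Mapping Function"
        if any(k in UNREDUCED_KINDS for k in kinds):
            raise ValueError("reduced and unreduced arguments cannot be mixed")
        return "Reduced Mapping Function"
    raise ValueError("unknown function: " + name)
```

Calling `classify` on the full example expression yields `Reduced Mapping Function`, while an invalid nesting such as `sum(count(a))` raises an error. The sketch does not handle string constants that contain commas or parentheses.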
|
||||
|
||||
=== Expression Cardinality (Multi-Valued and Single-Valued)
|
||||
|
||||
Every multi-valued expression is rooted in a multi-valued field. Single-valued expressions can start from constants or single-valued fields.
|
||||
All single-valued expressions can be treated as multi-valued expressions that contain one value.
|
||||
|
||||
Many mapping functions can combine single-valued and multi-valued expressions, take a multi-valued expression alone, or take several single-valued expressions together. For example:
|
||||
|
||||
`add(<single-valued double>, <single-valued double>, ...)`::
|
||||
Returns a single-valued double expression whose value is the sum of the values of each expression.
|
||||
|
||||
`add(<single-valued double>, <multi-valued double>)`::
|
||||
Returns a multi-valued double expression where each value of the second expression is added to the single value of the first expression.
|
||||
|
||||
`add(<multi-valued double>, <single-valued double>)`::
|
||||
Acts the same as the above function.
|
||||
|
||||
`add(<multi-valued double>)`::
|
||||
Returns a single-valued double expression which is the sum of the multiple values of the parameter expression.
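
The four forms of `add` listed above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not Solr code: a Python number stands in for a single-valued expression and a list stands in for a multi-valued one, and both conventions are assumptions made for the illustration.

```python
def add(*args):
    """Model of add(): numbers are single-valued, lists are multi-valued."""
    multi = [a for a in args if isinstance(a, list)]
    if not multi:
        # add(<single>, <single>, ...) -> one value
        return sum(args)
    if len(args) == 1:
        # add(<multi>) -> one value, the sum of all values
        return sum(args[0])
    if len(args) == 2 and len(multi) == 1:
        # add(<single>, <multi>) or add(<multi>, <single>) -> multi-valued
        single = args[0] if args[1] is multi[0] else args[1]
        return [single + value for value in multi[0]]
    raise TypeError("combination not covered by the four forms above")
```

For instance, `add(10.0, [1.0, 2.0])` and `add([1.0, 2.0], 10.0)` both produce a two-element list, while `add([1.0, 2.0, 3.0])` collapses to a single number.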
|
||||
|
||||
=== Types and Implicit Casting
|
||||
|
||||
The new analytics component currently supports the types listed in the table below.
|
||||
These types have one-way implicit casting enabled for the following relationships:
|
||||
|
||||
[cols="20s,80",options="header"]
|
||||
|===
|
||||
| Type | Implicitly Casts To
|
||||
| Boolean | String
|
||||
| Date | Long, String
|
||||
| Integer | Long, Float, Double, String
|
||||
| Long | Double, String
|
||||
| Float | Double, String
|
||||
| Double | String
|
||||
| String | _none_
|
||||
|===
|
||||
|
||||
An implicit cast means that if a function requires a certain type of value as a parameter, arguments will be automatically converted to that type if it is possible.
|
||||
|
||||
For example, `concat()` only accepts string parameters and since all types can be implicitly cast to strings, any type is accepted as an argument.
|
||||
|
||||
This also goes for dynamically typed functions. `fill_missing()` requires two arguments of the same type. However, two types that implicitly cast to the same type can also be used.
|
||||
|
||||
For example, `fill_missing(<long>,<float>)` will be cast to `fill_missing(<double>,<double>)` since long cannot be cast to float and float cannot be cast to long implicitly.
|
||||
|
||||
There is an ordering to implicit casts, where the more specialized type is ordered ahead of the more general type.
|
||||
Therefore even though both long and float can be implicitly cast to double and string, they will be cast to double.
|
||||
This is because double is a more specialized type than string, which every type can be cast to.
|
||||
|
||||
The ordering is the same as their order in the above table.
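
The resolution rule can be sketched as a lookup over the table. The Python below is a model of the rule as described here, not the component's implementation; `ORDER`, `CASTS` and `common_type` are hypothetical names, and the lower-case type names are chosen only for the illustration.

```python
# Table order doubles as the specialization order (most specialized first).
ORDER = ["boolean", "date", "integer", "long", "float", "double", "string"]

# One-way implicit casts, copied from the table above.
CASTS = {
    "boolean": ["string"],
    "date": ["long", "string"],
    "integer": ["long", "float", "double", "string"],
    "long": ["double", "string"],
    "float": ["double", "string"],
    "double": ["string"],
    "string": [],
}


def common_type(a, b):
    """Return the most specialized type that both a and b can be used as."""
    reach_a = {a, *CASTS[a]}
    reach_b = {b, *CASTS[b]}
    # Every type reaches "string", so the intersection is never empty.
    return min(reach_a & reach_b, key=ORDER.index)
```

With this model, `common_type("long", "float")` resolves to `"double"` rather than `"string"`, matching the `fill_missing(<long>,<float>)` example.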
|
||||
|
||||
Cardinality can also be implicitly cast.
|
||||
Single-valued expressions can always be implicitly cast to multi-valued expressions, since all single-valued expressions are multi-valued expressions with one value.
|
||||
|
||||
Implicit casting will only occur when an expression will not "compile" without it.
|
||||
If an expression follows all typing rules initially, no implicit casting will occur.
|
||||
Certain functions such as `string()`, `date()`, `round()`, `floor()`, and `ceil()` act as explicit casts, declaring the type that is desired.
|
||||
However, `round()`, `floor()` and `ceil()` can return either int or long, depending on the argument type.
|
||||
|
||||
== Variable Functions
|
||||
|
||||
Variable functions are a way to shorten your expressions and make writing analytics queries easier. They are essentially lambda functions defined in a request.
|
||||
|
||||
[source,json]
|
||||
.Example Basic Function
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"sale()" : "mult(price,quantity)"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_sale" : "max(sale())",
|
||||
"med_sale" : "median(sale())"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
In the above request, instead of writing `mult(price,quantity)` twice, a function `sale()` was defined to abstract this idea. That function was then used in multiple expressions.
|
||||
|
||||
Suppose that we want to look at the sales of specific categories:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"clothing_sale()" : "filter(mult(price,quantity),equal(category,'Clothing'))",
|
||||
"kitchen_sale()" : "filter(mult(price,quantity),equal(category,\"Kitchen\"))"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_clothing_sale" : "max(clothing_sale())"
|
||||
, "med_clothing_sale" : "median(clothing_sale())"
|
||||
, "max_kitchen_sale" : "max(kitchen_sale())"
|
||||
, "med_kitchen_sale" : "median(kitchen_sale())"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
=== Arguments
|
||||
|
||||
Instead of making a function for each category, it would be much easier to use `category` as an input to the `sale()` function.
|
||||
An example of this functionality is shown below:
|
||||
|
||||
[source,json]
|
||||
.Example Function with Arguments
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"sale(cat)" : "filter(mult(price,quantity),equal(category,cat))"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_clothing_sale" : "max(sale(\"Clothing\"))"
|
||||
, "med_clothing_sale" : "median(sale('Clothing'))"
|
||||
, "max_kitchen_sale" : "max(sale(\"Kitchen\"))"
|
||||
, "med_kitchen_sale" : "median(sale('Kitchen'))"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
Variable Functions can take any number of arguments and use them in the function expression as if they were a field or constant.
|
||||
|
||||
=== Variable Length Arguments
|
||||
|
||||
There are analytics functions that take a variable number of parameters.
|
||||
Therefore there are use cases where variable functions would need to take a variable number of parameters.
|
||||
|
||||
For example, the price of a product may be made up of several components, and their number may not be known in advance.
|
||||
Functions can take a variable number of parameters if the last parameter is followed by `..`.
|
||||
|
||||
[source,json]
|
||||
.Example Function with a Variable Length Argument
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"sale(cat, costs..)" : "filter(mult(add(costs),quantity),equal(category,cat))"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_clothing_sale" : "max(sale('Clothing', material, tariff, tax))"
|
||||
, "med_clothing_sale" : "median(sale('Clothing', material, tariff, tax))"
|
||||
, "max_kitchen_sale" : "max(sale('Kitchen', material, construction))"
|
||||
, "med_kitchen_sale" : "median(sale('Kitchen', material, construction))"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
In the above example a variable length argument is used to encapsulate all of the costs to use for a product.
|
||||
There is no definite number of arguments requested for the variable length parameter, therefore the clothing expressions can use 3 and the kitchen expressions can use 2.
|
||||
When the `sale()` function is called, `costs` is expanded to the arguments given.
|
||||
|
||||
Therefore in the above request, inside of the `sale` function:
|
||||
|
||||
* `add(costs)`
|
||||
|
||||
is expanded to both of the following:
|
||||
|
||||
* `add(material, tariff, tax)`
|
||||
* `add(material, construction)`
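
The expansion step can be pictured as a textual substitution. The Python sketch below is an assumption about how such an expansion could work, not the actual expression parser; `expand_varargs` is a hypothetical helper that replaces each whole-word use of the variable length parameter with the arguments of one call.

```python
import re


def expand_varargs(body, param, values):
    """Replace each whole-word use of `param` in `body` with the given arguments."""
    joined = ", ".join(values)
    # A function replacement avoids any backslash handling in `joined`.
    return re.sub(r"\b%s\b" % re.escape(param), lambda match: joined, body)
```

Applied to the body of `sale(cat, costs..)`, the clothing call turns `add(costs)` into `add(material, tariff, tax)` and the kitchen call turns it into `add(material, construction)`.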
|
||||
|
||||
=== For-Each Functions
|
||||
|
||||
[CAUTION]
|
||||
.Advanced Functionality
|
||||
====
|
||||
The following function details are for advanced requests.
|
||||
====
|
||||
|
||||
Although the above functionality allows for an undefined number of arguments to be passed to a function, it does not allow for interacting with those arguments.
|
||||
|
||||
Many times we might want to wrap each argument in additional functions.
|
||||
For example, we might want to look at multiple categories at the same time.
|
||||
So we want to see if `category EQUALS x *OR* category EQUALS y` and so on.
|
||||
|
||||
In order to do this we need to use for-each lambda functions, which transform each value of the variable length parameter.
|
||||
The for-each is started with the `:` character after the variable length parameter.
|
||||
|
||||
[source,json]
|
||||
.Example Function with a For-Each
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"sale(cats..)" : "filter(mult(price,quantity),or(cats:equal(category,_)))"
|
||||
},
|
||||
"expressions" : {
|
||||
"max_sale_1" : "max(sale('Clothing', 'Kitchen'))"
|
||||
, "med_sale_1" : "median(sale('Clothing', 'Kitchen'))"
|
||||
, "max_sale_2" : "max(sale('Electronics', 'Entertainment', 'Travel'))"
|
||||
, "med_sale_2" : "median(sale('Electronics', 'Entertainment', 'Travel'))"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
In this example, `cats:` is the syntax that starts a for-each lambda function over every parameter `cats`, and the `\_` character is used to refer to the value of `cats` in each iteration in the for-each.
|
||||
When `sale("Clothing", "Kitchen")` is called, the lambda function `equal(category,_)` is applied to both Clothing and Kitchen inside of the `or()` function.
|
||||
|
||||
Using all of these rules, the expression:
|
||||
|
||||
[source,text]
|
||||
`sale("Clothing","Kitchen")`
|
||||
|
||||
is expanded to:
|
||||
|
||||
[source,text]
|
||||
`filter(mult(price,quantity),or(equal(category,"Kitchen"),equal(category,"Clothing")))`
|
||||
|
||||
by the expression parser.
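
A for-each expansion of this kind can be sketched as string rewriting. The Python below is a hypothetical model, not the Solr parser: `expand_for_each` is an invented helper, it assumes the lambda is a single function call, and it emits the expanded calls in the order the arguments were passed.

```python
import re


def expand_for_each(body, param, values):
    """Expand `param:<lambda>` by applying the lambda to every value of param."""
    marker = param + ":"
    start = body.index(marker)
    i = start + len(marker)
    # Walk to the parenthesis that closes the lambda's function call.
    depth = 0
    while True:
        if body[i] == "(":
            depth += 1
        elif body[i] == ")":
            depth -= 1
            if depth == 0:
                break
        i += 1
    template = body[start + len(marker): i + 1]
    # `_` stands for the current value; match it only as a whole argument.
    calls = [
        re.sub(r"(?<!\w)_(?!\w)", lambda match, value=value: value, template)
        for value in values
    ]
    return body[:start] + ",".join(calls) + body[i + 1:]
```

For the `sale(cats..)` body, passing the two quoted category names produces one `equal(category, ...)` call per name inside `or()`. Because `or()` does not depend on argument order, the ordering chosen by the sketch does not affect the result.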
|
||||
|
||||
== Groupings And Facets
|
||||
|
||||
Facets, much like in other parts of Solr, allow analytics results to be broken up and grouped by attributes of the data that the expressions are being calculated over.
|
||||
|
||||
The currently available facets for use in the analytics component are Value Facets, Pivot Facets, Range Facets and Query Facets.
|
||||
Each facet is required to have a unique name within the grouping it is defined in, and no facet can be defined outside of a grouping.
|
||||
|
||||
Groupings allow users to calculate the same grouping of expressions over a set of facets.
|
||||
Groupings must have both `expressions` and `facets` given.
|
||||
|
||||
[source,json]
|
||||
.Example Base Facet Request
|
||||
----
|
||||
{
|
||||
"functions" : {
|
||||
"sale()" : "mult(price,quantity)"
|
||||
},
|
||||
"groupings" : {
|
||||
"sales_numbers" : {
|
||||
"expressions" : {
|
||||
"max_sale" : "max(sale())",
|
||||
"med_sale" : "median(sale())"
|
||||
},
|
||||
"facets" : {
|
||||
"<name>" : "< facet request >"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
[source,json]
|
||||
.Example Base Facet Response
|
||||
----
|
||||
{
|
||||
"analytics_response" : {
|
||||
"groupings" : {
|
||||
"sales_numbers" : {
|
||||
"<name>" : "< facet response >"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
=== Facet Sorting
|
||||
|
||||
Some Analytics facets allow for complex sorting of their results.
|
||||
The two current sortable facets are <<value-facets, Analytic Value Facets>> and <<analytic-pivot-facets, Analytic Pivot Facets>>.
|
||||
|
||||
==== Parameters
|
||||
|
||||
`criteria`::
|
||||
The list of criteria to sort the facet by.
|
||||
+
|
||||
It takes the following parameters:
|
||||
|
||||
`type`::: The type of sort. There are two possible values:
|
||||
* `expression`: Sort by the value of an expression defined in the same grouping.
|
||||
* `facetvalue`: Sort by the string-representation of the facet value.
|
||||
|
||||
`direction`:::
|
||||
_(Optional)_ The direction to sort.
|
||||
* `ascending` _(Default)_
|
||||
* `descending`
|
||||
|
||||
`expression`:::
|
||||
When `type = expression`, the name of an expression defined in the same grouping.
|
||||
|
||||
`limit`::
|
||||
Limit the number of returned facet values to the top _N_. _(Optional)_
|
||||
|
||||
`offset`::
|
||||
When a limit is set, skip the top _N_ facet values. _(Optional)_
|
||||
|
||||
[source,json]
|
||||
.Example Sort Request
|
||||
----
|
||||
{
|
||||
"criteria" : [
|
||||
{
|
||||
"type" : "expression",
|
||||
"expression" : "max_sale",
|
||||
"direction" : "ascending"
|
||||
},
|
||||
{
|
||||
"type" : "facetvalue",
|
||||
"direction" : "descending"
|
||||
}
|
||||
],
|
||||
"limit" : 10,
|
||||
"offset" : 5
|
||||
}
|
||||
----
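
To make the effect of these parameters concrete, the following Python sketch applies a sort definition to a list of facet buckets. It is an illustration under stated assumptions rather than the component's code: `sort_facet` is a hypothetical function, the bucket shape (`value` and `results`) is borrowed from the example responses in this guide, and the offset is assumed to be applied before the limit.

```python
def sort_facet(buckets, sort):
    """Order facet buckets by the sort criteria, then apply offset and limit."""
    ordered = list(buckets)
    # Python's sort is stable, so sorting by the last criterion first
    # lets the first criterion dominate.
    for criterion in reversed(sort.get("criteria", [])):
        if criterion["type"] == "expression":
            key = lambda b, name=criterion["expression"]: b["results"][name]
        else:  # "facetvalue": compare the string form of the facet value
            key = lambda b: str(b["value"])
        descending = criterion.get("direction", "ascending") == "descending"
        ordered.sort(key=key, reverse=descending)
    offset = sort.get("offset", 0)
    limit = sort.get("limit")
    end = None if limit is None else offset + limit
    return ordered[offset:end]
```

With the example sort request above, buckets are ordered by ascending `max_sale`, ties are broken by descending facet value, and the function returns at most 10 buckets after skipping the first 5.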
|
||||
|
||||
=== Value Facets
|
||||
|
||||
Value Facets are used to group documents by the value of a mapping expression applied to each document.
|
||||
Mapping expressions are expressions that do not include a reduction function.
|
||||
|
||||
For more information, refer to the <<expression-components, Expressions section>>.
|
||||
|
||||
* `mult(quantity, add(price, tax))`: break up documents by the revenue generated
|
||||
* `fill_missing(state, "N/A")`: break up documents by state, where N/A is used when the document doesn't contain a state
|
||||
|
||||
Value Facets can be sorted.
|
||||
|
||||
==== Parameters
|
||||
|
||||
`expression`:: The expression to choose a facet bucket for each document.
|
||||
`sort`:: A <<Facet Sorting,sort>> for the results of the facet.
|
||||
|
||||
[NOTE]
|
||||
.Optional Parameters
|
||||
The `sort` parameter is optional.
|
||||
|
||||
[source,json]
|
||||
.Example Value Facet Request
|
||||
----
|
||||
{
|
||||
"type" : "value",
|
||||
"expression" : "fill_missing(category,'No Category')",
|
||||
"sort" : {}
|
||||
}
|
||||
----
|
||||
|
||||
[source,json]
|
||||
.Example Value Facet Response
|
||||
----
|
||||
[
|
||||
{ "..." : "..." },
|
||||
{
|
||||
"value" : "Electronics",
|
||||
"results" : {
|
||||
"max_sale" : 103.75,
|
||||
"med_sale" : 15.5
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "Kitchen",
|
||||
"results" : {
|
||||
"max_sale" : 88.25,
|
||||
"med_sale" : 11.37
|
||||
}
|
||||
},
|
||||
{ "..." : "..." }
|
||||
]
|
||||
----
|
||||
|
||||
[NOTE]
|
||||
.Field Facets
|
||||
This is a replacement for Field Facets in the original Analytics Component.
|
||||
Field Facet functionality is maintained in Value Facets by using the name of a field as the expression.
|
||||
|
||||
=== Analytic Pivot Facets
|
||||
|
||||
Pivot Facets are used to group documents by the value of multiple mapping expressions applied to each document.
|
||||
|
||||
Pivot Facets work much like layers of <<value-facets,Analytic Value Facets>>.
|
||||
A list of pivots is required, and the order of the list directly impacts the results returned.
|
||||
The first pivot given will be treated like a normal value facet.
|
||||
The second pivot given will be treated like one value facet for each value of the first pivot.
|
||||
Each of these second-level value facets will be limited to the documents in their first-level facet bucket.
|
||||
This continues for however many pivots are provided.
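
The layering described above amounts to recursive grouping. The Python sketch below models it on plain dictionaries; it is not how the component is implemented. `pivot` is a hypothetical function, each pivot's `expression` is represented by a Python callable, and a document `count` stands in for the computed expression results.

```python
from collections import defaultdict


def pivot(docs, pivots):
    """Group docs by the first pivot, then recurse into each bucket with the rest."""
    if not pivots:
        return []
    first, rest = pivots[0], pivots[1:]
    groups = defaultdict(list)
    for doc in docs:
        groups[first["expression"](doc)].append(doc)
    result = []
    for value, members in groups.items():
        # "count" is a stand-in for the grouping's expression results.
        bucket = {"pivot": first["name"], "value": value, "count": len(members)}
        children = pivot(members, rest)  # limited to this bucket's documents
        if children:
            bucket["children"] = children
        result.append(bucket)
    return result
```

Each second-level bucket only ever sees the documents of its parent, which is why the order of the pivot list changes the shape of the response.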
|
||||
|
||||
Sorting is enabled on a per-pivot basis. This means that if your top pivot has a sort with `limit:1`, then only that first value of the facet will be drilled down into. Sorting in each pivot is independent of the other pivots.
|
||||
|
||||
==== Parameters
|
||||
|
||||
`pivots`:: The list of pivots to calculate a drill-down facet for. The list is ordered by top-most to bottom-most level.
|
||||
`name`::: The name of the pivot.
|
||||
`expression`::: The expression to choose a facet bucket for each document.
|
||||
`sort`::: A <<Facet Sorting,sort>> for the results of the pivot.
|
||||
|
||||
[NOTE]
|
||||
.Optional Parameters
|
||||
The `sort` parameter within the pivot object is optional, and can be given in any, none or all of the provided pivots.
|
||||
|
||||
[source,json]
|
||||
.Example Pivot Facet Request
|
||||
----
|
||||
{
|
||||
"type" : "pivot",
|
||||
"pivots" : [
|
||||
{
|
||||
"name" : "country",
|
||||
"expression" : "country",
|
||||
"sort" : {}
|
||||
},
|
||||
{
|
||||
"name" : "state",
|
||||
"expression" : "fill_missing(state, fill_missing(province, territory))"
|
||||
},
|
||||
{
|
||||
"name" : "city",
|
||||
"expression" : "fill_missing(city, 'N/A')",
|
||||
"sort" : {}
|
||||
}
|
||||
]
|
||||
}
|
||||
----
|
||||
|
||||
|
||||
[source,json]
|
||||
.Example Pivot Facet Response
|
||||
----
|
||||
[
|
||||
{ "..." : "..." },
|
||||
{
|
||||
"pivot" : "country",
|
||||
"value" : "USA",
|
||||
"results" : {
|
||||
"max_sale" : 103.75,
|
||||
"med_sale" : 15.5
|
||||
},
|
||||
"children" : [
|
||||
{ "..." : "..." },
|
||||
{
|
||||
"pivot" : "state",
|
||||
"value" : "Texas",
|
||||
"results" : {
|
||||
"max_sale" : 99.2,
|
||||
"med_sale" : 20.35
|
||||
},
|
||||
"children" : [
|
||||
{ "..." : "..." },
|
||||
{
|
||||
"pivot" : "city",
|
||||
"value" : "Austin",
|
||||
"results" : {
|
||||
"max_sale" : 94.34,
|
||||
"med_sale" : 17.60
|
||||
}
|
||||
},
|
||||
{ "..." : "..." }
|
||||
]
|
||||
},
|
||||
{ "..." : "..." }
|
||||
]
|
||||
},
|
||||
{ "..." : "..." }
|
||||
]
|
||||
----
|
||||
|
||||
=== Analytics Range Facets
|
||||
|
||||
Range Facets are used to group documents by the value of a field into a given set of ranges.
|
||||
The inputs for analytics range facets are identical to those used for Solr range facets.
|
||||
Refer to the <<faceting.adoc#range-faceting,Range Facet documentation>> for additional questions regarding use.
|
||||
|
||||
==== Parameters
|
||||
|
||||
`field`:: Field to be faceted over
|
||||
`start`:: The bottom end of the range
|
||||
`end`:: The top end of the range
|
||||
`gap`:: A list of range gaps to generate facet buckets. If the buckets do not add up to fit the `start` to `end` range,
|
||||
then the last `gap` value will be repeated as many times as needed to fill any unused range.
|
||||
`hardend`:: Whether to cut off the last facet bucket range at the `end` value if it spills over. Defaults to `false`.
|
||||
`include`:: The boundaries to include in the facet buckets. Defaults to `lower`.
|
||||
* `lower` - All gap-based ranges include their lower bound.
|
||||
* `upper` - All gap-based ranges include their upper bound.
|
||||
* `edge` - The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified.
|
||||
* `outer` - The `before` and `after` ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.
|
||||
* `all` - Includes all options: `lower`, `upper`, `edge`, and `outer`
|
||||
`others`:: Additional ranges to include in the facet. Defaults to `none`.
|
||||
* `before` - All records with field values lower than the lower bound of the first range.
|
||||
* `after` - All records with field values greater than the upper bound of the last range.
|
||||
* `between` - All records with field values between the lower bound of the first range and the upper bound of the last range.
|
||||
* `none` - Include facet buckets for none of the above.
|
||||
* `all` - Include facet buckets for `before`, `after` and `between`.
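
The interaction between `start`, `end`, `gap` and `hardend` can be sketched in Python. This is a simplified numeric model of the behavior described above, not the component's code; `range_buckets` is a hypothetical function, and the `include` and `others` options are left out.

```python
def range_buckets(start, end, gaps, hardend=False):
    """Return (lower, upper) pairs covering start..end using the gap list."""
    buckets, lower, i = [], start, 0
    while lower < end:
        # Reuse the last gap once the list is exhausted.
        gap = gaps[min(i, len(gaps) - 1)]
        if gap <= 0:
            raise ValueError("gaps must be positive")
        upper = lower + gap
        if hardend and upper > end:
            # Cut the last bucket off at `end` instead of spilling over.
            upper = end
        buckets.append((lower, upper))
        lower, i = upper, i + 1
    return buckets
```

With `start` 0, `end` 100 and gaps 5, 10, 10, 25, the last gap is repeated to produce the buckets 0 to 5, 5 to 15, 15 to 25, 25 to 50, 50 to 75 and 75 to 100.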
|
||||
|
||||
[NOTE]
|
||||
.Optional Parameters
|
||||
The `hardend`, `include` and `others` parameters are all optional.
|
||||
|
||||
[source,json]
|
||||
.Example Range Facet Request
|
||||
----
|
||||
{
|
||||
"type" : "range",
|
||||
"field" : "price",
|
||||
"start" : "0",
|
||||
"end" : "100",
|
||||
"gap" : [
|
||||
"5",
|
||||
"10",
|
||||
"10",
|
||||
"25"
|
||||
],
|
||||
"hardend" : true,
|
||||
"include" : [
|
||||
"lower",
|
||||
"upper"
|
||||
],
|
||||
"others" : [
|
||||
"after",
|
||||
"between"
|
||||
]
|
||||
}
|
||||
----
|
||||
|
||||
[source,json]
|
||||
.Example Range Facet Response
|
||||
----
|
||||
[
|
||||
{
|
||||
"value" : "[0 TO 5]",
|
||||
"results" : {
|
||||
"max_sale" : 4.75,
|
||||
"med_sale" : 3.45
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "[5 TO 15]",
|
||||
"results" : {
|
||||
"max_sale" : 13.25,
|
||||
"med_sale" : 10.20
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "[15 TO 25]",
|
||||
"results" : {
|
||||
"max_sale" : 22.75,
|
||||
"med_sale" : 18.50
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "[25 TO 50]",
|
||||
"results" : {
|
||||
"max_sale" : 47.55,
|
||||
"med_sale" : 30.33
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "[50 TO 75]",
|
||||
"results" : {
|
||||
"max_sale" : 70.25,
|
||||
"med_sale" : 64.54
|
||||
}
|
||||
},
|
||||
{ "..." : "..." }
|
||||
]
|
||||
----
|
||||
|
||||
=== Query Facets
|
||||
|
||||
Query Facets are used to group documents by a given set of queries.
|
||||
|
||||
==== Parameters
|
||||
|
||||
`queries`:: The list of queries to facet by.
|
||||
|
||||
[source,json]
|
||||
.Example Query Facet Request
|
||||
----
|
||||
{
|
||||
"type" : "query",
|
||||
"queries" : {
|
||||
"high_quantity" : "quantity:[ 5 TO 14 ] AND price:[ 100 TO * ]",
|
||||
"low_quantity" : "quantity:[ 1 TO 4 ] AND price:[ 100 TO * ]"
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
[source,json]
|
||||
.Example Query Facet Response
|
||||
----
|
||||
[
|
||||
{
|
||||
"value" : "high_quantity",
|
||||
"results" : {
|
||||
"max_sale" : 4.75,
|
||||
"med_sale" : 3.45
|
||||
}
|
||||
},
|
||||
{
|
||||
"value" : "low_quantity",
|
||||
"results" : {
|
||||
"max_sale" : 13.25,
|
||||
"med_sale" : 10.20
|
||||
}
|
||||
}
|
||||
]
|
||||
----
|
|
@ -1,5 +1,5 @@
|
|||
= Searching
|
||||
:page-children: overview-of-searching-in-solr, velocity-search-ui, relevance, query-syntax-and-parsing, json-request-api, faceting, highlighting, spell-checking, query-re-ranking, transforming-result-documents, suggester, morelikethis, pagination-of-results, collapse-and-expand-results, result-grouping, result-clustering, spatial-search, the-terms-component, the-term-vector-component, the-stats-component, the-query-elevation-component, response-writers, near-real-time-searching, realtime-get, exporting-result-sets, streaming-expressions, parallel-sql-interface
|
||||
:page-children: overview-of-searching-in-solr, velocity-search-ui, relevance, query-syntax-and-parsing, json-request-api, faceting, highlighting, spell-checking, query-re-ranking, transforming-result-documents, suggester, morelikethis, pagination-of-results, collapse-and-expand-results, result-grouping, result-clustering, spatial-search, the-terms-component, the-term-vector-component, the-stats-component, the-query-elevation-component, response-writers, near-real-time-searching, realtime-get, exporting-result-sets, streaming-expressions, parallel-sql-interface, analytics
|
||||
// Licensed to the Apache Software Foundation (ASF) under one
|
||||
// or more contributor license agreements. See the NOTICE file
|
||||
// distributed with this work for additional information
|
||||
|
@ -55,3 +55,4 @@ This section describes how Solr works with search requests. It covers the follow
|
|||
* <<exporting-result-sets.adoc#exporting-result-sets,Exporting Result Sets>>: Functionality to export large result sets out of Solr.
|
||||
* <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>>: A stream processing language for Solr, with a suite of functions to perform many types of queries and parallel execution tasks.
|
||||
* <<parallel-sql-interface.adoc#parallel-sql-interface,Parallel SQL Interface>>: An interface for sending SQL statements to Solr, and using advanced parallel query processing and relational algebra for complex data analysis.
|
||||
* <<analytics.adoc#analytics,The Analytics Component>>: A framework to compute complex analytics over a result set.
|
||||
|
|