Solr Ref Guide: update 7.1 statistical function docs

Joel Bernstein 2017-10-17 21:25:13 -04:00
parent 2da777cdb8
commit 7a5733d107
2 changed files with 92 additions and 20 deletions

@ -455,10 +455,81 @@ Returns the following response:
== Setting Variables with let
The `let` function sets variables and runs a streaming expression that references the variables. The `let` function can be used to
write small statistical programs.
The `let` function sets variables and returns the last variable. The output of any statistical function can be set to a variable.
A variable can be set to the output of any streaming expression. Here is a very simple example:
Below is a simple example setting three variables `a`, `b` and `correlation`.
[source,text]
----
let(a=array(1,2,3),
b=array(10, 20, 30),
correlation=corr(a, b))
----
Here is the output:
[source,json]
----
{
"result-set": {
"docs": [
{
"correlation": 1
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
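As a cross-check on the `correlation` value above, the following minimal Python sketch computes a correlation for the same two arrays outside of Solr. It assumes that `corr` computes a Pearson product-moment correlation; the helper name `pearson` is illustrative only and is not part of Solr.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Same values as the let() example: b is an exact multiple of a.
a = [1, 2, 3]
b = [10, 20, 30]
correlation = pearson(a, b)
print(correlation)
```

Because `b` is an exact multiple of `a`, the two arrays are perfectly linearly related, which is why the expression reports a correlation of 1.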
All variables can be output by setting the `echo` variable to `true`.
[source,text]
----
let(echo=true,
a=array(1,2,3),
b=array(10, 20, 30),
correlation=corr(a, b))
----
Here is the output:
[source,json]
----
{
"result-set": {
"docs": [
{
"a": [
1,
2,
3
],
"b": [
10,
20,
30
],
"correlation": 1
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
----
Streaming expressions can also be used inside of a `let` expression in the following ways:
* A variable can be set to the output of any streaming expression.
* A streaming expression can be executed after all variables have been set. The variables can then be referenced by the streaming expression that is executed. The `let` expression will stream the tuples that are emitted by the final streaming expression.
Here is a very simple example:
[source,text]
----

@ -660,8 +660,8 @@ numeric array
== empiricalDistribution
The `empiricalDistribution` function returns a continuous probability distribution function based
on an actual data set (https://en.wikipedia.org/wiki/Empirical_distribution_function). This function is part of the probability distribution framework and is designed to
work with the `sample`, `kolmogorovSmirnov` and `cumulativeProbability` functions.
on an actual data set (https://en.wikipedia.org/wiki/Empirical_distribution_function). This function is part of the probability distribution framework and is
designed to work with the `sample`, `kolmogorovSmirnov` and `cumulativeProbability` functions.
This function is designed to work with continuous data. To build a distribution from
a discrete data set use the `enumeratedDistribution`.
@ -1053,7 +1053,7 @@ The supported distribution functions are:
=== kolmogorovSmirnov Returns
result tuple : A tuple containing the p-value and d-statistic for test result.
result tuple : A tuple containing the p-value and d-statistic for the test result.
=== kolmogorovSmirnov Syntax
@ -1163,7 +1163,7 @@ if(gt(fieldA,fieldB),mod(fieldA,fieldB),mod(fieldB,fieldA)) // if fieldA > field
== monteCarlo
The `monteCarlo` function performs a Monte Carlo simulation (https://en.wikipedia.org/wiki/Monte_Carlo_method)
based on its parameters. The monteCarlo function runs another function a set number of times and returns the results.
based on its parameters. The monteCarlo function runs another function a specified number of times and returns the results.
The function being run typically has one or more variables that are drawn from probability
distributions on each run. The `sample` function is used in the function to draw the samples.
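The general pattern described here (run a function a fixed number of times, drawing fresh random inputs on each run) can be sketched in Python as follows. This is an illustration of the idea only, not Solr's implementation; the names `monte_carlo` and `one_run`, the seed, and the choice of two normal distributions are assumptions made for the example.

```python
import random


def monte_carlo(fn, runs):
    """Call fn() `runs` times and collect every result."""
    return [fn() for _ in range(runs)]


# Seeded generator so the sketch is repeatable.
rng = random.Random(42)


def one_run():
    # Each run draws its own samples, one per random variable.
    a = rng.gauss(50, 5)   # mean 50, standard deviation 5
    b = rng.gauss(10, 2)   # mean 10, standard deviation 2
    return a + b


results = monte_carlo(one_run, 1000)
mean_result = sum(results) / len(results)
print(len(results), mean_result)
```

The list of results can then be summarized (mean, percentiles, histogram) to describe the distribution of the simulated outcome.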
@ -1330,7 +1330,7 @@ or(fieldA,fieldB,fieldC,and(fieldD,fieldE),fieldF)
== poissonDistribution
The `poissonDistribution` function returns a poisson probability distribution (https://en.wikipedia.org/wiki/Poisson_distribution)
based on its parameters. This function is part of the probability distribution framework and is designed to
based on its parameter. This function is part of the probability distribution framework and is designed to
work with the `sample`, `probability` and `cumulativeProbability` functions.
=== poissonDistribution Parameters
@ -1352,7 +1352,7 @@ The `polyFit` function performs polynomial curve fitting (https://en.wikipedia.o
=== polyFit Parameters
* `numeric array` : (Optional) x values. If omitted an sequence will be created for the x values.
* `numeric array` : (Optional) x values. If omitted a sequence will be created for the x values.
* `numeric array` : y values
* `integer` : (Optional) polynomial degree. Defaults to 3.
@ -1363,7 +1363,8 @@ numeric array : curve that was fit to the data points.
=== polyFit Syntax
[source,text]
polyFit(yValues) // This creates the xValues automatically and fits a curve through the data points using a the default 3 degree polynomial.
polyFit(yValues) // This creates the xValues automatically and fits a curve through the data points using the default 3 degree polynomial.
polyFit(yValues, 5) // This creates the xValues automatically and fits a curve through the data points using a 5 degree polynomial.
polyFit(xValues, yValues, 5) // This will fit a curve through the data points using a 5 degree polynomial.
== polyfitDerivative
@ -1372,7 +1373,7 @@ The `polyfitDerivative` function returns the derivative of the curve created by
=== polyfitDerivative Parameters
* `numeric array` : (Optional) x values. If omitted an sequence will be created for the x values.
* `numeric array` : (Optional) x values. If omitted a sequence will be created for the x values.
* `numeric array` : y values
* `integer` : (Optional) polynomial degree. Defaults to 3.
@ -1384,6 +1385,7 @@ numeric array : The curve for the derivative created by the polynomial curve fit
[source,text]
polyfitDerivative(yValues) // This creates the xValues automatically and returns the polyfit derivative
polyfitDerivative(yValues, 5) // This creates the xValues automatically and fits a curve through the data points using a 5 degree polynomial and returns the polyfit derivative.
polyfitDerivative(xValues, yValues, 5) // This will fit a curve through the data points using a 5 degree polynomial and returns the polyfit derivative.
== pow
@ -1443,13 +1445,12 @@ numeric array
== probability
The `probability` function returns the probability of encountering a random variable within a discrete
probability distribution.
The `probability` function returns the probability of a random variable within a discrete probability distribution.
=== probability Parameters
* `discrete probability distribution` : poissonDistribution | binomialDistribution | uniformDistribution | enumeratedDistribution
* `integer` : Value to compute the probability for.
* `integer` : Value of the random variable to compute the probability for.
=== probability Returns
@ -1458,7 +1459,7 @@ double : the probability
=== probability Syntax
[source,text]
probability(poissonDistribution(10), 7) // Returns the probability of encountering a random sample if 7 in a poisson distribution with a mean of 10.
probability(poissonDistribution(10), 7) // Returns the probability of a random sample of 7 in a poisson distribution with a mean of 10.
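For reference, the quantity in this example can be reproduced from the Poisson probability mass function, P(k) = e^(-mean) * mean^k / k!. The Python sketch below is a minimal illustration under that textbook definition; `poisson_pmf` is a hypothetical helper name and not a Solr function.

```python
import math


def poisson_pmf(mean, k):
    """Probability of observing exactly k events when the expected count is `mean`."""
    if k < 0:
        return 0.0
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Mirrors probability(poissonDistribution(10), 7).
p = poisson_pmf(10, 7)
print(round(p, 4))
```

With a mean of 10, a value of 7 has a probability of roughly 0.09.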
== rank
@ -1497,7 +1498,7 @@ eq(raw(fieldA), fieldA) // true if the value of fieldA equals the string "fieldA
== regress
The `regress` function performs a simple regression on two numeric arrays.
The `regress` function performs a simple regression of two numeric arrays.
The result of this expression is also used by the `predict` and `residuals` functions.
@ -1516,8 +1517,8 @@ regress(numericArray1, numericArray2)
The `residuals` function takes three parameters: a simple regression model, an array of predictor values
and an array of actual values. The residuals function applies the simple regression model to the
array of predictor values and computes a predictions array. The actual values array is then
subtracted from the predictions array to compute the residuals array.
array of predictor values and computes a predictions array. The predicted values array is then
subtracted from the actual value array to compute the residuals array.
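The calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes an ordinary least-squares line as the simple regression model; the helpers `fit_line` and `residuals` and the sample data are hypothetical and do not reflect Solr's internal code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(model, predictors, actuals):
    """Residual = actual value minus the value predicted by the model."""
    slope, intercept = model
    predictions = [intercept + slope * x for x in predictors]
    return [actual - predicted for actual, predicted in zip(actuals, predictions)]


# Fit y = 2x + 1 on training data, then compare against observed values.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
res = residuals(model, [1, 2, 3, 4], [3, 5, 7, 10])
print(res)
```

A positive residual means the actual value was higher than the model predicted; a negative residual means it was lower.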
=== residuals Parameters
@ -1580,8 +1581,8 @@ Either a single numeric random sample, or a numeric array depending on the sampl
=== sample Syntax
[source,text]
sample(normalDistribution(50, 5)) // Return a single random sample from a normalDistribution with mean of 50 and standard deviation of 5.
sample(poissonDistribution(5), 1000) // Return 1000 random samples from poissonDistribution with a mean of 5.
sample(poissonDistribution(5)) // Returns a single random sample from a poissonDistribution with mean of 5.
sample(poissonDistribution(5), 1000) // Returns 1000 random samples from poissonDistribution with a mean of 5.
== scale