SOLR-13105 - Visual Guide to Math Expressions (#2227)
* SOLR-13105: The Visual Guide to Streaming Expressions and Math Expressions
|
@ -23,103 +23,82 @@ This section of the math expressions user guide covers computational geometry fu
|
|||
|
||||
A convex hull is the smallest convex set of points that encloses a data set. Math expressions has support for computing
|
||||
the convex hull of a 2D data set. Once a convex hull has been calculated, a set of math expression functions
|
||||
can be applied to geometrically describe the convex hull.
|
||||
can be applied to geometrically describe and visualize the convex hull.
|
||||
|
||||
The `convexHull` function finds the convex hull of an observation matrix of 2D vectors.
|
||||
Each row of the matrix is a 2D observation.
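To make the definition concrete, the sketch below computes a convex hull, its area, and its boundary length in plain Python. It is not Solr code and does not claim to mirror how `convexHull` is implemented: it assumes Andrew's monotone chain algorithm and the shoelace formula, and the names `convex_hull`, `hull_area`, and `hull_boundary_size` are hypothetical helpers chosen for illustration.

[source,python]
----
import math


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_area(vertices):
    """Area of the polygon described by the vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def hull_boundary_size(vertices):
    """Total length of the edges of the polygon."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


# Each tuple is one 2D observation; two of them lie inside the hull.
observations = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(observations)
----

In this toy data set the four corner points form the hull and the two interior points are discarded, which is the same idea as the border drawn around a scatter plot.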
|
||||
=== Visualization
|
||||
|
||||
In the example below a convex hull is calculated for a randomly generated set of 100 2D observations.
|
||||
The `convexHull` function can be used to visualize a border around a
|
||||
set of 2D points. Border visualizations can be useful for understanding where data points are
|
||||
in relation to the border.
|
||||
|
||||
Then the following functions are called on the convex hull:
|
||||
In the examples below the `convexHull` function is used
|
||||
to visualize a border for a set of latitude and longitude points of rat sightings in the NYC311
|
||||
complaints database. An investigation of the border around the rat sightings can be done
|
||||
to better understand how rats may be entering or exiting the specific region.
|
||||
|
||||
-`getBaryCenter`: Returns the 2D point that is the bary center of the convex hull.
|
||||
==== Scatter Plot
|
||||
|
||||
-`getArea`: Returns the area of the convex hull.
|
||||
Before visualizing the convex hull it's often useful to visualize the 2D points as a scatter plot.
|
||||
|
||||
-`getBoundarySize`: Returns the boundary size of the convex hull.
|
||||
In this example the `random` function draws a sample of records from the NYC311 (complaints database) collection where
|
||||
the complaint description matches "rat sighting" and the zip code is 11238. The latitude and longitude fields
|
||||
are then vectorized and plotted as a scatter plot with longitude on the x-axis and latitude on the
|
||||
y-axis.
|
||||
|
||||
-`getVertices`: Returns a set of 2D points that are the vertices of the convex hull.
|
||||
image::images/math-expressions/convex0.png[]
|
||||
|
||||
Notice from the scatter plot that many of the points appear to lie near the border of the plot.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="baryCenter, area, boundarySize, vertices",
|
||||
x=sample(normalDistribution(0, 20), 100),
|
||||
y=sample(normalDistribution(0, 10), 100),
|
||||
observations=transpose(matrix(x,y)),
|
||||
chull=convexHull(observations),
|
||||
baryCenter=getBaryCenter(chull),
|
||||
area=getArea(chull),
|
||||
boundarySize=getBoundarySize(chull),
|
||||
vertices=getVertices(chull))
|
||||
----
|
||||
==== Convex Hull Plot
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
The `convexHull` function can be used to visualize the border. The example uses the same points
|
||||
drawn from the NYC311 database. But instead of plotting the points directly the latitude and
|
||||
longitude points are added as rows to a matrix. The matrix is then transposed with the `transpose`
|
||||
function so that each row of the matrix contains a single latitude and longitude point.
|
||||
|
||||
The `convexHull` function is then used to calculate the convex hull for the matrix of points.
|
||||
The convex hull is set to a variable called `hull`.
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"baryCenter": [
|
||||
-3.0969292101230343,
|
||||
1.2160948182691975
|
||||
],
|
||||
"area": 3477.480599967595,
|
||||
"boundarySize": 267.52419019533664,
|
||||
"vertices": [
|
||||
[
|
||||
-66.17632818958485,
|
||||
-8.394931552315256
|
||||
],
|
||||
[
|
||||
-47.556667594765216,
|
||||
-16.940434013651263
|
||||
],
|
||||
[
|
||||
-33.13582183446102,
|
||||
-17.30914425443977
|
||||
],
|
||||
[
|
||||
-9.97459859015698,
|
||||
-17.795012801599654
|
||||
],
|
||||
[
|
||||
27.7705917246824,
|
||||
-14.487224686587767
|
||||
],
|
||||
[
|
||||
54.689432954170236,
|
||||
-1.3333371984299605
|
||||
],
|
||||
[
|
||||
35.97568654458672,
|
||||
23.054169251772556
|
||||
],
|
||||
[
|
||||
-15.539456215337585,
|
||||
19.811330468093704
|
||||
],
|
||||
[
|
||||
-17.05125031092752,
|
||||
19.53581741341663
|
||||
],
|
||||
[
|
||||
-35.92010024412891,
|
||||
15.126430698395572
|
||||
]
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 3
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
Once the convex hull has been created the `getVertices` function can be used to
|
||||
retrieve the matrix of points in the scatter plot that comprise the convex border around the scatter plot.
|
||||
The `colAt` function can then be used to retrieve the latitude and longitude vectors from the matrix
|
||||
so they can be visualized by the `zplot` function. In the example below the convex hull points are
|
||||
visualized as a scatter plot.
|
||||
|
||||
image::images/math-expressions/hullplot.png[]
|
||||
|
||||
Notice that the 15 points in the scatter plot describe the latitude and longitude points of the
|
||||
convex hull.
|
||||
|
||||
==== Projecting and Clustering
|
||||
|
||||
Once a convex hull has been calculated, the `projectToBorder` function can be used to project
|
||||
points to the nearest point on the border. In the example below the `projectToBorder` function
|
||||
is used to project the original scatter plot points to the nearest border.
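The idea of projecting a point to the border can be sketched in plain Python. This is an illustration only, under the assumption that "nearest point on the border" means the closest point on any edge of the hull polygon; the helper names `closest_point_on_segment` and `project_to_border` are hypothetical and are not the Solr implementation.

[source,python]
----
def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return (float(ax), float(ay))
    # Position of the perpendicular foot along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def project_to_border(vertices, p):
    """Project p onto the closest edge of the polygon given by vertices."""
    best_point = None
    best_dist_sq = float("inf")
    n = len(vertices)
    for i in range(n):
        candidate = closest_point_on_segment(p, vertices[i], vertices[(i + 1) % n])
        dist_sq = (candidate[0] - p[0]) ** 2 + (candidate[1] - p[1]) ** 2
        if dist_sq < best_dist_sq:
            best_point, best_dist_sq = candidate, dist_sq
    return best_point


# A rectangular hull and two points: one inside, one outside.
hull_vertices = [(0, 0), (4, 0), (4, 3), (0, 3)]
inside_projection = project_to_border(hull_vertices, (2, 1))
outside_projection = project_to_border(hull_vertices, (5, 5))
----

The interior point lands on the bottom edge, which is the closest edge, and the exterior point lands on the nearest corner.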
|
||||
|
||||
The `projectToBorder` function returns a matrix of lat/lon points for the border projections. In
|
||||
the example the matrix of border points is then clustered into 7 clusters using kmeans clustering.
|
||||
The `zplot` function is then used to plot the clustered border points.
|
||||
|
||||
image::images/math-expressions/convex1.png[]
|
||||
|
||||
Notice in the visualization it's easy to see which spots along the border have the highest
|
||||
density of points. In the case of the rat sightings this information is useful in understanding
|
||||
which border points are closest for the rats to enter or exit from.
|
||||
|
||||
==== Plotting the Centroids
|
||||
|
||||
Once the border points have been clustered it's very easy to extract the centroids of the clusters
|
||||
and plot them on a map. The example below extracts the centroids from the clusters using the
|
||||
`getCentroids` function. `getCentroids` returns the matrix of lat/lon points which represent
|
||||
the centroids of border clusters. The `colAt` function can then be used to extract the lat/lon
|
||||
vectors so they can be plotted on a map using `zplot`.
|
||||
|
||||
image::images/math-expressions/convex2.png[]
|
||||
|
||||
The map above shows the centroids of the border clusters. The centroids from the highest
|
||||
density clusters can now be zoomed and investigated geo-spatially to determine what might be
|
||||
the best places to begin an investigation of the border.
|
||||
|
||||
== Enclosing Disk
|
||||
|
||||
|
@ -131,11 +110,11 @@ In the example below an enclosing disk is calculated for a randomly generated se
|
|||
|
||||
Then the following functions are called on the enclosing disk:
|
||||
|
||||
-`getCenter`: Returns the 2D point that is the center of the disk.
|
||||
* `getCenter`: Returns the 2D point that is the center of the disk.
|
||||
|
||||
-`getRadius`: Returns the radius of the disk.
|
||||
* `getRadius`: Returns the radius of the disk.
|
||||
|
||||
-`getSupportPoints`: Returns the support points of the disk.
|
||||
* `getSupportPoints`: Returns the support points of the disk.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
|
|
|
@ -16,7 +16,7 @@
|
|||
// specific language governing permissions and limitations
|
||||
// under the License.
|
||||
|
||||
These functions support constructing a curve.
|
||||
These functions support constructing a curve through bivariate non-linear data.
|
||||
|
||||
== Polynomial Curve Fitting
|
||||
|
||||
|
@ -25,201 +25,86 @@ the non-linear relationship between two random variables.
|
|||
|
||||
The `polyfit` function is passed x- and y-axes and fits a smooth curve to the data.
|
||||
If only a single array is provided it is treated as the y-axis and a sequence is generated
|
||||
for the x-axis.
|
||||
|
||||
The `polyfit` function also has a parameter that specifies the degree of the polynomial. The higher
|
||||
for the x-axis. A third parameter can be added that specifies the degree of the polynomial. If the degree is
|
||||
not provided a 3 degree polynomial is used by default. The higher
|
||||
the degree the more curves that can be modeled.
|
||||
|
||||
The example below uses the `polyfit` function to fit a curve to an array using
|
||||
a 3 degree polynomial. The fitted curve is then subtracted from the original curve. The output
|
||||
shows the error between the fitted curve and the original curve, known as the residuals.
|
||||
The output also includes the sum-of-squares of the residuals which provides a measure
|
||||
of how large the error is.
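The residual calculation described above can be sketched outside Solr. The Python below is a minimal illustration that assumes an ordinary least-squares fit solved through the normal equations; the helper names `polyfit` and `evaluate` are hypothetical stand-ins, and nothing here is taken from the Solr source.

[source,python]
----
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


y = [0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0]
x = list(range(len(y)))  # generated x-axis, used when only y is supplied

coeffs = polyfit(x, y, 3)
curve = [evaluate(coeffs, xi) for xi in x]
residuals = [yi - ci for yi, ci in zip(y, curve)]  # element-by-element subtraction
sum_sq_error = sum(r * r for r in residuals)
----

The normal-equation approach is chosen here only because it keeps the sketch free of dependencies; it becomes numerically fragile for high degrees, so a library routine is the better choice for real work.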
|
||||
The `polyfit` function can be visualized in a similar manner to linear regression with
|
||||
Zeppelin-Solr.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="residuals, sumSqError",
|
||||
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
|
||||
curve=polyfit(y, 3),
|
||||
residuals=ebeSubtract(y, curve),
|
||||
sumSqError=sumSq(residuals))
|
||||
----
|
||||
The example below uses the `polyfit` function to fit a non-linear curve to a scatter
|
||||
plot of a random sample. The blue points are the scatter plot of the original observations and the red points
|
||||
are the predicted curve.
|
||||
|
||||
When this expression is sent to the `/stream` handler it
|
||||
responds with:
|
||||
image::images/math-expressions/polyfit.png[]
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"residuals": [
|
||||
0.5886274509803899,
|
||||
-0.0746078431372561,
|
||||
-0.49492135315664765,
|
||||
-0.6689571213100631,
|
||||
-0.5933591898297781,
|
||||
0.4352283990519288,
|
||||
0.32016160310277897,
|
||||
1.1647963800904968,
|
||||
0.272488687782805,
|
||||
-0.3534055160525744,
|
||||
0.2904697263520779,
|
||||
-0.7925296272355089,
|
||||
-0.5990476190476182,
|
||||
-0.12572829131652274,
|
||||
0.6307843137254909
|
||||
],
|
||||
"sumSqError": 4.7294282482223595
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
In the example above a random sample containing two fields, `filesize_d`
|
||||
and `response_d`, is drawn from the `logs` collection.
|
||||
The two fields are vectorized and set to the variables `x` and `y`.
|
||||
|
||||
In the next example the curve is fit using a 5 degree polynomial. Notice that the curve
|
||||
is fit closer, shown by the smaller residuals and lower value for the sum-of-squares of the
|
||||
residuals. This is because the higher polynomial produced a closer fit.
|
||||
Then the `polyfit` function is used to fit a non-linear model to the data using a 5 degree
|
||||
polynomial. The `polyfit` function returns a model that is then directly plotted
|
||||
by `zplot` along with the original observations.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="residuals, sumSqError",
|
||||
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
|
||||
curve=polyfit(y, 5),
|
||||
residuals=ebeSubtract(y, curve),
|
||||
sumSqError=sumSq(residuals))
|
||||
----
|
||||
The fitted model can also be used
|
||||
by the `predict` function in the same manner as linear regression. The example below
|
||||
uses the fitted model to predict a response time for a file size of 42000.
|
||||
|
||||
When this expression is sent to the `/stream` handler it
|
||||
responds with:
|
||||
image::images/math-expressions/polyfit-predict.png[]
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"residuals": [
|
||||
-0.12337461300309674,
|
||||
0.22708978328173413,
|
||||
0.12266015718028167,
|
||||
-0.16502738747320755,
|
||||
-0.41142804563857105,
|
||||
0.2603044014808713,
|
||||
-0.12128970101106162,
|
||||
0.6234168308471704,
|
||||
-0.1754692675745293,
|
||||
-0.5379689969473249,
|
||||
0.4651616185671843,
|
||||
-0.288175756132409,
|
||||
0.027970945463215102,
|
||||
0.18699690402476687,
|
||||
-0.09086687306501587
|
||||
],
|
||||
"sumSqError": 1.413089480179252
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
If an array of predictor values is provided an array of predictions will be returned.
|
||||
|
||||
The `polyfit` model performs both *interpolation* and *extrapolation*,
|
||||
which means that it can predict results both within the bounds of the data set
|
||||
and beyond the bounds.
|
||||
|
||||
=== Residuals
|
||||
|
||||
The residuals can be calculated and visualized in the same manner as linear
|
||||
regression as well. In the example below the `ebeSubtract` function is used
|
||||
to subtract the fitted model from the observed values, to
|
||||
calculate a vector of residuals. The residuals are then plotted in a *residual plot*
|
||||
with the predictions along the x-axis and the model error on the y-axis.
|
||||
|
||||
image::images/math-expressions/polyfit-resid.png[]
|
||||
|
||||
|
||||
=== Prediction, Derivatives and Integrals
|
||||
== Gaussian Curve Fitting
|
||||
|
||||
The `polyfit` function returns a function that can be used with the `predict`
|
||||
function.
|
||||
The `gaussfit` function fits a smooth curve through a Gaussian peak. The `gaussfit`
|
||||
function takes an x- and y-axis and fits a smooth gaussian curve to the data. If
|
||||
only one vector of numbers is passed, `gaussfit` will treat it as the y-axis
|
||||
and will generate a sequence for the x-axis.
|
||||
|
||||
In the example below the x-axis is included for clarity.
|
||||
The `polyfit` function returns a function for the fitted curve.
|
||||
The `predict` function is then used to predict a value along the curve, in this
|
||||
case the prediction is made for the *`x`* value of 5.
|
||||
One of the interesting use cases for `gaussfit` is to visualize how well a regression
|
||||
model's residuals fit a normal distribution.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
|
||||
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
|
||||
curve=polyfit(x, y, 5),
|
||||
p=predict(curve, 5))
|
||||
----
|
||||
One of the characteristics of a well-fit regression model is that its residuals will ideally fit a normal distribution.
|
||||
We can
|
||||
test this by building a histogram of the residuals and then fitting a gaussian curve to the curve of the histogram.
|
||||
|
||||
When this expression is sent to the `/stream` handler it
|
||||
responds with:
|
||||
In the example below the residuals from a `polyfit` regression are modeled with the
|
||||
`hist` function to return a histogram with 32 bins. The `hist` function returns
|
||||
a list of tuples with statistics about each bin. In the example the `col` function is
|
||||
used to return a vector with the `N` column for each bin, which is the count of
|
||||
observations in the
|
||||
bin. If the residuals are normally distributed we would expect the bin counts
|
||||
to roughly follow a gaussian curve.
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"p": 5.439695598519129
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
The bin count vector is then passed to `gaussfit` as the y-axis. `gaussfit` generates
|
||||
a sequence for the x-axis and then fits the gaussian curve to the data.
|
||||
|
||||
The `derivative` and `integrate` functions can be used to compute the derivative
|
||||
and integrals for the fitted
|
||||
curve. The example below demonstrates how to compute a derivative
|
||||
for the fitted curve.
|
||||
`zplot` is then used to plot the original bin counts and the fitted curve. In the
|
||||
example below, the blue line is the bin counts, and the smooth yellow line is the
|
||||
fitted curve. We can see that the binned residuals fit fairly well to a normal
|
||||
distribution.
|
||||
|
||||
image::images/math-expressions/gaussfit.png[]
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
|
||||
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
|
||||
curve=polyfit(x, y, 5),
|
||||
d=derivative(curve))
|
||||
----
|
||||
The second plot shows the two curves overlaid with an area chart:
|
||||
|
||||
When this expression is sent to the `/stream` handler it
|
||||
responds with:
|
||||
image::images/math-expressions/gaussfit2.png[]
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"d": [
|
||||
0.3198918573686361,
|
||||
0.9261492094077225,
|
||||
1.2374272373653175,
|
||||
1.30051359631081,
|
||||
1.1628032287629813,
|
||||
0.8722983646900058,
|
||||
0.47760852150945,
|
||||
0.02795050408827482,
|
||||
-0.42685159525716865,
|
||||
-0.8363663967611356,
|
||||
-1.1495552332084857,
|
||||
-1.3147721499346892,
|
||||
-1.2797639048258267,
|
||||
-0.9916699683185771,
|
||||
-0.3970225234002308
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
== Harmonic Curve Fitting
|
||||
|
||||
|
@ -232,169 +117,19 @@ The example below shows `harmfit` fitting a single oscillation of a sine wave. T
|
|||
returns the smoothed values at each control point. The return value is also a model which can be used by
|
||||
the `predict`, `derivative` and `integrate` functions.
|
||||
|
||||
There are also three helper functions that can be used to retrieve the estimated parameters of the fitted model:
|
||||
|
||||
* `getAmplitude`: Returns the amplitude of the sine wave.
|
||||
* `getAngularFrequency`: Returns the angular frequency of the sine wave.
|
||||
* `getPhase`: Returns the phase of the sine wave.
|
||||
|
||||
NOTE: The `harmfit` function works best when run on a single oscillation rather than a long sequence of
|
||||
oscillations. This is particularly true if the sine wave has noise. After the curve has been fit it can be
|
||||
extrapolated to any point in time in the past or future.
|
||||
|
||||
In the example below the `harmfit` function fits control points, provided as x and y axes, and then the
|
||||
angular frequency, phase and amplitude are retrieved from the fitted model.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="freq, phase, amp",
|
||||
x=array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19),
|
||||
y=array(-0.7441113653915925,-0.8997532112139415, -0.9853140681578838, -0.9941296760805463,
|
||||
-0.9255133950087844, -0.7848096869247675, -0.5829778403072583, -0.33573836075915076,
|
||||
-0.06234851460699166, 0.215897602691855, 0.47732764497752245, 0.701579055431586,
|
||||
0.8711850882773975, 0.9729352782968976, 0.9989043923858761, 0.9470697190130273,
|
||||
0.8214686154479715, 0.631884041542757, 0.39308257356494, 0.12366424851680227),
|
||||
model=harmfit(x, y),
|
||||
freq=getAngularFrequency(model),
|
||||
phase=getPhase(model),
|
||||
amp=getAmplitude(model))
|
||||
----
|
||||
In the example below the original control points are shown in blue and the fitted curve is shown in yellow.
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"freq": 0.28,
|
||||
"phase": 2.4100000000000006,
|
||||
"amp": 0.9999999999999999
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
=== Interpolation and Extrapolation
|
||||
|
||||
The `harmfit` function returns a fitted model of the sine wave that can be used by the `predict` function to
|
||||
interpolate or extrapolate the sine wave.
|
||||
|
||||
The example below uses the fitted model to extrapolate the sine wave beyond the control points
|
||||
to the x-axis points 20, 21, 22, 23.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(x=array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19),
|
||||
y=array(-0.7441113653915925,-0.8997532112139415, -0.9853140681578838, -0.9941296760805463,
|
||||
-0.9255133950087844, -0.7848096869247675, -0.5829778403072583, -0.33573836075915076,
|
||||
-0.06234851460699166, 0.215897602691855, 0.47732764497752245, 0.701579055431586,
|
||||
0.8711850882773975, 0.9729352782968976, 0.9989043923858761, 0.9470697190130273,
|
||||
0.8214686154479715, 0.631884041542757, 0.39308257356494, 0.12366424851680227),
|
||||
model=harmfit(x, y),
|
||||
extrapolation=predict(model, array(20, 21, 22, 23)))
|
||||
----
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"extrapolation": [
|
||||
-0.1553861764415666,
|
||||
-0.42233370833176975,
|
||||
-0.656386037906838,
|
||||
-0.8393130343914845
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
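The extrapolated values can be cross-checked by hand from the fitted parameters. The Python sketch below assumes the model has the form `amplitude * cos(angularFrequency * x + phase)`, which appears consistent with the control points used in this section; the function name `harmonic` is a hypothetical helper, and the parameters are the rounded values reported by `getAmplitude`, `getAngularFrequency` and `getPhase`.

[source,python]
----
import math


def harmonic(x, amplitude, angular_frequency, phase):
    """Evaluate a harmonic oscillator at x (assumed cosine form)."""
    return amplitude * math.cos(angular_frequency * x + phase)


# Rounded parameters taken from the harmfit example output.
amp, freq, phase = 1.0, 0.28, 2.41

# Extrapolate beyond the control points, which end at x = 19.
extrapolation = [harmonic(x, amp, freq, phase) for x in (20, 21, 22, 23)]

# Interpolation works the same way: evaluate at any x inside the range.
first_control_point = harmonic(0, amp, freq, phase)
----

Because the model is a closed-form function of `x`, interpolation and extrapolation are the same operation; only the choice of `x` differs.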
|
||||
|
||||
== Gaussian Curve Fitting
|
||||
|
||||
The `gaussfit` function fits a smooth curve through a Gaussian peak.
|
||||
This is shown in the example below.
|
||||
image::images/math-expressions/harmfit.png[]
|
||||
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(x=array(0,1,2,3,4,5,6,7,8,9, 10),
|
||||
y=array(4,55,1200,3028,12000,18422,13328,6426,1696,239,20),
|
||||
f=gaussfit(x, y))
|
||||
----
|
||||
The output of `harmfit` is a model that can be used by the `predict` function to interpolate and extrapolate
|
||||
the sine wave. In the example below the `natural` function creates an x-axis from 0 to 127
|
||||
used to predict results for the model. This extrapolates the sine wave out to 128 points, when
|
||||
the original model curve had only 19 control points.
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"f": [
|
||||
2.81764431935644,
|
||||
61.157417979413424,
|
||||
684.2328985468831,
|
||||
3945.9411154167447,
|
||||
11729.758936952656,
|
||||
17972.951897338007,
|
||||
14195.201949425435,
|
||||
5779.03836032222,
|
||||
1212.7224502169634,
|
||||
131.17742331530349,
|
||||
7.3138931735866946
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
Like the `polyfit` function, the `gaussfit` function returns a function that can
|
||||
be used directly by the `predict`, `derivative` and `integrate` functions.
|
||||
|
||||
The example below demonstrates how to compute an integral for a fitted Gaussian curve.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(x=array(0,1,2,3,4,5,6,7,8,9, 10),
|
||||
y=array(4,55,1200,3028,12000,18422,13328,6426,1696,239,20),
|
||||
f=gaussfit(x, y),
|
||||
i=integrate(f, 0, 5))
|
||||
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it
|
||||
responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"i": 25261.666789766092
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 3
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
image::images/math-expressions/harmfit2.png[]
|
||||
|
|
|
@ -19,438 +19,77 @@
|
|||
This section of the user guide explores functions that are commonly used in the field of
|
||||
Digital Signal Processing (DSP).
|
||||
|
||||
== Dot Product
|
||||
|
||||
The `dotProduct` function is used to calculate the dot product of two numeric arrays.
|
||||
The dot product is a fundamental calculation for the DSP functions discussed in this section. Before diving into
|
||||
the more advanced DSP functions it's useful to develop a deeper intuition of the dot product.
|
||||
|
||||
The dot product operation is performed in two steps:
|
||||
|
||||
. Element-by-element multiplication of two vectors which produces a vector of products.
|
||||
|
||||
. Sum the vector of products to produce a scalar result.
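The two steps can be written out directly. The Python below is only an illustration of the arithmetic; `ebe_multiply` and `dot_product` are hypothetical helper names, not Solr functions.

[source,python]
----
def ebe_multiply(a, b):
    """Step 1: element-by-element multiplication of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def dot_product(a, b):
    """Step 2: sum the vector of products to produce a scalar."""
    return sum(ebe_multiply(a, b))


a = [1, 2, 3]
b = [4, 5, 6]
products = ebe_multiply(a, b)  # [4, 10, 18]
result = dot_product(a, b)     # 4 + 10 + 18 = 32
----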
|
||||
|
||||
This simple bit of math has a number of important applications.
|
||||
|
||||
=== Representing Linear Combinations
|
||||
|
||||
The `dotProduct` performs the math of a _linear combination_. A linear combination has the following form:
|
||||
|
||||
[source,text]
|
||||
----
|
||||
(a1*v1)+(a2*v2)...
|
||||
----
|
||||
|
||||
In the above example `a1` and `a2` are random variables that change. `v1` and `v2` are constant values.
|
||||
|
||||
When computing the dot product the elements of two vectors are multiplied together and the results are added.
|
||||
If the first vector contains random variables and the second vector contains constant values
|
||||
then the dot product is performing a linear combination.
|
||||
|
||||
This scenario comes up again and again in machine learning. For example both linear and logistic regression
|
||||
solve for a vector of constant weights. In order to perform a prediction, a dot product is calculated
|
||||
between a random observation vector and the constant weight vector. That dot product is a linear combination because
|
||||
one of the vectors holds constant weights.
|
||||
|
||||
Let's look at a simple example of how a linear combination can be used to find the mean of a vector of numbers.
|
||||
|
||||
In the example below two arrays are set to variables *`a`* and *`b`* and then operated on by the `dotProduct` function.
|
||||
The output of the `dotProduct` function is set to variable *`c`*.
|
||||
|
||||
The `mean` function is then used to compute the mean of the first array which is set to the variable *`d`*.
|
||||
|
||||
Both the dot product and the mean are included in the output.
|
||||
|
||||
When we look at the output of this expression we see that the dot product and the mean of the first array
|
||||
are both 30.
|
||||
|
||||
The `dotProduct` function calculated the mean of the first array.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="c, d",
|
||||
a=array(10, 20, 30, 40, 50),
|
||||
b=array(.2, .2, .2, .2, .2),
|
||||
c=dotProduct(a, b),
|
||||
d=mean(a))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": 30,
|
||||
"d": 30
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
To get a better understanding of how the dot product calculated the mean we can perform the steps of the
|
||||
calculation using vector math and look at the output of each step.
|
||||
|
||||
In the example below the `ebeMultiply` function performs an element-by-element multiplication of
|
||||
two arrays. This is the first step of the dot product calculation. The result of the element-by-element
|
||||
multiplication is assigned to variable *`c`*.
|
||||
|
||||
In the next step the `add` function adds all the elements of the array in variable *`c`*.
|
||||
|
||||
Notice that multiplying each element of the first array by .2 and then adding the results is
|
||||
equivalent to the formula for computing the mean of the first array. The formula for computing the mean
|
||||
of an array is to add all the elements and divide by the number of elements.
|
||||
|
||||
The output includes the output of both the `ebeMultiply` function and the `add` function.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="c, d",
|
||||
a=array(10, 20, 30, 40, 50),
|
||||
b=array(.2, .2, .2, .2, .2),
|
||||
c=ebeMultiply(a, b),
|
||||
d=add(c))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": [
|
||||
2,
|
||||
4,
|
||||
6,
|
||||
8,
|
||||
10
|
||||
],
|
||||
"d": 30
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
In the example above two arrays were combined in a way that produced the mean of the first. In the second array
|
||||
each value was set to .2. Another way of looking at this is that each value in the second array is
|
||||
applying the same weight to the values in the first array.
|
||||
By varying the weights in the second array we can produce a different result.
|
||||
For example if the first array represents a time series,
|
||||
the weights in the second array can be set to add more weight to a particular element in the first array.
|
||||
|
||||
The example below creates a weighted average with the weight decreasing from right to left.
|
||||
Notice that the weighted mean
|
||||
of 36.666 is larger than the previous mean which was 30. This is because more weight was given to the last element in the
|
||||
array.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="c, d",
|
||||
a=array(10, 20, 30, 40, 50),
|
||||
b=array(.066666666666666,.133333333333333,.2, .266666666666666, .33333333333333),
|
||||
c=ebeMultiply(a, b),
|
||||
d=add(c))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": [
|
||||
0.66666666666666,
|
||||
2.66666666666666,
|
||||
6,
|
||||
10.66666666666664,
|
||||
16.6666666666665
|
||||
],
|
||||
"d": 36.66666666666646
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
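The weights in this example are approximately 1/15, 2/15, 3/15, 4/15 and 5/15, that is, linearly increasing ranks normalized so that they sum to 1. The Python sketch below, which is an illustration rather than Solr code, derives the weights that way instead of typing truncated decimals.

[source,python]
----
a = [10, 20, 30, 40, 50]

# Linearly increasing ranks 1..5, normalized so the weights sum to 1.
ranks = range(1, len(a) + 1)
total = sum(ranks)  # 15
weights = [r / total for r in ranks]

# The weighted mean is the dot product of the values and the weights.
weighted_mean = sum(x * w for x, w in zip(a, weights))
----

Normalizing the weights keeps the result on the same scale as the input values, which is what makes it a weighted mean rather than an arbitrary linear combination.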
|
||||
|
||||
=== Representing Correlation
|
||||
|
||||
Often when we think of correlation, we are thinking of _Pearson correlation_ in the field of statistics. But the definition of
|
||||
correlation is actually more general: a mutual relationship or connection between two or more things.
|
||||
In the field of digital signal processing the dot product is used to represent correlation. The examples below demonstrate
|
||||
how the dot product can be used to represent correlation.
|
||||
|
||||
In the example below the dot product is computed for two vectors. Notice that the vectors have different values that fluctuate
|
||||
together. The output of the dot product is 190, which is hard to reason about because it's not scaled.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="c",
|
||||
a=array(10, 20, 30, 20, 10),
|
||||
b=array(1, 2, 3, 2, 1),
|
||||
c=dotProduct(a, b))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": 190
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
One approach to scaling the dot product is to first scale the vectors so that both vectors have a magnitude of 1. Vectors with a
|
||||
magnitude of 1, also called unit vectors, are used when comparing only the angle between vectors rather than the magnitude.
|
||||
The `unitize` function can be used to unitize the vectors before calculating the dot product.
|
||||
|
||||
Notice in the example below the dot product result, set to variable *`e`*, is effectively 1. When applied to unit vectors the dot product
|
||||
will be scaled between 1 and -1. Also notice in the example `cosineSimilarity` is calculated on the unscaled vectors and the
|
||||
answer is also effectively 1. This is because cosine similarity is a scaled dot product.
|
||||
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="e, f",
|
||||
a=array(10, 20, 30, 20, 10),
|
||||
b=array(1, 2, 3, 2, 1),
|
||||
c=unitize(a),
|
||||
d=unitize(b),
|
||||
e=dotProduct(c, d),
|
||||
f=cosineSimilarity(a, b))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"e": 0.9999999999999998,
|
||||
"f": 0.9999999999999999
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
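The relationship between unit vectors, the dot product and cosine similarity can be sketched in plain Python. This is an illustration of the arithmetic only, under the assumption that `unitize` divides each element by the Euclidean magnitude; the function names below are hypothetical stand-ins and not Solr's implementation.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def unitize(v):
    """Scale v so that its Euclidean magnitude is 1."""
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


a = [10, 20, 30, 20, 10]
b = [1, 2, 3, 2, 1]

e = dot_product(unitize(a), unitize(b))
f = cosine_similarity(a, b)
print(e, f)
```

Both values come out within floating point error of 1 because `a` is exactly 10 times `b`, so the two vectors point in the same direction.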
|
||||
|
||||
If we transpose the first two numbers in the first array, so that the vectors
|
||||
are not perfectly correlated, we see that the cosine similarity drops. This illustrates
|
||||
how the dot product represents correlation.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(echo="c",
|
||||
a=array(20, 10, 30, 20, 10),
|
||||
b=array(1, 2, 3, 2, 1),
|
||||
c=cosineSimilarity(a, b))
|
||||
----
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": 0.9473684210526314
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
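The drop in cosine similarity can be traced by hand. The plain-Python sketch below, which is illustrative only and not Solr code, spells out the intermediate values: the dot product is 180 and the product of the two magnitudes is 190.

```python
import math

a = [20, 10, 30, 20, 10]
b = [1, 2, 3, 2, 1]

dot = sum(x * y for x, y in zip(a, b))      # 20 + 20 + 90 + 40 + 10 = 180
norm_a = math.sqrt(sum(x * x for x in a))   # sqrt(1900)
norm_b = math.sqrt(sum(y * y for y in b))   # sqrt(19)

# sqrt(1900) * sqrt(19) = sqrt(36100) = 190, so the ratio is 180 / 190.
c = dot / (norm_a * norm_b)
print(c)
```

Transposing two values leaves both magnitudes unchanged, so the lower score comes entirely from the smaller dot product.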
|
||||
|
||||
== Convolution
|
||||
|
||||
The `conv` function calculates the convolution of two vectors. The convolution is calculated by reversing
|
||||
The `conv` function calculates the convolution of two vectors. The convolution is calculated by *reversing*
|
||||
the second vector and sliding it across the first vector. The dot product of the two vectors
|
||||
is calculated at each point as the second vector is slid across the first vector.
|
||||
The dot products are collected in a third vector which is the convolution of the two vectors.
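The reverse-and-slide description can be sketched in plain Python. This is a minimal illustration under the assumption of a full convolution with zero padding at both ends; the function name `convolve` is hypothetical and the code is not Solr's implementation of `conv`.

```python
def convolve(a, b):
    """Full discrete convolution of a and b, length len(a) + len(b) - 1."""
    reversed_b = b[::-1]
    n, m = len(a), len(b)
    # Pad a with zeros so the reversed vector always covers a full window.
    padded = [0] * (m - 1) + list(a) + [0] * (m - 1)
    result = []
    for shift in range(n + m - 1):
        window = padded[shift:shift + m]
        # Dot product of the current window with the reversed second vector.
        result.append(sum(x * y for x, y in zip(window, reversed_b)))
    return result


print(convolve([1, 2, 3], [0, 1, 0.5]))
```

The output has `len(a) + len(b) - 1` entries because the second vector starts with only its last element overlapping the first vector and ends with only its first element overlapping.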
|
||||
|
||||
=== Moving Average Function
|
||||
|
||||
Before looking at an example of convolution its useful to review the `movingAvg` function. The moving average
|
||||
Before looking at an example of convolution it's useful to review the `movingAvg` function. The moving average
|
||||
function computes a moving average by sliding a window across a vector and computing
|
||||
the average of the window at each shift. If that sounds similar to convolution, that's because the `movingAvg` function
|
||||
is syntactic sugar for convolution.
|
||||
the average of the window at each shift. If that sounds similar to convolution, that's because the `movingAvg`
|
||||
function involves a sliding window approach similar to convolution.
|
||||
|
||||
Below is an example of a moving average with a window size of 5. Notice that original vector has 13 elements
|
||||
Below is an example of a moving average with a window size of 5. Notice that the original vector has 13 elements
|
||||
but the result of the moving average has only 9 elements. This is because the `movingAvg` function
|
||||
only begins generating results when it has a full window. In this case, because the window size is 5, the
|
||||
moving average starts generating results from the 4^th^ index of the original array.
|
||||
only begins generating results when it has a full window. The `ltrim` function is used to trim the
|
||||
first four elements from the original `y` array to line up with the moving average.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
b=movingAvg(a, 5))
|
||||
----
|
||||
image::images/math-expressions/conv1.png[]
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"b": [
|
||||
3,
|
||||
4,
|
||||
5,
|
||||
5.6,
|
||||
5.8,
|
||||
5.6,
|
||||
5,
|
||||
4,
|
||||
3
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
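The window arithmetic behind these values can be reproduced with a short plain-Python sketch. The function name `moving_avg` is a hypothetical stand-in for the `movingAvg` expression, and the code assumes results are only emitted for full windows.

```python
def moving_avg(values, window):
    """Average of each full window as it slides across values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]
b = moving_avg(a, 5)

print(len(a), len(b))
print(b)
```

A vector of length `n` yields `n - window + 1` averages, which is why 13 input values produce 9 results for a window size of 5.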
|
||||
|
||||
=== Convolutional Smoothing
|
||||
|
||||
The moving average can also be computed using convolution. In the example
|
||||
below the `conv` function is used to compute the moving average of the first array
|
||||
by applying the second array as the filter.
|
||||
by applying the second array as a filter.
|
||||
|
||||
Looking at the result, we see that it is not exactly the same as the result
|
||||
of the `movingAvg` function. That is because the `conv` pads zeros
|
||||
Looking at the result, we see that the convolution produced an array with 17 values instead of the 9 values created by the
|
||||
moving average. That is because the `conv` function pads zeros
|
||||
to the front and back of the first vector so that the window size is always full.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
b=array(.2, .2, .2, .2, .2),
|
||||
c=conv(a, b))
|
||||
----
|
||||
image::images/math-expressions/conv2.png[]
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
We achieve the same result as the `movingAvg` function by trimming the first and last 4 values of
|
||||
the convolution result using the `ltrim` and `rtrim` functions.
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": [
|
||||
0.2,
|
||||
0.6000000000000001,
|
||||
1.2,
|
||||
2.0000000000000004,
|
||||
3.0000000000000004,
|
||||
4,
|
||||
5,
|
||||
5.6000000000000005,
|
||||
5.800000000000001,
|
||||
5.6000000000000005,
|
||||
5.000000000000001,
|
||||
4,
|
||||
3,
|
||||
2,
|
||||
1.2000000000000002,
|
||||
0.6000000000000001,
|
||||
0.2
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
The example below plots both the trimmed convolution and the moving average on the same plot. Notice that
|
||||
they perfectly overlap.
|
||||
|
||||
We achieve the same result as the `movingAvg` function by using the `copyOfRange` function to copy a range of
|
||||
the result that drops the first and last 4 values of
|
||||
the convolution result. In the example below the `precision` function is also used to remove floating point errors from the
|
||||
convolution result. When this is added the output is exactly the same as the `movingAvg` function.
|
||||
image::images/math-expressions/conv3.png[]
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
b=array(.2, .2, .2, .2, .2),
|
||||
c=conv(a, b),
|
||||
d=copyOfRange(c, 4, 13),
|
||||
e=precision(d, 2))
|
||||
----
|
||||
This demonstrates how convolution can be used to smooth a signal by sliding a filter across the signal and
|
||||
computing the dot product at each point. The smoothing effect is caused by the design of the filter.
|
||||
In the example, the filter length is 5 and each value in the filter is .2. This filter calculates a
|
||||
simple moving average with a window size of 5.
|
||||
|
||||
The formula for computing a simple moving average using convolution is to make the filter length the window
|
||||
size and make the values of the filter all the same and sum to 1. A moving average with a window size of 4
|
||||
can be computed by changing the filter to a length of 4 with each value being .25.
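The rule for building a moving average filter can be sketched in plain Python. This is an illustration only, with hypothetical function names; it assumes a full convolution whose first and last `window - 1` values are trimmed.

```python
def convolve(a, b):
    """Full discrete convolution, equivalent to zero padding both ends of a."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


def moving_avg_by_convolution(values, window):
    # Equal weights that sum to 1, one weight per position in the window.
    kernel = [1.0 / window] * window
    full = convolve(values, kernel)
    # Drop the window - 1 partially filled positions at each end.
    return full[window - 1:len(values)]


a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]

print([round(v, 2) for v in moving_avg_by_convolution(a, 5)])
print(moving_avg_by_convolution(a, 4))
```

For a window size of 5 the slice is `full[4:13]`, the same range used with `copyOfRange` in the expression above. Rounding plays the role of the `precision` function for the .2 weights, which are not exactly representable in binary floating point.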
|
||||
|
||||
==== Changing the Weights
|
||||
|
||||
The filter, which is sometimes called the *kernel*, can be viewed as a vector of weights. In the initial
|
||||
example all values in the filter have the same weight (.2). The weights in the filter can be changed to
|
||||
produce different smoothing effects. This is demonstrated in the example below.
|
||||
|
||||
In this example the filter increases in weight from .1 to .3. This places more weight towards the front
|
||||
of the filter. Notice that the filter is reversed with the `rev` function before the `conv` function applies it.
|
||||
This is done because convolution will reverse
|
||||
the filter. In this case we reverse it ahead of time and when convolution reverses it back, it is the same
|
||||
as the original filter.
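The effect of reversing the filter ahead of time can be checked with a plain-Python sketch. The specific weights below are hypothetical values chosen to rise from .1 to .3 and sum to 1, and the helper names are illustrative rather than Solr functions.

```python
def convolve(a, b):
    """Full discrete convolution of a and b."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


def rev(v):
    return v[::-1]


a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]
weights = [0.1, 0.15, 0.2, 0.25, 0.3]  # hypothetical weights, sum to 1

# Convolve with the pre-reversed filter, then keep only the full windows.
full = convolve(a, rev(weights))
smoothed = full[len(weights) - 1:len(a)]

# Sliding dot product with the original, unreversed weights.
direct = [
    sum(x * w for x, w in zip(a[i:i + len(weights)], weights))
    for i in range(len(a) - len(weights) + 1)
]

print([round(v, 2) for v in smoothed])
print([round(v, 2) for v in direct])
```

The two lists agree, which shows that reversing the filter before convolution is the same as sliding the original weights across the signal unchanged.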
|
||||
|
||||
The plot shows the effect of the different weights in the filter. The dark blue line is the initial array.
|
||||
The light blue line is the convolution and the orange line is the moving average. Notice that the convolution
|
||||
responds quicker to the movements in the underlying array. This is because more weight has been placed
|
||||
at the front of the filter.
|
||||
|
||||
image::images/math-expressions/conv4.png[]
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"e": [
|
||||
3,
|
||||
4,
|
||||
5,
|
||||
5.6,
|
||||
5.8,
|
||||
5.6,
|
||||
5,
|
||||
4,
|
||||
3
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
|
||||
|
||||
== Cross-Correlation
|
||||
|
||||
|
@ -467,54 +106,8 @@ rather than the convolution calculation.
|
|||
|
||||
Notice in the result the highest value is 217. This is the point where the two vectors have the highest correlation.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
b=array(4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
c=conv(a, rev(b)))
|
||||
----
|
||||
image::images/math-expressions/crosscorr.png[]
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": [
|
||||
1,
|
||||
4,
|
||||
10,
|
||||
20,
|
||||
35,
|
||||
56,
|
||||
84,
|
||||
116,
|
||||
149,
|
||||
180,
|
||||
203,
|
||||
216,
|
||||
217,
|
||||
204,
|
||||
180,
|
||||
148,
|
||||
111,
|
||||
78,
|
||||
50,
|
||||
28,
|
||||
13,
|
||||
4
|
||||
]
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
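The cross-correlation values can be reproduced with a plain-Python sketch. The function name `convolve` is a hypothetical stand-in for `conv`, and the code illustrates the arithmetic rather than Solr's implementation.

```python
def convolve(a, b):
    """Full discrete convolution of a and b."""
    result = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]
b = [4, 5, 6, 7, 6, 5, 4, 3, 2, 1]

# Reversing b first cancels the reversal inside convolution,
# leaving a plain sliding dot product (cross-correlation).
c = convolve(a, b[::-1])

peak = max(c)
peak_index = c.index(peak)
print(peak, peak_index)
```

The peak sits at the shift where `b` lines up with the matching run of values inside `a`.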
|
||||
|
||||
== Find Delay
|
||||
|
||||
|
@ -525,67 +118,29 @@ and then computes the delay between the two signals.
|
|||
Below is an example of the `finddelay` function. Notice that the `finddelay` function reports a 3 period delay between the first
|
||||
and second signal.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
b=array(4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
|
||||
c=finddelay(a, b))
|
||||
----
|
||||
image::images/math-expressions/delay.png[]
|
||||
|
||||
When this expression is sent to the `/stream` handler it responds with:
|
||||
|
||||
[source,json]
|
||||
----
|
||||
{
|
||||
"result-set": {
|
||||
"docs": [
|
||||
{
|
||||
"c": 3
|
||||
},
|
||||
{
|
||||
"EOF": true,
|
||||
"RESPONSE_TIME": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
----
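One plausible way to derive the delay from the cross-correlation is sketched below in plain Python. This is an assumption about the general technique, not a description of how Solr implements `finddelay`; the names `convolve` and `find_delay` are hypothetical.

```python
def convolve(a, b):
    """Full discrete convolution of a and b."""
    result = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result


def find_delay(a, b):
    """Shift of b relative to a at which the cross-correlation peaks."""
    c = convolve(a, b[::-1])
    peak_index = c.index(max(c))
    # Index len(b) - 1 is the position where both signals start together.
    return peak_index - (len(b) - 1)


a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]
b = [4, 5, 6, 7, 6, 5, 4, 3, 2, 1]

print(find_delay(a, b))
```

The second signal matches the first starting at its fourth element, which corresponds to a delay of 3 periods.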
|
||||
|
||||
== Oscillate (Sine Wave)
|
||||
|
||||
The `oscillate` function generates a periodic oscillating signal which can be used to model and study sine waves.
|
||||
|
||||
The `oscillate` function takes three parameters: *amplitude*, *angular frequency*
|
||||
and *phase* and returns a vector containing the y-axis points of a sine wave.
|
||||
The `oscillate` function takes three parameters: `amplitude`, `angular frequency`, and `phase` and returns a vector containing the y-axis points of a sine wave.
|
||||
|
||||
The y-axis points were generated from an x-axis sequence of 0-127.
|
||||
|
||||
Below is an example of the `oscillate` function called with an amplitude of
|
||||
1, an angular frequency of .28, and a phase of 1.57.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
oscillate(1, 0.28, 1.57)
|
||||
----
|
||||
|
||||
The result of the `oscillate` function is plotted below:
|
||||
|
||||
image::images/math-expressions/sinewave.png[]
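The shape of the generated points can be approximated in plain Python. The sketch below assumes the common harmonic oscillator form `amplitude * cos(angular_frequency * x + phase)`; whether Solr uses this exact form internally is an assumption here, and the `oscillate` function below is a hypothetical stand-in for the expression.

```python
import math


def oscillate(amplitude, angular_frequency, phase, length=128):
    """y values of amplitude * cos(angular_frequency * x + phase), x = 0..length-1."""
    return [
        amplitude * math.cos(angular_frequency * x + phase)
        for x in range(length)
    ]


y = oscillate(1, 0.28, 1.57)
print(len(y))
print([round(v, 3) for v in y[:5]])
```

With a phase of 1.57, which is close to pi/2, the wave starts near zero rather than at its peak. The amplitude bounds the values between -1 and 1, and the angular frequency controls how quickly the wave cycles along the x-axis.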
|
||||
|
||||
=== Sine Wave Interpolation, Extrapolation
|
||||
=== Sine Wave Interpolation & Extrapolation
|
||||
|
||||
The `oscillate` function returns a function which can be used by the `predict` function to interpolate or extrapolate a sine wave.
|
||||
|
||||
The example below extrapolates the sine wave to an x-axis sequence of 0-256.
|
||||
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)))
|
||||
----
|
||||
|
||||
The extrapolated sine wave is plotted below:
|
||||
|
||||
image::images/math-expressions/sinewave256.png[]
|
||||
|
||||
|
||||
|
@ -599,11 +154,6 @@ A few examples, with plots, will help to understand the concepts.
|
|||
The first example simply revisits the example above of an extrapolated sine wave. The result of this
|
||||
is plotted in the image below. Notice that there is a structure to the plot that is clearly not random.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)))
|
||||
----
|
||||
|
||||
image::images/math-expressions/sinewave256.png[]
|
||||
|
||||
|
@ -612,11 +162,6 @@ In the next example the `sample` function is used to draw 256 samples from a `un
|
|||
vector of random data. The result of this is plotted in the image below. Notice that there is no clear structure to the
|
||||
data and the data appears to be random.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
sample(uniformDistribution(-1.5, 1.5), 256)
|
||||
----
|
||||
|
||||
image::images/math-expressions/noise.png[]
|
||||
|
||||
|
||||
|
@ -625,13 +170,6 @@ The result of this is plotted in the image below. Notice that the sine wave has
|
|||
somewhat within the noise. It's difficult to say for sure if there is structure. As plots
|
||||
become more dense it can become harder to see a pattern hidden within noise.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)),
|
||||
c=sample(uniformDistribution(-1.5, 1.5), 256),
|
||||
d=ebeAdd(b,c))
|
||||
----
|
||||
|
||||
image::images/math-expressions/hidden-signal.png[]
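Hiding a signal within noise amounts to an element-by-element addition of two vectors. The plain-Python sketch below illustrates this under two assumptions: the sine wave follows the form `cos(0.28 * x + 1.57)`, and a seeded generator stands in for the `sample` and `uniformDistribution` functions so the run is repeatable.

```python
import math
import random

rng = random.Random(42)  # fixed seed, chosen only to make the sketch repeatable

# Assumed form of the wave evaluated over an x-axis sequence of 0-255.
signal = [math.cos(0.28 * x + 1.57) for x in range(256)]

# 256 draws from a uniform distribution between -1.5 and 1.5.
noise = [rng.uniform(-1.5, 1.5) for _ in range(256)]

# Element-by-element addition, the role played by ebeAdd.
hidden = [s + n for s, n in zip(signal, noise)]

print(len(hidden))
```

Because the noise range of 1.5 is larger than the amplitude of the wave, any single point can sit well away from the underlying curve.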
|
||||
|
||||
|
@ -649,12 +187,6 @@ intensity as the sine wave slides farther away from being directly lined up.
|
|||
|
||||
This is the autocorrelation plot of a pure signal.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)),
|
||||
c=conv(b, rev(b)))
|
||||
----
|
||||
|
||||
image::images/math-expressions/signal-autocorrelation.png[]
|
||||
|
||||
|
@ -666,11 +198,6 @@ This is followed by another long period of low intensity correlation.
|
|||
|
||||
This is the autocorrelation plot of pure noise.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=sample(uniformDistribution(-1.5, 1.5), 256),
|
||||
b=conv(a, rev(a)))
|
||||
----
|
||||
|
||||
image::images/math-expressions/noise-autocorrelation.png[]
|
||||
|
||||
|
@ -680,25 +207,17 @@ Notice that this plot shows very clear signs of structure which is similar to au
|
|||
pure signal. The correlation is less intense due to noise but the shape of the correlation plot suggests
|
||||
strongly that there is an underlying signal hidden within the noise.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)),
|
||||
c=sample(uniformDistribution(-1.5, 1.5), 256),
|
||||
d=ebeAdd(b, c),
|
||||
e=conv(d, rev(d)))
|
||||
----
|
||||
|
||||
image::images/math-expressions/hidden-signal-autocorrelation.png[]
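Autocorrelation through convolution can be sketched in plain Python. The code below is illustrative only; it assumes the wave has the form `cos(0.28 * x + 1.57)` and uses a seeded generator in place of `sample`, and the function name `autocorrelation` is hypothetical.

```python
import math
import random


def autocorrelation(v):
    """conv(v, rev(v)): dot product of v with a shifted copy of itself."""
    n = len(v)
    reversed_v = v[::-1]
    result = [0.0] * (2 * n - 1)
    for i, x in enumerate(v):
        for j, y in enumerate(reversed_v):
            result[i + j] += x * y
    return result


rng = random.Random(7)  # fixed seed so the sketch is repeatable
signal = [math.cos(0.28 * x + 1.57) for x in range(256)]
noise = [rng.uniform(-1.5, 1.5) for _ in range(256)]
hidden = [s + n for s, n in zip(signal, noise)]

ac = autocorrelation(hidden)
center = len(hidden) - 1  # index where the vector lines up with itself
print(ac.index(max(ac)) == center)
```

The center value equals the sum of the squared elements, so it is always the peak. The structure on either side of the peak is what distinguishes a hidden signal from pure noise.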
|
||||
|
||||
|
||||
== Discrete Fourier Transform
|
||||
|
||||
The convolution based functions described above are operating on signals in the time domain. In the time
|
||||
domain the X axis is time and the Y axis is the quantity of some value at a specific point in time.
|
||||
The convolution-based functions described above are operating on signals in the time domain. In the time
|
||||
domain the x-axis is time and the y-axis is the quantity of some value at a specific point in time.
|
||||
|
||||
The discrete Fourier Transform translates a time domain signal into the frequency domain.
|
||||
In the frequency domain the X axis is frequency, and the Y axis is the accumulated power at a specific frequency.
|
||||
In the frequency domain the x-axis is frequency, and the y-axis is the accumulated power at a specific frequency.
|
||||
|
||||
The basic principle is that every time domain signal is composed of one or more signals (sine waves)
|
||||
at different frequencies. The discrete Fourier transform decomposes a time domain signal into its component
|
||||
|
@ -711,26 +230,21 @@ to determine if a signal has structure or if it is purely random.
|
|||
|
||||
The `fft` function performs the discrete Fourier Transform on a vector of *real* data. The result
|
||||
of the `fft` function is returned as *complex* numbers. A complex number has two parts, *real* and *imaginary*.
|
||||
The imaginary part of the complex number is ignored in the examples below, but there
|
||||
are many tutorials on the FFT available online that include complex numbers.
|
||||
|
||||
But before diving into the examples it is important to understand how the `fft` function formats the
|
||||
complex numbers in the result.
|
||||
The *real* part of the result describes the magnitude of the signal at different frequencies.
|
||||
The *imaginary* part of the result describes the *phase*. The examples below deal only with the *real*
|
||||
part of the result.
|
||||
|
||||
The `fft` function returns a `matrix` with two rows. The first row in the matrix is the *real*
|
||||
part of the complex result. The second row in the matrix is the *imaginary* part of the complex result.
|
||||
|
||||
The `rowAt` function can be used to access the rows so they can be processed as vectors.
|
||||
This approach was taken because all of the vector math functions operate on vectors of real numbers.
|
||||
Rather than introducing a complex number abstraction into the expression language, the `fft` result is
|
||||
represented as two vectors of real numbers.
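The two-row layout can be illustrated with a naive discrete Fourier transform in plain Python. This sketch is not the fast algorithm and not Solr's implementation; the function name `dft_rows` is hypothetical, and the test signal is a cosine chosen to fall exactly on frequency bin 8 so the peaks are easy to read.

```python
import cmath
import math


def dft_rows(signal):
    """Naive O(n^2) DFT returned as [real_row, imaginary_row]."""
    n = len(signal)
    real_row, imag_row = [], []
    for k in range(n):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        real_row.append(total.real)
        imag_row.append(total.imag)
    return [real_row, imag_row]


# Hypothetical test signal: 8 full cycles across 64 samples.
signal = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]

matrix = dft_rows(signal)
real = matrix[0]  # plays the role of rowAt(matrix, 0)

print(round(real[8], 6), round(real[56], 6))
```

The real row shows one peak at bin 8 and a mirrored peak at bin 56, with the remaining bins close to 0.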
|
||||
|
||||
|
||||
=== Fast Fourier Transform Examples
|
||||
|
||||
In the first example the `fft` function is called on the sine wave used in the autocorrelation example.
|
||||
|
||||
The result of the `fft` function is a matrix. The `rowAt` function is used to return the first row of
|
||||
the matrix which is a vector containing the real values of the fft response.
|
||||
the matrix which is a vector containing the real values of the `fft` response.
|
||||
|
||||
The plot of the real values of the `fft` response is shown below. Notice there are two
|
||||
peaks on opposite sides of the plot. The plot is actually showing a mirrored response. The right side
|
||||
|
@ -741,14 +255,6 @@ Also notice that the `fft` has accumulated significant power in a single peak. T
|
|||
the specific frequency of the sine wave. The vast majority of frequencies in the plot have close to 0 power
|
||||
associated with them. This `fft` shows a clear signal with very low levels of noise.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)),
|
||||
c=fft(b),
|
||||
d=rowAt(c, 0))
|
||||
----
|
||||
|
||||
|
||||
image::images/math-expressions/signal-fft.png[]
|
||||
|
||||
|
@ -758,17 +264,8 @@ autocorrelation example. The plot of the real values of the `fft` response is sh
|
|||
Notice that in this response there is no clear peak. Instead all frequencies have accumulated a random level of
|
||||
power. This `fft` shows no clear sign of signal and appears to be noise.
|
||||
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=sample(uniformDistribution(-1.5, 1.5), 256),
|
||||
b=fft(a),
|
||||
c=rowAt(b, 0))
|
||||
----
|
||||
|
||||
image::images/math-expressions/noise-fft.png[]
|
||||
|
||||
|
||||
In the third example the `fft` function is called on the same signal hidden within noise that was used for
|
||||
the autocorrelation example. The plot of the real values of the `fft` response is shown below.
|
||||
|
||||
|
@ -776,14 +273,5 @@ Notice that there are two clear mirrored peaks, at the same locations as the `ff
|
|||
there is also now considerable noise on the frequencies. The `fft` has found the signal but also
|
||||
shows that there is considerable noise along with the signal.
|
||||
|
||||
[source,text]
|
||||
----
|
||||
let(a=oscillate(1, 0.28, 1.57),
|
||||
b=predict(a, sequence(256, 0, 1)),
|
||||
c=sample(uniformDistribution(-1.5, 1.5), 256),
|
||||
d=ebeAdd(b, c),
|
||||
e=fft(d),
|
||||
f=rowAt(e, 0))
|
||||
----
|
||||
|
||||
image::images/math-expressions/hidden-signal-fft.png[]
|
||||
|
|