opensearch-docs-cn/_search-plugins/knn/painless-functions.md

69 lines
4.1 KiB
Markdown
Raw Normal View History

2021-05-05 13:09:47 -04:00
---
layout: default
2021-05-11 12:29:35 -04:00
title: k-NN Painless extensions
Search with k-NN filters (#1814) * new file for knn filter searches Signed-off-by: alicejw <alicejw@amazon.com> * for knn filter queries Signed-off-by: alicejw <alicejw@amazon.com> * more details and include graphic Signed-off-by: alicejw <alicejw@amazon.com> * add graph of filtered doc set Signed-off-by: alicejw <alicejw@amazon.com> * add release label Signed-off-by: alicejw <alicejw@amazon.com> * filters are defined by Query DSL Signed-off-by: alicejw <alicejw@amazon.com> * more details about how the algorithm works and how to specify lucene as the search engine Signed-off-by: alicejw <alicejw@amazon.com> * more refining sentences Signed-off-by: alicejw <alicejw@amazon.com> * for response samples Signed-off-by: alicejw <alicejw@amazon.com> * reorg heading levels Signed-off-by: alicejw <alicejw@amazon.com> * more rewrites for clarity Signed-off-by: alicejw <alicejw@amazon.com> * to add the complex filter query Signed-off-by: alicejw <alicejw@amazon.com> * update response for complex query Signed-off-by: alicejw <alicejw@amazon.com> * for typo Signed-off-by: alicejw <alicejw@amazon.com> * for rewrites to overview Signed-off-by: alicejw <alicejw@amazon.com> * to add better request/response for the complex filter example Signed-off-by: alicejw <alicejw@amazon.com> * for eng review update Signed-off-by: alicejw <alicejw@amazon.com> * format fix for example Signed-off-by: alicejw <alicejw@amazon.com> * for filter selectiveness use case section Signed-off-by: alicejw <alicejw@amazon.com> * for new workflow diagram and description Signed-off-by: alicejw <alicejw@amazon.com> * update section headings Signed-off-by: alicejw <alicejw@amazon.com> * add image for algorithm workflow diagram Signed-off-by: alicejw <alicejw@amazon.com> * reorg sections to make more concise Signed-off-by: alicejw <alicejw@amazon.com> * explain selectiveness percentage Signed-off-by: alicejw <alicejw@amazon.com> * more rewrites to complex query description Signed-off-by: alicejw <alicejw@amazon.com> * define complex query Signed-off-by: alicejw <alicejw@amazon.com> * more rewrites Signed-off-by: alicejw <alicejw@amazon.com> * for tech review feedback and add new information Signed-off-by: alicejw <alicejw@amazon.com> * to blend new Boolean query example into filter approaches section Signed-off-by: alicejw <alicejw@amazon.com> * for complex query description clarity Signed-off-by: alicejw <alicejw@amazon.com> * more rewrites Signed-off-by: alicejw <alicejw@amazon.com> * typo Signed-off-by: alicejw <alicejw@amazon.com> * eng review updates Signed-off-by: alicejw <alicejw@amazon.com> * nit for grammar Signed-off-by: alicejw <alicejw@amazon.com> * to fix incorrect descriptions of restrictive filters Signed-off-by: alicejw <alicejw@amazon.com> * to fix incorrect descriptions of restrictive filters Signed-off-by: alicejw <alicejw@amazon.com> * for doc review feedback updates Signed-off-by: alicejw <alicejw@amazon.com> * minor grammar change Signed-off-by: alicejw <alicejw@amazon.com> * removed figure and table titles, per AWS Style Guide Signed-off-by: alicejw <alicejw@amazon.com> * remove table title per style guide Signed-off-by: alicejw <alicejw@amazon.com> * update nav orders for all pages to give space for new topics in multiples of 5, and add links to other knn topics where appropriate Signed-off-by: alicejw <alicejw@amazon.com> * small rewrite Signed-off-by: alicejw <alicejw@amazon.com> * for second doc review comments Signed-off-by: alicejw <alicejw@amazon.com> * Update _search-plugins/knn/filter-search-knn.md Co-authored-by: Nate Bower <nbower@amazon.com> * Update _search-plugins/knn/filter-search-knn.md Co-authored-by: Nate Bower <nbower@amazon.com> * for editorial review updates Signed-off-by: alicejw <alicejw@amazon.com> * for editorial review updates Signed-off-by: alicejw <alicejw@amazon.com> * fix cross-ref link Signed-off-by: alicejw <alicejw@amazon.com> * fix undone commit suggestions Signed-off-by: alicejw <alicejw@amazon.com> Signed-off-by: alicejw <alicejw@amazon.com> Co-authored-by: Nate Bower <nbower@amazon.com>
2022-11-11 14:23:45 -05:00
nav_order: 25
2021-05-05 13:09:47 -04:00
parent: k-NN
has_children: false
has_math: true
---
2021-05-11 12:29:35 -04:00
# k-NN Painless Scripting extensions
2021-05-05 13:09:47 -04:00
2021-06-09 22:15:41 -04:00
With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script), so you can use them to customize your k-NN workload.
2021-05-05 13:09:47 -04:00
## Get started with k-NN's Painless Scripting functions
To use k-NN's Painless Scripting functions, first create an index with `knn_vector` fields like in [k-NN score script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script#getting-started-with-the-score-script-for-vectors). Once the index is created and you ingest some data, you can use the painless extensions:
2021-05-05 13:09:47 -04:00
```json
GET my-knn-index-2/_search
{
"size": 2,
"query": {
"script_score": {
"query": {
"bool": {
"filter": {
"term": {
"color": "BLUE"
}
}
}
},
"script": {
"source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
"params": {
"field": "my_vector",
"query_value": [9.9, 9.9]
}
}
}
}
}
```
`field` needs to map to a `knn_vector` field, and `query_value` needs to be a floating point array with the same dimension as `field`.
## Function types
2021-05-11 12:29:35 -04:00
The following table describes the available painless functions the k-NN plugin provides:
2021-05-05 13:09:47 -04:00
2021-05-11 12:29:35 -04:00
Function name | Function signature | Description
2021-05-28 16:54:13 -04:00
:--- | :---
2021-05-11 12:29:35 -04:00
l2Squared | `float l2Squared (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.
l1Norm | `float l1Norm (float[] queryVector, doc['vector field'])` | This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.
cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector field'])` | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document:<br /> `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)` <br />In general, the range of cosine similarity is [-1, 1]. However, in the case of information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Therefore, the k-NN plugin adds 1.0 in order to always yield a positive cosine similarity score.
2021-05-05 13:09:47 -04:00
## Constraints
2021-05-28 16:54:13 -04:00
2021-05-05 13:09:47 -04:00
1. If a documents `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
2021-05-28 16:54:13 -04:00
2021-05-11 12:29:35 -04:00
2. If a vector field doesn't have a value, the function throws an <code>IllegalStateException</code>.
2021-05-28 16:54:13 -04:00
2021-05-05 13:09:47 -04:00
You can avoid this situation by first checking if a document has a value in its field:
2021-05-28 16:54:13 -04:00
```
"source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
Because scores can only be positive, this script ranks documents with vector fields higher than those without.