opensearch-docs-cn/_search-plugins/knn/painless-functions.md
Alice Williams 253ae34bd8
2022-11-11 11:23:45 -08:00


layout | title | nav_order | parent | has_children | has_math
:--- | :--- | :--- | :--- | :--- | :---
default | k-NN Painless extensions | 25 | k-NN | false | true

k-NN Painless Scripting extensions

With the k-NN plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on knn_vector fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin adds Painless Scripting extensions to a few of the distance functions used in k-NN score script, so you can use them to customize your k-NN workload.

Get started with k-NN's Painless Scripting functions

To use k-NN's Painless Scripting functions, first create an index with knn_vector fields, as described in k-NN score script. After you create the index and ingest some data, you can use the Painless extensions:

GET my-knn-index-2/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "color": "BLUE"
            }
          }
        }
      },
      "script": {
        "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
        "params": {
          "field": "my_vector",
          "query_value": [9.9, 9.9]
        }
      }
    }
  }
}

The field parameter must map to a knn_vector field, and query_value must be a floating-point array with the same dimension as that field.
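To make the scoring in the request above concrete, the following Python sketch reproduces the arithmetic of 1.0 + cosineSimilarity(params.query_value, doc[params.field]) outside of OpenSearch. It is only an illustration: the document IDs and vectors are hypothetical, and the helper function is a plain reimplementation of the formula, not the plugin's code.

```python
import math

def cosine_similarity(query, doc):
    """Cosine of the angle between two vectors of equal dimension."""
    if len(query) != len(doc):
        # Stand-in for the dimension check described on this page.
        raise ValueError("query and document vectors differ in dimension")
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    return dot / (norm_q * norm_d)

query_value = [9.9, 9.9]  # same query vector as in the request above

# Hypothetical documents that already passed the "color": "BLUE" filter.
docs = {
    "a": [1.0, 1.0],    # same direction as the query
    "b": [1.0, 0.0],    # 45 degrees away from the query
    "c": [-1.0, -1.0],  # opposite direction
}

# Mirror the script source: 1.0 + cosineSimilarity(...)
scores = {doc_id: 1.0 + cosine_similarity(query_value, vec)
          for doc_id, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

With these made-up vectors, the scores come out at roughly 2.0, 1.71, and 0.0, so adding 1.0 keeps even the opposite-direction document from receiving a negative score.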

Function types

The following table describes the Painless functions that the k-NN plugin provides:

Function name | Function signature | Description
:--- | :--- | :---
l2Squared | float l2Squared (float[] queryVector, doc['vector field']) | Calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so a scoring script typically inverts the return value of l2Squared. If the document vector matches the query vector, the result is 0, so the script should also add 1 to the distance to avoid divide-by-zero errors, as in 1 / (1 + l2Squared(params.query_value, doc[params.field])).
l1Norm | float l1Norm (float[] queryVector, doc['vector field']) | Calculates the L1 norm distance (Manhattan distance) between a given query vector and document vectors. As with l2Squared, a shorter distance means a more relevant document, so a scoring script typically inverts the return value and adds 1 to the distance to avoid divide-by-zero errors.
cosineSimilarity | float cosineSimilarity (float[] queryVector, doc['vector field']) | Cosine similarity is an inner product of the query vector and document vector normalized to both have a length of 1. If the magnitude of the query vector doesn't change throughout the query, you can pass the magnitude of the query vector to improve performance, instead of calculating the magnitude every time for every filtered document: float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector). In general, the range of cosine similarity is [-1, 1]. In information retrieval, the cosine similarity of two documents ranges from 0 to 1 because the tf-idf statistic can't be negative. Vector fields can contain negative values, so the example query on this page adds 1.0 to the result to keep the score from being negative.
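As a rough illustration of what the functions in the table compute, here is a minimal Python sketch. It assumes the usual textbook definitions (sum of squared differences for l2Squared, sum of absolute differences for l1Norm) and is not the plugin's implementation. The norm_query argument is a hypothetical stand-in for the optional normQueryVector parameter.

```python
import math

def l2_squared(query, doc):
    """Square of the Euclidean (L2) distance."""
    return sum((q - d) ** 2 for q, d in zip(query, doc))

def l1_norm(query, doc):
    """Manhattan (L1) distance."""
    return sum(abs(q - d) for q, d in zip(query, doc))

def cosine_similarity(query, doc, norm_query=None):
    """Cosine similarity; norm_query lets the caller reuse a precomputed
    query magnitude instead of recomputing it for every document."""
    if norm_query is None:
        norm_query = math.sqrt(sum(q * q for q in query))
    norm_doc = math.sqrt(sum(d * d for d in doc))
    dot = sum(q * d for q, d in zip(query, doc))
    return dot / (norm_query * norm_doc)

query = [3.0, 4.0]
doc = [6.0, 8.0]  # hypothetical document vector

# Distances shrink as documents get closer, so invert them to get a score.
# Adding 1 avoids dividing by zero when the vectors are identical.
l2_score = 1.0 / (1.0 + l2_squared(query, doc))
l1_score = 1.0 / (1.0 + l1_norm(query, doc))

# Compute the query magnitude once and pass it in for each document.
query_norm = math.sqrt(sum(q * q for q in query))
cos_with_norm = cosine_similarity(query, doc, query_norm)
cos_without_norm = cosine_similarity(query, doc)
print(l2_score, l1_score, cos_with_norm, cos_without_norm)
```

Both cosine calls return the same value. The three-argument form only saves the repeated square root over the query vector, which matters when many filtered documents are scored.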

Constraints

  1. If a document's knn_vector field has a different dimension than the query vector, the function throws an IllegalArgumentException.

  2. If a vector field doesn't have a value, the function throws an IllegalStateException.

    You can avoid this situation by first checking if a document has a value in its field:

    "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
    

    Because the inverted distance is always greater than 0, this script ranks documents that have a value in the vector field higher than those that don't.
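The effect of the guard in the script above can be sketched in Python. This is an illustration under stated assumptions, not plugin code: an empty list stands in for a document whose vector field has no value, ValueError stands in for the IllegalArgumentException raised on a dimension mismatch, and the document IDs and vectors are hypothetical.

```python
def guarded_score(query, doc_vector):
    """Mirror of: doc[field].size() == 0 ? 0 : 1 / (1 + l2Squared(...))"""
    if not doc_vector:
        # No vector value: return 0 instead of calling the distance function.
        return 0.0
    if len(doc_vector) != len(query):
        # Stand-in for the dimension-mismatch error described above.
        raise ValueError("query and document vectors differ in dimension")
    distance = sum((q - d) ** 2 for q, d in zip(query, doc_vector))
    return 1.0 / (1.0 + distance)

query = [1.0, 2.0]

# Hypothetical documents; "missing" has no value in its vector field.
docs = {
    "near": [1.0, 2.0],
    "far": [10.0, 10.0],
    "missing": [],
}

scores = {doc_id: guarded_score(query, vec) for doc_id, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Any document with a vector scores above 0, however far it is from the query, so the document without a vector always sorts last.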