kolchfa-aws f999e0a8a8
Add an overview of search methods and pages for each search method (#5636)
2023-11-29 15:28:20 -05:00


layout: default
title: k-NN search
nav_order: 20
has_children: true
has_toc: false
redirect_from: /search-plugins/knn/

k-NN search

The k-NN plugin (k-NN is short for k-nearest neighbors) enables you to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, you specify the space (the distance function) used to measure the distance between points.
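
To illustrate how the choice of space changes which neighbors are returned, here is a minimal, plugin-independent Python sketch of an exact k-NN search. The function names, the `SPACES` table, and the sample vectors are hypothetical and exist only for this illustration; they are not part of the k-NN plugin.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical lookup table of spaces for this sketch only.
SPACES = {"l2": l2_distance, "cosine": cosine_distance}


def k_nearest(index, query, k, space="l2"):
    """Return the k (id, vector) pairs closest to the query in the given space."""
    distance = SPACES[space]
    return heapq.nsmallest(k, index, key=lambda item: distance(item[1], query))


index = [
    ("a", [1.0, 0.8]),
    ("b", [0.0, 1.0]),
    ("c", [5.0, 5.0]),
    ("d", [0.9, 1.2]),
]
query = [1.0, 1.0]

l2_ids = [doc_id for doc_id, _ in k_nearest(index, query, k=2, space="l2")]
cos_ids = [doc_id for doc_id, _ in k_nearest(index, query, k=2, space="cosine")]
print(l2_ids)   # neighbors by Euclidean distance
print(cos_ids)  # neighbors by angle only
```

In this sample, vector `c` points in the same direction as the query but is far away, so it ranks first under the cosine space and last under L2.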

Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For more background information about k-NN search, see Wikipedia.

This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors:

  1. Approximate k-NN

    The first method takes an approximate nearest neighbor approach: it uses one of several algorithms to return the approximate k-nearest neighbors to a query vector. Usually, these algorithms sacrifice indexing speed and search accuracy in return for performance benefits such as lower latency, smaller memory footprints, and more scalable search. To learn more about the algorithms, refer to nmslib's and faiss's documentation.

    Approximate k-NN is the best choice for searches over large indexes (that is, hundreds of thousands of vectors or more) that require low latency. Do not use approximate k-NN if you want to apply a filter to the index before the k-NN search (a pre-filter), which greatly reduces the number of vectors to be searched. In this case, use either the script score method or Painless extensions.

    For more details about this method, including recommendations for which engine to use, see Approximate k-NN search.

  2. Script Score k-NN

    The second method extends OpenSearch's script scoring functionality to execute a brute-force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, you can run a k-NN search on a subset of the vectors in your index (sometimes referred to as a pre-filter search).

    Use this approach for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indexes may lead to high latencies.

    For more details about this method, see Exact k-NN with scoring script.

  3. Painless extensions

    The third method adds the distance functions as Painless extensions that you can use in more complex combinations. Similar to Script Score k-NN, you can use this method to perform a brute-force, exact k-NN search across an index, and it also supports pre-filtering.

    This approach has slightly slower query performance than Script Score k-NN. If your use case requires more customization of the final score, use this approach instead of Script Score k-NN.

    For more details about this method, see Painless scripting functions.
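
As a rough illustration of how the three methods differ at query time, the following Python sketch builds one search request body per method and prints each as JSON. The field names (`my_vector`, `color`), the vector values, and the filter are hypothetical. The request shapes are an assumption about the k-NN plugin's query syntax, so verify them against the pages linked above before use. The sketch sends nothing to a cluster.

```python
import json

FIELD = "my_vector"                 # hypothetical knn_vector field
QUERY_VECTOR = [2.0, 3.0, 5.0, 6.0]  # hypothetical query point

# 1. Approximate k-NN: a dedicated "knn" query clause (no pre-filter here).
approximate_query = {
    "size": 2,
    "query": {"knn": {FIELD: {"vector": QUERY_VECTOR, "k": 2}}},
}

# Hypothetical pre-filter shared by the two exact methods below.
pre_filter = {"bool": {"filter": {"term": {"color": "BLUE"}}}}

# 2. Script Score k-NN: script_score with the plugin's scoring script.
script_score_query = {
    "size": 2,
    "query": {
        "script_score": {
            "query": pre_filter,
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": FIELD,
                    "query_value": QUERY_VECTOR,
                    "space_type": "l2",
                },
            },
        }
    },
}

# 3. Painless extensions: a distance function inside a custom score expression.
painless_query = {
    "size": 2,
    "query": {
        "script_score": {
            "query": pre_filter,
            "script": {
                "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])",
                "params": {"field": FIELD, "query_value": QUERY_VECTOR},
            },
        }
    },
}

for name, body in [
    ("approximate", approximate_query),
    ("script score", script_score_query),
    ("painless", painless_query),
]:
    print(f"--- {name} ---")
    print(json.dumps(body, indent=2))
```

Only the Painless variant exposes the score as an expression, which is where the extra customization described above comes from.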

Overall, for larger data sets, you should generally choose the approximate nearest neighbor method because it scales significantly better. For smaller data sets, where you may want to apply a filter, you should choose the script score approach. If you have a more complex use case in which you need to use a distance function as part of your scoring method, you should use the Painless scripting approach.
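
To make the pre-filter case concrete, here is a minimal Python sketch of an exact search that filters documents first and then ranks only the survivors. It is a conceptual model, not the plugin's implementation; the document layout, the `color` attribute, and the `exact_knn` helper are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(docs, query, k, predicate=None):
    """Brute-force k-NN over the documents that pass the optional pre-filter.

    Returns the top-k documents and the number of vectors actually compared.
    """
    candidates = [d for d in docs if predicate is None or predicate(d)]
    ranked = sorted(candidates, key=lambda d: l2_distance(d["vector"], query))
    return ranked[:k], len(candidates)


docs = [
    {"id": 1, "color": "BLUE", "vector": [1.0, 1.0]},
    {"id": 2, "color": "RED", "vector": [1.1, 1.0]},
    {"id": 3, "color": "BLUE", "vector": [4.0, 4.0]},
    {"id": 4, "color": "RED", "vector": [0.95, 1.1]},
    {"id": 5, "color": "BLUE", "vector": [2.0, 2.0]},
]
query = [1.0, 1.1]

unfiltered, compared_all = exact_knn(docs, query, k=2)
filtered, compared_blue = exact_knn(
    docs, query, k=2, predicate=lambda d: d["color"] == "BLUE"
)

print([d["id"] for d in unfiltered], compared_all)
print([d["id"] for d in filtered], compared_blue)
```

The filtered search compares fewer vectors, which is why an exact method can remain practical when the filter is selective, and why its cost grows with the number of documents that pass the filter.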