Refactor k-NN filter search (#3613)
* Refactor k-NN filter search

  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented tech review comments

  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented doc review feedback

  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

  Co-authored-by: Nathan Bower <nbower@amazon.com>

* One more editorial review comment

  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
parent b6763b1815
commit ade705e9f5
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Approximate search
|
title: Approximate search
|
||||||
nav_order: 10
|
nav_order: 15
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
|
|
|
@ -1,126 +1,140 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Search with k-NN filters
|
title: k-NN search with filters
|
||||||
nav_order: 15
|
nav_order: 20
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
---
|
---
|
||||||
|
|
||||||
# Search with k-NN filters
|
# k-NN search with filters
|
||||||
Introduced 2.4
|
|
||||||
{: .label .label-purple }
|
|
||||||
|
|
||||||
You can create custom filters using Query domain-specific language (DSL) search options to refine your k-NN searches. You define the filter criteria within the `knn_vector` field's `filter` subsection in your query. You can use any of the OpenSearch query DSL query types as a filter. This includes the common query types: `term`, `range`, `regexp`, and `wildcard`, as well as custom query types. To include or exclude results, use Boolean query clauses. You can also specify a query point with the `knn_vector` type and search for nearest neighbors that match your filter criteria.
|
To refine k-NN results, you can filter a k-NN search using one of the following methods:
|
||||||
To run k-NN queries with a filter, the Lucene search engine and Hierarchical Navigable Small World (HNSW) method are required.
|
|
||||||
|
|
||||||
To learn more about how to use query DSL Boolean query clauses, see [Boolean queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/compound/bool). For more details about the `knn_vector` data type definition, see [k-NN Index]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/).
|
- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
|
||||||
{: .note }
|
|
||||||
|
|
||||||
## How does a k-NN filter work?
|
- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
|
||||||
|
|
||||||
The OpenSearch k-NN plugin version 2.2 introduced support for the Lucene engine in order to process k-NN searches. The Lucene engine provides a search that is based on the HNSW algorithm in order to represent a multi-layered graph. The OpenSearch k-NN plugin version 2.4 can incorporate filters for searches based on Lucene 9.4.
|
- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. You can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by the Lucene search engine in k-NN plugin versions 2.4 and later.
|
||||||
|
|
||||||
After a filter is applied to a set of documents to be searched, the algorithm decides whether to perform pre-filtering for an exact k-NN search or modified post-filtering for an approximate search. The approximate search with filtering ensures the top number of closest vectors in the results.
|
## Filtered search optimization
|
||||||
|
|
||||||
Lucene also provides the capability to operate its `KnnVectorQuery` across a subset of documents. To learn more about this capability, see the [Apache Lucene Documentation](https://issues.apache.org/jira/browse/LUCENE-10382).
|
Depending on your dataset and use case, you might be more interested in maximizing recall or minimizing latency. The following table provides guidance on various k-NN search configurations and the filtering methods used to optimize for higher recall or lower latency. The first three columns of the table provide several example k-NN search configurations. A search configuration consists of:
|
||||||
|
|
||||||
To learn more about all available k-NN search approaches, including approximate k-NN, exact k-NN with script score, and pre-filtering with painless extensions, see [k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/).
|
- The number of documents in an index, where one OpenSearch document corresponds to one k-NN vector.
|
||||||
|
- The percentage of documents left in the results after filtering. This value depends on the restrictiveness of the filter that you provide in the query. The most restrictive filter in the table returns 2.5% of documents in the index, while the least restrictive filter returns 80% of documents.
|
||||||
|
- The desired number of returned results (k).
|
||||||
|
|
||||||
### Filtered search performance
|
Once you've estimated the number of documents in your index, the restrictiveness of your filter, and the desired number of nearest neighbors, use the following table to choose a filtering method that optimizes for recall or latency.
|
||||||
|
|
||||||
Filtering that is tightly integrated with the Lucene HNSW algorithm implementation allows you to apply k-NN searches more efficiently, both in terms of relevancy of search results and performance. Consider, for example, an exact search using post-filtering on a large dataset that returns results slowly and does not ensure the required number of results specified by `k`.
|
| Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency |
|
||||||
With this new capability, you can create an approximate k-NN search, apply filters, and get the number of results that you need. To learn more about approximate searches, see [Approximate k-NN search]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/).
|
| :-- | :-- | :-- | :-- | :-- |
|
||||||
|
| 10M | 2.5 | 100 | Scoring script | Scoring script |
|
||||||
|
| 10M | 38 | 100 | Lucene filter | Boolean filter |
|
||||||
|
| 10M | 80 | 100 | Scoring script | Lucene filter |
|
||||||
|
| 1M | 2.5 | 100 | Lucene filter | Scoring script |
|
||||||
|
| 1M | 38 | 100 | Lucene filter | Lucene filter/scoring script |
|
||||||
|
| 1M | 80 | 100 | Boolean filter | Lucene filter |
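
As a rough illustration, the table above can be encoded as a lookup so that a search configuration maps directly to a suggested filtering method. This is only a sketch: the names `RECOMMENDATIONS`, `filtered_subset_size`, and `recommend` are hypothetical, and the lookup covers only the six rows listed in the table (all with `k` = 100).

```python
# Recommendations transcribed from the table above (k is 100 in every row).
# Key: (number of documents in the index, percentage of documents the filter returns)
# Value: (method for higher recall, method for lower latency)
RECOMMENDATIONS = {
    (10_000_000, 2.5): ("Scoring script", "Scoring script"),
    (10_000_000, 38): ("Lucene filter", "Boolean filter"),
    (10_000_000, 80): ("Scoring script", "Lucene filter"),
    (1_000_000, 2.5): ("Lucene filter", "Scoring script"),
    (1_000_000, 38): ("Lucene filter", "Lucene filter/scoring script"),
    (1_000_000, 80): ("Boolean filter", "Lucene filter"),
}


def filtered_subset_size(num_docs, percent_returned):
    """Estimate how many documents remain after the filter is applied."""
    return round(num_docs * percent_returned / 100)


def recommend(num_docs, percent_returned, optimize_for):
    """Return the table's suggestion; optimize_for is 'recall' or 'latency'."""
    higher_recall, lower_latency = RECOMMENDATIONS[(num_docs, percent_returned)]
    return higher_recall if optimize_for == "recall" else lower_latency


print(filtered_subset_size(10_000_000, 2.5))
print(recommend(1_000_000, 80, "recall"))
```

Configurations that fall between the listed rows are not covered by the table, so the lookup deliberately raises a `KeyError` for them instead of guessing.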
|
||||||
|
|
||||||
The HNSW algorithm decides which type of filtering to apply to a search based on the volume of documents and number of `k` points in the index that you search with a filter.
|
## Scoring script filter
|
||||||
|
|
||||||
![How the algorithm evaluates a doc set]({{site.url}}{{site.baseurl}}/images/hsnw-algorithm.png)
|
A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`:
|
||||||
|
|
||||||
| Variable | Description |
|
|
||||||
-- | -- | -- |
|
|
||||||
N | The number of documents in the index.
|
|
||||||
P | The number of documents in the search set after the filter is applied using the formula P <= N.
|
|
||||||
q | The search vector.
|
|
||||||
k | The maximum number of vectors to return in the response.
|
|
||||||
|
|
||||||
To learn more about k-NN performance tuning, see [Performance tuning]({{site.url}}{{site.baseurl}}/search-plugins/knn/performance-tuning/).
|
|
||||||
|
|
||||||
## Filter approaches by use case
|
|
||||||
|
|
||||||
Depending on the dataset that you are searching, you might choose a different approach to minimize recall or latency. You can create filters that are:
|
|
||||||
|
|
||||||
* Very restrictive: Returns the lowest number of documents (for example, 2.5%).
|
|
||||||
* Somewhat restrictive: Returns some documents (for example, 38%).
|
|
||||||
* Not very restrictive: Returns the highest number of documents (for example, 80%).
|
|
||||||
|
|
||||||
The restrictive percentage indicates the number of documents the filter returns for any given document set in an index.
|
|
||||||
|
|
||||||
Number of Vectors | Filter Restrictive Percentage | k | Recall | Latency
|
|
||||||
-- | -- | -- | -- | --
|
|
||||||
10M | 2.5 | 100 | Scoring script | Scoring script
|
|
||||||
10M | 38 | 100 | Lucene filter | Boolean filter
|
|
||||||
10M | 80 | 100 | Scoring script | Lucene filter
|
|
||||||
1M | 2.5 | 100 | Lucene filter | Scoring script
|
|
||||||
1M | 38 | 100 | Lucene filter | lucene_filtering / Scoring script
|
|
||||||
1M | 80 | 100 | Boolean filter | lucene_filtering
|
|
||||||
|
|
||||||
In this context, *Scoring script* is essentially a brute force search, whereas a Boolean filter is an approximate k-NN search with post-filtering.
|
|
||||||
|
|
||||||
To learn more about the dynamic searches you can perform with the score script plugin, see [Exact k-NN with scoring script]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-score-script/).
|
|
||||||
|
|
||||||
### Boolean filter with approximate k-NN search
|
|
||||||
|
|
||||||
In a Boolean query that uses post-filtering, you can join a k-NN query with a filter using a `bool` `must` query clause.
|
|
||||||
|
|
||||||
#### Example request
|
|
||||||
|
|
||||||
The following k-NN query uses a Boolean query clause to filter results:
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST /hotels-index/_search
|
POST /hotels-index/_search
|
||||||
{
|
{
|
||||||
"size": 3,
|
"size": 3,
|
||||||
"query": {
|
"query": {
|
||||||
|
"script_score": {
|
||||||
|
"query": {
|
||||||
"bool": {
|
"bool": {
|
||||||
"filter": {
|
"filter": {
|
||||||
"bool": {
|
"bool": {
|
||||||
"must": [
|
"must": [
|
||||||
{
|
|
||||||
"range": {
|
|
||||||
"rating": {
|
|
||||||
"gte": 8,
|
|
||||||
"lte": 10
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"term": {
|
|
||||||
"parking": "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"must": [
|
|
||||||
{
|
{
|
||||||
"knn": {
|
"range": {
|
||||||
"location": {
|
"rating": {
|
||||||
"vector": [
|
"gte": 8,
|
||||||
5.0,
|
"lte": 10
|
||||||
4.0
|
|
||||||
],
|
|
||||||
"k": 20
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"term": {
|
||||||
|
"parking": "true"
|
||||||
|
}
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
},
|
||||||
|
"script": {
|
||||||
|
"source": "knn_score",
|
||||||
|
"lang": "knn",
|
||||||
|
"params": {
|
||||||
|
"field": "location",
|
||||||
|
"query_value": [
|
||||||
|
5.0,
|
||||||
|
4.0
|
||||||
|
],
|
||||||
|
"space_type": "l2"
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
#### Example response
|
{% include copy-curl.html %}
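
To make the order of operations concrete, the following Python sketch mimics what a scoring script filter does conceptually: it filters first and then runs a brute-force exact search on the remaining documents. It does not call OpenSearch. The `HOTELS` list is hypothetical sample data, not the dataset used on this page, and the function names are illustrative only.

```python
import math

# Hypothetical sample documents (not the dataset from this page).
HOTELS = [
    {"id": "a", "location": [5.2, 4.4], "parking": "true", "rating": 9},
    {"id": "b", "location": [4.0, 4.0], "parking": "false", "rating": 10},
    {"id": "c", "location": [1.0, 1.0], "parking": "true", "rating": 8},
    {"id": "d", "location": [5.0, 4.1], "parking": "true", "rating": 4},
    {"id": "e", "location": [6.0, 5.0], "parking": "true", "rating": 10},
]


def matches_filter(doc):
    """Rating between 8 and 10, inclusive, and parking available."""
    return 8 <= doc["rating"] <= 10 and doc["parking"] == "true"


def exact_knn_with_prefilter(docs, query, k):
    """Pre-filter the documents, then rank every survivor by L2 distance."""
    candidates = [doc for doc in docs if matches_filter(doc)]
    candidates.sort(key=lambda doc: math.dist(doc["location"], query))
    return candidates[:k]


nearest = exact_knn_with_prefilter(HOTELS, [5.0, 4.0], k=3)
print([doc["id"] for doc in nearest])
```

Because every filtered document is scored, the cost grows with the size of the filtered subset, which is why this approach does not scale for large filtered subsets.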
|
||||||
|
|
||||||
The Boolean query filter returns the following results in the response:
|
## Boolean filter with ANN search
|
||||||
|
|
||||||
|
A Boolean filter consists of a Boolean query that contains a k-NN query and a filter. For example, the following query searches for hotels that are closest to the specified `location` and then filters the results to return hotels with a rating between 8 and 10, inclusive, that provide parking:
|
||||||
|
|
||||||
|
```json
|
||||||
|
POST /hotels-index/_search
|
||||||
|
{
|
||||||
|
"size": 3,
|
||||||
|
"query": {
|
||||||
|
"bool": {
|
||||||
|
"filter": {
|
||||||
|
"bool": {
|
||||||
|
"must": [
|
||||||
|
{
|
||||||
|
"range": {
|
||||||
|
"rating": {
|
||||||
|
"gte": 8,
|
||||||
|
"lte": 10
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"term": {
|
||||||
|
"parking": "true"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"must": [
|
||||||
|
{
|
||||||
|
"knn": {
|
||||||
|
"location": {
|
||||||
|
"vector": [
|
||||||
|
5,
|
||||||
|
4
|
||||||
|
],
|
||||||
|
"k": 20
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The response includes documents containing the matching hotels:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
@ -183,167 +197,73 @@ The Boolean query filter returns the following results in the response:
|
||||||
}
|
}
|
||||||
```
|
```
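
The following Python sketch illustrates why post-filtering can return fewer results than requested. It is a simplified model, not the plugin's implementation: an exact nearest neighbor ranking stands in for the ANN search, the `HOTELS` list is hypothetical sample data, and `post_filtered_search` is an illustrative name.

```python
import math

# Hypothetical sample documents (not the dataset from this page).
HOTELS = [
    {"id": "a", "location": [5.2, 4.4], "parking": "true", "rating": 9},
    {"id": "b", "location": [4.0, 4.0], "parking": "false", "rating": 10},
    {"id": "c", "location": [1.0, 1.0], "parking": "true", "rating": 8},
    {"id": "d", "location": [5.0, 4.1], "parking": "true", "rating": 4},
    {"id": "e", "location": [6.0, 5.0], "parking": "true", "rating": 10},
]


def post_filtered_search(docs, query, k, size):
    """Take the k nearest documents first, then apply the filter to them."""
    # Assumption: an exact ranking stands in for the approximate search.
    nearest = sorted(docs, key=lambda doc: math.dist(doc["location"], query))[:k]
    kept = [
        doc for doc in nearest
        if 8 <= doc["rating"] <= 10 and doc["parking"] == "true"
    ]
    return kept[:size]


few = post_filtered_search(HOTELS, [5.0, 4.0], k=2, size=3)
enough = post_filtered_search(HOTELS, [5.0, 4.0], k=5, size=3)
print([doc["id"] for doc in few], [doc["id"] for doc in enough])
```

With `k=2`, only one of the two nearest documents passes the filter, so a single result is returned even though three were requested. Raising `k` well above `size`, as the request above does with `"k": 20` and `"size": 3`, reduces the chance of this happening.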
|
||||||
|
|
||||||
### Use case 1: Very restrictive 2.5% filter
|
## Lucene k-NN filter implementation
|
||||||
|
|
||||||
A very restrictive filter returns the lowest number of documents in your dataset. For example, the following filter criteria specifies hotels with feedback ratings less than or equal to 3. This 2.5% filter only returns 1 document:
|
k-NN plugin version 2.2 introduced support for running k-NN searches with the Lucene engine using HNSW graphs. Starting with version 2.4, which is based on Lucene version 9.4, you can use Lucene filters for k-NN searches.
|
||||||
|
|
||||||
```json
|
When you specify a Lucene filter for a k-NN search, the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering. The algorithm uses the following variables:
|
||||||
"filter": {
|
|
||||||
"bool": {
|
|
||||||
"must": [
|
|
||||||
{
|
|
||||||
"range": {
|
|
||||||
"rating": {
|
|
||||||
"lte": 3
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Use case 2: Somewhat restrictive 38% filter
|
- N: The number of documents in the index.
|
||||||
|
- P: The number of documents in the document subset after the filter is applied (P <= N).
|
||||||
|
- k: The maximum number of vectors to return in the response.
|
||||||
|
|
||||||
A somewhat restrictive filter returns 38% of the documents in the data set that you search. For example, the following filter criteria specifies hotels with parking and feedback ratings less than or equal to 8 and returns 5 documents:
|
The following flow chart outlines the Lucene algorithm.
|
||||||
|
|
||||||
```json
|
![Lucene algorithm for filtering]({{site.url}}{{site.baseurl}}/images/lucene-algorithm.png)
|
||||||
"filter": {
|
|
||||||
"bool": {
|
|
||||||
"must": [
|
|
||||||
{
|
|
||||||
"range": {
|
|
||||||
"rating": {
|
|
||||||
"lte": 8
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"term": {
|
|
||||||
"parking": "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Use case 3: Not very restrictive 80% filter
|
For more information about the Lucene filtering implementation and the underlying `KnnVectorQuery`, see the [Apache Lucene documentation](https://issues.apache.org/jira/browse/LUCENE-10382).
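
The decision described in this section can be sketched in Python as follows. The specific thresholds are assumptions that this page does not state: the sketch assumes that an exact search is used when `P <= k` and that an approximate search falls back to an exact search if it visits more than `P` nodes before completing. The function and parameter names are hypothetical.

```python
def choose_search_strategy(p, k, approximate_search):
    """Return which search path is taken for a filtered k-NN query.

    p: number of documents that match the filter.
    k: maximum number of vectors to return.
    approximate_search: callable taking visit_limit and returning
        (results, completed), where completed is False if the limit was hit.
    """
    # Assumption: with k or fewer candidates, scoring all of them is cheapest.
    if p <= k:
        return "exact"
    # Assumption: the graph traversal is capped at p visited nodes.
    _results, completed = approximate_search(visit_limit=p)
    return "approximate" if completed else "exact"


print(choose_search_strategy(3, 10, lambda visit_limit: ([], True)))
print(choose_search_strategy(100, 10, lambda visit_limit: ([1], True)))
print(choose_search_strategy(100, 10, lambda visit_limit: ([], False)))
```

Under these assumptions, a very restrictive filter tends toward the exact path, while a permissive filter tends toward the approximate path.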
|
||||||
|
|
||||||
A filter that is not very restrictive will return 80% of the documents that you search. For example, the following filter criteria specifies hotels with feedback ratings greater than or equal to 5 and returns 10 documents:
|
## Using a Lucene k-NN filter
|
||||||
|
|
||||||
```json
|
Consider a dataset that includes 12 documents containing hotel information. The following image shows all hotels on an xy coordinate plane by location. Additionally, the points for hotels that have a rating between 8 and 10, inclusive, are depicted with orange dots, and hotels that provide parking are depicted with green circles. The search point is colored in red:
|
||||||
"filter": {
|
|
||||||
"bool": {
|
|
||||||
"must": [
|
|
||||||
{
|
|
||||||
"range": {
|
|
||||||
"rating": {
|
|
||||||
"gte": 5
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Overview: How to use filters in a k-NN search
|
![Graph of documents with filter criteria]({{site.url}}{{site.baseurl}}/images/knn-doc-set-for-filtering.png)
|
||||||
|
|
||||||
You can search with a filter by following these three steps:
|
In this example, you will create an index and search for the three hotels with high ratings and parking that are the closest to the search location.
|
||||||
1. Create an index and specify the requirements for the Lucene engine and HNSW requirements in the mapping.
|
|
||||||
1. Add your data to the index.
|
|
||||||
1. Search the index and specify these three items in your query:
|
|
||||||
* One or more filters defined by query DSL
|
|
||||||
* A vector reference point defined by the `vector` field
|
|
||||||
* The number of matches you want returned with the `k` field
|
|
||||||
|
|
||||||
You can use a range query to specify hotel feedback ratings and a term query to require that parking is available. The criteria is processed with Boolean clauses to indicate whether or not the document contains the criteria.
|
### Step 1: Create a new index
|
||||||
|
|
||||||
Consider a dataset that contains 12 documents, a search reference point, and documents that meet two filter criteria.
|
Before you can run a k-NN search with a filter, you need to create an index with a `knn_vector` field. For this field, you need to specify `lucene` as the engine and `hnsw` as the `method` in the mapping.
|
||||||
|
|
||||||
![Graph of documents with filter criteria]({{site.url}}{{site.baseurl}}/images/knn-two-filters.png)
|
The following request creates a new index called `hotels-index` with a `knn_vector` field called `location`:
|
||||||
|
|
||||||
## Step 1: Create a new index with a Lucene mapping
|
|
||||||
|
|
||||||
Before you can run a k-NN search with a filter, you need to create an index, specify the Lucene engine in a mapping, and add data to the index.
|
|
||||||
|
|
||||||
You need to add a `location` field to represent the location and specify it as the `knn_vector` type. The most basic vector can be two-dimensional. For example:
|
|
||||||
|
|
||||||
```
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 2,
|
|
||||||
```
|
|
||||||
|
|
||||||
### Requirement: Lucene engine with HNSW method
|
|
||||||
|
|
||||||
Make sure to specify "hnsw" method and "lucene" engine in the `knn_vector` field description, as follows:
|
|
||||||
|
|
||||||
```json
|
|
||||||
"my_field": {
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 2,
|
|
||||||
"method": {
|
|
||||||
"name": "hnsw",
|
|
||||||
"space_type": "l2",
|
|
||||||
"engine": "lucene"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Example request
|
|
||||||
|
|
||||||
The following request creates a new index called "hotels-index":
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
PUT /hotels-index
|
PUT /hotels-index
|
||||||
{
|
{
|
||||||
"settings": {
|
"settings": {
|
||||||
"index": {
|
"index": {
|
||||||
"knn": true,
|
"knn": true,
|
||||||
"knn.algo_param.ef_search": 100,
|
"knn.algo_param.ef_search": 100,
|
||||||
"number_of_shards": 1,
|
"number_of_shards": 1,
|
||||||
"number_of_replicas": 0
|
"number_of_replicas": 0
|
||||||
}
|
|
||||||
},
|
|
||||||
"mappings": {
|
|
||||||
"properties": {
|
|
||||||
"location": {
|
|
||||||
"type": "knn_vector",
|
|
||||||
"dimension": 2,
|
|
||||||
"method": {
|
|
||||||
"name": "hnsw",
|
|
||||||
"space_type": "l2",
|
|
||||||
"engine": "lucene",
|
|
||||||
"parameters": {
|
|
||||||
"ef_construction": 100,
|
|
||||||
"m": 16
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
},
|
||||||
|
"mappings": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "knn_vector",
|
||||||
|
"dimension": 2,
|
||||||
|
"method": {
|
||||||
|
"name": "hnsw",
|
||||||
|
"space_type": "l2",
|
||||||
|
"engine": "lucene",
|
||||||
|
"parameters": {
|
||||||
|
"ef_construction": 100,
|
||||||
|
"m": 16
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
#### Example response
|
{% include copy-curl.html %}
|
||||||
|
|
||||||
Upon success, you should receive a "200-OK" status with the following response:
|
### Step 2: Add data to your index
|
||||||
|
|
||||||
```json
|
Next, add data to your index.
|
||||||
{
|
|
||||||
"acknowledged" : true,
|
|
||||||
"shards_acknowledged" : true,
|
|
||||||
"index" : "hotels-index"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Step 2: Add data to your index
|
The following request adds 12 documents that contain hotel location, rating, and parking information:
|
||||||
|
|
||||||
Next, add data to your index with a PUT HTTP request. Make sure that the search criteria is defined in the body of the request.
|
|
||||||
|
|
||||||
#### Example request
|
|
||||||
|
|
||||||
The following request adds 12 hotel documents that contain criteria such as feedback ratings and whether or not parking is available:
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST /_bulk
|
POST /_bulk
|
||||||
|
@ -372,90 +292,53 @@ POST /_bulk
|
||||||
{ "index": { "_index": "hotels-index", "_id": "12" } }
|
{ "index": { "_index": "hotels-index", "_id": "12" } }
|
||||||
{ "location": [5.0, 1.0], "parking" : "true", "rating" : 3 }
|
{ "location": [5.0, 1.0], "parking" : "true", "rating" : 3 }
|
||||||
```
|
```
|
||||||
|
{% include copy-curl.html %}
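
If you generate the bulk request programmatically, the body is newline-delimited JSON: one action line followed by one document line per hotel, ending with a newline. The following Python sketch builds such a body with the standard library only. The helper name `build_bulk_body` is hypothetical, and the sketch only constructs the string; sending it to the cluster is left to your HTTP client.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: which index and document ID to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "hotels-index",
    [("12", {"location": [5.0, 1.0], "parking": "true", "rating": 3})],
)
print(body, end="")
```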
|
||||||
|
|
||||||
#### Example response
|
### Step 3: Search your data with a filter
|
||||||
|
|
||||||
Upon success, you should receive a "200-OK" status with entries for each document ID added to the index. The following response is truncated to only show one document:
|
Now you can create a k-NN search with filters. In the k-NN query clause, include the point of interest that is used to search for nearest neighbors, the number of nearest neighbors to return (`k`), and a filter with the restriction criteria. Depending on how restrictive you want your filter to be, you can add multiple query clauses to a single request.
|
||||||
|
|
||||||
```json
|
The following request creates a k-NN query that searches for the top three hotels near the location with the coordinates `[5, 4]` that are rated between 8 and 10, inclusive, and provide parking:
|
||||||
{
|
|
||||||
"took" : 140,
|
|
||||||
"errors" : false,
|
|
||||||
"items" : [
|
|
||||||
{
|
|
||||||
"index" : {
|
|
||||||
"_index" : "hotels-index",
|
|
||||||
"_id" : "1",
|
|
||||||
"_version" : 2,
|
|
||||||
"result" : "updated",
|
|
||||||
"_shards" : {
|
|
||||||
"total" : 1,
|
|
||||||
"successful" : 1,
|
|
||||||
"failed" : 0
|
|
||||||
},
|
|
||||||
"_seq_no" : 12,
|
|
||||||
"_primary_term" : 3,
|
|
||||||
"status" : 200
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
## Step 3: Search your data with a filter
|
|
||||||
|
|
||||||
Now you can create a k-NN search that specifies filters by using query DSL Boolean clauses. You need to include your reference point to search for nearest neighbors. Provide an x-y coordinate for the point within the `vector` field, such as `"vector": [ 5.0, 4.0]`.
|
|
||||||
|
|
||||||
To learn more about how to specify ranges with query DSL, see [Range query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/term/#range).
|
|
||||||
{: .note }
|
|
||||||
|
|
||||||
#### Example request
|
|
||||||
|
|
||||||
The following request creates a k-NN query that only returns the top hotels rated between 8 and 10 and that provide parking. The filter criteria to indicate the range for the feedback ratings uses a `range` query and a `term` query clause to indicate "parking":
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST /hotels-index/_search
|
POST /hotels-index/_search
|
||||||
{
|
{
|
||||||
"size": 3,
|
"size": 3,
|
||||||
"query": {
|
"query": {
|
||||||
"knn": {
|
"knn": {
|
||||||
"location": {
|
"location": {
|
||||||
"vector": [
|
"vector": [
|
||||||
5.0,
|
5,
|
||||||
4.0
|
4
|
||||||
],
|
],
|
||||||
"k": 3,
|
"k": 3,
|
||||||
"filter": {
|
"filter": {
|
||||||
"bool": {
|
"bool": {
|
||||||
"must": [
|
"must": [
|
||||||
{
|
{
|
||||||
"range": {
|
"range": {
|
||||||
"rating": {
|
"rating": {
|
||||||
"gte": 8,
|
"gte": 8,
|
||||||
"lte": 10
|
"lte": 10
|
||||||
}
|
}
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"term": {
|
|
||||||
"parking": "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
},
|
||||||
|
{
|
||||||
|
"term": {
|
||||||
|
"parking": "true"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
{% include copy-curl.html %}
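
When the filter criteria vary between requests, it can help to assemble the query body in code. The following Python sketch builds the same structure as the preceding request. The helper name `knn_filter_query` is hypothetical, and the sketch only produces the JSON body without sending it.

```python
import json


def knn_filter_query(field, vector, k, filter_clauses, size=None):
    """Build a k-NN query body whose filter is a bool/must list of clauses."""
    body = {
        "query": {
            "knn": {
                field: {
                    "vector": vector,
                    "k": k,
                    "filter": {"bool": {"must": filter_clauses}},
                }
            }
        }
    }
    if size is not None:
        body["size"] = size
    return body


query = knn_filter_query(
    "location",
    [5, 4],
    3,
    [
        {"range": {"rating": {"gte": 8, "lte": 10}}},
        {"term": {"parking": "true"}},
    ],
    size=3,
)
print(json.dumps(query, indent=2))
```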
|
||||||
|
|
||||||
|
The response returns the three hotels that are nearest to the search point and have met the filter criteria:
|
||||||
#### Sample Response
|
|
||||||
|
|
||||||
The following response indicates that only three hotels met the filter criteria:
|
|
||||||
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
|
@ -516,134 +399,68 @@ The following response indicates that only three hotels met the filter criteria:
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Additional complex filter query
|
Note that there are multiple ways to construct a filter that returns hotels that provide parking, for example:
|
||||||
|
|
||||||
Depending on how restrictive you want your filter to be, you can add multiple query types to a single request, such as `term`, `wildcard`, `regexp`, or `range`. You can then filter out the search results with the Boolean clauses `must`, `should`, and `must_not`.
|
- A `term` query clause in the `should` clause
|
||||||
|
- A `wildcard` query clause in the `should` clause
|
||||||
|
- A `regexp` query clause in the `should` clause
|
||||||
|
- A `must_not` clause to eliminate hotels with `parking` set to `false`
|
||||||
|
|
||||||
#### Example request
|
The following request illustrates these four different ways of searching for hotels with parking:
|
||||||
|
|
||||||
The following request returns hotels that provide parking. This request illustrates multiple alternative mechanisms to obtain the parking filter criteria. It uses a regular expression for the value `true`, a term query for the key-value pair `"parking":"true"`, a wildcard for the characters that spell "true", and the `must_not` clause to eliminate hotels with "parking" set to `false`:
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST /hotels-index/_search
|
POST /hotels-index/_search
|
||||||
{
|
{
|
||||||
"size": 3,
|
"size": 3,
|
||||||
"query": {
|
"query": {
|
||||||
"knn": {
|
"knn": {
|
||||||
"location": {
|
"location": {
|
||||||
"vector": [
|
"vector": [ 5.0, 4.0 ],
|
||||||
5.0,
|
"k": 3,
|
||||||
4.0
|
"filter": {
|
||||||
],
|
"bool": {
|
||||||
"k": 3,
|
"must": {
|
||||||
"filter": {
|
"range": {
|
||||||
"bool": {
|
"rating": {
|
||||||
"must": {
|
"gte": 1,
|
||||||
"range": {
|
"lte": 6
|
||||||
"rating": {
|
|
||||||
"gte": 1,
|
|
||||||
"lte": 6
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"should": [
|
|
||||||
{
|
|
||||||
"term": {
|
|
||||||
"parking": "true"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"wildcard": {
|
|
||||||
"parking": {
|
|
||||||
"value": "t*e"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"regexp": {
|
|
||||||
"parking": "[a-zA-Z]rue"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"must_not": [
|
|
||||||
{
|
|
||||||
"term": {
|
|
||||||
"parking": "false"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"minimum_should_match": 1
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"should": [
|
||||||
|
{
|
||||||
|
"term": {
|
||||||
|
"parking": "true"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"wildcard": {
|
||||||
|
"parking": {
|
||||||
|
"value": "t*e"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"regexp": {
|
||||||
|
"parking": "[a-zA-Z]rue"
|
||||||
|
}
|
||||||
}
|
}
|
||||||
}
|
],
|
||||||
}
|
"must_not": [
|
||||||
}
|
{
|
||||||
```
|
"term": {
|
||||||
#### Example response
|
"parking": "false"
|
||||||
|
}
|
||||||
The following response indicates a few results for the search with filters:
|
}
|
||||||
|
],
|
||||||
```json
|
"minimum_should_match": 1
|
||||||
{
|
}
|
||||||
"took" : 94,
|
|
||||||
"timed_out" : false,
|
|
||||||
"_shards" : {
|
|
||||||
"total" : 1,
|
|
||||||
"successful" : 1,
|
|
||||||
"skipped" : 0,
|
|
||||||
"failed" : 0
|
|
||||||
},
|
|
||||||
"hits" : {
|
|
||||||
"total" : {
|
|
||||||
"value" : 3,
|
|
||||||
"relation" : "eq"
|
|
||||||
},
|
|
||||||
"max_score" : 0.8333333,
|
|
||||||
"hits" : [
|
|
||||||
{
|
|
||||||
"_index" : "hotels-index",
|
|
||||||
"_id" : "1",
|
|
||||||
"_score" : 0.8333333,
|
|
||||||
"_source" : {
|
|
||||||
"location" : [
|
|
||||||
5.2,
|
|
||||||
4.4
|
|
||||||
],
|
|
||||||
"parking" : "true",
|
|
||||||
"rating" : 5
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"_index" : "hotels-index",
|
|
||||||
"_id" : "7",
|
|
||||||
"_score" : 0.154321,
|
|
||||||
"_source" : {
|
|
||||||
"location" : [
|
|
||||||
4.2,
|
|
||||||
6.2
|
|
||||||
],
|
|
||||||
"parking" : "true",
|
|
||||||
"rating" : 5
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"_index" : "hotels-index",
|
|
||||||
"_id" : "12",
|
|
||||||
"_score" : 0.1,
|
|
||||||
"_source" : {
|
|
||||||
"location" : [
|
|
||||||
5.0,
|
|
||||||
1.0
|
|
||||||
],
|
|
||||||
"parking" : "true",
|
|
||||||
"rating" : 3
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
]
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
{% include copy-curl.html %}
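
The `_score` values in the example response shown in this diff (`0.8333333`, `0.154321`, and `0.1`) are consistent with a score of 1 / (1 + d²), where d is the L2 distance between the query vector `[5.0, 4.0]` and the document's `location`. The following Python sketch checks that arithmetic. Treat the formula as an inference from the example values rather than a statement of the engine's implementation; the function name `l2_score` is hypothetical.

```python
def l2_score(query, vector):
    """Score a vector as 1 / (1 + squared L2 distance to the query)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1 / (1 + squared_distance)


query_point = [5.0, 4.0]
for doc_id, location in [("1", [5.2, 4.4]), ("7", [4.2, 6.2]), ("12", [5.0, 1.0])]:
    print(doc_id, l2_score(query_point, location))
```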
|
|
@ -1,7 +1,7 @@
|
||||||
---
|
---
|
||||||
layout: default
|
layout: default
|
||||||
title: Exact k-NN with scoring script
|
title: Exact k-NN with scoring script
|
||||||
nav_order: 20
|
nav_order: 10
|
||||||
parent: k-NN
|
parent: k-NN
|
||||||
has_children: false
|
has_children: false
|
||||||
has_math: true
|
has_math: true
|
||||||
|
|
Binary file not shown.
Before Width: | Height: | Size: 44 KiB |
Binary file not shown.
After Width: | Height: | Size: 66 KiB |
Binary file not shown.
Before Width: | Height: | Size: 65 KiB |
Binary file not shown.
After Width: | Height: | Size: 50 KiB |