Adds Lucene version variable, minor improvements to REST API and query DSL

This commit is contained in:
aetter 2021-09-01 11:37:54 -07:00
parent f29b04bdb9
commit ecea83707d
13 changed files with 92 additions and 55 deletions

View File

@ -7,6 +7,7 @@ permalink: /:path/
opensearch_version: 1.0.1
opensearch_major_minor_version: 1.0
lucene_version: 8_8_2
# Build settings
markdown: kramdown

View File

@ -88,8 +88,8 @@ GET opensearch_dashboards_sample_data_ecommerce/_search
}
```
The cardinality count is approximate.
If you had tens of thousands of products in your store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well because it requires more memory and causes high latency.
Cardinality count is approximate.
If you have tens of thousands of products in your hypothetical store, an accurate cardinality calculation requires loading all the values into a hash set and returning its size. This approach doesn't scale well; it requires huge amounts of memory and can cause high latencies.
You can control the trade-off between memory and accuracy with the `precision_threshold` setting. This setting defines the threshold below which counts are expected to be close to accurate. Above this value, counts might become a bit less accurate. The default value of `precision_threshold` is 3,000. The maximum supported value is 40,000.

View File

@ -21,7 +21,7 @@ This page lists all full-text query types and common options. Given the sheer nu
## Match
Creates a [boolean query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
Creates a [boolean query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/BooleanQuery.html) that returns results if the search term is present in the field.
The most basic form of the query provides only a field (`title`) and a term (`wind`):
@ -126,7 +126,7 @@ GET _search
## Match boolean prefix
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match](#match), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -164,7 +164,7 @@ GET _search
## Match phrase
Creates a [phrase query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
Creates a [phrase query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) that matches a sequence of terms.
```json
GET _search
@ -198,7 +198,7 @@ GET _search
## Match phrase prefix
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
Similar to [match phrase](#match-phrase), but creates a [prefix query](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PrefixQuery.html) out of the last term in the query string.
```json
GET _search
@ -410,7 +410,7 @@ Option | Valid values | Description
`allow_leading_wildcard` | Boolean | Whether `*` and `?` are allowed as the first character of a search term. The default is true.
`analyze_wildcard` | Boolean | Whether OpenSearch should attempt to analyze wildcard terms. Some analyzers do a poor job at this task, so the default is false.
`analyzer` | `standard, simple, whitespace, stop, keyword, pattern, <language>, fingerprint` | The analyzer you want to use for the query. Different analyzers have different character filters, tokenizers, and token filters. The `stop` analyzer, for example, removes stop words (e.g. "an," "but," "this") from the query string.
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`auto_generate_synonyms_phrase_query` | Boolean | A value of true (default) automatically generates [phrase queries](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html) for multi-term synonyms. For example, if you have the synonym `"ba, batting average"` and search for "ba," OpenSearch searches for `ba OR "batting average"` (if this option is true) or `ba OR (batting AND average)` (if this option is false).
`boost` | Floating-point | Boosts the clause by the given multiplier. Useful for weighing clauses in compound queries. The default is 1.0.
`cutoff_frequency` | Between `0.0` and `1.0` or a positive integer | This value lets you define high and low frequency terms based on number of occurrences in the index. Numbers between 0 and 1 are treated as a percentage. For example, 0.10 is 10%. This value means that if a word occurs within the search field in more than 10% of the documents on the shard, OpenSearch considers the word "high frequency" and deemphasizes it when calculating search score.<br /><br />Because this setting is *per shard*, testing its impact on search results can be challenging unless a cluster has many documents.
`enable_position_increments` | Boolean | When true, result queries are aware of position increments. This setting is useful when the removal of stop words leaves an unwanted "gap" between terms. The default is true.
@ -420,7 +420,7 @@ Option | Valid values | Description
`fuzzy_transpositions` | Boolean | Setting `fuzzy_transpositions` to true (default) adds swaps of adjacent characters to the insert, delete, and substitute operations of the `fuzziness` option. For example, the distance between `wind` and `wnid` is 1 if `fuzzy_transpositions` is true (swap "n" and "i") and 2 if it is false (delete "n", insert "n"). <br /><br />If `fuzzy_transpositions` is false, `rewind` and `wnid` have the same distance (2) from `wind`, despite the more human-centric opinion that `wnid` is an obvious typo. The default is a good choice for most use cases.
`lenient` | Boolean | Setting `lenient` to true lets you ignore data type mismatches between the query and the document field. For example, a query string of "8.2" could match a field of type `float`. The default is false.
`low_freq_operator` | `and, or` | The operator for low-frequency terms. The default is `or`. See [Common terms](#common-terms) queries and `operator` in this table.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_determinized_states` | Positive integer | The maximum number of "[states](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/util/automaton/Operations.html#DEFAULT_MAX_DETERMINIZED_STATES)" (a measure of complexity) that Lucene can create for query strings that contain regular expressions (e.g. `"query": "/wind.+?/"`). Larger numbers allow for queries that use more memory. The default is 10,000.
`max_expansions` | Positive integer | Fuzzy queries "expand to" a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms against its indices. `max_expansions` specifies the maximum number of terms that the fuzzy query expands to. The default is 50.
`minimum_should_match` | Positive or negative integer, positive or negative percentage, combination | If the query string contains multiple search terms and you used the `or` operator, the number of terms that need to match for the document to be considered a match. For example, if `minimum_should_match` is 2, "wind often rising" does not match "The Wind Rises." If `minimum_should_match` is 1, it matches. This option also has `low_freq` and `high_freq` properties for [Common terms](#common-terms) queries.
`operator` | `or, and` | If the query string contains multiple search terms, whether all terms need to match (`and`) or only one term needs to match (`or`) for a document to be considered a match.
@ -428,7 +428,7 @@ Option | Valid values | Description
`prefix_length` | `0` (default) or a positive integer | The number of leading characters that are not considered in fuzziness.
`quote_field_suffix` | String | This option lets you search different fields depending on whether terms are wrapped in quotes. For example, if `quote_field_suffix` is `".exact"` and you search for `"lightly"` (in quotes) in the `title` field, OpenSearch searches the `title.exact` field. This second field might use a different type (e.g. `keyword` rather than `text`) or a different analyzer. The default is null.
`rewrite` | `constant_score, scoring_boolean, constant_score_boolean, top_terms_N, top_terms_boost_N, top_terms_blended_freqs_N` | Determines how OpenSearch rewrites and scores multi-term queries. The default is `constant_score`.
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`slop` | `0` (default) or a positive integer | Controls the degree to which words in a query can be misordered and still be considered a match. From the [Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/PhraseQuery.html#getSlop--): "The number of other words permitted between words in query phrase. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. A value of zero requires an exact match."
`tie_breaker` | `0.0` (default) to `1.0` | Changes the way OpenSearch scores searches. For example, a `type` of `best_fields` typically uses the highest score from any one field. If you specify a `tie_breaker` value between 0.0 and 1.0, the score changes to highest score + `tie_breaker` * score for all other matching fields. If you specify a value of 1.0, OpenSearch adds together the scores for all matching fields (effectively defeating the purpose of `best_fields`).
`time_zone` | UTC offset | The time zone to use (e.g. `-08:00`) if the query string contains a date range (e.g. `"query": "wind rises release_date[2012-01-01 TO 2014-01-01]"`). The default is `UTC`.
`type` | `best_fields, most_fields, cross-fields, phrase, phrase_prefix` | Determines how OpenSearch executes the query and scores the results. The default is `best_fields`.

View File

@ -430,7 +430,7 @@ Wildcard queries tend to be slow because they need to iterate over a lot of term
## Regex
Use the `regex` query to search for terms that match a regular expression.
Use the `regexp` query to search for terms that match a regular expression.
This regular expression matches any single uppercase or lowercase letter:
@ -439,12 +439,14 @@ GET shakespeare/_search
{
"query": {
"regexp": {
"play_name": "H[a-zA-Z]+mlet"
"play_name": "[a-zA-Z]amlet"
}
}
}
```
Regular expressions are applied to the terms in the field and not the entire value of the field.
A few important notes:
The efficiency of your regular expression depends a lot on the patterns you write. Make sure that you write `regex` queries with either a prefix or suffix to improve performance.
- Regular expressions are applied to the terms in the field (i.e. tokens), not the entire field.
- Regular expressions use the Lucene syntax, which differs from more standardized implementations. Test thoroughly to ensure that you receive the results you expect. To learn more, see [the Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/index.html).
- `regexp` queries can be expensive operations and require the `search.allow_expensive_queries` setting to be set to `true`. Before making frequent `regexp` queries, test their impact on cluster performance and examine alternative queries for achieving similar results.

View File

@ -32,7 +32,7 @@ POST _bulk
```
POST _bulk
POST {index}/_bulk
POST <index>/_bulk
```
Specifying the index in the path means you don't need to include it in the [request body]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/#request-body).

View File

@ -38,8 +38,7 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
&lt;index&gt; | String | Name or list of the data streams, indices, or aliases to delete from. Supports wildcards. If left blank, OpenSearch searches all indices.
allow_no_indices - Whether to ignore wildcards that dont match any indices. Default is `true`.
allow_no_indices | Boolean | False indicates to OpenSearch the request should return an error if any wildcard expression or index alias targets only missing or closed indices. Default is true.
allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indices. Default is `true`.
analyzer | String | The analyzer to use in the query string.
analyze_wildcard | Boolean | Specifies whether to analyze wildcard and prefix queries. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the delete by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.

View File

@ -45,7 +45,7 @@ All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :--- | :---
&lt;index&gt; | String | Comma-separated list of indices to update. To update all indices, use * or omit this parameter.
allow_no_indices | String | Whether to ignore wildcards that dont match any indices. Default is true.
allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indices. Default is `true`.
analyzer | String | Analyzer to use in the query string.
analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is false.
conflicts | String | Indicates to OpenSearch what should happen if the update by query operation runs into a version conflict. Valid options are `abort` and `proceed`. Default is `abort`.

View File

@ -11,7 +11,7 @@ Introduced 1.0
Wondering why a specific document ranks higher (or lower) for a query? You can use the explain API for an explanation of how the relevance score (`_score`) is calculated for every result.
OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.
OpenSearch uses a probabilistic ranking framework called [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) to calculate relevance scores. Okapi BM25 is based on the original [TF/IDF](http://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/search/package-summary.html#scoring) framework used by Apache Lucene.
The explain API is an expensive operation in terms of both resources and time. On production clusters, we recommend using it sparingly for the purpose of troubleshooting.
{: .warning }

View File

@ -1,63 +1,94 @@
---
layout: default
title: Multi search
title: Multi-search
parent: REST API reference
nav_order: 130
---
# Multi search
# Multi-search
Introduced 1.0
{: .label .label-purple }
The multi-search operation lets you bundle multiple search requests and send them to your OpenSearch cluster in a single request. This operation executes searches in parallel, so you get back the response more quickly as compared to independent search requests. It also executes each request independently, so the failure of one request doesn't affect the others.
As the name suggests, the multi-search operation lets you bundle multiple search requests into a single request. OpenSearch then executes the searches in parallel, so you get back the response more quickly compared to sending one request per search. OpenSearch executes each search independently, so the failure of one doesn't affect the others.
The multi-search request body follows this pattern:
```
header\n
body\n
header\n
body\n
```
OpenSearch uses newline characters to parse multi-search requests and requires that each request ends with a newline character.
## Example
```json
GET _msearch
{"index":"opensearch_dashboards_sample_data_logs"}
{"query":{"match_all":{}},"from":0,"size":10}
{"index":"opensearch_dashboards_sample_data_ecommerce","search_type":"dfs_query_then_fetch"}
{"query":{"match_all":{}}}
{ "index": "opensearch_dashboards_sample_data_logs"}
{ "query": { "match_all": {} }, "from": 0, "size": 10}
{ "index": "opensearch_dashboards_sample_data_ecommerce", "search_type": "dfs_query_then_fetch"}
{ "query": { "match_all": {} } }
```
## Path and HTTP methods
```
GET <target>/_msearch
GET _msearch
GET <indices>/_msearch
POST _msearch
POST <indices>/_msearch
```
## URL parameters
All multi-search URL parameters are optional.
## Request body
Parameter | Type | Description
The multi-search request body follows this pattern:
```
Metadata\n
Query\n
Metadata\n
Query\n
```
- Metadata lines include options, such as which indices to search and the type of search.
- Query lines use the [query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/).
Just like the [bulk]({{site.url}}{{site.baseurl}}/opensearch/rest-api/document-apis/bulk/) operation, the JSON doesn't need to be minified---spaces are fine---but it does need to be on a single line. OpenSearch uses newline characters to parse multi-search requests and requires that the request body end with a newline character.
## URL parameters and metadata options
All multi-search URL parameters are optional. Some can also be applied per-search as part of each metadata line.
Parameter | Type | Description | Supported in metadata line
:--- | :--- | :---
`allow_no_indices` | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`.
`css_minimize_roundtrips` | Boolean | If true, network roundtrips between the local node and remote clusters are minimized for cross-cluster search requests. Default is `true`.
`expand_wildcards` | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`.
`ignore_unavailable` | Boolean | If an index from the indices list doesnt exist, whether to ignore it rather than fail the query. Default is `false`.
`max_concurrent_searches` | Integer | Maximum number of searches executed in parallel. Default is `max(1, (number of of data nodes * min(search thread pool size, 10)))`.
`max_concurrent_shard_requests` | Integer | Maximum number of concurrent shard requests that each sub-search request executes per node. Default is 5. If you have an environment where a very low number of concurrent search requests is expected, a higher value of this parameter might improve performance.
`pre_filter_shard_size` | Integer | Defines a threshold that enforces a round-trip to pre-filter search shards that cannot possibly match. This filter phase can limit the number of searched shards significantly. For instance, if a date range filter is applied, then all indices that don't contain documents within that date range are skipped. Default is 128.
`rest_total_hits_as_int` | String | Whether the `hits.total` property is returned as an integer or an object. Default is `false`.
`search_type` | String | Whether global term and document frequencies are used when calculating the relevance score. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
`typed_keys` | Boolean | Whether aggregation names are prefixed by their internal types in the response. Default is `false`.
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indices. Default is `true`. | Yes
css_minimize_roundtrips | Boolean | Whether OpenSearch should try to minimize the number of network round trips between the coordinating node and remote clusters (only applicable to cross-cluster search requests). Default is `true`. | No
expand_wildcards | Enum | Expands wildcard expressions to concrete indices. Combine multiple values with commas. Supported values are `all`, `open`, `closed`, `hidden`, and `none`. Default is `open`. | Yes
ignore_unavailable | Boolean | If an index from the indices list doesnt exist, whether to ignore it rather than fail the query. Default is `false`. | Yes
max_concurrent_searches | Integer | The maximum number of concurrent searches. The default depends on your node count and search thread pool size. Higher values can improve performance, but risk overloading the cluster. | No
max_concurrent_shard_requests | Integer | Maximum number of concurrent shard requests that each search executes per node. Default is 5. Higher values can improve performance, but risk overloading the cluster. | No
pre_filter_shard_size | Integer | Default is 128. | No
rest_total_hits_as_int | String | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`. | No
search_type | String | Affects relevance score. Valid options are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using term and document frequencies for the shard (faster, less accurate), whereas `dfs_query_then_fetch` uses term and document frequencies across all shards (slower, more accurate). Default is `query_then_fetch`. | Yes
typed_keys | Boolean | Whether to prefix aggregation names with their internal types in the response. Default is `false`. | No
{% comment %}Regarding `pre_filter_shard_size`: The description from the REST API specification is unintelligible---to me, anyway. I wasn't able to learn anything from reading the source code, either, so I've included the default value and nothing else in the table above. - aetter
From the REST API specification: A threshold that enforces a pre-filter round trip to prefilter search shards based on query rewriting if the number of shards the search request expands to exceeds the threshold. This filter roundtrip can limit the number of shards significantly if for instance a shard can not match any documents based on its rewrite method ie. if date filters are mandatory to match but the shard bounds and the query are disjoint.{% endcomment %}
## Metadata-only options
Some options can't be applied as URL parameters to the entire request. Instead, you can apply them per-search as part of each metadata line. All are optional.
Option | Type | Description
:--- | :--- | :---
index | String, string array | If you don't specify an index or multiple indices as part of the URL (or want to override the URL value for an individual search), you can include it here. Examples include `"logs-*"` and `["my-store", "sample_data_ecommerce"]`.
preference | String | The nodes or shards that you'd like to perform the search. This setting can be useful for testing, but in most situations, the default behavior provides the best search latencies. Options include `_local`, `_only_local`, `_prefer_nodes`, `_only_nodes`, and `_shards`. These last three options accept a list of nodes or shards. Examples include `"_only_nodes:data-node1,data-node2"` and `"_shards:0,1`.
request_cache | Boolean | Whether to cache results, which can improve latency for repeat searches. Default is to use the `index.requests.cache.enable` setting for the index (which defaults to `true` for new indices).
routing | String | Comma-separated custom routing values (e.g. `"routing": "value1,value2,value3"`.
## Response
You get back the responses in an array form, where the search response for each search request matches its order in the original multi-search request.
OpenSearch returns an array with the results of each search in the same order as the multi-search request.
```json
{

View File

@ -98,7 +98,7 @@ Parameter | Type | Description
:--- | :--- | :---
scroll | Time | Specifies the amount of time the search context is maintained.
scroll_id | String | The scroll ID for the search.
rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer or an object. Default is false.
rest_total_hits_as_int | Boolean | Whether the `hits.total` property is returned as an integer (`true`) or an object (`false`). Default is `false`.
## Response

View File

@ -47,7 +47,7 @@ All update mapping parameters are optional.
Parameter | Data Type | Description
:--- | :--- | :---
allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indices. Default is true.
allow_no_indices | Boolean | Whether to ignore wildcards that dont match any indices. Default is `true`.
expand_wildcards | String | Expands wildcard expressions to different indices. Combine multiple values with commas. Available values are `all` (match all indices), `open` (match open indices), `closed` (match closed indices), `hidden` (match hidden indices), and `none` (do not accept wildcard expressions), which must be used with `open`, `closed`, or both. Default is `open`.
ignore_unavailable | Boolean | If true, OpenSearch does not include missing or closed indices in the response.
master_timeout | Time | How long to wait for a connection to the master node. Default is `30s`.

View File

@ -194,12 +194,16 @@ For asynchronous searches with `keep_on_completion` as `true` and a sufficiently
Introduced 1.0
{: .label .label-purple }
You can use the DELETE API operation to delete any ongoing asynchronous search by its ID. If the search is still running, its canceled. If the search is complete, the saved search results are deleted.
To delete an asynchronous search:
```json
```
DELETE _plugins/_asynchronous_search/<ID>?pretty
```
- If the search is still running, OpenSearch cancels it.
- If the search is complete, OpenSearch deletes the saved results.
#### Sample response
```json

View File

@ -11,7 +11,7 @@ has_math: true
The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the Hierarchical Navigable Small World (HNSW) algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is preferred.
The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
The k-NN plugin builds an HNSW graph of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). These graphs are loaded into native memory during search and managed by a cache. To learn more about pre-loading graphs into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what graphs are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.