Merge pull request #351 from opensearch-project/index-apis
Added search API
commit
9f66079121
|
@ -22,7 +22,7 @@ The explain API is an expensive operation in terms of both resources and time. O
|
||||||
To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:
|
To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST kibana_sample_data_ecommerce/_search?explain=true
|
POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
|
||||||
{
|
{
|
||||||
"query": {
|
"query": {
|
||||||
"match": {
|
"match": {
|
||||||
|
@ -35,7 +35,7 @@ POST kibana_sample_data_ecommerce/_search?explain=true
|
||||||
More often, you want the output for a single document. In that case, specify the document ID in the URL:
|
More often, you want the output for a single document. In that case, specify the document ID in the URL:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
POST kibana_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
|
POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
|
||||||
{
|
{
|
||||||
"query": {
|
"query": {
|
||||||
"match": {
|
"match": {
|
||||||
|
@ -158,6 +158,6 @@ Term frequency (`tf`) | How many times the term appears in a field for a given d
|
||||||
Inverse document frequency (`idf`) | How often the term appears within the index (across all the documents). The more often the term appears, the lower the relevance score.
|
Inverse document frequency (`idf`) | How often the term appears within the index (across all the documents). The more often the term appears, the lower the relevance score.
|
||||||
Field normalization factor (`fieldNorm`) | The length of the field. OpenSearch assigns a higher relevance score to a term appearing in a relatively short field.
|
Field normalization factor (`fieldNorm`) | The length of the field. OpenSearch assigns a higher relevance score to a term appearing in a relatively short field.
|
||||||
|
|
||||||
The `tf`, `idf`, and `fieldNorm` values are calculated and stored at index time when a document is added or updated. The values might have some (typically small) inaccuracies because they are based on summing the samples returned from each shard.
|
The `tf`, `idf`, and `fieldNorm` values are calculated and stored at index time when a document is added or updated. The values might have some (typically small) inaccuracies because they are based on summing the samples returned from each shard.
|
||||||
|
|
||||||
Individual queries include other factors for calculating the relevance score, such as term proximity, fuzziness, and so on.
|
Individual queries include other factors for calculating the relevance score, such as term proximity, fuzziness, and so on.
|
||||||
|
|
|
@ -0,0 +1,159 @@
|
||||||
|
---
|
||||||
|
layout: default
|
||||||
|
title: Search
|
||||||
|
parent: REST API reference
|
||||||
|
nav_order: 4
|
||||||
|
---
|
||||||
|
|
||||||
|
# Search
|
||||||
|
Introduced 1.0
|
||||||
|
{: .label .label-purple }
|
||||||
|
|
||||||
|
The search API operation lets you run a search request against the data in your cluster.
|
||||||
|
|
||||||
|
## Example
|
||||||
|
|
||||||
|
```json
|
||||||
|
GET /movies/_search
|
||||||
|
{
|
||||||
|
  "query": {
|
||||||
|
    "match": {
|
||||||
|
      "text_entry": "I am the night"
|
||||||
|
    }
|
||||||
|
  }
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Path and HTTP Methods
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /<target-index>/_search
|
||||||
|
GET /_search
|
||||||
|
|
||||||
|
POST /<target-index>/_search
|
||||||
|
POST /_search
|
||||||
|
```
|
||||||
|
|
||||||
|
## URL Parameters
|
||||||
|
|
||||||
|
All URL parameters are optional.
|
||||||
|
|
||||||
|
Parameter | Type | Description
|
||||||
|
:--- | :--- | :---
|
||||||
|
allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is true.
|
||||||
|
allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true.
|
||||||
|
analyzer | String | Analyzer to use in the query string.
|
||||||
|
analyze_wildcard | Boolean | Whether the search operation should include wildcard and prefix queries in the analysis. Default is false.
|
||||||
|
batched_reduce_size | Integer | How many shard results to reduce on a node. Default is 512.
|
||||||
|
ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is true.
|
||||||
|
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
|
||||||
|
df | String | The default field in case a field prefix is not provided in the query string.
|
||||||
|
docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms.
|
||||||
|
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open.
|
||||||
|
explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
|
||||||
|
from | Integer | The starting index to search from. Default is 0.
|
||||||
|
ignore_throttled | Boolean | Whether to ignore concrete, expanded, or aliased indexes if they are frozen. Default is true.
|
||||||
|
ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false.
|
||||||
|
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
|
||||||
|
max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5.
|
||||||
|
pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards.
|
||||||
|
preference | String | Specifies which shard or node OpenSearch should perform the search operation on.
|
||||||
|
q | String | A query written in Lucene query string syntax.
|
||||||
|
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Defaults to the value configured in the index’s settings.
|
||||||
|
rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false.
|
||||||
|
routing | String | Value used to route the search operation to a specific shard.
|
||||||
|
scroll | Time | How long to keep the search context open.
|
||||||
|
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It’s usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It’s usually slower but more accurate. Default is `query_then_fetch`.
|
||||||
|
seq_no_primary_term | Boolean | Whether to return the sequence number and primary term of the last operation of each document hit.
|
||||||
|
size | Integer | How many results to include in the response.
|
||||||
|
sort | List | A comma-separated list of `<field>:<direction>` pairs to sort by.
|
||||||
|
_source | String | Whether to include the `_source` field in the response.
|
||||||
|
_source_excludes | List | A comma-separated list of source fields to exclude from the response.
|
||||||
|
_source_includes | List | A comma-separated list of source fields to include in the response.
|
||||||
|
stats | String | Value to associate with the request for additional logging.
|
||||||
|
stored_fields | Boolean | Whether the search operation should retrieve fields stored in the index. Default is false.
|
||||||
|
suggest_field | String | Fields OpenSearch can use to look for similar terms.
|
||||||
|
suggest_mode | String | The mode to use when searching. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index).
|
||||||
|
suggest_size | Integer | How many suggestions to return.
|
||||||
|
suggest_text | String | The source text that suggestions should be based on.
|
||||||
|
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
|
||||||
|
timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`.
|
||||||
|
track_scores | Boolean | Whether to return document scores. Default is false.
|
||||||
|
track_total_hits | Boolean or Integer | Whether to return how many documents matched the query.
|
||||||
|
typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true.
|
||||||
|
version | Boolean | Whether to include the document version in the response.
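
As a rough illustration of how the URL parameters in the table attach to the request path, the following Python sketch assembles a search path with a query string. The helper name `build_search_path` is hypothetical and not part of any OpenSearch client; it only formats strings and does not contact a cluster.

```python
from urllib.parse import urlencode


def build_search_path(index=None, **params):
    """Return a search request path with optional URL parameters.

    `build_search_path` is an illustrative helper, not an OpenSearch API.
    """
    path = f"/{index}/_search" if index else "/_search"
    encoded = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # Boolean parameters are written as lowercase true/false.
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # List parameters are sent as comma-separated values.
            encoded[name] = ",".join(str(v) for v in value)
        else:
            encoded[name] = str(value)
    # Keep commas and colons readable instead of percent-encoding them.
    query = urlencode(encoded, safe=",:")
    return f"{path}?{query}" if query else path


print(build_search_path("movies", explain=True, size=5))
```

Omitting the index name produces the cluster-wide `/_search` path, matching the paths listed earlier.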
|
||||||
|
|
||||||
|
## Request body
|
||||||
|
|
||||||
|
All fields are optional.
|
||||||
|
|
||||||
|
Field | Type | Description
|
||||||
|
:--- | :--- | :---
|
||||||
|
docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
|
||||||
|
fields | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time.
|
||||||
|
explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
|
||||||
|
from | Integer | The starting index to search from. Default is 0.
|
||||||
|
indices_boost | Array of objects | Scores used to boost specified indices' scores. Specify in the format of `<index>: <boost-multiplier>`.
|
||||||
|
min_score | Float | Specify a score threshold to return only documents above the threshold.
|
||||||
|
query | Object | The [DSL query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) to use in the request.
|
||||||
|
seq_no_primary_term | Boolean | Whether to return the sequence number and primary term of the last operation of each document hit.
|
||||||
|
size | Integer | How many results to return. Default is 10.
|
||||||
|
_source | | Whether to include the `_source` field in the response.
|
||||||
|
stats | String | Value to associate with the request for additional logging.
|
||||||
|
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
|
||||||
|
timeout | Time | How long to wait for a response. Default is no timeout.
|
||||||
|
version | Boolean | Whether to include the document version in the response.
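
To show how a few of these fields combine, here is a minimal Python sketch that builds a request body as a dictionary and prints it as JSON. The function `build_search_body` and its argument names are assumptions made for illustration; only the field names (`query`, `from`, `size`, `min_score`, `explain`) come from the table above.

```python
import json


def build_search_body(field, text, size=10, from_=0, min_score=None, explain=False):
    """Assemble a search request body from a few optional fields.

    Illustrative helper only; not part of any OpenSearch client.
    """
    body = {
        "query": {"match": {field: text}},
        "from": from_,
        "size": size,
    }
    # Only include optional fields when the caller asks for them.
    if min_score is not None:
        body["min_score"] = min_score
    if explain:
        body["explain"] = True
    return body


body = build_search_body("text_entry", "I am the night", size=5)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `GET` or `POST` request to one of the `_search` paths.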
|
||||||
|
|
||||||
|
## Response body
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"took": 3,
|
||||||
|
"timed_out": false,
|
||||||
|
"_shards": {
|
||||||
|
"total": 1,
|
||||||
|
"successful": 1,
|
||||||
|
"skipped": 0,
|
||||||
|
"failed": 0
|
||||||
|
},
|
||||||
|
"hits": {
|
||||||
|
"total": {
|
||||||
|
"value": 1,
|
||||||
|
"relation": "eq"
|
||||||
|
},
|
||||||
|
"max_score": 1.0,
|
||||||
|
"hits": [
|
||||||
|
{
|
||||||
|
"_index": "superheroes",
|
||||||
|
"_type": "_doc",
|
||||||
|
"_id": "1",
|
||||||
|
"_score": 1.0,
|
||||||
|
"_source": {
|
||||||
|
"superheroes": [
|
||||||
|
{
|
||||||
|
"Hero name": "Superman",
|
||||||
|
"Real identity": "Clark Kent",
|
||||||
|
"Age": 28
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Hero name": "Batman",
|
||||||
|
"Real identity": "Bruce Wayne",
|
||||||
|
"Age": 26
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Hero name": "Flash",
|
||||||
|
"Real identity": "Barry Allen",
|
||||||
|
"Age": 28
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"Hero name": "Robin",
|
||||||
|
"Real identity": "Dick Grayson",
|
||||||
|
"Age": 15
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
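
A response like the one above can be read with a few dictionary lookups. The sketch below is one possible way to pull out the total hit count and the document IDs; `summarize_hits` is a hypothetical helper, and the sample response is trimmed to the fields it reads.

```python
def summarize_hits(response):
    """Return (total hit count, list of document IDs) from a search response."""
    hits = response["hits"]
    total = hits["total"]
    # `hits.total` is an object by default and a plain integer when
    # `rest_total_hits_as_int=true` is set, so handle both shapes.
    count = total["value"] if isinstance(total, dict) else total
    ids = [hit["_id"] for hit in hits["hits"]]
    return count, ids


# Trimmed copy of the sample response; `_source` is omitted for brevity.
response = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.0,
        "hits": [{"_index": "superheroes", "_id": "1", "_score": 1.0}],
    },
}

print(summarize_hits(response))
```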
|
|
@ -16,13 +16,9 @@ neighbors. Of the three search methods the plugin provides, this method offers t
|
||||||
data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is
|
data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is
|
||||||
preferred.
|
preferred.
|
||||||
|
|
||||||
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during
|
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
|
||||||
indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about
|
|
||||||
Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
|
|
||||||
These native library indices are loaded into native memory during search and managed by a cache. To learn more about
|
These native library indices are loaded into native memory during search and managed by a cache. To learn more about
|
||||||
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation).
|
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
|
||||||
Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the
|
|
||||||
[stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
|
|
||||||
|
|
||||||
Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
|
Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
|
||||||
and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor
|
and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor
|
||||||
|
|