Merge pull request #351 from opensearch-project/index-apis

Added search API
Keith Chan 2022-01-06 11:32:47 -08:00 committed by GitHub
commit 9f66079121
3 changed files with 164 additions and 9 deletions


@ -22,7 +22,7 @@ The explain API is an expensive operation in terms of both resources and time. O
To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:
```json
POST kibana_sample_data_ecommerce/_search?explain=true
POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
{
"query": {
"match": {
@ -35,7 +35,7 @@ POST kibana_sample_data_ecommerce/_search?explain=true
More often, you want the output for a single document. In that case, specify the document ID in the URL:
```json
POST kibana_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
{
"query": {
"match": {
@ -158,6 +158,6 @@ Term frequency (`tf`) | How many times the term appears in a field for a given d
Inverse document frequency (`idf`) | How often the term appears within the index (across all the documents). The more often the term appears, the lower the relevance score.
Field normalization factor (`fieldNorm`) | The length of the field. OpenSearch assigns a higher relevance score to a term appearing in a relatively short field.
The `tf`, `idf`, and `fieldNorm` values are calculated and stored at index time when a document is added or updated. The values might have some (typically small) inaccuracies because they are based on summing the samples returned from each shard.
Individual queries include other factors for calculating the relevance score, such as term proximity, fuzziness, and so on.


@ -0,0 +1,159 @@
---
layout: default
title: Search
parent: REST API reference
nav_order: 4
---
# Search
Introduced 1.0
{: .label .label-purple }
The Search API operation lets you run a search request against the data in your cluster.
## Example
```json
GET /movies/_search
{
  "query": {
    "match": {
      "text_entry": "I am the night"
    }
  }
}
```
## Path and HTTP Methods
```
GET /<target-index>/_search
GET /_search
POST /<target-index>/_search
POST /_search
```
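
As a minimal sketch of the paths above, the match query from the example can also be sent with `POST` and without a target index, in which case the search is not limited to a single index. The `text_entry` field is carried over from the example and is assumed to exist in the indexes being searched.

```json
POST /_search
{
  "query": {
    "match": {
      "text_entry": "I am the night"
    }
  }
}
```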
## URL Parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
allow_no_indices | Boolean | Whether to ignore wildcards that don't match any indexes. Default is true.
allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true.
analyzer | String | Analyzer to use in the query string.
analyze_wildcard | Boolean | Whether to include wildcard and prefix queries in the analysis. Default is false.
batched_reduce_size | Integer | How many shard results to reduce on a node. Default is 512.
ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is true.
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
df | String | The default field in case a field prefix is not provided in the query string.
docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open.
explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
from | Integer | The starting index to search from. Default is 0.
ignore_throttled | Boolean | Whether to ignore concrete, expanded, or aliased indexes if they are frozen. Default is true.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5.
pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards.
preference | String | Specifies which shard or node OpenSearch should perform the search operation on.
q | String | A query in the Lucene query string syntax.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it's enabled in the index's settings.
rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false.
routing | String | Value used to route the search operation to a specific shard.
scroll | Time | How long to keep the search context open.
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It's usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It's usually slower but more accurate. Default is `query_then_fetch`.
seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
size | Integer | How many results to include in the response.
sort | List | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
_source | String | Whether to include the `_source` field in the response.
_source_excludes | List | A comma-separated list of source fields to exclude from the response.
_source_includes | List | A comma-separated list of source fields to include in the response.
stats | String | Value to associate with the request for additional logging.
stored_fields | Boolean | Whether the search operation should retrieve fields stored in the index. Default is false.
suggest_field | String | Fields OpenSearch can use to look for similar terms.
suggest_mode | String | The mode to use when searching. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index).
suggest_size | Integer | How many suggestions to return.
suggest_text | String | The source text that suggestions should be based on.
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`.
track_scores | Boolean | Whether to return document scores. Default is false.
track_total_hits | Boolean or Integer | Whether to return how many documents matched the query.
typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true.
version | Boolean | Whether to include the document version in the response.
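
The following sketch combines several of the URL parameters above in a single request. It assumes the `movies` index and `text_entry` field from the earlier example, and the values are illustrative only.

```json
GET /movies/_search?q=text_entry:night&from=0&size=5&search_type=dfs_query_then_fetch
```

Here `q` carries the query in the URL, so no request body is needed. `from` and `size` select the first five results, and `search_type=dfs_query_then_fetch` trades some speed for scores based on global term and document frequencies.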
## Request body
All fields are optional.
Field | Type | Description
:--- | :--- | :---
docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
fields | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time.
explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
from | Integer | The starting index to search from. Default is 0.
indices_boost | Array of objects | Values used to boost the scores of specified indexes. Specify in the format &lt;index&gt; : &lt;boost-multiplier&gt;.
min_score | Integer | Specify a score threshold to return only documents above the threshold.
query | Object | The [DSL query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) to use in the request.
seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
size | Integer | How many results to return. Default is 10.
_source | | Whether to include the `_source` field in the response.
stats | String | Value to associate with the request for additional logging.
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
timeout | Time | How long to wait for a response. Default is no timeout.
version | Boolean | Whether to include the document version in the response.
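
As a rough illustration of how the request body fields above fit together, the sketch below extends the earlier example query. The `movies` index and `text_entry` field come from that example, and the specific values (`5`, `1`, `10s`) are arbitrary assumptions chosen for demonstration.

```json
GET /movies/_search
{
  "from": 0,
  "size": 5,
  "min_score": 1,
  "timeout": "10s",
  "_source": false,
  "version": true,
  "query": {
    "match": {
      "text_entry": "I am the night"
    }
  }
}
```

This asks for at most five hits with a score above the `min_score` threshold, omits the `_source` field from each hit, and includes each document's version.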
## Response body
```json
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "superheroes",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "superheroes": [
            {
              "Hero name": "Superman",
              "Real identity": "Clark Kent",
              "Age": 28
            },
            {
              "Hero name": "Batman",
              "Real identity": "Bruce Wayne",
              "Age": 26
            },
            {
              "Hero name": "Flash",
              "Real identity": "Barry Allen",
              "Age": 28
            },
            {
              "Hero name": "Robin",
              "Real identity": "Dick Grayson",
              "Age": 15
            }
          ]
        }
      }
    ]
  }
}
```


@ -16,13 +16,9 @@ neighbors. Of the three search methods the plugin provides, this method offers t
data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is
preferred.
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during
indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about
Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
These native library indices are loaded into native memory during search and managed by a cache. To learn more about
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation).
Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the
[stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor