Merge pull request #351 from opensearch-project/index-apis

Added search API
This commit is contained in:
Keith Chan 2022-01-06 11:32:47 -08:00 committed by GitHub
commit 9f66079121
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 164 additions and 9 deletions

View File

@ -22,7 +22,7 @@ The explain API is an expensive operation in terms of both resources and time. O
To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:
```json
POST kibana_sample_data_ecommerce/_search?explain=true
POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
{
"query": {
"match": {
@ -35,7 +35,7 @@ POST kibana_sample_data_ecommerce/_search?explain=true
More often, you want the output for a single document. In that case, specify the document ID in the URL:
```json
POST kibana_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
{
"query": {
"match": {

View File

@ -0,0 +1,159 @@
---
layout: default
title: Search
parent: REST API reference
nav_order: 4
---
# Search
Introduced 1.0
{: .label .label-purple }
The Search API operation lets you execute a search request to search your cluster for data.
## Example
```json
GET /movies/_search
{
"query": {
"match": {
"text_entry": "I am the night"
}
}
}
```
## Path and HTTP Methods
```
GET /<target-index>/_search
GET /_search
POST /<target-index>/_search
POST /_search
```
## URL Parameters
All URL parameters are optional.
Parameter | Type | Description
:--- | :--- | :---
allow_no_indexes | Boolean | Whether to ignore wildcards that dont match any indexes. Default is true.
allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true.
analyzer | String | Analyzer to use in the query string.
analyze_wildcard | Boolean | Whether the update operation should include wildcard and prefix queries in the analysis. Default is false.
batched_reduce_size | Integer | How many shard results to reduce on a node. Default is 512.
css_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is true.
default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
df | String | The default field in case a field prefix is not provided in the query string.
docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms.
expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open.
explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
from | Integer | The starting index to search from. Default is 0.
ignore_throttled | Boolean | Whether to ignore concrete, expanded, or indexes with aliases if indexes are frozen. Default is true.
ignore_unavailable | Boolean | Specifies whether to include missing or closed indexes in the response. Default is false.
lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5.
pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards.
preference | String | Specifies which shard or node OpenSearch should perform the count operation on.
q | String | Lucene query strings query.
request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether its enabled in the indexs settings.
rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false.
routing | String | Value used to route the update by query operation to a specific shard.
scroll | Time | How long to keep the search context open.
search_type | String | Whether OpenSearch should use global term and document frequencies when calculating revelance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. Its usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. Its usually slower but more accurate. Default is `query_then_fetch`.
seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
size | Integer | How many results to include in the response.
sort | List | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
_source | String | Whether to include the `_source` field in the response.
_source_excludes | List | A comma-separated list of source fields to exclude from the response.
_source_includes | List | A comma-separated list of source fields to include in the response.
stats | String | Value to associate with the request for additional logging.
stored_fields | Boolean | Whether the get operation should retrieve fields stored in the index. Default is false.
suggest_field | String | Fields OpenSearch can use to look for similar terms.
suggest_mode | String | The mode to use when searching. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index).
suggest_size | Integer | How many suggestions to return.
suggest_text | String | The source that suggestions should be based off of.
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`.
track_scores | Boolean | Whether to return document scores. Default is false.
track_total_hits | Boolean or Integer | Whether to return how many documents matched the query.
typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true.
version | Boolean | Whether to include the document version as a match.
## Request body
All fields are optional.
Field | Type | Description
:--- | :--- | :---
docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
fields | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time.
explain | String | Whether to return details about how OpenSearch computed the document's score. Default is false.
from | Integer | The starting index to search from. Default is 0.
indices_boost | Array of objects | Scores used to boost specified indices' scores. Specify in the format of &lt;index&gt; : &lt;boost-multiplier&gt;
min_score | Integer | Specify a score threshold to return only documents above the threshold.
query | Object | The [DSL query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) to use in the request.
seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
size | Integer | How many results to return. Default is 10.
_source | | Whether to include the `_source` field in the response.
stats | String | Value to associate with the request for additional logging.
terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
timeout | Time | How long to wait for a response. Default is no timeout.
version | Boolean | Whether to include the document version in the response.
## Response body
```json
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "superheroes",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"superheroes": [
{
"Hero name": "Superman",
"Real identity": "Clark Kent",
"Age": 28
},
{
"Hero name": "Batman",
"Real identity": "Bruce Wayne",
"Age": 26
},
{
"Hero name": "Flash",
"Real identity": "Barry Allen",
"Age": 28
},
{
"Hero name": "Robin",
"Real identity": "Dick Grayson",
"Age": 15
}
]
}
}
]
}
}
```

View File

@ -16,13 +16,9 @@ neighbors. Of the three search methods the plugin provides, this method offers t
data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is
preferred.
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during
indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about
Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description).
These native library indices are loaded into native memory during search and managed by a cache. To learn more about
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation).
Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the
[stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor