Merge pull request #351 from opensearch-project/index-apis

Added search API
2022-01-06 11:32:47 -08:00 · 2022-01-06 11:32:47 -08:00 · 9f66079121
parent 8924d44139 eecaeab15f
commit 9f66079121
3 changed files with 164 additions and 9 deletions
--- a/_opensearch/rest-api/explain.md
+++ b/_opensearch/rest-api/explain.md
@@ -22,7 +22,7 @@ The explain API is an expensive operation in terms of both resources and time. O
 To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:

 ```json
-POST kibana_sample_data_ecommerce/_search?explain=true
+POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
 {
  "query": {
    "match": {
@@ -35,7 +35,7 @@ POST kibana_sample_data_ecommerce/_search?explain=true
 More often, you want the output for a single document. In that case, specify the document ID in the URL:

 ```json
-POST kibana_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
+POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
 {
  "query": {
    "match": {
--- a/_opensearch/rest-api/search.md
+++ b/_opensearch/rest-api/search.md
@@ -0,0 +1,159 @@
+---
+layout: default
+title: Search
+parent: REST API reference
+nav_order: 4
+---
+
+# Search
+Introduced 1.0
+{: .label .label-purple }
+
+The Search API operation lets you run a search request against the indexes in your cluster and returns the documents that match it.
+
+## Example
+
+```json
+GET /movies/_search
+{
+  "query": {
+    "match": {
+      "text_entry": "I am the night"
+    }
+  }
+}
+```
+
+## Path and HTTP methods
+
+```
+GET /<target-index>/_search
+GET /_search
+
+POST /<target-index>/_search
+POST /_search
+```
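+
+The `<target-index>` path segment also accepts a comma-separated list of indexes or a wildcard expression. The following sketch searches two indexes in one request; the `books` index name is hypothetical and used only for illustration:
+
+```json
+GET /movies,books/_search
+{
+  "query": {
+    "match": {
+      "text_entry": "I am the night"
+    }
+  }
+}
+```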
+
+## URL parameters
+
+All URL parameters are optional.
+
+Parameter | Type | Description
+:--- | :--- | :---
+allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is true.
+allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true.
+analyzer | String | Analyzer to use in the query string.
+analyze_wildcard | Boolean | Whether OpenSearch should analyze wildcard and prefix queries in the query string. Default is false.
+batched_reduce_size | Integer | How many shard results to reduce at once on the coordinating node. Default is 512.
+ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters during a cross-cluster search. Default is true.
+default_operator | String | Indicates whether the default operator for a query string query should be AND or OR. Default is OR.
+df | String | The default field in case a field prefix is not provided in the query string.
+docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms.
+expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open.
+explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
+from | Integer | The starting document offset of the results (the number of hits to skip). Default is 0.
+ignore_throttled | Boolean | Whether to ignore concrete, expanded, or aliased indexes if they are frozen. Default is true.
+ignore_unavailable | Boolean | Whether to ignore missing or closed indexes instead of returning an error. Default is false.
+lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
+max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5.
+pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards.
+preference | String | Specifies which shard or node OpenSearch should perform the search operation on.
+q | String | A query written in Lucene query string syntax.
+request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Default is whether it’s enabled in the index’s settings.
+rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false.
+routing | String | Value used to route the search operation to a specific shard.
+scroll | Time | How long to keep the search context open.
+search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It’s usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It’s usually slower but more accurate. Default is `query_then_fetch`.
+seq_no_primary_term | Boolean | Whether to return the sequence number and primary term of the last operation of each document hit.
+size | Integer | How many results to include in the response.
+sort | List | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
+_source | String | Whether to include the `_source` field in the response.
+_source_excludes | List | A comma-separated list of source fields to exclude from the response.
+_source_includes | List | A comma-separated list of source fields to include in the response.
+stats | String | Value to associate with the request for additional logging.
+stored_fields | String | A comma-separated list of stored fields to return in the response.
+suggest_field | String | Fields OpenSearch can use to look for similar terms.
+suggest_mode | String | The mode to use when searching. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index).
+suggest_size | Integer | How many suggestions to return.
+suggest_text | String | The text that suggestions should be based on.
+terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
+timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`.
+track_scores | Boolean | Whether to return document scores. Default is false.
+track_total_hits | Boolean or Integer | Whether to return how many documents matched the query.
+typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true.
+version | Boolean | Whether to include the document version in the response.
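+
+URL parameters can be combined in a single request. As a sketch, the following request passes the query through the `q` parameter in Lucene query string syntax instead of a request body, skips the first 10 hits, returns the next 5, and sets a 30-second shard timeout. The parameter values are illustrative only:
+
+```json
+GET /movies/_search?q=text_entry:night&from=10&size=5&timeout=30s
+```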
+
+## Request body
+
+All fields are optional.
+
+Field | Type | Description
+:--- | :--- | :---
+docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
+fields | Array | The fields to return in the response. Specify a format to return results in a certain format, such as date and time.
+explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
+from | Integer | The starting document offset of the results (the number of hits to skip). Default is 0.
+indices_boost | Array of objects | Values used to boost the scores of documents from specified indexes. Specify in the format of &lt;index&gt; : &lt;boost-multiplier&gt;.
+min_score | Float | A score threshold. OpenSearch returns only documents whose score is at or above the threshold.
+query | Object | The [DSL query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) to use in the request.
+seq_no_primary_term | Boolean | Whether to return the sequence number and primary term of the last operation of each document hit.
+size | Integer | How many results to return. Default is 10.
+_source | | Whether to include the `_source` field in the response.
+stats | String | Value to associate with the request for additional logging.
+terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
+timeout | Time | How long to wait for a response. Default is no timeout.
+version | Boolean | Whether to include the document version in the response.
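+
+The following request is a sketch that combines several request body fields: it returns the first five hits, drops hits that score below `1.5`, and limits `_source` to the `text_entry` field. The `min_score` value is arbitrary and chosen only for illustration:
+
+```json
+GET /movies/_search
+{
+  "from": 0,
+  "size": 5,
+  "min_score": 1.5,
+  "_source": ["text_entry"],
+  "query": {
+    "match": {
+      "text_entry": "I am the night"
+    }
+  }
+}
+```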
+
+## Response body
+
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1.0,
+    "hits": [
+      {
+        "_index": "superheroes",
+        "_type": "_doc",
+        "_id": "1",
+        "_score": 1.0,
+        "_source": {
+          "superheroes": [
+            {
+              "Hero name": "Superman",
+              "Real identity": "Clark Kent",
+              "Age": 28
+            },
+            {
+              "Hero name": "Batman",
+              "Real identity": "Bruce Wayne",
+              "Age": 26
+            },
+            {
+              "Hero name": "Flash",
+              "Real identity": "Barry Allen",
+              "Age": 28
+            },
+            {
+              "Hero name": "Robin",
+              "Real identity": "Dick Grayson",
+              "Age": 15
+            }
+          ]
+        }
+      }
+    ]
+  }
+}
+```
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -16,13 +16,9 @@ neighbors. Of the three search methods the plugin provides, this method offers t
 data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is 
 preferred.

-The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during 
-indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about 
-Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). 
+During indexing, the k-NN plugin builds a native library index of the vectors for each `knn_vector` field and Lucene segment pair. During search, these indices are used to efficiently find the k-nearest neighbors of a query vector. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). 
 These native library indices are loaded into native memory during search and managed by a cache. To learn more about
-pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). 
-Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the 
-[stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
+pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). To see which native library indices are already loaded in memory, refer to the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).

 Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
 and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor