Merge pull request #351 from opensearch-project/index-apis

Added search API
2022-01-06 11:32:47 -08:00 · 2022-01-06 11:32:47 -08:00 · 9f66079121
parent 8924d44139 eecaeab15f
commit 9f66079121
3 changed files with 164 additions and 9 deletions
--- a/_opensearch/rest-api/explain.md
+++ b/_opensearch/rest-api/explain.md
@@ -22,7 +22,7 @@ The explain API is an expensive operation in terms of both resources and time. O
 To see the explain output for all results, set the `explain` flag to `true` either in the URL or in the body of the request:
 ```json
-POST kibana_sample_data_ecommerce/_search?explain=true
+POST opensearch_dashboards_sample_data_ecommerce/_search?explain=true
 {
  "query": {
    "match": {
@@ -35,7 +35,7 @@ POST kibana_sample_data_ecommerce/_search?explain=true
 More often, you want the output for a single document. In that case, specify the document ID in the URL:
 ```json
-POST kibana_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
+POST opensearch_dashboards_sample_data_ecommerce/_explain/EVz1Q3sBgg5eWQP6RSte
 {
  "query": {
    "match": {
--- a/_opensearch/rest-api/search.md
+++ b/_opensearch/rest-api/search.md
@@ -0,0 +1,159 @@
 ---
 layout: default
 title: Search
 parent: REST API reference
 nav_order: 4
 ---
 # Search
 Introduced 1.0
 {: .label .label-purple }
 The Search API operation lets you search your cluster for data.
 ## Example
 ```json
 GET /movies/_search
 {
  "query": {
    "match": {
      "text_entry": "I am the night"
    }
  }
 }
 ```
 ## Path and HTTP methods
 ```
 GET /<target-index>/_search
 GET /_search
 POST /<target-index>/_search
 POST /_search
 ```
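 The `<target-index>` path segment can also name several indexes at once, as a comma-separated list or a wildcard pattern. The following request is an illustrative sketch that assumes an index named `movies` and one or more indexes whose names start with `books`; it matches all documents in those indexes:
 ```json
 GET /movies,books*/_search
 {
   "query": {
     "match_all": {}
   }
 }
 ```
 How the `books*` pattern is expanded is controlled by the `expand_wildcards` and `allow_no_indices` URL parameters.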
 ## URL parameters
 All URL parameters are optional.
 Parameter | Type | Description
 :--- | :--- | :---
 allow_no_indices | Boolean | Whether to ignore wildcards that don’t match any indexes. Default is true.
 allow_partial_search_results | Boolean | Whether to return partial results if the request runs into an error or times out. Default is true.
 analyzer | String | Analyzer to use in the query string.
 analyze_wildcard | Boolean | Whether the search operation should include wildcard and prefix queries in the analysis. Default is false.
 batched_reduce_size | Integer | How many shard results to reduce on a node. Default is 512.
 ccs_minimize_roundtrips | Boolean | Whether to minimize roundtrips between a node and remote clusters. Default is true.
 default_operator | String | Indicates whether the default operator for a string query should be AND or OR. Default is OR.
 df | String | The default field in case a field prefix is not provided in the query string.
 docvalue_fields | String | The fields that OpenSearch should return using their docvalue forms.
 expand_wildcards | String | Specifies the type of index that wildcard expressions can match. Supports comma-separated values. Valid values are all (match any index), open (match open, non-hidden indexes), closed (match closed, non-hidden indexes), hidden (match hidden indexes), and none (deny wildcard expressions). Default is open.
 explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
 from | Integer | The starting index to search from. Default is 0.
 ignore_throttled | Boolean | Whether to ignore concrete, expanded, or aliased indexes if they are frozen. Default is true.
 ignore_unavailable | Boolean | Whether to ignore missing or closed indexes. If false, the request returns an error when it targets a missing or closed index. Default is false.
 lenient | Boolean | Specifies whether OpenSearch should accept requests if queries have format errors (for example, querying a text field for an integer). Default is false.
 max_concurrent_shard_requests | Integer | How many concurrent shard requests this request should execute on each node. Default is 5.
 pre_filter_shard_size | Integer | A prefilter size threshold that triggers a prefilter operation if the request exceeds the threshold. Default is 128 shards.
 preference | String | Specifies which shard or node OpenSearch should perform the search operation on.
 q | String | A query in the Lucene query string syntax.
 request_cache | Boolean | Specifies whether OpenSearch should use the request cache. Defaults to the request cache setting of the index.
 rest_total_hits_as_int | Boolean | Whether to return `hits.total` as an integer. Returns an object otherwise. Default is false.
 routing | String | Value used to route the search operation to a specific shard.
 scroll | Time | How long to keep the search context open.
 search_type | String | Whether OpenSearch should use global term and document frequencies when calculating relevance scores. Valid choices are `query_then_fetch` and `dfs_query_then_fetch`. `query_then_fetch` scores documents using local term and document frequencies for the shard. It’s usually faster but less accurate. `dfs_query_then_fetch` scores documents using global term and document frequencies across all shards. It’s usually slower but more accurate. Default is `query_then_fetch`.
 seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
 size | Integer | How many results to include in the response.
 sort | List | A comma-separated list of &lt;field&gt; : &lt;direction&gt; pairs to sort by.
 _source | String | Whether to include the `_source` field in the response.
 _source_excludes | List | A comma-separated list of source fields to exclude from the response.
 _source_includes | List | A comma-separated list of source fields to include in the response.
 stats | String | Value to associate with the request for additional logging.
 stored_fields | Boolean | Whether the search operation should retrieve fields stored in the index. Default is false.
 suggest_field | String | Fields OpenSearch can use to look for similar terms.
 suggest_mode | String | The suggestion mode to use. Available options are `always` (use suggestions based on the provided terms), `popular` (use suggestions that have more occurrences), and `missing` (use suggestions for terms not in the index).
 suggest_size | Integer | How many suggestions to return.
 suggest_text | String | The source that suggestions should be based on.
 terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
 timeout | Time | How long the operation should wait for a response from active shards. Default is `1m`.
 track_scores | Boolean | Whether to calculate and return document scores, even if the scores are not used for sorting. Default is false.
 track_total_hits | Boolean or Integer | Whether to return how many documents matched the query.
 typed_keys | Boolean | Whether returned aggregations and suggested terms should include their types in the response. Default is true.
 version | Boolean | Whether to include the document version in the response.
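 As an illustrative sketch of how several URL parameters combine, the following request assumes the `movies` index and `text_entry` field from the example above. It uses `q` for a Lucene query string, pages through results with `from` and `size`, requests global term and document frequencies through `search_type`, and limits the returned source to the `text_entry` field:
 ```json
 GET /movies/_search?q=text_entry:night&from=10&size=5&search_type=dfs_query_then_fetch&_source_includes=text_entry
 ```
 With `from=10` and `size=5`, OpenSearch skips the first 10 matching documents and returns the next 5, which is the third page of five results.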
 ## Request body
 All fields are optional.
 Field | Type | Description
 :--- | :--- | :---
 docvalue_fields | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
 fields | Array | The fields to retrieve and return in the response. Specify a format to return results in a certain format, such as date and time.
 explain | Boolean | Whether to return details about how OpenSearch computed the document's score. Default is false.
 from | Integer | The starting index to search from. Default is 0.
 indices_boost | Array of objects | Values used to boost the scores of documents from specified indexes. Specify in the format of &lt;index&gt; : &lt;boost-multiplier&gt;.
 min_score | Float | A score threshold. OpenSearch returns only documents whose scores are at or above the threshold.
 query | Object | The [DSL query]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index) to use in the request.
 seq_no_primary_term | Boolean | Whether to return sequence number and primary term of the last operation of each document hit.
 size | Integer | How many results to return. Default is 10.
 _source | | Whether to include the `_source` field in the response.
 stats | String | Value to associate with the request for additional logging.
 terminate_after | Integer | The maximum number of documents OpenSearch should process before terminating the request. Default is 0.
 timeout | Time | How long to wait for a response. Default is no timeout.
 version | Boolean | Whether to include the document version in the response.
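 The following request is a sketch that combines several of these body fields. It assumes the `movies` index and `text_entry` field from the earlier example; the `min_score` and `timeout` values are arbitrary choices for illustration:
 ```json
 GET /movies/_search
 {
   "from": 0,
   "size": 5,
   "min_score": 0.5,
   "timeout": "10s",
   "version": true,
   "_source": ["text_entry"],
   "query": {
     "match": {
       "text_entry": "I am the night"
     }
   }
 }
 ```
 Because documents scoring below `min_score` are excluded from the hits, this request can return fewer than five results even when more documents match the query.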
 ## Response body
 ```json
 {
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "superheroes",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "superheroes": [
            {
              "Hero name": "Superman",
              "Real identity": "Clark Kent",
              "Age": 28
            },
            {
              "Hero name": "Batman",
              "Real identity": "Bruce Wayne",
              "Age": 26
            },
            {
              "Hero name": "Flash",
              "Real identity": "Barry Allen",
              "Age": 28
            },
            {
              "Hero name": "Robin",
              "Real identity": "Dick Grayson",
              "Age": 15
            }
          ]
        }
      }
    ]
  }
 }
 ```
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -16,13 +16,9 @@ neighbors. Of the three search methods the plugin provides, this method offers t
 data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach is 
 preferred.
-The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during 
-indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about 
-Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/{{site.lucene_version}}/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). 
+The k-NN plugin builds a native library index of the vectors for each "knn-vector field"/ "Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. To learn more about Lucene segments, see the [Apache Lucene documentation](https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). 
 These native library indices are loaded into native memory during search and managed by a cache. To learn more about
-pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). 
-Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the 
-[stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
+pre-loading native library indices into memory, refer to the [warmup API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#warmup-operation). Additionally, you can see what native library indices are already loaded in memory, which you can learn more about in the [stats API section]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#stats).
 Because the native library indices are constructed during indexing, it is not possible to apply a filter on an index
 and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor