OpenSearch/docs/java-rest/high-level/search/search.asciidoc

465 lines
20 KiB
Plaintext

--
:api: search
:request: SearchRequest
:response: SearchResponse
--
[id="{upid}-{api}"]
=== Search API
[id="{upid}-{api}-request"]
==== Search Request
The +{request}+ is used for any operation that has to do with searching
documents, aggregations, suggestions and also offers ways of requesting
highlighting on the resulting documents.
In its most basic form, we can add a query to the request:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-basic]
--------------------------------------------------
<1> Creates the `SeachRequest`. Without arguments this runs against all indices.
<2> Most search parameters are added to the `SearchSourceBuilder`. It offers setters for everything that goes into the search request body.
<3> Add a `match_all` query to the `SearchSourceBuilder`.
<4> Add the `SearchSourceBuilder` to the `SeachRequest`.
[id="{upid}-{api}-request-optional"]
===== Optional arguments
Let's first look at some of the optional arguments of a +{request}+:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-indices]
--------------------------------------------------
<1> Restricts the request to an index
There are a couple of other interesting optional parameters:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-routing]
--------------------------------------------------
<1> Set a routing parameter
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-indicesOptions]
--------------------------------------------------
<1> Setting `IndicesOptions` controls how unavailable indices are resolved and
how wildcard expressions are expanded
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-preference]
--------------------------------------------------
<1> Use the preference parameter e.g. to execute the search to prefer local
shards. The default is to randomize across shards.
===== Using the SearchSourceBuilder
Most options controlling the search behavior can be set on the
`SearchSourceBuilder`,
which contains more or less the equivalent of the options in the search request
body of the Rest API.
Here are a few examples of some common options:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-basics]
--------------------------------------------------
<1> Create a `SearchSourceBuilder` with default options.
<2> Set the query. Can be any type of `QueryBuilder`
<3> Set the `from` option that determines the result index to start searching
from. Defaults to 0.
<4> Set the `size` option that determines the number of search hits to return.
Defaults to 10.
<5> Set an optional timeout that controls how long the search is allowed to
take.
After this, the `SearchSourceBuilder` only needs to be added to the
+{request}+:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-setter]
--------------------------------------------------
[id="{upid}-{api}-request-building-queries"]
===== Building queries
Search queries are created using `QueryBuilder` objects. A `QueryBuilder` exists
for every search query type supported by Elasticsearch's {ref}/query-dsl.html[Query DSL].
A `QueryBuilder` can be created using its constructor:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-builder-ctor]
--------------------------------------------------
<1> Create a full text {ref}/query-dsl-match-query.html[Match Query] that matches
the text "kimchy" over the field "user".
Once created, the `QueryBuilder` object provides methods to configure the options
of the search query it creates:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-builder-options]
--------------------------------------------------
<1> Enable fuzzy matching on the match query
<2> Set the prefix length option on the match query
<3> Set the max expansion options to control the fuzzy
process of the query
`QueryBuilder` objects can also be created using the `QueryBuilders` utility class.
This class provides helper methods that can be used to create `QueryBuilder` objects
using a fluent programming style:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-builders]
--------------------------------------------------
Whatever the method used to create it, the `QueryBuilder` object must be added
to the `SearchSourceBuilder` as follows:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-setter]
--------------------------------------------------
The <<{upid}-query-builders, Building Queries>> page gives a list of all available search queries with
their corresponding `QueryBuilder` objects and `QueryBuilders` helper methods.
===== Specifying Sorting
The `SearchSourceBuilder` allows to add one or more `SortBuilder` instances. There are four special implementations (Field-, Score-, GeoDistance- and ScriptSortBuilder).
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-sorting]
--------------------------------------------------
<1> Sort descending by `_score` (the default)
<2> Also sort ascending by `_id` field
===== Source filtering
By default, search requests return the contents of the document `_source` but like in the Rest API you can overwrite this behavior. For example, you can turn off `_source` retrieval completely:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-filtering-off]
--------------------------------------------------
The method also accepts an array of one or more wildcard patterns to control which fields get included or excluded in a more fine grained way:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-filtering-includes]
--------------------------------------------------
[id="{upid}-{api}-request-highlighting"]
===== Requesting Highlighting
Highlighting search results can be achieved by setting a `HighlightBuilder` on the
`SearchSourceBuilder`. Different highlighting behaviour can be defined for each
fields by adding one or more `HighlightBuilder.Field` instances to a `HighlightBuilder`.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-highlighting]
--------------------------------------------------
<1> Creates a new `HighlightBuilder`
<2> Create a field highlighter for the `title` field
<3> Set the field highlighter type
<4> Add the field highlighter to the highlight builder
There are many options which are explained in detail in the Rest API documentation. The Rest
API parameters (e.g. `pre_tags`) are usually changed by
setters with a similar name (e.g. `#preTags(String ...)`).
Highlighted text fragments can <<{upid}-{api}-response-highlighting,later be retrieved>> from the +{response}+.
[id="{upid}-{api}-request-building-aggs"]
===== Requesting Aggregations
Aggregations can be added to the search by first creating the appropriate
`AggregationBuilder` and then setting it on the `SearchSourceBuilder`. In the
following example we create a `terms` aggregation on company names with a
sub-aggregation on the average age of employees in the company:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-aggregations]
--------------------------------------------------
The <<{upid}-aggregation-builders, Building Aggregations>> page gives a list of all available aggregations with
their corresponding `AggregationBuilder` objects and `AggregationBuilders` helper methods.
We will later see how to <<{upid}-{api}-response-aggs,access aggregations>> in the +{response}+.
===== Requesting Suggestions
To add Suggestions to the search request, use one of the `SuggestionBuilder` implementations
that are easily accessible from the `SuggestBuilders` factory class. Suggestion builders
need to be added to the top level `SuggestBuilder`, which itself can be set on the `SearchSourceBuilder`.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-suggestion]
--------------------------------------------------
<1> Creates a new `TermSuggestionBuilder` for the `user` field and
the text `kmichy`
<2> Adds the suggestion builder and names it `suggest_user`
We will later see how to <<{upid}-{api}-response-suggestions,retrieve suggestions>> from the
+{response}+.
===== Profiling Queries and Aggregations
The {ref}/search-profile.html[Profile API] can be used to profile the execution of queries and aggregations for
a specific search request. in order to use it, the profile flag must be set to true on the `SearchSourceBuilder`:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling]
--------------------------------------------------
Once the +{request}+ is executed the corresponding +{response}+ will
<<{upid}-{api}-response-profile,contain the profiling results>>.
include::../execution.asciidoc[]
[id="{upid}-{api}-response"]
==== {response}
The +{response}+ that is returned by executing the search provides details
about the search execution itself as well as access to the documents returned.
First, there is useful information about the request execution itself, like the
HTTP status code, execution time or whether the request terminated early or timed
out:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response-1]
--------------------------------------------------
Second, the response also provides information about the execution on the
shard level by offering statistics about the total number of shards that were
affected by the search, and the successful vs. unsuccessful shards. Possible
failures can also be handled by iterating over an array off
`ShardSearchFailures` like in the following example:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response-2]
--------------------------------------------------
[id="{upid}-{api}-response-search-hits"]
===== Retrieving SearchHits
To get access to the returned documents, we need to first get the `SearchHits`
contained in the response:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-hits-get]
--------------------------------------------------
The `SearchHits` provides global information about all hits, like total number
of hits or the maximum score:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-hits-info]
--------------------------------------------------
Nested inside the `SearchHits` are the individual search results that can
be iterated over:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-hits-singleHit]
--------------------------------------------------
The `SearchHit` provides access to basic information like index, document ID
and score of each search hit:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-hits-singleHit-properties]
--------------------------------------------------
Furthermore, it lets you get back the document source, either as a simple
JSON-String or as a map of key/value pairs. In this map, regular fields
are keyed by the field name and contain the field value. Multi-valued fields are
returned as lists of objects, nested objects as another key/value map. These
cases need to be cast accordingly:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-hits-singleHit-source]
--------------------------------------------------
[id="{upid}-{api}-response-highlighting"]
===== Retrieving Highlighting
If <<{upid}-{api}-request-highlighting,requested>>, highlighted text fragments can be retrieved from each `SearchHit` in the result. The hit object offers
access to a map of field names to `HighlightField` instances, each of which contains one
or many highlighted text fragments:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-highlighting-get]
--------------------------------------------------
<1> Get the highlighting for the `title` field
<2> Get one or many fragments containing the highlighted field content
[id="{upid}-{api}-response-aggs"]
===== Retrieving Aggregations
Aggregations can be retrieved from the +{response}+ by first getting the
root of the aggregation tree, the `Aggregations` object, and then getting the
aggregation by name.
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-aggregations-get]
--------------------------------------------------
<1> Get the `by_company` terms aggregation
<2> Get the buckets that is keyed with `Elastic`
<3> Get the `average_age` sub-aggregation from that bucket
Note that if you access aggregations by name, you need to specify the
aggregation interface according to the type of aggregation you requested,
otherwise a `ClassCastException` will be thrown:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[search-request-aggregations-get-wrongCast]
--------------------------------------------------
<1> This will throw an exception because "by_company" is a `terms` aggregation
but we try to retrieve it as a `range` aggregation
It is also possible to access all aggregations as a map that is keyed by the
aggregation name. In this case, the cast to the proper aggregation interface
needs to happen explicitly:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-aggregations-asMap]
--------------------------------------------------
There are also getters that return all top level aggregations as a list:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-aggregations-asList]
--------------------------------------------------
And last but not least you can iterate over all aggregations and then e.g.
decide how to further process them based on their type:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-aggregations-iterator]
--------------------------------------------------
[id="{upid}-{api}-response-suggestions"]
===== Retrieving Suggestions
To get back the suggestions from a +{response}+, use the `Suggest` object as an entry point and then retrieve the nested suggestion objects:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-suggestion-get]
--------------------------------------------------
<1> Use the `Suggest` class to access suggestions
<2> Suggestions can be retrieved by name. You need to assign them to the correct
type of Suggestion class (here `TermSuggestion`), otherwise a `ClassCastException` is thrown
<3> Iterate over the suggestion entries
<4> Iterate over the options in one entry
[id="{upid}-{api}-response-profile"]
===== Retrieving Profiling Results
Profiling results are retrieved from a +{response}+ using the `getProfileResults()` method. This
method returns a `Map` containing a `ProfileShardResult` object for every shard involved in the
+{request}+ execution. `ProfileShardResult` are stored in the `Map` using a key that uniquely
identifies the shard the profile result corresponds to.
Here is a sample code that shows how to iterate over all the profiling results of every shard:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling-get]
--------------------------------------------------
<1> Retrieve the `Map` of `ProfileShardResult` from the +{response}+
<2> Profiling results can be retrieved by shard's key if the key is known, otherwise it might be simpler
to iterate over all the profiling results
<3> Retrieve the key that identifies which shard the `ProfileShardResult` belongs to
<4> Retrieve the `ProfileShardResult` for the given shard
The `ProfileShardResult` object itself contains one or more query profile results, one for each query
executed against the underlying Lucene index:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling-queries]
--------------------------------------------------
<1> Retrieve the list of `QueryProfileShardResult`
<2> Iterate over each `QueryProfileShardResult`
Each `QueryProfileShardResult` gives access to the detailed query tree execution, returned as a list of
`ProfileResult` objects:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling-queries-results]
--------------------------------------------------
<1> Iterate over the profile results
<2> Retrieve the name of the Lucene query
<3> Retrieve the time in millis spent executing the Lucene query
<4> Retrieve the profile results for the sub-queries (if any)
The Rest API documentation contains more information about {ref}/search-profile.html#profiling-queries[Profiling Queries] with
a description of the query profiling information.
The `QueryProfileShardResult` also gives access to the profiling information for the Lucene collectors:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling-queries-collectors]
--------------------------------------------------
<1> Retrieve the profiling result of the Lucene collector
<2> Retrieve the name of the Lucene collector
<3> Retrieve the time in millis spent executing the Lucene collector
<4> Retrieve the profile results for the sub-collectors (if any)
The Rest API documentation contains more information about profiling information
for Lucene collectors. See {ref}/search-profile.html#profiling-queries[Profiling queries].
In a very similar manner to the query tree execution, the `QueryProfileShardResult` objects gives access
to the detailed aggregations tree execution:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request-profiling-aggs]
--------------------------------------------------
<1> Retrieve the `AggregationProfileShardResult`
<2> Iterate over the aggregation profile results
<3> Retrieve the type of the aggregation (corresponds to Java class used to execute the aggregation)
<4> Retrieve the time in millis spent executing the Lucene collector
<5> Retrieve the profile results for the sub-aggregations (if any)
The Rest API documentation contains more information about
{ref}/search-profile.html#profiling-aggregations[Profiling aggregations].