kolchfa-aws 5abc22147c
Refactor the Query DSL section (#2904)
* for query dsl index page rewrites for proper index page

Signed-off-by: alicejw <alicejw@amazon.com>

* fix formatting in table

Signed-off-by: alicejw <alicejw@amazon.com>

* update query table intro

Signed-off-by: alicejw <alicejw@amazon.com>

* rmv proprietary from overview

Signed-off-by: alicejw <alicejw@amazon.com>

* awkward sentence fix

Signed-off-by: alicejw <alicejw@amazon.com>

* to add list of all query categories

Signed-off-by: alicejw <alicejw@amazon.com>

* for query category descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* remove commented note

Signed-off-by: alicejw <alicejw@amazon.com>

* update term-level query page

Signed-off-by: alicejw <alicejw@amazon.com>

* for clarity about term and full-text query use cases

Signed-off-by: alicejw <alicejw@amazon.com>

* for parallel bullet list of queries

Signed-off-by: alicejw <alicejw@amazon.com>

* remove redundant word

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* for tech review feedback

Signed-off-by: alicejw <alicejw@amazon.com>

* for entire list of query types we support, even though we don't have document topic pages for them yet.

Signed-off-by: alicejw <alicejw@amazon.com>

* to include full list of query types we support

Signed-off-by: alicejw <alicejw@amazon.com>

* change Boolean to  type for consistency in the section

Signed-off-by: alicejw <alicejw@amazon.com>

* update query type category list title

Signed-off-by: alicejw <alicejw@amazon.com>

* for compound query type definitions

Signed-off-by: alicejw <alicejw@amazon.com>

* for additional descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* for query context descriptions

Signed-off-by: alicejw <alicejw@amazon.com>

* for additional edits to query descriptions list

Signed-off-by: alicejw <alicejw@amazon.com>

* create span query category page and update bullet list on index to cross-reference to it.

Signed-off-by: alicejw <alicejw@amazon.com>

* add pages for geo and shape query category, and add cross-references

Signed-off-by: alicejw <alicejw@amazon.com>

* remove regex it is part of term-level queries

Signed-off-by: alicejw <alicejw@amazon.com>

* for bullet list granular edits

Signed-off-by: alicejw <alicejw@amazon.com>

* put bullet list in alphabetical order

Signed-off-by: alicejw <alicejw@amazon.com>

* for doc review updates

Signed-off-by: alicejw <alicejw@amazon.com>

* reword for reviewer feedback

Signed-off-by: alicejw <alicejw@amazon.com>

* small rewording

Signed-off-by: alicejw <alicejw@amazon.com>

* typo space

Signed-off-by: alicejw <alicejw@amazon.com>

* put topics in alphabetical order in left nav

Signed-off-by: alicejw <alicejw@amazon.com>

* additional reviewer's comment

Signed-off-by: alicejw <alicejw@amazon.com>

* for second doc reviewer's feedback updates

Signed-off-by: alicejw <alicejw@amazon.com>

* for doc reviewer comment that was hidden

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/geo-and-shape.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/index.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/span-query.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/span-query.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* updates from third doc review for tech accuracy requested by editorial

Signed-off-by: alicejw <alicejw@amazon.com>

* create compound query sub-page to move descriptions to make bullet list parallel

Signed-off-by: alicejw <alicejw@amazon.com>

* fix compound query page title

Signed-off-by: alicejw <alicejw@amazon.com>

* add fuzzy query definition

Signed-off-by: alicejw <alicejw@amazon.com>

* for editorial feedback updates

Signed-off-by: alicejw <alicejw@amazon.com>

* Update _opensearch/query-dsl/term.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Refactor Query DSL section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Adds doc review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Fix typo

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Changed periods to colons when introducing code blocks

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: alicejw <alicejw@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: alicejw <alicejw@amazon.com>
Co-authored-by: Alice Williams <88908598+alicejw-aws@users.noreply.github.com>
2023-02-15 17:12:50 -05:00

234 lines
7.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: default
title: Term-level and full-text queries compared
parent: Query DSL
nav_order: 10
---
# Term-level and full-text queries compared
You can use both term-level and full-text queries to search text, but while term-level queries are usually used to search structured data, full-text queries are used for full-text search. The main difference between term-level and full-text queries is that term-level queries search documents for an exact specified term, while full-text queries analyze the query string. The following table summarizes the differences between term-level and full-text queries.
| | Term-level queries | Full-text queries
:--- | :--- | :---
*Description* | Term-level queries answer which documents match a query. | Full-text queries answer how well the documents match a query.
*Analyzer* | The search term isn't analyzed. This means that the term query searches for your search term as it is. | The search term is analyzed by the same analyzer that was used for the specific document field at the time it was indexed. This means that your search term goes through the same analysis process as the document's field.
*Relevance* | Term-level queries simply return documents that match without sorting them based on the relevance score. They still calculate the relevance score, but this score is the same for all the documents that are returned. | Full-text queries calculate a relevance score for each match and sort the results by decreasing order of relevance.
*Use Case* | Use term-level queries when you want to match exact values such as numbers, dates, or tags and don't need the matches to be sorted by relevance. | Use full-text queries to match text fields and sort by relevance after taking into account factors like casing and stemming variants.
OpenSearch uses the BM25 ranking algorithm to calculate relevance scores. To learn more, see [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25).
{: .note }
## Should I use a full-text or a term-level query?
To clarify the difference between full-text and term-level queries, consider the following two examples that search for a specific text phrase. The complete works of Shakespeare are indexed in an OpenSearch cluster.
### Example: Phrase search
In this example, you'll search the complete works of Shakespeare for the phrase "To be, or not to be" in the `text_entry` field.
First, use a **term-level query** for this search:
```json
GET shakespeare/_search
{
"query": {
"term": {
"text_entry": "To be, or not to be"
}
}
}
```
The response contains no matches, indicated by zero `hits`:
```json
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
```
This is because the term “To be, or not to be” is searched literally in the inverted index, where only the analyzed values of the text fields are stored. Term-level queries arent suited for searching analyzed text fields because they often yield unexpected results. When working with text data, use term-level queries only for fields mapped as `keyword`.
Now search for the same phrase using a **full-text query**:
```json
GET shakespeare/_search
{
"query": {
"match": {
"text_entry": "To be, or not to be"
}
}
}
```
The search query “To be, or not to be” is analyzed and tokenized into an array of tokens just like the `text_entry` field of the documents. The full-text query takes an intersection of tokens between the search query and the `text_entry` fields for all the documents, and then sorts the results by relevance score:
```json
{
"took" : 19,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 17.419369,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "34229",
"_score" : 17.419369,
"_source" : {
"type" : "line",
"line_id" : 34230,
"play_name" : "Hamlet",
"speech_number" : 19,
"line_number" : "3.1.64",
"speaker" : "HAMLET",
"text_entry" : "To be, or not to be: that is the question:"
}
},
{
"_index" : "shakespeare",
"_id" : "109930",
"_score" : 14.883024,
"_source" : {
"type" : "line",
"line_id" : 109931,
"play_name" : "A Winters Tale",
"speech_number" : 23,
"line_number" : "4.4.153",
"speaker" : "PERDITA",
"text_entry" : "Not like a corse; or if, not to be buried,"
}
},
{
"_index" : "shakespeare",
"_id" : "103117",
"_score" : 14.782743,
"_source" : {
"type" : "line",
"line_id" : 103118,
"play_name" : "Twelfth Night",
"speech_number" : 53,
"line_number" : "1.3.95",
"speaker" : "SIR ANDREW",
"text_entry" : "will not be seen; or if she be, its four to one"
}
}
]
}
}
...
```
For a list of all full-text queries, see [Full-text queries]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/full-text/index).
### Example: Exact term search
If you want to search for an exact term like “HAMLET” in the `speaker` field and don't need the results to be sorted by relevance score, a term-level query is more efficient:
```json
GET shakespeare/_search
{
"query": {
"term": {
"speaker": "HAMLET"
}
}
}
```
The response contains document matches:
```json
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1582,
"relation" : "eq"
},
"max_score" : 4.2540946,
"hits" : [
{
"_index" : "shakespeare",
"_id" : "32700",
"_score" : 4.2540946,
"_source" : {
"type" : "line",
"line_id" : 32701,
"play_name" : "Hamlet",
"speech_number" : 9,
"line_number" : "1.2.66",
"speaker" : "HAMLET",
"text_entry" : "[Aside] A little more than kin, and less than kind."
}
},
{
"_index" : "shakespeare",
"_id" : "32702",
"_score" : 4.2540946,
"_source" : {
"type" : "line",
"line_id" : 32703,
"play_name" : "Hamlet",
"speech_number" : 11,
"line_number" : "1.2.68",
"speaker" : "HAMLET",
"text_entry" : "Not so, my lord; I am too much i' the sun."
}
},
{
"_index" : "shakespeare",
"_id" : "32709",
"_score" : 4.2540946,
"_source" : {
"type" : "line",
"line_id" : 32710,
"play_name" : "Hamlet",
"speech_number" : 13,
"line_number" : "1.2.75",
"speaker" : "HAMLET",
"text_entry" : "Ay, madam, it is common."
}
}
]
}
}
...
```
The term-level queries provide exact matches. So if you search for “Hamlet”, you dont receive any matches, because “HAMLET” is a keyword field and is stored in OpenSearch literally and not in an analyzed form.
The search query “HAMLET” is also searched literally. So to get a match for this field, we need to enter the exact same characters.