diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md
index d1362bea..69ca0032 100644
--- a/_field-types/supported-field-types/index.md
+++ b/_field-types/supported-field-types/index.md
@@ -23,7 +23,7 @@ Boolean | [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-t
 IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/): An IP address in IPv4 or IPv6 format.
 [Range]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/range/) | A range of values (`integer_range`, `long_range`, `double_range`, `float_range`, `date_range`, `ip_range`).
 [Object]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object-fields/)| [`object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/object/): A JSON object.<br>[`nested`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/nested/): Used when objects in an array need to be indexed independently as separate documents.<br>[`flat_object`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/flat-object/): A JSON object treated as a string.<br>[`join`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/join/): Establishes a parent-child relationship between documents in the same index.
-[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.<br>[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.<br>[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
+[String]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/string/)|[`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/): Contains a string that is not analyzed.<br>[`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/): Contains a string that is analyzed.<br>[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/): A space-optimized version of a `text` field.<br>[`token_count`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/token-count/): Stores the number of analyzed tokens in a string.
 [Autocomplete]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/autocomplete/) |[`completion`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/completion/): Provides autocomplete functionality through a completion suggester.<br>[`search_as_you_type`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/search-as-you-type/): Provides search-as-you-type functionality using both prefix and infix completion.
 [Geographic]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geographic/)| [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/): A geographic point.<br>[`geo_shape`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-shape/): A geographic shape.
 [Rank]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/) | Boosts or decreases the relevance score of documents (`rank_feature`, `rank_features`).
diff --git a/_field-types/supported-field-types/match-only-text.md b/_field-types/supported-field-types/match-only-text.md
new file mode 100644
index 00000000..fd2c6b58
--- /dev/null
+++ b/_field-types/supported-field-types/match-only-text.md
@@ -0,0 +1,101 @@
+---
+layout: default
+title: Match-only text
+nav_order: 61
+has_children: false
+parent: String field types
+grand_parent: Supported field types
+---
+
+# Match-only text field type
+
+A `match_only_text` field is a variant of a `text` field designed for full-text search when scoring and positional information of terms within a document are not critical.
+
+A `match_only_text` field is different from a `text` field in the following ways:
+
+ - Omits storing positions, frequencies, and norms, reducing storage requirements.
+ - Disables scoring so that all matching documents receive a constant score of 1.0.
+ - Supports all query types except interval and span queries.
+
+Choose the `match_only_text` field type to prioritize efficient full-text search over complex ranking and positional queries while optimizing storage costs. Using `match_only_text` creates significantly smaller indexes, which results in lower storage costs, especially for large datasets.
+
+Use a `match_only_text` field when you need to quickly find documents containing specific terms without the overhead of storing frequencies and positions. The `match_only_text` field type is not the best choice for ranking results based on relevance or for queries that rely on term proximity or order, like interval or span queries. While this field type does support phrase queries, their performance isn't as efficient as when using the `text` field type. If identifying exact phrases or their locations within documents is essential, use the `text` field type instead.
+
+## Example
+
+Create a mapping with a `match_only_text` field:
+
+```json
+PUT movies
+{
+  "mappings" : {
+    "properties" : {
+      "title" : {
+        "type" : "match_only_text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Parameters
+
+While `match_only_text` supports most parameters available for `text` fields, modifying most of them can be counterproductive. This field type is intended to be simple and efficient, minimizing data stored in the index to optimize storage costs. Therefore, keeping the default settings is generally the best approach. Any modifications beyond analyzer settings can reintroduce overhead and negate the efficiency benefits of `match_only_text`.
+
+The following table lists all parameters available for `match_only_text` fields.
+
+Parameter | Description
+:--- | :---
+`analyzer` | The analyzer to be used for the field. By default, it will be used at index time and at search time. To override it at search time, set the `search_analyzer` parameter. Default is the `standard` analyzer, which uses grammar-based tokenization and is based on the [Unicode Text Segmentation](https://unicode.org/reports/tr29/) algorithm.
+`boost` | All hits are assigned a score of 1 and are multiplied by `boost` to produce the final score for the query clause.
+`eager_global_ordinals` | Specifies whether global ordinals should be loaded eagerly on refresh. If the field is often used for aggregations, this parameter should be set to `true`. Default is `false`.
+`fielddata` | A Boolean value that specifies whether to access analyzed tokens for sorting, aggregation, and scripting. Default is `false`.
+`fielddata_frequency_filter` | A JSON object specifying that only those analyzed tokens whose document frequency is between the `min` and `max` values (provided as either an absolute number or a percentage) should be loaded into memory. Frequency is computed per segment. Parameters: `min`, `max`, `min_segment_size`. Default is to load all analyzed tokens.
+`fields` | To index the same string in several ways (for example, as a keyword and text), provide the `fields` parameter. You can specify one version of the field to be used for search and another to be used for sorting and aggregation.
+`index` | A Boolean value that specifies whether the field should be searchable. Default is `true`.
+`index_options` | You cannot modify this parameter.
+`index_phrases` | Not supported.
+`index_prefixes` | Not supported.
+`meta` | Accepts metadata for this field.
+`norms` | Norms are disabled and cannot be enabled.
+`position_increment_gap` | Although positions are disabled, `position_increment_gap` behaves similarly to the `text` field when used in phrase queries. Such queries may be slower but are still functional.
+`similarity` | Setting similarity has no impact. The `match_only_text` field type doesn't support queries like `more_like_this`, which rely on similarity. Use a `keyword` or `text` field for queries that rely on similarity.
+`term_vector` | Term vectors are supported, but using them is discouraged because it contradicts the primary purpose of this field---storage optimization.
+
+## Migrating a field from `text` to `match_only_text`
+
+You can use the [Reindex API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/reindex/) to migrate a field from `text` to `match_only_text` by mapping the field as `match_only_text` in the destination index.
+
+In the following example, the `source` index contains a `title` field of type `text`.
+
+Create a destination index with the `title` field mapped as `match_only_text`:
+
+```json
+PUT destination
+{
+  "mappings" : {
+    "properties" : {
+      "title" : {
+        "type" : "match_only_text"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Reindex the data:
+
+```json
+POST _reindex
+{
+  "source": {
+    "index":"source"
+  },
+  "dest": {
+    "index":"destination"
+  }
+}
+```
+{% include copy-curl.html %}
diff --git a/_field-types/supported-field-types/string.md b/_field-types/supported-field-types/string.md
index f24dea23..c891f86c 100644
--- a/_field-types/supported-field-types/string.md
+++ b/_field-types/supported-field-types/string.md
@@ -18,4 +18,5 @@ Field data type | Description
 :--- | :---
 [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) | A string that is not analyzed. Useful for exact-value search.
 [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) | A string that is analyzed. Useful for full-text search.
+[`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) | A space-optimized version of a `text` field.
 [`token_count`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/token-count/) | Counts the number of tokens in a string.
diff --git a/_field-types/supported-field-types/text.md b/_field-types/supported-field-types/text.md
index 0a16f3aa..16350c0c 100644
--- a/_field-types/supported-field-types/text.md
+++ b/_field-types/supported-field-types/text.md
@@ -12,12 +12,15 @@ redirect_from:
 # Text field type
-A text field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches with multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, stopwords removed, synonyms applied, etc.
+A `text` field type contains a string that is analyzed. It is used for full-text search because it allows partial matches. Searches for multiple terms can match some but not all of them. Depending on the analyzer, results can be case insensitive, stemmed, have stopwords removed, have synonyms applied, and so on.
 If you need to use a field for exact-value search, map it as a [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) instead.
 {: .note }
+The [`match_only_text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/match-only-text/) field is a space-optimized version of the `text` field. If you don't need to query phrases or use positional queries, map the field as `match_only_text` instead of `text`. Positional queries are queries in which the position of the term in the phrase is important, such as interval or span queries.
+{: .note}
+
 ## Example
 Create a mapping with a text field:
diff --git a/_field-types/supported-field-types/token-count.md b/_field-types/supported-field-types/token-count.md
index 07982431..6c3445e6 100644
--- a/_field-types/supported-field-types/token-count.md
+++ b/_field-types/supported-field-types/token-count.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Token count
-nav_order: 48
+nav_order: 70
 has_children: false
 parent: String field types
 grand_parent: Supported field types