mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-03-25 01:19:02 +00:00
Docs: Refactored the mapping meta-fields docs
parent
c4778b8e78
commit
c56ce0e242
@ -52,7 +52,7 @@ creating a new index.
|
||||
[float]
|
||||
=== Mapper settings
|
||||
|
||||
`index.mapper.dynamic` (_static_)::
|
||||
`index.mapper.dynamic` (_dynamic_)::
|
||||
|
||||
Dynamic creation of mappings for unmapped types can be completely
|
||||
disabled by setting `index.mapper.dynamic` to `false`.
|
||||
|
@ -1,27 +1,78 @@
|
||||
[[mapping-fields]]
|
||||
== Fields
|
||||
== Meta-Fields
|
||||
|
||||
Each mapping has a number of fields associated with it
|
||||
which can be used to control how the document metadata
|
||||
(eg <<mapping-all-field>>) is indexed.
|
||||
Each document has metadata associated with it, such as the `_index`, mapping
|
||||
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
|
||||
can be customised when a mapping type is created.
|
||||
|
||||
The meta-fields are:
|
||||
|
||||
[horizontal]
|
||||
<<mapping-index-field,`_index`>>::
|
||||
|
||||
The index to which the document belongs.
|
||||
|
||||
<<mapping-uid-field,`_uid`>>::
|
||||
|
||||
A composite field consisting of the `_type` and the `_id`.
|
||||
|
||||
<<mapping-type-field,`_type`>>::
|
||||
|
||||
The document's <<all-mapping-types,mapping type>>.
|
||||
|
||||
<<mapping-id-field,`_id`>>::
|
||||
|
||||
The document's ID.
|
||||
|
||||
<<mapping-source-field,`_source`>>::
|
||||
|
||||
The original JSON representing the body of the document.
|
||||
|
||||
<<mapping-all-field,`_all`>>::
|
||||
|
||||
A _catch-all_ field that indexes the values of all other fields.
|
||||
|
||||
<<mapping-field-names-field,`_field_names`>>::
|
||||
|
||||
All fields in the document which contain non-null values.
|
||||
|
||||
<<mapping-parent-field,`_parent`>>::
|
||||
|
||||
Used to create a parent-child relationship between two mapping types.
|
||||
|
||||
<<mapping-routing-field,`_routing`>>::
|
||||
|
||||
A custom routing value which routes a document to a particular shard.
|
||||
|
||||
<<mapping-size-field,`_size`>>::
|
||||
|
||||
The size of the `_source` field in bytes.
|
||||
|
||||
<<mapping-timestamp-field,`_timestamp`>>::
|
||||
|
||||
A timestamp associated with the document, either specified manually or auto-generated.
|
||||
|
||||
<<mapping-ttl-field,`_ttl`>>::
|
||||
|
||||
How long a document should live before it is automatically deleted.
|
||||
|
||||
include::fields/index-field.asciidoc[]
|
||||
|
||||
include::fields/uid-field.asciidoc[]
|
||||
|
||||
include::fields/id-field.asciidoc[]
|
||||
|
||||
include::fields/type-field.asciidoc[]
|
||||
|
||||
include::fields/id-field.asciidoc[]
|
||||
|
||||
include::fields/source-field.asciidoc[]
|
||||
|
||||
include::fields/all-field.asciidoc[]
|
||||
|
||||
include::fields/parent-field.asciidoc[]
|
||||
|
||||
include::fields/field-names-field.asciidoc[]
|
||||
|
||||
include::fields/routing-field.asciidoc[]
|
||||
include::fields/parent-field.asciidoc[]
|
||||
|
||||
include::fields/index-field.asciidoc[]
|
||||
include::fields/routing-field.asciidoc[]
|
||||
|
||||
include::fields/size-field.asciidoc[]
|
||||
|
||||
|
@ -1,78 +1,416 @@
|
||||
[[mapping-all-field]]
|
||||
=== `_all`
|
||||
=== `_all` field
|
||||
|
||||
The idea of the `_all` field is that it includes the text of one or more
|
||||
other fields within the document indexed. It can come in very handy
|
||||
especially for search requests, where we want to execute a search query
|
||||
against the content of a document, without knowing which fields to
|
||||
search on. This comes at the expense of CPU cycles and index size.
|
||||
The `_all` field is a special _catch-all_ field which concatenates the values
|
||||
of all of the other fields into one big string, which is then
|
||||
<<analysis,analyzed>> and indexed, but not stored. This means that it can be
|
||||
searched, but not retrieved.
|
||||
|
||||
The `_all` field can be completely disabled. Explicit field mappings and
|
||||
object mappings can be excluded / included in the `_all` field. By
|
||||
default, it is enabled and all fields are included in it for ease of
|
||||
use.
|
||||
|
||||
When disabling the `_all` field, it is a good practice to set
|
||||
`index.query.default_field` to a different value (for example, if you
|
||||
have a main "message" field in your data, set it to `message`).
|
||||
|
||||
One of the nice features of the `_all` field is that it takes into
|
||||
account specific fields boost levels. Meaning that if a title field is
|
||||
boosted more than content, the title (part) in the `_all` field will
|
||||
mean more than the content (part) in the `_all` field.
|
||||
|
||||
Here is a sample mapping:
|
||||
The `_all` field allows you to search for values in documents without knowing
|
||||
which field contains the value. This makes it a useful option when getting
|
||||
started with a new dataset. For instance:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
--------------------------------
|
||||
PUT my_index/user/1 <1>
|
||||
{
|
||||
"person" : {
|
||||
"_all" : {"enabled" : true},
|
||||
"properties" : {
|
||||
"name" : {
|
||||
"type" : "object",
|
||||
"dynamic" : false,
|
||||
"properties" : {
|
||||
"first" : {"type" : "string", "store" : true , "include_in_all" : false},
|
||||
"last" : {"type" : "string", "index" : "not_analyzed"}
|
||||
}
|
||||
},
|
||||
"address" : {
|
||||
"type" : "object",
|
||||
"include_in_all" : false,
|
||||
"properties" : {
|
||||
"first" : {
|
||||
"properties" : {
|
||||
"location" : {"type" : "string", "store" : true}
|
||||
}
|
||||
},
|
||||
"last" : {
|
||||
"properties" : {
|
||||
"location" : {"type" : "string"}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"simple1" : {"type" : "long", "include_in_all" : true},
|
||||
"simple2" : {"type" : "long", "include_in_all" : false}
|
||||
}
|
||||
}
|
||||
"first_name": "John",
|
||||
"last_name": "Smith",
|
||||
"date_of_birth": "1970-10-24"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The `_all` field allows for `store`, `term_vector` and `analyzer` (with
|
||||
specific `analyzer` and `search_analyzer`) to be set.
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"_all": "john smith 1970"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
<1> The `_all` field will contain the terms: [ `"john"`, `"smith"`, `"1970"`, `"10"`, `"24"` ]
|
||||
|
||||
[float]
|
||||
[[highlighting]]
|
||||
==== Highlighting
|
||||
[NOTE]
|
||||
.All values treated as strings
|
||||
=============================================================================
|
||||
|
||||
The `date_of_birth` field in the above example is recognised as a `date` field
|
||||
and so will index a single term representing `1970-10-24 00:00:00 UTC`. The
|
||||
`_all` field, however, treats all values as strings, so the date value is
|
||||
indexed as the three string terms: `"1970"`, `"10"`, `"24"`.
|
||||
|
||||
It is important to note that the `_all` field combines the original values
|
||||
from each field as a string. It does not combine the _terms_ from each field.
|
||||
|
||||
=============================================================================
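
The note above can be illustrated with a minimal Python sketch. It is not how the `_all` field is implemented: the function name `all_field_terms` is hypothetical, and a simple lowercase-and-split regex stands in for the real analyzer. It only shows that the original values are joined as strings first and broken into terms afterwards.

```python
import re


def all_field_terms(doc):
    """Approximate the terms indexed into `_all` for a flat document.

    Hypothetical helper: a regex split stands in for the real analyzer.
    """
    # Every value is treated as a string and joined into one big string.
    combined = " ".join(str(value) for value in doc.values())
    # Lowercase, then split on anything that is not a letter or digit.
    return [token for token in re.split(r"[^0-9a-z]+", combined.lower()) if token]


doc = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-10-24",
}

print(all_field_terms(doc))  # ['john', 'smith', '1970', '10', '24']
```

The date is split into three string terms because `_all` sees only the original string `1970-10-24`, not the single date term indexed in the `date_of_birth` field.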
|
||||
|
||||
The `_all` field is just a <<string,`string`>> field, and accepts the same
|
||||
parameters that other string fields accept, including `analyzer`,
|
||||
`term_vectors`, `index_options`, and `store`.
|
||||
|
||||
The `_all` field can be useful, especially when exploring new data using
|
||||
simple filtering. However, by concatenating field values into one big string,
|
||||
the `_all` field loses the distinction between short fields (more relevant)
|
||||
and long fields (less relevant). For use cases where search relevance is
|
||||
important, it is better to query individual fields specifically.
|
||||
|
||||
The `_all` field is not free: it requires extra CPU cycles and uses more disk
|
||||
space. If not needed, it can be completely <<disabling-all-field,disabled>> or
|
||||
customised on a <<include-in-all,per-field basis>>.
|
||||
|
||||
[[querying-all-field]]
|
||||
==== Using the `_all` field in queries
|
||||
|
||||
The <<query-dsl-query-string-query,`query_string`>> and
|
||||
<<query-dsl-simple-query-string-query,`simple_query_string`>> queries query
|
||||
the `_all` field by default, unless another field is specified:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
GET _search
|
||||
{
|
||||
"query": {
|
||||
"query_string": {
|
||||
"query": "john smith 1970"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
The same goes for the `?q=` parameter in <<search-uri-request, URI search
|
||||
requests>> (which is rewritten to a `query_string` query internally):
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
GET _search?q=john+smith+1970
|
||||
--------------------------------
|
||||
|
||||
Other queries, such as the <<query-dsl-match-query,`match`>> and
|
||||
<<query-dsl-term-query,`term`>> queries require you to specify
|
||||
the `_all` field explicitly, as per the
|
||||
<<mapping-all-field,first example>>.
|
||||
|
||||
[[disabling-all-field]]
|
||||
==== Disabling the `_all` field
|
||||
|
||||
The `_all` field can be completely disabled per-type by setting `enabled` to
|
||||
`false`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"type_1": { <1>
|
||||
"properties": {...}
|
||||
},
|
||||
"type_2": { <2>
|
||||
"_all": {
|
||||
"enabled": false
|
||||
},
|
||||
"properties": {...}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> The `_all` field in `type_1` is enabled.
|
||||
<2> The `_all` field in `type_2` is completely disabled.
|
||||
|
||||
If the `_all` field is disabled, then URI search requests and the
|
||||
`query_string` and `simple_query_string` queries will not be able to use it
|
||||
for queries (see <<querying-all-field>>). You can configure them to use a
|
||||
different field with the `index.query.default_field` setting:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_all": {
|
||||
"enabled": false <1>
|
||||
},
|
||||
"properties": {
|
||||
"content": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"settings": {
|
||||
"index.query.default_field": "content" <2>
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> The `_all` field is disabled for the `my_type` type.
|
||||
<2> The `query_string` query will default to querying the `content` field in this index.
|
||||
|
||||
[[include-in-all]]
|
||||
==== Including specific fields in `_all`
|
||||
|
||||
Individual fields can be included or excluded from the `_all` field with the
|
||||
`include_in_all` setting, which defaults to `true`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"properties": {
|
||||
"title": { <1>
|
||||
"type": "string"
|
||||
},
|
||||
"content": { <1>
|
||||
"type": "string"
|
||||
},
|
||||
"date": { <2>
|
||||
"type": "date",
|
||||
"include_in_all": false
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> The `title` and `content` fields will be included in the `_all` field.
|
||||
<2> The `date` field will not be included in the `_all` field.
|
||||
|
||||
The `include_in_all` parameter can also be set at the type level and on
|
||||
<<mapping-object-type,`object`>> or <<mapping-nested-type,`nested`>> fields,
|
||||
in which case all sub-fields inherit that setting. For instance:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"include_in_all": false, <1>
|
||||
"properties": {
|
||||
"title": { "type": "string" },
|
||||
"author": {
|
||||
"include_in_all": true, <2>
|
||||
"properties": {
|
||||
"first_name": { "type": "string" },
|
||||
"last_name": { "type": "string" }
|
||||
}
|
||||
},
|
||||
"editor": {
|
||||
"properties": {
|
||||
"first_name": { "type": "string" }, <3>
|
||||
"last_name": { "type": "string", "include_in_all": true } <3>
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> All fields in `my_type` are excluded from `_all`.
|
||||
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
|
||||
<3> Only the `editor.last_name` field is included in `_all`.
|
||||
The `editor.first_name` inherits the type-level setting and is excluded.
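
The inheritance rule described above can be sketched in Python. This is an illustration only: `resolve_include_in_all` is a hypothetical helper, not part of any client library, and it assumes that an explicit `include_in_all` on a field or object overrides the inherited value, with `true` as the overall default.

```python
def resolve_include_in_all(properties, inherited=True, prefix=""):
    """Return {field_path: bool} for every leaf field in a mapping.

    Hypothetical helper that mirrors the inheritance rule described above.
    """
    resolved = {}
    for name, field in properties.items():
        path = prefix + name
        # An explicit setting overrides whatever was inherited.
        include = field.get("include_in_all", inherited)
        if "properties" in field:
            # Object field: its sub-fields inherit the object's setting.
            resolved.update(
                resolve_include_in_all(field["properties"], include, path + ".")
            )
        else:
            resolved[path] = include
    return resolved


my_type = {
    "include_in_all": False,
    "properties": {
        "title": {"type": "string"},
        "author": {
            "include_in_all": True,
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
        "editor": {
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string", "include_in_all": True},
            },
        },
    },
}

effective = resolve_include_in_all(
    my_type["properties"], inherited=my_type.get("include_in_all", True)
)
for path, included in sorted(effective.items()):
    print(path, included)
```

With the mapping from the example, only `author.first_name`, `author.last_name`, and `editor.last_name` resolve to `True`.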
|
||||
|
||||
[[all-field-and-boosting]]
|
||||
==== Index boosting and the `_all` field
|
||||
|
||||
Individual fields can be _boosted_ at index time, with the `boost` parameter.
|
||||
The `_all` field takes these boosts into account:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT myindex
|
||||
{
|
||||
"mappings": {
|
||||
"mytype": {
|
||||
"properties": {
|
||||
"title": { <1>
|
||||
"type": "string",
|
||||
"boost": 2
|
||||
},
|
||||
"content": { <1>
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> When querying the `_all` field, words that originated in the
|
||||
`title` field are twice as relevant as words that originated in
|
||||
the `content` field.
|
||||
|
||||
WARNING: Using index-time boosting with the `_all` field has a significant
|
||||
impact on query performance. Usually the better solution is to query fields
|
||||
individually, with optional query time boosting.
|
||||
|
||||
|
||||
[[custom-all-fields]]
|
||||
==== Custom `_all` fields
|
||||
|
||||
While there is only a single `_all` field per index, the <<copy-to,`copy_to`>>
|
||||
parameter allows the creation of multiple __custom `_all` fields__. For
|
||||
instance, `first_name` and `last_name` fields can be combined together into
|
||||
the `full_name` field:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT myindex
|
||||
{
|
||||
"mappings": {
|
||||
"mytype": {
|
||||
"properties": {
|
||||
"first_name": {
|
||||
"type": "string",
|
||||
"copy_to": "full_name" <1>
|
||||
},
|
||||
"last_name": {
|
||||
"type": "string",
|
||||
"copy_to": "full_name" <1>
|
||||
},
|
||||
"full_name": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT myindex/mytype/1
|
||||
{
|
||||
"first_name": "John",
|
||||
"last_name": "Smith"
|
||||
}
|
||||
|
||||
GET myindex/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"full_name": "John Smith"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> The `first_name` and `last_name` values are copied to the `full_name` field.
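
As a rough mental model, `copy_to` adds each source value to the target field at index time without changing the original document. The Python sketch below simulates that for flat documents. The helper `apply_copy_to` is hypothetical, and it assumes `copy_to` may be a single field name or a list of names.

```python
def apply_copy_to(properties, doc):
    """Return the values each field would index, honouring `copy_to`.

    Hypothetical simulation for flat documents; the input doc is unchanged.
    """
    indexed = {name: [value] for name, value in doc.items()}
    for name, value in doc.items():
        targets = properties.get(name, {}).get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            # The value is added to the target field as an extra value.
            indexed.setdefault(target, []).append(value)
    return indexed


properties = {
    "first_name": {"type": "string", "copy_to": "full_name"},
    "last_name": {"type": "string", "copy_to": "full_name"},
    "full_name": {"type": "string"},
}
doc = {"first_name": "John", "last_name": "Smith"}

indexed = apply_copy_to(properties, doc)
print(indexed["full_name"])  # ['John', 'Smith']
```

The source document still contains only `first_name` and `last_name`; `full_name` exists only in the index, which is why it can be searched like a custom `_all` field.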
|
||||
|
||||
[[highlighting-all-field]]
|
||||
==== Highlighting and the `_all` field
|
||||
|
||||
A field can only be used for <<search-request-highlighting,highlighting>> if
|
||||
the original string value is available, either from the
|
||||
<<mapping-source-field,`_source`>> field or as a stored field.
|
||||
|
||||
The `_all` field is not present in the `_source` field and it is not stored by
|
||||
default, and so cannot be highlighted. There are two options. Either
|
||||
<<all-field-store,store the `_all` field>> or highlight the
|
||||
<<all-highlight-fields,original fields>>.
|
||||
|
||||
[[all-field-store]]
|
||||
===== Store the `_all` field
|
||||
|
||||
If `store` is set to `true`, then the original field value is retrievable and
|
||||
can be highlighted:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT myindex
|
||||
{
|
||||
"mappings": {
|
||||
"mytype": {
|
||||
"_all": {
|
||||
"store": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT myindex/mytype/1
|
||||
{
|
||||
"first_name": "John",
|
||||
"last_name": "Smith"
|
||||
}
|
||||
|
||||
GET _search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"_all": "John Smith"
|
||||
}
|
||||
},
|
||||
"highlight": {
|
||||
"fields": {
|
||||
"_all": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
Of course, storing the `_all` field will use significantly more disk space
|
||||
and, because it is a combination of other fields, it may result in odd
|
||||
highlighting results.
|
||||
|
||||
The `_all` field also accepts the `term_vector` and `index_options`
|
||||
parameters, allowing the use of the fast vector highlighter and the postings
|
||||
highlighter.
|
||||
|
||||
[[all-highlight-fields]]
|
||||
===== Highlight original fields
|
||||
|
||||
You can query the `_all` field, but use the original fields for highlighting as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------
|
||||
PUT myindex
|
||||
{
|
||||
"mappings": {
|
||||
"mytype": {
|
||||
"_all": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT myindex/mytype/1
|
||||
{
|
||||
"first_name": "John",
|
||||
"last_name": "Smith"
|
||||
}
|
||||
|
||||
GET _search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"_all": "John Smith" <1>
|
||||
}
|
||||
},
|
||||
"highlight": {
|
||||
"fields": {
|
||||
"*_name": { <2>
|
||||
"require_field_match": "false" <3>
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> The query inspects the `_all` field to find matching documents.
|
||||
<2> Highlighting is performed on the two name fields, which are available from the `_source`.
|
||||
<3> The query wasn't run against the name fields, so set `require_field_match` to `false`.
|
||||
|
||||
For any field to allow
|
||||
<<search-request-highlighting,highlighting>> it has
|
||||
to be either stored or part of the `_source` field. By default the `_all`
|
||||
field does not qualify for either, so highlighting for it does not yield
|
||||
any data.
|
||||
|
||||
Although it is possible to `store` the `_all` field, it is basically an
|
||||
aggregation of all fields, which means more data will be stored, and
|
||||
highlighting it might produce strange results.
|
||||
|
@ -1,6 +1,55 @@
|
||||
[[mapping-field-names-field]]
|
||||
=== `_field_names`
|
||||
=== `_field_names` field
|
||||
|
||||
The `_field_names` field indexes the names of every field in a document that
|
||||
contains any value other than `null`. This field is used by the
|
||||
<<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>>
|
||||
queries to find documents that either have or don't have any non-+null+ value
|
||||
for a particular field.
|
||||
|
||||
The value of the `_field_names` field is accessible in queries, aggregations, and
|
||||
scripts:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT my_index/my_type/1
|
||||
{
|
||||
"title": "This is a document"
|
||||
}
|
||||
|
||||
PUT my_index/my_type/2
|
||||
{
|
||||
"title": "This is another document",
|
||||
"body": "This document has a body"
|
||||
}
|
||||
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_field_names": [ "title" ] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"Field names": {
|
||||
"terms": {
|
||||
"field": "_field_names", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"script_fields": {
|
||||
"Field names": {
|
||||
"script": "doc['_field_names']" <3>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_field_names` field (also see the <<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>> queries)
|
||||
<2> Aggregating on the `_field_names` field
|
||||
<3> Accessing the `_field_names` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
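
The rule behind `_field_names` can be sketched in a few lines of Python. The helpers `field_names`, `exists`, and `missing` are hypothetical and handle flat documents only. Treating an array that contains only `null` values as having no value is an assumption of this sketch.

```python
def field_names(doc):
    """Names of the fields in a flat document that hold a non-null value."""

    def has_value(value):
        # Assumption: a list counts only if at least one element is non-null.
        if isinstance(value, list):
            return any(has_value(item) for item in value)
        return value is not None

    return sorted(name for name, value in doc.items() if has_value(value))


def exists(doc, field):
    """Rough equivalent of an `exists` check for one document."""
    return field in field_names(doc)


def missing(doc, field):
    """Rough equivalent of a `missing` check for one document."""
    return not exists(doc, field)


doc_1 = {"title": "This is a document"}
doc_2 = {"title": "This is another document", "body": "This document has a body"}

print(field_names(doc_1))     # ['title']
print(field_names(doc_2))     # ['body', 'title']
print(missing(doc_1, "body"))  # True
```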
|
||||
|
||||
The `_field_names` field indexes the field names of a document, which can later
|
||||
be used to search for documents based on the fields that they contain typically
|
||||
using the `exists` and `missing` filters.
|
||||
|
@ -1,11 +1,44 @@
|
||||
[[mapping-id-field]]
|
||||
=== `_id`
|
||||
=== `_id` field
|
||||
|
||||
Each document indexed is associated with an id and a type. The `_id`
|
||||
field allows accessing only the id of a document.
|
||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
|
||||
`_id` field is not indexed as its value can be derived automatically from the
|
||||
<<mapping-uid-field,`_uid`>> field.
|
||||
|
||||
Note, even though the `_id` is not indexed, all the APIs still work
|
||||
(since they work with the `_uid` field), as well as fetching by ids
|
||||
using `term`, `terms` or `prefix` queries/filters (including the
|
||||
specific `ids` query/filter).
|
||||
The value of the `_id` field is accessible in queries and scripts, but _not_
|
||||
in aggregations or when sorting, where the <<mapping-uid-field,`_uid`>> field
|
||||
should be used instead:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT my_index/my_type/1
|
||||
{
|
||||
"text": "Document with ID 1"
|
||||
}
|
||||
|
||||
PUT my_index/my_type/2
|
||||
{
|
||||
"text": "Document with ID 2"
|
||||
}
|
||||
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_id": [ "1", "2" ] <1>
|
||||
}
|
||||
},
|
||||
"script_fields": {
|
||||
"UID": {
|
||||
"script": "doc['_id']" <2>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
|
||||
<2> Accessing the `_id` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
|
@ -1,15 +1,56 @@
|
||||
[[mapping-index-field]]
|
||||
=== `_index`
|
||||
=== `_index` field
|
||||
|
||||
The ability to store in a document the index it belongs to. By default
|
||||
it is disabled, in order to enable it, the following mapping should be
|
||||
defined:
|
||||
The name of the index that contains the document. This field is not indexed
|
||||
but can be automatically derived from the index itself.
|
||||
|
||||
Its value is accessible in queries, aggregations, scripts, and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT index_1/my_type/1
|
||||
{
|
||||
"tweet" : {
|
||||
"_index" : { "enabled" : true }
|
||||
}
|
||||
"text": "Document in index 1"
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
PUT index_2/my_type/2
|
||||
{
|
||||
"text": "Document in index 2"
|
||||
}
|
||||
|
||||
GET index_1,index_2/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_index": ["index_1", "index_2"] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"indices": {
|
||||
"terms": {
|
||||
"field": "_index", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_index": { <3>
|
||||
"order": "asc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"index_name": {
|
||||
"script": "doc['_index']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_index` field
|
||||
<2> Aggregating on the `_index` field
|
||||
<3> Sorting on the `_index` field
|
||||
<4> Accessing the `_index` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
@ -1,54 +1,165 @@
|
||||
[[mapping-parent-field]]
|
||||
=== `_parent`
|
||||
=== `_parent` field
|
||||
|
||||
TIP: It is highly recommended to reindex all indices with a `_parent` field created before version 2.x.
|
||||
The reason for this is to gain from all the optimizations added with the 2.0 release.
|
||||
added[2.0.0,The parent-child implementation has been completely rewritten. It is advisable to reindex any 1.x indices which use parent-child to take advantage of the new optimizations]
|
||||
|
||||
The parent field mapping is defined on a child mapping, and points to
|
||||
the parent type this child relates to. For example, in case of a `blog`
|
||||
type and a `blog_tag` type child document, the mapping for `blog_tag`
|
||||
should be:
|
||||
A parent-child relationship can be established between documents in the same
|
||||
index by making one mapping type the parent of another:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"blog_tag" : {
|
||||
"_parent" : {
|
||||
"type" : "blog"
|
||||
}
|
||||
"mappings": {
|
||||
"my_parent": {},
|
||||
"my_child": {
|
||||
"_parent": {
|
||||
"type": "my_parent" <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT my_index/my_parent/1 <2>
|
||||
{
|
||||
"text": "This is a parent document"
|
||||
}
|
||||
|
||||
PUT my_index/my_child/2?parent=1 <3>
|
||||
{
|
||||
"text": "This is a child document"
|
||||
}
|
||||
|
||||
PUT my_index/my_child/3?parent=1 <3>
|
||||
{
|
||||
"text": "This is another child document"
|
||||
}
|
||||
|
||||
GET my_index/my_parent/_search
|
||||
{
|
||||
"query": {
|
||||
"has_child": { <4>
|
||||
"type": "my_child",
|
||||
"query": {
|
||||
"match": {
|
||||
"text": "child document"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// AUTOSENSE
|
||||
<1> The `my_parent` type is parent to the `my_child` type.
|
||||
<2> Index a parent document.
|
||||
<3> Index two child documents, specifying the parent document's ID.
|
||||
<4> Find all parent documents that have children which match the query.
|
||||
|
||||
The mapping is automatically stored and indexed (meaning it can be
|
||||
searched on using the `_parent` field notation).
|
||||
|
||||
==== Limitations
|
||||
See the <<query-dsl-has-child-query,`has_child`>> and
|
||||
<<query-dsl-has-parent-query,`has_parent`>> queries,
|
||||
the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation,
|
||||
and <<parent-child-inner-hits,inner hits>> for more information.
|
||||
|
||||
The `_parent.type` setting can only point to a type that doesn't exist yet.
|
||||
This means that a type can't become a parent type after it has been created.
|
||||
The value of the `_parent` field is accessible in queries, aggregations, scripts,
|
||||
and when sorting:
|
||||
|
||||
The `_parent.type` setting can't point to itself. This means self-referential
|
||||
parent/child isn't supported.
|
||||
[source,js]
|
||||
--------------------------
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_parent": [ "1" ] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"parents": {
|
||||
"terms": {
|
||||
"field": "_parent", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_parent": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"parent": {
|
||||
"script": "doc['_parent']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_parent` field (also see the <<query-dsl-has-parent-query,`has_parent` query>> and the <<query-dsl-has-child-query,`has_child` query>>)
|
||||
<2> Aggregating on the `_parent` field (also see the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation)
|
||||
<3> Sorting on the `_parent` field
|
||||
<4> Accessing the `_parent` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
|
||||
==== Parent-child restrictions
|
||||
|
||||
* The parent and child types must be different -- parent-child relationships
|
||||
cannot be established between documents of the same type.
|
||||
|
||||
* The `_parent.type` setting can only point to a type that doesn't exist yet.
|
||||
This means that a type cannot become a parent type after it has been
|
||||
created.
|
||||
|
||||
* Parent and child documents must be indexed on the same shard. The `parent`
|
||||
ID is used as the <<mapping-routing-field,routing>> value for the child,
|
||||
to ensure that the child is indexed on the same shard as the parent.
|
||||
This means that the same `parent` value needs to be provided when
|
||||
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
|
||||
a child document.
|
||||
|
||||
==== Global ordinals
|
||||
|
||||
Parent-child uses <<global-ordinals,global ordinals>> to speed up joins and global ordinals need to be rebuilt after any change to a shard.
|
||||
The more parent id values are stored in a shard, the longer it takes to rebuild global ordinals for the `_parent` field.
|
||||
Parent-child uses <<global-ordinals,global ordinals>> to speed up joins.
|
||||
Global ordinals need to be rebuilt after any change to a shard. The more
|
||||
parent id values are stored in a shard, the longer it takes to rebuild the
|
||||
global ordinals for the `_parent` field.
|
||||
|
||||
Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals.
|
||||
This can introduce a significant latency spike for your users. You can use <<fielddata-loading,eager_global_ordinals>> to shift the cost of building global ordinals
|
||||
from query time to refresh time, by mapping the _parent field as follows:
|
||||
|
||||
==== Memory usage
|
||||
|
||||
The only on heap memory used by parent/child is the global ordinals for the `_parent` field.
|
||||
|
||||
How much memory is used for the global ordinals for the `_parent` field in the fielddata cache
|
||||
can be checked via the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
|
||||
APIs, e.g.:
|
||||
Global ordinals, by default, are built lazily: the first parent-child query or
|
||||
aggregation after a refresh will trigger building of global ordinals. This can
|
||||
introduce a significant latency spike for your users. You can use
|
||||
<<fielddata-loading,eager_global_ordinals>> to shift the cost of building global
|
||||
ordinals from query time to refresh time, by mapping the `_parent` field as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"my_parent": {},
|
||||
"my_child": {
|
||||
"_parent": {
|
||||
"type": "my_parent",
|
||||
"fielddata": {
|
||||
"loading": "eager_global_ordinals"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
The amount of heap used by global ordinals can be checked as follows:
|
||||
|
||||
[source,sh]
|
||||
--------------------------------------------------
|
||||
# Per-index
|
||||
GET _stats/fielddata?human&fields=_parent
|
||||
|
||||
# Per-node per-index
|
||||
GET _nodes/stats/indices/fielddata?human&fields=_parent
|
||||
--------------------------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
|
@ -1,22 +1,134 @@
|
||||
[[mapping-routing-field]]
|
||||
=== `_routing`
|
||||
=== `_routing` field
|
||||
|
||||
The routing field allows to control the `_routing` aspect when indexing
|
||||
data and explicit routing control is required. It is stored and indexed.
|
||||
A document is routed to a particular shard in an index using the following
|
||||
formula:
|
||||
|
||||
[float]
|
||||
==== required
|
||||
shard_num = hash(_routing) % num_primary_shards
|
||||
|
||||
Another aspect of the `_routing` mapping is the ability to define it as
|
||||
required by setting `required` to `true`. This is very important to set
|
||||
when using routing features, as it allows different APIs to make use of
|
||||
it. For example, an index operation will be rejected if no routing value
|
||||
has been provided.
|
||||
The default value used for `_routing` is the document's <<mapping-id-field,`_id`>>
|
||||
or the document's <<mapping-parent-field,`_parent`>> ID, if present.
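The formula and fallback order above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib.crc32` is a stand-in hash chosen for the sketch (it is not claimed to be the hash Elasticsearch uses), and `shard_for` and its parameter names are hypothetical.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, num_primary_shards: int,
              routing: Optional[str] = None,
              parent_id: Optional[str] = None) -> int:
    """Return the shard number for a document.

    Mirrors shard_num = hash(_routing) % num_primary_shards, where the
    routing value falls back to the parent ID, then to the document ID.
    """
    if num_primary_shards < 1:
        raise ValueError("num_primary_shards must be >= 1")
    if routing is not None:
        effective = routing      # explicit custom routing value
    elif parent_id is not None:
        effective = parent_id    # child documents route by parent ID
    else:
        effective = doc_id       # default: route by the document's own ID
    # Stand-in hash (assumption): NOT the hash function Elasticsearch uses.
    return zlib.crc32(effective.encode("utf-8")) % num_primary_shards
```

In this sketch, two documents that share a routing value always land on the same shard, whatever their IDs are.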
|
||||
|
||||
[float]
|
||||
==== id uniqueness
|
||||
Custom routing patterns can be implemented by specifying a custom `routing`
|
||||
value per document. For instance:
|
||||
|
||||
When indexing documents specifying a custom `_routing`, the uniqueness
|
||||
of the `_id` is not guaranteed throughout all the shards that the index
|
||||
is composed of. In fact, documents with the same `_id` might end up in
|
||||
different shards if indexed with different `_routing` values.
|
||||
[source,js]
|
||||
------------------------------
|
||||
PUT my_index/my_type/1?routing=user1 <1>
|
||||
{
|
||||
"title": "This is a document"
|
||||
}
|
||||
|
||||
GET my_index/my_type/1?routing=user1 <2>
|
||||
------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> This document uses `user1` as its routing value, instead of its ID.
|
||||
<2> The same `routing` value needs to be provided when
|
||||
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
|
||||
the document.
|
||||
|
||||
The value of the `_routing` field is accessible in queries, aggregations, scripts,
|
||||
and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_routing": [ "user1" ] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"Routing values": {
|
||||
"terms": {
|
||||
"field": "_routing", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_routing": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"Routing value": {
|
||||
"script": "doc['_routing']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_routing` field (also see the <<query-dsl-ids-query,`ids` query>>)
|
||||
<2> Aggregating on the `_routing` field
|
||||
<3> Sorting on the `_routing` field
|
||||
<4> Accessing the `_routing` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
|
||||
==== Searching with custom routing
|
||||
|
||||
Custom routing can reduce the impact of searches. Instead of having to fan
|
||||
out a search request to all the shards in an index, the request can be sent to
|
||||
just the shard that matches the specific routing value (or values):
|
||||
|
||||
[source,js]
|
||||
------------------------------
|
||||
GET my_index/_search?routing=user1,user2 <1>
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"title": "document"
|
||||
}
|
||||
}
|
||||
}
|
||||
------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> This search request will only be executed on the shards associated with the `user1` and `user2` routing values.
|
||||
|
||||
|
||||
==== Making a routing value required
|
||||
|
||||
When using custom routing, it is important to provide the routing value
|
||||
whenever <<docs-index_,indexing>>, <<docs-get,getting>>,
|
||||
<<docs-delete,deleting>>, or <<docs-update,updating>> a document.
|
||||
|
||||
Forgetting the routing value can lead to a document being indexed on more than
|
||||
one shard. As a safeguard, the `_routing` field can be configured to make a
|
||||
custom `routing` value required for all CRUD operations:
|
||||
|
||||
[source,js]
|
||||
------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_routing": {
|
||||
"required": true <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT my_index/my_type/1 <2>
|
||||
{
|
||||
"text": "No routing value provided"
|
||||
}
|
||||
------------------------------
|
||||
// AUTOSENSE
|
||||
<1> Routing is required for `my_type` documents.
|
||||
<2> This index request throws a `routing_missing_exception`.
|
||||
|
||||
==== Unique IDs with custom routing
|
||||
|
||||
When indexing documents specifying a custom `_routing`, the uniqueness of the
|
||||
`_id` is not guaranteed across all of the shards in the index. In fact,
|
||||
documents with the same `_id` might end up on different shards if indexed with
|
||||
different `_routing` values.
|
||||
|
||||
It is up to the user to ensure that IDs are unique across the index.
|
||||
|
@ -1,15 +1,76 @@
|
||||
[[mapping-size-field]]
|
||||
=== `_size`
|
||||
=== `_size` field
|
||||
|
||||
The `_size` field allows to automatically index the size of the original
|
||||
`_source` indexed. By default, it's disabled. In order to enable it, set
|
||||
the mapping to:
|
||||
The `_size` field, when enabled, indexes the size in bytes of the original
|
||||
<<mapping-source-field,`_source`>>. In order to enable it, set
|
||||
the mapping as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
--------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"tweet" : {
|
||||
"_size" : {"enabled" : true}
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_size": {
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
--------------------------
|
||||
// AUTOSENSE
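As a rough illustration of what "size in bytes of the original `_source`" means, the sketch below measures a raw JSON body the way a client might before sending it. It assumes the body is transmitted as UTF-8; `source_size_in_bytes` is a hypothetical helper, not part of any Elasticsearch client.

```python
def source_size_in_bytes(raw_source: str) -> int:
    """Return the number of bytes in the raw JSON body.

    Assumes UTF-8 encoding. The size depends on the exact text sent,
    so whitespace and non-ASCII characters both affect the result.
    """
    return len(raw_source.encode("utf-8"))


# The same logical document can have different sizes depending on formatting.
spaced = '{"text": "This is a document"}'
compact = '{"text":"This is a document"}'
sizes = (source_size_in_bytes(spaced), source_size_in_bytes(compact))
```

Under these assumptions the two bodies differ by one byte, which is why a range query on `_size` reflects what was sent rather than the parsed content.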
|
||||
|
||||
The value of the `_size` field is accessible in queries, aggregations, scripts,
|
||||
and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT my_index/my_type/1
|
||||
{
|
||||
"text": "This is a document"
|
||||
}
|
||||
|
||||
PUT my_index/my_type/2
|
||||
{
|
||||
"text": "This is another document"
|
||||
}
|
||||
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"range": {
|
||||
"_size": { <1>
|
||||
"gt": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"Sizes": {
|
||||
"terms": {
|
||||
"field": "_size", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_size": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"Size": {
|
||||
"script": "doc['_size']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_size` field
|
||||
<2> Aggregating on the `_size` field
|
||||
<3> Sorting on the `_size` field
|
||||
<4> Accessing the `_size` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
|
@ -1,12 +1,12 @@
|
||||
[[mapping-source-field]]
|
||||
=== `_source`
|
||||
=== `_source` field
|
||||
|
||||
The `_source` field is an automatically generated field that stores the actual
|
||||
JSON that was used as the indexed document. It is not indexed (searchable),
|
||||
just stored. When executing "fetch" requests, like <<docs-get,get>> or
|
||||
<<search-search,search>>, the `_source` field is returned by default.
|
||||
The `_source` field contains the original JSON document body that was passed
|
||||
at index time. The `_source` field itself is not indexed (and thus is not
|
||||
searchable), but it is stored so that it can be returned when executing
|
||||
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
|
||||
|
||||
==== Disabling source
|
||||
==== Disabling the `_source` field
|
||||
|
||||
Though very handy to have around, the source field does incur storage overhead
|
||||
within the index. For this reason, it can be disabled as follows:
|
||||
@ -26,7 +26,7 @@ PUT tweets
|
||||
// AUTOSENSE
|
||||
|
||||
[WARNING]
|
||||
.Think before disabling the source field
|
||||
.Think before disabling the `_source` field
|
||||
==================================================
|
||||
|
||||
Users often disable the `_source` field without thinking about the
|
||||
@ -46,11 +46,11 @@ available then a number of features are not supported:
|
||||
|
||||
* Potentially in the future, the ability to repair index corruption
|
||||
automatically.
|
||||
|
||||
If disk space is a concern, rather increase the
|
||||
<<index-codec,compression level>> instead of disabling the `_source`.
|
||||
==================================================
|
||||
|
||||
TIP: If disk space is a concern, rather increase the
|
||||
<<index-codec,compression level>> instead of disabling the `_source`.
|
||||
|
||||
.The metrics use case
|
||||
**************************************************
|
||||
|
||||
@ -69,12 +69,20 @@ metrics case.
|
||||
|
||||
|
||||
[[include-exclude]]
|
||||
==== Including / Excluding fields from source
|
||||
==== Including / Excluding fields from `_source`
|
||||
|
||||
An expert-only feature is the ability to prune the contents of the `_source`
|
||||
field after the document has been indexed, but before the `_source` field is
|
||||
stored. The `includes`/`excludes` parameters (which also accept wildcards)
|
||||
can be used as follows:
|
||||
stored.
|
||||
|
||||
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
||||
`_source`, especially the fact that you cannot reindex documents from one
|
||||
Elasticsearch index to another. Consider using
|
||||
<<search-request-source-filtering,source filtering>> or a
|
||||
<<mapping-transform,transform script>> instead.
|
||||
|
||||
The `includes`/`excludes` parameters (which also accept wildcards) can be used
|
||||
as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
@ -126,8 +134,3 @@ GET logs/event/_search
|
||||
<1> These fields will be removed from the stored `_source` field.
|
||||
<2> We can still search on this field, even though it is not in the stored `_source`.
|
||||
|
||||
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
||||
`_source`, especially the fact that you cannot reindex documents from one
|
||||
Elasticsearch index to another. Consider using
|
||||
<<search-request-source-filtering,source filtering>> or a
|
||||
<<mapping-transform,transform script>> instead.
|
||||
|
@ -1,90 +1,94 @@
|
||||
[[mapping-timestamp-field]]
|
||||
=== `_timestamp`
|
||||
=== `_timestamp` field
|
||||
|
||||
The `_timestamp` field allows to automatically index the timestamp of a
|
||||
document. If it is not provided it will be automatically set
|
||||
to a <<mapping-timestamp-field-default,default date>>.
|
||||
|
||||
[float]
|
||||
==== enabled
|
||||
|
||||
By default it is disabled. In order to enable it, the following mapping
|
||||
should be defined:
|
||||
The `_timestamp` field, when enabled, allows a timestamp to be indexed and
|
||||
stored with a document. The timestamp may be specified manually, generated
|
||||
automatically, or set to a default value:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
------------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"tweet" : {
|
||||
"_timestamp" : { "enabled" : true }
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_timestamp": { <1>
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
[[mapping-timestamp-field-format]]
|
||||
==== format
|
||||
PUT my_index/my_type/1?timestamp=2015-01-01 <2>
|
||||
{ "text": "Timestamp as a formatted date" }
|
||||
|
||||
You can define the <<mapping-date-format,date
|
||||
format>> used to parse the provided timestamp value. For example:
|
||||
PUT my_index/my_type/2?timestamp=1420070400000 <3>
|
||||
{ "text": "Timestamp as milliseconds since the epoch" }
|
||||
|
||||
PUT my_index/my_type/3 <4>
|
||||
{ "text": "Autogenerated timestamp set to now()" }
|
||||
|
||||
------------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Enable the `_timestamp` field with default settings.
|
||||
<2> Set the timestamp manually with a formatted date.
|
||||
<3> Set the timestamp with milliseconds since the epoch.
|
||||
<4> Auto-generates a timestamp with <<date-math,now()>>.
|
||||
|
||||
The behaviour of the `_timestamp` field can be configured with the following parameters:
|
||||
|
||||
`default`::
|
||||
|
||||
A default value to be used if none is provided. Defaults to <<date-math,now()>>.
|
||||
|
||||
`format`::
|
||||
|
||||
The <<mapping-date-format,date format>> (or formats) to use when parsing timestamps. Defaults to `epoch_millis||strictDateOptionalTime`.
|
||||
|
||||
`ignore_missing`::
|
||||
|
||||
If `true` (default), replace missing timestamps with the `default` value. If `false`, throw an exception.
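The interaction of `default`, `format`, and `ignore_missing` can be sketched as follows. This is a simplified model, not Elasticsearch's implementation: `resolve_timestamp` is a hypothetical name, Python's `datetime.fromisoformat` is only a rough stand-in for `strictDateOptionalTime`, and dates without a time zone are assumed to be UTC.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def resolve_timestamp(value: Optional[str],
                      ignore_missing: bool = True,
                      default_ms: Optional[int] = None) -> int:
    """Return a timestamp in milliseconds since the epoch."""
    if value is None:
        if not ignore_missing:
            # Models ignore_missing=false: a missing timestamp is an error.
            raise ValueError("timestamp is required but was not provided")
        # Models the `default` parameter, falling back to "now".
        if default_ms is not None:
            return default_ms
        return int(time.time() * 1000)
    try:
        # First attempt: epoch_millis (the value is a plain number).
        return int(value)
    except ValueError:
        pass
    # Second attempt: a formatted date (stand-in for strictDateOptionalTime).
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: UTC
    return int(parsed.timestamp() * 1000)
```

With this model, `resolve_timestamp("2015-01-01")` and `resolve_timestamp("1420070400000")` describe the same instant, matching the two manual examples above.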
|
||||
|
||||
|
||||
The value of the `_timestamp` field is accessible in queries, aggregations, scripts,
|
||||
and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
--------------------------
|
||||
GET my_index/_search
|
||||
{
|
||||
"tweet" : {
|
||||
"_timestamp" : {
|
||||
"enabled" : true,
|
||||
"path" : "post_date",
|
||||
"format" : "YYYY-MM-dd"
|
||||
}
|
||||
"query": {
|
||||
"range": {
|
||||
"_timestamp": { <1>
|
||||
"gte": "2015-01-01"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Note, the default format is `epoch_millis||strictDateOptionalTime`. The timestamp value will
|
||||
first be parsed as a number and if it fails the format will be tried.
|
||||
|
||||
[float]
|
||||
[[mapping-timestamp-field-default]]
|
||||
==== default
|
||||
|
||||
You can define a default value for when timestamp is not provided
|
||||
within the index request or in the `_source` document.
|
||||
|
||||
By default, the default value is `now` which means the date the document was processed by the indexing chain.
|
||||
|
||||
You can reject documents which do not provide a `timestamp` value by setting `ignore_missing` to false (default to `true`):
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"tweet" : {
|
||||
"_timestamp" : {
|
||||
"enabled" : true,
|
||||
"ignore_missing" : false
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"Timestamps": {
|
||||
"terms": {
|
||||
"field": "_timestamp", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
You can also set the default value to any date respecting <<mapping-timestamp-field-format,timestamp format>>:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"tweet" : {
|
||||
"_timestamp" : {
|
||||
"enabled" : true,
|
||||
"format" : "YYYY-MM-dd",
|
||||
"default" : "1970-01-01"
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_timestamp": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"Timestamp": {
|
||||
"script": "doc['_timestamp']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
If you don't provide any timestamp value, _timestamp will be set to this default value.
|
||||
|
||||
In elasticsearch 1.4, we allowed setting explicitly `"default":null` which is not possible anymore
|
||||
as we added a new `ignore_missing` setting.
|
||||
When reading an index created with elasticsearch 1.4 and using this, we automatically update it by
|
||||
removing `"default": null` and setting `"ignore_missing": false`
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_timestamp` field
|
||||
<2> Aggregating on the `_timestamp` field
|
||||
<3> Sorting on the `_timestamp` field
|
||||
<4> Accessing the `_timestamp` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
@ -1,67 +1,106 @@
|
||||
[[mapping-ttl-field]]
|
||||
=== `_ttl`
|
||||
|
||||
A lot of documents naturally come with an expiration date. Documents can
|
||||
therefore have a `_ttl` (time to live), which will cause the expired
|
||||
documents to be deleted automatically.
|
||||
Some types of documents, such as session data or special offers, come with an
|
||||
expiration date. The `_ttl` field allows you to specify the minimum time a
|
||||
document should live, after which time the document is deleted automatically.
|
||||
|
||||
`_ttl` accepts two parameters which are described below, every other setting will be silently ignored.
|
||||
[TIP]
|
||||
.Prefer index-per-timeframe to TTL
|
||||
======================================================
|
||||
|
||||
[float]
|
||||
==== enabled
|
||||
With TTL, expired documents first have to be marked as deleted, then later
|
||||
purged from the index when segments are merged. For append-only time-based
|
||||
data such as log events, it is much more efficient to use an index-per-day /
|
||||
week / month instead of TTLs. Old log data can be removed by simply deleting
|
||||
old indices.
|
||||
|
||||
By default it is disabled, in order to enable it, the following mapping
|
||||
should be defined:
|
||||
======================================================
|
||||
|
||||
The `_ttl` field may be enabled as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
-------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"tweet" : {
|
||||
"_ttl" : { "enabled" : true }
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_ttl": {
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
`_ttl` can only be enabled once and never be disabled again.
|
||||
PUT my_index/my_type/1?ttl=10m <1>
|
||||
{
|
||||
"text": "Will expire in 10 minutes"
|
||||
}
|
||||
|
||||
[float]
|
||||
==== default
|
||||
PUT my_index/my_type/2 <2>
|
||||
{
|
||||
"text": "Will not expire"
|
||||
}
|
||||
-------------------------------
|
||||
// AUTOSENSE
|
||||
<1> This document will expire 10 minutes after being indexed.
|
||||
<2> This document has no TTL set and will not expire.
|
||||
|
||||
You can provide a per index/type default `_ttl` value as follows:
|
||||
The expiry time is calculated as the value of the
|
||||
<<mapping-timestamp-field,`_timestamp`>> field (or `now()` if the `_timestamp`
|
||||
is not enabled) plus the `ttl` specified in the indexing request.
|
||||
|
||||
==== Default TTL
|
||||
|
||||
You can provide a default `_ttl`, which will be applied to indexing requests where the `ttl` is not specified:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
-------------------------------
|
||||
PUT my_index
|
||||
{
|
||||
"tweet" : {
|
||||
"_ttl" : { "enabled" : true, "default" : "1d" }
|
||||
"mappings": {
|
||||
"my_type": {
|
||||
"_ttl": {
|
||||
"enabled": true,
|
||||
"default": "5m"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
In this case, if you don't provide a `_ttl` value in your query or in
|
||||
the `_source` all tweets will have a `_ttl` of one day.
|
||||
PUT my_index/my_type/1?ttl=10m <1>
|
||||
{
|
||||
"text": "Will expire in 10 minutes"
|
||||
}
|
||||
|
||||
In case you do not specify a time unit like `d` (days), `m` (minutes),
|
||||
`h` (hours), `ms` (milliseconds) or `w` (weeks), milliseconds is used as
|
||||
default unit.
|
||||
PUT my_index/my_type/2 <2>
|
||||
{
|
||||
"text": "Will expire in 5 minutes"
|
||||
}
|
||||
-------------------------------
|
||||
// AUTOSENSE
|
||||
<1> This document will expire 10 minutes after being indexed.
|
||||
<2> This document has no TTL set and so will expire after the default 5 minutes.
|
||||
|
||||
If no `default` is set and no `_ttl` value is given then the document
|
||||
has an infinite `_ttl` and will not expire.
|
||||
The `default` value can use <<time-units,time units>> like `d` for days, and
|
||||
will use `ms` as the default unit if no time unit is provided.
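The way the request `ttl`, the mapping `default`, and the timestamp combine into an expiry time can be sketched as below. The function names and the unit table are assumptions made for this illustration (only a handful of common units are modelled), not Elasticsearch's own parser.

```python
import re
from typing import Optional

# Milliseconds per unit; a subset of time units, chosen for this sketch.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000}
_TTL_RE = re.compile(r"(\d+)(ms|s|m|h|d|w)?")


def ttl_to_millis(ttl: str) -> int:
    """Convert a TTL such as '10m' to milliseconds; no unit means ms."""
    match = _TTL_RE.fullmatch(ttl.strip())
    if match is None:
        raise ValueError(f"unparseable ttl: {ttl!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"]


def expiry_millis(timestamp_ms: int,
                  ttl: Optional[str] = None,
                  default_ttl: Optional[str] = None) -> Optional[int]:
    """Return the expiry time in epoch milliseconds, or None if no TTL applies."""
    effective = ttl if ttl is not None else default_ttl
    if effective is None:
        return None  # no request ttl and no default: the document never expires
    return timestamp_ms + ttl_to_millis(effective)
```

For example, a document timestamped at `1420070400000` with `ttl=10m` would expire at `1420071000000` in this model, while a document indexed without a `ttl` picks up the mapping's `default`.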
|
||||
|
||||
You can dynamically update the `default` value using the put mapping
|
||||
API. It won't change the `_ttl` of already indexed documents but will be
|
||||
used for future documents.
|
||||
|
||||
[float]
|
||||
==== Note on documents expiration
|
||||
|
||||
Expired documents will be automatically deleted regularly. You can
|
||||
dynamically set the `indices.ttl.interval` to fit your needs. The
|
||||
default value is `60s`.
|
||||
Expired documents will be automatically deleted periodically. The following
|
||||
settings control the expiry process:
|
||||
|
||||
The deletion orders are processed by bulk. You can set
|
||||
`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
|
||||
`indices.ttl.interval`::
|
||||
|
||||
How often the purge process should run. Defaults to `60s`. Expired documents
|
||||
may still be retrieved before they are purged.
|
||||
|
||||
`indices.ttl.bulk_size`::
|
||||
|
||||
How many deletions are handled by a single <<docs-bulk,`bulk`>> request. The
|
||||
default value is `10000`.
|
||||
|
||||
Note that the expiration procedure handles versioning properly, so if a
|
||||
document is updated between the collection of documents to expire and
|
||||
the delete order, the document won't be deleted.
|
||||
|
@ -1,7 +1,60 @@
|
||||
[[mapping-type-field]]
|
||||
=== `_type`
|
||||
=== `_type` field
|
||||
|
||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
|
||||
`_type` field is indexed in order to make searching by type name fast.
|
||||
|
||||
The value of the `_type` field is accessible in queries, aggregations,
|
||||
scripts, and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT my_index/type_1/1
|
||||
{
|
||||
"text": "Document with type 1"
|
||||
}
|
||||
|
||||
PUT my_index/type_2/2
|
||||
{
|
||||
"text": "Document with type 2"
|
||||
}
|
||||
|
||||
GET my_index/type_*/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_type": [ "type_1", "type_2" ] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"types": {
|
||||
"terms": {
|
||||
"field": "_type", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_type": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"type": {
|
||||
"script": "doc['_type']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_type` field
|
||||
<2> Aggregating on the `_type` field
|
||||
<3> Sorting on the `_type` field
|
||||
<4> Accessing the `_type` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
Each document indexed is associated with an id and a type. The `_type`
|
||||
field allows accessing only the type of a document. It is indexed
|
||||
to allow quickly filtering on type, for example, when performing
|
||||
a search request on a single or multiple types.
|
||||
|
@ -1,10 +1,59 @@
|
||||
[[mapping-uid-field]]
|
||||
=== `_uid`
|
||||
=== `_uid` field
|
||||
|
||||
Each document indexed is associated with an id and a type, the internal
|
||||
`_uid` field is the unique identifier of a document within an index and
|
||||
is composed of the type and the id (meaning that different types can
|
||||
have the same id and still maintain uniqueness).
|
||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. These
|
||||
values are combined as `{type}#{id}` and indexed as the `_uid` field.
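The `{type}#{id}` composition can be illustrated with a pair of small helpers. `make_uid` and `split_uid` are hypothetical names for this sketch, and splitting on the first `#` assumes that type names never contain that character.

```python
from typing import Tuple


def make_uid(doc_type: str, doc_id: str) -> str:
    """Combine a mapping type and a document ID as {type}#{id}."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid: str) -> Tuple[str, str]:
    """Split a _uid back into (type, id).

    Splits on the first '#', assuming the type name contains no '#';
    the ID itself may contain further '#' characters.
    """
    doc_type, separator, doc_id = uid.partition("#")
    if not separator:
        raise ValueError(f"not a valid _uid: {uid!r}")
    return doc_type, doc_id
```

Because the type is part of the value, `make_uid("type_1", "1")` and `make_uid("type_2", "1")` differ even though both documents share the ID `1`.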
|
||||
|
||||
The value of the `_uid` field is accessible in queries, aggregations, scripts,
|
||||
and when sorting:
|
||||
|
||||
[source,js]
|
||||
--------------------------
|
||||
# Example documents
|
||||
PUT my_index/my_type/1
|
||||
{
|
||||
"text": "Document with ID 1"
|
||||
}
|
||||
|
||||
PUT my_index/my_type/2
|
||||
{
|
||||
"text": "Document with ID 2"
|
||||
}
|
||||
|
||||
GET my_index/_search
|
||||
{
|
||||
"query": {
|
||||
"terms": {
|
||||
"_uid": [ "my_type#1", "my_type#2" ] <1>
|
||||
}
|
||||
},
|
||||
"aggs": {
|
||||
"UIDs": {
|
||||
"terms": {
|
||||
"field": "_uid", <2>
|
||||
"size": 10
|
||||
}
|
||||
}
|
||||
},
|
||||
"sort": [
|
||||
{
|
||||
"_uid": { <3>
|
||||
"order": "desc"
|
||||
}
|
||||
}
|
||||
],
|
||||
"script_fields": {
|
||||
"UID": {
|
||||
"script": "doc['_uid']" <4>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> Querying on the `_uid` field (also see the <<query-dsl-ids-query,`ids` query>>)
|
||||
<2> Aggregating on the `_uid` field
|
||||
<3> Sorting on the `_uid` field
|
||||
<4> Accessing the `_uid` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
|
||||
|
||||
The `_uid` field is for type based filtering, as well as for
|
||||
lookups of `_id` and `_type`.
|
||||
|