Docs: Refactored the mapping meta-fields docs

Clinton Gormley 2015-07-20 01:24:29 +02:00
parent c4778b8e78
commit c56ce0e242
14 changed files with 1237 additions and 293 deletions

View File

@ -52,7 +52,7 @@ creating a new index.
[float]
=== Mapper settings
`index.mapper.dynamic` (_static_)::
`index.mapper.dynamic` (_dynamic_)::
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.

View File

@ -1,27 +1,78 @@
[[mapping-fields]]
== Fields
== Meta-Fields
Each mapping has a number of fields associated with it
which can be used to control how the document metadata
(eg <<mapping-all-field>>) is indexed.
Each document has metadata associated with it, such as the `_index`, mapping
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
can be customised when a mapping type is created.
The meta-fields are:
[horizontal]
<<mapping-index-field,`_index`>>::
The index to which the document belongs.
<<mapping-uid-field,`_uid`>>::
A composite field consisting of the `_type` and the `_id`.
<<mapping-type-field,`_type`>>::
The document's <<all-mapping-types,mapping type>>.
<<mapping-id-field,`_id`>>::
The document's ID.
<<mapping-source-field,`_source`>>::
The original JSON representing the body of the document.
<<mapping-all-field,`_all`>>::
A _catch-all_ field that indexes the values of all other fields.
<<mapping-field-names-field,`_field_names`>>::
All fields in the document which contain non-null values.
<<mapping-parent-field,`_parent`>>::
Used to create a parent-child relationship between two mapping types.
<<mapping-routing-field,`_routing`>>::
A custom routing value which routes a document to a particular shard.
<<mapping-size-field,`_size`>>::
The size of the `_source` field in bytes.
<<mapping-timestamp-field,`_timestamp`>>::
A timestamp associated with the document, either specified manually or auto-generated.
<<mapping-ttl-field,`_ttl`>>::
How long a document should live before it is automatically deleted.
include::fields/index-field.asciidoc[]
include::fields/uid-field.asciidoc[]
include::fields/id-field.asciidoc[]
include::fields/type-field.asciidoc[]
include::fields/id-field.asciidoc[]
include::fields/source-field.asciidoc[]
include::fields/all-field.asciidoc[]
include::fields/parent-field.asciidoc[]
include::fields/field-names-field.asciidoc[]
include::fields/routing-field.asciidoc[]
include::fields/parent-field.asciidoc[]
include::fields/index-field.asciidoc[]
include::fields/routing-field.asciidoc[]
include::fields/size-field.asciidoc[]

View File

@ -1,78 +1,416 @@
[[mapping-all-field]]
=== `_all`
=== `_all` field
The idea of the `_all` field is that it includes the text of one or more
other fields within the document indexed. It can come in very handy
especially for search requests, where we want to execute a search query
against the content of a document, without knowing which fields to
search on. This comes at the expense of CPU cycles and index size.
The `_all` field is a special _catch-all_ field which concatenates the values
of all of the other fields into one big string, which is then
<<analysis,analyzed>> and indexed, but not stored. This means that it can be
searched, but not retrieved.
The `_all` field can be completely disabled. Explicit field mappings and
object mappings can be excluded / included in the `_all` field. By
default, it is enabled and all fields are included in it for ease of
use.
When disabling the `_all` field, it is a good practice to set
`index.query.default_field` to a different value (for example, if you
have a main "message" field in your data, set it to `message`).
One of the nice features of the `_all` field is that it takes into
account specific fields boost levels. Meaning that if a title field is
boosted more than content, the title (part) in the `_all` field will
mean more than the content (part) in the `_all` field.
Here is a sample mapping:
The `_all` field allows you to search for values in documents without knowing
which field contains the value. This makes it a useful option when getting
started with a new dataset. For instance:
[source,js]
--------------------------------------------------
--------------------------------
PUT my_index/user/1 <1>
{
"person" : {
"_all" : {"enabled" : true},
"properties" : {
"name" : {
"type" : "object",
"dynamic" : false,
"properties" : {
"first" : {"type" : "string", "store" : true , "include_in_all" : false},
"last" : {"type" : "string", "index" : "not_analyzed"}
}
},
"address" : {
"type" : "object",
"include_in_all" : false,
"properties" : {
"first" : {
"properties" : {
"location" : {"type" : "string", "store" : true}
}
},
"last" : {
"properties" : {
"location" : {"type" : "string"}
}
}
}
},
"simple1" : {"type" : "long", "include_in_all" : true},
"simple2" : {"type" : "long", "include_in_all" : false}
}
}
"first_name": "John",
"last_name": "Smith",
"date_of_birth": "1970-10-24"
}
--------------------------------------------------
The `_all` field allows for `store`, `term_vector` and `analyzer` (with
specific `analyzer` and `search_analyzer`) to be set.
GET my_index/_search
{
"query": {
"match": {
"_all": "john smith 1970"
}
}
}
--------------------------------
// AUTOSENSE
<1> The `_all` field will contain the terms: [ `"john"`, `"smith"`, `"1970"`, `"10"`, `"24"` ]
[float]
[[highlighting]]
==== Highlighting
[NOTE]
.All values treated as strings
=============================================================================
The `date_of_birth` field in the above example is recognised as a `date` field
and so will index a single term representing `1970-10-24 00:00:00 UTC`. The
`_all` field, however, treats all values as strings, so the date value is
indexed as the three string terms: `"1970"`, `"10"`, `"24"`.
It is important to note that the `_all` field combines the original values
from each field as a string. It does not combine the _terms_ from each field.
=============================================================================
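As a rough illustration of the note above, the following Python sketch joins the original field values into one string and only then splits that string into terms. The `simple_analyze` helper is a hypothetical stand-in for a real analyzer (it just lowercases and splits on non-alphanumeric characters), so this shows the idea rather than how Elasticsearch actually analyzes text:

```python
import re

def simple_analyze(text):
    # Hypothetical stand-in for an analyzer: lowercase, then split on
    # anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def all_field_terms(doc):
    # Concatenate the original values as strings (not the per-field terms),
    # then analyze the combined string once.
    combined = " ".join(str(value) for value in doc.values())
    return simple_analyze(combined)

doc = {
    "first_name": "John",
    "last_name": "Smith",
    "date_of_birth": "1970-10-24",
}
terms = all_field_terms(doc)
print(terms)
```

Because the date is handled as a plain string here, it is broken into separate terms instead of being kept as a single date value.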
The `_all` field is just a <<string,`string`>> field, and accepts the same
parameters that other string fields accept, including `analyzer`,
`term_vectors`, `index_options`, and `store`.
The `_all` field can be useful, especially when exploring new data using
simple filtering. However, by concatenating field values into one big string,
the `_all` field loses the distinction between short fields (more relevant)
and long fields (less relevant). For use cases where search relevance is
important, it is better to query individual fields specifically.
The `_all` field is not free: it requires extra CPU cycles and uses more disk
space. If not needed, it can be completely <<disabling-all-field,disabled>> or
customised on a <<include-in-all,per-field basis>>.
[[querying-all-field]]
==== Using the `_all` field in queries
The <<query-dsl-query-string-query,`query_string`>> and
<<query-dsl-simple-query-string-query,`simple_query_string`>> queries query
the `_all` field by default, unless another field is specified:
[source,js]
--------------------------------
GET _search
{
"query": {
"query_string": {
"query": "john smith 1970"
}
}
}
--------------------------------
// AUTOSENSE
The same goes for the `?q=` parameter in <<search-uri-request, URI search
requests>> (which is rewritten to a `query_string` query internally):
[source,js]
--------------------------------
GET _search?q=john+smith+1970
--------------------------------
Other queries, such as the <<query-dsl-match-query,`match`>> and
<<query-dsl-term-query,`term`>> queries require you to specify
the `_all` field explicitly, as per the
<<mapping-all-field,first example>>.
[[disabling-all-field]]
==== Disabling the `_all` field
The `_all` field can be completely disabled per-type by setting `enabled` to
`false`:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"type_1": { <1>
"properties": {...}
},
"type_2": { <2>
"_all": {
"enabled": false
},
"properties": {...}
}
}
}
--------------------------------
// AUTOSENSE
<1> The `_all` field in `type_1` is enabled.
<2> The `_all` field in `type_2` is completely disabled.
If the `_all` field is disabled, then URI search requests and the
`query_string` and `simple_query_string` queries will not be able to use it
for queries (see <<querying-all-field>>). You can configure them to use a
different field with the `index.query.default_field` setting:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"_all": {
"enabled": false <1>
},
"properties": {
"content": {
"type": "string"
}
}
}
},
"settings": {
"index.query.default_field": "content" <2>
}
}
--------------------------------
// AUTOSENSE
<1> The `_all` field is disabled for the `my_type` type.
<2> The `query_string` query will default to querying the `content` field in this index.
[[include-in-all]]
==== Including specific fields in `_all`
Individual fields can be included or excluded from the `_all` field with the
`include_in_all` setting, which defaults to `true`:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": { <1>
"type": "string"
},
"content": { <1>
"type": "string"
},
"date": { <2>
"type": "date",
"include_in_all": false
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> The `title` and `content` fields will be included in the `_all` field.
<2> The `date` field will not be included in the `_all` field.
The `include_in_all` parameter can also be set at the type level and on
<<mapping-object-type,`object`>> or <<mapping-nested-type,`nested`>> fields,
in which case all sub-fields inherit that setting. For instance:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"include_in_all": false, <1>
"properties": {
"title": { "type": "string" },
"author": {
"include_in_all": true, <2>
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" }
}
},
"editor": {
"properties": {
"first_name": { "type": "string" }, <3>
"last_name": { "type": "string", "include_in_all": true } <3>
}
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> All fields in `my_type` are excluded from `_all`.
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
<3> Only the `editor.last_name` field is included in `_all`.
The `editor.first_name` inherits the type-level setting and is excluded.
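The inheritance rule described above can be sketched in Python. This is only a model of the rule as stated in this section (a field's own `include_in_all` wins, otherwise the value is inherited from the enclosing object or type); the `fields_in_all` function is a hypothetical helper, not part of any Elasticsearch API:

```python
def fields_in_all(properties, inherited=True, prefix=""):
    # Walk a mapping's "properties" and return the leaf fields whose
    # effective include_in_all value is true.
    included = []
    for name, spec in properties.items():
        # A field's own setting wins; otherwise it inherits from its parent.
        effective = spec.get("include_in_all", inherited)
        path = prefix + name
        if "properties" in spec:
            included += fields_in_all(spec["properties"], effective, path + ".")
        elif effective:
            included.append(path)
    return included

# The same mapping as in the example above, written as a Python dict.
my_type = {
    "include_in_all": False,
    "properties": {
        "title": {"type": "string"},
        "author": {
            "include_in_all": True,
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
        "editor": {
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string", "include_in_all": True},
            },
        },
    },
}

included = fields_in_all(my_type["properties"], my_type.get("include_in_all", True))
print(included)
```

Applied to this mapping, the sketch selects both `author` sub-fields and `editor.last_name`, matching the callouts above.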
[[all-field-and-boosting]]
==== Index boosting and the `_all` field
Individual fields can be _boosted_ at index time, with the `boost` parameter.
The `_all` field takes these boosts into account:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"properties": {
"title": { <1>
"type": "string",
"boost": 2
},
"content": { <1>
"type": "string"
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> When querying the `_all` field, words that originated in the
`title` field are twice as relevant as words that originated in
the `content` field.
WARNING: Using index-time boosting with the `_all` field has a significant
impact on query performance. Usually the better solution is to query fields
individually, with optional query time boosting.
[[custom-all-fields]]
==== Custom `_all` fields
While there is only a single `_all` field per index, the <<copy-to,`copy_to`>>
parameter allows the creation of multiple __custom `_all` fields__. For
instance, `first_name` and `last_name` fields can be combined together into
the `full_name` field:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"properties": {
"first_name": {
"type": "string",
"copy_to": "full_name" <1>
},
"last_name": {
"type": "string",
"copy_to": "full_name" <1>
},
"full_name": {
"type": "string"
}
}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET myindex/_search
{
"query": {
"match": {
"full_name": "John Smith"
}
}
}
--------------------------------
// AUTOSENSE
<1> The `first_name` and `last_name` values are copied to the `full_name` field.
[[highlighting-all-field]]
==== Highlighting and the `_all` field
A field can only be used for <<search-request-highlighting,highlighting>> if
the original string value is available, either from the
<<mapping-source-field,`_source`>> field or as a stored field.
The `_all` field is not present in the `_source` field and it is not stored by
default, and so cannot be highlighted. There are two options. Either
<<all-field-store,store the `_all` field>> or highlight the
<<all-highlight-fields,original fields>>.
[[all-field-store]]
===== Store the `_all` field
If `store` is set to `true`, then the original field value is retrievable and
can be highlighted:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"_all": {
"store": true
}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET _search
{
"query": {
"match": {
"_all": "John Smith"
}
},
"highlight": {
"fields": {
"_all": {}
}
}
}
--------------------------------
// AUTOSENSE
Of course, storing the `_all` field will use significantly more disk space
and, because it is a combination of other fields, it may result in odd
highlighting results.
The `_all` field also accepts the `term_vector` and `index_options`
parameters, allowing the use of the fast vector highlighter and the postings
highlighter.
[[all-highlight-fields]]
===== Highlight original fields
You can query the `_all` field, but use the original fields for highlighting as follows:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"_all": {}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET _search
{
"query": {
"match": {
"_all": "John Smith" <1>
}
},
"highlight": {
"fields": {
"*_name": { <2>
"require_field_match": "false" <3>
}
}
}
}
--------------------------------
// AUTOSENSE
<1> The query inspects the `_all` field to find matching documents.
<2> Highlighting is performed on the two name fields, which are available from the `_source`.
<3> The query wasn't run against the name fields, so set `require_field_match` to `false`.
For any field to allow
<<search-request-highlighting,highlighting>> it has
to be either stored or part of the `_source` field. By default the `_all`
field does not qualify for either, so highlighting for it does not yield
any data.
Although it is possible to `store` the `_all` field, it is basically an
aggregation of all fields, which means more data will be stored, and
highlighting it might produce strange results.

View File

@ -1,6 +1,55 @@
[[mapping-field-names-field]]
=== `_field_names`
=== `_field_names` field
The `_field_names` field indexes the names of every field in a document that
contains any value other than `null`. This field is used by the
<<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>>
queries to find documents that either have or don't have any non-+null+ value
for a particular field.
The value of the `_field_names` field is accessible in queries, aggregations, and
scripts:
[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
"title": "This is a document"
}
PUT my_index/my_type/2
{
"title": "This is another document",
"body": "This document has a body"
}
GET my_index/_search
{
"query": {
"terms": {
"_field_names": [ "title" ] <1>
}
},
"aggs": {
"Field names": {
"terms": {
"field": "_field_names", <2>
"size": 10
}
}
},
"script_fields": {
"Field names": {
"script": "doc['_field_names']" <3>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_field_names` field (also see the <<query-dsl-exists-query,`exists`>> and <<query-dsl-missing-query,`missing`>> queries)
<2> Aggregating on the `_field_names` field
<3> Accessing the `_field_names` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
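To make the idea concrete, here is a minimal Python sketch of what `_field_names` records and how an `exists`-style check could use it. It assumes flat documents only and treats Python `None` as JSON `null`; `field_names` and `exists` are hypothetical helpers for illustration, not Elasticsearch functions:

```python
def field_names(doc):
    # Keep only the names of top-level fields whose value is not null (None).
    return [name for name, value in doc.items() if value is not None]

def exists(doc, field):
    # An exists-style check: does the document have a non-null value
    # for this field?
    return field in field_names(doc)

doc_1 = {"title": "This is a document"}
doc_2 = {"title": "This is another document", "body": "This document has a body"}
doc_3 = {"title": "Third document", "body": None}

for doc in (doc_1, doc_2, doc_3):
    print(field_names(doc), exists(doc, "body"))
```

A field that is present but set to `null` (as in `doc_3`) is treated the same as a field that is absent.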
The `_field_names` field indexes the field names of a document, which can later
be used to search for documents based on the fields that they contain typically
using the `exists` and `missing` filters.

View File

@ -1,11 +1,44 @@
[[mapping-id-field]]
=== `_id`
=== `_id` field
Each document indexed is associated with an id and a type. The `_id`
field allows accessing only the id of a document.
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
`_id` field is not indexed as its value can be derived automatically from the
<<mapping-uid-field,`_uid`>> field.
Note, even though the `_id` is not indexed, all the APIs still work
(since they work with the `_uid` field), as well as fetching by ids
using `term`, `terms` or `prefix` queries/filters (including the
specific `ids` query/filter).
The value of the `_id` field is accessible in queries and scripts, but _not_
in aggregations or when sorting, where the <<mapping-uid-field,`_uid`>> field
should be used instead:
[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
"text": "Document with ID 1"
}
PUT my_index/my_type/2
{
"text": "Document with ID 2"
}
GET my_index/_search
{
"query": {
"terms": {
"_id": [ "1", "2" ] <1>
}
},
"script_fields": {
"UID": {
"script": "doc['_id']" <2>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_id` field (also see the <<query-dsl-ids-query,`ids` query>>)
<2> Accessing the `_id` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)

View File

@ -1,15 +1,56 @@
[[mapping-index-field]]
=== `_index`
=== `_index` field
The `_index` field stores, in a document, the index it belongs to. By default
it is disabled; to enable it, the following mapping should be
defined:
The name of the index that contains the document. This field is not indexed
but can be automatically derived from the index itself.
Its value is accessible in queries, aggregations, scripts, and when sorting:
[source,js]
--------------------------------------------------
--------------------------
# Example documents
PUT index_1/my_type/1
{
"tweet" : {
"_index" : { "enabled" : true }
}
"text": "Document in index 1"
}
--------------------------------------------------
PUT index_2/my_type/2
{
"text": "Document in index 2"
}
GET index_1,index_2/_search
{
"query": {
"terms": {
"_index": ["index_1", "index_2"] <1>
}
},
"aggs": {
"indices": {
"terms": {
"field": "_index", <2>
"size": 10
}
}
},
"sort": [
{
"_index": { <3>
"order": "asc"
}
}
],
"script_fields": {
"index_name": {
"script": "doc['_index']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_index` field
<2> Aggregating on the `_index` field
<3> Sorting on the `_index` field
<4> Accessing the `_index` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)

View File

@ -1,54 +1,165 @@
[[mapping-parent-field]]
=== `_parent`
=== `_parent` field
TIP: It is highly recommended to reindex all indices with a `_parent` field that were created before version 2.x,
in order to benefit from all the optimizations added with the 2.0 release.
added[2.0.0,The parent-child implementation has been completely rewritten. It is advisable to reindex any 1.x indices which use parent-child to take advantage of the new optimizations]
The parent field mapping is defined on a child mapping, and points to
the parent type this child relates to. For example, in case of a `blog`
type and a `blog_tag` type child document, the mapping for `blog_tag`
should be:
A parent-child relationship can be established between documents in the same
index by making one mapping type the parent of another:
[source,js]
--------------------------------------------------
PUT my_index
{
"blog_tag" : {
"_parent" : {
"type" : "blog"
}
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent" <1>
}
}
}
}
PUT my_index/my_parent/1 <2>
{
"text": "This is a parent document"
}
PUT my_index/my_child/2?parent=1 <3>
{
"text": "This is a child document"
}
PUT my_index/my_child/3?parent=1 <3>
{
"text": "This is another child document"
}
GET my_index/my_parent/_search
{
"query": {
"has_child": { <4>
"type": "my_child",
"query": {
"match": {
"text": "child document"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `my_parent` type is parent to the `my_child` type.
<2> Index a parent document.
<3> Index two child documents, specifying the parent document's ID.
<4> Find all parent documents that have children which match the query.
The mapping is automatically stored and indexed (meaning it can be
searched on using the `_parent` field notation).
==== Limitations
See the <<query-dsl-has-child-query,`has_child`>> and
<<query-dsl-has-parent-query,`has_parent`>> queries,
the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation,
and <<parent-child-inner-hits,inner hits>> for more information.
The `_parent.type` setting can only point to a type that doesn't exist yet.
This means that a type can't become a parent type after it has been created.
The value of the `_parent` field is accessible in queries, aggregations, scripts,
and when sorting:
The `parent.type` setting can't point to itself. This means self referential
parent/child isn't supported.
[source,js]
--------------------------
GET my_index/_search
{
"query": {
"terms": {
"_parent": [ "1" ] <1>
}
},
"aggs": {
"parents": {
"terms": {
"field": "_parent", <2>
"size": 10
}
}
},
"sort": [
{
"_parent": { <3>
"order": "desc"
}
}
],
"script_fields": {
"parent": {
"script": "doc['_parent']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_parent` field (also see the <<query-dsl-has-parent-query,`has_parent` query>> and the <<query-dsl-has-child-query,`has_child` query>>)
<2> Aggregating on the `_parent` field (also see the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation)
<3> Sorting on the `_parent` field
<4> Accessing the `_parent` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
==== Parent-child restrictions
* The parent and child types must be different -- parent-child relationships
cannot be established between documents of the same type.
* The `_parent.type` setting can only point to a type that doesn't exist yet.
This means that a type cannot become a parent type after it has been
created.
* Parent and child documents must be indexed on the same shard. The `parent`
ID is used as the <<mapping-routing-field,routing>> value for the child,
to ensure that the child is indexed on the same shard as the parent.
This means that the same `parent` value needs to be provided when
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
a child document.
==== Global ordinals
Parent-child uses <<global-ordinals,global ordinals>> to speed up joins and global ordinals need to be rebuilt after any change to a shard.
The more parent id values are stored in a shard, the longer it takes to rebuild global ordinals for the `_parent` field.
Parent-child uses <<global-ordinals,global ordinals>> to speed up joins.
Global ordinals need to be rebuilt after any change to a shard. The more
parent id values are stored in a shard, the longer it takes to rebuild the
global ordinals for the `_parent` field.
Global ordinals, by default, are built lazily: the first parent-child query or aggregation after a refresh will trigger building of global ordinals.
This can introduce a significant latency spike for your users. You can use <<fielddata-loading,eager_global_ordinals>> to shift the cost of building global ordinals
from query time to refresh time, by mapping the _parent field as follows:
==== Memory usage
The only on heap memory used by parent/child is the global ordinals for the `_parent` field.
How much memory is used for the global ordinals for the `_parent` field in the fielddata cache
can be checked via the <<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
APIs, e.g.:
Global ordinals, by default, are built lazily: the first parent-child query or
aggregation after a refresh will trigger building of global ordinals. This can
introduce a significant latency spike for your users. You can use
<<fielddata-loading,eager_global_ordinals>> to shift the cost of building global
ordinals from query time to refresh time, by mapping the `_parent` field as follows:
[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/_stats/fielddata?pretty&human&fielddata_fields=_parent"
PUT my_index
{
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent",
"fielddata": {
"loading": "eager_global_ordinals"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
The amount of heap used by global ordinals can be checked as follows:
[source,sh]
--------------------------------------------------
# Per-index
GET _stats/fielddata?human&fields=_parent
# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=_parent
--------------------------------------------------
// AUTOSENSE

View File

@ -1,22 +1,134 @@
[[mapping-routing-field]]
=== `_routing`
=== `_routing` field
The routing field controls the `_routing` value used when indexing
data, for cases where explicit routing control is required. It is stored and indexed.
A document is routed to a particular shard in an index using the following
formula:
[float]
==== required
shard_num = hash(_routing) % num_primary_shards
Another aspect of the `_routing` mapping is the ability to define it as
required by setting `required` to `true`. This is very important to set
when using routing features, as it allows different APIs to make use of
it. For example, an index operation will be rejected if no routing value
has been provided.
The default value used for `_routing` is the document's <<mapping-id-field,`_id`>>
or the document's <<mapping-parent-field,`_parent`>> ID, if present.
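The formula and the default routing value can be sketched in Python as follows. This is an illustration only: `zlib.crc32` is used as a placeholder hash (the hash function Elasticsearch actually uses is not stated here), and `shard_for` and `default_routing` are hypothetical names:

```python
import zlib

def shard_for(routing, num_primary_shards):
    # shard_num = hash(_routing) % num_primary_shards
    # zlib.crc32 is only a placeholder hash for illustration.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def default_routing(doc_id, parent_id=None):
    # The parent ID is used when present, otherwise the document's own ID.
    return parent_id if parent_id is not None else doc_id

num_primary_shards = 5

# A parent document with ID "1" and a child with ID "2" whose parent is "1".
parent_shard = shard_for(default_routing("1"), num_primary_shards)
child_shard = shard_for(default_routing("2", parent_id="1"), num_primary_shards)
print(parent_shard, child_shard)
```

Because the child is routed by its parent's ID, both documents resolve to the same shard number, whatever hash function is used.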
[float]
==== id uniqueness
Custom routing patterns can be implemented by specifying a custom `routing`
value per document. For instance:
When indexing documents specifying a custom `_routing`, the uniqueness
of the `_id` is not guaranteed throughout all the shards that the index
is composed of. In fact, documents with the same `_id` might end up in
different shards if indexed with different `_routing` values.
[source,js]
------------------------------
PUT my_index/my_type/1?routing=user1 <1>
{
"title": "This is a document"
}
GET my_index/my_type/1?routing=user1 <2>
------------------------------
// AUTOSENSE
<1> This document uses `user1` as its routing value, instead of its ID.
<2> The same `routing` value needs to be provided when
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
the document.
The value of the `_routing` field is accessible in queries, aggregations, scripts,
and when sorting:
[source,js]
--------------------------
GET my_index/_search
{
"query": {
"terms": {
"_routing": [ "user1" ] <1>
}
},
"aggs": {
"Routing values": {
"terms": {
"field": "_routing", <2>
"size": 10
}
}
},
"sort": [
{
"_routing": { <3>
"order": "desc"
}
}
],
"script_fields": {
"Routing value": {
"script": "doc['_routing']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_routing` field (also see the <<query-dsl-ids-query,`ids` query>>)
<2> Aggregating on the `_routing` field
<3> Sorting on the `_routing` field
<4> Accessing the `_routing` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
==== Searching with custom routing
Custom routing can reduce the impact of searches. Instead of having to fan
out a search request to all the shards in an index, the request can be sent to
just the shard that matches the specific routing value (or values):
[source,js]
------------------------------
GET my_index/_search?routing=user1,user2 <1>
{
"query": {
"match": {
"title": "document"
}
}
}
------------------------------
// AUTOSENSE
<1> This search request will only be executed on the shards associated with the `user1` and `user2` routing values.
==== Making a routing value required
When using custom routing, it is important to provide the routing value
whenever <<docs-index_,indexing>>, <<docs-get,getting>>,
<<docs-delete,deleting>>, or <<docs-update,updating>> a document.
Forgetting the routing value can lead to a document being indexed on more than
one shard. As a safeguard, the `_routing` field can be configured to make a
custom `routing` value required for all CRUD operations:
[source,js]
------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"_routing": {
"required": true <1>
}
}
}
}
PUT my_index/my_type/1 <2>
{
"text": "No routing value provided"
}
------------------------------
// AUTOSENSE
<1> Routing is required for `my_type` documents.
<2> This index request throws a `routing_missing_exception`.
==== Unique IDs with custom routing
When indexing documents specifying a custom `_routing`, the uniqueness of the
`_id` is not guaranteed across all of the shards in the index. In fact,
documents with the same `_id` might end up on different shards if indexed with
different `_routing` values.
It is up to the user to ensure that IDs are unique across the index.

View File

@ -1,15 +1,76 @@
[[mapping-size-field]]
=== `_size`
=== `_size` field
The `_size` field automatically indexes the size of the original
`_source` indexed. By default, it's disabled. In order to enable it, set
the mapping to:
The `_size` field, when enabled, indexes the size in bytes of the original
<<mapping-source-field,`_source`>>. In order to enable it, set
the mapping as follows:
[source,js]
--------------------------------------------------
--------------------------
PUT my_index
{
"tweet" : {
"_size" : {"enabled" : true}
"mappings": {
"my_type": {
"_size": {
"enabled": true
}
}
}
}
--------------------------------------------------
--------------------------
// AUTOSENSE
The value of the `_size` field is accessible in queries, aggregations, scripts,
and when sorting:
[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
"text": "This is a document"
}
PUT my_index/my_type/2
{
"text": "This is another document"
}
GET my_index/_search
{
"query": {
"range": {
"_size": { <1>
"gt": 10
}
}
},
"aggs": {
"Sizes": {
"terms": {
"field": "_size", <2>
"size": 10
}
}
},
"sort": [
{
"_size": { <3>
"order": "desc"
}
}
],
"script_fields": {
"Size": {
"script": "doc['_size']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_size` field
<2> Aggregating on the `_size` field
<3> Sorting on the `_size` field
<4> Accessing the `_size` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
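Since the size is measured in bytes rather than characters, documents containing non-ASCII text are larger than their character count suggests. The Python sketch below illustrates the difference, assuming the `_source` is the UTF-8 encoded JSON body; `source_size_in_bytes` is a hypothetical helper, not an Elasticsearch API:

```python
import json

def source_size_in_bytes(source_json):
    # Assumption: the size is the length of the UTF-8 encoded JSON body.
    return len(source_json.encode("utf-8"))

# ensure_ascii=False keeps "ü" as a single character in the JSON string.
source = json.dumps({"text": "Zürich"}, ensure_ascii=False)

print(len(source))                   # number of characters
print(source_size_in_bytes(source))  # number of bytes
```

The "ü" occupies two bytes in UTF-8, so the byte count is one higher than the character count for this document.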

View File

@ -1,12 +1,12 @@
[[mapping-source-field]]
=== `_source`
=== `_source` field
The `_source` field is an automatically generated field that stores the actual
JSON that was used as the indexed document. It is not indexed (searchable),
just stored. When executing "fetch" requests, like <<docs-get,get>> or
<<search-search,search>>, the `_source` field is returned by default.
The `_source` field contains the original JSON document body that was passed
at index time. The `_source` field itself is not indexed (and thus is not
searchable), but it is stored so that it can be returned when executing
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
==== Disabling source
==== Disabling the `_source` field
Though very handy to have around, the source field does incur storage overhead
within the index. For this reason, it can be disabled as follows:
@ -26,7 +26,7 @@ PUT tweets
// AUTOSENSE
[WARNING]
.Think before disabling the source field
.Think before disabling the `_source` field
==================================================
Users often disable the `_source` field without thinking about the
@ -46,11 +46,11 @@ available then a number of features are not supported:
* Potentially in the future, the ability to repair index corruption
automatically.
If disk space is a concern, rather increase the
<<index-codec,compression level>> instead of disabling the `_source`.
==================================================
TIP: If disk space is a concern, increase the
<<index-codec,compression level>> rather than disabling the `_source` field.
.The metrics use case
**************************************************
@ -69,12 +69,20 @@ metrics case.
[[include-exclude]]
==== Including / Excluding fields from source
==== Including / Excluding fields from `_source`
An expert-only feature is the ability to prune the contents of the `_source`
field after the document has been indexed, but before the `_source` field is
stored. The `includes`/`excludes` parameters (which also accept wildcards)
can be used as follows:
stored.
WARNING: Removing fields from the `_source` has similar downsides to disabling
`_source`, especially the fact that you cannot reindex documents from one
Elasticsearch index to another. Consider using
<<search-request-source-filtering,source filtering>> or a
<<mapping-transform,transform script>> instead.
The `includes`/`excludes` parameters (which also accept wildcards) can be used
as follows:
[source,js]
--------------------------------------------------
@ -126,8 +134,3 @@ GET logs/event/_search
<1> These fields will be removed from the stored `_source` field.
<2> We can still search on this field, even though it is not in the stored `_source`.
WARNING: Removing fields from the `_source` has similar downsides to disabling
`_source`, especially the fact that you cannot reindex documents from one
Elasticsearch index to another. Consider using
<<search-request-source-filtering,source filtering>> or a
<<mapping-transform,transform script>> instead.


@ -1,90 +1,94 @@
[[mapping-timestamp-field]]
=== `_timestamp`
=== `_timestamp` field
The `_timestamp` field allows to automatically index the timestamp of a
document. If it is not provided it will be automatically set
to a <<mapping-timestamp-field-default,default date>>.
[float]
==== enabled
By default it is disabled. In order to enable it, the following mapping
should be defined:
The `_timestamp` field, when enabled, allows a timestamp to be indexed and
stored with a document. The timestamp may be specified manually, generated
automatically, or set to a default value:
[source,js]
--------------------------------------------------
------------------------------------
PUT my_index
{
"tweet" : {
"_timestamp" : { "enabled" : true }
"mappings": {
"my_type": {
"_timestamp": { <1>
"enabled": true
}
}
}
}
--------------------------------------------------
[float]
[[mapping-timestamp-field-format]]
==== format
PUT my_index/my_type/1?timestamp=2015-01-01 <2>
{ "text": "Timestamp as a formatted date" }
You can define the <<mapping-date-format,date
format>> used to parse the provided timestamp value. For example:
PUT my_index/my_type/2?timestamp=1420070400000 <3>
{ "text": "Timestamp as milliseconds since the epoch" }
PUT my_index/my_type/3 <4>
{ "text": "Autogenerated timestamp set to now()" }
------------------------------------
// AUTOSENSE
<1> Enable the `_timestamp` field with default settings.
<2> Set the timestamp manually with a formatted date.
<3> Set the timestamp with milliseconds since the epoch.
<4> Auto-generates a timestamp with <<date-math,now()>>.
The behaviour of the `_timestamp` field can be configured with the following parameters:
`default`::
A default value to be used if none is provided. Defaults to <<date-math,now()>>.
`format`::
The <<mapping-date-format,date format>> (or formats) to use when parsing timestamps. Defaults to `epoch_millis||strictDateOptionalTime`.
`ignore_missing`::
If `true` (default), replace missing timestamps with the `default` value. If `false`, throw an exception.
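The interaction of `format`, `default`, and `ignore_missing` can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Elasticsearch's implementation: the function names are hypothetical, and ISO 8601 parsing stands in for `strictDateOptionalTime`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_timestamp(value):
    """Mimic `epoch_millis||strictDateOptionalTime`: try a number first,
    then fall back to a formatted date (ISO 8601 is assumed here)."""
    try:
        return EPOCH + timedelta(milliseconds=int(value))
    except ValueError:
        parsed = datetime.fromisoformat(value)
        # Treat dates without an offset as UTC (an assumption of this sketch).
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)

def resolve_timestamp(value=None, default=None, ignore_missing=True):
    """Pick the timestamp for a document (hypothetical helper)."""
    if value is not None:
        return parse_timestamp(value)
    if not ignore_missing:
        raise ValueError("timestamp is required by mapping")
    # With no explicit default, fall back to the current time ("now").
    return parse_timestamp(default) if default else datetime.now(timezone.utc)
```

Under these assumptions, `parse_timestamp("2015-01-01")` and `parse_timestamp("1420070400000")` refer to the same instant, matching the two indexing requests in the example above.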
The value of the `_timestamp` field is accessible in queries, aggregations, scripts,
and when sorting:
[source,js]
--------------------------------------------------
--------------------------
GET my_index/_search
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"path" : "post_date",
"format" : "YYYY-MM-dd"
}
"query": {
"range": {
"_timestamp": { <1>
"gte": "2015-01-01"
}
}
}
--------------------------------------------------
Note, the default format is `epoch_millis||strictDateOptionalTime`. The timestamp value will
first be parsed as a number and if it fails the format will be tried.
[float]
[[mapping-timestamp-field-default]]
==== default
You can define a default value for when timestamp is not provided
within the index request or in the `_source` document.
By default, the default value is `now` which means the date the document was processed by the indexing chain.
You can reject documents which do not provide a `timestamp` value by setting `ignore_missing` to false (default to `true`):
[source,js]
--------------------------------------------------
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"ignore_missing" : false
}
},
"aggs": {
"Timestamps": {
"terms": {
"field": "_timestamp", <2>
"size": 10
}
}
}
--------------------------------------------------
You can also set the default value to any date respecting <<mapping-timestamp-field-format,timestamp format>>:
[source,js]
--------------------------------------------------
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"format" : "YYYY-MM-dd",
"default" : "1970-01-01"
}
},
"sort": [
{
"_timestamp": { <3>
"order": "desc"
}
}
],
"script_fields": {
"Timestamp": {
"script": "doc['_timestamp']" <4>
}
}
}
--------------------------------------------------
If you don't provide any timestamp value, _timestamp will be set to this default value.
In elasticsearch 1.4, we allowed setting explicitly `"default":null` which is not possible anymore
as we added a new `ignore_missing` setting.
When reading an index created with elasticsearch 1.4 and using this, we automatically update it by
removing `"default": null` and setting `"ignore_missing": false`
--------------------------
// AUTOSENSE
<1> Querying on the `_timestamp` field
<2> Aggregating on the `_timestamp` field
<3> Sorting on the `_timestamp` field
<4> Accessing the `_timestamp` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)


@ -1,67 +1,106 @@
[[mapping-ttl-field]]
=== `_ttl`
A lot of documents naturally come with an expiration date. Documents can
therefore have a `_ttl` (time to live), which will cause the expired
documents to be deleted automatically.
Some types of documents, such as session data or special offers, come with an
expiration date. The `_ttl` field allows you to specify the minimum time a
document should live, after which time the document is deleted automatically.
`_ttl` accepts two parameters which are described below, every other setting will be silently ignored.
[TIP]
.Prefer index-per-timeframe to TTL
======================================================
[float]
==== enabled
With TTL, expired documents first have to be marked as deleted then later
purged from the index when segments are merged. For append-only time-based
data such as log events, it is much more efficient to use an index-per-day /
week / month instead of TTLs. Old log data can be removed by simply deleting
old indices.
By default it is disabled, in order to enable it, the following mapping
should be defined:
======================================================
The `_ttl` field may be enabled as follows:
[source,js]
--------------------------------------------------
-------------------------------
PUT my_index
{
"tweet" : {
"_ttl" : { "enabled" : true }
"mappings": {
"my_type": {
"_ttl": {
"enabled": true
}
}
}
}
--------------------------------------------------
`_ttl` can only be enabled once and never be disabled again.
PUT my_index/my_type/1?ttl=10m <1>
{
"text": "Will expire in 10 minutes"
}
[float]
==== default
PUT my_index/my_type/2 <2>
{
"text": "Will not expire"
}
-------------------------------
// AUTOSENSE
<1> This document will expire 10 minutes after being indexed.
<2> This document has no TTL set and will not expire.
You can provide a per index/type default `_ttl` value as follows:
The expiry time is calculated as the value of the
<<mapping-timestamp-field,`_timestamp`>> field (or `now()` if the `_timestamp`
field is not enabled) plus the `ttl` specified in the indexing request.
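As a rough illustration of that calculation, the following Python sketch adds a TTL to a base timestamp. The unit table and function names are assumptions made for this example and are not part of Elasticsearch; a value without a unit is treated as milliseconds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unit table for this sketch, in milliseconds.
UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
           "d": 86_400_000, "w": 604_800_000}

def parse_ttl_ms(ttl):
    """Convert a TTL string such as '10m' into milliseconds."""
    # "ms" is checked before "s" and "m" so that '5ms' is not misread.
    for unit in ("ms", "s", "m", "h", "d", "w"):
        if ttl.endswith(unit):
            return int(ttl[: -len(unit)]) * UNIT_MS[unit]
    return int(ttl)  # no unit given: milliseconds

def expiry_time(ttl, timestamp=None):
    """Expiry = document timestamp (or the current time) + TTL."""
    base = timestamp if timestamp is not None else datetime.now(timezone.utc)
    return base + timedelta(milliseconds=parse_ttl_ms(ttl))
```

For instance, a document timestamped at midnight on 2015-01-01 and indexed with `ttl=10m` would become eligible for deletion at 00:10 on the same day.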
==== Default TTL
You can provide a default `_ttl`, which will be applied to indexing requests where the `ttl` is not specified:
[source,js]
--------------------------------------------------
-------------------------------
PUT my_index
{
"tweet" : {
"_ttl" : { "enabled" : true, "default" : "1d" }
"mappings": {
"my_type": {
"_ttl": {
"enabled": true,
"default": "5m"
}
}
}
}
--------------------------------------------------
In this case, if you don't provide a `_ttl` value in your query or in
the `_source` all tweets will have a `_ttl` of one day.
PUT my_index/my_type/1?ttl=10m <1>
{
"text": "Will expire in 10 minutes"
}
In case you do not specify a time unit like `d` (days), `m` (minutes),
`h` (hours), `ms` (milliseconds) or `w` (weeks), milliseconds is used as
default unit.
PUT my_index/my_type/2 <2>
{
"text": "Will expire in 5 minutes"
}
-------------------------------
// AUTOSENSE
<1> This document will expire 10 minutes after being indexed.
<2> This document has no TTL set and so will expire after the default 5 minutes.
If no `default` is set and no `_ttl` value is given then the document
has an infinite `_ttl` and will not expire.
The `default` value can use <<time-units,time units>> like `d` for days, and
will use `ms` as the default unit if no time unit is provided.
You can dynamically update the `default` value using the put mapping
API. It won't change the `_ttl` of already indexed documents but will be
used for future documents.
[float]
==== Note on documents expiration
Expired documents will be automatically deleted regularly. You can
dynamically set the `indices.ttl.interval` to fit your needs. The
default value is `60s`.
Expired documents will be automatically deleted periodically. The following
settings control the expiry process:
The deletion orders are processed by bulk. You can set
`indices.ttl.bulk_size` to fit your needs. The default value is `10000`.
`indices.ttl.interval`::
How often the purge process should run. Defaults to `60s`. Expired documents
may still be retrieved before they are purged.
`indices.ttl.bulk_size`::
How many deletions are handled by a single <<docs-bulk,`bulk`>> request. The
default value is `10000`.
Note that the expiration procedure handles versioning properly, so if a
document is updated between the collection of documents to expire and
the delete order, the document won't be deleted.


@ -1,7 +1,60 @@
[[mapping-type-field]]
=== `_type`
=== `_type` field
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
`_type` field is indexed in order to make searching by type name fast.
The value of the `_type` field is accessible in queries, aggregations,
scripts, and when sorting:
[source,js]
--------------------------
# Example documents
PUT my_index/type_1/1
{
"text": "Document with type 1"
}
PUT my_index/type_2/2
{
"text": "Document with type 2"
}
GET my_index/type_*/_search
{
"query": {
"terms": {
"_type": [ "type_1", "type_2" ] <1>
}
},
"aggs": {
"types": {
"terms": {
"field": "_type", <2>
"size": 10
}
}
},
"sort": [
{
"_type": { <3>
"order": "desc"
}
}
],
"script_fields": {
"type": {
"script": "doc['_type']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_type` field
<2> Aggregating on the `_type` field
<3> Sorting on the `_type` field
<4> Accessing the `_type` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
Each document indexed is associated with an id and a type. The `_type`
field allows accessing only the type of a document. It is indexed
to allow quickly filtering on type, for example, when performing
a search request on a single or multiple types.


@ -1,10 +1,59 @@
[[mapping-uid-field]]
=== `_uid`
=== `_uid` field
Each document indexed is associated with an id and a type, the internal
`_uid` field is the unique identifier of a document within an index and
is composed of the type and the id (meaning that different types can
have the same id and still maintain uniqueness).
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. These
values are combined as `{type}#{id}` and indexed as the `_uid` field.
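The `{type}#{id}` layout can be illustrated with a small Python sketch. The helper names are hypothetical, and splitting on the first `#` assumes that type names never contain `#` while IDs may.

```python
def make_uid(doc_type, doc_id):
    """Combine a type and an ID into the `{type}#{id}` form."""
    return f"{doc_type}#{doc_id}"

def split_uid(uid):
    """Recover (type, id) from a uid by splitting on the first '#'."""
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id
```

Because the type is part of the value, two documents of different types can share the same `_id` and still have distinct `_uid` values.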
The value of the `_uid` field is accessible in queries, aggregations, scripts,
and when sorting:
[source,js]
--------------------------
# Example documents
PUT my_index/my_type/1
{
"text": "Document with ID 1"
}
PUT my_index/my_type/2
{
"text": "Document with ID 2"
}
GET my_index/_search
{
"query": {
"terms": {
"_uid": [ "my_type#1", "my_type#2" ] <1>
}
},
"aggs": {
"UIDs": {
"terms": {
"field": "_uid", <2>
"size": 10
}
}
},
"sort": [
{
"_uid": { <3>
"order": "desc"
}
}
],
"script_fields": {
"UID": {
"script": "doc['_uid']" <4>
}
}
}
--------------------------
// AUTOSENSE
<1> Querying on the `_uid` field (also see the <<query-dsl-ids-query,`ids` query>>)
<2> Aggregating on the `_uid` field
<3> Sorting on the `_uid` field
<4> Accessing the `_uid` field in scripts (inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work)
The `_uid` field is for type based filtering, as well as for
lookups of `_id` and `_type`.