Docs: Mapping docs completely rewritten for 2.0

This commit is contained in:
Clinton Gormley 2015-08-06 17:24:29 +02:00
parent 40cd460647
commit ac2b8951c6
87 changed files with 4760 additions and 2433 deletions

.gitignore

@ -14,7 +14,7 @@ docs/html/
docs/build.log
/tmp/
backwards/
html_docs
## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
## All files (.project, .classpath, .settings/*) should be generated through Maven which
## will correctly set the classpath based on the declared dependencies and write settings


@ -53,7 +53,7 @@ Response:
}
--------------------------------------------------

The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). And it can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the <<geo-point,`geo_point` type>>:

* Object format: `{ "lat" : 52.3760, "lon" : 4.894 }` - this is the safest format as it is the most explicit about the `lat` & `lon` values
* String format: `"52.3760, 4.894"` - where the first number is the `lat` and the second is the `lon`


@ -200,7 +200,7 @@ and therefore can't be used in the `order` option of the `terms` aggregator.
If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned.
Nested hits are in a sense hidden mini documents that are part of a regular document, where a nested field type
has been configured in the mapping. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested`
or `reverse_nested` aggregator. Read more about nested in the <<nested,nested type mapping>>.
If a nested type has been configured, a single document is actually indexed as multiple Lucene documents and they share
the same id. In order to determine the identity of a nested hit, more is needed than just the id, so that is why


@ -152,6 +152,33 @@ being consumed by a monitoring tool, rather than intended for human
consumption. The default for the `human` flag is
`false`.
[[date-math]]
=== Date Math
Most parameters which accept a formatted date value -- such as `gt` and `lt`
in <<query-dsl-range-query,`range` queries>>, or `from` and `to` in
<<search-aggregations-bucket-daterange-aggregation,`daterange`
aggregations>> -- understand date math.

The expression starts with an anchor date, which can either be `now`, or a
date string ending with `||`. This anchor date can optionally be followed by
one or more math expressions:
* `+1h` - add one hour
* `-1d` - subtract one day
* `/d` - round down to the nearest day
The supported <<time-units,time units>> are: `y` (year), `M` (month), `w` (week),
`d` (day), `h` (hour), `m` (minute), and `s` (second).
Some examples are:
[horizontal]
`now+1h`:: The current time plus one hour, with ms resolution.
`now+1h+1m`:: The current time plus one hour plus one minute, with ms resolution.
`now+1h/d`:: The current time plus one hour, rounded down to the nearest day.
`2015-01-01||+1M/d`:: `2015-01-01` plus one month, rounded down to the nearest day.
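
For example, a minimal sketch of date math in a `range` query that matches
everything from the start of yesterday up to, but not including, the start of
today (the `timestamp` field name is illustrative):

[source,js]
--------------------------------------------------
GET _search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d", <1>
        "lt":  "now/d"     <2>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The current time, minus one day, rounded down to the start of the day.
<2> The current time, rounded down to the start of the day.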
[float]
=== Response Filtering
@ -237,10 +264,10 @@ curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
--------------------------------------------------

Note that elasticsearch sometimes returns directly the raw value of a field,
like the `_source` field. If you want to filter `_source` fields, you should
consider combining the already existing `_source` parameter (see
<<get-source-filtering,Get API>> for more details) with the `filter_path`
parameter like this:

[source,sh]
--------------------------------------------------
@ -318,8 +345,9 @@ of supporting the native JSON number types.
[float]
=== Time units

Whenever durations need to be specified, eg for a `timeout` parameter, the
duration must specify the unit, like `2d` for 2 days. The supported units
are:

[horizontal]
`y`:: Year
@ -329,6 +357,7 @@ can be specified as a whole number representing time in milliseconds, or as a ti
`h`:: Hour
`m`:: Minute
`s`:: Second
`ms`:: Millisecond
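
For example, a minimal sketch of passing a timeout with an explicit unit (the
index name is illustrative):

[source,sh]
--------------------------------------------------
curl 'localhost:9200/my_index/_search?timeout=10s'
--------------------------------------------------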
[[distance-units]]
[float]


@ -6,53 +6,3 @@ added to an index either when creating it or by using the put mapping
api. It also handles the dynamic mapping support for types that have no
explicit mappings predefined. For more information about mapping
definitions, check out the <<mapping,mapping section>>.
[float]
=== Dynamic Mappings
New types and new fields within types can be added dynamically just
by indexing a document. When Elasticsearch encounters a new type,
it creates the type using the `_default_` mapping (see below).
When it encounters a new field within a type, it autodetects the
datatype that the field contains and adds it to the type mapping
automatically.
See <<mapping-dynamic-mapping>> for details of how to control and
configure dynamic mapping.
[float]
=== Default Mapping
When a new type is created (at <<indices-create-index,index creation>> time,
using the <<indices-put-mapping,`put-mapping` API>> or just by indexing a
document into it), the type uses the `_default_` mapping as its basis. Any
mapping specified in the <<indices-create-index,`create-index`>> or
<<indices-put-mapping,`put-mapping`>> request overrides values set in the
`_default_` mapping.
The default mapping definition is a plain mapping definition that is
embedded within Elasticsearch:
[source,js]
--------------------------------------------------
{
_default_ : {
}
}
--------------------------------------------------
Pretty short, isn't it? Basically, everything is `_default_`ed, including the
dynamic nature of the root object mapping which allows new fields to be added
automatically.
The default mapping can be overridden by specifying the `_default_` type when
creating a new index.
[float]
=== Mapper settings
`index.mapper.dynamic` (_dynamic_)::
Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.


@ -6,8 +6,8 @@ are scored. Similarity is per field, meaning that via the mapping one
can define a different similarity per field.

Configuring a custom similarity is considered an expert feature and the
builtin similarities are most likely sufficient as is described in
<<similarity>>.

[float]
[[configuration]]
@ -41,7 +41,7 @@ Here we configure the DFRSimilarity so it can be referenced as
"properties" : { "properties" : {
"title" : { "type" : "string", "similarity" : "my_similarity" } "title" : { "type" : "string", "similarity" : "my_similarity" }
} }
} }
-------------------------------------------------- --------------------------------------------------
[float] [float]
@ -52,9 +52,9 @@ Here we configure the DFRSimilarity so it can be referenced as
==== Default similarity

The default similarity that is based on the TF/IDF model. This
similarity has the following option:

`discount_overlaps`::
    Determines whether overlap tokens (Tokens with
    0 position increment) are ignored when computing norm. By default this
    is true, meaning overlap tokens do not count when computing norms.
@ -71,14 +71,14 @@ http://en.wikipedia.org/wiki/Okapi_BM25[Okapi_BM25] for more details.
This similarity has the following options:

[horizontal]
`k1`::
    Controls non-linear term frequency normalization
    (saturation).

`b`::
    Controls to what degree document length normalizes tf values.

`discount_overlaps`::
    Determines whether overlap tokens (Tokens with
    0 position increment) are ignored when computing norm. By default this
    is true, meaning overlap tokens do not count when computing norms.
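
As a sketch of how these options are set, a custom BM25 similarity with
explicit `k1` and `b` values might be declared in the index settings like this
(the index and similarity names are illustrative):

[source,js]
--------------------------------------------------
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": { <1>
        "type": "BM25",
        "k1": 1.2, <2>
        "b": 0.75 <3>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Reference this similarity from a field mapping with `"similarity": "my_bm25"`.
<2> A higher `k1` lets term frequency saturate more slowly.
<3> `b` ranges from `0` (no length normalization) to `1` (full normalization).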
@ -90,17 +90,17 @@ Type name: `BM25`
==== DFR similarity

Similarity that implements the
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
from randomness] framework. This similarity has the following options:

[horizontal]
`basic_model`::
    Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.

`after_effect`::
    Possible values: `no`, `b` and `l`.

`normalization`::
    Possible values: `no`, `h1`, `h2`, `h3` and `z`.

All options but the first option need a normalization value.
@ -111,12 +111,12 @@ Type name: `DFR`
[[ib]]
==== IB similarity.

http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
based model]. This similarity has the following options:

[horizontal]
`distribution`::  Possible values: `ll` and `spl`.
`lambda`::        Possible values: `df` and `ttf`.
`normalization`:: Same as in `DFR` similarity.

Type name: `IB`
@ -125,7 +125,7 @@ Type name: `IB`
[[lm_dirichlet]]
==== LM Dirichlet similarity.

http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
Dirichlet similarity]. This similarity has the following options:

[horizontal]
@ -137,7 +137,7 @@ Type name: `LMDirichlet`
[[lm_jelinek_mercer]]
==== LM Jelinek Mercer similarity.

http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
Jelinek Mercer similarity]. This similarity has the following options:

[horizontal]


@ -3,76 +3,157 @@
[partintro]
--

Mapping is the process of defining how a document, and the fields it contains,
are stored and indexed. For instance, use mappings to define:

* which string fields should be treated as full text fields.
* which fields contain numbers, dates, or geolocations.
* whether the values of all fields in the document should be
  indexed into the catch-all <<mapping-all-field,`_all`>> field.
* the <<mapping-date-format,format>> of date values.
* custom rules to control the mapping for
  <<dynamic-mapping,dynamically added fields>>.
[float]
[[mapping-type]]
== Mapping Types

Each index has one or more _mapping types_, which are used to divide the
documents in an index into logical groups. User documents might be stored in a
`user` type, and blog posts in a `blogpost` type.

Each mapping type has:

<<mapping-fields,Meta-fields>>::

Meta-fields are used to customize how a document's associated metadata is
treated. Examples of meta-fields include the document's
<<mapping-index-field,`_index`>>, <<mapping-type-field,`_type`>>,
<<mapping-id-field,`_id`>>, and <<mapping-source-field,`_source`>> fields.

<<mapping-types,Fields>> or _properties_::

Each mapping type contains a list of fields or `properties` pertinent to that
type. A `user` type might contain `title`, `name`, and `age` fields, while a
`blogpost` type might contain `title`, `body`, `user_id` and `created`
fields.
The mapping for the above example could look like this:
[source,js]
---------------------------------------
PUT my_index <1>
{
  "mappings": {
    "user": { <2>
      "_all": { "enabled": false }, <3>
      "properties": { <4>
        "title": { "type": "string" }, <5>
        "name": { "type": "string" }, <5>
        "age": { "type": "integer" } <5>
      }
    },
    "blogpost": { <2>
      "properties": { <4>
        "title": { "type": "string" }, <5>
        "body": { "type": "string" }, <5>
        "user_id": {
          "type": "string", <5>
          "index": "not_analyzed"
        },
        "created": {
          "type": "date", <5>
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
---------------------------------------
// AUTOSENSE
<1> Create an index called `my_index`.
<2> Add mapping types called `user` and `blogpost`.
<3> Disable the `_all` <<mapping-fields,meta field>> for the `user` mapping type.
<4> Specify fields or _properties_ in each mapping type.
<5> Specify the data `type` and mapping for each field.
[float]
== Field datatypes

Each field has a data `type` which can be:

* a simple type like <<string,`string`>>, <<date,`date`>>, <<number,`long`>>,
  <<number,`double`>>, <<boolean,`boolean`>> or <<ip,`ip`>>.
* a type which supports the hierarchical nature of JSON such as
  <<object,`object`>> or <<nested,`nested`>>.
* or a specialised type like <<geo-point,`geo_point`>>,
  <<geo-shape,`geo_shape`>>, or <<search-suggesters-completion,`completion`>>.
[IMPORTANT]
.Fields are shared across mapping types
=========================================
Mapping types are used to group fields, but the fields in each mapping type
are not independent of each other. Fields with:
* the _same name_
* in the _same index_
* in _different mapping types_
* map to the _same field_ internally,
* and *must have the same mapping*.
The `title` field exists in both the `user` and `blogpost` mapping types and
so must have exactly the same mapping in each type. The only exceptions to
this rule are the <<copy-to>>, <<dynamic>>, <<enabled>>, <<ignore-above>>,
<<include-in-all>>, and <<properties>> parameters, which may have different
settings per field.
Usually, fields with the same name also contain the same type of data, so
having the same mapping is not a problem. When conflicts do arise, these can
be solved by choosing more descriptive names, such as `user_title` and
`blog_title`.
=========================================
[float]
== Dynamic mapping

Fields and mapping types do not need to be defined before being used. Thanks
to _dynamic mapping_, new mapping types and new field names will be added
automatically, just by indexing a document. New fields can be added both to
the top-level mapping type, and to inner <<object,`object`>> and
<<nested,`nested`>> fields.

The <<dynamic-mapping,dynamic mapping>> rules can be configured to
customise the mapping that is used for new types and new fields.
[float]
== Explicit mappings
You know more about your data than Elasticsearch can guess, so while dynamic
mapping can be useful to get started, at some point you will want to specify
your own explicit mappings.
You can create mapping types and field mappings when you
<<indices-create-index,create an index>>, and you can add mapping types and
fields to an existing index with the <<indices-put-mapping,PUT mapping API>>.
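
For instance, a minimal sketch of adding a new field to the `user` type from
the example above with the PUT mapping API (the `email` field is purely
illustrative):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/user
{
  "properties": {
    "email": {
      "type": "string"
    }
  }
}
--------------------------------------------------
// AUTOSENSE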
[float]
== Updating existing mappings
Other than where documented, *existing type and field mappings cannot be
updated*. Changing the mapping would mean invalidating already indexed
documents. Instead, you should create a new index with the correct mappings
and reindex your data into that index.
--

include::mapping/types.asciidoc[]
include::mapping/fields.asciidoc[]
include::mapping/params.asciidoc[]
include::mapping/dynamic-mapping.asciidoc[]


@ -1,238 +0,0 @@
[[mapping-date-format]]
== Date Format
In JSON documents, dates are represented as strings. Elasticsearch uses a set
of pre-configured formats to recognize and convert those, but you can change the
defaults by specifying the `format` option when defining a `date` type, or by
specifying `dynamic_date_formats` in the `root object` mapping (which will
be used unless explicitly overridden by a `date` type). There are built-in
formats supported, as well as completely custom ones.
The parsing of dates uses http://www.joda.org/joda-time/[Joda]. The
default date parsing used if no format is specified is
http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[ISODateTimeFormat.dateOptionalTimeParser].
An extension to the format allows defining several formats using the `||`
separator. This allows less strict formats to be used;
for example, the `yyyy/MM/dd HH:mm:ss||yyyy/MM/dd` format will parse
both `yyyy/MM/dd HH:mm:ss` and `yyyy/MM/dd`. The first format will also
act as the one that converts back from milliseconds to a string
representation.
[float]
[[date-math]]
=== Date Math
The `date` type supports using date math expressions when used in a
query/filter (mainly makes sense in a `range` query/filter).

The expression starts with an "anchor" date, which can be either `now`
or a date string (in the applicable format) ending with `||`. It can
then be followed by a math expression, supporting `+`, `-` and `/`
(rounding). The units supported are `y` (year), `M` (month), `w` (week),
`d` (day), `h` (hour), `m` (minute), and `s` (second).
Here are some samples: `now+1h`, `now+1h+1m`, `now+1h/d`,
`2012-01-01||+1M/d`.
When doing `range` type searches with rounding, the value parsed
depends on whether the end of the range is inclusive or exclusive, and
whether it is the beginning or end of the range. Rounding up moves to the
last millisecond of the rounding scope, and rounding down to the
first millisecond of the rounding scope. The semantics work as follows:

* `gt` - round up, and use > that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie excluding the entire month)
* `gte` - round down, and use >= that value (`2014-11-18||/M` becomes `2014-11-01`, ie including the entire month)
* `lt` - round down, and use < that value (`2014-11-18||/M` becomes `2014-11-01`, ie excluding the entire month)
* `lte` - round up, and use <= that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie including the entire month)
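
For example, a hedged sketch of how the same rounded date behaves as an
inclusive lower bound versus an exclusive upper bound (the `timestamp` field
name is illustrative):

[source,js]
--------------------------------------------------
GET _search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2014-11-18||/M", <1>
        "lt":  "2014-12-18||/M"  <2>
      }
    }
  }
}
--------------------------------------------------
<1> Rounds down to `2014-11-01`, so the whole of November is included.
<2> Rounds down to `2014-12-01`, so everything from December onwards is excluded.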
[float]
[[built-in]]
=== Built In Formats
Most of the below formats have a `strict` companion format, which means that
the year, month, and day parts must have leading zeros in order
to be valid. This means that a date like `5/11/1` would not be valid; instead
you would need to specify the full date, which would be `2005/11/01` in this
example. So instead of `date_optional_time` you would specify
`strict_date_optional_time`.
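
For instance, a sketch of a field mapping that accepts only strictly
formatted dates (the field name is illustrative):

[source,js]
--------------------------------------------------
"release_date": {
  "type":   "date",
  "format": "strict_date_optional_time"
}
--------------------------------------------------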
The following table lists all the default ISO formats supported:
[cols="<,<",options="header",]
|=======================================================================
|Name |Description
|`basic_date`|A basic formatter for a full date as four digit year, two
digit month of year, and two digit day of month (yyyyMMdd).
|`basic_date_time`|A basic formatter that combines a basic date and time,
separated by a 'T' (yyyyMMdd'T'HHmmss.SSSZ).
|`basic_date_time_no_millis`|A basic formatter that combines a basic date
and time without millis, separated by a 'T' (yyyyMMdd'T'HHmmssZ).
|`basic_ordinal_date`|A formatter for a full ordinal date, using a four
digit year and three digit dayOfYear (yyyyDDD).
|`basic_ordinal_date_time`|A formatter for a full ordinal date and time,
using a four digit year and three digit dayOfYear
(yyyyDDD'T'HHmmss.SSSZ).
|`basic_ordinal_date_time_no_millis`|A formatter for a full ordinal date
and time without millis, using a four digit year and three digit
dayOfYear (yyyyDDD'T'HHmmssZ).
|`basic_time`|A basic formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, three digit millis, and time
zone offset (HHmmss.SSSZ).
|`basic_time_no_millis`|A basic formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and time zone
offset (HHmmssZ).
|`basic_t_time`|A basic formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, three digit millis, and time
zone off set prefixed by 'T' ('T'HHmmss.SSSZ).
|`basic_t_time_no_millis`|A basic formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and time zone
offset prefixed by 'T' ('T'HHmmssZ).
|`basic_week_date`|A basic formatter for a full date as four digit
weekyear, two digit week of weekyear, and one digit day of week
(xxxx'W'wwe). `strict_basic_week_date` is supported.
|`basic_week_date_time`|A basic formatter that combines a basic weekyear
date and time, separated by a 'T' (xxxx'W'wwe'T'HHmmss.SSSZ).
`strict_basic_week_date_time` is supported.
|`basic_week_date_time_no_millis`|A basic formatter that combines a basic
weekyear date and time without millis, separated by a 'T'
(xxxx'W'wwe'T'HHmmssZ). `strict_week_date_time` is supported.
|`date`|A formatter for a full date as four digit year, two digit month
of year, and two digit day of month (yyyy-MM-dd). `strict_date` is supported.
|`date_hour`|A formatter that combines a full date and two digit hour of
day. `strict_date_hour` is supported.

|`date_hour_minute`|A formatter that combines a full date, two digit hour
of day, and two digit minute of hour. `strict_date_hour_minute` is supported.
|`date_hour_minute_second`|A formatter that combines a full date, two
digit hour of day, two digit minute of hour, and two digit second of
minute. `strict_date_hour_minute_second` is supported.
|`date_hour_minute_second_fraction`|A formatter that combines a full
date, two digit hour of day, two digit minute of hour, two digit second
of minute, and three digit fraction of second
(yyyy-MM-dd'T'HH:mm:ss.SSS). `strict_date_hour_minute_second_fraction` is supported.
|`date_hour_minute_second_millis`|A formatter that combines a full date,
two digit hour of day, two digit minute of hour, two digit second of
minute, and three digit fraction of second (yyyy-MM-dd'T'HH:mm:ss.SSS).
`strict_date_hour_minute_second_millis` is supported.
|`date_optional_time`|a generic ISO datetime parser where the date is
mandatory and the time is optional. `strict_date_optional_time` is supported.
|`date_time`|A formatter that combines a full date and time, separated by
a 'T' (yyyy-MM-dd'T'HH:mm:ss.SSSZZ). `strict_date_time` is supported.
|`date_time_no_millis`|A formatter that combines a full date and time
without millis, separated by a 'T' (yyyy-MM-dd'T'HH:mm:ssZZ).
`strict_date_time_no_millis` is supported.
|`hour`|A formatter for a two digit hour of day. `strict_hour` is supported.
|`hour_minute`|A formatter for a two digit hour of day and two digit
minute of hour. `strict_hour_minute` is supported.
|`hour_minute_second`|A formatter for a two digit hour of day, two digit
minute of hour, and two digit second of minute.
`strict_hour_minute_second` is supported.
|`hour_minute_second_fraction`|A formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and three digit
fraction of second (HH:mm:ss.SSS).
`strict_hour_minute_second_fraction` is supported.
|`hour_minute_second_millis`|A formatter for a two digit hour of day, two
digit minute of hour, two digit second of minute, and three digit
fraction of second (HH:mm:ss.SSS).
`strict_hour_minute_second_millis` is supported.
|`ordinal_date`|A formatter for a full ordinal date, using a four digit
year and three digit dayOfYear (yyyy-DDD). `strict_ordinal_date` is supported.
|`ordinal_date_time`|A formatter for a full ordinal date and time, using
a four digit year and three digit dayOfYear (yyyy-DDD'T'HH:mm:ss.SSSZZ).
`strict_ordinal_date_time` is supported.
|`ordinal_date_time_no_millis`|A formatter for a full ordinal date and
time without millis, using a four digit year and three digit dayOfYear
(yyyy-DDD'T'HH:mm:ssZZ).
`strict_ordinal_date_time_no_millis` is supported.
|`time`|A formatter for a two digit hour of day, two digit minute of
hour, two digit second of minute, three digit fraction of second, and
time zone offset (HH:mm:ss.SSSZZ). `strict_time` is supported.
|`time_no_millis`|A formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, and time zone offset
(HH:mm:ssZZ). `strict_time_no_millis` is supported.
|`t_time`|A formatter for a two digit hour of day, two digit minute of
hour, two digit second of minute, three digit fraction of second, and
time zone offset prefixed by 'T' ('T'HH:mm:ss.SSSZZ).
`strict_t_time` is supported.
|`t_time_no_millis`|A formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, and time zone offset
prefixed by 'T' ('T'HH:mm:ssZZ). `strict_t_time_no_millis` is supported.
|`week_date`|A formatter for a full date as four digit weekyear, two
digit week of weekyear, and one digit day of week (xxxx-'W'ww-e).
`strict_week_date` is supported.
|`week_date_time`|A formatter that combines a full weekyear date and
time, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ).
`strict_week_date_time` is supported.
|`week_date_time_no_millis`|A formatter that combines a full weekyear date
and time without millis, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ssZZ).
`strict_week_date_time` is supported.
|`weekyear`|A formatter for a four digit weekyear. `strict_week_year` is supported.
|`weekyear_week`|A formatter for a four digit weekyear and two digit week
of weekyear. `strict_weekyear_week` is supported.
|`weekyear_week_day`|A formatter for a four digit weekyear, two digit week
of weekyear, and one digit day of week. `strict_weekyear_week_day` is supported.
|`year`|A formatter for a four digit year. `strict_year` is supported.
|`year_month`|A formatter for a four digit year and two digit month of
year. `strict_year_month` is supported.
|`year_month_day`|A formatter for a four digit year, two digit month of
year, and two digit day of month. `strict_year_month_day` is supported.
|`epoch_second`|A formatter for the number of seconds since the epoch.
Note, that this timestamp allows a max length of 10 chars, so dates
before 1653 and after 2286 are not supported. You should use a different
date formatter in that case.

|`epoch_millis`|A formatter for the number of milliseconds since the epoch.
Note, that this timestamp allows a max length of 13 chars, so dates
before 1653 and after 2286 are not supported. You should use a different
date formatter in that case.
|=======================================================================
[float]
[[custom]]
=== Custom Format
Allows for a completely customizable date format explained
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[here].


@ -1,73 +1,67 @@
[[dynamic-mapping]]
== Dynamic Mapping

One of the most important features of Elasticsearch is that it tries to get
out of your way and let you start exploring your data as quickly as possible.
To index a document, you don't have to first create an index, define a mapping
type, and define your fields -- you can just index a document and the index,
type, and fields will spring to life automatically:
[source,js]
--------------------------------------------------
PUT data/counters/1 <1>
{ "count": 5 }
--------------------------------------------------
// AUTOSENSE
<1> Creates the `data` index, the `counters` mapping type, and a field
    called `count` with datatype `long`.
The automatic detection and addition of new types and fields is called
_dynamic mapping_. The dynamic mapping rules can be customised to suit your
purposes with:

<<default-mapping,`_default_` mapping>>::

Configure the base mapping to be used for new mapping types.

<<dynamic-field-mapping,Dynamic field mappings>>::

The rules governing dynamic field detection.

<<dynamic-templates,Dynamic templates>>::

Custom rules to configure the mapping for dynamically added fields.

TIP: <<indices-templates,Index templates>> allow you to configure the default
mappings, settings, aliases, and warmers for new indices, whether created
automatically or explicitly.
[float]
=== Disabling automatic type creation

Automatic type creation can be disabled by setting the `index.mapper.dynamic`
setting to `false`, either by setting the default value in the
`config/elasticsearch.yml` file, or per-index as an index setting:
[source,js]
--------------------------------------------------
PUT /_settings <1>
{
  "index.mapper.dynamic": false
}
--------------------------------------------------
// AUTOSENSE
<1> Disable automatic type creation for all indices.
Regardless of the value of this setting, types can still be added explicitly
when <<indices-create-index,creating an index>> or with the
<<indices-put-mapping,PUT mapping>> API.
include::dynamic/default-mapping.asciidoc[]
include::dynamic/field-mapping.asciidoc[]
include::dynamic/templates.asciidoc[]


@ -0,0 +1,82 @@
[[default-mapping]]
=== `_default_` mapping
The default mapping, which will be used as the base mapping for any new
mapping types, can be customised by adding a mapping type with the name
`_default_` to an index, either when
<<indices-create-index,creating the index>> or later on with the
<<indices-put-mapping,PUT mapping>> API.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"_default_": { <1>
"_all": {
"enabled": false
}
},
"user": {}, <2>
"blogpost": { <3>
"_all": {
"enabled": true
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `_default_` mapping defaults the <<mapping-all-field,`_all`>> field to disabled.
<2> The `user` type inherits the settings from `_default_`.
<3> The `blogpost` type overrides the defaults and enables the <<mapping-all-field,`_all`>> field.
While the `_default_` mapping can be updated after an index has been created,
the new defaults will only affect mapping types that are created afterwards.
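
For instance, a minimal sketch of updating the `_default_` mapping on the
index from the example above; types created before the update keep the
settings they were created with:

[source,js]
--------------------------------------------------
PUT my_index/_mapping/_default_
{
  "_all": { "enabled": true }
}
--------------------------------------------------
// AUTOSENSE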
The `_default_` mapping can be used in conjunction with
<<indices-templates,Index templates>> to control dynamically created types
within automatically created indices:
[source,js]
--------------------------------------------------
PUT _template/logging
{
"template": "logs-*", <1>
"settings": { "number_of_shards": 1 }, <2>
"mappings": {
"_default_": {
"_all": { <3>
"enabled": false
},
"dynamic_templates": [
{
"strings": { <4>
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
PUT logs-2015.10.01/event/1
{ "message": "error:16" }
--------------------------------------------------
// AUTOSENSE
<1> The `logging` template will match any indices beginning with `logs-`.
<2> Matching indices will be created with a single primary shard.
<3> The `_all` field will be disabled by default for new type mappings.
<4> String fields will be created with an `analyzed` main field, and a `not_analyzed` `.raw` field.


@ -0,0 +1,139 @@
[[dynamic-field-mapping]]
=== Dynamic field mapping
By default, when a previously unseen field is found in a document,
Elasticsearch will add the new field to the type mapping. This behaviour can
be disabled, both at the document and at the <<object,`object`>> level, by
setting the <<dynamic,`dynamic`>> parameter to `false` or to `strict`.
Assuming `dynamic` field mapping is enabled, some simple rules are used to
determine which datatype the field should have:
[horizontal]
*JSON datatype*:: *Elasticsearch datatype*
`null`:: No field is added.
`true` or `false`:: <<boolean,`boolean`>> field
floating{nbsp}point{nbsp}number:: <<number,`double`>> field
integer:: <<number,`long`>> field
object:: <<object,`object`>> field
array:: Depends on the first non-`null` value in the array.
string:: Either a <<date,`date`>> field
(if the value passes <<date-detection,date detection>>),
a <<number,`double`>> or <<number,`long`>> field
(if the value passes <<numeric-detection,numeric detection>>)
or an <<mapping-index,`analyzed`>> <<string,`string`>> field.
These are the only <<mapping-types,field datatypes>> that are dynamically
detected. All other datatypes must be mapped explicitly.
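
For example, a quick sketch of a document whose fields exercise these
detection rules (index, type, and field names are illustrative):

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
  "my_null": null, <1>
  "my_boolean": true, <2>
  "my_float": 3.14, <3>
  "my_integer": 42, <4>
  "my_object": { "my_string": "some text" } <5>
}
--------------------------------------------------
// AUTOSENSE
<1> No field is added for `my_null`.
<2> `my_boolean` is added as a `boolean` field.
<3> `my_float` is added as a `double` field.
<4> `my_integer` is added as a `long` field.
<5> `my_object` is added as an `object` field containing an analyzed `string` field.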
Besides the options listed below, dynamic field mapping rules can be further
customised with <<dynamic-templates,`dynamic_templates`>>.
[[date-detection]]
==== Date detection
If `date_detection` is enabled (default), then new string fields are checked
to see whether their contents match any of the date patterns specified in
`dynamic_date_formats`. If a match is found, a new <<date,`date`>> field is
added with the corresponding format.
The default value for `dynamic_date_formats` is:

[ <<strict-date-time,`"strict_date_optional_time"`>>,`"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`]
For example:
[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
"create_date": "2015/09/02"
}
GET my_index/_mapping <1>
--------------------------------------------------
// AUTOSENSE
<1> The `create_date` field has been added as a <<date,`date`>>
field with the <<mapping-date-format,`format`>>: +
`"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`.
===== Disabling date detection

Dynamic date detection can be disabled by setting `date_detection` to `false`:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"date_detection": false
}
}
}
PUT my_index/my_type/1 <1>
{
  "create_date": "2015/09/02"
}
--------------------------------------------------
// AUTOSENSE
<1> The `create_date` field has been added as a <<string,`string`>> field.
===== Customising detected date formats
Alternatively, the `dynamic_date_formats` can be customised to support your
own <<mapping-date-format,date formats>>:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_date_formats": ["MM/dd/yyyy"]
}
}
}
PUT my_index/my_type/1
{
"create_date": "09/25/2015"
}
--------------------------------------------------
// AUTOSENSE
[[numeric-detection]]
==== Numeric detection
While JSON has support for native floating point and integer datatypes, some
applications or languages may sometimes render numbers as strings. Usually the
correct solution is to map these fields explicitly, but numeric detection
(which is disabled by default) can be enabled to do this automatically:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"numeric_detection": true
}
}
}
PUT my_index/my_type/1
{
"my_float": "1.0", <1>
"my_integer": "1" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `my_float` field is added as a <<number,`double`>> field.
<2> The `my_integer` field is added as a <<number,`long`>> field.


@ -0,0 +1,251 @@
[[dynamic-templates]]
=== Dynamic templates
Dynamic templates allow you to define custom mappings that can be applied to
dynamically added fields based on:
* the <<dynamic-mapping,datatype>> detected by Elasticsearch, with <<match-mapping-type,`match_mapping_type`>>.
* the name of the field, with <<match-unmatch,`match` and `unmatch`>> or <<match-pattern,`match_pattern`>>.
* the full dotted path to the field, with <<path-match-unmatch,`path_match` and `path_unmatch`>>.
The original field name `{name}` and the detected datatype
`{dynamic_type}` <<template-variables,template variables>> can be used in
the mapping specification as placeholders.
IMPORTANT: Dynamic field mappings are only added when a field contains a
concrete value -- not `null` or an empty array. This means that if the
`null_value` option is used in a `dynamic_template`, it will only be applied
after the first document with a concrete value for the field has been
indexed.
Dynamic templates are specified as an array of named objects:
[source,js]
--------------------------------------------------
"dynamic_templates": [
{
"my_template_name": { <1>
... match conditions ... <2>
"mapping": { ... } <3>
}
},
...
]
--------------------------------------------------
<1> The template name can be any string value.
<2> The match conditions can include any of: `match_mapping_type`, `match`, `match_pattern`, `unmatch`, `path_match`, `path_unmatch`.
<3> The mapping that the matched field should use.
Templates are processed in order -- the first matching template wins. New
templates can be appended to the end of the list with the
<<indices-put-mapping,PUT mapping>> API. If a new template has the same
name as an existing template, it will replace the old version.
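
For instance, a minimal sketch of appending a new template to an existing type
with the PUT mapping API (all names here are illustrative); reusing the name
of an existing template would replace that template instead:

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "dynamic_templates": [
    {
      "spanish_fields": {
        "match": "*_es",
        "match_mapping_type": "string",
        "mapping": {
          "type": "string",
          "analyzer": "spanish"
        }
      }
    }
  ]
}
--------------------------------------------------
// AUTOSENSE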
[[match-mapping-type]]
==== `match_mapping_type`
The `match_mapping_type` matches on the datatype detected by
<<dynamic-field-mapping,dynamic field mapping>>, in other words, the datatype
that Elasticsearch thinks the field should have. Only the following datatypes
can be automatically detected: `boolean`, `date`, `double`, `long`, `object`,
`string`. It also accepts `*` to match all datatypes.
For example, if we wanted to map all integer fields as `integer` instead of
`long`, and all `string` fields as both `analyzed` and `not_analyzed`, we
could use the following template:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"integers": {
"match_mapping_type": "long",
"mapping": {
"type": "integer"
}
}
},
{
"strings": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
}
}
]
}
}
}
PUT my_index/my_type/1
{
"my_integer": 5, <1>
"my_string": "Some string" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `my_integer` field is mapped as an `integer`.
<2> The `my_string` field is mapped as an analyzed `string`, with a `not_analyzed` <<multi-fields,multi field>>.
[[match-unmatch]]
==== `match` and `unmatch`
The `match` parameter uses a pattern to match on the fieldname, while
`unmatch` uses a pattern to exclude fields matched by `match`.
The following example matches all `string` fields whose name starts with
`long_` (except for those which end with `_text`) and maps them as `long`
fields:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"longs_as_strings": {
"match_mapping_type": "string",
"match": "long_*",
"unmatch": "*_text",
"mapping": {
"type": "long"
}
}
}
]
}
}
}
PUT my_index/my_type/1
{
"long_num": "5", <1>
"long_text": "foo" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `long_num` field is mapped as a `long`.
<2> The `long_text` field uses the default `string` mapping.
[[match-pattern]]
==== `match_pattern`
The `match_pattern` parameter adjusts the behaviour of the `match` parameter,
so that it supports full Java regular expression matching on the field name
instead of simple wildcards, for instance:

[source,js]
--------------------------------------------------
"match_pattern": "regex",
"match": "^profit_\d+$"
--------------------------------------------------
[[path-match-unmatch]]
==== `path_match` and `path_unmatch`
The `path_match` and `path_unmatch` parameters work in the same way as `match`
and `unmatch`, but operate on the full dotted path to the field, not just the
final name, e.g. `some_object.*.some_field`.
This example copies the values of any fields in the `name` object to the
top-level `full_name` field, except for the `middle` field:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"full_name": {
"path_match": "name.*",
"path_unmatch": "*.middle",
"mapping": {
"type": "string",
"copy_to": "full_name"
}
}
}
]
}
}
}
PUT my_index/my_type/1
{
"name": {
"first": "Alice",
"middle": "Mary",
"last": "White"
}
}
--------------------------------------------------
// AUTOSENSE
[[template-variables]]
==== `{name}` and `{dynamic_type}`
The `{name}` and `{dynamic_type}` placeholders are replaced in the `mapping`
with the field name and detected dynamic type. The following example sets all
string fields to use an <<analyzer,`analyzer`>> with the same name as the
field, and disables <<doc-values,`doc_values`>> for all non-string fields:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"named_analyzers": {
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "string",
"analyzer": "{name}"
}
}
},
{
"no_doc_values": {
"match_mapping_type":"*",
"mapping": {
"type": "{dynamic_type}",
"doc_values": false
}
}
}
]
}
}
}
PUT my_index/my_type/1
{
"english": "Some English text", <1>
"count": 5 <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `english` field is mapped as a `string` field with the `english` analyzer.
<2> The `count` field is mapped as a `long` field with `doc_values` disabled.


@ -1,257 +0,0 @@
[[fielddata-formats]]
== Fielddata formats
The field data format controls how field data should be stored.
Depending on the field type, there might be several field data types
available. In particular, string, geo-point and numeric types support the `doc_values`
format which allows for computing the field data data-structures at indexing
time and storing them on disk. Although it will make the index larger and may
be slightly slower, this implementation will be more near-realtime-friendly
and will require much less memory from the JVM than other implementations.
Here is an example of how to configure the `tag` field to use the `paged_bytes` field
data format.
[source,js]
--------------------------------------------------
{
"tag": {
"type": "string",
"fielddata": {
"format": "paged_bytes"
}
}
}
--------------------------------------------------
It is possible to change the field data format (and the field data settings
in general) on a live index by using the update mapping API.
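
For instance, a minimal sketch of applying the `tag` configuration from the
first example to a live index (the index and type names are illustrative):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tag": {
      "type": "string",
      "fielddata": {
        "format": "paged_bytes"
      }
    }
  }
}
--------------------------------------------------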
[float]
=== String field data types
`paged_bytes` (default on analyzed string fields)::
Stores unique terms sequentially in a large buffer and maps documents to
the indices of the terms they contain in this large buffer.
`doc_values` (default when index is set to `not_analyzed`)::
Computes and stores field data data-structures on disk at indexing time.
Lowers memory usage but only works on non-analyzed strings (`index`: `no` or
`not_analyzed`).
[float]
=== Numeric field data types
`array`::
Stores field values in memory using arrays.
`doc_values` (default unless doc values are disabled)::
Computes and stores field data data-structures on disk at indexing time.
[float]
=== Geo point field data types
`array`::
Stores latitudes and longitudes in arrays.
`doc_values` (default unless doc values are disabled)::
Computes and stores field data data-structures on disk at indexing time.
[float]
[[global-ordinals]]
=== Global ordinals
Global ordinals are a data-structure on top of field data that maintains an
incremental numbering for all the terms in field data in lexicographic order.
Each term has a unique number, and the number of term 'A' is lower than the
number of term 'B'. Global ordinals are only supported on string fields.
Field data on string fields also has ordinals, which are a unique numbering for all terms
in a particular segment and field. Global ordinals just build on top of this,
by providing a mapping between the segment ordinals and the global ordinals,
the latter being unique across the entire shard.
Global ordinals can be beneficial in search features that already use segment
ordinals, such as the terms aggregator, to improve execution time. Often these
search features need to merge the segment ordinal results to a cross-segment
terms result. With global ordinals, this mapping happens during field data load
time instead of during each query execution. Search features then only need to
resolve the actual term when building the (shard) response; during execution
there is no need to use the actual terms at all, and the unique numbering that
global ordinals provide is sufficient, which improves the execution time.
Global ordinals for a specified field are tied to all the segments of a shard (Lucene index),
unlike field data for a specific field, which is tied to a single segment.
For this reason global ordinals need to be rebuilt in their entirety once new segments
become visible. This one-time cost would happen anyway without global ordinals, but
then it would happen for each search execution instead!
The loading time of global ordinals depends on the number of terms in a field, but in general
it is low, since the source field data has already been loaded. The memory overhead of global
ordinals is small because it is very efficiently compressed. Eager loading of global ordinals
can move the loading time from the first search request to the refresh itself.
[float]
[[fielddata-loading]]
=== Fielddata loading
By default, field data is loaded lazily, ie. the first time that a query
which requires field data is executed. However, this can make the first requests that
follow a merge operation quite slow since fielddata loading is a heavy
operation.
It is possible to force field data to be loaded and cached eagerly through the
`loading` setting of fielddata:
[source,js]
--------------------------------------------------
{
"category": {
"type": "string",
"fielddata": {
"loading": "eager"
}
}
}
--------------------------------------------------
Global ordinals can also be eagerly loaded:
[source,js]
--------------------------------------------------
{
"category": {
"type": "string",
"fielddata": {
"loading": "eager_global_ordinals"
}
}
}
--------------------------------------------------
With the above setting both field data and global ordinals for a specific field
are eagerly loaded.
[float]
==== Disabling field data loading
Field data can take a lot of RAM so it makes sense to disable field data
loading on the fields that don't need field data, for example those that are
used for full-text search only. In order to disable field data loading, just
change the field data format to `disabled`. When disabled, all requests that
will try to load field data, e.g. when they include aggregations and/or sorting,
will return an error.
[source,js]
--------------------------------------------------
{
"text": {
"type": "string",
"fielddata": {
"format": "disabled"
}
}
}
--------------------------------------------------
The `disabled` format is supported by all field types.
[float]
[[field-data-filtering]]
=== Filtering fielddata
It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.
Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.
[float]
==== Filtering by frequency:
The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number (when the number is bigger than 1.0) or as a percentage
(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.
Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:
[source,js]
--------------------------------------------------
{
"tag": {
"type": "string",
"fielddata": {
"filter": {
"frequency": {
"min": 0.001,
"max": 0.1,
"min_segment_size": 500
}
}
}
}
}
--------------------------------------------------
[float]
==== Filtering by regex
Terms can also be filtered by regular expression - only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:
[source,js]
--------------------------------------------------
{
"tweet": {
"type": "string",
"analyzer": "whitespace"
"fielddata": {
"filter": {
"regex": {
"pattern": "^#.*"
}
}
}
}
}
--------------------------------------------------
[float]
==== Combining filters
The `frequency` and `regex` filters can be combined:
[source,js]
--------------------------------------------------
{
"tweet": {
"type": "string",
"analyzer": "whitespace"
"fielddata": {
"filter": {
"regex": {
"pattern": "^#.*",
},
"frequency": {
"min": 0.001,
"max": 0.1,
"min_segment_size": 500
}
}
}
}
}
--------------------------------------------------
@ -5,7 +5,8 @@ Each document has metadata associated with it, such as the `_index`, mapping
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
can be customised when a mapping type is created.

[float]
=== Identity meta-fields

[horizontal]
<<mapping-index-field,`_index`>>::

@ -18,16 +19,26 @@ The meta-fields are:
<<mapping-type-field,`_type`>>::

The document's <<mapping-type,mapping type>>.

<<mapping-id-field,`_id`>>::

The document's ID.

[float]
=== Document source meta-fields

<<mapping-source-field,`_source`>>::

The original JSON representing the body of the document.

<<mapping-size-field,`_size`>>::

The size of the `_source` field in bytes.

[float]
=== Indexing meta-fields

<<mapping-all-field,`_all`>>::

A _catch-all_ field that indexes the values of all other fields.

@ -36,18 +47,6 @@ The meta-fields are:
All fields in the document which contain non-null values.

<<mapping-timestamp-field,`_timestamp`>>::

A timestamp associated with the document, either specified manually or auto-generated.

@ -56,27 +55,49 @@ The meta-fields are:
How long a document should live before it is automatically deleted.

[float]
=== Routing meta-fields

<<mapping-parent-field,`_parent`>>::

Used to create a parent-child relationship between two mapping types.

<<mapping-routing-field,`_routing`>>::

A custom routing value which routes a document to a particular shard.

[float]
=== Other meta-field

<<mapping-meta-field,`_meta`>>::

Application specific metadata.

include::fields/all-field.asciidoc[]
include::fields/field-names-field.asciidoc[]
include::fields/id-field.asciidoc[]
include::fields/index-field.asciidoc[]
include::fields/meta-field.asciidoc[]
include::fields/parent-field.asciidoc[]
include::fields/routing-field.asciidoc[]
include::fields/size-field.asciidoc[]
include::fields/source-field.asciidoc[]
include::fields/timestamp-field.asciidoc[]
include::fields/ttl-field.asciidoc[]
include::fields/type-field.asciidoc[]
include::fields/uid-field.asciidoc[]
@ -151,82 +151,18 @@ PUT my_index
<1> The `_all` field is disabled for the `my_type` type.
<2> The `query_string` query will default to querying the `content` field in this index.

[[excluding-from-all]]
==== Excluding fields from `_all`

Individual fields can be included or excluded from the `_all` field with the
<<include-in-all,`include_in_all`>> setting.

[[all-field-and-boosting]]
==== Index boosting and the `_all` field

Individual fields can be _boosted_ at index time, with the <<index-boost,`boost`>>
parameter. The `_all` field takes these boosts into account:

[source,js]
--------------------------------
@ -2,8 +2,8 @@
=== `_id` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_id` field is not
indexed as its value can be derived automatically from the
<<mapping-uid-field,`_uid`>> field.

The value of the `_id` field is accessible in queries and scripts, but _not_
@ -0,0 +1,30 @@
[[mapping-meta-field]]
=== `_meta` field
Each mapping type can have custom meta data associated with it. These are not
used at all by Elasticsearch, but can be used to store application-specific
metadata, such as the class that a document belongs to:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"user": {
"_meta": { <1>
"class": "MyApp::User",
"version": {
"min": "1.0",
"max": "1.3"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> This `_meta` info can be retrieved with the
<<indices-get-mapping,GET mapping>> API.
The `_meta` field can be updated on an existing type using the
<<indices-put-mapping,PUT mapping>> API.
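For instance, the version range stored above could be bumped later. A minimal
sketch, with illustrative values:

[source,js]
--------------------------------------------------
PUT my_index/_mapping/user
{
  "_meta": { <1>
    "class": "MyApp2::User",
    "version": {
      "min": "1.3",
      "max": "1.3"
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `class` and `version` values here are hypothetical application metadata.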
@ -78,8 +78,7 @@ stored.
WARNING: Removing fields from the `_source` has similar downsides to disabling
`_source`, especially the fact that you cannot reindex documents from one
Elasticsearch index to another. Consider using
<<search-request-source-filtering,source filtering>> instead.

The `includes`/`excludes` parameters (which also accept wildcards) can be used
as follows:
@ -1,5 +1,5 @@
[[mapping-ttl-field]]
=== `_ttl` field

Some types of documents, such as session data or special offers, come with an
expiration date. The `_ttl` field allows you to specify the minimum time a
@ -2,8 +2,8 @@
=== `_type` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_type` field is
indexed in order to make searching by type name fast.

The value of the `_type` field is accessible in queries, aggregations,
scripts, and when sorting:
@ -2,8 +2,8 @@
=== `_uid` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. These values are
combined as `{type}#{id}` and indexed as the `_uid` field.

The value of the `_uid` field is accessible in queries, aggregations, scripts,
and when sorting:
@ -1,25 +0,0 @@
[[mapping-meta]]
== Meta
Each mapping can have custom meta data associated with it. These are
simple storage elements that are simply persisted along with the mapping
and can be retrieved when fetching the mapping definition. The meta is
defined under the `_meta` element, for example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"_meta" : {
"attr1" : "value1",
"attr2" : {
"attr3" : "value3"
}
}
}
}
--------------------------------------------------
Meta can be handy for example for client libraries that perform
serialization and deserialization to store its meta model (for example,
the class the document maps to).
@ -0,0 +1,100 @@
[[mapping-params]]
== Mapping parameters
The following pages provide detailed explanations of the various mapping
parameters that are used by <<mapping-types,field mappings>>.
The following mapping parameters are common to some or all field datatypes:
* <<analyzer,`analyzer`>>
* <<index-boost,`boost`>>
* <<coerce,`coerce`>>
* <<copy-to,`copy_to`>>
* <<doc-values,`doc_values`>>
* <<dynamic,`dynamic`>>
* <<enabled,`enabled`>>
* <<fielddata,`fielddata`>>
* <<geohash,`geohash`>>
* <<geohash-precision,`geohash_precision`>>
* <<geohash-prefix,`geohash_prefix`>>
* <<mapping-date-format,`format`>>
* <<ignore-above,`ignore_above`>>
* <<ignore-malformed,`ignore_malformed`>>
* <<include-in-all,`include_in_all`>>
* <<index-options,`index_options`>>
* <<lat-lon,`lat_lon`>>
* <<mapping-index,`index`>>
* <<multi-fields,`fields`>>
* <<norms,`norms`>>
* <<null-value,`null_value`>>
* <<position-offset-gap,`position_offset_gap`>>
* <<properties,`properties`>>
* <<search-analyzer,`search_analyzer`>>
* <<similarity,`similarity`>>
* <<mapping-store,`store`>>
* <<term-vector,`term_vector`>>
include::params/analyzer.asciidoc[]
include::params/boost.asciidoc[]
include::params/coerce.asciidoc[]
include::params/copy-to.asciidoc[]
include::params/doc-values.asciidoc[]
include::params/dynamic.asciidoc[]
include::params/enabled.asciidoc[]
include::params/fielddata.asciidoc[]
include::params/format.asciidoc[]
include::params/geohash.asciidoc[]
include::params/geohash-precision.asciidoc[]
include::params/geohash-prefix.asciidoc[]
include::params/ignore-above.asciidoc[]
include::params/ignore-malformed.asciidoc[]
include::params/include-in-all.asciidoc[]
include::params/index.asciidoc[]
include::params/index-options.asciidoc[]
include::params/lat-lon.asciidoc[]
include::params/multi-fields.asciidoc[]
include::params/norms.asciidoc[]
include::params/null-value.asciidoc[]
include::params/position-offset-gap.asciidoc[]
include::params/precision-step.asciidoc[]
include::params/properties.asciidoc[]
include::params/search-analyzer.asciidoc[]
include::params/similarity.asciidoc[]
include::params/store.asciidoc[]
include::params/term-vector.asciidoc[]
@ -0,0 +1,80 @@
[[analyzer]]
=== `analyzer`
The values of <<mapping-index,`analyzed`>> string fields are passed through an
<<analysis,analyzer>> to convert the string into a stream of _tokens_ or
_terms_. For instance, the string `"The quick Brown Foxes."` may, depending
on which analyzer is used, be analyzed to the tokens: `quick`, `brown`,
`fox`. These are the actual terms that are indexed for the field, which makes
it possible to search efficiently for individual words _within_ big blobs of
text.
This analysis process needs to happen not just at index time, but also at
query time: the query string needs to be passed through the same (or a
similar) analyzer so that the terms that it tries to find are in the same
format as those that exist in the index.
Elasticsearch ships with a number of <<analysis-analyzers,pre-defined analyzers>>,
which can be used without further configuration. It also ships with many
<<analysis-charfilters,character filters>>, <<analysis-tokenizers,tokenizers>>,
and <<analysis-tokenfilters,token filters>> which can be combined to configure
custom analyzers per index.
Analyzers can be specified per-query, per-field or per-index. At index time,
Elasticsearch will look for an analyzer in this order:
* The `analyzer` defined in the field mapping.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.
At query time, there are a few more layers:
* The `analyzer` defined in a <<full-text-queries,full-text query>>.
* The `search_analyzer` defined in the field mapping.
* The `analyzer` defined in the field mapping.
* An analyzer named `default_search` in the index settings.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.
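The `default` analyzer mentioned in the lookup order above is registered in
the index settings. A minimal sketch, aliasing the built-in `whitespace`
analyzer:

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { <1>
          "type": "whitespace"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Any field without an explicit `analyzer` in its mapping will use this analyzer at index time.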
The easiest way to specify an analyzer for a particular field is to define it
in the field mapping, as follows:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": { <1>
"type": "string",
"fields": {
"english": { <2>
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
}
GET my_index/_analyze?field=text <3>
{
"text": "The quick Brown Foxes."
}
GET my_index/_analyze?field=text.english <4>
{
"text": "The quick Brown Foxes."
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field uses the default `standard` analyzer.
<2> The `text.english` <<multi-fields,multi-field>> uses the `english` analyzer, which removes stop words and applies stemming.
<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ].
<4> This returns the tokens: [ `quick`, `brown`, `fox` ].
@ -0,0 +1,59 @@
[[index-boost]]
=== `boost`
Individual fields can be _boosted_ -- count more towards the relevance score
-- at index time, with the `boost` parameter as follows:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": {
"type": "string",
"boost": 2 <1>
},
"content": {
"type": "string"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Matches on the `title` field will have twice the weight as those on the
`content` field, which has the default `boost` of `1.0`.
Note that a `title` field will usually be shorter than a `content` field. The
default relevance calculation takes field length into account, so a short
`title` field will have a higher natural boost than a long `content` field.
[WARNING]
.Why index time boosting is a bad idea
==================================================
We advise against using index time boosting for the following reasons:
* You cannot change index-time `boost` values without reindexing all of your
documents.
* Every query supports query-time boosting which achieves the same effect. The
difference is that you can tweak the `boost` value without having to reindex.
* Index-time boosts are stored as part of the <<norms,`norm`>>, which is only one
byte. This reduces the resolution of the field length normalization factor
which can lead to lower quality relevance calculations.
==================================================
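For comparison, the same effect can be achieved at query time. A sketch, using
a hypothetical search on the `title` field:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2 <1>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The query-time `boost` can be changed freely, without reindexing.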
The only advantage that index time boosting has is that it is copied with the
value into the <<mapping-all-field,`_all`>> field. This means that, when
querying the `_all` field, words that originated from the `title` field will
have a higher score than words that originated in the `content` field.
This functionality comes at a cost: queries on the `_all` field are slower
when index-time boosting is used.
@ -0,0 +1,89 @@
[[coerce]]
=== `coerce`
Data is not always clean. Depending on how it is produced a number might be
rendered in the JSON body as a true JSON number, e.g. `5`, but it might also
be rendered as a string, e.g. `"5"`. Alternatively, a number that should be
an integer might instead be rendered as a floating point, e.g. `5.0`, or even
`"5.0"`.
Coercion attempts to clean up dirty values to fit the datatype of a field.
For instance:
* Strings will be coerced to numbers.
* Floating points will be truncated for integer values.
* Lon/lat geo-points will be normalized to a standard -180:180 / -90:90 coordinate system.
For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"number_one": {
"type": "integer"
},
"number_two": {
"type": "integer",
"coerce": false
}
}
}
}
}
PUT my_index/my_type/1
{
"number_one": "10" <1>
}
PUT my_index/my_type/2
{
"number_two": "10" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `number_one` field will contain the integer `10`.
<2> This document will be rejected because coercion is disabled.
[[coerce-setting]]
==== Index-level default
The `index.mapping.coerce` setting can be set on the index level to disable
coercion globally across all mapping types:
[source,js]
--------------------------------------------------
PUT my_index
{
"settings": {
"index.mapping.coerce": false
},
"mappings": {
"my_type": {
"properties": {
"number_one": {
"type": "integer"
},
"number_two": {
"type": "integer",
"coerce": true
}
}
}
}
}
PUT my_index/my_type/1
{ "number_one": "10" } <1>
PUT my_index/my_type/2
{ "number_two": "10" } <2>
--------------------------------------------------
// AUTOSENSE
<1> This document will be rejected because the `number_one` field inherits the index-level coercion setting.
<2> The `number_two` field overrides the index level setting to enable coercion.
@ -0,0 +1,64 @@
[[copy-to]]
=== `copy_to`
The `copy_to` parameter allows you to create custom
<<mapping-all-field,`_all`>> fields. In other words, the values of multiple
fields can be copied into a group field, which can then be queried as a single
field. For instance, the `first_name` and `last_name` fields can be copied to
the `full_name` field as follows:
[source,js]
--------------------------------------------------
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "string",
"copy_to": "full_name" <1>
},
"last_name": {
"type": "string",
"copy_to": "full_name" <1>
},
"full_name": {
"type": "string"
}
}
}
}
}
PUT /my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET /my_index/_search
{
"query": {
"match": {
"full_name": { <2>
"query": "John Smith",
"operator": "and"
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The values of the `first_name` and `last_name` fields are copied to the
`full_name` field.
<2> The `first_name` and `last_name` fields can still be queried for the
first name and last name respectively, but the `full_name` field can be
queried for both first and last names.
Some important points:
* It is the field _value_ which is copied, not the terms (which result from the analysis process).
* The original <<mapping-source-field,`_source`>> field will not be modified to show the copied values.
* The same value can be copied to multiple fields, with `"copy_to": [ "field_1", "field_2" ]`
@ -0,0 +1,46 @@
[[doc-values]]
=== `doc_values`
Most fields are <<mapping-index,indexed>> by default, which makes them
searchable. The inverted index allows queries to look up the search term in a
unique sorted list of terms, and from that immediately have access to the list
of documents that contain the term.

Sorting, aggregations, and access to field values in scripts require a
different data access pattern. Instead of looking up the term and finding
documents, we need to be able to look up the document and find the terms that
it has in a field.
Doc values are the on-disk data structure, built at document index time, which
makes this data access pattern possible. Doc values are supported on almost
all field types, with the __notable exception of `analyzed` string fields__.
All fields which support doc values have them enabled by default. If you are
sure that you don't need to sort or aggregate on a field, or access the field
value from a script, you can disable doc values in order to save disk space:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"status_code": { <1>
"type": "string",
"index": "not_analyzed"
},
"session_id": { <2>
"type": "string",
"index": "not_analyzed",
"doc_values": false
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `status_code` field has `doc_values` enabled by default.
<2> The `session_id` has `doc_values` disabled, but can still be queried.
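For illustration, a terms aggregation on the `status_code` field above would
be served from the on-disk doc values. A minimal sketch:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "status_codes": {
      "terms": {
        "field": "status_code" <1>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This aggregation uses the doc values that are enabled by default on `status_code`.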
@ -0,0 +1,87 @@
[[dynamic]]
=== `dynamic`
By default, fields can be added _dynamically_ to a document, or to
<<object,inner objects>> within a document, just by indexing a document
containing the new field. For instance:
[source,js]
--------------------------------------------------
DELETE my_index <1>
PUT my_index/my_type/1 <2>
{
"username": "johnsmith",
"name": {
"first": "John",
"last": "Smith"
}
}
GET my_index/_mapping <3>
PUT my_index/my_type/2 <4>
{
"username": "marywhite",
"email": "mary@white.com",
"name": {
"first": "Mary",
"middle": "Alice",
"last": "White"
}
}
GET my_index/_mapping <5>
--------------------------------------------------
// AUTOSENSE
<1> First delete the index, in case it already exists.
<2> This document introduces the string field `username`, the object field
`name`, and two string fields under the `name` object which can be
referred to as `name.first` and `name.last`.
<3> Check the mapping to verify the above.
<4> This document adds two string fields: `email` and `name.middle`.
<5> Check the mapping to verify the changes.
The details of how new fields are detected and added to the mapping are explained in <<dynamic-mapping>>.
The `dynamic` setting controls whether new fields can be added dynamically or
not. It accepts three settings:
[horizontal]
`true`:: Newly detected fields are added to the mapping. (default)
`false`:: Newly detected fields are ignored. New fields must be added explicitly.
`strict`:: If new fields are detected, an exception is thrown and the document is rejected.
The `dynamic` setting may be set at the mapping type level, and on each
<<object,inner object>>. Inner objects inherit the setting from their parent
object or from the mapping type. For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"dynamic": false, <1>
"properties": {
"user": { <2>
"properties": {
"name": {
"type": "string"
},
"social_networks": { <3>
"dynamic": true,
"properties": {}
}
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Dynamic mapping is disabled at the type level, so no new top-level fields will be added dynamically.
<2> The `user` object inherits the type-level setting.
<3> The `user.social_networks` object enables dynamic mapping, so new fields may be added to this inner object.
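For contrast, a sketch of the `strict` setting, using a hypothetical index
name. The second request would be rejected with a mapping exception:

[source,js]
--------------------------------------------------
PUT my_strict_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict", <1>
      "properties": {
        "title": { "type": "string" }
      }
    }
  }
}

PUT my_strict_index/my_type/1
{
  "title": "Hello",
  "tags": [ "foo", "bar" ] <2>
}
--------------------------------------------------
// AUTOSENSE
<1> New fields must be added to the mapping explicitly.
<2> The unmapped `tags` field causes this document to be rejected.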
@ -0,0 +1,94 @@
[[enabled]]
=== `enabled`
Elasticsearch tries to index all of the fields you give it, but sometimes you
want to just store the field without indexing it. For instance, imagine that
you are using Elasticsearch as a web session store. You may want to index the
session ID and last update time, but you don't need to query or run
aggregations on the session data itself.
The `enabled` setting, which can be applied only to the mapping type and to
<<object,`object`>> fields, causes Elasticsearch to skip parsing of the
contents of the field entirely. The JSON can still be retrieved from the
<<mapping-source-field,`_source`>> field, but it is not searchable or stored
in any other way:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"session": {
"properties": {
"user_id": {
"type": "string",
"index": "not_analyzed"
},
"last_updated": {
"type": "date"
},
"session_data": { <1>
"enabled": false
}
}
}
}
}
PUT my_index/session/session_1
{
"user_id": "kimchy",
"session_data": { <2>
"arbitrary_object": {
"some_array": [ "foo", "bar", { "baz": 2 } ]
}
},
"last_updated": "2015-12-06T18:20:22"
}
PUT my_index/session/session_2
{
"user_id": "jpountz",
"session_data": "none", <3>
"last_updated": "2015-12-06T18:22:13"
}
--------------------------------------------------
// AUTOSENSE
<1> The `session_data` field is disabled.
<2> Any arbitrary data can be passed to the `session_data` field as it will be entirely ignored.
<3> The `session_data` field will also ignore values that are not JSON objects.
The entire mapping type may be disabled as well, in which case the document is
stored in the <<mapping-source-field,`_source`>> field, which means it can be
retrieved, but none of its contents are indexed in any way:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"session": { <1>
"enabled": false
}
}
}
PUT my_index/session/session_1
{
"user_id": "kimchy",
"session_data": {
"arbitrary_object": {
"some_array": [ "foo", "bar", { "baz": 2 } ]
}
},
"last_updated": "2015-12-06T18:20:22"
}
GET my_index/session/session_1 <2>
GET my_index/_mapping <3>
--------------------------------------------------
// AUTOSENSE
<1> The entire `session` mapping type is disabled.
<2> The document can be retrieved.
<3> Checking the mapping reveals that no fields have been added.
@ -0,0 +1,225 @@
[[fielddata]]
=== `fielddata`
Most fields are <<mapping-index,indexed>> by default, which makes them
searchable. The inverted index allows queries to look up the search term in a
unique sorted list of terms, and from that immediately have access to the list
of documents that contain the term.

Sorting, aggregations, and access to field values in scripts require a
different data access pattern. Instead of looking up the term and finding
documents, we need to be able to look up the document and find the terms that
it has in a field.
Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
this type of data access pattern, but `analyzed` string fields do not support
`doc_values`.
Instead, `analyzed` strings use a query-time data structure called
`fielddata`. This data structure is built on demand the first time that a
field is used for aggregations, sorting, or is accessed in a script. It is built
by reading the entire inverted index for each segment from disk, inverting the
term ↔︎ document relationship, and storing the result in memory, in the
JVM heap.
Loading fielddata is an expensive process so, once it has been loaded, it
remains in memory for the lifetime of the segment.
[WARNING]
.Fielddata can fill up your heap space
==============================================================================
Fielddata can consume a lot of heap space, especially when loading high
cardinality `analyzed` string fields. Most of the time, it doesn't make sense
to sort or aggregate on `analyzed` string fields (with the notable exception
of the
<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
aggregation). Always think about whether a `not_analyzed` field (which can
use `doc_values`) would be a better fit for your use case.
==============================================================================
[[fielddata-format]]
==== `fielddata.format`
For `analyzed` string fields, the fielddata `format` controls whether
fielddata should be enabled or not. It accepts: `disabled` and `paged_bytes`
(enabled, which is the default). To disable fielddata loading, you can use
the following mapping:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "string",
"fielddata": {
"format": "disabled" <1>
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field cannot be used for sorting, aggregations, or in scripts.
.Fielddata and other datatypes
[NOTE]
==================================================
Historically, other field datatypes also used fielddata, but this has been replaced
by index-time, disk-based <<doc-values,`doc_values`>>.
==================================================
[[fielddata-loading]]
==== `fielddata.loading`
This per-field setting controls when fielddata is loaded into memory. It
accepts three options:
[horizontal]
`lazy`::
Fielddata is only loaded into memory when it is needed. (default)
`eager`::
Fielddata is loaded into memory before a new search segment becomes
visible to search. This can reduce the latency that a user may experience
if their search request has to trigger lazy loading from a big segment.
`eager_global_ordinals`::
Loading fielddata into memory is only part of the work that is required.
After loading the fielddata for each segment, Elasticsearch builds the
<<global-ordinals>> data structure to make a list of all unique terms
across all the segments in a shard. By default, global ordinals are built
lazily. If the field has a very high cardinality, global ordinals may
take some time to build, in which case you can use eager loading instead.
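For instance, the `eager_global_ordinals` option described above is set in the
field's `fielddata` block:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "category": {
          "type": "string",
          "fielddata": {
            "loading": "eager_global_ordinals" <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Fielddata and global ordinals for `category` are built before a new segment becomes visible to search.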
[[global-ordinals]]
.Global ordinals
*****************************************
Global ordinals is a data structure on top of fielddata and doc values that
maintains an incremental numbering for each unique term in lexicographic
order. Each term has a unique number, and the number of term 'A' is lower
than the number of term 'B'. Global ordinals are only supported on string
fields.
Fielddata and doc values also have ordinals, which is a unique numbering for all terms
in a particular segment and field. Global ordinals just build on top of this,
by providing a mapping between the segment ordinals and the global ordinals,
the latter being unique across the entire shard.
Global ordinals are used for features that use segment ordinals, such as
sorting and the terms aggregation, to improve the execution time. A terms
aggregation relies purely on global ordinals to perform the aggregation at the
shard level, then converts global ordinals to the real term only for the final
reduce phase, which combines results from different shards.
Global ordinals for a specified field are tied to _all the segments of a
shard_, while fielddata and doc values ordinals are tied to a single segment.
For this reason global ordinals need to be rebuilt entirely whenever a new
segment becomes visible.

The loading time of global ordinals depends on the number of terms in a field,
but in general it is low, since the source field data has already been loaded.
The memory overhead of global ordinals is small because they are very
efficiently compressed. Eager loading of global ordinals can move the loading
time from the first search request to the refresh itself.
*****************************************
[[field-data-filtering]]
==== `fielddata.filter`
Fielddata filtering can be used to reduce the number of terms loaded into
memory, and thus reduce memory usage. Terms can be filtered by _frequency_ or
by _regular expression_, or a combination of the two:
Filtering by frequency::
+
--
The frequency filter allows you to only load terms whose term frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number (when the number is bigger than 1.0) or as a percentage
(e.g. `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.
Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"tag": {
"type": "string",
"fielddata": {
"filter": {
"frequency": {
"min": 0.001,
"max": 0.1,
"min_segment_size": 500
}
}
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
--
Filtering by regex::
+
--
Terms can also be filtered by regular expression - only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"tweet": {
"type": "string",
"analyzer": "whitespace",
"fielddata": {
"filter": {
"regex": {
"pattern": "^#.*"
}
}
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
--
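Combining filters::
+
--
The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tweet": {
          "type": "string",
          "analyzer": "whitespace",
          "fielddata": {
            "filter": {
              "regex": {
                "pattern": "^#.*"
              },
              "frequency": {
                "min": 0.001,
                "max": 0.1,
                "min_segment_size": 500
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
--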
These filters can be updated on an existing field mapping and will take
effect the next time the fielddata for a segment is loaded. Use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.
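A minimal sketch, assuming an index named `my_index`:

[source,js]
--------------------------------------------------
POST my_index/_cache/clear?fielddata=true
--------------------------------------------------
// AUTOSENSE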
@ -0,0 +1,281 @@
[[mapping-date-format]]
=== `format`
In JSON documents, dates are represented as strings. Elasticsearch uses a set
of preconfigured formats to recognize and parse these strings into a long
value representing _milliseconds-since-the-epoch_ in UTC.
Besides the <<built-in-date-formats,built-in formats>>, your own
<<custom-date-formats,custom formats>> can be specified using the familiar
`yyyy/MM/dd` syntax:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"date": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
Many APIs which support date values also support <<date-math,date math>>
expressions, such as `now-1m/d` -- the current time, minus one month, rounded
down to the nearest day.
[[custom-date-formats]]
==== Custom date formats
Completely customizable date formats are supported. The syntax for these is explained
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[in the Joda docs].
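Multiple formats can be combined with `||` as a separator; each format is
tried in turn until one matches. A minimal sketch:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE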
[[built-in-date-formats]]
==== Built In Formats
Most of the below formats have a `strict` companion format, which means that
the year, month and day parts of the date must have leading zeros in order
to be valid. This means that a date like `5/11/1` would not be valid; you
would need to specify the full date, which would be `2005/11/01` in this
example. So instead of `date_optional_time` you would need to specify
`strict_date_optional_time`.

The following table lists all the default ISO formats supported:
`epoch_millis`::
A formatter for the number of milliseconds since the epoch. Note that
this timestamp allows a max length of 13 chars, so dates before the year
1653 and after the year 2286 are not supported. You should use a different
date formatter in that case.
`epoch_second`::
A formatter for the number of seconds since the epoch. Note that this
timestamp allows a max length of 10 chars, so dates before the year 1653
and after the year 2286 are not supported. You should use a different date
formatter in that case.
[[strict-date-time]]`date_optional_time` or `strict_date_optional_time`::
A generic ISO datetime parser where the date is mandatory and the time is
optional.
http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[Full details here].
`basic_date`::
A basic formatter for a full date as four digit year, two digit month of
year, and two digit day of month: `yyyyMMdd`.
`basic_date_time`::
A basic formatter that combines a basic date and time, separated by a 'T':
`yyyyMMdd'T'HHmmss.SSSZ`.
`basic_date_time_no_millis`::
A basic formatter that combines a basic date and time without millis,
separated by a 'T': `yyyyMMdd'T'HHmmssZ`.
`basic_ordinal_date`::
A formatter for a full ordinal date, using a four digit year and three
digit dayOfYear: `yyyyDDD`.
`basic_ordinal_date_time`::
A formatter for a full ordinal date and time, using a four digit year and
three digit dayOfYear: `yyyyDDD'T'HHmmss.SSSZ`.
`basic_ordinal_date_time_no_millis`::
A formatter for a full ordinal date and time without millis, using a four
digit year and three digit dayOfYear: `yyyyDDD'T'HHmmssZ`.
`basic_time`::
A basic formatter for a two digit hour of day, two digit minute of hour,
two digit second of minute, three digit millis, and time zone offset:
`HHmmss.SSSZ`.
`basic_time_no_millis`::
A basic formatter for a two digit hour of day, two digit minute of hour,
two digit second of minute, and time zone offset: `HHmmssZ`.
`basic_t_time`::
A basic formatter for a two digit hour of day, two digit minute of hour,
two digit second of minute, three digit millis, and time zone off set
prefixed by 'T': `'T'HHmmss.SSSZ`.
`basic_t_time_no_millis`::
A basic formatter for a two digit hour of day, two digit minute of hour,
two digit second of minute, and time zone offset prefixed by 'T':
`'T'HHmmssZ`.
`basic_week_date` or `strict_basic_week_date`::
A basic formatter for a full date as four digit weekyear, two digit week
of weekyear, and one digit day of week: `xxxx'W'wwe`.
`basic_week_date_time` or `strict_basic_week_date_time`::
A basic formatter that combines a basic weekyear date and time, separated
by a 'T': `xxxx'W'wwe'T'HHmmss.SSSZ`.
`basic_week_date_time_no_millis` or `strict_basic_week_date_time_no_millis`::
A basic formatter that combines a basic weekyear date and time without
millis, separated by a 'T': `xxxx'W'wwe'T'HHmmssZ`.
`date` or `strict_date`::
A formatter for a full date as four digit year, two digit month of year,
and two digit day of month: `yyyy-MM-dd`.
`date_hour` or `strict_date_hour`::
A formatter that combines a full date and two digit hour of day.
`date_hour_minute` or `strict_date_hour_minute`::
A formatter that combines a full date, two digit hour of day, and two
digit minute of hour.
`date_hour_minute_second` or `strict_date_hour_minute_second`::
A formatter that combines a full date, two digit hour of day, two digit
minute of hour, and two digit second of minute.
`date_hour_minute_second_fraction` or `strict_date_hour_minute_second_fraction`::
A formatter that combines a full date, two digit hour of day, two digit
minute of hour, two digit second of minute, and three digit fraction of
second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.
`date_hour_minute_second_millis` or `strict_date_hour_minute_second_millis`::
A formatter that combines a full date, two digit hour of day, two digit
minute of hour, two digit second of minute, and three digit fraction of
second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.
`date_time` or `strict_date_time`::
A formatter that combines a full date and time, separated by a 'T':
`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`.
`date_time_no_millis` or `strict_date_time_no_millis`::
A formatter that combines a full date and time without millis, separated
by a 'T': `yyyy-MM-dd'T'HH:mm:ssZZ`.
`hour` or `strict_hour`::
A formatter for a two digit hour of day.
`hour_minute` or `strict_hour_minute`::
A formatter for a two digit hour of day and two digit minute of hour.
`hour_minute_second` or `strict_hour_minute_second`::
A formatter for a two digit hour of day, two digit minute of hour, and two
digit second of minute.
`hour_minute_second_fraction` or `strict_hour_minute_second_fraction`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.
`hour_minute_second_millis` or `strict_hour_minute_second_millis`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.
`ordinal_date` or `strict_ordinal_date`::
A formatter for a full ordinal date, using a four digit year and three
digit dayOfYear: `yyyy-DDD`.
`ordinal_date_time` or `strict_ordinal_date_time`::
A formatter for a full ordinal date and time, using a four digit year and
three digit dayOfYear: `yyyy-DDD'T'HH:mm:ss.SSSZZ`.
`ordinal_date_time_no_millis` or `strict_ordinal_date_time_no_millis`::
A formatter for a full ordinal date and time without millis, using a four
digit year and three digit dayOfYear: `yyyy-DDD'T'HH:mm:ssZZ`.
`time` or `strict_time`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, three digit fraction of second, and time zone
offset: `HH:mm:ss.SSSZZ`.
`time_no_millis` or `strict_time_no_millis`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, and time zone offset: `HH:mm:ssZZ`.
`t_time` or `strict_t_time`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, three digit fraction of second, and time zone
offset prefixed by 'T': `'T'HH:mm:ss.SSSZZ`.
`t_time_no_millis` or `strict_t_time_no_millis`::
A formatter for a two digit hour of day, two digit minute of hour, two
digit second of minute, and time zone offset prefixed by 'T': `'T'HH:mm:ssZZ`.
`week_date` or `strict_week_date`::
A formatter for a full date as four digit weekyear, two digit week of
weekyear, and one digit day of week: `xxxx-'W'ww-e`.
`week_date_time` or `strict_week_date_time`::
A formatter that combines a full weekyear date and time, separated by a
'T': `xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ`.
`week_date_time_no_millis` or `strict_week_date_time_no_millis`::
A formatter that combines a full weekyear date and time without millis,
separated by a 'T': `xxxx-'W'ww-e'T'HH:mm:ssZZ`.
`weekyear` or `strict_weekyear`::
A formatter for a four digit weekyear.
`weekyear_week` or `strict_weekyear_week`::
A formatter for a four digit weekyear and two digit week of weekyear.
`weekyear_week_day` or `strict_weekyear_week_day`::
A formatter for a four digit weekyear, two digit week of weekyear, and one
digit day of week.
`year` or `strict_year`::
A formatter for a four digit year.
`year_month` or `strict_year_month`::
A formatter for a four digit year and two digit month of year.
`year_month_day` or `strict_year_month_day`::
A formatter for a four digit year, two digit month of year, and two digit
day of month.
@ -0,0 +1,60 @@
[[geohash-precision]]
=== `geohash_precision`
Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.
The `geohash_precision` setting controls the length of the geohash that is
indexed when the <<geohash,`geohash`>> option is enabled, and the maximum
geohash length when the <<geohash-prefix,`geohash_prefix`>> option is enabled.
It accepts:
* a number between 1 and 12 (default), which represents the length of the geohash.
* a <<distance-units,distance>>, e.g. `1km`.
If a distance is specified, it will be translated to the smallest
geohash-length that will provide the requested resolution.
For example:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point",
"geohash_prefix": true,
"geohash_precision": 6 <1>
}
}
}
}
}
PUT my_index/my_type/1
{
"location": {
"lat": 41.12,
"lon": -71.34
}
}
GET my_index/_search?fielddata_fields=location.geohash
{
"query": {
"term": {
"location.geohash": "drm3bt"
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> A `geohash_precision` of 6 equates to geohash cells of approximately 1.26km x 0.6km
@ -0,0 +1,64 @@
[[geohash-prefix]]
=== `geohash_prefix`
Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.
While the <<geohash,`geohash`>> option enables indexing the geohash that
corresponds to the lat/lon point, at the specified
<<geohash-precision,precision>>, the `geohash_prefix` option also indexes
all the enclosing cells.
For instance, a geohash of `drm3btev3e86` will index all of the following
terms: [ `d`, `dr`, `drm`, `drm3`, `drm3b`, `drm3bt`, `drm3bte`, `drm3btev`,
`drm3btev3`, `drm3btev3e`, `drm3btev3e8`, `drm3btev3e86` ].
The geohash prefixes can be used with the
<<query-dsl-geohash-cell-query,`geohash_cell` query>> to find points within a
particular geohash, or its neighbours:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point",
"geohash_prefix": true,
"geohash_precision": 6
}
}
}
}
}
PUT my_index/my_type/1
{
"location": {
"lat": 41.12,
"lon": -71.34
}
}
GET my_index/_search?fielddata_fields=location.geohash
{
"query": {
"geohash_cell": {
"location": {
"lat": 41.02,
"lon": -71.48
},
"precision": 4, <1>
"neighbors": true <1>
}
}
}
--------------------------------------------------
// AUTOSENSE
@ -0,0 +1,70 @@
[[geohash]]
=== `geohash`
Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.
Because geohashes are just strings, they can be stored in an inverted
index like any other string, which makes querying them very efficient.
If you enable the `geohash` option, a `geohash` ``sub-field'' will be indexed,
e.g. `location.geohash`. The length of the geohash is controlled by the
<<geohash-precision,`geohash_precision`>> parameter.
If the <<geohash-prefix,`geohash_prefix`>> option is enabled, the `geohash`
option will be enabled automatically.
For example:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point", <1>
"geohash": true
}
}
}
}
}
PUT my_index/my_type/1
{
"location": {
"lat": 41.12,
"lon": -71.34
}
}
GET my_index/_search?fielddata_fields=location.geohash <2>
{
"query": {
"prefix": {
"location.geohash": "drm3b" <3>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> A `location.geohash` field will be indexed for each geo-point.
<2> The geohash can be retrieved with <<doc-values,`doc_values`>>.
<3> A <<query-dsl-prefix-query,`prefix`>> query can find all geohashes which start with a particular prefix.
[WARNING]
============================================
A `prefix` query on geohashes is expensive. Instead, consider using the
<<geohash-prefix,`geohash_prefix`>> to pay the expense once at index time
instead of on every query.
============================================
@ -0,0 +1,61 @@
[[ignore-above]]
=== `ignore_above`
Strings longer than the `ignore_above` setting will not be processed by the
<<analyzer,analyzer>> and will not be indexed. This is mainly useful for
<<mapping-index,`not_analyzed`>> string fields, which are typically used for
filtering, aggregations, and sorting. These are structured fields and it
doesn't usually make sense to allow very long terms to be indexed in these
fields.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"message": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 20 <1>
}
}
}
}
}
PUT my_index/my_type/1 <2>
{
"message": "Syntax error"
}
PUT my_index/my_type/2 <3>
{
"message": "Syntax error with some long stacktrace"
}
GET _search <4>
{
"aggs": {
"messages": {
"terms": {
"field": "message"
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.
This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.
NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
3 bytes.
@ -0,0 +1,83 @@
[[ignore-malformed]]
=== `ignore_malformed`
Sometimes you don't have much control over the data that you receive. One
user may send a `login` field that is a <<date,`date`>>, and another sends a
`login` field that is an email address.
Trying to index the wrong datatype into a field throws an exception by
default, and rejects the whole document. The `ignore_malformed` parameter, if
set to `true`, allows the exception to be ignored. The malformed field is not
indexed, but other fields in the document are processed normally.
For example:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"number_one": {
"type": "integer"
},
"number_two": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Some text value",
"number_one": "foo" <1>
}
PUT my_index/my_type/2
{
"text": "Some text value",
"number_two": "foo" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> This document will be rejected because `number_one` does not allow malformed values.
<2> This document will have the `text` field indexed, but not the `number_two` field.
[[ignore-malformed-setting]]
==== Index-level default
The `index.mapping.ignore_malformed` setting can be set on the index level to
ignore malformed content globally across all mapping types.
[source,js]
--------------------------------------------------
PUT my_index
{
"settings": {
"index.mapping.ignore_malformed": true <1>
},
"mappings": {
"my_type": {
"properties": {
"number_one": { <1>
"type": "byte"
},
"number_two": {
"type": "integer",
"ignore_malformed": false <2>
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `number_one` field inherits the index-level setting.
<2> The `number_two` field overrides the index-level setting to turn off `ignore_malformed`.
@ -0,0 +1,83 @@
[[include-in-all]]
=== `include_in_all`
The `include_in_all` parameter provides per-field control over which fields
are included in the <<mapping-all-field,`_all`>> field. It defaults to `true`, unless <<mapping-index,`index`>> is set to `no`.
This example demonstrates how to exclude the `date` field from the `_all` field:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": { <1>
"type": "string"
},
"content": { <1>
"type": "string"
},
"date": { <2>
"type": "date",
"include_in_all": false
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> The `title` and `content` fields will be included in the `_all` field.
<2> The `date` field will not be included in the `_all` field.
The `include_in_all` parameter can also be set at the type level and on
<<object,`object`>> or <<nested,`nested`>> fields, in which case all
sub-fields inherit that setting. For instance:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"include_in_all": false, <1>
"properties": {
"title": { "type": "string" },
"author": {
"include_in_all": true, <2>
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" }
}
},
"editor": {
"properties": {
"first_name": { "type": "string" }, <3>
"last_name": { "type": "string", "include_in_all": true } <3>
}
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> All fields in `my_type` are excluded from `_all`.
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
<3> Only the `editor.last_name` field is included in `_all`.
The `editor.first_name` inherits the type-level setting and is excluded.
[NOTE]
.Multi-fields and `include_in_all`
=================================
The original field value is added to the `_all` field, not the terms produced
by a field's analyzer. For this reason, it makes no sense to set
`include_in_all` to `true` on <<multi-fields,multi-fields>>, as each
multi-field has exactly the same value as its parent.
=================================
@ -0,0 +1,70 @@
[[index-options]]
=== `index_options`
The `index_options` parameter controls what information is added to the
inverted index, for search and highlighting purposes. It accepts the
following settings:
[horizontal]
`docs`::
Only the doc number is indexed. Can answer the question _Does this term
exist in this field?_
`freqs`::
Doc number and term frequencies are indexed. Term frequencies are used to
score repeated terms higher than single terms.
`positions`::
Doc number, term frequencies, and term positions (or order) are indexed.
Positions can be used for
<<query-dsl-match-query-phrase,proximity or phrase queries>>.
`offsets`::
Doc number, term frequencies, positions, and start and end character
offsets (which map the term back to the original string) are indexed.
Offsets are used by the <<postings-highlighter,postings highlighter>>.
<<mapping-index,Analyzed>> string fields use `positions` as the default, and
all other fields use `docs` as the default.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "string",
"index_options": "offsets"
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Quick brown fox"
}
GET my_index/_search
{
"query": {
"match": {
"text": "brown fox"
}
},
"highlight": {
"fields": {
"text": {} <1>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field will use the postings highlighter by default because `offsets` are indexed.
@ -0,0 +1,48 @@
[[mapping-index]]
=== `index`
The `index` option controls how field values are indexed and, thus, how they
are searchable. It accepts three values:
[horizontal]
`no`::
Do not add this field value to the index. With this setting, the field
will not be queryable.
`not_analyzed`::
Add the field value to the index unchanged, as a single term. This is the
default for all fields that support this option except for
<<string,`string`>> fields. `not_analyzed` fields are usually used with
<<term-level-queries,term-level queries>> for structured search.
`analyzed`::
This option applies only to `string` fields, for which it is the default.
The string field value is first <<analysis,analyzed>> to convert the
string into terms (e.g. a list of individual words), which are then
indexed. At search time, the query string is passed through
(<<search-analyzer,usually>>) the same analyzer to generate terms
in the same format as those in the index. It is this process that enables
<<full-text-queries,full text search>>.
For example, you can create a `not_analyzed` string field with the following:
[source,js]
--------------------------------------------------
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"status_code": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE

View File

@ -0,0 +1,63 @@
[[lat-lon]]
=== `lat_lon`
<<geo-queries,Geo-queries>> are usually performed by plugging the value of
each <<geo-point,`geo_point`>> field into a formula to determine whether it
falls into the required area or not. Unlike most queries, the inverted index
is not involved.
Setting `lat_lon` to `true` causes the latitude and longitude values to be
indexed as numeric fields (called `.lat` and `.lon`). These fields can be used
by the <<query-dsl-geo-bounding-box-query,`geo_bounding_box`>> and
<<query-dsl-geo-distance-query,`geo_distance`>> queries instead of
performing in-memory calculations.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point",
"lat_lon": true <1>
}
}
}
}
}
PUT my_index/my_type/1
{
"location": {
"lat": 41.12,
"lon": -71.34
}
}
GET my_index/_search
{
"query": {
"geo_distance": {
"location": {
"lat": 41,
"lon": -71
},
"distance": "50km",
"optimize_bbox": "indexed" <2>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Setting `lat_lon` to true indexes the geo-point in the `location.lat` and `location.lon` fields.
<2> The `indexed` option tells the geo-distance query to use the inverted index instead of the in-memory calculation.
Whether the in-memory or indexed operation performs better depends both on
your dataset and on the types of queries that you are running.
NOTE: The `lat_lon` option only makes sense for single-value `geo_point`
fields. It will not work with arrays of geo-points.

View File

@ -0,0 +1,132 @@
[[multi-fields]]
=== `fields`
It is often useful to index the same field in different ways for different
purposes. This is the purpose of _multi-fields_. For instance, a `string`
field could be <<mapping-index,indexed>> as an `analyzed` field for full-text
search, and as a `not_analyzed` field for sorting or aggregations:
[source,js]
--------------------------------------------------
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": { <1>
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /my_index/my_type/1
{
"city": "New York"
}
PUT /my_index/my_type/2
{
"city": "York"
}
GET /my_index/_search
{
"query": {
"match": {
"city": "york" <2>
}
},
"sort": {
"city.raw": "asc" <3>
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw" <3>
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `city.raw` field is a `not_analyzed` version of the `city` field.
<2> The analyzed `city` field can be used for full text search.
<3> The `city.raw` field can be used for sorting and aggregations.
NOTE: Multi-fields do not change the original `_source` field.
==== Multi-fields with multiple analyzers
Another use case of multi-fields is to analyze the same field in different
ways for better relevance. For instance we could index a field with the
<<analysis-standard-analyzer,`standard` analyzer>> which breaks text up into
words, and again with the <<english-analyzer,`english` analyzer>>
which stems words into their root form:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": { <1>
"type": "string"
},
"fields": {
"english": { <2>
"type": "string",
"analyzer": "english"
}
}
}
}
}
}
PUT my_index/my_type/1
{ "text": "quick brown fox" } <3>
PUT my_index/my_type/2
{ "text": "quick brown foxes" } <3>
GET my_index/_search
{
"query": {
"multi_match": {
"query": "quick brown foxes",
"fields": [ <4>
"text",
"text.english"
],
"type": "most_fields" <4>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field uses the `standard` analyzer.
<2> The `text.english` field uses the `english` analyzer.
<3> Index two documents, one with `fox` and the other with `foxes`.
<4> Query both the `text` and `text.english` fields and combine the scores.
The `text` field contains the term `fox` in the first document and `foxes` in
the second document. The `text.english` field contains `fox` for both
documents, because `foxes` is stemmed to `fox`.
The query string is also analyzed by the `standard` analyzer for the `text`
field, and by the `english` analyzer for the `text.english` field. The
stemmed field allows a query for `foxes` to also match the document containing
just `fox`. This allows us to match as many documents as possible. By also
querying the unstemmed `text` field, we improve the relevance score of the
document which matches `foxes` exactly.

View File

@ -0,0 +1,64 @@
[[norms]]
=== `norms`
Norms store various normalization factors -- a number to represent the
relative field length and the <<index-boost,index time `boost`>> setting --
that are later used at query time in order to compute the score of a document
relative to a query.
Although useful for scoring, norms also require quite a lot of memory
(typically in the order of one byte per document per field in your index, even
for documents that don't have this specific field). As a consequence, if you
don't need scoring on a specific field, you should disable norms on that
field. In particular, this is the case for fields that are used solely for
filtering or aggregations.
Norms can be disabled (but not reenabled) after the fact, using the
<<indices-put-mapping,PUT mapping API>> like so:
[source,js]
------------
PUT my_index/_mapping/my_type
{
"properties": {
"title": {
"type": "string",
"norms": {
"enabled": false
}
}
}
}
------------
// AUTOSENSE
NOTE: Norms will not be removed instantly, but will be removed as old segments
are merged into new segments as you continue indexing new documents. Any score
computation on a field that has had norms removed might return inconsistent
results since some documents won't have norms anymore while other documents
might still have norms.
==== Lazy loading of norms
Norms can be loaded into memory eagerly (`eager`), whenever a new segment
comes online, or they can be loaded lazily (`lazy`, default), only when the field
is queried.
Eager loading can be configured as follows:
[source,js]
------------
PUT my_index/_mapping/my_type
{
"properties": {
"title": {
"type": "string",
"norms": {
"loading": "eager"
}
}
}
}
------------
// AUTOSENSE

View File

@ -0,0 +1,58 @@
[[null-value]]
=== `null_value`
A `null` value cannot be indexed or searched. When a field is set to `null`
(or an empty array or an array of `null` values) it is treated as though that
field has no values.
The `null_value` parameter allows you to replace explicit `null` values with
the specified value so that it can be indexed and searched. For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"status_code": {
"type": "string",
"index": "not_analyzed",
"null_value": "NULL" <1>
}
}
}
}
}
PUT my_index/my_type/1
{
"status_code": null
}
PUT my_index/my_type/2
{
"status_code": [] <2>
}
GET my_index/_search
{
"query": {
"term": {
"status_code": "NULL" <3>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Replace explicit `null` values with the term `NULL`.
<2> An empty array does not contain an explicit `null`, and so won't be replaced with the `null_value`.
<3> A query for `NULL` returns document 1, but not document 2.
IMPORTANT: The `null_value` needs to be of the same datatype as the field. For
instance, a `long` field cannot have a string `null_value`. String fields
which are `analyzed` will also pass the `null_value` through the configured
analyzer.
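For example, a minimal sketch of a numeric `null_value` (the index name and
the `response_time` field here are purely illustrative):
[source,js]
--------------------------------------------------
PUT my_index_2
{
  "mappings": {
    "my_type": {
      "properties": {
        "response_time": {
          "type": "long",
          "null_value": -1 <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `null_value` is a `long`, matching the datatype of the field, so explicit `null` values are indexed as `-1`.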
Also see the <<query-dsl-missing-query,`missing` query>> for its `null_value` support.

View File

@ -0,0 +1,68 @@
[[position-offset-gap]]
=== `position_offset_gap`
<<mapping-index,Analyzed>> string fields take term <<index-options,positions>>
into account, in order to be able to support
<<query-dsl-match-query-phrase,proximity or phrase queries>>.
When indexing an array of strings, each string of the array is indexed
directly after the previous one, almost as though all the strings in the array
had been concatenated into one big string.
This can result in matches from phrase queries spanning two array elements.
For instance:
[source,js]
--------------------------------------------------
PUT /my_index/groups/1
{
"names": [ "John Abraham", "Lincoln Smith"]
}
GET /my_index/groups/_search
{
"query": {
"match_phrase": {
"names": "Abraham Lincoln" <1>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> This phrase query matches our document, even though `Abraham` and `Lincoln` are in separate strings.
The `position_offset_gap` can introduce a fake gap between each array element. For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"names": {
"type": "string",
"position_offset_gap": 50 <1>
}
}
}
}
}
PUT /my_index/groups/1
{
"names": [ "John Abraham", "Lincoln Smith"]
}
GET /my_index/groups/_search
{
"query": {
"match_phrase": {
"names": "Abraham Lincoln" <2>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The first term in the next array element will be 50 terms apart from the
last term in the previous array element.
<2> The phrase query no longer matches our document.

View File

@ -0,0 +1,56 @@
[[precision-step]]
=== `precision_step`
Most <<number,numeric>> datatypes index extra terms representing numeric
ranges for each number to make <<query-dsl-range-query,`range` queries>>
faster. For instance, this `range` query:
[source,js]
--------------------------------------------------
"range": {
"number": {
"gte": 0
"lte": 321
}
}
--------------------------------------------------
might be executed internally as a <<query-dsl-terms-query,`terms` query>> that
looks something like this:
[source,js]
--------------------------------------------------
"terms": {
"number": [
"0-255",
"256-319"
"320",
"321"
]
}
--------------------------------------------------
These extra terms greatly reduce the number of terms that have to be examined,
at the cost of increased disk space.
The default value for `precision_step` depends on the `type` of the numeric field:
[horizontal]
`long`, `double`, `date`, `ip`:: `16` (3 extra terms)
`integer`, `float`, `short`:: `8` (3 extra terms)
`byte`:: `2147483647` (0 extra terms)
`token_count`:: `32` (0 extra terms)
The value of the `precision_step` setting indicates the number of bits that
should be compressed into an extra term. A `long` value consists of 64 bits,
so a `precision_step` of 16 results in the following terms:
[horizontal]
Bits 0-15:: `value & 1111111111111111 0000000000000000 0000000000000000 0000000000000000`
Bits 0-31:: `value & 1111111111111111 1111111111111111 0000000000000000 0000000000000000`
Bits 0-47:: `value & 1111111111111111 1111111111111111 1111111111111111 0000000000000000`
Bits 0-63:: `value`
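As a sketch (the index and field names are illustrative), `precision_step`
can be set when a numeric field is first created:
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number": {
          "type": "long",
          "precision_step": 8 <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Index an extra term for every 8 bits instead of the default 16 for `long` fields, trading extra disk space for faster `range` queries.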

View File

@ -0,0 +1,101 @@
[[properties]]
=== `properties`
Type mappings, <<object,`object` fields>> and <<nested,`nested` fields>>
contain sub-fields, called `properties`. These properties may be of any
<<mapping-types,datatype>>, including `object` and `nested`. Properties can
be added:
* explicitly by defining them when <<indices-create-index,creating an index>>.
* explicitly by defining them when adding or updating a mapping type with the <<indices-put-mapping,PUT mapping>> API.
* <<dynamic-mapping,dynamically>> just by indexing documents containing new fields.
Below is an example of adding `properties` to a mapping type, an `object`
field, and a `nested` field:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": { <1>
"properties": {
"manager": { <2>
"properties": {
"age": { "type": "integer" },
"name": { "type": "string" }
}
},
"employees": { <3>
"type": "nested",
"properties": {
"age": { "type": "integer" },
"name": { "type": "string" }
}
}
}
}
}
}
PUT my_index/my_type/1 <4>
{
"region": "US",
"manager": {
"name": "Alice White",
"age": 30
},
"employees": [
{
"name": "John Smith",
"age": 34
},
{
"name": "Peter Brown",
"age": 26
}
]
}
--------------------------------------------------
// AUTOSENSE
<1> Properties under the `my_type` mapping type.
<2> Properties under the `manager` object field.
<3> Properties under the `employees` nested field.
<4> An example document which corresponds to the above mapping.
==== Dot notation
Inner fields can be referred to in queries, aggregations, etc., using _dot
notation_:
[source,js]
--------------------------------------------------
GET my_index/_search
{
"query": {
"match": {
"manager.name": "Alice White" <1>
}
},
"aggs": {
"Employees": {
"nested": {
"path": "employees"
},
"aggs": {
"Employee Ages": {
"histogram": {
"field": "employees.age", <2>
"interval": 5
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
IMPORTANT: The full path to the inner field must be specified.

View File

@ -0,0 +1,79 @@
[[search-analyzer]]
=== `search_analyzer`
Usually, the same <<analyzer,analyzer>> should be applied at index time and at
search time, to ensure that the terms in the query are in the same format as
the terms in the inverted index.
Sometimes, though, it can make sense to use a different analyzer at search
time, such as when using the <<analysis-edgengram-tokenizer,`edge_ngram`>>
tokenizer for autocomplete.
By default, queries will use the `analyzer` defined in the field mapping, but
this can be overridden with the `search_analyzer` setting:
[source,js]
--------------------------------------------------
PUT /my_index
{
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": { <1>
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "string",
"analyzer": "autocomplete", <2>
"search_analyzer": "standard" <2>
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Quick Brown Fox" <3>
}
GET my_index/_search
{
"query": {
"match": {
"text": {
"query": "Quick Br", <4>
"operator": "and"
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Analysis settings to define the custom `autocomplete` analyzer.
<2> The `text` field uses the `autocomplete` analyzer at index time, but the `standard` analyzer at search time.
<3> This field is indexed as the terms: [ `q`, `qu`, `qui`, `quic`, `quick`, `b`, `br`, `bro`, `brow`, `brown`, `f`, `fo`, `fox` ]
<4> The query searches for both of these terms: [ `quick`, `br` ]
See {defguide}/_index_time_search_as_you_type.html[Index time search-as-you-type]
for a full explanation of this example.

View File

@ -0,0 +1,54 @@
[[similarity]]
=== `similarity`
Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default TF/IDF, such as `BM25`.
Similarities are mostly useful for <<string,`string`>> fields, especially
`analyzed` string fields, but can also apply to other field types.
Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.
The only similarities which can be used out of the box, without any further
configuration are:
`default`::
The Default TF/IDF algorithm used by Elasticsearch and
Lucene. See {defguide}/practical-scoring-function.html[Lucene's Practical Scoring Function]
for more information.
`BM25`::
The Okapi BM25 algorithm.
See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
for more information.
The `similarity` can be set on the field level when a field is first created,
as follows:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"default_field": { <1>
"type": "string"
},
"bm25_field": {
"type": "string",
"similarity": "BM25" <2>
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `default_field` uses the `default` similarity (i.e. TF/IDF).
<2> The `bm25_field` uses the `BM25` similarity.

View File

@ -0,0 +1,73 @@
[[mapping-store]]
=== `store`
By default, field values are <<mapping-index,indexed>> to make them searchable,
but they are not _stored_. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn't matter. The field value is already part of the
<<mapping-source-field,`_source` field>>, which is stored by default. If you
only want to retrieve the value of a single field or of a few fields, instead
of the whole `_source`, then this can be achieved with
<<search-request-source-filtering,source filtering>>.
In certain situations it can make sense to `store` a field. For instance, if
you have a document with a `title`, a `date`, and a very large `content`
field, you may want to retrieve just the `title` and the `date` without having
to extract those fields from a large `_source` field:
[source,js]
--------------------------------------------------
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"title": {
"type": "string",
"store": true <1>
},
"date": {
"type": "date",
"store": true <1>
},
"content": {
"type": "string"
}
}
}
}
}
PUT /my_index/my_type/1
{
"title": "Some short title",
"date": "2015-01-01",
"content": "A very long content field..."
}
GET my_index/_search
{
"fields": [ "title", "date" ] <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `title` and `date` fields are stored.
<2> This request will retrieve the values of the `title` and `date` fields.
[NOTE]
.Stored fields returned as arrays
======================================
For consistency, stored fields are always returned as an _array_ because there
is no way of knowing if the original field value was a single value, multiple
values, or an empty array.
If you need the original value, you should retrieve it from the `_source`
field instead.
======================================
Another situation where it can make sense to make a field stored is for those
that don't appear in the `_source` field (such as <<copy-to,`copy_to` fields>>).
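For instance, a minimal sketch (the `full_name` field, populated via
<<copy-to,`copy_to`>>, is illustrative) of such a field, which can only be
retrieved because it is stored:
[source,js]
--------------------------------------------------
PUT /my_index_2
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "string",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "string",
          "store": true <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The value of `full_name` never appears in the `_source` field, so storing it is the only way to return it with the `fields` option of a search request.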

View File

@ -0,0 +1,68 @@
[[term-vector]]
=== `term_vector`
Term vectors contain information about the terms produced by the
<<analysis,analysis>> process, including:
* a list of terms.
* the position (or order) of each term.
* the start and end character offsets mapping the term to its
origin in the original string.
These term vectors can be stored so that they can be retrieved for a
particular document.
The `term_vector` setting accepts:
[horizontal]
`no`:: No term vectors are stored. (default)
`yes`:: Just the terms in the field are stored.
`with_positions`:: Terms and positions are stored.
`with_offsets`:: Terms and character offsets are stored.
`with_positions_offsets`:: Terms, positions, and character offsets are stored.
The fast vector highlighter requires `with_positions_offsets`. The term
vectors API can retrieve whatever is stored.
WARNING: Setting `with_positions_offsets` will double the size of a field's
index.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "string",
"term_vector": "with_positions_offsets"
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Quick brown fox"
}
GET my_index/_search
{
"query": {
"match": {
"text": "brown fox"
}
},
"highlight": {
"fields": {
"text": {} <1>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The fast vector highlighter will be used by default for the `text` field
because term vectors are enabled.

View File

@ -1,61 +0,0 @@
[[mapping-transform]]
== Transform
The document can be transformed before it is indexed by registering a
script in the `transform` element of the mapping. The result of the
transform is indexed but the original source is stored in the `_source`
field. Example:
[source,js]
--------------------------------------------------
{
"example" : {
"transform" : {
"script" : {
"inline": "if (ctx._source['title']?.startsWith('t')) ctx._source['suggest'] = ctx._source['content']",
"params" : {
"variable" : "not used but an example anyway"
},
"lang": "groovy"
}
},
"properties": {
"title": { "type": "string" },
"content": { "type": "string" },
"suggest": { "type": "string" }
}
}
}
--------------------------------------------------
It's also possible to specify multiple transforms:
[source,js]
--------------------------------------------------
{
"example" : {
"transform" : [
{"script": "ctx._source['suggest'] = ctx._source['content']"}
{"script": "ctx._source['foo'] = ctx._source['bar'];"}
]
}
}
--------------------------------------------------
Because the result isn't stored in the source it can't normally be fetched by
source filtering. It can be highlighted if it is marked as stored.
=== Get Transformed
The get endpoint will retransform the source if the `_source_transform`
parameter is set. Example:
[source,sh]
--------------------------------------------------
curl -XGET "http://localhost:9200/test/example/3?pretty&_source_transform"
--------------------------------------------------
The transform is performed before any source filtering but it is mostly
designed to make it easy to see what was passed to the index for debugging.
=== Immutable Transformation
Once configured the transform script cannot be modified. This is not
because that is technically impossible but instead because madness lies
down that road.

View File

@ -1,24 +1,71 @@
[[mapping-types]]
== Field datatypes
Elasticsearch supports a number of different datatypes for the fields in a
document:
[float]
=== Core datatypes
<<string>>:: `string`
<<number>>:: `long`, `integer`, `short`, `byte`, `double`, `float`
<<date>>:: `date`
<<boolean>>:: `boolean`
<<binary>>:: `binary`
[float]
=== Complex datatypes
<<array>>:: Array support does not require a dedicated `type`
<<object>>:: `object` for single JSON objects
<<nested>>:: `nested` for arrays of JSON objects
[float]
=== Geo datatypes
<<geo-point>>:: `geo_point` for lat/lon points
<<geo-shape>>:: `geo_shape` for complex shapes like polygons
[float]
=== Specialised datatypes
<<ip>>:: `ip` for IPv4 addresses
<<search-suggesters-completion,Completion datatype>>::
`completion` to provide auto-complete suggestions
<<token-count>>:: `token_count` to count the number of tokens in a string
Attachment datatype::
See the https://github.com/elastic/elasticsearch-mapper-attachments[mapper attachment plugin]
which supports indexing ``attachments'' like Microsoft Office formats, Open
Document formats, ePub, HTML, etc. into an `attachment` datatype.
include::types/array.asciidoc[]
include::types/binary.asciidoc[]
include::types/boolean.asciidoc[]
include::types/date.asciidoc[]
include::types/geo-point.asciidoc[]
include::types/geo-shape.asciidoc[]
include::types/ip.asciidoc[]
include::types/nested.asciidoc[]
include::types/numeric.asciidoc[]
include::types/object.asciidoc[]
include::types/string.asciidoc[]
include::types/token-count.asciidoc[]

View File

@ -1,69 +0,0 @@
[[mapping-array-type]]
=== Array Type
JSON documents allow you to define an array (list) of fields or objects.
Mapping array types could not be simpler, since arrays are automatically
detected and mapping them can be done either with
<<mapping-core-types,Core Types>> or
<<mapping-object-type,Object Type>> mappings.
For example, the following JSON defines several arrays:
[source,js]
--------------------------------------------------
{
"tweet" : {
"message" : "some arrays in this tweet...",
"tags" : ["elasticsearch", "wow"],
"lists" : [
{
"name" : "prog_list",
"description" : "programming list"
},
{
"name" : "cool_list",
"description" : "cool stuff list"
}
]
}
}
--------------------------------------------------
The above JSON has the `tags` property defining a list of a simple
`string` type, and the `lists` property is an `object` type array. Here
is a sample explicit mapping:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"message" : {"type" : "string"},
"tags" : {"type" : "string"},
"lists" : {
"properties" : {
"name" : {"type" : "string"},
"description" : {"type" : "string"}
}
}
}
}
}
--------------------------------------------------
Automatic support for array types is demonstrated by the fact that the
following JSON document is perfectly fine:
[source,js]
--------------------------------------------------
{
"tweet" : {
"message" : "some arrays in this tweet...",
"tags" : "elasticsearch",
"lists" : {
"name" : "prog_list",
"description" : "programming list"
}
}
}
--------------------------------------------------

View File

@ -0,0 +1,99 @@
[[array]]
=== Array datatype
In Elasticsearch, there is no dedicated `array` type. Any field can contain
zero or more values by default; however, all values in the array must be of
the same datatype. For instance:
* an array of strings: [ `"one"`, `"two"` ]
* an array of integers: [ `1`, `2` ]
* an array of arrays: [ `1`, [ `2`, `3` ]] which is the equivalent of [ `1`, `2`, `3` ]
* an array of objects: [ `{ "name": "Mary", "age": 12 }`, `{ "name": "John", "age": 10 }`]
.Arrays of objects
[NOTE]
====================================================
Arrays of objects do not work as you would expect: you cannot query each
object independently of the other objects in the array. If you need to be
able to do this then you should use the <<nested,`nested`>> datatype instead
of the <<object,`object`>> datatype.
This is explained in more detail in <<nested>>.
====================================================
When adding a field dynamically, the first value in the array determines the
field `type`. All subsequent values must be of the same datatype or it must
at least be possible to <<coerce,coerce>> subsequent values to the same
datatype.
Arrays with a mixture of datatypes are _not_ supported: [ `10`, `"some string"` ]
An array may contain `null` values, which are either replaced by the
configured <<null-value,`null_value`>> or skipped entirely. An empty array
`[]` is treated as a missing field -- a field with no values.
Nothing needs to be pre-configured in order to use arrays in documents; they
are supported out of the box:
[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
"message": "some arrays in this document...",
"tags": [ "elasticsearch", "wow" ], <1>
"lists": [ <2>
{
"name": "prog_list",
"description": "programming list"
},
{
"name": "cool_list",
"description": "cool stuff list"
}
]
}
PUT my_index/my_type/2 <3>
{
"message": "no arrays in this document...",
"tags": "elasticsearch",
"lists": {
"name": "prog_list",
"description": "programming list"
}
}
GET my_index/_search
{
"query": {
"match": {
"tags": "elasticsearch" <4>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `tags` field is dynamically added as a `string` field.
<2> The `lists` field is dynamically added as an `object` field.
<3> The second document contains no arrays, but can be indexed into the same fields.
<4> The query looks for `elasticsearch` in the `tags` field, and matches both documents.
.Multi-value fields and the inverted index
****************************************************
The fact that all field types support multi-value fields out of the box is a
consequence of the origins of Lucene. Lucene was designed to be a full text
search engine. In order to be able to search for individual words within a
big block of text, Lucene tokenizes the text into individual terms, and
adds each term to the inverted index separately.
This means that even a simple text field must be able to support multiple
values by default. When other datatypes were added, such as numbers and
dates, they used the same data structure as strings, and so got multi-values
for free.
****************************************************

View File

@ -1,13 +0,0 @@
[[mapping-attachment-type]]
=== Attachment Type
The `attachment` type allows you to index different "attachment" type fields
(encoded as `base64`), for example, Microsoft Office formats, open
document formats, ePub, HTML, and so on.
The `attachment` type is provided as a
https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin
extension]. It uses http://tika.apache.org/[Apache Tika] behind the scenes.
See https://github.com/elasticsearch/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch[README file]
for details.

View File

@ -0,0 +1,52 @@
[[binary]]
=== Binary datatype
The `binary` type accepts a binary value as a
https://en.wikipedia.org/wiki/Base64[Base64] encoded string. The field is not
stored by default and is not searchable:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"name": {
"type": "string"
},
"blob": {
"type": "binary"
}
}
}
}
}
PUT my_index/my_type/1
{
"name": "Some binary blob",
"blob": "U29tZSBiaW5hcnkgYmxvYg==" <1>
}
--------------------------------------------------
<1> The Base64 encoded binary value must not have embedded newlines `\n`.
[[binary-params]]
==== Parameters for `binary` fields
The following parameters are accepted by `binary` fields:
[horizontal]
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` or `false` (default).
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).

View File

@ -0,0 +1,119 @@
[[boolean]]
=== Boolean datatype
Boolean fields accept JSON `true` and `false` values, but can also accept
strings and numbers which are interpreted as either true or false:
[horizontal]
False values::
`false`, `"false"`, `"off"`, `"no"`, `"0"`, `""` (empty string), `0`, `0.0`
True values::
Anything that isn't false.
For example:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"is_published": {
"type": "boolean"
}
}
}
}
}
POST my_index/my_type/1
{
"is_published": true <1>
}
GET my_index/_search
{
"query": {
"term": {
"is_published": 1 <2>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Indexing a document with a JSON `true`.
<2> Querying for the document with `1`, which is interpreted as `true`.
Aggregations like the <<search-aggregations-bucket-terms-aggregation,`terms`
aggregation>> use `1` and `0` for the `key`, and the strings `"true"` and
`"false"` for the `key_as_string`. Boolean fields when used in scripts,
return `1` and `0`:
[source,js]
--------------------------------------------------
POST my_index/my_type/1
{
"is_published": true
}
POST my_index/my_type/2
{
"is_published": false
}
GET my_index/_search
{
"aggs": {
"publish_state": {
"terms": {
"field": "is_published"
}
}
},
"script_fields": {
"is_published": {
"script": "doc['is_published'].value" <1>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work.
[[boolean-params]]
==== Parameters for `boolean` fields
The following parameters are accepted by `boolean` fields:
[horizontal]
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
<<null-value,`null_value`>>::
Accepts any of the true or false values listed above. The value is
substituted for any explicit `null` values. Defaults to `null`, which
means the field is treated as missing.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).

View File

@ -1,649 +0,0 @@
[[mapping-core-types]]
=== Core Types
Each JSON field can be mapped to a specific core type. JSON itself
already provides us with some typing, with its support for `string`,
`integer`/`long`, `float`/`double`, `boolean`, and `null`.
The following sample tweet JSON document will be used to explain the
core types:
[source,js]
--------------------------------------------------
{
"tweet" {
"user" : "kimchy",
"message" : "This is a tweet!",
"postDate" : "2009-11-15T14:12:12",
"priority" : 4,
"rank" : 12.3
}
}
--------------------------------------------------
Explicit mapping for the above JSON tweet can be:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"user" : {"type" : "string", "index" : "not_analyzed"},
"message" : {"type" : "string", "null_value" : "na"},
"postDate" : {"type" : "date"},
"priority" : {"type" : "integer"},
"rank" : {"type" : "float"}
}
}
}
--------------------------------------------------
[float]
[[string]]
==== String
The text-based string type is the most basic type, and contains one or
more characters. An example mapping can be:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"message" : {
"type" : "string",
"store" : true,
"index" : "analyzed",
"null_value" : "na"
},
"user" : {
"type" : "string",
"index" : "not_analyzed",
"norms" : {
"enabled" : false
}
}
}
}
}
--------------------------------------------------
The above mapping defines a `string` `message` property/field within the
`tweet` type. The field is stored in the index (so it can later be
retrieved using selective loading when searching), and it gets analyzed
(broken down into searchable terms). If the message has a `null` value,
then the value that will be stored is `na`. There is also a `string` `user`
which is indexed as-is (not broken down into tokens) and has norms
disabled (so that matching this field is a binary decision, no match is
better than another one).
The following table lists all the attributes that can be used with the
`string` type:
[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`store` |Set to `true` to actually store the field in the index, `false` to not
store it. Since by default Elasticsearch stores all fields of the source
document in the special `_source` field, this option is primarily useful when
the `_source` field has been disabled in the type definition. Defaults to
`false`.
|`index` |Set to `analyzed` for the field to be indexed and searchable
after being broken down into tokens using an analyzer. `not_analyzed`
means that it's still searchable, but does not go through any analysis
process nor get broken down into tokens. `no` means that it won't be
searchable at all (as an individual field; it may still be included in
`_all`). Setting to `no` disables `include_in_all`. Defaults to
`analyzed`.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.
|`term_vector` |Possible values are `no`, `yes`, `with_offsets`,
`with_positions`, `with_positions_offsets`. Defaults to `no`.
|`boost` |The boost value. Defaults to `1.0`.
|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|`norms: {enabled: <value>}` |Boolean value if norms should be enabled or
not. Defaults to `true` for `analyzed` fields, and to `false` for
`not_analyzed` fields. See the <<norms,section about norms>>.
|`norms: {loading: <value>}` |Describes how norms should be loaded, possible values are
`eager` and `lazy` (default). It is possible to change the default value to
eager for all fields by configuring the index setting `index.norms.loading`
to `eager`.
|`index_options` | Allows to set the indexing
options, possible values are `docs` (only doc numbers are indexed),
`freqs` (doc numbers and term frequencies), and `positions` (doc
numbers, term frequencies and positions). Defaults to `positions` for
`analyzed` fields, and to `docs` for `not_analyzed` fields. It
is also possible to set it to `offsets` (doc numbers, term
frequencies, positions and offsets).
|`analyzer` |The analyzer used to analyze the text contents when
`analyzed` during indexing and searching.
Defaults to the globally configured analyzer.
|`search_analyzer` |The analyzer used to analyze the field when searching, which
overrides the value of `analyzer`. Can be updated on an existing field.
|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.
|`ignore_above` |The analyzer will ignore strings larger than this size.
Useful for generic `not_analyzed` fields that should ignore long text.
This option is also useful for protecting against Lucene's term byte-length
limit of `32766`. Note: the value for `ignore_above` is the _character count_,
but Lucene counts bytes, so if you have UTF-8 text, you may want to set the
limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most 3
bytes.
|`position_offset_gap` |Position increment gap between field instances
with the same field name. Defaults to 0.
|=======================================================================
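For example, a sketch (the `tag` field is illustrative) of a `not_analyzed`
field that uses `ignore_above` to skip overly long values:
[source,js]
--------------------------------------------------
{
  "tweet" : {
    "properties" : {
      "tag" : {
        "type" : "string",
        "index" : "not_analyzed",
        "ignore_above" : 256
      }
    }
  }
}
--------------------------------------------------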
The `string` type also supports custom indexing parameters associated
with the indexed value. For example:
[source,js]
--------------------------------------------------
{
"message" : {
"_value": "boosted value",
"_boost": 2.0
}
}
--------------------------------------------------
The mapping is required to disambiguate the meaning of the document.
Otherwise, the structure would interpret "message" as a value of type
"object". The key `_value` (or `value`) in the inner document specifies
the real string content that should eventually be indexed. The `_boost`
(or `boost`) key specifies the per field document boost (here 2.0).
[float]
[[norms]]
===== Norms
Norms store various normalization factors that are later used (at query time)
in order to compute the score of a document relative to a query.
Although useful for scoring, norms also require quite a lot of memory
(typically in the order of one byte per document per field in your index,
even for documents that don't have this specific field). As a consequence, if
you don't need scoring on a specific field, it is highly recommended to disable
norms on it. In particular, this is the case for fields that are used solely
for filtering or aggregations.
In case you would like to disable norms after the fact, it is possible to do so
by using the <<indices-put-mapping,PUT mapping API>>, like this:
[source,js]
------------
PUT my_index/_mapping/my_type
{
"properties": {
"title": {
"type": "string",
"norms": {
"enabled": false
}
}
}
}
------------
Please however note that norms won't be removed instantly, but will be removed
as old segments are merged into new segments as you continue indexing new documents.
Any score computation on a field that has had
norms removed might return inconsistent results since some documents won't have
norms anymore while other documents might still have norms.
[float]
[[number]]
==== Number
A number based type supporting `float`, `double`, `byte`, `short`,
`integer`, and `long`. It uses specific constructs within Lucene in
order to support numeric values. The number types have the same ranges
as corresponding
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html[Java
types]. An example mapping can be:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"rank" : {
"type" : "float",
"null_value" : 1.0
}
}
}
}
--------------------------------------------------
The following table lists all the attributes that can be used with a
number type:
[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`type` |The type of the number. Can be `float`, `double`, `integer`,
`long`, `short`, `byte`. Required.
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`store` |Set to `true` to store actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.
|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16` for `long`, `double`,
`8` for `short`, `integer`, `float`, and `2147483647` for `byte`.
|`boost` |The boost value. Defaults to `1.0`.
|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.
|`ignore_malformed` |Ignore malformed numbers. Defaults to `false`.
|`coerce` |Try to convert strings to numbers and truncate fractions for integers. Defaults to `true`.
|=======================================================================
[float]
[[token_count]]
==== Token Count
The `token_count` type maps to the JSON string type but indexes and stores
the number of tokens in the string rather than the string itself. For
example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"name" : {
"type" : "string",
"fields" : {
"word_count": {
"type" : "token_count",
"store" : "yes",
"analyzer" : "standard"
}
}
}
}
}
}
--------------------------------------------------
All the configuration that can be specified for a number can be specified
for a token_count. The only extra configuration is the required
`analyzer` field which specifies which analyzer to use to break the string
into tokens. For best performance, use an analyzer with no token filters.
[NOTE]
===================================================================
Technically the `token_count` type sums position increments rather than
counting tokens. This means that even if the analyzer filters out stop
words they are included in the count.
===================================================================
[float]
[[date]]
==== Date
The date type is a special type which maps to the JSON string type. It
follows a specific format that can be explicitly set. All dates are
`UTC`. Internally, a date maps to a number type `long`, with the added
parsing stage from string to long and from long to string. An example
mapping:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"postDate" : {
"type" : "date",
"format" : "YYYY-MM-dd"
}
}
}
}
--------------------------------------------------
The date type will also accept a long number representing UTC
milliseconds since the epoch, regardless of the format it can handle.
The following table lists all the attributes that can be used with a
date type:
[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`format` |The <<mapping-date-format,date
format>>. Defaults to `epoch_millis||strictDateOptionalTime`.
|`store` |Set to `true` to store actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.
|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16`.
|`boost` |The boost value. Defaults to `1.0`.
|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.
|`ignore_malformed` |Ignore malformed numbers. Defaults to `false`.
|=======================================================================
[float]
[[boolean]]
==== Boolean
The boolean type maps to the JSON boolean type. It ends up storing
within the index either `T` or `F`, with automatic translation to `true`
and `false` respectively.
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"hes_my_special_tweet" : {
"type" : "boolean"
}
}
}
}
--------------------------------------------------
The boolean type also supports passing the value as a number or a string
(in this case `0`, an empty string, `false`, `off` and `no` are
`false`; all other values are `true`).
The following table lists all the attributes that can be used with the
boolean type:
[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`store` |Set to `true` to store actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.
|`boost` |The boost value. Defaults to `1.0`.
|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|=======================================================================
[float]
[[binary]]
==== Binary
The binary type is a base64 representation of binary data that can be
stored in the index. The field is not stored by default and not indexed at
all.
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"image" : {
"type" : "binary"
}
}
}
}
--------------------------------------------------
The following table lists all the attributes that can be used with the
binary type:
[horizontal]
`index_name`::
The name of the field that will be stored in the index. Defaults to the
property/field name.
`store`::
Set to `true` to store actual field in the index, `false` to not store it.
Defaults to `false` (note, the JSON document itself is already stored, so
the binary field can be retrieved from there).
`doc_values`::
Set to `true` to store field values in a column-stride fashion.
[float]
[[fielddata-filters]]
==== Fielddata filters
It is possible to control which field values are loaded into memory,
which is particularly useful for aggregations on string fields, using
fielddata filters, which are explained in detail in the
<<modules-fielddata,Fielddata>> section.
Fielddata filters can exclude terms which do not match a regex, or which
don't fall between a `min` and `max` frequency range:
[source,js]
--------------------------------------------------
{
tweet: {
type: "string",
analyzer: "whitespace"
fielddata: {
filter: {
regex: {
"pattern": "^#.*"
},
frequency: {
min: 0.001,
max: 0.1,
min_segment_size: 500
}
}
}
}
}
--------------------------------------------------
These filters can be updated on an existing field mapping and will take
effect the next time the fielddata for a segment is loaded. Use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.
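For instance, a sketch (assuming an index named `my_index`) of clearing the
fielddata cache so that updated filters take effect on the next load:
[source,sh]
--------------------------------------------------
curl -XPOST "http://localhost:9200/my_index/_cache/clear?fielddata=true"
--------------------------------------------------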
[float]
==== Similarity
Elasticsearch allows you to configure a similarity (scoring algorithm) per field.
The `similarity` setting provides a simple way of choosing a similarity algorithm
other than the default TF/IDF, such as `BM25`.
You can configure similarities via the
<<index-modules-similarity,similarity module>>.
[float]
===== Configuring Similarity per Field
Defining the Similarity for a field is done via the `similarity` mapping
property, as this example shows:
[source,js]
--------------------------------------------------
{
"book":{
"properties":{
"title":{
"type":"string", "similarity":"BM25"
}
}
}
}
--------------------------------------------------
The following Similarities are configured out-of-box:
`default`::
The Default TF/IDF algorithm used by Elasticsearch and
Lucene in previous versions.
`BM25`::
The BM25 algorithm.
http://en.wikipedia.org/wiki/Okapi_BM25[See Okapi_BM25] for more
details.
[[copy-to]]
[float]
===== Copy to field
Adding the `copy_to` parameter to any field mapping will cause all values of this field to be copied to the fields specified in
the parameter. In the following example, all values from the fields `title` and `abstract` will be copied to the field
`meta_data`. The field being copied to will be indexed (i.e. searchable, and available through `fielddata_field`) but the original source will not be modified.
[source,js]
--------------------------------------------------
{
"book" : {
"properties" : {
"title" : { "type" : "string", "copy_to" : "meta_data" },
"abstract" : { "type" : "string", "copy_to" : "meta_data" },
"meta_data" : { "type" : "string" }
}
}
}
--------------------------------------------------
Multiple fields are also supported:
[source,js]
--------------------------------------------------
{
"book" : {
"properties" : {
"title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
}
}
}
--------------------------------------------------
[float]
[[multi-fields]]
===== Multi fields
The `fields` option allows you to map several core type fields onto a single
JSON source field. This can be useful if a single field needs to be
used in different ways. For example, a single field may be used for both
free text search and sorting.
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"name" : {
"type" : "string",
"index" : "analyzed",
"fields" : {
"raw" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
}
}
--------------------------------------------------
In the above example the field `name` gets processed twice. The first time it gets
processed as an analyzed string and this version is accessible under the field name
`name`; this is the main field and is in fact just like any other field. The second time
it gets processed as a not-analyzed string and is accessible under the name `name.raw`.
[float]
==== Include in All
The `include_in_all` setting is ignored on any field that is defined in
the `fields` options. Setting `include_in_all` only makes sense on
the main field, since the raw field value is copied to the `_all` field and
the tokens aren't copied.
[float]
==== Updating a field
In essence a field cannot be updated. However, multi fields can be
added to existing fields. This allows, for example, having a different
`analyzer` configuration in addition to the already configured
`analyzer` configuration specified in the main and other multi fields.
Also, the new multi field will only be applied to documents that have been
added after the multi field was added; in fact, the new multi field
doesn't exist in existing documents.
Another important note is that new multi fields will be merged into the
list of existing multi fields, so when adding new multi fields for a field,
previously added multi fields don't need to be specified.
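As a sketch (the `name.english` multi field is illustrative), a new multi
field can be added to an existing `name` field with the
<<indices-put-mapping,PUT mapping API>>:
[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties" : {
    "name" : {
      "type" : "string",
      "fields" : {
        "english" : {
          "type" : "string",
          "analyzer" : "english"
        }
      }
    }
  }
}
--------------------------------------------------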

View File

@ -0,0 +1,138 @@
[[date]]
=== Date datatype
JSON doesn't have a date datatype, so dates in Elasticsearch can either be:
* strings containing formatted dates, e.g. `"2015-01-01"` or `"2015/01/01 12:10:30"`.
* a long number representing _milliseconds-since-the-epoch_.
* an integer representing _seconds-since-the-epoch_.
Internally, dates are converted to UTC (if the time-zone is specified) and
stored as a long number representing milliseconds-since-the-epoch.
Date formats can be customised, but if no `format` is specified then it uses
the default: `strictDateOptionalTime||epoch_millis`. This means that it will
accept dates with optional timestamps, which conform to the formats supported
by <<strict-date-time,`strictDateOptionalTime`>> or milliseconds-since-the-epoch.
For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"date": {
"type": "date" <1>
}
}
}
}
}
PUT my_index/my_type/1
{ "date": "2015-01-01" } <2>
PUT my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" } <3>
PUT my_index/my_type/3
{ "date": 1420070400001 } <4>
GET my_index/_search
{
"sort": { "date": "asc"} <5>
}
--------------------------------------------------
// AUTOSENSE
<1> The `date` field uses the default `format`.
<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch.
[[multiple-date-formats]]
==== Multiple date formats
Multiple formats can be specified by separating them with `||` as a separator.
Each format will be tried in turn until a matching format is found. The first
format will be used to convert the _milliseconds-since-the-epoch_ value back
into a string.
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"date": {
"type": "date",
"format": "yyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
[[date-params]]
==== Parameters for `date` fields
The following parameters are accepted by `date` fields:
[horizontal]
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<mapping-date-format,`format`>>::
The date format(s) that can be parsed. Defaults to
`epoch_millis||strictDateOptionalTime`.
<<ignore-malformed,`ignore_malformed`>>::
If `true`, malformed numbers are ignored. If `false` (default), malformed
numbers throw an exception and reject the whole document.
<<include-in-all,`include_in_all`>>::
Whether or not the field value should be included in the
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
<<object,`object`>> field sets `include_in_all` to `false`.
Otherwise defaults to `true`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
<<null-value,`null_value`>>::
Accepts a date value in one of the configured +format+'s, which is
substituted for any explicit `null` values. Defaults to `null`,
which means the field is treated as missing.
<<precision-step,`precision_step`>>::
Controls the number of extra terms that are indexed to make
<<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
@ -1,215 +0,0 @@
[[mapping-geo-point-type]]
=== Geo Point Type
The mapper type `geo_point` supports geo-based points. The
declaration looks as follows:
[source,js]
--------------------------------------------------
{
"pin" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}
--------------------------------------------------
[float]
==== Indexed Fields
The `geo_point` mapping will index a single field with the format of
`lat,lon`. The `lat_lon` option can be set to also index the `.lat` and
`.lon` as numeric fields, and `geohash` can be set to `true` to also
index `.geohash` value.
A good practice is to enable indexing `lat_lon` as well, since the
geo distance and bounding box filters can be executed either using in-memory
checks or using the indexed lat/lon values, and which one performs better
really depends on the data set. Note though that
indexed lat/lon only makes sense when there is a single geo point value
for the field, not multiple values.
[float]
==== Geohashes
Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.
Because geohashes are just strings, they can be stored in an inverted
index like any other string, which makes querying them very efficient.
If you enable the `geohash` option, a `geohash` ``sub-field'' will be
indexed as, eg `pin.geohash`. The length of the geohash is controlled by
the `geohash_precision` parameter, which can either be set to an absolute
length (eg `12`, the default) or to a distance (eg `1km`).
More usefully, set the `geohash_prefix` option to `true` to not only index
the geohash value, but all the enclosing cells as well. For instance, a
geohash of `u30` will be indexed as `[u,u3,u30]`. This option can be used
by the <<query-dsl-geohash-cell-query>> to find geopoints within a
particular cell very efficiently.
[float]
==== Input Structure
The above mapping defines a `geo_point`, which accepts different
formats. The following formats are supported:
[float]
===== Lat Lon as Properties
[source,js]
--------------------------------------------------
{
"pin" : {
"location" : {
"lat" : 41.12,
"lon" : -71.34
}
}
}
--------------------------------------------------
[float]
===== Lat Lon as String
Format in `lat,lon`.
[source,js]
--------------------------------------------------
{
"pin" : {
"location" : "41.12,-71.34"
}
}
--------------------------------------------------
[float]
===== Geohash
[source,js]
--------------------------------------------------
{
"pin" : {
"location" : "drm3btev3e86"
}
}
--------------------------------------------------
[float]
===== Lat Lon as Array
Format in `[lon, lat]`. Note the order of lon/lat here, in order to
conform with http://geojson.org/[GeoJSON].
[source,js]
--------------------------------------------------
{
"pin" : {
"location" : [-71.34, 41.12]
}
}
--------------------------------------------------
[float]
==== Mapping Options
[cols="<,<",options="header",]
|=======================================================================
|Option |Description
|`lat_lon` |Set to `true` to also index the `.lat` and `.lon` as fields.
Defaults to `false`.
|`geohash` |Set to `true` to also index the `.geohash` as a field.
Defaults to `false`.
|`geohash_precision` |Sets the geohash precision. It can be set to an
absolute geohash length or a distance value (eg 1km, 1m, 1ml) defining
the size of the smallest cell. Defaults to an absolute length of 12.
|`geohash_prefix` |If this option is set to `true`, not only the geohash
but also all its parent cells (true prefixes) will be indexed as well. The
number of terms that will be indexed depends on the `geohash_precision`.
Defaults to `false`. *Note*: This option implicitly enables `geohash`.
|`validate` |Set to `false` to accept geo points with invalid latitude or
longitude (default is `true`). *Note*: Validation only works when
normalization has been disabled. This option will be deprecated and removed
in upcoming releases.
|`validate_lat` |Set to `false` to accept geo points with an invalid
latitude (default is `true`). This option will be deprecated and removed
in upcoming releases.
|`validate_lon` |Set to `false` to accept geo points with an invalid
longitude (default is `true`). This option will be deprecated and removed
in upcoming releases.
|`normalize` |Set to `true` to normalize latitude and longitude (default
is `true`).
|`normalize_lat` |Set to `true` to normalize latitude.
|`normalize_lon` |Set to `true` to normalize longitude.
|`precision_step` |The precision step (influences the number of terms
generated for each number value) for `.lat` and `.lon` fields
if `lat_lon` is set to `true`.
Defaults to `16`.
|=======================================================================
[float]
==== Field data
By default, geo points use the `array` format which loads geo points into two
parallel double arrays, making sure there is no precision loss. However, this
can require a non-negligible amount of memory (16 bytes per document) which is
why Elasticsearch also provides a field data implementation with lossy
compression called `compressed`:
[source,js]
--------------------------------------------------
{
"pin" : {
"properties" : {
"location" : {
"type" : "geo_point",
"fielddata" : {
"format" : "compressed",
"precision" : "1cm"
}
}
}
}
}
--------------------------------------------------
This field data format comes with a `precision` option which allows you to
configure how much precision can be traded for memory. The default value is
`1cm`. The following table presents values of the memory savings given various
precisions:
|=============================================
| Precision | Bytes per point | Size reduction
| 1km | 4 | 75%
| 3m | 6 | 62.5%
| 1cm | 8 | 50%
| 1mm | 10 | 37.5%
|=============================================
Precision can be changed on a live index by using the update mapping API.
[float]
==== Usage in Scripts
When using `doc[geo_field_name]` (in the above mapping,
`doc['location']`), the `doc[...].value` returns a `GeoPoint`, which
then allows access to `lat` and `lon` (for example,
`doc[...].value.lat`). For performance, it is better to access the `lat`
and `lon` directly using `doc[...].lat` and `doc[...].lon`.
@ -0,0 +1,167 @@
[[geo-point]]
=== Geo-point datatype
Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
* to find geo-points within a <<query-dsl-geo-bounding-box-query,bounding box>>,
within a certain <<query-dsl-geo-distance-query,distance>> of a central point,
within a <<query-dsl-geo-polygon-query,polygon>>, or within a
<<query-dsl-geohash-cell-query,geohash>> cell.
* to aggregate documents by <<search-aggregations-bucket-geohashgrid-aggregation,geographically>>
or by <<search-aggregations-bucket-geodistance-aggregation,distance>> from a central point.
* to integrate distance into a document's <<query-dsl-function-score-query,relevance score>>.
* to <<geo-sorting,sort>> documents by distance.
There are four ways that a geo-point may be specified, as demonstrated below:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Geo-point as an object",
"location": { <1>
"lat": 41.12,
"lon": -71.34
}
}
PUT my_index/my_type/2
{
"text": "Geo-point as a string",
"location": "41.12,-71.34" <2>
}
PUT my_index/my_type/3
{
"text": "Geo-point as a geohash",
"location": "drm3btev3e86" <3>
}
PUT my_index/my_type/4
{
"text": "Geo-point as an array",
"location": [ -71.34, 41.12 ] <4>
}
GET my_index/_search
{
"query": {
"geo_bounding_box": { <5>
"location": {
"top_left": {
"lat": 42,
"lon": -72
},
"bottom_right": {
"lat": 40,
"lon": -74
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> Geo-point expressed as an object, with `lat` and `lon` keys.
<2> Geo-point expressed as a string with the format: `"lat,lon"`.
<3> Geo-point expressed as a geohash.
<4> Geo-point expressed as an array with the format: [`lon`, `lat`].
<5> A geo-bounding box query which finds all geo-points that fall inside the box.
[IMPORTANT]
.Geo-points expressed as an array or string
==================================================
Please note that string geo-points are ordered as `lat,lon`, while array
geo-points are ordered as the reverse: `lon,lat`.
Originally, `lat,lon` was used for both array and string, but the array
format was changed early on to conform to the format used by GeoJSON.
==================================================
[[geo-point-params]]
==== Parameters for `geo_point` fields
The following parameters are accepted by `geo_point` fields:
[horizontal]
<<coerce,`coerce`>>::
Normalize longitude and latitude values to a standard -180:180 / -90:90
coordinate system. Accepts `true` and `false` (default).
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<geohash,`geohash`>>::
Should the geo-point also be indexed as a geohash in the `.geohash`
sub-field? Defaults to `false`, unless `geohash_prefix` is `true`.
<<geohash-precision,`geohash_precision`>>::
The maximum length of the geohash to use for the `geohash` and
`geohash_prefix` options.
<<geohash-prefix,`geohash_prefix`>>::
Should the geo-point also be indexed as a geohash plus all its prefixes?
Defaults to `false`.
<<ignore-malformed,`ignore_malformed`>>::
If `true`, malformed geo-points are ignored. If `false` (default),
malformed geo-points throw an exception and reject the whole document.
<<lat-lon,`lat_lon`>>::
Should the geo-point also be indexed as `.lat` and `.lon` sub-fields?
Accepts `true` and `false` (default). See the sketch after this list.
<<precision-step,`precision_step`>>::
Controls the number of extra terms that are indexed for each lat/lon point.
Defaults to `16`. Ignored if `lat_lon` is `false`.
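As a rough sketch of how these options combine (the `location` field name is
illustrative), a mapping that also indexes the `.lat`/`.lon` sub-fields and
geohash prefixes down to roughly 1km cells might look like this:
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "lat_lon": true, <1>
          "geohash_prefix": true, <2>
          "geohash_precision": "1km"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Also index `location.lat` and `location.lon` as numeric sub-fields.
<2> Also index the geohash and all its enclosing cells, down to cells of
roughly 1km.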
==== Using geo-points in scripts
When accessing the value of a geo-point in a script, the value is returned as
a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
respectively:
[source,js]
--------------------------------------------------
geopoint = doc['location'].value;
lat = geopoint.lat;
lon = geopoint.lon;
--------------------------------------------------
For performance reasons, it is better to access the lat/lon values directly:
[source,js]
--------------------------------------------------
lat = doc['location'].lat;
lon = doc['location'].lon;
--------------------------------------------------
@ -1,7 +1,7 @@
[[mapping-geo-shape-type]] [[geo-shape]]
=== Geo Shape Type === Geo-Shape datatype
The `geo_shape` mapping type facilitates the indexing of and searching The `geo_shape` datatype facilitates the indexing of and searching
with arbitrary geo shapes such as rectangles and polygons. It should be with arbitrary geo shapes such as rectangles and polygons. It should be
used when either the data being indexed or the queries being executed used when either the data being indexed or the queries being executed
contain shapes other than just points. contain shapes other than just points.
@ -1,40 +0,0 @@
[[mapping-ip-type]]
=== IP Type
An `ip` mapping type allows you to store _IPv4_ addresses in a numeric form,
making it easy to sort and range query them (using IP values).
The following table lists all the attributes that can be used with an ip
type:
[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.
|`store` |Set to `true` to store the actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).
|`index` |Set to `no` if the value should not be indexed. In this case,
`store` should be set to `true`, since if it's not indexed and not
stored, there is nothing to do with it.
|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16`.
|`boost` |The boost value. Defaults to `1.0`.
|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|`include_in_all` |Should the field be included in the `_all` field (if
enabled). Defaults to `true` or to the parent `object` type setting.
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.
|=======================================================================
@ -0,0 +1,89 @@
[[ip]]
=== IPv4 datatype
An `ip` field is really a <<number,`long`>> field which accepts
https://en.wikipedia.org/wiki/IPv4[IPv4] addresses and indexes them as long
values:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"ip_addr": {
"type": "ip"
}
}
}
}
}
PUT my_index/my_type/1
{
"ip_addr": "192.168.1.1"
}
GET my_index/_search
{
"query": {
"range": {
"ip_addr": {
"gte": "192.168.1.0",
"lt": "192.168.2.0"
}
}
}
}
--------------------------------------------------
// AUTOSENSE
[[ip-params]]
==== Parameters for `ip` fields
The following parameters are accepted by `ip` fields:
[horizontal]
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<include-in-all,`include_in_all`>>::
Whether or not the field value should be included in the
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
<<object,`object`>> field sets `include_in_all` to `false`.
Otherwise defaults to `true`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
<<null-value,`null_value`>>::
Accepts an IPv4 value which is substituted for any explicit `null` values.
Defaults to `null`, which means the field is treated as missing.
<<precision-step,`precision_step`>>::
Controls the number of extra terms that are indexed to make
<<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
NOTE: IPv6 addresses are not supported yet.
@ -1,165 +0,0 @@
[[mapping-nested-type]]
=== Nested Type
The `nested` type works like the <<mapping-object-type,`object` type>> except
that an array of `objects` is flattened, while an array of `nested` objects
allows each object to be queried independently. To explain, consider this
document:
[source,js]
--------------------------------------------------
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
},
]
}
--------------------------------------------------
If the `user` field is of type `object`, this document would be indexed
internally something like this:
[source,js]
--------------------------------------------------
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
--------------------------------------------------
The `first` and `last` fields are flattened, and the association between
`alice` and `white` is lost. This document would incorrectly match a query
for `alice AND smith`.
If the `user` field is of type `nested`, each object is indexed as a separate
document, something like this:
[source,js]
--------------------------------------------------
{ <1>
"user.first" : "alice",
"user.last" : "white"
}
{ <1>
"user.first" : "john",
"user.last" : "smith"
}
{ <2>
"group" : "fans"
}
--------------------------------------------------
<1> Hidden nested documents.
<2> Visible ``parent'' document.
By keeping each nested object separate, the association between the
`user.first` and `user.last` fields is maintained. The query for `alice AND
smith` would *not* match this document.
Searching on nested docs can be done using the
<<query-dsl-nested-query,nested query>>.
==== Mapping
The mapping for `nested` fields is the same as `object` fields, except that it
uses type `nested`:
[source,js]
--------------------------------------------------
{
"type1" : {
"properties" : {
"user" : {
"type" : "nested",
"properties": {
"first" : {"type": "string" },
"last" : {"type": "string" }
}
}
}
}
}
--------------------------------------------------
NOTE: changing an `object` type to `nested` type requires reindexing.
You may want to index inner objects both as `nested` fields *and* as flattened
`object` fields, eg for highlighting. This can be achieved by setting
`include_in_parent` to `true`:
[source,js]
--------------------------------------------------
{
"type1" : {
"properties" : {
"user" : {
"type" : "nested",
"include_in_parent": true,
"properties": {
"first" : {"type": "string" },
"last" : {"type": "string" }
}
}
}
}
}
--------------------------------------------------
The result of indexing our example document would be something like this:
[source,js]
--------------------------------------------------
{ <1>
"user.first" : "alice",
"user.last" : "white"
}
{ <1>
"user.first" : "john",
"user.last" : "smith"
}
{ <2>
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
--------------------------------------------------
<1> Hidden nested documents.
<2> Visible ``parent'' document.
Nested fields may contain other nested fields. The `include_in_parent` object
refers to the direct parent of the field, while the `include_in_root`
parameter refers only to the topmost ``root'' object or document.
NOTE: The `include_in_parent` and `include_in_root` options do not apply
to <<mapping-geo-shape-type,`geo_shape` fields>>, which are only ever
indexed inside the nested document.
Nested docs will automatically use the root doc `_all` field only.
.Internal Implementation
*********************************************
Internally, nested objects are indexed as additional documents, but,
since they can be guaranteed to be indexed within the same "block", it
allows for extremely fast joining with parent docs.
Those internal nested documents are automatically masked away when doing
operations against the index (like searching with a match_all query),
and they bubble out when using the nested query.
Because nested docs are always masked to the parent doc, the nested docs
can never be accessed outside the scope of the `nested` query. For example
stored fields can be enabled on fields inside nested objects, but there is
no way of retrieving them, since stored fields are fetched outside of
the `nested` query scope.
The `_source` field is always associated with the parent document, which is
why field values for nested objects can still be fetched via the source.
*********************************************
@ -0,0 +1,201 @@
[[nested]]
=== Nested datatype
The `nested` type is a specialised version of the <<object,`object`>> datatype
that allows arrays of objects to be indexed and queried independently of each
other.
==== How arrays of objects are flattened
Arrays of inner <<object,`object` fields>> do not work the way you may expect.
Lucene has no concept of inner objects, so Elasticsearch flattens object
hierarchies into a simple list of field names and values. For instance, the
following document:
[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
"group" : "fans",
"user" : [ <1>
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
--------------------------------------------------
// AUTOSENSE
<1> The `user` field is dynamically added as a field of type `object`.
would be transformed internally into a document that looks more like this:
[source,js]
--------------------------------------------------
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
--------------------------------------------------
The `user.first` and `user.last` fields are flattened into multi-value fields,
and the association between `alice` and `white` is lost. This document would
incorrectly match a query for `alice AND smith`:
[source,js]
--------------------------------------------------
GET my_index/_search
{
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "White" }}
]
}
}
}
--------------------------------------------------
// AUTOSENSE
==== Using `nested` fields for arrays of objects
If you need to index arrays of objects and to maintain the independence of
each object in the array, you should use the `nested` datatype instead of the
<<object,`object`>> datatype. Internally, nested objects index each object in
the array as a separate hidden document, meaning that each nested object can be
queried independently of the others, with the <<query-dsl-nested-query,`nested` query>>:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"user": {
"type": "nested" <1>
}
}
}
}
}
PUT my_index/my_type/1
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
GET my_index/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "White" }} <2>
]
}
}
}
}
}
GET my_index/_search
{
"query": {
"nested": {
"path": "user",
"query": {
"bool": {
"must": [
{ "match": { "user.first": "Alice" }},
{ "match": { "user.last": "Smith" }} <3>
]
}
},
"inner_hits": { <4>
"highlight": {
"fields": {
"user.first": {}
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `user` field is mapped as type `nested` instead of type `object`.
<2> This query doesn't match because `Alice` and `White` are not in the same nested object.
<3> This query matches because `Alice` and `White` are in the same nested object.
<4> `inner_hits` allows us to highlight the matching nested documents.
Nested documents can be:
* queried with the <<query-dsl-nested-query,`nested`>> query.
* analyzed with the <<search-aggregations-bucket-nested-aggregation,`nested`>>
and <<search-aggregations-bucket-reverse-nested-aggregation, `reverse_nested`>>
aggregations.
* sorted with <<nested-sorting,nested sorting>>.
* retrieved and highlighted with <<nested-inner-hits,nested inner hits>>.
[[nested-params]]
==== Parameters for `nested` fields
The following parameters are accepted by `nested` fields:
[horizontal]
<<dynamic,`dynamic`>>::
Whether or not new `properties` should be added dynamically to an existing
nested object. Accepts `true` (default), `false` and `strict`.
<<include-in-all,`include_in_all`>>::
Sets the default `include_in_all` value for all the `properties` within
the nested object. Nested documents do not have their own `_all` field.
Instead, values are added to the `_all` field of the main ``root''
document.
<<properties,`properties`>>::
The fields within the nested object, which can be of any
<<mapping-types,datatype>>, including `nested`. New properties
may be added to an existing nested object.
[IMPORTANT]
=============================================
Because nested documents are indexed as separate documents, they can only be
accessed within the scope of the `nested` query, the
`nested`/`reverse_nested` aggregations, or <<nested-inner-hits,nested inner hits>>.
For instance, if a string field within a nested document has
<<index-options,`index_options`>> set to `offsets` to allow use of the postings
highlighter, these offsets will not be available during the main highlighting
phase. Instead, highlighting needs to be performed via
<<nested-inner-hits,nested inner hits>>.
=============================================
@ -0,0 +1,93 @@
[[number]]
=== Numeric datatypes
The following numeric types are supported:
[horizontal]
`long`:: A signed 64-bit integer with a minimum value of +-2^63^+ and a maximum value of +2^63^-1+.
`integer`:: A signed 32-bit integer with a minimum value of +-2^31^+ and a maximum value of +2^31^-1+.
`short`:: A signed 16-bit integer with a minimum value of +-32,768+ and a maximum value of +32,767+.
`byte`:: A signed 8-bit integer with a minimum value of +-128+ and a maximum value of +127+.
`double`:: A double-precision 64-bit IEEE 754 floating point.
`float`:: A single-precision 32-bit IEEE 754 floating point.
Below is an example of configuring a mapping with numeric fields:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"number_of_bytes": {
"type": "integer"
},
"time_in_seconds": {
"type": "float"
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
[[number-params]]
==== Parameters for numeric fields
The following parameters are accepted by numeric types:
[horizontal]
<<coerce,`coerce`>>::
Try to convert strings to numbers and truncate fractions for integers.
Accepts `true` (default) and `false`. See the sketch after this list.
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<ignore-malformed,`ignore_malformed`>>::
If `true`, malformed numbers are ignored. If `false` (default), malformed
numbers throw an exception and reject the whole document.
<<include-in-all,`include_in_all`>>::
Whether or not the field value should be included in the
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
<<object,`object`>> field sets `include_in_all` to `false`.
Otherwise defaults to `true`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
<<null-value,`null_value`>>::
Accepts a numeric value of the same `type` as the field which is
substituted for any explicit `null` values. Defaults to `null`, which
means the field is treated as missing.
<<precision-step,`precision_step`>>::
Controls the number of extra terms that are indexed to make
<<query-dsl-range-query,`range` queries>> faster. The default depends on the
numeric `type`.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
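To illustrate `coerce`, here is a sketch (the field names are made up)
comparing a default field with one where coercion has been disabled:
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "lenient_count": { "type": "integer" }, <1>
        "strict_count":  { "type": "integer", "coerce": false } <2>
      }
    }
  }
}
PUT my_index/my_type/1
{ "lenient_count": "10" } <3>
PUT my_index/my_type/2
{ "strict_count": "10" } <4>
--------------------------------------------------
// AUTOSENSE
<1> `coerce` defaults to `true`.
<2> Coercion is disabled for this field.
<3> The string `"10"` is coerced to the integer `10`.
<4> This document is rejected, because the value is a string.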
@ -1,179 +0,0 @@
[[mapping-object-type]]
=== Object Type
JSON documents are hierarchical in nature, allowing them to define inner
"objects" within the actual JSON. Elasticsearch completely understands
the nature of these inner objects and can map them easily, providing
query support for their inner fields. Because each document can have
objects with different fields each time, objects mapped this way are
known as "dynamic". Dynamic mapping is enabled by default. Let's take
the following JSON as an example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"person" : {
"name" : {
"first_name" : "Shay",
"last_name" : "Banon"
},
"sid" : "12345"
},
"message" : "This is a tweet!"
}
}
--------------------------------------------------
The above shows an example where a tweet includes the actual `person`
details. A `person` is an object, with a `sid`, and a `name` object
which has `first_name` and `last_name`. It's important to note that
`tweet` is also an object, although it is a special
<<mapping-root-object-type,root object type>>
which allows for additional mapping definitions.
The following is an example of explicit mapping for the above JSON:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"person" : {
"type" : "object",
"properties" : {
"name" : {
"type" : "object",
"properties" : {
"first_name" : {"type" : "string"},
"last_name" : {"type" : "string"}
}
},
"sid" : {"type" : "string", "index" : "not_analyzed"}
}
},
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
In order to mark a mapping of type `object`, set the `type` to `object`.
This is an optional step, since if there are `properties` defined for
it, it will automatically be identified as an `object` mapping.
[float]
==== properties
An object mapping can optionally define one or more properties using the
`properties` tag for a field. Each property can be either another
`object`, or one of the
<<mapping-core-types,core_types>>.
[float]
==== dynamic
One of the most important features of Elasticsearch is its ability to be
schema-less. This means that, in our example above, the `person` object
can be indexed later with a new property -- `age`, for example -- and it
will automatically be added to the mapping definitions. Same goes for
the `tweet` root object.
This feature is by default turned on, and it's the `dynamic` nature of
each object mapped. Each object mapped is automatically dynamic, though
it can be explicitly turned off:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"person" : {
"type" : "object",
"properties" : {
"name" : {
"dynamic" : false,
"properties" : {
"first_name" : {"type" : "string"},
"last_name" : {"type" : "string"}
}
},
"sid" : {"type" : "string", "index" : "not_analyzed"}
}
},
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
In the above example, the `name` object mapped is not dynamic, meaning
that if, in the future, we try to index JSON with a `middle_name` within
the `name` object, it will get discarded and not added.
There is no performance overhead if an `object` is dynamic; the ability
to turn it off is provided as a safety mechanism so that "malformed" objects
won't, by mistake, index data that we do not wish to be indexed.
If a dynamic object contains yet another inner `object`, it will be
automatically added to the index and mapped as well.
When processing dynamic new fields, their type is automatically derived.
For example, if it is a `number`, it will automatically be treated as a
number <<mapping-core-types,core_type>>. Dynamic
fields default to their default attributes, for example, they are not
stored and they are always indexed.
Date fields are special since they are represented as a `string`. Date
fields are detected if they can be parsed as a date when they are first
introduced into the system. The set of date formats that are tested
against can be configured using the `dynamic_date_formats` on the root object,
which is explained later.
Note, once a field has been added, *its type cannot change*. For
example, if we added `age` and its value is a number, then it can't be
treated as a string.
The `dynamic` parameter can also be set to `strict`, meaning that not
only will new fields not be introduced into the mapping, but also that parsing
(indexing) docs with such new fields will fail.
[float]
==== enabled
The `enabled` flag allows you to disable parsing and indexing of a named object
completely. This is handy when a portion of the JSON document contains
arbitrary JSON which should not be indexed, nor added to the mapping.
For example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"properties" : {
"person" : {
"type" : "object",
"properties" : {
"name" : {
"type" : "object",
"enabled" : false
},
"sid" : {"type" : "string", "index" : "not_analyzed"}
}
},
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
In the above, `name` and its content will not be indexed at all.
[float]
==== include_in_all
`include_in_all` can be set on the `object` type level. When set, it
propagates down to all the inner mappings defined within the `object`
that do not explicitly set it.
@ -0,0 +1,105 @@
[[object]]
=== Object datatype
JSON documents are hierarchical in nature: the document may contain inner
objects which, in turn, may contain inner objects themselves:
[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{ <1>
"region": "US",
"manager": { <2>
"age": 30,
"name": { <3>
"first": "John",
"last": "Smith"
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The outer document is also a JSON object.
<2> It contains an inner object called `manager`.
<3> Which in turn contains an inner object called `name`.
Internally, this document is indexed as a simple, flat list of key-value
pairs, something like this:
[source,js]
--------------------------------------------------
{
"region": "US",
"manager.age": 30,
"manager.name.first": "John",
"manager.name.last": "Smith"
}
--------------------------------------------------
An explicit mapping for the above document could look like this:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": { <1>
"properties": {
"region": {
"type": "string",
"index": "not_analyzed"
},
"manager": { <2>
"properties": {
"age": { "type": "integer" },
"name": { <3>
"properties": {
"first": { "type": "string" },
"last": { "type": "string" }
}
}
}
}
}
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The mapping type is a type of object, and has a `properties` field.
<2> The `manager` field is an inner `object` field.
<3> The `manager.name` field is an inner `object` field within the `manager` field.
You are not required to set the field `type` to `object` explicitly, as this is the default value.
[[object-params]]
==== Parameters for `object` fields
The following parameters are accepted by `object` fields:
[horizontal]
<<dynamic,`dynamic`>>::
Whether or not new `properties` should be added dynamically
to an existing object. Accepts `true` (default), `false`
and `strict`.
<<enabled,`enabled`>>::
Whether the JSON value given for the object field should be
parsed and indexed (`true`, default) or completely ignored (`false`).
See the sketch after this list.
<<include-in-all,`include_in_all`>>::
Sets the default `include_in_all` value for all the `properties` within
the object. The object itself is not added to the `_all` field.
<<properties,`properties`>>::
The fields within the object, which can be of any
<<mapping-types,datatype>>, including `object`. New properties
may be added to an existing object.
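For instance, here is a sketch of an object field whose arbitrary contents
should be neither parsed nor indexed (the `metadata` field name is
hypothetical):
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "metadata": {
          "type": "object",
          "enabled": false <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `metadata` object is kept in the `_source` field but is not parsed or
indexed.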
IMPORTANT: If you need to index arrays of objects instead of single objects,
read <<nested>> first.
@ -1,190 +0,0 @@
[[mapping-root-object-type]]
=== Root Object Type
The root object mapping is an <<mapping-object-type,object type mapping>> that
maps the root object (the type itself). It supports all of the different
mappings that can be set using the <<mapping-object-type,object type mapping>>.
The root object mapping allows you to index a JSON document that only contains its
fields. For example, the following `tweet` JSON can be indexed without
specifying the `tweet` type in the document itself:
[source,js]
--------------------------------------------------
{
"message" : "This is a tweet!"
}
--------------------------------------------------
[float]
==== dynamic_date_formats
`dynamic_date_formats` (the old setting name, `date_formats`, still works)
lets you set one or more date formats that will be used to
detect `date` fields. For example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
"properties" : {
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
In the above mapping, if a new JSON field of type string is detected,
the date formats specified will be used to check if it's a date.
If it passes parsing, then the field will be declared with `date` type,
and will use the matching format as its format attribute. The date
format itself is explained
<<mapping-date-format,here>>.
The default formats are: `strictDateOptionalTime` (ISO) and
`yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z` and `epoch_millis`.
*Note:* `dynamic_date_formats` are used *only* for dynamically added
date fields, not for `date` fields that you specify in your mapping.
[float]
==== date_detection
Allows you to disable automatic date type detection (if a new field is introduced
and matches the provided format), for example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"date_detection" : false,
"properties" : {
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
[float]
==== numeric_detection
Sometimes, even though JSON has support for native numeric types,
numeric values are still provided as strings. To try to
automatically detect numeric values from strings, `numeric_detection`
can be set to `true`. For example:
[source,js]
--------------------------------------------------
{
"tweet" : {
"numeric_detection" : true,
"properties" : {
"message" : {"type" : "string"}
}
}
}
--------------------------------------------------
[float]
==== dynamic_templates
Dynamic templates allow you to define mapping templates that will be applied
when fields or objects are introduced dynamically.
IMPORTANT: Dynamic field mappings are only added when a field contains
a concrete value -- not `null` or an empty array. This means that if the `null_value` option
is used in a `dynamic_template`, it will only be applied after the first document
with a concrete value for the field has been indexed.
For example, we might want to have all fields stored by default,
or all `string` fields stored, or `string` fields always
indexed with the multi fields syntax, once analyzed and once not_analyzed.
Here is a simple example:
[source,js]
--------------------------------------------------
{
"person" : {
"dynamic_templates" : [
{
"template_1" : {
"match" : "multi*",
"mapping" : {
"type" : "{dynamic_type}",
"index" : "analyzed",
"fields" : {
"org" : {"type": "{dynamic_type}", "index" : "not_analyzed"}
}
}
}
},
{
"template_2" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
]
}
}
--------------------------------------------------
The above mapping will create a field with multi fields for all field
names starting with `multi`, and will map all `string` types to be
`not_analyzed`.
Dynamic templates are named to allow for simple merge behavior. A new
mapping, containing just a new template, can be "put", and that template will
be added; if it has the same name as an existing template, it will be replaced.
The `match` option allows you to match on the field name. An `unmatch`
option is also available to exclude fields that would otherwise match.
The `match_mapping_type` option controls whether this template is applied only
to dynamic fields of the specified type (as guessed by the JSON
format).
Another option is `path_match`, which allows you to match the dynamic
template against the "full" dot notation name of the field (for example
`obj1.*.value` or `obj1.obj2.*`), with the corresponding `path_unmatch`.
All matching uses a simple wildcard format: `*` acts as a wildcard,
supporting simple patterns such as `xxx*`, `*xxx`, and `xxx*yyy`
(with an arbitrary number of pattern elements), as well as direct equality.
The `match_pattern` option can be set to `regex` to allow for regular
expression based matching.
The `mapping` element provides the actual mapping definition. The
`{name}` keyword can be used and will be replaced with the actual
dynamic field name being introduced. The `{dynamic_type}` (or
`{dynamicType}`) can be used and will be replaced with the mapping
derived based on the field type (or the derived type, like `date`).
Complete generic settings can also be applied, for example, to have all
mappings be stored, just set:
[source,js]
--------------------------------------------------
{
"person" : {
"dynamic_templates" : [
{
"store_generic" : {
"match" : "*",
"mapping" : {
"store" : true
}
}
}
]
}
}
--------------------------------------------------
Such generic templates should be placed at the end of the
`dynamic_templates` list because when two or more dynamic templates
match a field, only the first matching one from the list is used.
@ -0,0 +1,170 @@
[[string]]
=== String datatype
Fields of type `string` accept text values. Strings may be sub-divided into:
Full text::
+
--
Full text values, like the body of an email, are typically used for text based
relevance searches, such as: _Find the most relevant documents that match a
query for "quick brown fox"_.
These fields are `analyzed`, that is, they are passed through an
<<analysis,analyzer>> to convert the string into a list of individual terms
before being indexed. The analysis process allows Elasticsearch to search for
individual words _within_ each full text field. Full text fields are not
used for sorting and seldom used for aggregations (although the
<<search-aggregations-bucket-significantterms-aggregation,significant terms aggregation>> is a notable exception).
--
Keywords::
Keywords are exact values like email addresses, hostnames, status codes, or
tags. They are typically used for filtering (_Find me all blog posts where
++status++ is ++published++_), for sorting, and for aggregations. Keyword
fields are `not_analyzed`. Instead, the exact string value is added to the
index as a single term.
Below is an example of a mapping for a full text (`analyzed`) and a keyword
(`not_analyzed`) string field:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"full_name": { <1>
"type": "string"
},
"status": {
"type": "string", <2>
"index": "not_analyzed"
}
}
}
}
}
--------------------------------
// AUTOSENSE
<1> The `full_name` field is an `analyzed` full text field -- `index:analyzed` is the default.
<2> The `status` field is a `not_analyzed` keyword field.
Sometimes it is useful to have both a full text (`analyzed`) and a keyword
(`not_analyzed`) version of the same field: one for full text search and the
other for aggregations and sorting. This can be achieved with
<<multi-fields,multi-fields>>.
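A minimal sketch of such a mapping (the `city` field name and the `raw`
sub-field name are illustrative):
[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "string", <1>
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed" <2>
            }
          }
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE
<1> The `city` field is `analyzed` and can be used for full text search.
<2> The `city.raw` sub-field is `not_analyzed` and can be used for sorting
and aggregations.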
[[string-params]]
==== Parameters for string fields
The following parameters are accepted by `string` fields:
[horizontal]
<<analyzer,`analyzer`>>::
The <<analysis,analyzer>> which should be used for
<<mapping-index,`analyzed`>> string fields, both at index-time
and at search-time (unless overridden by the <<search-analyzer,`search_analyzer`>>).
Defaults to the default index analyzer, or the
<<analysis-standard-analyzer,`standard` analyzer>>.
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field use on-disk index-time doc values for sorting, aggregations,
or scripting? Accepts `true` or `false`. Defaults to `true` for
`not_analyzed` fields. Analyzed fields do not support doc values.
<<fielddata,`fielddata`>>::
Can the field use in-memory fielddata for sorting, aggregations,
or scripting? Accepts `disabled` or `paged_bytes` (default).
Not analyzed fields will use <<doc-values,doc values>> in preference
to fielddata.
<<multi-fields,`fields`>>::
Multi-fields allow the same string value to be indexed in multiple ways for
different purposes, such as one field for search and a multi-field for
sorting and aggregations, or the same string value analyzed by different
analyzers.
<<ignore-above,`ignore_above`>>::
Do not index or analyze any string longer than this value. Defaults to `0` (disabled).
<<include-in-all,`include_in_all`>>::
Whether or not the field value should be included in the
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
<<object,`object`>> field sets `include_in_all` to `false`.
Otherwise defaults to `true`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `analyzed` (default, treat as full-text field),
`not_analyzed` (treat as keyword field) and `no`.
<<index-options,`index_options`>>::
What information should be stored in the index, for search and highlighting purposes.
Defaults to `positions` for <<mapping-index,`analyzed`>> fields, and to `docs` for
`not_analyzed` fields.
<<norms,`norms`>>::
+
--
Whether field-length should be taken into account when scoring queries.
Defaults depend on the <<mapping-index,`index`>> setting:
* `analyzed` fields default to `{ "enabled": true, "loading": "lazy" }`.
* `not_analyzed` fields default to `{ "enabled": false }`.
--
<<null-value,`null_value`>>::
Accepts a string value which is substituted for any explicit `null`
values. Defaults to `null`, which means the field is treated as missing.
If the field is `analyzed`, the `null_value` will also be analyzed.
<<position-offset-gap,`position_offset_gap`>>::
The number of fake term positions which should be inserted between
each element of an array of strings. Defaults to 0.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
<<search-analyzer,`search_analyzer`>>::
The <<analyzer,`analyzer`>> that should be used at search time on
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.
<<similarity,`similarity`>>::
Which scoring algorithm or _similarity_ should be used. Defaults
to `default`, which uses TF/IDF.
<<term-vector,`term_vector`>>::
Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
field. Defaults to `no`.
@ -0,0 +1,107 @@
[[token-count]]
=== Token count datatype
A field of type `token_count` is really an <<number,`integer`>> field which
accepts string values, analyzes them, then indexes the number of tokens in the
string.
For instance:
[source,js]
--------------------------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"name": { <1>
"type": "string",
"fields": {
"length": { <2>
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
PUT my_index/my_type/1
{ "name": "John Smith" }
PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }
GET my_index/_search
{
"query": {
"term": {
"name.length": 3 <3>
}
}
}
--------------------------------------------------
// AUTOSENSE
<1> The `name` field is an analyzed string field which uses the default `standard` analyzer.
<2> The `name.length` field is a `token_count` <<multi-fields,multi-field>> which will index the number of tokens in the `name` field.
<3> This query matches only the document containing `Rachel Alice Williams`, as it contains three tokens.
[NOTE]
===================================================================
Technically the `token_count` type sums position increments rather than
counting tokens. This means that even if the analyzer filters out stop
words they are included in the count.
===================================================================
[[token-count-params]]
==== Parameters for `token_count` fields
The following parameters are accepted by `token_count` fields:
[horizontal]
<<analyzer,`analyzer`>>::
The <<analysis,analyzer>> which should be used to analyze the string
value. Required. For best performance, use an analyzer without token
filters.
<<index-boost,`boost`>>::
Field-level index time boosting. Accepts a floating point number, defaults
to `1.0`.
<<doc-values,`doc_values`>>::
Can the field value be used for sorting, aggregations, or scripting?
Accepts `true` (default) or `false`.
<<mapping-index,`index`>>::
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
<<include-in-all,`include_in_all`>>::
Whether or not the field value should be included in the
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
to `false`. Note: if `true`, it is the string value that is added to `_all`,
not the calculated token count.
<<null-value,`null_value`>>::
Accepts a numeric value of the same `type` as the field which is
substituted for any explicit `null` values. Defaults to `null`, which
means the field is treated as missing.
<<precision-step,`precision_step`>>::
Controls the number of extra terms that are indexed to make
<<query-dsl-range-query,`range` queries>> faster. Defaults to `32`.
<<mapping-store,`store`>>::
Whether the field value should be stored and retrievable separately from
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
(default).
@ -71,7 +71,7 @@ Field statistics can be accessed with a subscript operator like this:
Field statistics are computed per shard and therefore these numbers can vary
depending on the shard the current document resides in.
The number of terms in a field cannot be accessed using the `_index` variable. See <<token-count>> for how to do that.
[float]
==== Term statistics:
@ -80,7 +80,7 @@ Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
If you do not need the term frequency, call `_index['FIELD'].get('TERM', 0)`
to avoid unnecessary initialization of the frequencies. The flag will only
have an effect if you set the <<index-options,`index_options`>> to `docs`.
`_index['FIELD']['TERM'].df()`::
@ -176,7 +176,7 @@ return score;
[float]
==== Term vectors:
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <<term-vector>>). To access them, call
`_index.termVectors()` to get a
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields]
instance. This object can then be used as described in https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[lucene doc] to iterate over fields and then for each field iterate over each term in the field.
@ -284,7 +284,6 @@ supported operations are:
|=======================================================================
|Value |Description
| `aggs` |Aggregations (wherever they may be used)
| `mapping` |Mappings (script transform feature)
| `search` |Search api, Percolator api and Suggester api (e.g filters, script_fields)
| `update` |Update api
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category
@ -44,7 +44,7 @@ These documents would *not* match the above query:
[float]
===== `null_value` mapping
If the field mapping includes the <<null-value,`null_value`>> setting
then explicit `null` values are replaced with the specified `null_value`. For
instance, if the `user` field were mapped as follows:
@ -254,7 +254,7 @@ decay function is specified as
<1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`.
<2> The specified field must be a numeric, date, or geo-point field.
In the above example, the field is a <<geo-point,`geo_point`>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:
[source,js]
@ -268,7 +268,7 @@ In the above example, the field is a <<mapping-geo-point-type>> and origin can b
}
}
--------------------------------------------------
<1> The date format of the origin depends on the <<mapping-date-format,`format`>> defined in
your mapping. If you do not define the origin, the current time is used.
<2> The `offset` and `decay` parameters are optional.
@ -112,7 +112,6 @@ Format in `lat,lon`.
[float]
==== geo_point Type
The query *requires* the <<geo-point,`geo_point`>> type to be set on the
relevant field.
@ -2,8 +2,8 @@
== Geo queries
Elasticsearch supports two types of geo data:
<<geo-point,`geo_point`>> fields which support lat/lon pairs, and
<<geo-shape,`geo_shape`>> fields, which support points,
lines, circles, polygons, multi-polygons etc.
The queries in this group are:
@ -3,7 +3,7 @@
Filter documents indexed using the `geo_shape` type.
Requires the <<geo-shape,`geo_shape` Mapping>>.
The `geo_shape` query uses the same grid square representation as the
geo_shape mapping to find documents that have a shape that intersects

View File

@ -2,13 +2,13 @@
=== Geohash Cell Query

The `geohash_cell` query provides access to a hierarchy of geohashes.
By defining a geohash cell, only <<geo-point,geopoints>>
within this cell will match this filter.

To get this filter to work, all prefixes of a geohash need to be indexed. For
example, a geohash `u30` needs to be decomposed into three terms: `u30`,
`u3` and `u`. This decomposition must be enabled in the mapping of the
<<geo-point,geopoint>> field that's going to be filtered by
setting the `geohash_prefix` option in the field mapping.
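A sketch of such a mapping, assuming an illustrative `geo_point` field called `pin`:

[source,js]
--------------------------------------------------
{
    "properties" : {
        "pin" : {
            "type" : "geo_point",
            "geohash_prefix" : true, <1>
            "geohash_precision" : 10
        }
    }
}
--------------------------------------------------
<1> `geohash_prefix` indexes all prefixes of each geohash; the `pin` field name and the precision value are illustrative.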

View File

@ -7,7 +7,7 @@ which are designed to scale horizontally.
<<query-dsl-nested-query,`nested` query>>::

Documents may contain fields of type <<nested,`nested`>>. These
fields are used to index arrays of objects, where each object can be queried
(with the `nested` query) as an independent document.

View File

@ -44,7 +44,7 @@ These documents would *not* match the above filter:
[float]
==== `null_value` mapping

If the field mapping includes a <<null-value,`null_value`>> then explicit `null` values
are replaced with the specified `null_value`. For instance, if the `user` field were mapped
as follows:

View File

@ -2,7 +2,7 @@
=== Nested Query

The `nested` query allows you to query nested objects / docs (see
<<nested,nested mapping>>). The
query is executed against the nested objects / docs as if they were
indexed as separate docs (they are, internally), with matches joined back to the
root parent doc (or parent nested mapping). Here is a sample mapping we

@ -48,7 +48,7 @@ And here is a sample nested query usage:

The query `path` points to the nested object path, and the `query` (or
`filter`) includes the query that will run on the nested docs matching
the direct path, and joining with the root parent docs. Note that any
fields referenced inside the query must use the complete path (fully
qualified).
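For instance, a sketch with an illustrative nested field called `obj1` — note the fully qualified `obj1.name`:

[source,js]
--------------------------------------------------
{
    "nested" : {
        "path" : "obj1", <1>
        "query" : {
            "match" : { "obj1.name" : "blue" }
        }
    }
}
--------------------------------------------------
<1> `obj1` is an illustrative nested field; fields inside the inner query use the full `obj1.name` path.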
The `score_mode` allows you to set how matching inner children affect

View File

@ -29,33 +29,60 @@ The `range` query accepts the following parameters:
`lt`::      Less-than
`boost`::   Sets the boost value of the query, defaults to `1.0`
[float]
[[ranges-on-dates]]
==== Ranges on date fields

When running `range` queries on fields of type <<date,`date`>>, ranges can be
specified using <<date-math>>:

[source,js]
--------------------------------------------------
{
    "range" : {
        "date" : {
            "gte" : "now-1d/d",
            "lt"  : "now/d"
        }
    }
}
--------------------------------------------------

===== Date math and rounding

When using <<date-math,date math>> to round dates to the nearest day, month,
hour, etc., the rounded dates depend on whether the ends of the ranges are
inclusive or exclusive.

Rounding up moves to the last millisecond of the rounding scope, and rounding
down to the first millisecond of the rounding scope. For example:

[horizontal]
`gt`::

  Greater than the date rounded up: `2014-11-18||/M` becomes
  `2014-11-30T23:59:59.999`, ie excluding the entire month.

`gte`::

  Greater than or equal to the date rounded down: `2014-11-18||/M` becomes
  `2014-11-01`, ie including the entire month.

`lt`::

  Less than the date rounded down: `2014-11-18||/M` becomes `2014-11-01`, ie
  excluding the entire month.

`lte`::

  Less than or equal to the date rounded up: `2014-11-18||/M` becomes
  `2014-11-30T23:59:59.999`, ie including the entire month.

===== Date format in range queries

Formatted dates will be parsed using the <<mapping-date-format,`format`>>
specified on the <<date,`date`>> field by default, but it can be overridden by
passing the `format` parameter to the `range` query:
[source,js]
--------------------------------------------------
{
    "range" : {
        "born" : {
            "gte": "01/01/2012",
            "lte": "2013",
            "format": "dd/MM/yyyy||yyyy"
        }
    }
}
--------------------------------------------------

@ -69,3 +96,25 @@ The `format` parameter will help support another date format than the one defined
===== Time zone in range queries

Dates can be converted from another timezone to UTC either by specifying the
time zone in the date value itself (if the <<mapping-date-format,`format`>>
accepts it), or it can be specified as the `time_zone` parameter:

[source,js]
--------------------------------------------------
{
    "range" : {
        "timestamp" : {
            "gte": "2015-01-01 00:00:00", <1>
            "lte": "now",
            "time_zone": "+01:00"
        }
    }
}
--------------------------------------------------
<1> This date will be converted to `2014-12-31T23:00:00 UTC`.

View File

@ -3,7 +3,7 @@
experimental[]

The <<mapping-parent-field, parent/child>> and <<nested, nested>> features allow the return of documents that
have matches in a different scope. In the parent/child case, parent documents are returned based on matches in child
documents or child documents are returned based on matches in parent documents. In the nested case, documents are returned
based on matches in nested inner objects.
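As a sketch, assuming a hypothetical nested field called `comments`, nested inner hits can be requested from within a `nested` query:

[source,js]
--------------------------------------------------
{
    "query" : {
        "nested" : {
            "path" : "comments", <1>
            "query" : { "match" : { "comments.text" : "elasticsearch" } },
            "inner_hits" : {}
        }
    }
}
--------------------------------------------------
<1> `comments` and `comments.text` are hypothetical names; the empty `inner_hits` object asks for the matching nested docs with default settings.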

View File

@ -71,6 +71,7 @@ curl -XPOST 'localhost:9200/_search' -d '{
}'
--------------------------------------------------

[[nested-sorting]]
==== Sorting within nested objects

Elasticsearch also supports sorting by
@ -166,6 +167,7 @@ If any of the indices that are queried doesn't have a mapping for `price`
then Elasticsearch will handle it as if there was a mapping of type
`long`, with all documents in this index having no value for this field.
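A sketch of how this can be requested with the `unmapped_type` sort option; the field names and query are illustrative:

[source,js]
--------------------------------------------------
{
    "sort" : [
        { "price" : { "unmapped_type" : "long" } } <1>
    ],
    "query" : {
        "term" : { "product" : "chocolate" }
    }
}
--------------------------------------------------
<1> `unmapped_type` tells Elasticsearch which type to assume for indices where `price` is unmapped.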
[[geo-sorting]]
==== Geo Distance Sorting

Allows sorting by `_geo_distance`. Here is an example:
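A sketch, assuming an illustrative `geo_point` field called `pin.location`:

[source,js]
--------------------------------------------------
{
    "sort" : [
        {
            "_geo_distance" : {
                "pin.location" : [-70, 40], <1>
                "order" : "asc",
                "unit" : "km"
            }
        }
    ],
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
--------------------------------------------------
<1> `pin.location` must be mapped as `geo_point`; the field name, the `[lon, lat]` coordinates, and the query are illustrative.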