Docs: Mapping docs completely rewritten for 2.0
parent 40cd460647
commit ac2b8951c6
@ -14,7 +14,7 @@ docs/html/
docs/build.log
/tmp/
backwards/

html_docs
## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
## All files (.project, .classpath, .settings/*) should be generated through Maven which
## will correctly set the classpath based on the declared dependencies and write settings
@ -53,7 +53,7 @@ Response:
}
--------------------------------------------------

The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). It can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the `geo_point` <<mapping-geo-point-type,type>>:
The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). It can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the <<geo-point,`geo_point` type>>:

* Object format: `{ "lat" : 52.3760, "lon" : 4.894 }` - this is the safest format as it is the most explicit about the `lat` & `lon` values
* String format: `"52.3760, 4.894"` - where the first number is the `lat` and the second is the `lon`
@ -200,7 +200,7 @@ and therefore can't be used in the `order` option of the `terms` aggregator.
If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned.
Nested hits are in a sense hidden mini documents that are part of a regular document where, in the mapping, a nested field type
has been configured. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested`
or `reverse_nested` aggregator. Read more about nested in the <<mapping-nested-type,nested type mapping>>.
or `reverse_nested` aggregator. Read more about nested in the <<nested,nested type mapping>>.

If a nested type has been configured, a single document is actually indexed as multiple Lucene documents which share
the same id. In order to determine the identity of a nested hit, more is needed than just the id, so that is why
@ -152,6 +152,33 @@ being consumed by a monitoring tool, rather than intended for human
consumption. The default for the `human` flag is
`false`.

[[date-math]]
=== Date Math

Most parameters which accept a formatted date value -- such as `gt` and `lt`
in <<query-dsl-range-query,`range` queries>>, or `from` and `to`
in <<search-aggregations-bucket-daterange-aggregation,`daterange`
aggregations>> -- understand date maths.

The expression starts with an anchor date, which can either be `now`, or a
date string ending with `||`. This anchor date can optionally be followed by
one or more maths expressions:

* `+1h` - add one hour
* `-1d` - subtract one day
* `/d` - round down to the nearest day

The supported <<time-units,time units>> are: `y` (year), `M` (month), `w` (week),
`d` (day), `h` (hour), `m` (minute), and `s` (second).

Some examples are:

[horizontal]
`now+1h`:: The current time plus one hour, with ms resolution.
`now+1h+1m`:: The current time plus one hour plus one minute, with ms resolution.
`now+1h/d`:: The current time plus one hour, rounded down to the nearest day.
`2015-01-01||+1M/d`:: `2015-01-01` plus one month, rounded down to the nearest day.
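
For instance -- as a minimal sketch, where the `my_index` index and `created`
date field are assumptions -- date maths can be plugged straight into the
`gte` and `lt` parameters of a `range` query:

[source,js]
--------------------------------------------------
GET /my_index/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "now-1d/d", <1>
        "lt":  "now/d"     <2>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> From the start of yesterday...
<2> ...up to, but not including, the start of today.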

[float]
=== Response Filtering

@ -237,10 +264,10 @@ curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
--------------------------------------------------

Note that elasticsearch sometimes returns directly the raw value of a field,
like the `_source` field. If you want to filter _source fields, you should
like the `_source` field. If you want to filter `_source` fields, you should
consider combining the already existing `_source` parameter (see
<<get-source-filtering,Get API>> for more details) with the `filter_path`
parameter like this:

[source,sh]
--------------------------------------------------
@ -318,8 +345,9 @@ of supporting the native JSON number types.
[float]
=== Time units

Whenever durations need to be specified, eg for a `timeout` parameter, the duration
can be specified as a whole number representing time in milliseconds, or as a time value like `2d` for 2 days. The supported units are:
Whenever durations need to be specified, eg for a `timeout` parameter, the
duration must specify the unit, like `2d` for 2 days. The supported units
are:

[horizontal]
`y`:: Year
@ -329,6 +357,7 @@ can be specified as a whole number representing time in milliseconds, or as a ti
`h`:: Hour
`m`:: Minute
`s`:: Second
`ms`:: Milli-second
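
As a small usage sketch (the `my_index` index is an assumption), a duration
with an explicit unit can be passed wherever a `timeout` parameter is
accepted:

[source,sh]
--------------------------------------------------
curl 'localhost:9200/my_index/_search?timeout=2s&q=user:kimchy'
--------------------------------------------------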

[[distance-units]]
[float]
@ -6,53 +6,3 @@ added to an index either when creating it or by using the put mapping
api. It also handles the dynamic mapping support for types that have no
explicit mappings predefined. For more information about mapping
definitions, check out the <<mapping,mapping section>>.

[float]
=== Dynamic Mappings

New types and new fields within types can be added dynamically just
by indexing a document. When Elasticsearch encounters a new type,
it creates the type using the `_default_` mapping (see below).

When it encounters a new field within a type, it autodetects the
datatype that the field contains and adds it to the type mapping
automatically.

See <<mapping-dynamic-mapping>> for details of how to control and
configure dynamic mapping.

[float]
=== Default Mapping

When a new type is created (at <<indices-create-index,index creation>> time,
using the <<indices-put-mapping,`put-mapping` API>> or just by indexing a
document into it), the type uses the `_default_` mapping as its basis. Any
mapping specified in the <<indices-create-index,`create-index`>> or
<<indices-put-mapping,`put-mapping`>> request overrides values set in the
`_default_` mapping.

The default mapping definition is a plain mapping definition that is
embedded within Elasticsearch:

[source,js]
--------------------------------------------------
{
  "_default_": {
  }
}
--------------------------------------------------

Pretty short, isn't it? Basically, everything is `_default_`ed, including the
dynamic nature of the root object mapping which allows new fields to be added
automatically.

The default mapping can be overridden by specifying the `_default_` type when
creating a new index.

[float]
=== Mapper settings

`index.mapper.dynamic` (_dynamic_)::

Dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.
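
For example, as a minimal sketch, dynamic type creation could be switched off
for all indices via `config/elasticsearch.yml`:

[source,yaml]
--------------------------------------------------
index.mapper.dynamic: false
--------------------------------------------------
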
@ -6,8 +6,8 @@ are scored. Similarity is per field, meaning that via the mapping one
can define a different similarity per field.

Configuring a custom similarity is considered an expert feature and the
builtin similarities are most likely sufficient as is described in the
<<mapping-core-types,mapping section>>
builtin similarities are most likely sufficient as is described in
<<similarity>>.
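
For orientation, here is a minimal sketch of what configuring a custom
similarity looks like: declare it under the index `settings`, then reference
it from a field mapping. The index, type, and field names are assumptions:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_similarity": {
        "type": "DFR",
        "basic_model": "g",
        "after_effect": "l",
        "normalization": "h2",
        "normalization.h2.c": "3.0"
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_similarity"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE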

[float]
[[configuration]]
@ -90,7 +90,7 @@ Type name: `BM25`
==== DFR similarity

Similarity that implements the
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
from randomness] framework. This similarity has the following options:

[horizontal]
@ -111,7 +111,7 @@ Type name: `DFR`
[[ib]]
==== IB similarity.

http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
based model]. This similarity has the following options:

[horizontal]
@ -125,7 +125,7 @@ Type name: `IB`
[[lm_dirichlet]]
==== LM Dirichlet similarity.

http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
Dirichlet similarity]. This similarity has the following options:

[horizontal]
@ -137,7 +137,7 @@ Type name: `LMDirichlet`
[[lm_jelinek_mercer]]
==== LM Jelinek Mercer similarity.

http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
Jelinek Mercer similarity]. This similarity has the following options:

[horizontal]
@ -3,76 +3,157 @@

[partintro]
--
Mapping is the process of defining how a document should be mapped to
the Search Engine, including its searchable characteristics such as
which fields are searchable and if/how they are tokenized. In
Elasticsearch, an index may store documents of different "mapping
types". Elasticsearch allows one to associate multiple mapping
definitions for each mapping type.

Explicit mapping is defined on an index/type level. By default, there
isn't a need to define an explicit mapping, since one is automatically
created and registered when a new type or new field is introduced (with
no performance overhead) and has sensible defaults. Only when the
defaults need to be overridden must a mapping definition be provided.
Mapping is the process of defining how a document, and the fields it contains,
are stored and indexed. For instance, use mappings to define:

* which string fields should be treated as full text fields.
* which fields contain numbers, dates, or geolocations.
* whether the values of all fields in the document should be
  indexed into the catch-all <<mapping-all-field,`_all`>> field.
* the <<mapping-date-format,format>> of date values.
* custom rules to control the mapping for
  <<dynamic-mapping,dynamically added fields>>.

[float]
[[all-mapping-types]]
=== Mapping Types
[[mapping-type]]
== Mapping Types

Mapping types are a way to divide the documents in an index into logical
groups. Think of it as tables in a database. Though there is separation
between types, it's not a full separation (all end up as documents
within the same Lucene index).
Each index has one or more _mapping types_, which are used to divide the
documents in an index into logical groups. User documents might be stored in a
`user` type, and blog posts in a `blogpost` type.

Field names with the same name across types are highly recommended to
have the same type and same mapping characteristics (analysis settings
for example). There is an effort to allow to explicitly "choose" which
field to use by using a type prefix (`my_type.my_field`), but it's not
complete, and there are places where it will never work (like
aggregations on the field).
Each mapping type has:

In practice though, this restriction is almost never an issue. The field
name usually ends up being a good indication of its "typeness" (e.g.
"first_name" will always be a string). Note also that this does not
apply to the cross index case.
<<mapping-fields,Meta-fields>>::

Meta-fields are used to customise how a document's associated metadata is
treated. Examples of meta-fields include the document's
<<mapping-index-field,`_index`>>, <<mapping-type-field,`_type`>>,
<<mapping-id-field,`_id`>>, and <<mapping-source-field,`_source`>> fields.

<<mapping-types,Fields>> or _properties_::

Each mapping type contains a list of fields or `properties` pertinent to that
type. A `user` type might contain `title`, `name`, and `age` fields, while a
`blogpost` type might contain `title`, `body`, `user_id` and `created`
fields.

The mapping for the above example could look like this:

[source,js]
---------------------------------------
PUT my_index <1>
{
  "mappings": {
    "user": { <2>
      "_all":       { "enabled": false  }, <3>
      "properties": { <4>
        "title":    { "type": "string"  }, <5>
        "name":     { "type": "string"  }, <5>
        "age":      { "type": "integer" }  <5>
      }
    },
    "blogpost": { <2>
      "properties": { <4>
        "title":    { "type": "string"  }, <5>
        "body":     { "type": "string"  }, <5>
        "user_id":  {
          "type":   "string", <5>
          "index":  "not_analyzed"
        },
        "created":  {
          "type":   "date", <5>
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
---------------------------------------
// AUTOSENSE
<1> Create an index called `my_index`.
<2> Add mapping types called `user` and `blogpost`.
<3> Disable the `_all` <<mapping-fields,meta field>> for the `user` mapping type.
<4> Specify fields or _properties_ in each mapping type.
<5> Specify the data `type` and mapping for each field.

[float]
[[mapping-api]]
=== Mapping API
== Field datatypes

To create a mapping, you will need the <<indices-put-mapping,Put Mapping
API>>, or you can add multiple mappings when you <<indices-create-index,create an
index>>.
Each field has a data `type` which can be:

* a simple type like <<string,`string`>>, <<date,`date`>>, <<number,`long`>>,
  <<number,`double`>>, <<boolean,`boolean`>> or <<ip,`ip`>>.
* a type which supports the hierarchical nature of JSON such as
  <<object,`object`>> or <<nested,`nested`>>.
* or a specialised type like <<geo-point,`geo_point`>>,
  <<geo-shape,`geo_shape`>>, or <<search-suggesters-completion,`completion`>>.

[IMPORTANT]
.Fields are shared across mapping types
=========================================

Mapping types are used to group fields, but the fields in each mapping type
are not independent of each other. Fields with:

* the _same name_
* in the _same index_
* in _different mapping types_
* map to the _same field_ internally,
* and *must have the same mapping*.

The `title` field exists in both the `user` and `blogpost` mapping types and
so must have exactly the same mapping in each type. The only exceptions to
this rule are the <<copy-to>>, <<dynamic>>, <<enabled>>, <<ignore-above>>,
<<include-in-all>>, and <<properties>> parameters, which may have different
settings per field.

Usually, fields with the same name also contain the same type of data, so
having the same mapping is not a problem. When conflicts do arise, these can
be solved by choosing more descriptive names, such as `user_title` and
`blog_title`.

=========================================

[float]
[[mapping-settings]]
=== Global Settings
== Dynamic mapping

The `index.mapping.ignore_malformed` global setting can be set on the
index level to allow malformed content to be ignored globally across all
mapping types (an example of malformed content is trying to index a text
string value as a numeric type).
Fields and mapping types do not need to be defined before being used. Thanks
to _dynamic mapping_, new mapping types and new field names will be added
automatically, just by indexing a document. New fields can be added both to
the top-level mapping type, and to inner <<object,`object`>> and
<<nested,`nested`>> fields.

The <<dynamic-mapping,dynamic mapping>> rules can be configured to
customise the mapping that is used for new types and new fields.

[float]
== Explicit mappings

You know more about your data than Elasticsearch can guess, so while dynamic
mapping can be useful to get started, at some point you will want to specify
your own explicit mappings.

You can create mapping types and field mappings when you
<<indices-create-index,create an index>>, and you can add mapping types and
fields to an existing index with the <<indices-put-mapping,PUT mapping API>>.

[float]
== Updating existing mappings

Other than where documented, *existing type and field mappings cannot be
updated*. Changing the mapping would mean invalidating already indexed
documents. Instead, you should create a new index with the correct mappings
and reindex your data into that index.

The `index.mapping.coerce` global setting can be set on the
index level to coerce numeric content globally across all
mapping types (the default is `true`; attempted coercions convert strings
containing numbers into numeric types, and strip the fraction from numeric
values destined for integer/short/long fields). When a permitted conversion
fails, the value is considered malformed and the `ignore_malformed` setting
dictates what happens next.
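
As an illustrative sketch (both are index-level settings, shown here at index
creation time; the index name is an assumption):

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "index.mapping.coerce": false,
    "index.mapping.ignore_malformed": true
  }
}
--------------------------------------------------
// AUTOSENSE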

--

include::mapping/fields.asciidoc[]

include::mapping/types.asciidoc[]

include::mapping/date-format.asciidoc[]
include::mapping/fields.asciidoc[]

include::mapping/fielddata_formats.asciidoc[]
include::mapping/params.asciidoc[]

include::mapping/dynamic-mapping.asciidoc[]

include::mapping/meta.asciidoc[]

include::mapping/transform.asciidoc[]

@ -1,238 +0,0 @@
[[mapping-date-format]]
== Date Format

In JSON documents, dates are represented as strings. Elasticsearch uses a set
of pre-configured formats to recognize and convert those, but you can change the
defaults by specifying the `format` option when defining a `date` type, or by
specifying `dynamic_date_formats` in the `root object` mapping (which will
be used unless explicitly overridden by a `date` type). There are built in
formats supported, as well as completely custom ones.

The parsing of dates uses http://www.joda.org/joda-time/[Joda]. The
default date parsing used if no format is specified is
http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[ISODateTimeFormat.dateOptionalTimeParser].

An extension to the format allows defining several formats using a `||`
separator. This allows less strict formats to be defined;
for example, the `yyyy/MM/dd HH:mm:ss||yyyy/MM/dd` format will parse
both `yyyy/MM/dd HH:mm:ss` and `yyyy/MM/dd`. The first format will also
act as the one that converts back from milliseconds to a string
representation.
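
As a brief sketch of the `||` syntax in a mapping (the index, type and field
names here are assumptions):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "created": {
          "type":   "date",
          "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd" <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Parses both patterns; the first pattern is also used when rendering the date back as a string.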

[float]
[[date-math]]
=== Date Math

The `date` type supports using date math expressions when using it in a
query/filter (mainly makes sense in `range` queries/filters).

The expression starts with an "anchor" date, which can be either `now`
or a date string (in the applicable format) ending with `||`. It can
then be followed by a math expression, supporting `+`, `-` and `/`
(rounding). The units supported are `y` (year), `M` (month), `w` (week),
`d` (day), `h` (hour), `m` (minute), and `s` (second).

Here are some samples: `now+1h`, `now+1h+1m`, `now+1h/d`,
`2012-01-01||+1M/d`.

When doing `range` type searches with rounding, the value parsed
depends on whether the end of the range is inclusive or exclusive, and
whether it is the beginning or the end of the range. Rounding up moves to the
last millisecond of the rounding scope, and rounding down to the
first millisecond of the rounding scope. The semantics work as follows:

* `gt` - round up, and use > that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie excluding the entire month)
* `gte` - round down, and use >= that value (`2014-11-18||/M` becomes `2014-11-01`, ie including the entire month)
* `lt` - round down, and use < that value (`2014-11-18||/M` becomes `2014-11-01`, ie excluding the entire month)
* `lte` - round up, and use <= that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie including the entire month)
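
To make the rounding semantics concrete, here is a minimal sketch (index and
field names are assumptions) of a `range` query that selects the whole of
November by combining an anchored date with rounding:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2014-11-18||/M", <1>
        "lte": "2014-11-18||/M"  <2>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Rounded down to `2014-11-01`, including the entire month.
<2> Rounded up to `2014-11-30T23:59:59.999`, also including the entire month.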

[float]
[[built-in]]
=== Built In Formats

Most of the formats below have a `strict` companion format, which means that
the year, month and day parts must include leading zeros in order
to be valid. This means that a date like `5/11/1` would not be valid; instead
you would need to specify the full date, which would be `2005/11/01` in this
example. So instead of `date_optional_time` you would need to specify
`strict_date_optional_time`.

The following table lists all the default ISO formats supported:

[cols="<,<",options="header",]
|=======================================================================
|Name |Description
|`basic_date`|A basic formatter for a full date as four digit year, two
digit month of year, and two digit day of month (yyyyMMdd).

|`basic_date_time`|A basic formatter that combines a basic date and time,
separated by a 'T' (yyyyMMdd'T'HHmmss.SSSZ).

|`basic_date_time_no_millis`|A basic formatter that combines a basic date
and time without millis, separated by a 'T' (yyyyMMdd'T'HHmmssZ).

|`basic_ordinal_date`|A formatter for a full ordinal date, using a four
digit year and three digit dayOfYear (yyyyDDD).

|`basic_ordinal_date_time`|A formatter for a full ordinal date and time,
using a four digit year and three digit dayOfYear
(yyyyDDD'T'HHmmss.SSSZ).

|`basic_ordinal_date_time_no_millis`|A formatter for a full ordinal date
and time without millis, using a four digit year and three digit
dayOfYear (yyyyDDD'T'HHmmssZ).

|`basic_time`|A basic formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, three digit millis, and time
zone offset (HHmmss.SSSZ).

|`basic_time_no_millis`|A basic formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and time zone
offset (HHmmssZ).

|`basic_t_time`|A basic formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, three digit millis, and time
zone offset prefixed by 'T' ('T'HHmmss.SSSZ).

|`basic_t_time_no_millis`|A basic formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and time zone
offset prefixed by 'T' ('T'HHmmssZ).

|`basic_week_date`|A basic formatter for a full date as four digit
weekyear, two digit week of weekyear, and one digit day of week
(xxxx'W'wwe). `strict_basic_week_date` is supported.

|`basic_week_date_time`|A basic formatter that combines a basic weekyear
date and time, separated by a 'T' (xxxx'W'wwe'T'HHmmss.SSSZ).
`strict_basic_week_date_time` is supported.

|`basic_week_date_time_no_millis`|A basic formatter that combines a basic
weekyear date and time without millis, separated by a 'T'
(xxxx'W'wwe'T'HHmmssZ). `strict_basic_week_date_time_no_millis` is supported.

|`date`|A formatter for a full date as four digit year, two digit month
of year, and two digit day of month (yyyy-MM-dd). `strict_date` is supported.

|`date_hour`|A formatter that combines a full date and two digit hour of
day. `strict_date_hour` is supported.

|`date_hour_minute`|A formatter that combines a full date, two digit hour
of day, and two digit minute of hour. `strict_date_hour_minute` is supported.

|`date_hour_minute_second`|A formatter that combines a full date, two
digit hour of day, two digit minute of hour, and two digit second of
minute. `strict_date_hour_minute_second` is supported.

|`date_hour_minute_second_fraction`|A formatter that combines a full
date, two digit hour of day, two digit minute of hour, two digit second
of minute, and three digit fraction of second
(yyyy-MM-dd'T'HH:mm:ss.SSS). `strict_date_hour_minute_second_fraction` is supported.

|`date_hour_minute_second_millis`|A formatter that combines a full date,
two digit hour of day, two digit minute of hour, two digit second of
minute, and three digit fraction of second (yyyy-MM-dd'T'HH:mm:ss.SSS).
`strict_date_hour_minute_second_millis` is supported.

|`date_optional_time`|A generic ISO datetime parser where the date is
mandatory and the time is optional. `strict_date_optional_time` is supported.

|`date_time`|A formatter that combines a full date and time, separated by
a 'T' (yyyy-MM-dd'T'HH:mm:ss.SSSZZ). `strict_date_time` is supported.

|`date_time_no_millis`|A formatter that combines a full date and time
without millis, separated by a 'T' (yyyy-MM-dd'T'HH:mm:ssZZ).
`strict_date_time_no_millis` is supported.

|`hour`|A formatter for a two digit hour of day. `strict_hour` is supported.

|`hour_minute`|A formatter for a two digit hour of day and two digit
minute of hour. `strict_hour_minute` is supported.

|`hour_minute_second`|A formatter for a two digit hour of day, two digit
minute of hour, and two digit second of minute.
`strict_hour_minute_second` is supported.

|`hour_minute_second_fraction`|A formatter for a two digit hour of day,
two digit minute of hour, two digit second of minute, and three digit
fraction of second (HH:mm:ss.SSS).
`strict_hour_minute_second_fraction` is supported.

|`hour_minute_second_millis`|A formatter for a two digit hour of day, two
digit minute of hour, two digit second of minute, and three digit
fraction of second (HH:mm:ss.SSS).
`strict_hour_minute_second_millis` is supported.

|`ordinal_date`|A formatter for a full ordinal date, using a four digit
year and three digit dayOfYear (yyyy-DDD). `strict_ordinal_date` is supported.

|`ordinal_date_time`|A formatter for a full ordinal date and time, using
a four digit year and three digit dayOfYear (yyyy-DDD'T'HH:mm:ss.SSSZZ).
`strict_ordinal_date_time` is supported.

|`ordinal_date_time_no_millis`|A formatter for a full ordinal date and
time without millis, using a four digit year and three digit dayOfYear
(yyyy-DDD'T'HH:mm:ssZZ).
`strict_ordinal_date_time_no_millis` is supported.

|`time`|A formatter for a two digit hour of day, two digit minute of
hour, two digit second of minute, three digit fraction of second, and
time zone offset (HH:mm:ss.SSSZZ). `strict_time` is supported.

|`time_no_millis`|A formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, and time zone offset
(HH:mm:ssZZ). `strict_time_no_millis` is supported.

|`t_time`|A formatter for a two digit hour of day, two digit minute of
hour, two digit second of minute, three digit fraction of second, and
time zone offset prefixed by 'T' ('T'HH:mm:ss.SSSZZ).
`strict_t_time` is supported.

|`t_time_no_millis`|A formatter for a two digit hour of day, two digit
minute of hour, two digit second of minute, and time zone offset
prefixed by 'T' ('T'HH:mm:ssZZ). `strict_t_time_no_millis` is supported.

|`week_date`|A formatter for a full date as four digit weekyear, two
digit week of weekyear, and one digit day of week (xxxx-'W'ww-e).
`strict_week_date` is supported.

|`week_date_time`|A formatter that combines a full weekyear date and
time, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ).
`strict_week_date_time` is supported.

|`week_date_time_no_millis`|A formatter that combines a full weekyear date
and time without millis, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ssZZ).
`strict_week_date_time_no_millis` is supported.

|`weekyear`|A formatter for a four digit weekyear. `strict_weekyear` is supported.

|`weekyear_week`|A formatter for a four digit weekyear and two digit week
of weekyear. `strict_weekyear_week` is supported.

|`weekyear_week_day`|A formatter for a four digit weekyear, two digit week
of weekyear, and one digit day of week. `strict_weekyear_week_day` is supported.

|`year`|A formatter for a four digit year. `strict_year` is supported.

|`year_month`|A formatter for a four digit year and two digit month of
year. `strict_year_month` is supported.

|`year_month_day`|A formatter for a four digit year, two digit month of
year, and two digit day of month. `strict_year_month_day` is supported.

|`epoch_second`|A formatter for the number of seconds since the epoch.
Note that this timestamp allows a max length of 10 chars, so dates
before 1653 or after 2286 are not supported. You should use a different
date formatter in that case.

|`epoch_millis`|A formatter for the number of milliseconds since the epoch.
Note that this timestamp allows a max length of 13 chars, so dates
before 1653 or after 2286 are not supported. You should use a different
date formatter in that case.
|=======================================================================

[float]
[[custom]]
=== Custom Format

Allows for a completely customizable date format explained
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[here].
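
For example, a custom Joda pattern can be supplied directly as the `format`
of a `date` field. This is a minimal sketch (the index, type, and field names
are assumptions):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "birthday": {
          "type":   "date",
          "format": "MM-dd-yyyy"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
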
@ -1,73 +1,67 @@
[[mapping-dynamic-mapping]]
[[dynamic-mapping]]
== Dynamic Mapping

Default mappings allow generic mapping definitions to be automatically applied
to types that do not have mappings predefined. This is mainly possible because
the <<mapping-object-type,object mapping>>, and
namely the <<mapping-root-object-type,root
object mapping>>, allow for schema-less dynamic addition of unmapped
fields.

The default mapping definition is a plain mapping definition that is
embedded within the distribution:
One of the most important features of Elasticsearch is that it tries to get
out of your way and let you start exploring your data as quickly as possible.
To index a document, you don't have to first create an index, define a mapping
type, and define your fields -- you can just index a document and the index,
type, and fields will spring to life automatically:

[source,js]
--------------------------------------------------
{
  "_default_": {
  }
}
PUT data/counters/1 <1>
{ "count": 5 }
--------------------------------------------------
// AUTOSENSE
<1> Creates the `data` index, the `counters` mapping type, and a field
    called `count` with datatype `long`.

Pretty short, isn't it? Basically, everything is defaulted, especially the
dynamic nature of the root object mapping. The default mapping can be
overridden by specifying the `_default_` type when creating a new index.
The automatic detection and addition of new types and fields is called
_dynamic mapping_. The dynamic mapping rules can be customised to suit your
purposes with:

The dynamic creation of mappings for unmapped types can be completely
disabled by setting `index.mapper.dynamic` to `false`.
<<default-mapping,`_default_` mapping>>::

The dynamic creation of fields within a type can be completely
disabled by setting the `dynamic` property of the type to `strict`.
Configure the base mapping to be used for new mapping types.

Here is a <<indices-put-mapping,Put Mapping>> example that
disables dynamic field creation for a `tweet`:
<<dynamic-field-mapping,Dynamic field mappings>>::

[source,js]
--------------------------------------------------
$ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '
{
  "tweet": {
    "dynamic": "strict",
    "properties": {
      "message": { "type": "string", "store": true }
    }
  }
}
'
--------------------------------------------------
The rules governing dynamic field detection.

Here is how we can change the default
<<mapping-date-format,date_formats>> used in the
root and inner object types:
<<dynamic-templates,Dynamic templates>>::

Custom rules to configure the mapping for dynamically added fields.

TIP: <<indices-templates,Index templates>> allow you to configure the default
mappings, settings, aliases, and warmers for new indices, whether created
automatically or explicitly.

[source,js]
--------------------------------------------------
{
  "_default_": {
    "dynamic_date_formats": ["yyyy-MM-dd", "dd-MM-yyyy", "date_optional_time"]
  }
}
--------------------------------------------------

[float]
=== Unmapped fields in queries
=== Disabling automatic type creation

Queries and filters can refer to fields that don't exist in a mapping. Whether this
is allowed is controlled by the `index.query.parse.allow_unmapped_fields` setting.
This setting defaults to `true`. Setting it to `false` will disallow the usage of
unmapped fields in queries.
Automatic type creation can be disabled by setting the `index.mapper.dynamic`
setting to `false`, either by setting the default value in the
`config/elasticsearch.yml` file, or per-index as an index setting:

[source,js]
--------------------------------------------------
PUT /_settings <1>
{
  "index.mapper.dynamic": false
}
--------------------------------------------------
// AUTOSENSE
<1> Disable automatic type creation for all indices.

Regardless of the value of this setting, types can still be added explicitly
when <<indices-create-index,creating an index>> or with the
<<indices-put-mapping,PUT mapping>> API.

include::dynamic/default-mapping.asciidoc[]

include::dynamic/field-mapping.asciidoc[]

include::dynamic/templates.asciidoc[]

When registering a new <<search-percolate,percolator query>> or creating
a <<filtered,filtered alias>> then the `index.query.parse.allow_unmapped_fields` setting
is forcibly overwritten to disallow unmapped fields.
@ -0,0 +1,82 @@
[[default-mapping]]
=== `_default_` mapping

The default mapping, which will be used as the base mapping for any new
mapping types, can be customised by adding a mapping type with the name
`_default_` to an index, either when
<<indices-create-index,creating the index>> or later on with the
<<indices-put-mapping,PUT mapping>> API.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "_default_": { <1>
      "_all": {
        "enabled": false
      }
    },
    "user": {}, <2>
    "blogpost": { <3>
      "_all": {
        "enabled": true
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `_default_` mapping defaults the <<mapping-all-field,`_all`>> field to disabled.
<2> The `user` type inherits the settings from `_default_`.
<3> The `blogpost` type overrides the defaults and enables the <<mapping-all-field,`_all`>> field.

While the `_default_` mapping can be updated after an index has been created,
the new defaults will only affect mapping types that are created afterwards.

The `_default_` mapping can be used in conjunction with
<<indices-templates,Index templates>> to control dynamically created types
within automatically created indices:

[source,js]
--------------------------------------------------
PUT _template/logging
{
  "template": "logs-*", <1>
  "settings": { "number_of_shards": 1 }, <2>
  "mappings": {
    "_default_": {
      "_all": { <3>
        "enabled": false
      },
      "dynamic_templates": [
        {
          "strings": { <4>
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}

PUT logs-2015.10.01/event/1
{ "message": "error:16" }
--------------------------------------------------
// AUTOSENSE
<1> The `logging` template will match any indices beginning with `logs-`.
<2> Matching indices will be created with a single primary shard.
<3> The `_all` field will be disabled by default for new type mappings.
<4> String fields will be created with an `analyzed` main field, and a `not_analyzed` `.raw` field.
@ -0,0 +1,139 @@
[[dynamic-field-mapping]]
=== Dynamic field mapping

By default, when a previously unseen field is found in a document,
Elasticsearch will add the new field to the type mapping. This behaviour can
be disabled, both at the document and at the <<object,`object`>> level, by
setting the <<dynamic,`dynamic`>> parameter to `false` or to `strict`.

Assuming `dynamic` field mapping is enabled, some simple rules are used to
determine which datatype the field should have:

[horizontal]
*JSON datatype*:: *Elasticsearch datatype*

`null`:: No field is added.
`true` or `false`:: <<boolean,`boolean`>> field
floating{nbsp}point{nbsp}number:: <<number,`double`>> field
integer:: <<number,`long`>> field
object:: <<object,`object`>> field
array:: Depends on the first non-`null` value in the array.
string:: Either a <<date,`date`>> field
         (if the value passes <<date-detection,date detection>>),
         a <<number,`double`>> or <<number,`long`>> field
         (if the value passes <<numeric-detection,numeric detection>>)
         or an <<mapping-index,`analyzed`>> <<string,`string`>> field.

These are the only <<mapping-types,field datatypes>> that are dynamically
detected. All other datatypes must be mapped explicitly.

Besides the options listed below, dynamic field mapping rules can be further
customised with <<dynamic-templates,`dynamic_templates`>>.

[[date-detection]]
==== Date detection

If `date_detection` is enabled (default), then new string fields are checked
to see whether their contents match any of the date patterns specified in
`dynamic_date_formats`. If a match is found, a new <<date,`date`>> field is
added with the corresponding format.

The default value for `dynamic_date_formats` is:

[ <<strict-date-time,`"strict_date_optional_time"`>>, `"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`]

For example:

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
  "create_date": "2015/09/02"
}

GET my_index/_mapping <1>
--------------------------------------------------
// AUTOSENSE
<1> The `create_date` field has been added as a <<date,`date`>>
    field with the <<mapping-date-format,`format`>>: +
    `"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`.

===== Disabling date detection

Dynamic date detection can be disabled by setting `date_detection` to `false`:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}

PUT my_index/my_type/1 <1>
{
  "create_date": "2015/09/02"
}
--------------------------------------------------
// AUTOSENSE

<1> The `create_date` field has been added as a <<string,`string`>> field.

===== Customising detected date formats

Alternatively, the `dynamic_date_formats` can be customised to support your
own <<mapping-date-format,date formats>>:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

PUT my_index/my_type/1
{
  "create_date": "09/25/2015"
}
--------------------------------------------------
// AUTOSENSE

[[numeric-detection]]
==== Numeric detection

While JSON has support for native floating point and integer datatypes, some
applications or languages may sometimes render numbers as strings. Usually the
correct solution is to map these fields explicitly, but numeric detection
(which is disabled by default) can be enabled to do this automatically:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "numeric_detection": true
    }
  }
}

PUT my_index/my_type/1
{
  "my_float":   "1.0", <1>
  "my_integer": "1"    <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `my_float` field is added as a <<number,`double`>> field.
<2> The `my_integer` field is added as a <<number,`long`>> field.

@ -0,0 +1,251 @@
[[dynamic-templates]]
=== Dynamic templates

Dynamic templates allow you to define custom mappings that can be applied to
dynamically added fields based on:

* the <<dynamic-mapping,datatype>> detected by Elasticsearch, with <<match-mapping-type,`match_mapping_type`>>.
* the name of the field, with <<match-unmatch,`match` and `unmatch`>> or <<match-pattern,`match_pattern`>>.
* the full dotted path to the field, with <<path-match-unmatch,`path_match` and `path_unmatch`>>.

The original field name `{name}` and the detected datatype
`{dynamic_type}` <<template-variables,template variables>> can be used in
the mapping specification as placeholders.

IMPORTANT: Dynamic field mappings are only added when a field contains a
concrete value -- not `null` or an empty array. This means that if the
`null_value` option is used in a `dynamic_template`, it will only be applied
after the first document with a concrete value for the field has been
indexed.

Dynamic templates are specified as an array of named objects:

[source,js]
--------------------------------------------------
  "dynamic_templates": [
    {
      "my_template_name": { <1>
        ... match conditions ... <2>
        "mapping": { ... } <3>
      }
    },
    ...
  ]
--------------------------------------------------
<1> The template name can be any string value.
<2> The match conditions can include any of: `match_mapping_type`, `match`, `match_pattern`, `unmatch`, `path_match`, `path_unmatch`.
<3> The mapping that the matched field should use.

Templates are processed in order -- the first matching template wins. New
templates can be appended to the end of the list with the
<<indices-put-mapping,PUT mapping>> API. If a new template has the same
name as an existing template, it will replace the old version.

[[match-mapping-type]]
==== `match_mapping_type`

The `match_mapping_type` matches on the datatype detected by
<<dynamic-field-mapping,dynamic field mapping>>, in other words, the datatype
that Elasticsearch thinks the field should have. Only the following datatypes
can be automatically detected: `boolean`, `date`, `double`, `long`, `object`,
`string`. It also accepts `*` to match all datatypes.

For example, if we wanted to map all integer fields as `integer` instead of
`long`, and all `string` fields as both `analyzed` and `not_analyzed`, we
could use the following template:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{
  "my_integer": 5, <1>
  "my_string": "Some string" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `my_integer` field is mapped as an `integer`.
<2> The `my_string` field is mapped as an analyzed `string`, with a `not_analyzed` <<multi-fields,multi field>>.

[[match-unmatch]]
==== `match` and `unmatch`

The `match` parameter uses a pattern to match on the fieldname, while
`unmatch` uses a pattern to exclude fields matched by `match`.

The following example matches all `string` fields whose name starts with
`long_` (except for those which end with `_text`) and maps them as `long`
fields:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "longs_as_strings": {
            "match_mapping_type": "string",
            "match": "long_*",
            "unmatch": "*_text",
            "mapping": {
              "type": "long"
            }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{
  "long_num": "5", <1>
  "long_text": "foo" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `long_num` field is mapped as a `long`.
<2> The `long_text` field uses the default `string` mapping.

[[match-pattern]]
==== `match_pattern`

The `match_pattern` parameter behaves just like the `match` parameter, but
supports full Java regular expression matching on the field name instead of
simple wildcards, for instance:

[source,js]
--------------------------------------------------
  "match_pattern": "^profit_\d+$"
--------------------------------------------------

[[path-match-unmatch]]
==== `path_match` and `path_unmatch`

The `path_match` and `path_unmatch` parameters work in the same way as `match`
and `unmatch`, but operate on the full dotted path to the field, not just the
final name, e.g. `some_object.*.some_field`.

This example copies the values of any fields in the `name` object to the
top-level `full_name` field, except for the `middle` field:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "full_name": {
            "path_match": "name.*",
            "path_unmatch": "*.middle",
            "mapping": {
              "type": "string",
              "copy_to": "full_name"
            }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{
  "name": {
    "first": "Alice",
    "middle": "Mary",
    "last": "White"
  }
}
--------------------------------------------------
// AUTOSENSE

[[template-variables]]
==== `{name}` and `{dynamic_type}`

The `{name}` and `{dynamic_type}` placeholders are replaced in the `mapping`
with the field name and detected dynamic type. The following example sets all
string fields to use an <<analyzer,`analyzer`>> with the same name as the
field, and disables <<doc-values,`doc_values`>> for all non-string fields:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "named_analyzers": {
            "match_mapping_type": "string",
            "match": "*",
            "mapping": {
              "type": "string",
              "analyzer": "{name}"
            }
          }
        },
        {
          "no_doc_values": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "{dynamic_type}",
              "doc_values": false
            }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{
  "english": "Some English text", <1>
  "count":   5 <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `english` field is mapped as a `string` field with the `english` analyzer.
<2> The `count` field is mapped as a `long` field with `doc_values` disabled.

@ -1,257 +0,0 @@
[[fielddata-formats]]
== Fielddata formats

The field data format controls how field data should be stored.

Depending on the field type, there might be several field data types
available. In particular, string, geo-point and numeric types support the `doc_values`
format which allows for computing the field data data-structures at indexing
time and storing them on disk. Although it will make the index larger and may
be slightly slower, this implementation will be more near-realtime-friendly
and will require much less memory from the JVM than other implementations.

Here is an example of how to configure the `tag` field to use the `paged_bytes` field
data format:

[source,js]
--------------------------------------------------
{
  "tag": {
    "type": "string",
    "fielddata": {
      "format": "paged_bytes"
    }
  }
}
--------------------------------------------------

It is possible to change the field data format (and the field data settings
in general) on a live index by using the update mapping API.
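
For example, a sketch of disabling field data for the `tag` field on a live
index via the update mapping API (the index and type names are assumptions):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tag": {
      "type": "string",
      "fielddata": {
        "format": "disabled"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
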
[float]
=== String field data types

`paged_bytes` (default on analyzed string fields)::
    Stores unique terms sequentially in a large buffer and maps documents to
    the indices of the terms they contain in this large buffer.

`doc_values` (default when index is set to `not_analyzed`)::
    Computes and stores field data data-structures on disk at indexing time.
    Lowers memory usage but only works on non-analyzed strings (`index`: `no` or
    `not_analyzed`).

[float]
=== Numeric field data types

`array`::
    Stores field values in memory using arrays.

`doc_values` (default unless doc values are disabled)::
    Computes and stores field data data-structures on disk at indexing time.

[float]
=== Geo point field data types

`array`::
    Stores latitudes and longitudes in arrays.

`doc_values` (default unless doc values are disabled)::
    Computes and stores field data data-structures on disk at indexing time.

[float]
[[global-ordinals]]
=== Global ordinals

Global ordinals is a data-structure on top of field data that maintains an
incremental numbering for all the terms in field data, in lexicographic order.
Each term has a unique number, and the number of term 'A' is lower than the
number of term 'B'. Global ordinals are only supported on string fields.

Field data on string fields also has ordinals, which is a unique numbering for
all terms in a particular segment and field. Global ordinals just build on top
of this, by providing a mapping between the segment ordinals and the global
ordinals, the latter being unique across the entire shard.

Global ordinals can be beneficial in search features that already use segment
ordinals, such as the terms aggregator, to improve execution time. Often these
search features need to merge the segment ordinal results into a cross-segment
terms result. With global ordinals this mapping happens during field data load
time instead of during each query execution. With global ordinals, search
features only need to resolve the actual term when building the (shard)
response; during the execution there is no need to use the actual terms at
all, and the unique numbering that global ordinals provide is sufficient and
improves the execution time.

Global ordinals for a specified field are tied to all the segments of a shard
(Lucene index), which is different from field data for a specific field, which
is tied to a single segment. For this reason global ordinals need to be
rebuilt in their entirety once new segments become visible. This one time cost
would happen anyway without global ordinals, but then it would happen for each
search execution instead!

The loading time of global ordinals depends on the number of terms in a field,
but in general it is low, since the source field data has already been loaded.
The memory overhead of global ordinals is small because they are very
efficiently compressed. Eager loading of global ordinals can move the loading
time from the first search request to the refresh itself.
[float]
|
||||
[[fielddata-loading]]
|
||||
=== Fielddata loading
|
||||
|
||||
By default, field data is loaded lazily, ie. the first time that a query that
|
||||
requires them is executed. However, this can make the first requests that
|
||||
follow a merge operation quite slow since fielddata loading is a heavy
|
||||
operation.
|
||||
|
||||
It is possible to force field data to be loaded and cached eagerly through the
|
||||
`loading` setting of fielddata:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"category": {
|
||||
"type": "string",
|
||||
"fielddata": {
|
||||
"loading": "eager"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
Global ordinals can also be eagerly loaded:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"category": {
|
||||
"type": "string",
|
||||
"fielddata": {
|
||||
"loading": "eager_global_ordinals"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
With the above setting both field data and global ordinals for a specific field
|
||||
are eagerly loaded.
|
||||
|
||||
[float]
|
||||
==== Disabling field data loading
|
||||
|
||||
Field data can take a lot of RAM so it makes sense to disable field data
|
||||
loading on the fields that don't need field data, for example those that are
|
||||
used for full-text search only. In order to disable field data loading, just
|
||||
change the field data format to `disabled`. When disabled, all requests that
|
||||
will try to load field data, e.g. when they include aggregations and/or sorting,
|
||||
will return an error.
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"text": {
|
||||
"type": "string",
|
||||
"fielddata": {
|
||||
"format": "disabled"
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
The `disabled` format is supported by all field types.
|
||||
|
||||
[float]
|
||||
[[field-data-filtering]]
|
||||
=== Filtering fielddata
|
||||
|
||||
It is possible to control which field values are loaded into memory,
|
||||
which is particularly useful for string fields. When specifying the
|
||||
<<mapping-core-types,mapping>> for a field, you
|
||||
can also specify a fielddata filter.
|
||||
|
||||
Fielddata filters can be changed using the
|
||||
<<indices-put-mapping,PUT mapping>>
|
||||
API. After changing the filters, use the
|
||||
<<indices-clearcache,Clear Cache>> API
|
||||
to reload the fielddata using the new filters.
|
||||
|
||||
[float]
|
||||
==== Filtering by frequency:
|
||||
|
||||
The frequency filter allows you to only load terms whose frequency falls
|
||||
between a `min` and `max` value, which can be expressed an absolute
|
||||
number (when the number is bigger than 1.0) or as a percentage
|
||||
(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
|
||||
*per segment*. Percentages are based on the number of docs which have a
|
||||
value for the field, as opposed to all docs in the segment.
|
||||
|
||||
Small segments can be excluded completely by specifying the minimum
|
||||
number of docs that the segment should contain with `min_segment_size`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"tag": {
|
||||
"type": "string",
|
||||
"fielddata": {
|
||||
"filter": {
|
||||
"frequency": {
|
||||
"min": 0.001,
|
||||
"max": 0.1,
|
||||
"min_segment_size": 500
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Filtering by regex
|
||||
|
||||
Terms can also be filtered by regular expression - only values which
|
||||
match the regular expression are loaded. Note: the regular expression is
|
||||
applied to each term in the field, not to the whole field value. For
|
||||
instance, to only load hashtags from a tweet, we can use a regular
|
||||
expression which matches terms beginning with `#`:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"tweet": {
|
||||
"type": "string",
|
||||
"analyzer": "whitespace"
|
||||
"fielddata": {
|
||||
"filter": {
|
||||
"regex": {
|
||||
"pattern": "^#.*"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
|
||||
[float]
|
||||
==== Combining filters
|
||||
|
||||
The `frequency` and `regex` filters can be combined:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
{
|
||||
"tweet": {
|
||||
"type": "string",
|
||||
"analyzer": "whitespace"
|
||||
"fielddata": {
|
||||
"filter": {
|
||||
"regex": {
|
||||
"pattern": "^#.*",
|
||||
},
|
||||
"frequency": {
|
||||
"min": 0.001,
|
||||
"max": 0.1,
|
||||
"min_segment_size": 500
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
|
@@ -5,7 +5,8 @@ Each document has metadata associated with it, such as the `_index`, mapping
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
can be customised when a mapping type is created.

The meta-fields are:
[float]
=== Identity meta-fields

[horizontal]
<<mapping-index-field,`_index`>>::

@@ -18,16 +19,26 @@ The meta-fields are:

<<mapping-type-field,`_type`>>::

The document's <<all-mapping-types,mapping type>>.
The document's <<mapping-type,mapping type>>.

<<mapping-id-field,`_id`>>::

The document's ID.

[float]
=== Document source meta-fields

<<mapping-source-field,`_source`>>::

The original JSON representing the body of the document.

<<mapping-size-field,`_size`>>::

The size of the `_source` field in bytes.

[float]
=== Indexing meta-fields

<<mapping-all-field,`_all`>>::

A _catch-all_ field that indexes the values of all other fields.

@@ -36,18 +47,6 @@ The meta-fields are:

All fields in the document which contain non-null values.

<<mapping-parent-field,`_parent`>>::

Used to create a parent-child relationship between two mapping types.

<<mapping-routing-field,`_routing`>>::

A custom routing value which routes a document to a particular shard.

<<mapping-size-field,`_size`>>::

The size of the `_source` field in bytes.

<<mapping-timestamp-field,`_timestamp`>>::

A timestamp associated with the document, either specified manually or auto-generated.

@@ -56,27 +55,49 @@ The meta-fields are:

How long a document should live before it is automatically deleted.

include::fields/index-field.asciidoc[]
[float]
=== Routing meta-fields

include::fields/uid-field.asciidoc[]
<<mapping-parent-field,`_parent`>>::

include::fields/type-field.asciidoc[]
Used to create a parent-child relationship between two mapping types.

<<mapping-routing-field,`_routing`>>::

A custom routing value which routes a document to a particular shard.

[float]
=== Other meta-field

<<mapping-meta-field,`_meta`>>::

Application specific metadata.

include::fields/id-field.asciidoc[]

include::fields/source-field.asciidoc[]

include::fields/all-field.asciidoc[]

include::fields/field-names-field.asciidoc[]

include::fields/id-field.asciidoc[]

include::fields/index-field.asciidoc[]

include::fields/meta-field.asciidoc[]

include::fields/parent-field.asciidoc[]

include::fields/routing-field.asciidoc[]

include::fields/size-field.asciidoc[]

include::fields/source-field.asciidoc[]

include::fields/timestamp-field.asciidoc[]

include::fields/ttl-field.asciidoc[]

include::fields/type-field.asciidoc[]

include::fields/uid-field.asciidoc[]

@@ -151,82 +151,18 @@ PUT my_index
<1> The `_all` field is disabled for the `my_type` type.
<2> The `query_string` query will default to querying the `content` field in this index.

[[include-in-all]]
==== Including specific fields in `_all`
[[excluding-from-all]]
==== Excluding fields from `_all`

Individual fields can be included or excluded from the `_all` field with the
`include_in_all` setting, which defaults to `true`:
<<include-in-all,`include_in_all`>> setting.

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { <1>
          "type": "string"
        },
        "content": { <1>
          "type": "string"
        },
        "date": { <2>
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE

<1> The `title` and `content` fields will be included in the `_all` field.
<2> The `date` field will not be included in the `_all` field.

The `include_in_all` parameter can also be set at the type level and on
<<mapping-object-type,`object`>> or <<mapping-nested-type,`nested`>> fields,
in which case all sub-fields inherit that setting. For instance:

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "include_in_all": false, <1>
      "properties": {
        "title": { "type": "string" },
        "author": {
          "include_in_all": true, <2>
          "properties": {
            "first_name": { "type": "string" },
            "last_name": { "type": "string" }
          }
        },
        "editor": {
          "properties": {
            "first_name": { "type": "string" }, <3>
            "last_name": { "type": "string", "include_in_all": true } <3>
          }
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE

<1> All fields in `my_type` are excluded from `_all`.
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
<3> Only the `editor.last_name` field is included in `_all`.
    The `editor.first_name` inherits the type-level setting and is excluded.

[[all-field-and-boosting]]
==== Index boosting and the `_all` field

Individual fields can be _boosted_ at index time, with the `boost` parameter.
The `_all` field takes these boosts into account:
Individual fields can be _boosted_ at index time, with the <<index-boost,`boost`>>
parameter. The `_all` field takes these boosts into account:

[source,js]
--------------------------------

@@ -2,8 +2,8 @@
=== `_id` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
`_id` field is not indexed as its value can be derived automatically from the
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_id` field is not
indexed as its value can be derived automatically from the
<<mapping-uid-field,`_uid`>> field.

The value of the `_id` field is accessible in queries and scripts, but _not_

@@ -0,0 +1,30 @@
[[mapping-meta-field]]
=== `_meta` field

Each mapping type can have custom meta data associated with it. These are not
used at all by Elasticsearch, but can be used to store application-specific
metadata, such as the class that a document belongs to:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "user": {
      "_meta": { <1>
        "class": "MyApp::User",
        "version": {
          "min": "1.0",
          "max": "1.3"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This `_meta` info can be retrieved with the
    <<indices-get-mapping,GET mapping>> API.

The `_meta` field can be updated on an existing type using the
<<indices-put-mapping,PUT mapping>> API.

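For instance, a minimal sketch of such an update, reusing the `user` type from
the example above (the new version number is illustrative):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/user
{
  "_meta": {
    "class": "MyApp::User",
    "version": {
      "min": "1.0",
      "max": "1.4" <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Only the `version.max` value differs from the original mapping in this sketch.
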
@@ -78,8 +78,7 @@ stored.
WARNING: Removing fields from the `_source` has similar downsides to disabling
`_source`, especially the fact that you cannot reindex documents from one
Elasticsearch index to another. Consider using
<<search-request-source-filtering,source filtering>> or a
<<mapping-transform,transform script>> instead.
<<search-request-source-filtering,source filtering>> instead.

The `includes`/`excludes` parameters (which also accept wildcards) can be used
as follows:

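A minimal sketch of the shape this takes (the `event` type and the `meta.*`
field names are illustrative):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "event": {
      "_source": {
        "includes": [ "meta.*" ], <1>
        "excludes": [ "meta.description" ] <2>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Only fields under `meta` are kept in the stored `_source`.
<2> Within that subset, `meta.description` is pruned before the `_source` is stored.
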
@@ -1,5 +1,5 @@
[[mapping-ttl-field]]
=== `_ttl`
=== `_ttl` field

Some types of documents, such as session data or special offers, come with an
expiration date. The `_ttl` field allows you to specify the minimum time a

@@ -2,8 +2,8 @@
=== `_type` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
`_type` field is indexed in order to make searching by type name fast.
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_type` field is
indexed in order to make searching by type name fast.

The value of the `_type` field is accessible in queries, aggregations,
scripts, and when sorting:

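For example, a minimal sketch of a query that restricts matches to a single
type (index and type names are illustrative):

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "term": {
      "_type": "my_type" <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Matches only documents indexed under the `my_type` mapping type.
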
@@ -2,8 +2,8 @@
=== `_uid` field

Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. These
values are combined as `{type}#{id}` and indexed as the `_uid` field.
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. These values are
combined as `{type}#{id}` and indexed as the `_uid` field.

The value of the `_uid` field is accessible in queries, aggregations, scripts,
and when sorting:

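For example, a minimal sketch that sorts by `_uid` (the index name is
illustrative):

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": { "match_all": {} },
  "sort": [ "_uid" ] <1>
}
--------------------------------------------------
// AUTOSENSE
<1> Results are ordered by the combined `{type}#{id}` value.
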
@@ -1,25 +0,0 @@
[[mapping-meta]]
== Meta

Each mapping can have custom meta data associated with it. These are
simple storage elements that are simply persisted along with the mapping
and can be retrieved when fetching the mapping definition. The meta is
defined under the `_meta` element, for example:

[source,js]
--------------------------------------------------
{
  "tweet" : {
    "_meta" : {
      "attr1" : "value1",
      "attr2" : {
        "attr3" : "value3"
      }
    }
  }
}
--------------------------------------------------

Meta can be handy, for example, for client libraries that perform
serialization and deserialization to store their meta model (for example,
the class the document maps to).

@@ -0,0 +1,100 @@
[[mapping-params]]
== Mapping parameters

The following pages provide detailed explanations of the various mapping
parameters that are used by <<mapping-types,field mappings>>.

The following mapping parameters are common to some or all field datatypes:

* <<analyzer,`analyzer`>>
* <<index-boost,`boost`>>
* <<coerce,`coerce`>>
* <<copy-to,`copy_to`>>
* <<doc-values,`doc_values`>>
* <<dynamic,`dynamic`>>
* <<enabled,`enabled`>>
* <<fielddata,`fielddata`>>
* <<geohash,`geohash`>>
* <<geohash-precision,`geohash_precision`>>
* <<geohash-prefix,`geohash_prefix`>>
* <<mapping-date-format,`format`>>
* <<ignore-above,`ignore_above`>>
* <<ignore-malformed,`ignore_malformed`>>
* <<include-in-all,`include_in_all`>>
* <<index-options,`index_options`>>
* <<lat-lon,`lat_lon`>>
* <<mapping-index,`index`>>
* <<multi-fields,`fields`>>
* <<norms,`norms`>>
* <<null-value,`null_value`>>
* <<position-offset-gap,`position_offset_gap`>>
* <<properties,`properties`>>
* <<search-analyzer,`search_analyzer`>>
* <<similarity,`similarity`>>
* <<mapping-store,`store`>>
* <<term-vector,`term_vector`>>


include::params/analyzer.asciidoc[]

include::params/boost.asciidoc[]

include::params/coerce.asciidoc[]

include::params/copy-to.asciidoc[]

include::params/doc-values.asciidoc[]

include::params/dynamic.asciidoc[]

include::params/enabled.asciidoc[]

include::params/fielddata.asciidoc[]

include::params/format.asciidoc[]

include::params/geohash.asciidoc[]

include::params/geohash-precision.asciidoc[]

include::params/geohash-prefix.asciidoc[]

include::params/ignore-above.asciidoc[]

include::params/ignore-malformed.asciidoc[]

include::params/include-in-all.asciidoc[]

include::params/index.asciidoc[]

include::params/index-options.asciidoc[]

include::params/lat-lon.asciidoc[]

include::params/multi-fields.asciidoc[]

include::params/norms.asciidoc[]

include::params/null-value.asciidoc[]

include::params/position-offset-gap.asciidoc[]

include::params/precision-step.asciidoc[]

include::params/properties.asciidoc[]

include::params/search-analyzer.asciidoc[]

include::params/similarity.asciidoc[]

include::params/store.asciidoc[]

include::params/term-vector.asciidoc[]

@@ -0,0 +1,80 @@
[[analyzer]]
=== `analyzer`

The values of <<mapping-index,`analyzed`>> string fields are passed through an
<<analysis,analyzer>> to convert the string into a stream of _tokens_ or
_terms_. For instance, the string `"The quick Brown Foxes."` may, depending
on which analyzer is used, be analyzed to the tokens: `quick`, `brown`,
`fox`. These are the actual terms that are indexed for the field, which makes
it possible to search efficiently for individual words _within_ big blobs of
text.

This analysis process needs to happen not just at index time, but also at
query time: the query string needs to be passed through the same (or a
similar) analyzer so that the terms that it tries to find are in the same
format as those that exist in the index.

Elasticsearch ships with a number of <<analysis-analyzers,pre-defined analyzers>>,
which can be used without further configuration. It also ships with many
<<analysis-charfilters,character filters>>, <<analysis-tokenizers,tokenizers>>,
and <<analysis-tokenfilters,token filters>> which can be combined to configure
custom analyzers per index.

Analyzers can be specified per-query, per-field or per-index. At index time,
Elasticsearch will look for an analyzer in this order:

* The `analyzer` defined in the field mapping.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.

At query time, there are a few more layers:

* The `analyzer` defined in a <<full-text-queries,full-text query>>.
* The `search_analyzer` defined in the field mapping.
* The `analyzer` defined in the field mapping.
* An analyzer named `default_search` in the index settings.
* An analyzer named `default` in the index settings.
* The <<analysis-standard-analyzer,`standard`>> analyzer.

The easiest way to specify an analyzer for a particular field is to define it
in the field mapping, as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { <1>
          "type": "string",
          "fields": {
            "english": { <2>
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

GET my_index/_analyze?field=text <3>
{
  "text": "The quick Brown Foxes."
}

GET my_index/_analyze?field=text.english <4>
{
  "text": "The quick Brown Foxes."
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field uses the default `standard` analyzer.
<2> The `text.english` <<multi-fields,multi-field>> uses the `english` analyzer, which removes stop words and applies stemming.
<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ].
<4> This returns the tokens: [ `quick`, `brown`, `fox` ].

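The index-level fallbacks mentioned above are configured in the index
settings. As a minimal sketch, an analyzer registered under the name `default`
becomes the fallback for every field in the index (the choice of the built-in
`english` analyzer here is illustrative):

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { <1>
          "type": "english"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Fields without their own `analyzer` in the mapping fall back to this index-wide `default`.
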
@@ -0,0 +1,59 @@
[[index-boost]]
=== `boost`

Individual fields can be _boosted_ -- count more towards the relevance score
-- at index time, with the `boost` parameter as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "boost": 2 <1>
        },
        "content": {
          "type": "string"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

<1> Matches on the `title` field will have twice the weight of those on the
    `content` field, which has the default `boost` of `1.0`.

Note that a `title` field will usually be shorter than a `content` field. The
default relevance calculation takes field length into account, so a short
`title` field will have a higher natural boost than a long `content` field.

[WARNING]
.Why index time boosting is a bad idea
==================================================

We advise against using index time boosting for the following reasons:

* You cannot change index-time `boost` values without reindexing all of your
  documents.

* Every query supports query-time boosting which achieves the same effect. The
  difference is that you can tweak the `boost` value without having to reindex.

* Index-time boosts are stored as part of the <<norms,`norm`>>, which is only one
  byte. This reduces the resolution of the field length normalization factor
  which can lead to lower quality relevance calculations.

==================================================

The only advantage that index time boosting has is that it is copied with the
value into the <<mapping-all-field,`_all`>> field. This means that, when
querying the `_all` field, words that originated from the `title` field will
have a higher score than words that originated in the `content` field.
This functionality comes at a cost: queries on the `_all` field are slower
when index-time boosting is used.

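For comparison, the query-time equivalent of the index-time boost above can be
tweaked per request without reindexing; a minimal sketch (the query text is
illustrative):

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2 <1>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The boost is applied when the query executes, so it can be changed freely between requests.
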
@@ -0,0 +1,89 @@
[[coerce]]
=== `coerce`

Data is not always clean. Depending on how it is produced a number might be
rendered in the JSON body as a true JSON number, e.g. `5`, but it might also
be rendered as a string, e.g. `"5"`. Alternatively, a number that should be
an integer might instead be rendered as a floating point, e.g. `5.0`, or even
`"5.0"`.

Coercion attempts to clean up dirty values to fit the datatype of a field.
For instance:

* Strings will be coerced to numbers.
* Floating points will be truncated for integer values.
* Lon/lat geo-points will be normalized to a standard -180:180 / -90:90 coordinate system.

For example:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "number_one": "10" <1>
}

PUT my_index/my_type/2
{
  "number_two": "10" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `number_one` field will contain the integer `10`.
<2> This document will be rejected because coercion is disabled.

[[coerce-setting]]
==== Index-level default

The `index.mapping.coerce` setting can be set on the index level to disable
coercion globally across all mapping types:

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": true
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "number_one": "10" } <1>

PUT my_index/my_type/2
{ "number_two": "10" } <2>
--------------------------------------------------
// AUTOSENSE
<1> This document will be rejected because the `number_one` field inherits the index-level coercion setting.
<2> The `number_two` field overrides the index level setting to enable coercion.

@@ -0,0 +1,64 @@
[[copy-to]]
=== `copy_to`

The `copy_to` parameter allows you to create custom
<<mapping-all-field,`_all`>> fields. In other words, the values of multiple
fields can be copied into a group field, which can then be queried as a single
field. For instance, the `first_name` and `last_name` fields can be copied to
the `full_name` field as follows:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "string",
          "copy_to": "full_name" <1>
        },
        "last_name": {
          "type": "string",
          "copy_to": "full_name" <1>
        },
        "full_name": {
          "type": "string"
        }
      }
    }
  }
}

PUT /my_index/my_type/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET /my_index/_search
{
  "query": {
    "match": {
      "full_name": { <2>
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The values of the `first_name` and `last_name` fields are copied to the
    `full_name` field.
<2> The `first_name` and `last_name` fields can still be queried for the
    first name and last name respectively, but the `full_name` field can be
    queried for both first and last names.

Some important points:

* It is the field _value_ which is copied, not the terms (which result from the analysis process).
* The original <<mapping-source-field,`_source`>> field will not be modified to show the copied values.
* The same value can be copied to multiple fields, with `"copy_to": [ "field_1", "field_2" ]`.

@@ -0,0 +1,46 @@
[[doc-values]]
=== `doc_values`

Most fields are <<mapping-index,indexed>> by default, which makes them
searchable. The inverted index allows queries to look up the search term in a
unique sorted list of terms, and from that immediately have access to the list
of documents that contain the term.

Sorting, aggregations, and access to field values in scripts require a
different data access pattern. Instead of looking up the term and finding
documents, we need to be able to look up the document and find the terms that
it has in a field.

Doc values are the on-disk data structure, built at document index time, which
makes this data access pattern possible. Doc values are supported on almost
all field types, with the __notable exception of `analyzed` string fields__.

All fields which support doc values have them enabled by default. If you are
sure that you don't need to sort or aggregate on a field, or access the field
value from a script, you can disable doc values in order to save disk space:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { <1>
          "type": "string",
          "index": "not_analyzed"
        },
        "session_id": { <2>
          "type": "string",
          "index": "not_analyzed",
          "doc_values": false
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `status_code` field has `doc_values` enabled by default.
<2> The `session_id` has `doc_values` disabled, but can still be queried.

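With the mapping above, sorting and aggregating on `status_code` are served
from the on-disk doc values structures; a minimal usage sketch:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "sort": [ "status_code" ], <1>
  "aggs": {
    "status_codes": {
      "terms": { "field": "status_code" } <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Both the sort and the `terms` aggregation read the document-to-terms mapping from doc values rather than from the inverted index.
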
@@ -0,0 +1,87 @@
[[dynamic]]
=== `dynamic`

By default, fields can be added _dynamically_ to a document, or to
<<object,inner objects>> within a document, just by indexing a document
containing the new field. For instance:

[source,js]
--------------------------------------------------
DELETE my_index <1>

PUT my_index/my_type/1 <2>
{
  "username": "johnsmith",
  "name": {
    "first": "John",
    "last": "Smith"
  }
}

GET my_index/_mapping <3>

PUT my_index/my_type/2 <4>
{
  "username": "marywhite",
  "email": "mary@white.com",
  "name": {
    "first": "Mary",
    "middle": "Alice",
    "last": "White"
  }
}

GET my_index/_mapping <5>
--------------------------------------------------
// AUTOSENSE
<1> First delete the index, in case it already exists.
<2> This document introduces the string field `username`, the object field
    `name`, and two string fields under the `name` object which can be
    referred to as `name.first` and `name.last`.
<3> Check the mapping to verify the above.
<4> This document adds two string fields: `email` and `name.middle`.
<5> Check the mapping to verify the changes.

The details of how new fields are detected and added to the mapping are explained in <<dynamic-mapping>>.

The `dynamic` setting controls whether new fields can be added dynamically or
not. It accepts three settings:

[horizontal]
`true`:: Newly detected fields are added to the mapping. (default)
`false`:: Newly detected fields are ignored. New fields must be added explicitly.
`strict`:: If new fields are detected, an exception is thrown and the document is rejected.

The `dynamic` setting may be set at the mapping type level, and on each
<<object,inner object>>. Inner objects inherit the setting from their parent
object or from the mapping type. For instance:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": false, <1>
      "properties": {
        "user": { <2>
          "properties": {
            "name": {
              "type": "string"
            },
            "social_networks": { <3>
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Dynamic mapping is disabled at the type level, so no new top-level fields will be added dynamically.
<2> The `user` object inherits the type-level setting.
<3> The `user.social_networks` object enables dynamic mapping, so new fields may be added to this inner object.

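Continuing the example, a document indexed against this mapping behaves as
follows (the field values are illustrative):

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
  "user": {
    "name": "John Smith",
    "age": 32, <1>
    "social_networks": {
      "twitter": "@johnsmith" <2>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> `user.age` is kept in `_source` but, because the `user` object inherits `"dynamic": false`, it is not added to the mapping and is not indexed.
<2> `user.social_networks.twitter` is dynamically added to the mapping and indexed.
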
@@ -0,0 +1,94 @@
[[enabled]]
=== `enabled`

Elasticsearch tries to index all of the fields you give it, but sometimes you
want to just store the field without indexing it. For instance, imagine that
you are using Elasticsearch as a web session store. You may want to index the
session ID and last update time, but you don't need to query or run
aggregations on the session data itself.

The `enabled` setting, which can be applied only to the mapping type and to
<<object,`object`>> fields, causes Elasticsearch to skip parsing of the
contents of the field entirely. The JSON can still be retrieved from the
<<mapping-source-field,`_source`>> field, but it is not searchable or stored
in any other way:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "session": {
      "properties": {
        "user_id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "last_updated": {
          "type": "date"
        },
        "session_data": { <1>
          "enabled": false
        }
      }
    }
  }
}

PUT my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": { <2>
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

PUT my_index/session/session_2
{
  "user_id": "jpountz",
  "session_data": "none", <3>
  "last_updated": "2015-12-06T18:22:13"
}
--------------------------------------------------
// AUTOSENSE
<1> The `session_data` field is disabled.
<2> Any arbitrary data can be passed to the `session_data` field as it will be entirely ignored.
<3> The `session_data` will also ignore values that are not JSON objects.

The entire mapping type may be disabled as well, in which case the document is
stored in the <<mapping-source-field,`_source`>> field, which means it can be
retrieved, but none of its contents are indexed in any way:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "session": { <1>
      "enabled": false
    }
  }
}

PUT my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

GET my_index/session/session_1 <2>

GET my_index/_mapping <3>
--------------------------------------------------
// AUTOSENSE
<1> The entire `session` mapping type is disabled.
<2> The document can be retrieved.
<3> Checking the mapping reveals that no fields have been added.

@@ -0,0 +1,225 @@
[[fielddata]]
=== `fielddata`

Most fields are <<mapping-index,indexed>> by default, which makes them
searchable. The inverted index allows queries to look up the search term in a
unique sorted list of terms, and from that immediately have access to the list
of documents that contain the term.

Sorting, aggregations, and access to field values in scripts require a
different data access pattern. Instead of looking up the term and finding
documents, we need to be able to look up the document and find the terms that
it has in a field.

Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
this type of data access pattern, but `analyzed` string fields do not support
`doc_values`.

Instead, `analyzed` strings use a query-time data structure called
`fielddata`. This data structure is built on demand the first time that a
field is used for aggregations, sorting, or is accessed in a script. It is built
by reading the entire inverted index for each segment from disk, inverting the
term ↔︎ document relationship, and storing the result in memory, in the
JVM heap.

Loading fielddata is an expensive process so, once it has been loaded, it
remains in memory for the lifetime of the segment.

[WARNING]
.Fielddata can fill up your heap space
==============================================================================
Fielddata can consume a lot of heap space, especially when loading high
cardinality `analyzed` string fields. Most of the time, it doesn't make sense
to sort or aggregate on `analyzed` string fields (with the notable exception
of the
<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
aggregation). Always think about whether a `not_analyzed` field (which can
use `doc_values`) would be a better fit for your use case.
==============================================================================

[[fielddata-format]]
==== `fielddata.format`

For `analyzed` string fields, the fielddata `format` controls whether
fielddata should be enabled or not. It accepts: `disabled` and `paged_bytes`
(enabled, which is the default). To disable fielddata loading, you can use
the following mapping:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "fielddata": {
            "format": "disabled" <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field cannot be used for sorting, aggregations, or in scripts.

.Fielddata and other datatypes
[NOTE]
==================================================

Historically, other field datatypes also used fielddata, but this has been replaced
by index-time, disk-based <<doc-values,`doc_values`>>.

==================================================


[[fielddata-loading]]
==== `fielddata.loading`

This per-field setting controls when fielddata is loaded into memory. It
accepts three options:

[horizontal]
`lazy`::

    Fielddata is only loaded into memory when it is needed. (default)

`eager`::

    Fielddata is loaded into memory before a new search segment becomes
    visible to search. This can reduce the latency that a user may experience
    if their search request has to trigger lazy loading from a big segment.

`eager_global_ordinals`::

    Loading fielddata into memory is only part of the work that is required.
    After loading the fielddata for each segment, Elasticsearch builds the
    <<global-ordinals>> data structure to make a list of all unique terms
    across all the segments in a shard. By default, global ordinals are built
    lazily. If the field has a very high cardinality, global ordinals may
    take some time to build, in which case you can use eager loading instead.
    A minimal sketch follows the sidebar below.

[[global-ordinals]]
.Global ordinals
*****************************************

Global ordinals is a data-structure on top of fielddata and doc values, that
maintains an incremental numbering for each unique term in a lexicographic
order. Each term has a unique number, and the number of term 'A' is lower than
the number of term 'B' when 'A' sorts before 'B'. Global ordinals are only
supported on string fields.

Fielddata and doc values also have ordinals, which is a unique numbering for
all terms in a particular segment and field. Global ordinals just build on top
of this, by providing a mapping between the segment ordinals and the global
ordinals, the latter being unique across the entire shard.

Global ordinals are used for features that use segment ordinals, such as
sorting and the terms aggregation, to improve the execution time. A terms
aggregation relies purely on global ordinals to perform the aggregation at the
shard level, then converts global ordinals to the real term only for the final
reduce phase, which combines results from different shards.

Global ordinals for a specified field are tied to _all the segments of a
shard_, while fielddata and doc values ordinals are tied to a single segment.
For this reason global ordinals need to be entirely rebuilt whenever a new
segment becomes visible.

The loading time of global ordinals depends on the number of terms in a field,
but in general it is low, since the source field data has already been loaded.
The memory overhead of global ordinals is small because they are very
efficiently compressed. Eager loading of global ordinals can move the loading
time from the first search request to the refresh itself.

*****************************************

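For instance, to move the global ordinals build for a high-cardinality field
from the first search request to the refresh, the `loading` setting can be
updated on a live index; a minimal sketch with an illustrative `tag` field:

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tag": {
      "type": "string",
      "fielddata": {
        "loading": "eager_global_ordinals" <1>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Fielddata and global ordinals for `tag` are now built before a new segment becomes visible to search, rather than on the first query that needs them.
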
[[field-data-filtering]]
==== `fielddata.filter`

Fielddata filtering can be used to reduce the number of terms loaded into
memory, and thus reduce memory usage. Terms can be filtered by _frequency_ or
by _regular expression_, or a combination of the two:

Filtering by frequency::
+
--

The frequency filter allows you to only load terms whose term frequency falls
between a `min` and `max` value, which can be expressed as an absolute number
(when the number is bigger than 1.0) or as a percentage (eg `0.01` is `1%` and
`1.0` is `100%`). Frequency is calculated *per segment*. Percentages are based
on the number of docs which have a value for the field, as opposed to all docs
in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tag": {
          "type": "string",
          "fielddata": {
            "filter": {
              "frequency": {
                "min": 0.001,
                "max": 0.1,
                "min_segment_size": 500
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
--

Filtering by regex::
+
--
Terms can also be filtered by regular expression - only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "tweet": {
          "type": "string",
          "analyzer": "whitespace",
          "fielddata": {
            "filter": {
              "regex": {
                "pattern": "^#.*"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
--

These filters can be updated on an existing field mapping and will take
effect the next time the fielddata for a segment is loaded. Use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.

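A minimal sketch of that two-step update: change the filter in place, then
drop the already-loaded fielddata so the new filter is applied on the next
load (the tightened `min` value is illustrative):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "tag": {
      "type": "string",
      "fielddata": {
        "filter": {
          "frequency": {
            "min": 0.01
          }
        }
      }
    }
  }
}

POST my_index/_cache/clear?fielddata=true <1>
--------------------------------------------------
// AUTOSENSE
<1> Clearing the fielddata cache evicts the entries that were loaded under the old filter.
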
@@ -0,0 +1,281 @@
[[mapping-date-format]]
=== `format`

In JSON documents, dates are represented as strings. Elasticsearch uses a set
of preconfigured formats to recognize and parse these strings into a long
value representing _milliseconds-since-the-epoch_ in UTC.

Besides the <<built-in-date-formats,built-in formats>>, your own
<<custom-date-formats,custom formats>> can be specified using the familiar
`yyyy/MM/dd` syntax:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

Many APIs which support date values also support <<date-math,date math>>
expressions, such as `now-1m/d` -- the current time, minus one month, rounded
down to the nearest day.

[[custom-date-formats]]
==== Custom date formats

Completely customizable date formats are supported. The syntax for these is explained
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[in the Joda docs].

[[built-in-date-formats]]
==== Built In Formats

Most of the formats below have a `strict` companion format, which means that
the year, month and day parts of the date must have leading zeros in order
to be valid. This means that a date like `5/11/1` would not be valid; you
would need to specify the full date, which would be `2005/11/01` in this
example. So instead of `date_optional_time` you would need to specify
`strict_date_optional_time`.

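For example, a minimal sketch of the strict variant in action (the dates are
illustrative):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "strict_date_optional_time"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "date": "2005-11-01" } <1>

PUT my_index/my_type/2
{ "date": "2005-11-1" } <2>
--------------------------------------------------
// AUTOSENSE
<1> Accepted: every date part uses leading zeros.
<2> Rejected by the `strict` format: the day part is missing its leading zero.
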
The following table lists all the default ISO formats supported:

`epoch_millis`::

    A formatter for the number of milliseconds since the epoch. Note that
    this timestamp allows a max length of 13 chars, so dates before the year
    1653 and after 2286 are not supported. You should use a different date
    formatter in that case.

`epoch_second`::

    A formatter for the number of seconds since the epoch. Note that this
    timestamp allows a max length of 10 chars, so dates before the year 1653
    and after 2286 are not supported. You should use a different date
    formatter in that case.

[[strict-date-time]]`date_optional_time` or `strict_date_optional_time`::

    A generic ISO datetime parser where the date is mandatory and the time is
    optional.
    http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[Full details here].

`basic_date`::

    A basic formatter for a full date as four digit year, two digit month of
    year, and two digit day of month: `yyyyMMdd`.

`basic_date_time`::

    A basic formatter that combines a basic date and time, separated by a 'T':
    `yyyyMMdd'T'HHmmss.SSSZ`.

`basic_date_time_no_millis`::

    A basic formatter that combines a basic date and time without millis,
    separated by a 'T': `yyyyMMdd'T'HHmmssZ`.

`basic_ordinal_date`::

    A formatter for a full ordinal date, using a four digit year and three
    digit dayOfYear: `yyyyDDD`.

`basic_ordinal_date_time`::

    A formatter for a full ordinal date and time, using a four digit year and
    three digit dayOfYear: `yyyyDDD'T'HHmmss.SSSZ`.

`basic_ordinal_date_time_no_millis`::

    A formatter for a full ordinal date and time without millis, using a four
    digit year and three digit dayOfYear: `yyyyDDD'T'HHmmssZ`.

`basic_time`::

    A basic formatter for a two digit hour of day, two digit minute of hour,
    two digit second of minute, three digit millis, and time zone offset:
    `HHmmss.SSSZ`.

`basic_time_no_millis`::

    A basic formatter for a two digit hour of day, two digit minute of hour,
    two digit second of minute, and time zone offset: `HHmmssZ`.

`basic_t_time`::

    A basic formatter for a two digit hour of day, two digit minute of hour,
    two digit second of minute, three digit millis, and time zone offset
    prefixed by 'T': `'T'HHmmss.SSSZ`.

`basic_t_time_no_millis`::

    A basic formatter for a two digit hour of day, two digit minute of hour,
    two digit second of minute, and time zone offset prefixed by 'T':
    `'T'HHmmssZ`.

`basic_week_date` or `strict_basic_week_date`::

    A basic formatter for a full date as four digit weekyear, two digit week
    of weekyear, and one digit day of week: `xxxx'W'wwe`.

`basic_week_date_time` or `strict_basic_week_date_time`::

    A basic formatter that combines a basic weekyear date and time, separated
    by a 'T': `xxxx'W'wwe'T'HHmmss.SSSZ`.

`basic_week_date_time_no_millis` or `strict_basic_week_date_time_no_millis`::

    A basic formatter that combines a basic weekyear date and time without
    millis, separated by a 'T': `xxxx'W'wwe'T'HHmmssZ`.

`date` or `strict_date`::

    A formatter for a full date as four digit year, two digit month of year,
    and two digit day of month: `yyyy-MM-dd`.

`date_hour` or `strict_date_hour`::

    A formatter that combines a full date and two digit hour of day.

`date_hour_minute` or `strict_date_hour_minute`::

    A formatter that combines a full date, two digit hour of day, and two
    digit minute of hour.

`date_hour_minute_second` or `strict_date_hour_minute_second`::

    A formatter that combines a full date, two digit hour of day, two digit
    minute of hour, and two digit second of minute.

`date_hour_minute_second_fraction` or `strict_date_hour_minute_second_fraction`::

    A formatter that combines a full date, two digit hour of day, two digit
    minute of hour, two digit second of minute, and three digit fraction of
    second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.

`date_hour_minute_second_millis` or `strict_date_hour_minute_second_millis`::

    A formatter that combines a full date, two digit hour of day, two digit
    minute of hour, two digit second of minute, and three digit fraction of
    second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.

`date_time` or `strict_date_time`::

    A formatter that combines a full date and time, separated by a 'T':
    `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`.

`date_time_no_millis` or `strict_date_time_no_millis`::

    A formatter that combines a full date and time without millis, separated
    by a 'T': `yyyy-MM-dd'T'HH:mm:ssZZ`.

`hour` or `strict_hour`::

    A formatter for a two digit hour of day.

`hour_minute` or `strict_hour_minute`::

    A formatter for a two digit hour of day and two digit minute of hour.

`hour_minute_second` or `strict_hour_minute_second`::

    A formatter for a two digit hour of day, two digit minute of hour, and two
    digit second of minute.

`hour_minute_second_fraction` or `strict_hour_minute_second_fraction`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.

`hour_minute_second_millis` or `strict_hour_minute_second_millis`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.

`ordinal_date` or `strict_ordinal_date`::

    A formatter for a full ordinal date, using a four digit year and three
    digit dayOfYear: `yyyy-DDD`.

`ordinal_date_time` or `strict_ordinal_date_time`::

    A formatter for a full ordinal date and time, using a four digit year and
    three digit dayOfYear: `yyyy-DDD'T'HH:mm:ss.SSSZZ`.

`ordinal_date_time_no_millis` or `strict_ordinal_date_time_no_millis`::

    A formatter for a full ordinal date and time without millis, using a four
    digit year and three digit dayOfYear: `yyyy-DDD'T'HH:mm:ssZZ`.

`time` or `strict_time`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, three digit fraction of second, and time zone
    offset: `HH:mm:ss.SSSZZ`.

`time_no_millis` or `strict_time_no_millis`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, and time zone offset: `HH:mm:ssZZ`.

`t_time` or `strict_t_time`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, three digit fraction of second, and time zone
    offset prefixed by 'T': `'T'HH:mm:ss.SSSZZ`.

`t_time_no_millis` or `strict_t_time_no_millis`::

    A formatter for a two digit hour of day, two digit minute of hour, two
    digit second of minute, and time zone offset prefixed by 'T': `'T'HH:mm:ssZZ`.

`week_date` or `strict_week_date`::

    A formatter for a full date as four digit weekyear, two digit week of
    weekyear, and one digit day of week: `xxxx-'W'ww-e`.

`week_date_time` or `strict_week_date_time`::

    A formatter that combines a full weekyear date and time, separated by a
    'T': `xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ`.

`week_date_time_no_millis` or `strict_week_date_time_no_millis`::

    A formatter that combines a full weekyear date and time without millis,
    separated by a 'T': `xxxx-'W'ww-e'T'HH:mm:ssZZ`.

`weekyear` or `strict_weekyear`::

    A formatter for a four digit weekyear.

`weekyear_week` or `strict_weekyear_week`::

    A formatter for a four digit weekyear and two digit week of weekyear.

`weekyear_week_day` or `strict_weekyear_week_day`::

    A formatter for a four digit weekyear, two digit week of weekyear, and one
    digit day of week.

`year` or `strict_year`::

    A formatter for a four digit year.

`year_month` or `strict_year_month`::

    A formatter for a four digit year and two digit month of year.

`year_month_day` or `strict_year_month_day`::

    A formatter for a four digit year, two digit month of year, and two digit
    day of month.

@@ -0,0 +1,60 @@
[[geohash-precision]]
=== `geohash_precision`

Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.

The `geohash_precision` setting controls the length of the geohash that is
indexed when the <<geohash,`geohash`>> option is enabled, and the maximum
geohash length when the <<geohash-prefix,`geohash_prefix`>> option is enabled.

It accepts:

* a number between 1 and 12 (default), which represents the length of the geohash.
* a <<distance-units,distance>>, e.g. `1km`.

If a distance is specified, it will be translated to the smallest
geohash-length that will provide the requested resolution.

For example:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": 6 <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

GET my_index/_search?fielddata_fields=location.geohash
{
  "query": {
    "term": {
      "location.geohash": "drm3bt"
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> A `geohash_precision` of 6 equates to geohash cells of approximately 1.26km x 0.6km.
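Alternatively, the precision can be given as a distance. The following is a
sketch of the same mapping as above, with the precision expressed as `1km`;
Elasticsearch translates the distance into the smallest geohash-length that
provides the requested resolution:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": "1km"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE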
@@ -0,0 +1,64 @@
[[geohash-prefix]]
=== `geohash_prefix`

Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.

While the <<geohash,`geohash`>> option enables indexing the geohash that
corresponds to the lat/lon point, at the specified
<<geohash-precision,precision>>, the `geohash_prefix` option also
indexes all the enclosing cells.

For instance, a geohash of `drm3btev3e86` will index all of the following
terms: [ `d`, `dr`, `drm`, `drm3`, `drm3b`, `drm3bt`, `drm3bte`, `drm3btev`,
`drm3btev3`, `drm3btev3e`, `drm3btev3e8`, `drm3btev3e86` ].

The geohash prefixes can be used with the
<<query-dsl-geohash-cell-query,`geohash_cell` query>> to find points within a
particular geohash, or its neighbours:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "geohash_prefix": true,
          "geohash_precision": 6
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

GET my_index/_search?fielddata_fields=location.geohash
{
  "query": {
    "geohash_cell": {
      "location": {
        "lat": 41.02,
        "lon": -71.48
      },
      "precision": 4, <1>
      "neighbors": true <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `precision` and `neighbors` options choose which geohash cell, and optionally its neighbouring cells, to match against.
@@ -0,0 +1,70 @@
[[geohash]]
=== `geohash`

Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.

Because geohashes are just strings, they can be stored in an inverted
index like any other string, which makes querying them very efficient.

If you enable the `geohash` option, a `geohash` ``sub-field'' will be
indexed, e.g. as `.geohash`. The length of the geohash is controlled by the
<<geohash-precision,`geohash_precision`>> parameter.

If the <<geohash-prefix,`geohash_prefix`>> option is enabled, the `geohash`
option will be enabled automatically.

For example:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point", <1>
          "geohash": true
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

GET my_index/_search?fielddata_fields=location.geohash <2>
{
  "query": {
    "prefix": {
      "location.geohash": "drm3b" <3>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> A `location.geohash` field will be indexed for each geo-point.
<2> The geohash can be retrieved with <<doc-values,`doc_values`>>.
<3> A <<query-dsl-prefix-query,`prefix`>> query can find all geohashes which start with a particular prefix.

[WARNING]
============================================

A `prefix` query on geohashes is expensive. Instead, consider using the
<<geohash-prefix,`geohash_prefix`>> option to pay the expense once at index
time instead of on every query.

============================================
@@ -0,0 +1,61 @@
[[ignore-above]]
=== `ignore_above`

Strings longer than the `ignore_above` setting will not be processed by the
<<analyzer,analyzer>> and will not be indexed. This is mainly useful for
<<mapping-index,`not_analyzed`>> string fields, which are typically used for
filtering, aggregations, and sorting. These are structured fields and it
doesn't usually make sense to allow very long terms to be indexed in these
fields.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 20 <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1 <2>
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 <3>
{
  "message": "Syntax error with some long stacktrace"
}

GET _search <4>
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This field will ignore any string longer than 20 characters.
<2> This document is indexed successfully.
<3> This document will be indexed, but without indexing the `message` field.
<4> Search returns both documents, but only the first is present in the terms aggregation.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`.

NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
3 bytes.
@@ -0,0 +1,83 @@
[[ignore-malformed]]
=== `ignore_malformed`

Sometimes you don't have much control over the data that you receive. One
user may send a `login` field that is a <<date,`date`>>, and another sends a
`login` field that is an email address.

Trying to index the wrong datatype into a field throws an exception by
default, and rejects the whole document. The `ignore_malformed` parameter, if
set to `true`, allows the exception to be ignored. The malformed field is not
indexed, but other fields in the document are processed normally.

For example:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "ignore_malformed": true
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Some text value",
  "number_one": "foo" <1>
}

PUT my_index/my_type/2
{
  "text": "Some text value",
  "number_two": "foo" <2>
}
--------------------------------------------------
// AUTOSENSE
<1> This document will be rejected because `number_one` does not allow malformed values.
<2> This document will have the `text` field indexed, but not the `number_two` field.

[[ignore-malformed-setting]]
==== Index-level default

The `index.mapping.ignore_malformed` setting can be set on the index level to
allow malformed content to be ignored globally across all mapping types.

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "index.mapping.ignore_malformed": true <1>
  },
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": { <1>
          "type": "byte"
        },
        "number_two": {
          "type": "integer",
          "ignore_malformed": false <2>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

<1> The `number_one` field inherits the index-level setting.
<2> The `number_two` field overrides the index-level setting to turn off `ignore_malformed`.
@@ -0,0 +1,83 @@
[[include-in-all]]
=== `include_in_all`

The `include_in_all` parameter provides per-field control over which fields
are included in the <<mapping-all-field,`_all`>> field. It defaults to `true`,
unless <<mapping-index,`index`>> is set to `no`.

This example demonstrates how to exclude the `date` field from the `_all` field:

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { <1>
          "type": "string"
        },
        "content": { <1>
          "type": "string"
        },
        "date": { <2>
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE

<1> The `title` and `content` fields will be included in the `_all` field.
<2> The `date` field will not be included in the `_all` field.

The `include_in_all` parameter can also be set at the type level and on
<<object,`object`>> or <<nested,`nested`>> fields, in which case all
sub-fields inherit that setting. For instance:

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "include_in_all": false, <1>
      "properties": {
        "title": { "type": "string" },
        "author": {
          "include_in_all": true, <2>
          "properties": {
            "first_name": { "type": "string" },
            "last_name": { "type": "string" }
          }
        },
        "editor": {
          "properties": {
            "first_name": { "type": "string" }, <3>
            "last_name": { "type": "string", "include_in_all": true } <3>
          }
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE

<1> All fields in `my_type` are excluded from `_all`.
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
<3> Only the `editor.last_name` field is included in `_all`.
The `editor.first_name` inherits the type-level setting and is excluded.

[NOTE]
.Multi-fields and `include_in_all`
=================================

The original field value is added to the `_all` field, not the terms produced
by a field's analyzer. For this reason, it makes no sense to set
`include_in_all` to `true` on <<multi-fields,multi-fields>>, as each
multi-field has exactly the same value as its parent.

=================================
@@ -0,0 +1,70 @@
[[index-options]]
=== `index_options`

The `index_options` parameter controls what information is added to the
inverted index, for search and highlighting purposes. It accepts the
following settings:

[horizontal]
`docs`::

    Only the doc number is indexed. Can answer the question _Does this term
    exist in this field?_

`freqs`::

    Doc number and term frequencies are indexed. Term frequencies are used to
    score repeated terms higher than single terms.

`positions`::

    Doc number, term frequencies, and term positions (or order) are indexed.
    Positions can be used for
    <<query-dsl-match-query-phrase,proximity or phrase queries>>.

`offsets`::

    Doc number, term frequencies, positions, and start and end character
    offsets (which map the term back to the original string) are indexed.
    Offsets are used by the <<postings-highlighter,postings highlighter>>.

<<mapping-index,Analyzed>> string fields use `positions` as the default, and
all other fields use `docs` as the default.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `text` field will use the postings highlighter by default because `offsets` are indexed.
@@ -0,0 +1,48 @@
[[mapping-index]]
=== `index`

The `index` option controls how field values are indexed and, thus, how they
are searchable. It accepts three values:

[horizontal]
`no`::

    Do not add this field value to the index. With this setting, the field
    will not be queryable.

`not_analyzed`::

    Add the field value to the index unchanged, as a single term. This is the
    default for all fields that support this option except for
    <<string,`string`>> fields. `not_analyzed` fields are usually used with
    <<term-level-queries,term-level queries>> for structured search.

`analyzed`::

    This option applies only to `string` fields, for which it is the default.
    The string field value is first <<analysis,analyzed>> to convert the
    string into terms (e.g. a list of individual words), which are then
    indexed. At search time, the query string is passed through
    (<<search-analyzer,usually>>) the same analyzer to generate terms
    in the same format as those in the index. It is this process that enables
    <<full-text-queries,full text search>>.

For example, you can create a `not_analyzed` string field with the following:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
@@ -0,0 +1,63 @@
[[lat-lon]]
=== `lat_lon`

<<geo-queries,Geo-queries>> are usually performed by plugging the value of
each <<geo-point,`geo_point`>> field into a formula to determine whether it
falls into the required area or not. Unlike most queries, the inverted index
is not involved.

Setting `lat_lon` to `true` causes the latitude and longitude values to be
indexed as numeric fields (called `.lat` and `.lon`). These fields can be used
by the <<query-dsl-geo-bounding-box-query,`geo_bounding_box`>> and
<<query-dsl-geo-distance-query,`geo_distance`>> queries instead of
performing in-memory calculations.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "lat_lon": true <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

GET my_index/_search
{
  "query": {
    "geo_distance": {
      "location": {
        "lat": 41,
        "lon": -71
      },
      "distance": "50km",
      "optimize_bbox": "indexed" <2>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Setting `lat_lon` to `true` indexes the geo-point in the `location.lat` and `location.lon` fields.
<2> The `indexed` option tells the geo-distance query to use the inverted index instead of the in-memory calculation.

Whether the in-memory or indexed operation performs better depends both on
your dataset and on the types of queries that you are running.

NOTE: The `lat_lon` option only makes sense for single-value `geo_point`
fields. It will not work with arrays of geo-points.
@@ -0,0 +1,132 @@
[[multi-fields]]
=== `fields`

It is often useful to index the same field in different ways for different
purposes. This is the purpose of _multi-fields_. For instance, a `string`
field could be <<mapping-index,indexed>> as an `analyzed` field for full-text
search, and as a `not_analyzed` field for sorting or aggregations:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "string",
          "fields": {
            "raw": { <1>
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

PUT /my_index/my_type/1
{
  "city": "New York"
}

PUT /my_index/my_type/2
{
  "city": "York"
}

GET /my_index/_search
{
  "query": {
    "match": {
      "city": "york" <2>
    }
  },
  "sort": {
    "city.raw": "asc" <3>
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" <3>
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `city.raw` field is a `not_analyzed` version of the `city` field.
<2> The analyzed `city` field can be used for full text search.
<3> The `city.raw` field can be used for sorting and aggregations.

NOTE: Multi-fields do not change the original `_source` field.

==== Multi-fields with multiple analyzers

Another use case of multi-fields is to analyze the same field in different
ways for better relevance. For instance we could index a field with the
<<analysis-standard-analyzer,`standard` analyzer>> which breaks text up into
words, and again with the <<english-analyzer,`english` analyzer>>
which stems words into their root form:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { <1>
          "type": "string",
          "fields": {
            "english": { <2>
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "text": "quick brown fox" } <3>

PUT my_index/my_type/2
{ "text": "quick brown foxes" } <3>

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown foxes",
      "fields": [ <4>
        "text",
        "text.english"
      ],
      "type": "most_fields" <4>
    }
  }
}
--------------------------------------------------
// AUTOSENSE

<1> The `text` field uses the `standard` analyzer.
<2> The `text.english` field uses the `english` analyzer.
<3> Index two documents, one with `fox` and the other with `foxes`.
<4> Query both the `text` and `text.english` fields and combine the scores.

The `text` field contains the term `fox` in the first document and `foxes` in
the second document. The `text.english` field contains `fox` for both
documents, because `foxes` is stemmed to `fox`.

The query string is also analyzed by the `standard` analyzer for the `text`
field, and by the `english` analyzer for the `text.english` field. The
stemmed field allows a query for `foxes` to also match the document containing
just `fox`. This allows us to match as many documents as possible. By also
querying the unstemmed `text` field, we improve the relevance score of the
document which matches `foxes` exactly.
@@ -0,0 +1,64 @@
[[norms]]
=== `norms`

Norms store various normalization factors -- a number to represent the
relative field length and the <<index-boost,index time `boost`>> setting --
that are later used at query time in order to compute the score of a document
relative to a query.

Although useful for scoring, norms also require quite a lot of memory
(typically in the order of one byte per document per field in your index, even
for documents that don't have this specific field). As a consequence, if you
don't need scoring on a specific field, you should disable norms on that
field. In particular, this is the case for fields that are used solely for
filtering or aggregations.

Norms can be disabled (but not reenabled) after the fact, using the
<<indices-put-mapping,PUT mapping API>> like so:

[source,js]
------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "norms": {
        "enabled": false
      }
    }
  }
}
------------
// AUTOSENSE

NOTE: Norms will not be removed instantly, but will be removed as old segments
are merged into new segments as you continue indexing new documents. Any score
computation on a field that has had norms removed might return inconsistent
results since some documents won't have norms anymore while other documents
might still have norms.

==== Lazy loading of norms

Norms can be loaded into memory eagerly (`eager`), whenever a new segment
comes online, or they can be loaded lazily (`lazy`, default), only when the
field is queried.

Eager loading can be configured as follows:

[source,js]
------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "norms": {
        "loading": "eager"
      }
    }
  }
}
------------
// AUTOSENSE
@@ -0,0 +1,58 @@
[[null-value]]
=== `null_value`

A `null` value cannot be indexed or searched. When a field is set to `null`
(or an empty array or an array of `null` values), it is treated as though that
field has no values.

The `null_value` parameter allows you to replace explicit `null` values with
the specified value so that it can be indexed and searched. For instance:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": {
          "type": "string",
          "index": "not_analyzed",
          "null_value": "NULL" <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "status_code": null
}

PUT my_index/my_type/2
{
  "status_code": [] <2>
}

GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" <3>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Replace explicit `null` values with the term `NULL`.
<2> An empty array does not contain an explicit `null`, and so won't be replaced with the `null_value`.
<3> A query for `NULL` returns document 1, but not document 2.

IMPORTANT: The `null_value` needs to be the same datatype as the field. For
instance, a `long` field cannot have a string `null_value`. String fields
which are `analyzed` will also pass the `null_value` through the configured
analyzer.

Also see the <<query-dsl-missing-query,`missing` query>> for its `null_value` support.
@@ -0,0 +1,68 @@
[[position-offset-gap]]
=== `position_offset_gap`

<<mapping-index,Analyzed>> string fields take term <<index-options,positions>>
into account, in order to be able to support
<<query-dsl-match-query-phrase,proximity or phrase queries>>.
When indexing an array of strings, each string of the array is indexed
directly after the previous one, almost as though all the strings in the array
had been concatenated into one big string.

This can result in matches from phrase queries spanning two array elements.
For instance:

[source,js]
--------------------------------------------------
PUT /my_index/groups/1
{
  "names": [ "John Abraham", "Lincoln Smith"]
}

GET /my_index/groups/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln" <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> This phrase query matches our document, even though `Abraham` and `Lincoln` are in separate strings.

The `position_offset_gap` can introduce a fake gap between each array element. For instance:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "groups": {
      "properties": {
        "names": {
          "type": "string",
          "position_offset_gap": 50 <1>
        }
      }
    }
  }
}

PUT /my_index/groups/1
{
  "names": [ "John Abraham", "Lincoln Smith"]
}

GET /my_index/groups/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln" <2>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The first term in the next array element will be 50 terms apart from the
last term in the previous array element.
<2> The phrase query no longer matches our document.
@@ -0,0 +1,56 @@
[[precision-step]]
=== `precision_step`

Most <<number,numeric>> datatypes index extra terms representing numeric
ranges for each number to make <<query-dsl-range-query,`range` queries>>
faster. For instance, this `range` query:

[source,js]
--------------------------------------------------
"range": {
  "number": {
    "gte": 0,
    "lte": 321
  }
}
--------------------------------------------------

might be executed internally as a <<query-dsl-terms-query,`terms` query>> that
looks something like this:

[source,js]
--------------------------------------------------
"terms": {
  "number": [
    "0-255",
    "256-319",
    "320",
    "321"
  ]
}
--------------------------------------------------

These extra terms greatly reduce the number of terms that have to be examined,
at the cost of increased disk space.

The default value for `precision_step` depends on the `type` of the numeric field:

[horizontal]
`long`, `double`, `date`, `ip`:: `16` (3 extra terms)
`integer`, `float`, `short`:: `8` (3 extra terms)
`byte`:: `2147483647` (0 extra terms)
`token_count`:: `32` (0 extra terms)

The value of the `precision_step` setting indicates the number of bits that
should be compressed into an extra term. A `long` value consists of 64 bits,
so a `precision_step` of 16 results in the following terms:

[horizontal]
Bits 0-15:: `value & 1111111111111111 0000000000000000 0000000000000000 0000000000000000`
Bits 0-31:: `value & 1111111111111111 1111111111111111 0000000000000000 0000000000000000`
Bits 0-47:: `value & 1111111111111111 1111111111111111 1111111111111111 0000000000000000`
Bits 0-63:: `value`
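As a minimal sketch of how this setting is applied (the field name is
illustrative), a smaller `precision_step` can be set explicitly in the
mapping to index more range terms per value:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number": {
          "type": "long",
          "precision_step": 8 <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Instead of the default `16`, every 8 bits are compressed into an extra term, producing 7 extra terms per value: faster range queries at the cost of more disk space.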
@@ -0,0 +1,101 @@
[[properties]]
=== `properties`

Type mappings, <<object,`object` fields>> and <<nested,`nested` fields>>
contain sub-fields, called `properties`. These properties may be of any
<<mapping-types,datatype>>, including `object` and `nested`. Properties can
be added:

* explicitly by defining them when <<indices-create-index,creating an index>>.
* explicitly by defining them when adding or updating a mapping type with the <<indices-put-mapping,PUT mapping>> API.
* <<dynamic-mapping,dynamically>> just by indexing documents containing new fields.

Below is an example of adding `properties` to a mapping type, an `object`
field, and a `nested` field:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": { <1>
      "properties": {
        "manager": { <2>
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "string" }
          }
        },
        "employees": { <3>
          "type": "nested",
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "string" }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1 <4>
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}
--------------------------------------------------
// AUTOSENSE
<1> Properties under the `my_type` mapping type.
<2> Properties under the `manager` object field.
<3> Properties under the `employees` nested field.
<4> An example document which corresponds to the above mapping.

==== Dot notation

Inner fields can be referred to in queries, aggregations, etc., using _dot
notation_:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "manager.name": "Alice White" <1>
    }
  },
  "aggs": {
    "Employees": {
      "nested": {
        "path": "employees"
      },
      "aggs": {
        "Employee Ages": {
          "histogram": {
            "field": "employees.age", <2>
            "interval": 5
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `manager.name` inner field is referred to by its full path.
<2> The `employees.age` inner field is referred to by its full path.

IMPORTANT: The full path to the inner field must be specified.
@@ -0,0 +1,79 @@
[[search-analyzer]]
=== `search_analyzer`

Usually, the same <<analyzer,analyzer>> should be applied at index time and at
search time, to ensure that the terms in the query are in the same format as
the terms in the inverted index.

Sometimes, though, it can make sense to use a different analyzer at search
time, such as when using the <<analysis-edgengram-tokenizer,`edge_ngram`>>
tokenizer for autocomplete.

By default, queries will use the `analyzer` defined in the field mapping, but
this can be overridden with the `search_analyzer` setting:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "autocomplete", <2>
          "search_analyzer": "standard" <2>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick Brown Fox" <3>
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br", <4>
        "operator": "and"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

<1> Analysis settings to define the custom `autocomplete` analyzer.
<2> The `text` field uses the `autocomplete` analyzer at index time, but the `standard` analyzer at search time.
<3> This field is indexed as the terms: [ `q`, `qu`, `qui`, `quic`, `quick`, `b`, `br`, `bro`, `brow`, `brown`, `f`, `fo`, `fox` ]
<4> The query searches for both of these terms: [ `quick`, `br` ]

See {defguide}/_index_time_search_as_you_type.html[Index time search-as-you-type]
for a full explanation of this example.
@@ -0,0 +1,54 @@
[[similarity]]
=== `similarity`

Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
field. The `similarity` setting provides a simple way of choosing a similarity
algorithm other than the default TF/IDF, such as `BM25`.

Similarities are mostly useful for <<string,`string`>> fields, especially
`analyzed` string fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in
similarities. For more details about these expert options, see the
<<index-modules-similarity,similarity module>>.

The only similarities which can be used out of the box, without any further
configuration, are:

`default`::
    The default TF/IDF algorithm used by Elasticsearch and
    Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
    for more information.

`BM25`::
    The Okapi BM25 algorithm.
    See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
    for more information.

The `similarity` can be set on the field level when a field is first created,
as follows:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": { <1>
          "type": "string"
        },
        "bm25_field": {
          "type": "string",
          "similarity": "BM25" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `default_field` uses the `default` similarity (i.e. TF/IDF).
<2> The `bm25_field` uses the `BM25` similarity.
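Beyond the two out-of-the-box choices, a tuned similarity can be declared in
the index settings and then referenced by name from the mapping. This is a
sketch only: the similarity name `my_bm25` is an arbitrary example, and `b` is
one of the BM25 parameters documented in the
<<index-modules-similarity,similarity module>>:

[source,js]
--------------------------------------------------
PUT my_index
{
  "settings": {
    "similarity": {
      "my_bm25": { <1>
        "type": "BM25",
        "b": 0
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "similarity": "my_bm25" <2>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Define a custom similarity called `my_bm25`, based on `BM25` with length normalization disabled (`b` set to `0`).
<2> The `title` field uses the custom `my_bm25` similarity.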
@@ -0,0 +1,73 @@
[[mapping-store]]
=== `store`

By default, field values are <<mapping-index,indexed>> to make them searchable,
but they are not _stored_. This means that the field can be queried, but the
original field value cannot be retrieved.

Usually this doesn't matter. The field value is already part of the
<<mapping-source-field,`_source` field>>, which is stored by default. If you
only want to retrieve the value of a single field or of a few fields, instead
of the whole `_source`, then this can be achieved with
<<search-request-source-filtering,source filtering>>.

In certain situations it can make sense to `store` a field. For instance, if
you have a document with a `title`, a `date`, and a very large `content`
field, you may want to retrieve just the `title` and the `date` without having
to extract those fields from a large `_source` field:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "store": true <1>
        },
        "date": {
          "type": "date",
          "store": true <1>
        },
        "content": {
          "type": "string"
        }
      }
    }
  }
}

PUT /my_index/my_type/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

GET my_index/_search
{
  "fields": [ "title", "date" ] <2>
}
--------------------------------------------------
// AUTOSENSE
<1> The `title` and `date` fields are stored.
<2> This request will retrieve the values of the `title` and `date` fields.

[NOTE]
.Stored fields returned as arrays
======================================

For consistency, stored fields are always returned as an _array_ because there
is no way of knowing if the original field value was a single value, multiple
values, or an empty array.

If you need the original value, you should retrieve it from the `_source`
field instead.

======================================

Another situation where it can make sense to make a field stored is for those
that don't appear in the `_source` field (such as <<copy-to,`copy_to` fields>>).
@@ -0,0 +1,68 @@
[[term-vector]]
=== `term_vector`

Term vectors contain information about the terms produced by the
<<analysis,analysis>> process, including:

* a list of terms.
* the position (or order) of each term.
* the start and end character offsets mapping the term to its
  origin in the original string.

These term vectors can be stored so that they can be retrieved for a
particular document.

The `term_vector` setting accepts:

[horizontal]
`no`:: No term vectors are stored. (default)
`yes`:: Just the terms in the field are stored.
`with_positions`:: Terms and positions are stored.
`with_offsets`:: Terms and character offsets are stored.
`with_positions_offsets`:: Terms, positions, and character offsets are stored.

The fast vector highlighter requires `with_positions_offsets`. The term
vectors API can retrieve whatever is stored.

WARNING: Setting `with_positions_offsets` will double the size of a field's
index.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The fast vector highlighter will be used by default for the `text` field
because term vectors are enabled.
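The stored term vectors can then be inspected with the term vectors API. As a
sketch, reusing the document indexed above (assuming the `_termvectors`
endpoint of this Elasticsearch version):

[source,js]
--------------------------------------------------
GET my_index/my_type/1/_termvectors
{
  "fields": [ "text" ],
  "positions": true,
  "offsets": true
}
--------------------------------------------------
// AUTOSENSE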
@@ -1,61 +0,0 @@
[[mapping-transform]]
== Transform
The document can be transformed before it is indexed by registering a
script in the `transform` element of the mapping. The result of the
transform is indexed but the original source is stored in the `_source`
field. Example:

[source,js]
--------------------------------------------------
{
  "example" : {
    "transform" : {
      "script" : {
        "inline": "if (ctx._source['title']?.startsWith('t')) ctx._source['suggest'] = ctx._source['content']",
        "params" : {
          "variable" : "not used but an example anyway"
        },
        "lang": "groovy"
      }
    },
    "properties": {
      "title": { "type": "string" },
      "content": { "type": "string" },
      "suggest": { "type": "string" }
    }
  }
}
--------------------------------------------------

It's also possible to specify multiple transforms:
[source,js]
--------------------------------------------------
{
  "example" : {
    "transform" : [
      {"script": "ctx._source['suggest'] = ctx._source['content']"},
      {"script": "ctx._source['foo'] = ctx._source['bar'];"}
    ]
  }
}
--------------------------------------------------

Because the result isn't stored in the source, it can't normally be fetched by
source filtering. It can be highlighted if it is marked as stored.

=== Get Transformed
The get endpoint will retransform the source if the `_source_transform`
parameter is set. Example:

[source,sh]
--------------------------------------------------
curl -XGET "http://localhost:9200/test/example/3?pretty&_source_transform"
--------------------------------------------------

The transform is performed before any source filtering, but it is mostly
designed to make it easy to see what was passed to the index for debugging.

=== Immutable Transformation
Once configured, the transform script cannot be modified. This is not
because that is technically impossible, but instead because madness lies
down that road.
@@ -1,24 +1,71 @@
[[mapping-types]]
== Types
== Field datatypes

The datatype for each field in a document (eg strings, numbers,
objects etc) can be controlled via the type mapping.
Elasticsearch supports a number of different datatypes for the fields in a
document:

include::types/core-types.asciidoc[]
[float]
=== Core datatypes

include::types/array-type.asciidoc[]
<<string>>:: `string`
<<number>>:: `long`, `integer`, `short`, `byte`, `double`, `float`
<<date>>:: `date`
<<boolean>>:: `boolean`
<<binary>>:: `binary`

include::types/object-type.asciidoc[]
[float]
=== Complex datatypes

include::types/root-object-type.asciidoc[]
<<array>>:: Array support does not require a dedicated `type`
<<object>>:: `object` for single JSON objects
<<nested>>:: `nested` for arrays of JSON objects

[float]
=== Geo datatypes

<<geo-point>>:: `geo_point` for lat/lon points
<<geo-shape>>:: `geo_shape` for complex shapes like polygons

[float]
=== Specialised datatypes

<<ip>>:: `ip` for IPv4 addresses
<<search-suggesters-completion,Completion datatype>>::
    `completion` to provide auto-complete suggestions
<<token-count>>:: `token_count` to count the number of tokens in a string

Attachment datatype::

    See the https://github.com/elastic/elasticsearch-mapper-attachments[mapper attachment plugin]
    which supports indexing ``attachments'' like Microsoft Office formats, Open
    Document formats, ePub, HTML, etc. into an `attachment` datatype.

include::types/array.asciidoc[]

include::types/binary.asciidoc[]

include::types/boolean.asciidoc[]

include::types/date.asciidoc[]

include::types/geo-point.asciidoc[]

include::types/geo-shape.asciidoc[]

include::types/ip.asciidoc[]

include::types/nested.asciidoc[]

include::types/numeric.asciidoc[]

include::types/object.asciidoc[]

include::types/string.asciidoc[]

include::types/token-count.asciidoc[]

include::types/nested-type.asciidoc[]

include::types/ip-type.asciidoc[]

include::types/geo-point-type.asciidoc[]

include::types/geo-shape-type.asciidoc[]

include::types/attachment-type.asciidoc[]
@@ -1,69 +0,0 @@
[[mapping-array-type]]
=== Array Type

JSON documents allow defining an array (list) of fields or objects.
Mapping array types could not be simpler since arrays get automatically
detected and mapping them can be done either with
<<mapping-core-types,Core Types>> or
<<mapping-object-type,Object Type>> mappings.
For example, the following JSON defines several arrays:

[source,js]
--------------------------------------------------
{
  "tweet" : {
    "message" : "some arrays in this tweet...",
    "tags" : ["elasticsearch", "wow"],
    "lists" : [
      {
        "name" : "prog_list",
        "description" : "programming list"
      },
      {
        "name" : "cool_list",
        "description" : "cool stuff list"
      }
    ]
  }
}
--------------------------------------------------

The above JSON has the `tags` property defining a list of a simple
`string` type, and the `lists` property is an `object` type array. Here
is a sample explicit mapping:

[source,js]
--------------------------------------------------
{
  "tweet" : {
    "properties" : {
      "message" : {"type" : "string"},
      "tags" : {"type" : "string"},
      "lists" : {
        "properties" : {
          "name" : {"type" : "string"},
          "description" : {"type" : "string"}
        }
      }
    }
  }
}
--------------------------------------------------

That array types are automatically supported is shown by the fact that
the following JSON document is perfectly fine:

[source,js]
--------------------------------------------------
{
  "tweet" : {
    "message" : "some arrays in this tweet...",
    "tags" : "elasticsearch",
    "lists" : {
      "name" : "prog_list",
      "description" : "programming list"
    }
  }
}
--------------------------------------------------
@@ -0,0 +1,99 @@
[[array]]
=== Array datatype

In Elasticsearch, there is no dedicated `array` type. Any field can contain
zero or more values by default; however, all values in the array must be of
the same datatype. For instance:

* an array of strings: [ `"one"`, `"two"` ]
* an array of integers: [ `1`, `2` ]
* an array of arrays: [ `1`, [ `2`, `3` ]] which is the equivalent of [ `1`, `2`, `3` ]
* an array of objects: [ `{ "name": "Mary", "age": 12 }`, `{ "name": "John", "age": 10 }`]

.Arrays of objects
[NOTE]
====================================================

Arrays of objects do not work as you would expect: you cannot query each
object independently of the other objects in the array. If you need to be
able to do this then you should use the <<nested,`nested`>> datatype instead
of the <<object,`object`>> datatype.

This is explained in more detail in <<nested>>.
====================================================

When adding a field dynamically, the first value in the array determines the
field `type`. All subsequent values must be of the same datatype or it must
at least be possible to <<coerce,coerce>> subsequent values to the same
datatype.

Arrays with a mixture of datatypes are _not_ supported: [ `10`, `"some string"` ]

An array may contain `null` values, which are either replaced by the
configured <<null-value,`null_value`>> or skipped entirely. An empty array
`[]` is treated as a missing field -- a field with no values.

Nothing needs to be pre-configured in order to use arrays in documents; they
are supported out of the box:

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
  "message": "some arrays in this document...",
  "tags": [ "elasticsearch", "wow" ], <1>
  "lists": [ <2>
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ]
}

PUT my_index/my_type/2 <3>
{
  "message": "no arrays in this document...",
  "tags": "elasticsearch",
  "lists": {
    "name": "prog_list",
    "description": "programming list"
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "tags": "elasticsearch" <4>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `tags` field is dynamically added as a `string` field.
<2> The `lists` field is dynamically added as an `object` field.
<3> The second document contains no arrays, but can be indexed into the same fields.
<4> The query looks for `elasticsearch` in the `tags` field, and matches both documents.

.Multi-value fields and the inverted index
****************************************************

The fact that all field types support multi-value fields out of the box is a
consequence of the origins of Lucene. Lucene was designed to be a full text
search engine. In order to be able to search for individual words within a
big block of text, Lucene tokenizes the text into individual terms, and
adds each term to the inverted index separately.

This means that even a simple text field must be able to support multiple
values by default. When other datatypes were added, such as numbers and
dates, they used the same data structure as strings, and so got multi-values
for free.

****************************************************
@@ -1,13 +0,0 @@
[[mapping-attachment-type]]
=== Attachment Type

The `attachment` type allows indexing different "attachment" type fields
(encoded as `base64`), for example, Microsoft Office formats, open
document formats, ePub, HTML, and so on.

The `attachment` type is provided as a
https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin
extension]. It uses http://tika.apache.org/[Apache Tika] behind the scenes.

See https://github.com/elasticsearch/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch[README file]
for details.
@@ -0,0 +1,52 @@
[[binary]]
=== Binary datatype

The `binary` type accepts a binary value as a
https://en.wikipedia.org/wiki/Base64[Base64] encoded string. The field is not
stored by default and is not searchable:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "string"
        },
        "blob": {
          "type": "binary"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg==" <1>
}
--------------------------------------------------
<1> The Base64 encoded binary value must not have embedded newlines `\n`.

[[binary-params]]
==== Parameters for `binary` fields

The following parameters are accepted by `binary` fields:

[horizontal]

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` or `false` (default).

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).
@@ -0,0 +1,119 @@
[[boolean]]
=== Boolean datatype

Boolean fields accept JSON `true` and `false` values, but can also accept
strings and numbers which are interpreted as either true or false:

[horizontal]
False values::

    `false`, `"false"`, `"off"`, `"no"`, `"0"`, `""` (empty string), `0`, `0.0`

True values::

    Anything that isn't false.

For example:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "is_published": {
          "type": "boolean"
        }
      }
    }
  }
}

POST my_index/my_type/1
{
  "is_published": true <1>
}

GET my_index/_search
{
  "query": {
    "term": {
      "is_published": 1 <2>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Indexing a document with a JSON `true`.
<2> Querying for the document with `1`, which is interpreted as `true`.

Aggregations like the <<search-aggregations-bucket-terms-aggregation,`terms`
aggregation>> use `1` and `0` for the `key`, and the strings `"true"` and
`"false"` for the `key_as_string`. Boolean fields, when used in scripts,
return `1` and `0`:

[source,js]
--------------------------------------------------
POST my_index/my_type/1
{
  "is_published": true
}

POST my_index/my_type/2
{
  "is_published": false
}

GET my_index/_search
{
  "aggs": {
    "publish_state": {
      "terms": {
        "field": "is_published"
      }
    }
  },
  "script_fields": {
    "is_published": {
      "script": "doc['is_published'].value" <1>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work.

[[boolean-params]]
==== Parameters for `boolean` fields

The following parameters are accepted by `boolean` fields:

[horizontal]

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `not_analyzed` (default) and `no`.

<<null-value,`null_value`>>::

    Accepts any of the true or false values listed above. The value is
    substituted for any explicit `null` values. Defaults to `null`, which
    means the field is treated as missing. See the sketch after this list
    for an example.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).
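
A minimal sketch of `null_value` (the index name `my_index_with_null_value`
and the choice of `false` as the replacement value are illustrative): an
explicit JSON `null` is indexed as if the document had said `false`:

[source,js]
--------------------------------------------------
PUT my_index_with_null_value
{
  "mappings": {
    "my_type": {
      "properties": {
        "is_published": {
          "type": "boolean",
          "null_value": false <1>
        }
      }
    }
  }
}

PUT my_index_with_null_value/my_type/1
{
  "is_published": null <2>
}
--------------------------------------------------
// AUTOSENSE
<1> Replace explicit `null` values with `false`.
<2> This document is indexed as if `is_published` were `false`.
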
@@ -1,649 +0,0 @@
[[mapping-core-types]]
=== Core Types

Each JSON field can be mapped to a specific core type. JSON itself
already provides us with some typing, with its support for `string`,
`integer`/`long`, `float`/`double`, `boolean`, and `null`.

The following sample tweet JSON document will be used to explain the
core types:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "user" : "kimchy",
        "message" : "This is a tweet!",
        "postDate" : "2009-11-15T14:12:12",
        "priority" : 4,
        "rank" : 12.3
    }
}
--------------------------------------------------

Explicit mapping for the above JSON tweet can be:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "user" : {"type" : "string", "index" : "not_analyzed"},
            "message" : {"type" : "string", "null_value" : "na"},
            "postDate" : {"type" : "date"},
            "priority" : {"type" : "integer"},
            "rank" : {"type" : "float"}
        }
    }
}
--------------------------------------------------

[float]
[[string]]
==== String

The text based string type is the most basic type, and contains one or
more characters. An example mapping can be:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "message" : {
                "type" : "string",
                "store" : true,
                "index" : "analyzed",
                "null_value" : "na"
            },
            "user" : {
                "type" : "string",
                "index" : "not_analyzed",
                "norms" : {
                    "enabled" : false
                }
            }
        }
    }
}
--------------------------------------------------

The above mapping defines a `string` `message` property/field within the
`tweet` type. The field is stored in the index (so it can later be
retrieved using selective loading when searching), and it gets analyzed
(broken down into searchable terms). If the message has a `null` value,
then the value that will be stored is `na`. There is also a `string` `user`
field which is indexed as-is (not broken down into tokens) and has norms
disabled (so that matching this field is a binary decision, no match is
better than another one).

The following table lists all the attributes that can be used with the
`string` type:

[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.

|`store` |Set to `true` to actually store the field in the index, `false` to not
store it. Since by default Elasticsearch stores all fields of the source
document in the special `_source` field, this option is primarily useful when
the `_source` field has been disabled in the type definition. Defaults to
`false`.

|`index` |Set to `analyzed` for the field to be indexed and searchable
after being broken down into tokens using an analyzer. `not_analyzed`
means that it is still searchable, but does not go through any analysis
process or get broken down into tokens. `no` means that it won't be
searchable at all (as an individual field; it may still be included in
`_all`). Setting to `no` disables `include_in_all`. Defaults to
`analyzed`.

|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.

|`term_vector` |Possible values are `no`, `yes`, `with_offsets`,
`with_positions`, `with_positions_offsets`. Defaults to `no`.

|`boost` |The boost value. Defaults to `1.0`.

|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.

|`norms: {enabled: <value>}` |Boolean value if norms should be enabled or
not. Defaults to `true` for `analyzed` fields, and to `false` for
`not_analyzed` fields. See the <<norms,section about norms>>.

|`norms: {loading: <value>}` |Describes how norms should be loaded, possible values are
`eager` and `lazy` (default). It is possible to change the default value to
eager for all fields by configuring the index setting `index.norms.loading`
to `eager`.

|`index_options` | Allows to set the indexing
options, possible values are `docs` (only doc numbers are indexed),
`freqs` (doc numbers and term frequencies), and `positions` (doc
numbers, term frequencies and positions). Defaults to `positions` for
`analyzed` fields, and to `docs` for `not_analyzed` fields. It
is also possible to set it to `offsets` (doc numbers, term
frequencies, positions and offsets).

|`analyzer` |The analyzer used to analyze the text contents when
`analyzed` during indexing and searching.
Defaults to the globally configured analyzer.

|`search_analyzer` |The analyzer used to analyze the field when searching, which
overrides the value of `analyzer`. Can be updated on an existing field.

|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.

|`ignore_above` |The analyzer will ignore strings larger than this size.
Useful for generic `not_analyzed` fields that should ignore long text.

This option is also useful for protecting against Lucene's term byte-length
limit of `32766`. Note: the value for `ignore_above` is the _character count_,
but Lucene counts bytes, so if you have UTF-8 text, you may want to set the
limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most 3
bytes.

|`position_offset_gap` |Position increment gap between field instances
with the same field name. Defaults to 0.
|=======================================================================

The `string` type also supports custom indexing parameters associated
with the indexed value. For example:

[source,js]
--------------------------------------------------
{
    "message" : {
        "_value": "boosted value",
        "_boost": 2.0
    }
}
--------------------------------------------------

The mapping is required to disambiguate the meaning of the document.
Otherwise, the structure would interpret "message" as a value of type
"object". The key `_value` (or `value`) in the inner document specifies
the real string content that should eventually be indexed. The `_boost`
(or `boost`) key specifies the per field document boost (here 2.0).

[float]
[[norms]]
===== Norms

Norms store various normalization factors that are later used (at query time)
in order to compute the score of a document relatively to a query.

Although useful for scoring, norms also require quite a lot of memory
(typically in the order of one byte per document per field in your index,
even for documents that don't have this specific field). As a consequence, if
you don't need scoring on a specific field, it is highly recommended to disable
norms on it. In particular, this is the case for fields that are used solely
for filtering or aggregations.

In case you would like to disable norms after the fact, it is possible to do so
by using the <<indices-put-mapping,PUT mapping API>>, like this:

[source,js]
------------
PUT my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "norms": {
        "enabled": false
      }
    }
  }
}
------------

Please note, however, that norms won't be removed instantly, but will be removed
as old segments are merged into new segments as you continue indexing new documents.
Any score computation on a field that has had
norms removed might return inconsistent results since some documents won't have
norms anymore while other documents might still have norms.

[float]
[[number]]
==== Number

A number based type supporting `float`, `double`, `byte`, `short`,
`integer`, and `long`. It uses specific constructs within Lucene in
order to support numeric values. The number types have the same ranges
as the corresponding
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html[Java
types]. An example mapping can be:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "rank" : {
                "type" : "float",
                "null_value" : 1.0
            }
        }
    }
}
--------------------------------------------------

The following table lists all the attributes that can be used with a
numbered type:

[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`type` |The type of the number. Can be `float`, `double`, `integer`,
`long`, `short`, `byte`. Required.

|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.

|`store` |Set to `true` to store the actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).

|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.

|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.

|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16` for `long`, `double`,
`8` for `short`, `integer`, `float`, and `2147483647` for `byte`.

|`boost` |The boost value. Defaults to `1.0`.

|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.

|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.

|`ignore_malformed` |Ignores a malformed number. Defaults to `false`.

|`coerce` |Try to convert strings to numbers and truncate fractions for integers. Defaults to `true`.

|=======================================================================
[float]
[[token_count]]
==== Token Count

The `token_count` type maps to the JSON string type but indexes and stores
the number of tokens in the string rather than the string itself. For
example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "string",
                "fields" : {
                    "word_count": {
                        "type" : "token_count",
                        "store" : "yes",
                        "analyzer" : "standard"
                    }
                }
            }
        }
    }
}
--------------------------------------------------

All the configuration that can be specified for a number can be specified
for a token_count. The only extra configuration is the required
`analyzer` field which specifies which analyzer to use to break the string
into tokens. For best performance, use an analyzer with no token filters.

[NOTE]
===================================================================
Technically the `token_count` type sums position increments rather than
counting tokens. This means that even if the analyzer filters out stop
words they are included in the count.
===================================================================
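
For instance, with the mapping above, a two-word name can be found by querying
the token count directly. A minimal sketch (the index name `my_index` and the
document values are illustrative):

[source,js]
--------------------------------------------------
PUT my_index/tweet/1
{
  "name": "John Smith" <1>
}

GET my_index/_search
{
  "query": {
    "term": {
      "name.word_count": 2 <2>
    }
  }
}
--------------------------------------------------
<1> The `standard` analyzer produces two tokens for this value.
<2> Matches documents whose `name` contains exactly two tokens.
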
[float]
[[date]]
==== Date

The date type is a special type which maps to JSON string type. It
follows a specific format that can be explicitly set. All dates are
`UTC`. Internally, a date maps to a number type `long`, with the added
parsing stage from string to long and from long to string. An example
mapping:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "postDate" : {
                "type" : "date",
                "format" : "yyyy-MM-dd"
            }
        }
    }
}
--------------------------------------------------

The date type will also accept a long number representing UTC
milliseconds since the epoch, regardless of the format it can handle.

The following table lists all the attributes that can be used with a
date type:

[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.

|`format` |The <<mapping-date-format,date
format>>. Defaults to `epoch_millis||strictDateOptionalTime`.

|`store` |Set to `true` to store the actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).

|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.

|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.

|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16`.

|`boost` |The boost value. Defaults to `1.0`.

|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.

|`include_in_all` |Should the field be included in the `_all` field (if
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
defaults to `true` or to the parent `object` type setting.

|`ignore_malformed` |Ignores a malformed date. Defaults to `false`.

|=======================================================================

[float]
[[boolean]]
==== Boolean

The boolean type maps to the JSON boolean type. It ends up storing
within the index either `T` or `F`, with automatic translation to `true`
and `false` respectively.

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "hes_my_special_tweet" : {
                "type" : "boolean"
            }
        }
    }
}
--------------------------------------------------

The boolean type also supports passing the value as a number or a string
(in this case `0`, an empty string, `false`, `off` and `no` are
`false`, all other values are `true`).

The following table lists all the attributes that can be used with the
boolean type:

[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.

|`store` |Set to `true` to store the actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).

|`index` |Set to `no` if the value should not be indexed. Setting to
`no` disables `include_in_all`. If set to `no` the field should be either stored
in `_source`, have `include_in_all` enabled, or `store` be set to
`true` for this to be useful.

|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the fielddata format is `doc_values`.

|`boost` |The boost value. Defaults to `1.0`.

|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.
|=======================================================================

[float]
[[binary]]
==== Binary

The binary type is a base64 representation of binary data that can be
stored in the index. The field is not stored by default and not indexed at
all.

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "image" : {
                "type" : "binary"
            }
        }
    }
}
--------------------------------------------------

The following table lists all the attributes that can be used with the
binary type:

[horizontal]

`index_name`::

    The name of the field that will be stored in the index. Defaults to the
    property/field name.

`store`::

    Set to `true` to store the actual field in the index, `false` to not store it.
    Defaults to `false` (note, the JSON document itself is already stored, so
    the binary field can be retrieved from there).

`doc_values`::

    Set to `true` to store field values in a column-stride fashion.

[float]
[[fielddata-filters]]
==== Fielddata filters

It is possible to control which field values are loaded into memory,
which is particularly useful for aggregations on string fields, using
fielddata filters, which are explained in detail in the
<<modules-fielddata,Fielddata>> section.

Fielddata filters can exclude terms which do not match a regex, or which
don't fall between a `min` and `max` frequency range:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "type" : "string",
        "analyzer" : "whitespace",
        "fielddata" : {
            "filter" : {
                "regex" : {
                    "pattern" : "^#.*"
                },
                "frequency" : {
                    "min" : 0.001,
                    "max" : 0.1,
                    "min_segment_size" : 500
                }
            }
        }
    }
}
--------------------------------------------------

These filters can be updated on an existing field mapping and will take
effect the next time the fielddata for a segment is loaded. Use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.

[float]
==== Similarity

Elasticsearch allows you to configure a similarity (scoring algorithm) per field.
The `similarity` setting provides a simple way of choosing a similarity algorithm
other than the default TF/IDF, such as `BM25`.

You can configure similarities via the
<<index-modules-similarity,similarity module>>.

[float]
===== Configuring Similarity per Field

Defining the Similarity for a field is done via the `similarity` mapping
property, as this example shows:

[source,js]
--------------------------------------------------
{
    "book" : {
        "properties" : {
            "title" : { "type" : "string", "similarity" : "BM25" }
        }
    }
}
--------------------------------------------------

The following Similarities are configured out-of-box:

`default`::
    The Default TF/IDF algorithm used by Elasticsearch and
    Lucene in previous versions.

`BM25`::
    The BM25 algorithm.
    http://en.wikipedia.org/wiki/Okapi_BM25[See Okapi_BM25] for more
    details.


[[copy-to]]
[float]
===== Copy to field

Adding the `copy_to` parameter to any field mapping will cause all values of this field to be copied to the fields specified in
the parameter. In the following example all values from the fields `title` and `abstract` will be copied to the field
`meta_data`. The field which is being copied to will be indexed (i.e. searchable, and available through `fielddata_field`) but the original source will not be modified.


[source,js]
--------------------------------------------------
{
  "book" : {
    "properties" : {
      "title" : { "type" : "string", "copy_to" : "meta_data" },
      "abstract" : { "type" : "string", "copy_to" : "meta_data" },
      "meta_data" : { "type" : "string" }
    }
  }
}
--------------------------------------------------

Multiple destination fields are also supported:

[source,js]
--------------------------------------------------
{
  "book" : {
    "properties" : {
      "title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
    }
  }
}
--------------------------------------------------
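
As a minimal sketch of the searchability noted above (reusing the `book`
mapping; the index name `my_index` and the document text are illustrative),
a search against `meta_data` matches values copied from `title` or
`abstract`:

[source,js]
--------------------------------------------------
PUT my_index/book/1
{
  "title": "Distributed search",
  "abstract": "A primer on sharded indices"
}

GET my_index/_search
{
  "query": {
    "match": {
      "meta_data": "sharded" <1>
    }
  }
}
--------------------------------------------------
<1> Matches because the `abstract` value was copied to `meta_data` at index time.
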
[float]
[[multi-fields]]
===== Multi fields

The `fields` option allows mapping several core type fields onto a single
JSON source field. This can be useful if a single field needs to be
used in different ways. For example, a single field may be used for both
free text search and sorting.

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "string",
                "index" : "analyzed",
                "fields" : {
                    "raw" : {"type" : "string", "index" : "not_analyzed"}
                }
            }
        }
    }
}
--------------------------------------------------

In the above example the field `name` gets processed twice. The first time it is
processed as an analyzed string, and this version is accessible under the field name
`name`; this is the main field and is in fact just like any other field. The second time
it is processed as a not-analyzed string and is accessible under the name `name.raw`.
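
A sketch of the search-and-sort use case mentioned above (the query text is
illustrative): search the analyzed `name` field while sorting on the
not-analyzed `name.raw` version:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "name": "smith" <1>
    }
  },
  "sort": {
    "name.raw": "asc" <2>
  }
}
--------------------------------------------------
<1> Full text search on the analyzed main field.
<2> Sorting on the not-analyzed multi field, so `"John Smith"` sorts as a single term.
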
[float]
==== Include in All

The `include_in_all` setting is ignored on any field that is defined in
the `fields` options. Setting `include_in_all` only makes sense on
the main field, since it is the raw field value that is copied to the `_all`
field, not the tokens.

[float]
==== Updating a field

In essence, a field cannot be updated. However, multi fields can be
added to existing fields. This allows, for example, having a different
`analyzer` configuration in addition to the already configured
`analyzer` configuration specified in the main and other multi fields.

Also, a new multi field will only be applied to documents that have been
added after the multi field was added; the new multi field
doesn't exist in existing documents.

Another important note is that new multi fields will be merged into the
list of existing multi fields, so when adding new multi fields for a field,
previously added multi fields don't need to be specified.
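
A minimal sketch of adding a multi field to an existing field with the PUT
mapping API (the index, the `tweet` type, and the choice of the `english`
analyzer are illustrative):

[source,js]
--------------------------------------------------
PUT my_index/_mapping/tweet
{
  "properties": {
    "name": {
      "type": "string",
      "fields": {
        "english": {
          "type": "string",
          "analyzer": "english" <1>
        }
      }
    }
  }
}
--------------------------------------------------
<1> The new `name.english` multi field applies only to documents indexed from now on.
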
@@ -0,0 +1,138 @@
[[date]]
=== Date datatype

JSON doesn't have a date datatype, so dates in Elasticsearch can either be:

* strings containing formatted dates, e.g. `"2015-01-01"` or `"2015/01/01 12:10:30"`.
* a long number representing _milliseconds-since-the-epoch_.
* an integer representing _seconds-since-the-epoch_.

Internally, dates are converted to UTC (if the time-zone is specified) and
stored as a long number representing milliseconds-since-the-epoch.

Date formats can be customised, but if no `format` is specified then it uses
the default: `strictDateOptionalTime||epoch_millis`. This means that it will
accept dates with optional timestamps, which conform to the formats supported
by <<strict-date-time,`strictDateOptionalTime`>> or milliseconds-since-the-epoch.

For instance:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date" <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "date": "2015-01-01" } <2>

PUT my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" } <3>

PUT my_index/my_type/3
{ "date": 1420070400001 } <4>

GET my_index/_search
{
  "sort": { "date": "asc"} <5>
}
--------------------------------------------------
// AUTOSENSE
<1> The `date` field uses the default `format`.
<2> This document uses a plain date.
<3> This document includes a time.
<4> This document uses milliseconds-since-the-epoch.
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch.

[[multiple-date-formats]]
==== Multiple date formats

Multiple formats can be specified by separating them with `||` as a separator.
Each format will be tried in turn until a matching format is found. The first
format will be used to convert the _milliseconds-since-the-epoch_ value back
into a string.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

[[date-params]]
==== Parameters for `date` fields

The following parameters are accepted by `date` fields:

[horizontal]

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<mapping-date-format,`format`>>::

    The date format(s) that can be parsed. Defaults to
    `strictDateOptionalTime||epoch_millis`.

<<ignore-malformed,`ignore_malformed`>>::

    If `true`, malformed dates are ignored. If `false` (default), malformed
    dates throw an exception and reject the whole document.

<<include-in-all,`include_in_all`>>::

    Whether or not the field value should be included in the
    <<mapping-all-field,`_all`>> field? Accepts `true` or `false`. Defaults
    to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
    <<object,`object`>> field sets `include_in_all` to `false`.
    Otherwise defaults to `true`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `not_analyzed` (default) and `no`.

<<null-value,`null_value`>>::

    Accepts a date value in one of the configured `format`s, which is
    substituted for any explicit `null` values. Defaults to `null`,
    which means the field is treated as missing. See the sketch after
    this list for an example.

<<precision-step,`precision_step`>>::

    Controls the number of extra terms that are indexed to make
    <<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).
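
A minimal sketch of `null_value` on a `date` field (the index name and the
`1970-01-01` placeholder date are illustrative):

[source,js]
--------------------------------------------------
PUT my_index_with_null_date
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd",
          "null_value": "1970-01-01" <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> An explicit `null` is indexed as `1970-01-01`; the value must match one of the configured formats.
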
@@ -1,215 +0,0 @@
[[mapping-geo-point-type]]
=== Geo Point Type

A mapper type called `geo_point` supports geo based points. The
declaration looks as follows:

[source,js]
--------------------------------------------------
{
    "pin" : {
        "properties" : {
            "location" : {
                "type" : "geo_point"
            }
        }
    }
}
--------------------------------------------------

[float]
==== Indexed Fields

The `geo_point` mapping will index a single field with the format of
`lat,lon`. The `lat_lon` option can be set to also index the `.lat` and
`.lon` as numeric fields, and `geohash` can be set to `true` to also
index the `.geohash` value.

A good practice is to enable indexing `lat_lon` as well, since the
geo distance and bounding box filters can both be executed either using
in-memory checks or using the indexed lat/lon values, and which one performs
better really depends on the data set. Note, though, that indexed lat/lon
values only make sense when there is a single geo point value
for the field, and not multiple values.

[float]
==== Geohashes

Geohashes are a form of lat/lon encoding which divides the earth up into
a grid. Each cell in this grid is represented by a geohash string. Each
cell in turn can be further subdivided into smaller cells which are
represented by a longer string. So the longer the geohash, the smaller
(and thus more accurate) the cell is.

Because geohashes are just strings, they can be stored in an inverted
index like any other string, which makes querying them very efficient.

If you enable the `geohash` option, a `geohash` ``sub-field'' will be
indexed as, eg `pin.geohash`. The length of the geohash is controlled by
the `geohash_precision` parameter, which can either be set to an absolute
length (eg `12`, the default) or to a distance (eg `1km`).

More usefully, set the `geohash_prefix` option to `true` to not only index
the geohash value, but all the enclosing cells as well. For instance, a
geohash of `u30` will be indexed as `[u,u3,u30]`. This option can be used
by the <<query-dsl-geohash-cell-query,geohash cell query>> to find geopoints within a
particular cell very efficiently.

[float]
==== Input Structure

The above mapping defines a `geo_point`, which accepts different
formats. The following formats are supported:

[float]
===== Lat Lon as Properties

[source,js]
--------------------------------------------------
{
    "pin" : {
        "location" : {
            "lat" : 41.12,
            "lon" : -71.34
        }
    }
}
--------------------------------------------------

[float]
===== Lat Lon as String

Format in `lat,lon`.

[source,js]
--------------------------------------------------
{
    "pin" : {
        "location" : "41.12,-71.34"
    }
}
--------------------------------------------------

[float]
===== Geohash

[source,js]
--------------------------------------------------
{
    "pin" : {
        "location" : "drm3btev3e86"
    }
}
--------------------------------------------------

[float]
===== Lat Lon as Array

Format in `[lon, lat]`. Note the order of lon/lat here, in order to
conform with http://geojson.org/[GeoJSON].

[source,js]
--------------------------------------------------
{
    "pin" : {
        "location" : [-71.34, 41.12]
    }
}
--------------------------------------------------

[float]
==== Mapping Options

[cols="<,<",options="header",]
|=======================================================================
|Option |Description
|`lat_lon` |Set to `true` to also index the `.lat` and `.lon` as fields.
Defaults to `false`.

|`geohash` |Set to `true` to also index the `.geohash` as a field.
Defaults to `false`.

|`geohash_precision` |Sets the geohash precision. It can be set to an
absolute geohash length or a distance value (eg `1km`, `1m`, `1mi`) defining
the size of the smallest cell. Defaults to an absolute length of 12.

|`geohash_prefix` |If this option is set to `true`, not only the geohash
but also all its parent cells (true prefixes) will be indexed as well. The
number of terms that will be indexed depends on the `geohash_precision`.
Defaults to `false`. *Note*: This option implicitly enables `geohash`.

|`validate` |Set to `false` to accept geo points with invalid latitude or
longitude (default is `true`). *Note*: Validation only works when
normalization has been disabled. This option will be deprecated and removed
in upcoming releases.

|`validate_lat` |Set to `false` to accept geo points with an invalid
latitude (default is `true`). This option will be deprecated and removed
in upcoming releases.

|`validate_lon` |Set to `false` to accept geo points with an invalid
longitude (default is `true`). This option will be deprecated and removed
in upcoming releases.

|`normalize` |Set to `true` to normalize latitude and longitude (default
is `true`).

|`normalize_lat` |Set to `true` to normalize latitude.

|`normalize_lon` |Set to `true` to normalize longitude.

|`precision_step` |The precision step (influences the number of terms
generated for each number value) for `.lat` and `.lon` fields
if `lat_lon` is set to `true`.
Defaults to `16`.
|=======================================================================

[float]
==== Field data

By default, geo points use the `array` format which loads geo points into two
parallel double arrays, making sure there is no precision loss. However, this
can require a non-negligible amount of memory (16 bytes per document) which is
why Elasticsearch also provides a field data implementation with lossy
compression called `compressed`:

[source,js]
--------------------------------------------------
{
    "pin" : {
        "properties" : {
            "location" : {
                "type" : "geo_point",
                "fielddata" : {
                    "format" : "compressed",
                    "precision" : "1cm"
                }
            }
        }
    }
}
--------------------------------------------------

This field data format comes with a `precision` option which allows you to
configure how much precision can be traded for memory. The default value is
`1cm`. The following table presents the memory savings given various
precisions:

|=============================================
| Precision | Bytes per point | Size reduction
| 1km       | 4               | 75%
| 3m        | 6               | 62.5%
| 1cm       | 8               | 50%
| 1mm       | 10              | 37.5%
|=============================================

Precision can be changed on a live index by using the update mapping API.

[float]
==== Usage in Scripts

When using `doc[geo_field_name]` (in the above mapping,
`doc['location']`), the `doc[...].value` returns a `GeoPoint`, which
then allows access to `lat` and `lon` (for example,
`doc[...].value.lat`). For performance, it is better to access the `lat`
and `lon` directly using `doc[...].lat` and `doc[...].lon`.
@@ -0,0 +1,167 @@
[[geo-point]]
=== Geo-point datatype

Fields of type `geo_point` accept latitude-longitude pairs, which can be used:

* to find geo-points within a <<query-dsl-geo-bounding-box-query,bounding box>>,
  within a certain <<query-dsl-geo-distance-query,distance>> of a central point,
  within a <<query-dsl-geo-polygon-query,polygon>>, or within a
  <<query-dsl-geohash-cell-query,geohash>> cell.
* to aggregate documents <<search-aggregations-bucket-geohashgrid-aggregation,geographically>>
  or by <<search-aggregations-bucket-geodistance-aggregation,distance>> from a central point.
* to integrate distance into a document's <<query-dsl-function-score-query,relevance score>>.
* to <<geo-sorting,sort>> documents by distance.

There are four ways that a geo-point may be specified, as demonstrated below:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": { <1>
    "lat": 41.12,
    "lon": -71.34
  }
}

PUT my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" <2>
}

PUT my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" <3>
}

PUT my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] <4>
}

GET my_index/_search
{
  "query": {
    "geo_bounding_box": { <5>
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Geo-point expressed as an object, with `lat` and `lon` keys.
<2> Geo-point expressed as a string with the format: `"lat,lon"`.
<3> Geo-point expressed as a geohash.
<4> Geo-point expressed as an array with the format: `[ lon, lat ]`.
<5> A geo-bounding box query which finds all geo-points that fall inside the box.

[IMPORTANT]
.Geo-points expressed as an array or string
==================================================

Please note that string geo-points are ordered as `lat,lon`, while array
geo-points are ordered as the reverse: `lon,lat`.

Originally, `lat,lon` was used for both array and string, but the array
format was changed early on to conform to the format used by GeoJSON.

==================================================


[[geo-point-params]]
==== Parameters for `geo_point` fields

The following parameters are accepted by `geo_point` fields:

[horizontal]

<<coerce,`coerce`>>::

    Normalize longitude and latitude values to a standard -180:180 / -90:90
    coordinate system. Accepts `true` and `false` (default).

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<geohash,`geohash`>>::

    Should the geo-point also be indexed as a geohash in the `.geohash`
    sub-field? Defaults to `false`, unless `geohash_prefix` is `true`.

<<geohash-precision,`geohash_precision`>>::

    The maximum length of the geohash to use for the `geohash` and
    `geohash_prefix` options.

<<geohash-prefix,`geohash_prefix`>>::

    Should the geo-point also be indexed as a geohash plus all its prefixes?
    Defaults to `false`. See the sketch after this list for an example.

<<ignore-malformed,`ignore_malformed`>>::

    If `true`, malformed geo-points are ignored. If `false` (default),
    malformed geo-points throw an exception and reject the whole document.

<<lat-lon,`lat_lon`>>::

    Should the geo-point also be indexed as `.lat` and `.lon` sub-fields?
    Accepts `true` and `false` (default).

<<precision-step,`precision_step`>>::

    Controls the number of extra terms that are indexed for each lat/lon point.
    Defaults to `16`. Ignored if `lat_lon` is `false`.
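
A minimal sketch of `geohash_prefix` combined with the
<<query-dsl-geohash-cell-query,`geohash_cell`>> query (the index name, the
precision values, and the query coordinates are illustrative):

[source,js]
--------------------------------------------------
PUT my_geohash_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point",
          "geohash_prefix": true, <1>
          "geohash_precision": 6
        }
      }
    }
  }
}

PUT my_geohash_index/my_type/1
{
  "location": "drm3btev3e86"
}

GET my_geohash_index/_search
{
  "query": {
    "geohash_cell": {
      "location": {
        "lat": 41.12,
        "lon": -71.34
      },
      "precision": 3 <2>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> Index the geohash and all of its enclosing cells.
<2> Match documents whose geohash falls inside the same three-character cell.
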

==== Using geo-points in scripts

When accessing the value of a geo-point in a script, the value is returned as
a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
respectively:

[source,js]
--------------------------------------------------
geopoint = doc['location'].value;
lat      = geopoint.lat;
lon      = geopoint.lon;
--------------------------------------------------

For performance reasons, it is better to access the lat/lon values directly:

[source,js]
--------------------------------------------------
lat = doc['location'].lat;
lon = doc['location'].lon;
--------------------------------------------------

@@ -1,7 +1,7 @@
[[mapping-geo-shape-type]]
=== Geo Shape Type
[[geo-shape]]
=== Geo-Shape datatype

The `geo_shape` mapping type facilitates the indexing of and searching
The `geo_shape` datatype facilitates the indexing of and searching
with arbitrary geo shapes such as rectangles and polygons. It should be
used when either the data being indexed or the queries being executed
contain shapes other than just points.

@@ -1,40 +0,0 @@
[[mapping-ip-type]]
=== IP Type

An `ip` mapping type allows you to store _ipv4_ addresses in a numeric form
that makes it easy to sort them and run range queries on them (using IP values).

The following table lists all the attributes that can be used with an ip
type:

[cols="<,<",options="header",]
|=======================================================================
|Attribute |Description
|`index_name` |The name of the field that will be stored in the index.
Defaults to the property/field name.

|`store` |Set to `true` to store the actual field in the index, `false` to not
store it. Defaults to `false` (note, the JSON document itself is stored,
and it can be retrieved from it).

|`index` |Set to `no` if the value should not be indexed. In this case,
`store` should be set to `true`, since if it's not indexed and not
stored, there is nothing to do with it.

|`precision_step` |The precision step (influences the number of terms
generated for each number value). Defaults to `16`.

|`boost` |The boost value. Defaults to `1.0`.

|`null_value` |When there is a (JSON) null value for the field, use the
`null_value` as the field value. Defaults to not adding the field at
all.

|`include_in_all` |Should the field be included in the `_all` field (if
enabled). Defaults to `true` or to the parent `object` type setting.

|`doc_values` |Set to `true` to store field values in a column-stride fashion.
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.

|=======================================================================

@@ -0,0 +1,89 @@
[[ip]]
=== IPv4 datatype

An `ip` field is really a <<number,`long`>> field which accepts
https://en.wikipedia.org/wiki/IPv4[IPv4] addresses and indexes them as long
values:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "ip_addr": "192.168.1.1"
}

GET my_index/_search
{
  "query": {
    "range": {
      "ip_addr": {
        "gte": "192.168.1.0",
        "lt": "192.168.2.0"
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE


[[ip-params]]
==== Parameters for `ip` fields

The following parameters are accepted by `ip` fields:

[horizontal]

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<include-in-all,`include_in_all`>>::

    Whether or not the field value should be included in the
    <<mapping-all-field,`_all`>> field? Accepts `true` or `false`. Defaults
    to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
    <<object,`object`>> field sets `include_in_all` to `false`.
    Otherwise defaults to `true`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `not_analyzed` (default) and `no`.

<<null-value,`null_value`>>::

    Accepts an IPv4 value which is substituted for any explicit `null` values.
    Defaults to `null`, which means the field is treated as missing.

<<precision-step,`precision_step`>>::

    Controls the number of extra terms that are indexed to make
    <<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).


NOTE: IPv6 addresses are not supported yet.
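
Because `ip` values are indexed as longs, they sort numerically rather than
lexically. A minimal sketch (reusing the `ip_addr` field from the example
above):

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "sort": { "ip_addr": "asc" } <1>
}
--------------------------------------------------
// AUTOSENSE
<1> `192.168.1.9` sorts before `192.168.1.10`, which a plain string sort would reverse.
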
@@ -1,165 +0,0 @@
[[mapping-nested-type]]
=== Nested Type

The `nested` type works like the <<mapping-object-type,`object` type>> except
that an array of `objects` is flattened, while an array of `nested` objects
allows each object to be queried independently. To explain, consider this
document:

[source,js]
--------------------------------------------------
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" : "Smith"
    },
    {
      "first" : "Alice",
      "last" : "White"
    }
  ]
}
--------------------------------------------------

If the `user` field is of type `object`, this document would be indexed
internally something like this:

[source,js]
--------------------------------------------------
{
  "group" : "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" : [ "smith", "white" ]
}
--------------------------------------------------

The `first` and `last` fields are flattened, and the association between
`alice` and `white` is lost. This document would incorrectly match a query
for `alice AND smith`.

If the `user` field is of type `nested`, each object is indexed as a separate
document, something like this:

[source,js]
--------------------------------------------------
{ <1>
  "user.first" : "alice",
  "user.last" : "white"
}
{ <1>
  "user.first" : "john",
  "user.last" : "smith"
}
{ <2>
  "group" : "fans"
}
--------------------------------------------------
<1> Hidden nested documents.
<2> Visible ``parent'' document.

By keeping each nested object separate, the association between the
`user.first` and `user.last` fields is maintained. The query for `alice AND
smith` would *not* match this document.

Searching on nested docs can be done using the
<<query-dsl-nested-query,nested query>>.

==== Mapping

The mapping for `nested` fields is the same as for `object` fields, except that it
uses type `nested`:

[source,js]
--------------------------------------------------
{
  "type1" : {
    "properties" : {
      "user" : {
        "type" : "nested",
        "properties": {
          "first" : {"type": "string" },
          "last" : {"type": "string" }
        }
      }
    }
  }
}
--------------------------------------------------

NOTE: changing an `object` type to `nested` type requires reindexing.

You may want to index inner objects both as `nested` fields *and* as flattened
`object` fields, eg for highlighting. This can be achieved by setting
`include_in_parent` to `true`:

[source,js]
--------------------------------------------------
{
  "type1" : {
    "properties" : {
      "user" : {
        "type" : "nested",
        "include_in_parent": true,
        "properties": {
          "first" : {"type": "string" },
          "last" : {"type": "string" }
        }
      }
    }
  }
}
--------------------------------------------------

The result of indexing our example document would be something like this:

[source,js]
--------------------------------------------------
{ <1>
  "user.first" : "alice",
  "user.last" : "white"
}
{ <1>
  "user.first" : "john",
  "user.last" : "smith"
}
{ <2>
  "group" : "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" : [ "smith", "white" ]
}
--------------------------------------------------
<1> Hidden nested documents.
<2> Visible ``parent'' document.


Nested fields may contain other nested fields. The `include_in_parent` option
refers to the direct parent of the field, while the `include_in_root`
parameter refers only to the topmost ``root'' object or document.

NOTE: The `include_in_parent` and `include_in_root` options do not apply
to <<mapping-geo-shape-type,`geo_shape` fields>>, which are only ever
indexed inside the nested document.

Nested docs will automatically use the root doc `_all` field only.

.Internal Implementation
*********************************************
Internally, nested objects are indexed as additional documents, but,
since they can be guaranteed to be indexed within the same "block", it
allows for extremely fast joining with parent docs.

Those internal nested documents are automatically masked away when doing
operations against the index (like searching with a match_all query),
and they bubble out when using the nested query.

Because nested docs are always masked to the parent doc, the nested docs
can never be accessed outside the scope of the `nested` query. For example,
stored fields can be enabled on fields inside nested objects, but there is
no way of retrieving them, since stored fields are fetched outside of
the `nested` query scope.

The `_source` field is always associated with the parent document, and
because of that, field values for nested objects can be fetched via the
source.
*********************************************

@ -0,0 +1,201 @@
|
|||
[[nested]]
=== Nested datatype

The `nested` type is a specialised version of the <<object,`object`>> datatype
that allows arrays of objects to be indexed and queried independently of each
other.

==== How arrays of objects are flattened

Arrays of inner <<object,`object` fields>> do not work the way you may expect.
Lucene has no concept of inner objects, so Elasticsearch flattens object
hierarchies into a simple list of field names and values. For instance, the
following document:

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [ <1>
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}
--------------------------------------------------
// AUTOSENSE
<1> The `user` field is dynamically added as a field of type `object`.

would be transformed internally into a document that looks more like this:

[source,js]
--------------------------------------------------
{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}
--------------------------------------------------

The `user.first` and `user.last` fields are flattened into multi-value fields,
and the association between `alice` and `white` is lost. This document would
incorrectly match a query for `alice AND smith`:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}
--------------------------------------------------
// AUTOSENSE

==== Using `nested` fields for arrays of objects

If you need to index arrays of objects and to maintain the independence of
each object in the array, you should use the `nested` datatype instead of the
<<object,`object`>> datatype. Internally, nested objects index each object in
the array as a separate hidden document, meaning that each nested object can be
queried independently of the others, with the <<query-dsl-nested-query,`nested` query>>:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested" <1>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} <2>
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} <3>
          ]
        }
      },
      "inner_hits": { <4>
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `user` field is mapped as type `nested` instead of type `object`.
<2> This query doesn't match because `Alice` and `Smith` are not in the same nested object.
<3> This query matches because `Alice` and `White` are in the same nested object.
<4> `inner_hits` allow us to highlight the matching nested documents.

Nested documents can be:

* queried with the <<query-dsl-nested-query,`nested`>> query.
* analyzed with the <<search-aggregations-bucket-nested-aggregation,`nested`>>
  and <<search-aggregations-bucket-reverse-nested-aggregation, `reverse_nested`>>
  aggregations, as sketched just after this list.
* sorted with <<nested-sorting,nested sorting>>.
* retrieved and highlighted with <<nested-inner-hits,nested inner hits>>.
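
As a sketch of the aggregation case, reusing the `my_index` example above, a
`nested` aggregation can step into the nested scope and bucket users by last
name (the aggregation names `users` and `last_names` are arbitrary):

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "nested": { "path": "user" }, <1>
      "aggs": {
        "last_names": {
          "terms": { "field": "user.last" } <2>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `nested` aggregation joins down into the hidden nested documents.
<2> The `terms` aggregation then buckets on the nested `user.last` field.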

[[nested-params]]
==== Parameters for `nested` fields

The following parameters are accepted by `nested` fields:

[horizontal]
<<dynamic,`dynamic`>>::

    Whether or not new `properties` should be added dynamically to an existing
    nested object. Accepts `true` (default), `false` and `strict`.

<<include-in-all,`include_in_all`>>::

    Sets the default `include_in_all` value for all the `properties` within
    the nested object. Nested documents do not have their own `_all` field.
    Instead, values are added to the `_all` field of the main ``root''
    document.

<<properties,`properties`>>::

    The fields within the nested object, which can be of any
    <<mapping-types,datatype>>, including `nested`. New properties
    may be added to an existing nested object.

[IMPORTANT]
=============================================

Because nested documents are indexed as separate documents, they can only be
accessed within the scope of the `nested` query, the
`nested`/`reverse_nested` aggregations, or <<nested-inner-hits,nested inner hits>>.

For instance, if a string field within a nested document has
<<index-options,`index_options`>> set to `offsets` to allow use of the postings
highlighter, these offsets will not be available during the main highlighting
phase. Instead, highlighting needs to be performed via
<<nested-inner-hits,nested inner hits>>.

=============================================

@@ -0,0 +1,93 @@
[[number]]
=== Numeric datatypes

The following numeric types are supported:

[horizontal]
`long`::    A signed 64-bit integer with a minimum value of +-2^63^+ and a maximum value of +2^63^-1+.
`integer`:: A signed 32-bit integer with a minimum value of +-2^31^+ and a maximum value of +2^31^-1+.
`short`::   A signed 16-bit integer with a minimum value of +-32,768+ and a maximum value of +32,767+.
`byte`::    A signed 8-bit integer with a minimum value of +-128+ and a maximum value of +127+.
`double`::  A double-precision 64-bit IEEE 754 floating point.
`float`::   A single-precision 32-bit IEEE 754 floating point.

Below is an example of configuring a mapping with numeric fields:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {
          "type": "integer"
        },
        "time_in_seconds": {
          "type": "float"
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE

[[number-params]]
==== Parameters for numeric fields

The following parameters are accepted by numeric types:

[horizontal]

<<coerce,`coerce`>>::

    Try to convert strings to numbers and truncate fractions for integers.
    Accepts `true` (default) and `false`.

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<ignore-malformed,`ignore_malformed`>>::

    If `true`, malformed numbers are ignored. If `false` (default), malformed
    numbers throw an exception and reject the whole document.

<<include-in-all,`include_in_all`>>::

    Whether or not the field value should be included in the
    <<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
    to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
    <<object,`object`>> field sets `include_in_all` to `false`.
    Otherwise defaults to `true`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `not_analyzed` (default) and `no`.

<<null-value,`null_value`>>::

    Accepts a numeric value of the same `type` as the field which is
    substituted for any explicit `null` values. Defaults to `null`, which
    means the field is treated as missing.

<<precision-step,`precision_step`>>::

    Controls the number of extra terms that are indexed to make
    <<query-dsl-range-query,`range` queries>> faster. The default depends on the
    numeric `type`.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).
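
As a quick sketch of how some of these parameters combine (the index and field
names here are made up for illustration):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "visit_count": {
          "type":       "integer",
          "coerce":     false, <1>
          "null_value": -1     <2>
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "visit_count": null } <3>
--------------------------------------------------
// AUTOSENSE
<1> With coercion disabled, a string like `"10"` would be rejected rather than converted to a number.
<2> Explicit `null` values are indexed as `-1`.
<3> This document can therefore be found by searching for a `visit_count` of `-1`.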

@@ -1,179 +0,0 @@
[[mapping-object-type]]
=== Object Type

JSON documents are hierarchical in nature, allowing them to define inner
"objects" within the actual JSON. Elasticsearch completely understands
the nature of these inner objects and can map them easily, providing
query support for their inner fields. Because each document can have
objects with different fields each time, objects mapped this way are
known as "dynamic". Dynamic mapping is enabled by default. Let's take
the following JSON as an example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "person" : {
            "name" : {
                "first_name" : "Shay",
                "last_name" : "Banon"
            },
            "sid" : "12345"
        },
        "message" : "This is a tweet!"
    }
}
--------------------------------------------------

The above shows an example where a tweet includes the actual `person`
details. A `person` is an object, with a `sid`, and a `name` object
which has `first_name` and `last_name`. It's important to note that
`tweet` is also an object, although it is a special
<<mapping-root-object-type,root object type>>
which allows for additional mapping definitions.

The following is an example of explicit mapping for the above JSON:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "person" : {
                "type" : "object",
                "properties" : {
                    "name" : {
                        "type" : "object",
                        "properties" : {
                            "first_name" : {"type" : "string"},
                            "last_name" : {"type" : "string"}
                        }
                    },
                    "sid" : {"type" : "string", "index" : "not_analyzed"}
                }
            },
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

In order to mark a mapping of type `object`, set the `type` to `object`.
This is an optional step, since if there are `properties` defined for
it, it will automatically be identified as an `object` mapping.

[float]
==== properties

An object mapping can optionally define one or more properties using the
`properties` tag for a field. Each property can be either another
`object`, or one of the
<<mapping-core-types,core_types>>.

[float]
==== dynamic

One of the most important features of Elasticsearch is its ability to be
schema-less. This means that, in our example above, the `person` object
can be indexed later with a new property -- `age`, for example -- and it
will automatically be added to the mapping definitions. Same goes for
the `tweet` root object.

This feature is by default turned on, and it's the `dynamic` nature of
each object mapped. Each object mapped is automatically dynamic, though
it can be explicitly turned off:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "person" : {
                "type" : "object",
                "properties" : {
                    "name" : {
                        "dynamic" : false,
                        "properties" : {
                            "first_name" : {"type" : "string"},
                            "last_name" : {"type" : "string"}
                        }
                    },
                    "sid" : {"type" : "string", "index" : "not_analyzed"}
                }
            },
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

In the above example, the `name` object mapped is not dynamic, meaning
that if, in the future, we try to index JSON with a `middle_name` within
the `name` object, it will get discarded and not added.

There is no performance overhead if an `object` is dynamic; the ability
to turn it off is provided as a safety mechanism so "malformed" objects
won't, by mistake, index data that we do not wish to be indexed.

If a dynamic object contains yet another inner `object`, it will be
automatically added to the index and mapped as well.

When processing dynamic new fields, their type is automatically derived.
For example, if it is a `number`, it will automatically be treated as a
number <<mapping-core-types,core_type>>. Dynamic
fields default to their default attributes, for example, they are not
stored and they are always indexed.

Date fields are special since they are represented as a `string`. Date
fields are detected if they can be parsed as a date when they are first
introduced into the system. The set of date formats that are tested
against can be configured using the `dynamic_date_formats` on the root object,
which is explained later.

Note, once a field has been added, *its type cannot change*. For
example, if we added age and its value is a number, then it can't be
treated as a string.

The `dynamic` parameter can also be set to `strict`, meaning that not
only will new fields not be introduced into the mapping, but also that parsing
(indexing) docs with such new fields will fail.

[float]
==== enabled

The `enabled` flag allows you to disable parsing and indexing a named object
completely. This is handy when a portion of the JSON document contains
arbitrary JSON which should not be indexed, nor added to the mapping.
For example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "properties" : {
            "person" : {
                "type" : "object",
                "properties" : {
                    "name" : {
                        "type" : "object",
                        "enabled" : false
                    },
                    "sid" : {"type" : "string", "index" : "not_analyzed"}
                }
            },
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

In the above, `name` and its content will not be indexed at all.

[float]
==== include_in_all

`include_in_all` can be set on the `object` type level. When set, it
propagates down to all the inner mappings defined within the `object`
that do not explicitly set it.

@@ -0,0 +1,105 @@
[[object]]
=== Object datatype

JSON documents are hierarchical in nature: the document may contain inner
objects which, in turn, may contain inner objects themselves:

[source,js]
--------------------------------------------------
PUT my_index/my_type/1
{ <1>
  "region": "US",
  "manager": { <2>
    "age":     30,
    "name": { <3>
      "first": "John",
      "last":  "Smith"
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The outer document is also a JSON object.
<2> It contains an inner object called `manager`.
<3> Which in turn contains an inner object called `name`.

Internally, this document is indexed as a simple, flat list of key-value
pairs, something like this:

[source,js]
--------------------------------------------------
{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}
--------------------------------------------------

An explicit mapping for the above document could look like this:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": { <1>
      "properties": {
        "region": {
          "type": "string",
          "index": "not_analyzed"
        },
        "manager": { <2>
          "properties": {
            "age":  { "type": "integer" },
            "name": { <3>
              "properties": {
                "first": { "type": "string" },
                "last":  { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The mapping type is a type of object, and has a `properties` field.
<2> The `manager` field is an inner `object` field.
<3> The `manager.name` field is an inner `object` field within the `manager` field.

You are not required to set the field `type` to `object` explicitly, as this is the default value.

[[object-params]]
==== Parameters for `object` fields

The following parameters are accepted by `object` fields:

[horizontal]
<<dynamic,`dynamic`>>::

    Whether or not new `properties` should be added dynamically
    to an existing object. Accepts `true` (default), `false`
    and `strict`.

<<enabled,`enabled`>>::

    Whether the JSON value given for the object field should be
    parsed and indexed (`true`, default) or completely ignored (`false`).

<<include-in-all,`include_in_all`>>::

    Sets the default `include_in_all` value for all the `properties` within
    the object. The object itself is not added to the `_all` field.

<<properties,`properties`>>::

    The fields within the object, which can be of any
    <<mapping-types,datatype>>, including `object`. New properties
    may be added to an existing object.
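
For instance, a sketch of the `enabled` parameter used to accept arbitrary
JSON without indexing it (the `session_data` field name is hypothetical):

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "session_data": {
          "type":    "object",
          "enabled": false <1>
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `session_data` object is still returned from `_source`, but none of its contents are parsed or indexed.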

IMPORTANT: If you need to index arrays of objects instead of single objects,
read <<nested>> first.

@@ -1,190 +0,0 @@
[[mapping-root-object-type]]
=== Root Object Type

The root object mapping is an <<mapping-object-type,object type mapping>> that
maps the root object (the type itself). It supports all of the different
mappings that can be set using the <<mapping-object-type,object type mapping>>.

The root object mapping allows you to index a JSON document that only contains its
fields. For example, the following `tweet` JSON can be indexed without
specifying the `tweet` type in the document itself:

[source,js]
--------------------------------------------------
{
    "message" : "This is a tweet!"
}
--------------------------------------------------

[float]
==== dynamic_date_formats

`dynamic_date_formats` (the old setting name, `date_formats`, still works)
lets you set one or more date formats that will be used to
detect `date` fields. For example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
        "properties" : {
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

In the above mapping, if a new JSON field of type string is detected,
the date formats specified will be used in order to check if it's a date.
If it passes parsing, then the field will be declared with `date` type,
and will use the matching format as its format attribute. The date
format itself is explained
<<mapping-date-format,here>>.

The default formats are: `strictDateOptionalTime` (ISO) and
`yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z` and `epoch_millis`.

*Note:* `dynamic_date_formats` are used *only* for dynamically added
date fields, not for `date` fields that you specify in your mapping.

[float]
==== date_detection

Allows you to disable automatic date type detection (which is applied when a
new field is introduced and matches the provided format), for example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "date_detection" : false,
        "properties" : {
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

[float]
==== numeric_detection

Sometimes, even though JSON has support for native numeric types,
numeric values are still provided as strings. In order to try and
automatically detect numeric values from strings, `numeric_detection`
can be set to `true`. For example:

[source,js]
--------------------------------------------------
{
    "tweet" : {
        "numeric_detection" : true,
        "properties" : {
            "message" : {"type" : "string"}
        }
    }
}
--------------------------------------------------

[float]
==== dynamic_templates

Dynamic templates allow you to define mapping templates that will be applied
when dynamic introduction of fields / objects happens.

IMPORTANT: Dynamic field mappings are only added when a field contains
a concrete value -- not `null` or an empty array. This means that if the `null_value` option
is used in a `dynamic_template`, it will only be applied after the first document
with a concrete value for the field has been indexed.

For example, we might want all fields to be stored by default,
or all `string` fields to be stored, or `string` fields to always
be indexed with multi fields syntax, once analyzed and once not_analyzed.
Here is a simple example:

[source,js]
--------------------------------------------------
{
    "person" : {
        "dynamic_templates" : [
            {
                "template_1" : {
                    "match" : "multi*",
                    "mapping" : {
                        "type" : "{dynamic_type}",
                        "index" : "analyzed",
                        "fields" : {
                            "org" : {"type": "{dynamic_type}", "index" : "not_analyzed"}
                        }
                    }
                }
            },
            {
                "template_2" : {
                    "match" : "*",
                    "match_mapping_type" : "string",
                    "mapping" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    }
                }
            }
        ]
    }
}
--------------------------------------------------

The above mapping will create a field with multi fields for all field
names starting with multi, and will map all `string` types to be
`not_analyzed`.

Dynamic templates are named to allow for simple merge behavior. A new
mapping, just with a new template, can be "put" and that template will be
added, or, if it has the same name, the template will be replaced.

The `match` option allows you to define matching on the field name. An `unmatch`
option is also available to exclude fields if they do match on `match`.
The `match_mapping_type` controls whether this template will be applied only
to dynamic fields of the specified type (as guessed by the JSON
format).

Another option is to use `path_match`, which allows you to match the dynamic
template against the "full" dot notation name of the field (for example
`obj1.*.value` or `obj1.obj2.*`), with the respective `path_unmatch`.
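
For example, a sketch of a `path_match` template (the field names and the
`copy_to` target are illustrative only):

[source,js]
--------------------------------------------------
{
    "person" : {
        "dynamic_templates" : [
            {
                "full_name" : {
                    "path_match" : "name.*",
                    "path_unmatch" : "*.middle",
                    "mapping" : {
                        "type" : "string",
                        "copy_to" : "full_name"
                    }
                }
            }
        ]
    }
}
--------------------------------------------------

This template would apply to dynamically added fields such as `name.first`
and `name.last`, but not to `name.middle`.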

The format of all the matching is a simple format, allowing the use of `*` as a
matching element to support simple patterns such as `xxx*`, `*xxx`, and `xxx*yyy`
(with an arbitrary number of pattern types), as well as direct equality.
The `match_pattern` can be set to `regex` to allow for regular
expression based matching.

The `mapping` element provides the actual mapping definition. The
`{name}` keyword can be used and will be replaced with the actual
dynamic field name being introduced. The `{dynamic_type}` (or
`{dynamicType}`) can be used and will be replaced with the mapping
derived based on the field type (or the derived type, like `date`).

Complete generic settings can also be applied, for example, to have all
mappings be stored, just set:

[source,js]
--------------------------------------------------
{
    "person" : {
        "dynamic_templates" : [
            {
                "store_generic" : {
                    "match" : "*",
                    "mapping" : {
                        "store" : true
                    }
                }
            }
        ]
    }
}
--------------------------------------------------

Such generic templates should be placed at the end of the
`dynamic_templates` list because when two or more dynamic templates
match a field, only the first matching one from the list is used.

@@ -0,0 +1,170 @@
[[string]]
=== String datatype

Fields of type `string` accept text values. Strings may be sub-divided into:

Full text::
+
--

Full text values, like the body of an email, are typically used for text based
relevance searches, such as: _Find the most relevant documents that match a
query for "quick brown fox"_.

These fields are `analyzed`, that is they are passed through an
<<analysis,analyzer>> to convert the string into a list of individual terms
before being indexed. The analysis process allows Elasticsearch to search for
individual words _within_ each full text field. Full text fields are not
used for sorting and seldom used for aggregations (although the
<<search-aggregations-bucket-significantterms-aggregation,significant terms aggregation>> is a notable exception).

--

Keywords::

Keywords are exact values like email addresses, hostnames, status codes, or
tags. They are typically used for filtering (_Find me all blog posts where
++status++ is ++published++_), for sorting, and for aggregations. Keyword
fields are `not_analyzed`. Instead, the exact string value is added to the
index as a single term.

Below is an example of a mapping for a full text (`analyzed`) and a keyword
(`not_analyzed`) string field:

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": { <1>
          "type": "string"
        },
        "status": {
          "type": "string", <2>
          "index": "not_analyzed"
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE
<1> The `full_name` field is an `analyzed` full text field -- `index:analyzed` is the default.
<2> The `status` field is a `not_analyzed` keyword field.

Sometimes it is useful to have both a full text (`analyzed`) and a keyword
(`not_analyzed`) version of the same field: one for full text search and the
other for aggregations and sorting. This can be achieved with
<<multi-fields,multi-fields>>.
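
For example, a sketch of such a dual-purpose mapping (the `raw` sub-field name
is a common convention, not a requirement):

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "string", <1>
          "fields": {
            "raw": { <2>
              "type":  "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE
<1> The `city` field itself is `analyzed` and usable for full text search.
<2> The `city.raw` sub-field is `not_analyzed` and usable for sorting and aggregations.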

[[string-params]]
==== Parameters for string fields

The following parameters are accepted by `string` fields:

[horizontal]

<<analyzer,`analyzer`>>::

    The <<analysis,analyzer>> which should be used for
    <<mapping-index,`analyzed`>> string fields, both at index-time
    and at search-time (unless overridden by the <<search-analyzer,`search_analyzer`>>).
    Defaults to the default index analyzer, or the
    <<analysis-standard-analyzer,`standard` analyzer>>.

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field use on-disk index-time doc values for sorting, aggregations,
    or scripting? Accepts `true` or `false`. Defaults to `true` for
    `not_analyzed` fields. Analyzed fields do not support doc values.

<<fielddata,`fielddata`>>::

    Can the field use in-memory fielddata for sorting, aggregations,
    or scripting? Accepts `disabled` or `paged_bytes` (default).
    Not analyzed fields will use <<doc-values,doc values>> in preference
    to fielddata.

<<multi-fields,`fields`>>::

    Multi-fields allow the same string value to be indexed in multiple ways for
    different purposes, such as one field for search and a multi-field for
    sorting and aggregations, or the same string value analyzed by different
    analyzers.

<<ignore-above,`ignore_above`>>::

    Do not index or analyze any string longer than this value. Defaults to `0` (disabled).

<<include-in-all,`include_in_all`>>::

    Whether or not the field value should be included in the
    <<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
    to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
    <<object,`object`>> field sets `include_in_all` to `false`.
    Otherwise defaults to `true`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `analyzed` (default, treat as full-text field),
    `not_analyzed` (treat as keyword field) and `no`.

<<index-options,`index_options`>>::

    What information should be stored in the index, for search and highlighting purposes.
    Defaults to `positions` for <<mapping-index,`analyzed`>> fields, and to `docs` for
    `not_analyzed` fields.

<<norms,`norms`>>::
+
--

Whether field-length should be taken into account when scoring queries.
Defaults depend on the <<mapping-index,`index`>> setting:

* `analyzed` fields default to `{ "enabled": true, "loading": "lazy" }`.
* `not_analyzed` fields default to `{ "enabled": false }`.
--

<<null-value,`null_value`>>::

    Accepts a string value which is substituted for any explicit `null`
    values. Defaults to `null`, which means the field is treated as missing.
    If the field is `analyzed`, the `null_value` will also be analyzed.

<<position-offset-gap,`position_offset_gap`>>::

    The number of fake term positions which should be inserted between
    each element of an array of strings. Defaults to 0.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).

<<search-analyzer,`search_analyzer`>>::

    The <<analyzer,`analyzer`>> that should be used at search time on
    <<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.

<<similarity,`similarity`>>::

    Which scoring algorithm or _similarity_ should be used. Defaults
    to `default`, which uses TF/IDF.

<<term-vector,`term_vector`>>::

    Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
    field. Defaults to `no`.
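
As an illustration of the `analyzer`/`search_analyzer` pair, here is a sketch
that indexes with one built-in analyzer and searches with another (the `title`
field name is made up; whether such a split makes sense depends on your data):

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type":            "string",
          "analyzer":        "standard", <1>
          "search_analyzer": "simple"    <2>
        }
      }
    }
  }
}
--------------------------------
// AUTOSENSE
<1> The `standard` analyzer is applied at index time.
<2> The `simple` analyzer overrides it at search time.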

@@ -0,0 +1,107 @@
[[token-count]]
=== Token count datatype

A field of type `token_count` is really an <<number,`integer`>> field which
accepts string values, analyzes them, then indexes the number of tokens in the
string.

For instance:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { <1>
          "type": "string",
          "fields": {
            "length": { <2>
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "name": "John Smith" }

PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 <3>
    }
  }
}
--------------------------------------------------
// AUTOSENSE
<1> The `name` field is an analyzed string field which uses the default `standard` analyzer.
<2> The `name.length` field is a `token_count` <<multi-fields,multi-field>> which will index the number of tokens in the `name` field.
<3> This query matches only the document containing `Rachel Alice Williams`, as it contains three tokens.

[NOTE]
===================================================================
Technically the `token_count` type sums position increments rather than
counting tokens. This means that even if the analyzer filters out stop
words they are included in the count.
===================================================================

[[token-count-params]]
==== Parameters for `token_count` fields

The following parameters are accepted by `token_count` fields:

[horizontal]

<<analyzer,`analyzer`>>::

    The <<analysis,analyzer>> which should be used to analyze the string
    value. Required. For best performance, use an analyzer without token
    filters.

<<index-boost,`boost`>>::

    Field-level index time boosting. Accepts a floating point number, defaults
    to `1.0`.

<<doc-values,`doc_values`>>::

    Can the field value be used for sorting, aggregations, or scripting?
    Accepts `true` (default) or `false`.

<<mapping-index,`index`>>::

    Should the field be searchable? Accepts `not_analyzed` (default) and `no`.

<<include-in-all,`include_in_all`>>::

    Whether or not the field value should be included in the
    <<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
    to `false`. Note: if `true`, it is the string value that is added to `_all`,
    not the calculated token count.

<<null-value,`null_value`>>::

    Accepts a numeric value of the same `type` as the field which is
    substituted for any explicit `null` values. Defaults to `null`, which
    means the field is treated as missing.

<<precision-step,`precision_step`>>::

    Controls the number of extra terms that are indexed to make
    <<query-dsl-range-query,`range` queries>> faster. Defaults to `32`.

<<mapping-store,`store`>>::

    Whether the field value should be stored and retrievable separately from
    the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
    (default).

@@ -71,7 +71,7 @@ Field statistics can be accessed with a subscript operator like this:

Field statistics are computed per shard and therefore these numbers can vary
depending on the shard the current document resides in.
The number of terms in a field cannot be accessed using the `_index` variable. See <<mapping-core-types, word count mapping type>> on how to do that.
The number of terms in a field cannot be accessed using the `_index` variable. See <<token-count>> for how to do that.

[float]
==== Term statistics:

@@ -80,7 +80,7 @@ Term statistics for a field can be accessed with a subscript operator like
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
If you do not need the term frequency, call `_index['FIELD'].get('TERM', 0)`
to avoid unnecessary initialization of the frequencies. The flag will only
have an effect if you set the `index_options` to `docs` (see <<mapping-core-types, mapping documentation>>).
have an effect if you set the <<index-options,`index_options`>> to `docs`.

`_index['FIELD']['TERM'].df()`::

@@ -176,7 +176,7 @@ return score;
[float]
==== Term vectors:

The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <<term-vector>>). To access them, call
`_index.termVectors()` to get a
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields]
instance. This object can then be used as described in the https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Lucene doc] to iterate over fields and then for each field iterate over each term in the field.

@@ -284,7 +284,6 @@ supported operations are:
|=======================================================================
|Value |Description
| `aggs` |Aggregations (wherever they may be used)
| `mapping` |Mappings (script transform feature)
| `search` |Search api, Percolator api and Suggester api (e.g. filters, script_fields)
| `update` |Update api
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category

@@ -44,7 +44,7 @@ These documents would *not* match the above query:
[float]
===== `null_value` mapping

If the field mapping includes the `null_value` setting (see <<mapping-core-types>>)
If the field mapping includes the <<null-value,`null_value`>> setting
then explicit `null` values are replaced with the specified `null_value`. For
instance, if the `user` field were mapped as follows:

@@ -254,7 +254,7 @@ decay function is specified as
<1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`.
<2> The specified field must be a numeric, date, or geo-point field.

In the above example, the field is a <<mapping-geo-point-type>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:
In the above example, the field is a <<geo-point,`geo_point`>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:

[source,js]

@@ -268,7 +268,7 @@ In the above example, the field is a <<mapping-geo-point-type>> and origin can b
}
}
--------------------------------------------------
<1> The date format of the origin depends on the <<mapping-date-format>> defined in
<1> The date format of the origin depends on the <<mapping-date-format,`format`>> defined in
your mapping. If you do not define the origin, the current time is used.
<2> The `offset` and `decay` parameters are optional.

@@ -112,7 +112,6 @@ Format in `lat,lon`.
[float]
==== geo_point Type

The filter *requires* the
<<mapping-geo-point-type,geo_point>> type to be
set on the relevant field.
The query *requires* the <<geo-point,`geo_point`>> type to be set on the
relevant field.

@@ -2,8 +2,8 @@
== Geo queries

Elasticsearch supports two types of geo data:
<<mapping-geo-point-type,`geo_point`>> fields which support lat/lon pairs, and
<<mapping-geo-shape-type,`geo_shape`>> fields, which support points,
<<geo-point,`geo_point`>> fields which support lat/lon pairs, and
<<geo-shape,`geo_shape`>> fields, which support points,
lines, circles, polygons, multi-polygons etc.

The queries in this group are:

@@ -3,7 +3,7 @@

Filter documents indexed using the `geo_shape` type.

Requires the <<mapping-geo-shape-type,geo_shape Mapping>>.
Requires the <<geo-shape,`geo_shape` Mapping>>.

The `geo_shape` query uses the same grid square representation as the
geo_shape mapping to find documents that have a shape that intersects

@@ -2,13 +2,13 @@
=== Geohash Cell Query

The `geohash_cell` query provides access to a hierarchy of geohashes.
By defining a geohash cell, only <<mapping-geo-point-type,geopoints>>
By defining a geohash cell, only <<geo-point,geopoints>>
within this cell will match this filter.

To get this filter to work, all prefixes of a geohash need to be indexed. For
example, a geohash `u30` needs to be decomposed into three terms: `u30`,
`u3` and `u`. This decomposition must be enabled in the mapping of the
<<mapping-geo-point-type,geopoint>> field that's going to be filtered by
<<geo-point,geopoint>> field that's going to be filtered by
setting the `geohash_prefix` option:

[source,js]

@@ -7,7 +7,7 @@ which are designed to scale horizontally.

<<query-dsl-nested-query,`nested` query>>::

Documents may contain fields of type <<mapping-nested-type,`nested`>>. These
Documents may contain fields of type <<nested,`nested`>>. These
fields are used to index arrays of objects, where each object can be queried
(with the `nested` query) as an independent document.

@@ -44,7 +44,7 @@ These documents would *not* match the above filter:
[float]
==== `null_value` mapping

If the field mapping includes a `null_value` (see <<mapping-core-types>>) then explicit `null` values
If the field mapping includes a <<null-value,`null_value`>> then explicit `null` values
are replaced with the specified `null_value`. For instance, if the `user` field were mapped
as follows:

@@ -2,7 +2,7 @@
=== Nested Query

The nested query allows you to query nested objects / docs (see
<<mapping-nested-type,nested mapping>>). The
<<nested,nested mapping>>). The
query is executed against the nested objects / docs as if they were
indexed as separate docs (they are, internally) and resulting in the
root parent doc (or parent nested mapping). Here is a sample mapping we

@@ -29,33 +29,60 @@ The `range` query accepts the following parameters:
`lt`::    Less-than
`boost`:: Sets the boost value of the query, defaults to `1.0`

[float]
==== Date options

When applied on `date` fields the `range` filter accepts also a `time_zone` parameter.
The `time_zone` parameter will be applied to your input lower and upper bounds and will
move them to UTC time based date:
[[ranges-on-dates]]
==== Ranges on date fields

When running `range` queries on fields of type <<date,`date`>>, ranges can be
specified using <<date-math>>:

[source,js]
--------------------------------------------------
{
    "range" : {
        "born" : {
            "gte": "2012-01-01",
            "lte": "now",
            "time_zone": "+01:00"
        "date" : {
            "gte" : "now-1d/d",
            "lt" :  "now/d"
        }
    }
}
--------------------------------------------------

In the above example, `gte` will be actually moved to `2011-12-31T23:00:00` UTC date.
===== Date math and rounding

NOTE: if you give a date with a timezone explicitly defined and use the `time_zone` parameter, `time_zone` will be
ignored. For example, setting `gte` to `2012-01-01T00:00:00+01:00` with `"time_zone":"+10:00"` will still use `+01:00` time zone.
When using <<date-math,date math>> to round dates to the nearest day, month,
hour, etc, the rounded dates depend on whether the ends of the ranges are
inclusive or exclusive.

When applied on `date` fields the `range` query accepts also a `format` parameter.
The `format` parameter will help support another date format than the one defined in mapping:
Rounding up moves to the last millisecond of the rounding scope, and rounding
down to the first millisecond of the rounding scope. For example:

[horizontal]
`gt`::

    Greater than the date rounded up: `2014-11-18||/M` becomes
    `2014-11-30T23:59:59.999`, ie excluding the entire month.

`gte`::

    Greater than or equal to the date rounded down: `2014-11-18||/M` becomes
    `2014-11-01`, ie including the entire month.

`lt`::

    Less than the date rounded down: `2014-11-18||/M` becomes `2014-11-01`, ie
    excluding the entire month.

`lte`::

    Less than or equal to the date rounded up: `2014-11-18||/M` becomes
    `2014-11-30T23:59:59.999`, ie including the entire month.
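
For example, a sketch that combines rounding with inclusive and exclusive
bounds (the `born` field is hypothetical):

[source,js]
--------------------------------------------------
{
    "range" : {
        "born" : {
            "gte" : "2014-11-18||/M", <1>
            "lt"  : "2014-12-18||/M"  <2>
        }
    }
}
--------------------------------------------------
<1> Rounded down to `2014-11-01`, so the whole of November is included.
<2> Rounded down to `2014-12-01`, so the whole of December is excluded.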

===== Date format in range queries

Formatted dates will be parsed using the <<mapping-date-format,`format`>>
specified on the <<date,`date`>> field by default, but it can be overridden by
passing the `format` parameter to the `range` query:

[source,js]
--------------------------------------------------

@@ -69,3 +96,25 @@ The `format` parameter will help support another date format than the one define
}
}
--------------------------------------------------

===== Time zone in range queries

Dates can be converted from another timezone to UTC either by specifying the
time zone in the date value itself (if the <<mapping-date-format, `format`>>
accepts it), or it can be specified as the `time_zone` parameter:

[source,js]
--------------------------------------------------
{
    "range" : {
        "timestamp" : {
            "gte": "2015-01-01 00:00:00", <1>
            "lte": "now",
            "time_zone": "+01:00"
        }
    }
}
--------------------------------------------------
<1> This date will be converted to `2014-12-31T23:00:00 UTC`.

@@ -3,7 +3,7 @@

experimental[]

The <<mapping-parent-field, parent/child>> and <<mapping-nested-type, nested>> features allow the return of documents that
The <<mapping-parent-field, parent/child>> and <<nested, nested>> features allow the return of documents that
have matches in a different scope. In the parent/child case, parent documents are returned based on matches in child
documents, or child documents are returned based on matches in parent documents. In the nested case, documents are returned
based on matches in nested inner objects.

@@ -71,6 +71,7 @@ curl -XPOST 'localhost:9200/_search' -d '{
}'
--------------------------------------------------

[[nested-sorting]]
==== Sorting within nested objects.

Elasticsearch also supports sorting by

@@ -166,6 +167,7 @@ If any of the indices that are queried doesn't have a mapping for `price`
then Elasticsearch will handle it as if there was a mapping of type
`long`, with all documents in this index having no value for this field.

[[geo-sorting]]
==== Geo Distance Sorting

Allows sorting by `_geo_distance`. Here is an example: