Docs: Mapping docs completely rewritten for 2.0
parent
40cd460647
commit
ac2b8951c6
|
@ -14,7 +14,7 @@ docs/html/
|
||||||
docs/build.log
|
docs/build.log
|
||||||
/tmp/
|
/tmp/
|
||||||
backwards/
|
backwards/
|
||||||
|
html_docs
|
||||||
## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
|
## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects)
|
||||||
## All files (.project, .classpath, .settings/*) should be generated through Maven which
|
## All files (.project, .classpath, .settings/*) should be generated through Maven which
|
||||||
## will correctly set the classpath based on the declared dependencies and write settings
|
## will correctly set the classpath based on the declared dependencies and write settings
|
||||||
|
|
|
@ -53,7 +53,7 @@ Response:
|
||||||
}
|
}
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). And it can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the `geo_point` <<mapping-geo-point-type,type>>:
|
The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). And it can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the <<geo-point,`geo_point` type>>:
|
||||||
|
|
||||||
* Object format: `{ "lat" : 52.3760, "lon" : 4.894 }` - this is the safest format as it is the most explicit about the `lat` & `lon` values
|
* Object format: `{ "lat" : 52.3760, "lon" : 4.894 }` - this is the safest format as it is the most explicit about the `lat` & `lon` values
|
||||||
* String format: `"52.3760, 4.894"` - where the first number is the `lat` and the second is the `lon`
|
* String format: `"52.3760, 4.894"` - where the first number is the `lat` and the second is the `lon`
|
||||||
|
|
|
@ -200,7 +200,7 @@ and therefore can't be used in the `order` option of the `terms` aggregator.
|
||||||
If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned.
|
If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned.
|
||||||
Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type
|
Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type
|
||||||
has been configured. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested`
|
has been configured. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested`
|
||||||
or `reverse_nested` aggregator. Read more about nested in the <<mapping-nested-type,nested type mapping>>.
|
or `reverse_nested` aggregator. Read more about nested in the <<nested,nested type mapping>>.
|
||||||
|
|
||||||
If nested type has been configured a single document is actually indexed as multiple Lucene documents and they share
|
If nested type has been configured a single document is actually indexed as multiple Lucene documents and they share
|
||||||
the same id. In order to determine the identity of a nested hit there is more needed than just the id, so that is why
|
the same id. In order to determine the identity of a nested hit there is more needed than just the id, so that is why
|
||||||
|
|
|
@ -152,6 +152,33 @@ being consumed by a monitoring tool, rather than intended for human
|
||||||
consumption. The default for the `human` flag is
|
consumption. The default for the `human` flag is
|
||||||
`false`.
|
`false`.
|
||||||
|
|
||||||
|
[[date-math]]
|
||||||
|
=== Date Math
|
||||||
|
|
||||||
|
Most parameters which accept a formatted date value -- such as `gt` and `lt`
|
||||||
|
in <<query-dsl-range-query,`range` queries>>, or `from` and `to`
|
||||||
|
in <<search-aggregations-bucket-daterange-aggregation,`daterange`
|
||||||
|
aggregations>> -- understand date maths.
|
||||||
|
|
||||||
|
The expression starts with an anchor date, which can either be `now`, or a
|
||||||
|
date string ending with `||`. This anchor date can optionally be followed by
|
||||||
|
one or more maths expressions:
|
||||||
|
|
||||||
|
* `+1h` - add one hour
|
||||||
|
* `-1d` - subtract one day
|
||||||
|
* `/d` - round down to the nearest day
|
||||||
|
|
||||||
|
The supported <<time-units,time units>> are: `y` (year), `M` (month), `w` (week),
|
||||||
|
`d` (day), `h` (hour), `m` (minute), and `s` (second).
|
||||||
|
|
||||||
|
Some examples are:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`now+1h`:: The current time plus one hour, with ms resolution.
|
||||||
|
`now+1h+1m`:: The current time plus one hour plus one minute, with ms resolution.
|
||||||
|
`now+1h/d`:: The current time plus one hour, rounded down to the nearest day.
|
||||||
|
`2015-01-01||+1M/d`:: `2015-01-01` plus one month, rounded down to the nearest day.
|
||||||
|
|
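As a rough illustration of how such expressions resolve, the following Python sketch evaluates a small subset of date math. It is not how Elasticsearch implements it. The name `eval_date_math` is hypothetical, anchors must be `now` or a `yyyy-MM-dd` date, and only the fixed-length units `w`, `d`, `h`, `m` and `s` are handled; the calendar units `y` and `M` and rounding to weeks are left out.

```python
import re
from datetime import datetime, timedelta

# Assumption: fixed-length units only; `y` and `M` need calendar arithmetic.
_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}
# Either an offset such as `+1h` / `-1d`, or a rounding step such as `/d`.
_TOKEN = re.compile(r"([+-])(\d+)([wdhms])|/([dhms])")


def _round_down(value, unit):
    """Truncate `value` to the start of the given unit."""
    if unit == "d":
        return value.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "h":
        return value.replace(minute=0, second=0, microsecond=0)
    if unit == "m":
        return value.replace(second=0, microsecond=0)
    return value.replace(microsecond=0)


def eval_date_math(expr, now):
    """Evaluate a date math expression against the naive datetime `now`."""
    if expr.startswith("now"):
        value, ops = now, expr[len("now"):]
    else:
        date_str, sep, ops = expr.partition("||")
        if not sep:
            raise ValueError("anchor must be `now` or a date ending in `||`")
        value = datetime.strptime(date_str, "%Y-%m-%d")
    pos = 0
    while pos < len(ops):
        token = _TOKEN.match(ops, pos)
        if token is None:
            raise ValueError("unsupported date math: " + ops[pos:])
        sign, amount, unit, round_unit = token.groups()
        if round_unit:
            value = _round_down(value, round_unit)
        else:
            delta = timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])
            value = value + delta if sign == "+" else value - delta
        pos = token.end()
    return value


# Operations apply left to right: add one hour first, then round down to the day.
print(eval_date_math("now+1h/d", datetime(2015, 1, 1, 23, 30)))
```

Because the operations are applied in order, `now+1h/d` evaluated at 23:30 lands on midnight of the following day rather than the current one.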
||||||
[float]
|
[float]
|
||||||
=== Response Filtering
|
=== Response Filtering
|
||||||
|
|
||||||
|
@ -237,7 +264,7 @@ curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
Note that elasticsearch sometimes returns directly the raw value of a field,
|
Note that elasticsearch sometimes returns directly the raw value of a field,
|
||||||
like the `_source` field. If you want to filter _source fields, you should
|
like the `_source` field. If you want to filter `_source` fields, you should
|
||||||
consider combining the already existing `_source` parameter (see
|
consider combining the already existing `_source` parameter (see
|
||||||
<<get-source-filtering,Get API>> for more details) with the `filter_path`
|
<<get-source-filtering,Get API>> for more details) with the `filter_path`
|
||||||
parameter like this:
|
parameter like this:
|
||||||
|
@ -318,8 +345,9 @@ of supporting the native JSON number types.
|
||||||
[float]
|
[float]
|
||||||
=== Time units
|
=== Time units
|
||||||
|
|
||||||
Whenever durations need to be specified, eg for a `timeout` parameter, the duration
|
Whenever durations need to be specified, eg for a `timeout` parameter, the
|
||||||
can be specified as a whole number representing time in milliseconds, or as a time value like `2d` for 2 days. The supported units are:
|
duration must specify the unit, like `2d` for 2 days. The supported units
|
||||||
|
are:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
`y`:: Year
|
`y`:: Year
|
||||||
|
@ -329,6 +357,7 @@ can be specified as a whole number representing time in milliseconds, or as a ti
|
||||||
`h`:: Hour
|
`h`:: Hour
|
||||||
`m`:: Minute
|
`m`:: Minute
|
||||||
`s`:: Second
|
`s`:: Second
|
||||||
|
`ms`:: Millisecond
|
||||||
|
|
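For client code that needs to validate such values before sending them, a minimal Python sketch of converting a duration with an explicit unit into milliseconds might look like this. The helper name `duration_to_millis` is hypothetical, and the sketch assumes only the fixed-length units (`w`, `d`, `h`, `m`, `s`, `ms`); `y` and `M` are omitted because their length depends on the calendar.

```python
import re

# Assumption: fixed-length units only, expressed in milliseconds.
_MILLIS = {
    "w": 604_800_000,
    "d": 86_400_000,
    "h": 3_600_000,
    "m": 60_000,
    "s": 1_000,
    "ms": 1,
}
# `ms` is listed first so that it is tried before the single-letter `m`.
_DURATION = re.compile(r"(\d+)(ms|[wdhms])")


def duration_to_millis(value):
    """Convert a duration such as `2d` or `500ms` into milliseconds."""
    match = _DURATION.fullmatch(value)
    if match is None:
        # A bare number is rejected here because the unit must be specified.
        raise ValueError("duration must be a whole number plus a unit: " + value)
    amount, unit = match.groups()
    return int(amount) * _MILLIS[unit]


print(duration_to_millis("2d"))  # 2 days = 172800000 ms
```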
||||||
[[distance-units]]
|
[[distance-units]]
|
||||||
[float]
|
[float]
|
||||||
|
|
|
@ -6,53 +6,3 @@ added to an index either when creating it or by using the put mapping
|
||||||
api. It also handles the dynamic mapping support for types that have no
|
api. It also handles the dynamic mapping support for types that have no
|
||||||
explicit mappings pre defined. For more information about mapping
|
explicit mappings pre defined. For more information about mapping
|
||||||
definitions, check out the <<mapping,mapping section>>.
|
definitions, check out the <<mapping,mapping section>>.
|
||||||
|
|
||||||
[float]
|
|
||||||
=== Dynamic Mappings
|
|
||||||
|
|
||||||
New types and new fields within types can be added dynamically just
|
|
||||||
by indexing a document. When Elasticsearch encounters a new type,
|
|
||||||
it creates the type using the `_default_` mapping (see below).
|
|
||||||
|
|
||||||
When it encounters a new field within a type, it autodetects the
|
|
||||||
datatype that the field contains and adds it to the type mapping
|
|
||||||
automatically.
|
|
||||||
|
|
||||||
See <<mapping-dynamic-mapping>> for details of how to control and
|
|
||||||
configure dynamic mapping.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
=== Default Mapping
|
|
||||||
|
|
||||||
When a new type is created (at <<indices-create-index,index creation>> time,
|
|
||||||
using the <<indices-put-mapping,`put-mapping` API>> or just by indexing a
|
|
||||||
document into it), the type uses the `_default_` mapping as its basis. Any
|
|
||||||
mapping specified in the <<indices-create-index,`create-index`>> or
|
|
||||||
<<indices-put-mapping,`put-mapping`>> request override values set in the
|
|
||||||
`_default_` mapping.
|
|
||||||
|
|
||||||
The default mapping definition is a plain mapping definition that is
|
|
||||||
embedded within Elasticsearch:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
_default_ : {
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Pretty short, isn't it? Basically, everything is `_default_`ed, including the
|
|
||||||
dynamic nature of the root object mapping which allows new fields to be added
|
|
||||||
automatically.
|
|
||||||
|
|
||||||
The default mapping can be overridden by specifying the `_default_` type when
|
|
||||||
creating a new index.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
=== Mapper settings
|
|
||||||
|
|
||||||
`index.mapper.dynamic` (_dynamic_)::
|
|
||||||
|
|
||||||
Dynamic creation of mappings for unmapped types can be completely
|
|
||||||
disabled by setting `index.mapper.dynamic` to `false`.
|
|
||||||
|
|
|
@ -6,8 +6,8 @@ are scored. Similarity is per field, meaning that via the mapping one
|
||||||
can define a different similarity per field.
|
can define a different similarity per field.
|
||||||
|
|
||||||
Configuring a custom similarity is considered an expert feature and the
|
Configuring a custom similarity is considered an expert feature and the
|
||||||
builtin similarities are most likely sufficient as is described in the
|
builtin similarities are most likely sufficient as is described in
|
||||||
<<mapping-core-types,mapping section>>
|
<<similarity>>.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[configuration]]
|
[[configuration]]
|
||||||
|
@ -90,7 +90,7 @@ Type name: `BM25`
|
||||||
==== DFR similarity
|
==== DFR similarity
|
||||||
|
|
||||||
Similarity that implements the
|
Similarity that implements the
|
||||||
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
|
||||||
from randomness] framework. This similarity has the following options:
|
from randomness] framework. This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
|
@ -111,7 +111,7 @@ Type name: `DFR`
|
||||||
[[ib]]
|
[[ib]]
|
||||||
==== IB similarity.
|
==== IB similarity.
|
||||||
|
|
||||||
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
|
||||||
based model] . This similarity has the following options:
|
based model] . This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
|
@ -125,7 +125,7 @@ Type name: `IB`
|
||||||
[[lm_dirichlet]]
|
[[lm_dirichlet]]
|
||||||
==== LM Dirichlet similarity.
|
==== LM Dirichlet similarity.
|
||||||
|
|
||||||
http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM
|
||||||
Dirichlet similarity] . This similarity has the following options:
|
Dirichlet similarity] . This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
|
@ -137,7 +137,7 @@ Type name: `LMDirichlet`
|
||||||
[[lm_jelinek_mercer]]
|
[[lm_jelinek_mercer]]
|
||||||
==== LM Jelinek Mercer similarity.
|
==== LM Jelinek Mercer similarity.
|
||||||
|
|
||||||
http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
|
||||||
Jelinek Mercer similarity] . This similarity has the following options:
|
Jelinek Mercer similarity] . This similarity has the following options:
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
|
|
|
@ -3,76 +3,157 @@
|
||||||
|
|
||||||
[partintro]
|
[partintro]
|
||||||
--
|
--
|
||||||
Mapping is the process of defining how a document should be mapped to
|
|
||||||
the Search Engine, including its searchable characteristics such as
|
|
||||||
which fields are searchable and if/how they are tokenized. In
|
|
||||||
Elasticsearch, an index may store documents of different "mapping
|
|
||||||
types". Elasticsearch allows one to associate multiple mapping
|
|
||||||
definitions for each mapping type.
|
|
||||||
|
|
||||||
Explicit mapping is defined on an index/type level. By default, there
|
Mapping is the process of defining how a document, and the fields it contains,
|
||||||
isn't a need to define an explicit mapping, since one is automatically
|
are stored and indexed. For instance, use mappings to define:
|
||||||
created and registered when a new type or new field is introduced (with
|
|
||||||
no performance overhead) and have sensible defaults. Only when the
|
* which string fields should be treated as full text fields.
|
||||||
defaults need to be overridden must a mapping definition be provided.
|
* which fields contain numbers, dates, or geolocations.
|
||||||
|
* whether the values of all fields in the document should be
|
||||||
|
indexed into the catch-all <<mapping-all-field,`_all`>> field.
|
||||||
|
* the <<mapping-date-format,format>> of date values.
|
||||||
|
* custom rules to control the mapping for
|
||||||
|
<<dynamic-mapping,dynamically added fields>>.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[all-mapping-types]]
|
[[mapping-type]]
|
||||||
=== Mapping Types
|
== Mapping Types
|
||||||
|
|
||||||
Mapping types are a way to divide the documents in an index into logical
|
Each index has one or more _mapping types_, which are used to divide the
|
||||||
groups. Think of it as tables in a database. Though there is separation
|
documents in an index into logical groups. User documents might be stored in a
|
||||||
between types, it's not a full separation (all end up as a document
|
`user` type, and blog posts in a `blogpost` type.
|
||||||
within the same Lucene index).
|
|
||||||
|
|
||||||
Field names with the same name across types are highly recommended to
|
Each mapping type has:
|
||||||
have the same type and same mapping characteristics (analysis settings
|
|
||||||
for example). There is an effort to allow to explicitly "choose" which
|
|
||||||
field to use by using type prefix (`my_type.my_field`), but it's not
|
|
||||||
complete, and there are places where it will never work (like
|
|
||||||
aggregations on the field).
|
|
||||||
|
|
||||||
In practice though, this restriction is almost never an issue. The field
|
<<mapping-fields,Meta-fields>>::
|
||||||
name usually ends up being a good indication to its "typeness" (e.g.
|
|
||||||
"first_name" will always be a string). Note also, that this does not
|
Meta-fields are used to customize how a document's associated metadata is
|
||||||
apply to the cross index case.
|
treated. Examples of meta-fields include the document's
|
||||||
|
<<mapping-index-field,`_index`>>, <<mapping-type-field,`_type`>>,
|
||||||
|
<<mapping-id-field,`_id`>>, and <<mapping-source-field,`_source`>> fields.
|
||||||
|
|
||||||
|
<<mapping-types,Fields>> or _properties_::
|
||||||
|
|
||||||
|
Each mapping type contains a list of fields or `properties` pertinent to that
|
||||||
|
type. A `user` type might contain `title`, `name`, and `age` fields, while a
|
||||||
|
`blogpost` type might contain `title`, `body`, `user_id` and `created`
|
||||||
|
fields.
|
||||||
|
|
||||||
|
The mapping for the above example could look like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
---------------------------------------
|
||||||
|
PUT my_index <1>
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"user": { <2>
|
||||||
|
"_all": { "enabled": false }, <3>
|
||||||
|
"properties": { <4>
|
||||||
|
"title": { "type": "string" }, <5>
|
||||||
|
"name": { "type": "string" }, <5>
|
||||||
|
"age": { "type": "integer" } <5>
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"blogpost": { <2>
|
||||||
|
"properties": { <4>
|
||||||
|
"title": { "type": "string" }, <5>
|
||||||
|
"body": { "type": "string" }, <5>
|
||||||
|
"user_id": {
|
||||||
|
"type": "string", <5>
|
||||||
|
"index": "not_analyzed"
|
||||||
|
},
|
||||||
|
"created": {
|
||||||
|
"type": "date", <5>
|
||||||
|
"format": "strict_date_optional_time||epoch_millis"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
---------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Create an index called `my_index`.
|
||||||
|
<2> Add mapping types called `user` and `blogpost`.
|
||||||
|
<3> Disable the `_all` <<mapping-fields,meta field>> for the `user` mapping type.
|
||||||
|
<4> Specify fields or _properties_ in each mapping type.
|
||||||
|
<5> Specify the data `type` and mapping for each field.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[mapping-api]]
|
== Field datatypes
|
||||||
=== Mapping API
|
|
||||||
|
|
||||||
To create a mapping, you will need the <<indices-put-mapping,Put Mapping
|
Each field has a data `type` which can be:
|
||||||
API>>, or you can add multiple mappings when you <<indices-create-index,create an
|
|
||||||
index>>.
|
* a simple type like <<string,`string`>>, <<date,`date`>>, <<number,`long`>>,
|
||||||
|
<<number,`double`>>, <<boolean,`boolean`>> or <<ip,`ip`>>.
|
||||||
|
* a type which supports the hierarchical nature of JSON such as
|
||||||
|
<<object,`object`>> or <<nested,`nested`>>.
|
||||||
|
* or a specialised type like <<geo-point,`geo_point`>>,
|
||||||
|
<<geo-shape,`geo_shape`>>, or <<search-suggesters-completion,`completion`>>.
|
||||||
|
|
||||||
|
[IMPORTANT]
|
||||||
|
.Fields are shared across mapping types
|
||||||
|
=========================================
|
||||||
|
|
||||||
|
Mapping types are used to group fields, but the fields in each mapping type
|
||||||
|
are not independent of each other. Fields with:
|
||||||
|
|
||||||
|
* the _same name_
|
||||||
|
* in the _same index_
|
||||||
|
* in _different mapping types_
|
||||||
|
* map to the _same field_ internally,
|
||||||
|
* and *must have the same mapping*.
|
||||||
|
|
||||||
|
The `title` field exists in both the `user` and `blogpost` mapping types and
|
||||||
|
so must have exactly the same mapping in each type. The only exceptions to
|
||||||
|
this rule are the <<copy-to>>, <<dynamic>>, <<enabled>>, <<ignore-above>>,
|
||||||
|
<<include-in-all>>, and <<properties>> parameters, which may have different
|
||||||
|
settings per field.
|
||||||
|
|
||||||
|
Usually, fields with the same name also contain the same type of data, so
|
||||||
|
having the same mapping is not a problem. When conflicts do arise, these can
|
||||||
|
be solved by choosing more descriptive names, such as `user_title` and
|
||||||
|
`blog_title`.
|
||||||
|
|
||||||
|
=========================================
|
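The rule above can be checked before an index is created. The following Python sketch is one way to do it on the client side; it is not part of Elasticsearch. The function name `find_conflicts` is hypothetical, the exempt parameter names (`copy_to`, `dynamic`, `enabled`, `ignore_above`, `include_in_all`, `properties`) are assumed from the exceptions listed above, and only top-level fields are compared.

```python
# Assumption: parameters that may differ between same-named fields.
EXEMPT = {"copy_to", "dynamic", "enabled", "ignore_above",
          "include_in_all", "properties"}


def find_conflicts(mappings):
    """Return (field, first_type, other_type) for same-named fields whose
    mappings differ in any non-exempt parameter."""
    seen = {}       # field name -> (type name, comparable mapping)
    conflicts = []
    for type_name, type_mapping in mappings.items():
        for field, field_mapping in type_mapping.get("properties", {}).items():
            comparable = {k: v for k, v in field_mapping.items()
                          if k not in EXEMPT}
            if field not in seen:
                seen[field] = (type_name, comparable)
            elif seen[field][1] != comparable:
                conflicts.append((field, seen[field][0], type_name))
    return conflicts


mappings = {
    "user": {"properties": {"title": {"type": "string"},
                            "age": {"type": "integer"}}},
    "blogpost": {"properties": {"title": {"type": "string",
                                          "index": "not_analyzed"}}},
}
# `title` is analyzed in `user` but not in `blogpost`, so it is reported.
print(find_conflicts(mappings))
```

A reported conflict can then be resolved as described above, for example by renaming the fields to `user_title` and `blog_title`.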
||||||
|
|
||||||
[float]
|
[float]
|
||||||
[[mapping-settings]]
|
== Dynamic mapping
|
||||||
=== Global Settings
|
|
||||||
|
|
||||||
The `index.mapping.ignore_malformed` global setting can be set on the
|
Fields and mapping types do not need to be defined before being used. Thanks
|
||||||
index level to allow to ignore malformed content globally across all
|
to _dynamic mapping_, new mapping types and new field names will be added
|
||||||
mapping types (malformed content example is trying to index a text string
|
automatically, just by indexing a document. New fields can be added both to
|
||||||
value as a numeric type).
|
the top-level mapping type, and to inner <<object,`object`>> and
|
||||||
|
<<nested,`nested`>> fields.
|
||||||
|
|
||||||
|
The
|
||||||
|
<<dynamic-mapping,dynamic mapping>> rules can be configured to
|
||||||
|
customise the mapping that is used for new types and new fields.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
== Explicit mappings
|
||||||
|
|
||||||
|
You know more about your data than Elasticsearch can guess, so while dynamic
|
||||||
|
mapping can be useful to get started, at some point you will want to specify
|
||||||
|
your own explicit mappings.
|
||||||
|
|
||||||
|
You can create mapping types and field mappings when you
|
||||||
|
<<indices-create-index,create an index>>, and you can add mapping types and
|
||||||
|
fields to an existing index with the <<indices-put-mapping,PUT mapping API>>.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
== Updating existing mappings
|
||||||
|
|
||||||
|
Other than where documented, *existing type and field mappings cannot be
|
||||||
|
updated*. Changing the mapping would mean invalidating already indexed
|
||||||
|
documents. Instead, you should create a new index with the correct mappings
|
||||||
|
and reindex your data into that index.
|
||||||
|
|
||||||
The `index.mapping.coerce` global setting can be set on the
|
|
||||||
index level to coerce numeric content globally across all
|
|
||||||
mapping types (The default setting is true and coercions attempted are
|
|
||||||
to convert strings with numbers into numeric types and also numeric values
|
|
||||||
with fractions to any integer/short/long values minus the fraction part).
|
|
||||||
When the permitted conversions fail in their attempts, the value is considered
|
|
||||||
malformed and the ignore_malformed setting dictates what will happen next.
|
|
||||||
--
|
--
|
||||||
|
|
||||||
include::mapping/fields.asciidoc[]
|
|
||||||
|
|
||||||
include::mapping/types.asciidoc[]
|
include::mapping/types.asciidoc[]
|
||||||
|
|
||||||
include::mapping/date-format.asciidoc[]
|
include::mapping/fields.asciidoc[]
|
||||||
|
|
||||||
include::mapping/fielddata_formats.asciidoc[]
|
include::mapping/params.asciidoc[]
|
||||||
|
|
||||||
include::mapping/dynamic-mapping.asciidoc[]
|
include::mapping/dynamic-mapping.asciidoc[]
|
||||||
|
|
||||||
include::mapping/meta.asciidoc[]
|
|
||||||
|
|
||||||
include::mapping/transform.asciidoc[]
|
|
||||||
|
|
|
@ -1,238 +0,0 @@
|
||||||
[[mapping-date-format]]
|
|
||||||
== Date Format
|
|
||||||
|
|
||||||
In JSON documents, dates are represented as strings. Elasticsearch uses a set
|
|
||||||
of pre-configured formats to recognize and convert those, but you can change the
|
|
||||||
defaults by specifying the `format` option when defining a `date` type, or by
|
|
||||||
specifying `dynamic_date_formats` in the `root object` mapping (which will
|
|
||||||
be used unless explicitly overridden by a `date` type). There are built in
|
|
||||||
formats supported, as well as completely custom ones.
|
|
||||||
|
|
||||||
The parsing of dates uses http://www.joda.org/joda-time/[Joda]. The
|
|
||||||
default date parsing used if no format is specified is
|
|
||||||
http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[ISODateTimeFormat.dateOptionalTimeParser].
|
|
||||||
|
|
||||||
An extension to the format allows several formats to be defined using the `||`
|
|
||||||
separator. This makes it possible to define less strict formats;
|
|
||||||
for example, the `yyyy/MM/dd HH:mm:ss||yyyy/MM/dd` format will parse
|
|
||||||
both `yyyy/MM/dd HH:mm:ss` and `yyyy/MM/dd`. The first format will also
|
|
||||||
act as the one that converts back from milliseconds to a string
|
|
||||||
representation.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[date-math]]
|
|
||||||
=== Date Math
|
|
||||||
|
|
||||||
The `date` type supports date math expressions when it is used in a
|
|
||||||
query/filter (mainly makes sense in `range` query/filter).
|
|
||||||
|
|
||||||
The expression starts with an "anchor" date, which can be either `now`
|
|
||||||
or a date string (in the applicable format) ending with `||`. It can
|
|
||||||
then be followed by a math expression, supporting `+`, `-` and `/`
|
|
||||||
(rounding). The units supported are `y` (year), `M` (month), `w` (week),
|
|
||||||
`d` (day), `h` (hour), `m` (minute), and `s` (second).
|
|
||||||
|
|
||||||
Here are some samples: `now+1h`, `now+1h+1m`, `now+1h/d`,
|
|
||||||
`2012-01-01||+1M/d`.
|
|
||||||
|
|
||||||
When doing `range` type searches with rounding, the value parsed
|
|
||||||
depends on whether the end of the range is inclusive or exclusive, and
|
|
||||||
whether it is the beginning or end of the range. Rounding up moves to the
|
|
||||||
last millisecond of the rounding scope, and rounding down to the
|
|
||||||
first millisecond of the rounding scope. The semantics work as follows:
|
|
||||||
* `gt` - round up, and use > that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie excluding the entire month)
|
|
||||||
* `gte` - round down, and use >= that value (`2014-11-18||/M` becomes `2014-11-01`, ie including the entire month)
|
|
||||||
* `lt` - round down, and use < that value (`2014-11-18||/M` becomes `2014-11-01`, ie excluding the entire month)
|
|
||||||
* `lte` - round up, and use <= that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie including the entire month)
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[built-in]]
|
|
||||||
=== Built In Formats
|
|
||||||
|
|
||||||
Most of the formats below have a `strict` companion format, which means that the
|
|
||||||
year, month and day parts must have leading zeros in order
|
|
||||||
to be valid. This means that a date like `5/11/1` would not be valid;
|
|
||||||
you would need to specify the full date, which would be `2005/11/01` in this
|
|
||||||
example. So instead of `date_optional_time` you would need to specify
|
|
||||||
`strict_date_optional_time`.
|
|
||||||
|
|
||||||
The following table lists all the default ISO formats supported:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Name |Description
|
|
||||||
|`basic_date`|A basic formatter for a full date as four digit year, two
|
|
||||||
digit month of year, and two digit day of month (yyyyMMdd).
|
|
||||||
|
|
||||||
|`basic_date_time`|A basic formatter that combines a basic date and time,
|
|
||||||
separated by a 'T' (yyyyMMdd'T'HHmmss.SSSZ).
|
|
||||||
|
|
||||||
|`basic_date_time_no_millis`|A basic formatter that combines a basic date
|
|
||||||
and time without millis, separated by a 'T' (yyyyMMdd'T'HHmmssZ).
|
|
||||||
|
|
||||||
|`basic_ordinal_date`|A formatter for a full ordinal date, using a four
|
|
||||||
digit year and three digit dayOfYear (yyyyDDD).
|
|
||||||
|
|
||||||
|`basic_ordinal_date_time`|A formatter for a full ordinal date and time,
|
|
||||||
using a four digit year and three digit dayOfYear
|
|
||||||
(yyyyDDD'T'HHmmss.SSSZ).
|
|
||||||
|
|
||||||
|`basic_ordinal_date_time_no_millis`|A formatter for a full ordinal date
|
|
||||||
and time without millis, using a four digit year and three digit
|
|
||||||
dayOfYear (yyyyDDD'T'HHmmssZ).
|
|
||||||
|
|
||||||
|`basic_time`|A basic formatter for a two digit hour of day, two digit
|
|
||||||
minute of hour, two digit second of minute, three digit millis, and time
|
|
||||||
zone offset (HHmmss.SSSZ).
|
|
||||||
|
|
||||||
|`basic_time_no_millis`|A basic formatter for a two digit hour of day,
|
|
||||||
two digit minute of hour, two digit second of minute, and time zone
|
|
||||||
offset (HHmmssZ).
|
|
||||||
|
|
||||||
|`basic_t_time`|A basic formatter for a two digit hour of day, two digit
|
|
||||||
minute of hour, two digit second of minute, three digit millis, and time
|
|
||||||
zone offset prefixed by 'T' ('T'HHmmss.SSSZ).
|
|
||||||
|
|
||||||
|`basic_t_time_no_millis`|A basic formatter for a two digit hour of day,
|
|
||||||
two digit minute of hour, two digit second of minute, and time zone
|
|
||||||
offset prefixed by 'T' ('T'HHmmssZ).
|
|
||||||
|
|
||||||
|`basic_week_date`|A basic formatter for a full date as four digit
|
|
||||||
weekyear, two digit week of weekyear, and one digit day of week
|
|
||||||
(xxxx'W'wwe). `strict_basic_week_date` is supported.
|
|
||||||
|
|
||||||
|`basic_week_date_time`|A basic formatter that combines a basic weekyear
|
|
||||||
date and time, separated by a 'T' (xxxx'W'wwe'T'HHmmss.SSSZ).
|
|
||||||
`strict_basic_week_date_time` is supported.
|
|
||||||
|
|
||||||
|`basic_week_date_time_no_millis`|A basic formatter that combines a basic
|
|
||||||
weekyear date and time without millis, separated by a 'T'
|
|
||||||
(xxxx'W'wwe'T'HHmmssZ). `strict_basic_week_date_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`date`|A formatter for a full date as four digit year, two digit month
|
|
||||||
of year, and two digit day of month (yyyy-MM-dd). `strict_date` is supported.
|
|
||||||
|
|
||||||
|`date_hour`|A formatter that combines a full date and two digit hour of
|
|
||||||
day. `strict_date_hour` is supported.
|
|
||||||
|
|
||||||
|
|
||||||
|`date_hour_minute`|A formatter that combines a full date, two digit hour
|
|
||||||
of day, and two digit minute of hour. `strict_date_hour_minute` is supported.
|
|
||||||
|
|
||||||
|`date_hour_minute_second`|A formatter that combines a full date, two
|
|
||||||
digit hour of day, two digit minute of hour, and two digit second of
|
|
||||||
minute. `strict_date_hour_minute_second` is supported.
|
|
||||||
|
|
||||||
|`date_hour_minute_second_fraction`|A formatter that combines a full
|
|
||||||
date, two digit hour of day, two digit minute of hour, two digit second
|
|
||||||
of minute, and three digit fraction of second
|
|
||||||
(yyyy-MM-dd'T'HH:mm:ss.SSS). `strict_date_hour_minute_second_fraction` is supported.
|
|
||||||
|
|
||||||
|`date_hour_minute_second_millis`|A formatter that combines a full date,
|
|
||||||
two digit hour of day, two digit minute of hour, two digit second of
|
|
||||||
minute, and three digit fraction of second (yyyy-MM-dd'T'HH:mm:ss.SSS).
|
|
||||||
`strict_date_hour_minute_second_millis` is supported.
|
|
||||||
|
|
||||||
|`date_optional_time`|a generic ISO datetime parser where the date is
|
|
||||||
mandatory and the time is optional. `strict_date_optional_time` is supported.
|
|
||||||
|
|
||||||
|`date_time`|A formatter that combines a full date and time, separated by
|
|
||||||
a 'T' (yyyy-MM-dd'T'HH:mm:ss.SSSZZ). `strict_date_time` is supported.
|
|
||||||
|
|
||||||
|`date_time_no_millis`|A formatter that combines a full date and time
|
|
||||||
without millis, separated by a 'T' (yyyy-MM-dd'T'HH:mm:ssZZ).
|
|
||||||
`strict_date_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`hour`|A formatter for a two digit hour of day. `strict_hour` is supported.
|
|
||||||
|
|
||||||
|`hour_minute`|A formatter for a two digit hour of day and two digit
|
|
||||||
minute of hour. `strict_hour_minute` is supported.
|
|
||||||
|
|
||||||
|`hour_minute_second`|A formatter for a two digit hour of day, two digit
|
|
||||||
minute of hour, and two digit second of minute.
|
|
||||||
`strict_hour_minute_second` is supported.
|
|
||||||
|
|
||||||
|`hour_minute_second_fraction`|A formatter for a two digit hour of day,
|
|
||||||
two digit minute of hour, two digit second of minute, and three digit
|
|
||||||
fraction of second (HH:mm:ss.SSS).
|
|
||||||
`strict_hour_minute_second_fraction` is supported.
|
|
||||||
|
|
||||||
|`hour_minute_second_millis`|A formatter for a two digit hour of day, two
|
|
||||||
digit minute of hour, two digit second of minute, and three digit
|
|
||||||
fraction of second (HH:mm:ss.SSS).
|
|
||||||
`strict_hour_minute_second_millis` is supported.
|
|
||||||
|
|
||||||
|`ordinal_date`|A formatter for a full ordinal date, using a four digit
|
|
||||||
year and three digit dayOfYear (yyyy-DDD). `strict_ordinal_date` is supported.
|
|
||||||
|
|
||||||
|`ordinal_date_time`|A formatter for a full ordinal date and time, using
|
|
||||||
a four digit year and three digit dayOfYear (yyyy-DDD'T'HH:mm:ss.SSSZZ).
|
|
||||||
`strict_ordinal_date_time` is supported.
|
|
||||||
|
|
||||||
|`ordinal_date_time_no_millis`|A formatter for a full ordinal date and
|
|
||||||
time without millis, using a four digit year and three digit dayOfYear
|
|
||||||
(yyyy-DDD'T'HH:mm:ssZZ).
|
|
||||||
`strict_ordinal_date_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`time`|A formatter for a two digit hour of day, two digit minute of
|
|
||||||
hour, two digit second of minute, three digit fraction of second, and
|
|
||||||
time zone offset (HH:mm:ss.SSSZZ). `strict_time` is supported.
|
|
||||||
|
|
||||||
|`time_no_millis`|A formatter for a two digit hour of day, two digit
|
|
||||||
minute of hour, two digit second of minute, and time zone offset
|
|
||||||
(HH:mm:ssZZ). `strict_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`t_time`|A formatter for a two digit hour of day, two digit minute of
|
|
||||||
hour, two digit second of minute, three digit fraction of second, and
|
|
||||||
time zone offset prefixed by 'T' ('T'HH:mm:ss.SSSZZ).
|
|
||||||
`strict_t_time` is supported.
|
|
||||||
|
|
||||||
|`t_time_no_millis`|A formatter for a two digit hour of day, two digit
|
|
||||||
minute of hour, two digit second of minute, and time zone offset
|
|
||||||
prefixed by 'T' ('T'HH:mm:ssZZ). `strict_t_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`week_date`|A formatter for a full date as four digit weekyear, two
|
|
||||||
digit week of weekyear, and one digit day of week (xxxx-'W'ww-e).
|
|
||||||
`strict_week_date` is supported.
|
|
||||||
|
|
||||||
|`week_date_time`|A formatter that combines a full weekyear date and
|
|
||||||
time, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ).
|
|
||||||
`strict_week_date_time` is supported.
|
|
||||||
|
|
||||||
|`week_date_time_no_millis`|A formatter that combines a full weekyear date
|
|
||||||
and time without millis, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ssZZ).
|
|
||||||
`strict_week_date_time_no_millis` is supported.
|
|
||||||
|
|
||||||
|`weekyear`|A formatter for a four digit weekyear. `strict_weekyear` is supported.
|
|
||||||
|
|
||||||
|`weekyear_week`|A formatter for a four digit weekyear and two digit week
|
|
||||||
of weekyear. `strict_weekyear_week` is supported.
|
|
||||||
|
|
||||||
|`weekyear_week_day`|A formatter for a four digit weekyear, two digit week
|
|
||||||
of weekyear, and one digit day of week. `strict_weekyear_week_day` is supported.
|
|
||||||
|
|
||||||
|`year`|A formatter for a four digit year. `strict_year` is supported.
|
|
||||||
|
|
||||||
|`year_month`|A formatter for a four digit year and two digit month of
|
|
||||||
year. `strict_year_month` is supported.
|
|
||||||
|
|
||||||
|`year_month_day`|A formatter for a four digit year, two digit month of
|
|
||||||
year, and two digit day of month. `strict_year_month_day` is supported.
|
|
||||||
|
|
||||||
|`epoch_second`|A formatter for the number of seconds since the epoch.
|
|
||||||
Note that this timestamp allows a maximum length of 10 characters, so dates
|
|
||||||
before 1653 or after 2286 are not supported. You should use a different
|
|
||||||
date formatter in that case.
|
|
||||||
|
|
||||||
|`epoch_millis`|A formatter for the number of milliseconds since the epoch.
|
|
||||||
Note that this timestamp allows a maximum length of 13 characters, so dates
|
|
||||||
before 1653 or after 2286 are not supported. You should use a different
|
|
||||||
date formatter in that case.
|
|
||||||
|=======================================================================
|
|
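The 2286 and 1653 limits quoted for `epoch_second` and `epoch_millis` follow from the digit limits. The sketch below is a quick arithmetic check in Python, not Elasticsearch code; the helper name `from_epoch` is made up for illustration, and the lower bound assumes a 10 digit negative value with the sign not counted.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch(seconds=0, milliseconds=0):
    """Convert an offset from the epoch to a UTC datetime (hypothetical helper)."""
    return EPOCH + timedelta(seconds=seconds, milliseconds=milliseconds)

# Largest values that fit in 10 digits (seconds) and 13 digits (milliseconds).
max_seconds = from_epoch(seconds=9_999_999_999)
max_millis = from_epoch(milliseconds=9_999_999_999_999)

# Most negative 10 digit value (assumption: the sign is not counted).
min_seconds = from_epoch(seconds=-9_999_999_999)

print(max_seconds.year, max_millis.year, min_seconds.year)
```

Both upper bounds land in the same year because the 13 digit millisecond limit is the 10 digit second limit with three extra digits of precision.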
||||||
|
|
||||||
[float]
|
|
||||||
[[custom]]
|
|
||||||
=== Custom Format
|
|
||||||
|
|
||||||
Allows for a completely customizable date format explained
|
|
||||||
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[here].
|
|
|
@ -1,73 +1,67 @@
|
||||||
[[mapping-dynamic-mapping]]
|
[[dynamic-mapping]]
|
||||||
== Dynamic Mapping
|
== Dynamic Mapping
|
||||||
|
|
||||||
Default mappings allow generic mapping definitions to be automatically applied
|
One of the most important features of Elasticsearch is that it tries to get
|
||||||
to types that do not have mappings predefined. This is mainly done
|
out of your way and let you start exploring your data as quickly as possible.
|
||||||
thanks to the fact that the
|
To index a document, you don't have to first create an index, define a mapping
|
||||||
<<mapping-object-type,object mapping>> and
|
type, and define your fields -- you can just index a document and the index,
|
||||||
namely the <<mapping-root-object-type,root
|
type, and fields will spring to life automatically:
|
||||||
object mapping>> allow for schema-less dynamic addition of unmapped
|
|
||||||
fields.
|
|
||||||
|
|
||||||
The default mapping definition is a plain mapping definition that is
|
|
||||||
embedded within the distribution:
|
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
{
|
PUT data/counters/1 <1>
|
||||||
"_default_" : {
|
{ "count": 5 }
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Creates the `data` index, the `counters` mapping type, and a field
|
||||||
|
called `count` with datatype `long`.
|
||||||
|
|
||||||
Pretty short, isn't it? Basically, everything is defaulted, especially the
|
The automatic detection and addition of new types and fields is called
|
||||||
dynamic nature of the root object mapping. The default mapping can be
|
_dynamic mapping_. The dynamic mapping rules can be customised to suit your
|
||||||
overridden by specifying the `_default_` type when creating a new index.
|
purposes with:
|
||||||
|
|
||||||
The dynamic creation of mappings for unmapped types can be completely
|
<<default-mapping,`_default_` mapping>>::
|
||||||
disabled by setting `index.mapper.dynamic` to `false`.
|
|
||||||
|
|
||||||
The dynamic creation of fields within a type can be completely
|
Configure the base mapping to be used for new mapping types.
|
||||||
disabled by setting the `dynamic` property of the type to `strict`.
|
|
||||||
|
|
||||||
Here is a <<indices-put-mapping,Put Mapping>> example that
|
<<dynamic-field-mapping,Dynamic field mappings>>::
|
||||||
disables dynamic field creation for a `tweet`:
|
|
||||||
|
|
||||||
[source,js]
|
The rules governing dynamic field detection.
|
||||||
--------------------------------------------------
|
|
||||||
$ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"dynamic": "strict",
|
|
||||||
"properties" : {
|
|
||||||
"message" : {"type" : "string", "store" : true }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
'
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Here is how we can change the default
|
<<dynamic-templates,Dynamic templates>>::
|
||||||
<<mapping-date-format,date_formats>> used in the
|
|
||||||
root and inner object types:
|
Custom rules to configure the mapping for dynamically added fields.
|
||||||
|
|
||||||
|
TIP: <<indices-templates,Index templates>> allow you to configure the default
|
||||||
|
mappings, settings, aliases, and warmers for new indices, whether created
|
||||||
|
automatically or explicitly.
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"_default_" : {
|
|
||||||
"dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy", "date_optional_time"]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
=== Unmapped fields in queries
|
=== Disabling automatic type creation
|
||||||
|
|
||||||
Queries and filters can refer to fields that don't exist in a mapping. Whether this
|
Automatic type creation can be disabled by setting the `index.mapper.dynamic`
|
||||||
is allowed is controlled by the `index.query.parse.allow_unmapped_fields` setting.
|
setting to `false`, either by setting the default value in the
|
||||||
This setting defaults to `true`. Setting it to `false` will disallow the usage of
|
`config/elasticsearch.yml` file, or per-index as an index setting:
|
||||||
unmapped fields in queries.
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /_settings <1>
|
||||||
|
{
|
||||||
|
"index.mapper.dynamic":false
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Disable automatic type creation for all indices.
|
||||||
|
|
||||||
|
Regardless of the value of this setting, types can still be added explicitly
|
||||||
|
when <<indices-create-index,creating an index>> or with the
|
||||||
|
<<indices-put-mapping,PUT mapping>> API.
|
||||||
|
|
||||||
|
|
||||||
|
include::dynamic/default-mapping.asciidoc[]
|
||||||
|
|
||||||
|
include::dynamic/field-mapping.asciidoc[]
|
||||||
|
|
||||||
|
include::dynamic/templates.asciidoc[]
|
||||||
|
|
||||||
When registering a new <<search-percolate,percolator query>> or creating
|
|
||||||
a <<filtered,filtered alias>> then the `index.query.parse.allow_unmapped_fields` setting
|
|
||||||
is forcefully overwritten to disallow unmapped fields.
|
|
|
@ -0,0 +1,82 @@
|
||||||
|
[[default-mapping]]
|
||||||
|
=== `_default_` mapping
|
||||||
|
|
||||||
|
The default mapping, which will be used as the base mapping for any new
|
||||||
|
mapping types, can be customised by adding a mapping type with the name
|
||||||
|
`_default_` to an index, either when
|
||||||
|
<<indices-create-index,creating the index>> or later on with the
|
||||||
|
<<indices-put-mapping,PUT mapping>> API.
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"_default_": { <1>
|
||||||
|
"_all": {
|
||||||
|
"enabled": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"user": {}, <2>
|
||||||
|
"blogpost": { <3>
|
||||||
|
"_all": {
|
||||||
|
"enabled": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `_default_` mapping defaults the <<mapping-all-field,`_all`>> field to disabled.
|
||||||
|
<2> The `user` type inherits the settings from `_default_`.
|
||||||
|
<3> The `blogpost` type overrides the defaults and enables the <<mapping-all-field,`_all`>> field.
|
||||||
|
|
||||||
|
While the `_default_` mapping can be updated after an index has been created,
|
||||||
|
the new defaults will only affect mapping types that are created afterwards.
|
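One way to picture how a type picks up its settings from `_default_` is as a recursive merge, with the type's own mapping taking precedence. The following Python sketch models that idea using the `user` and `blogpost` types from the example above; `deep_merge` is a hypothetical helper, and the real merge logic inside Elasticsearch may differ in its details.

```python
import copy

def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

mappings = {
    "_default_": {"_all": {"enabled": False}},
    "user": {},
    "blogpost": {"_all": {"enabled": True}},
}

# Every concrete type starts from `_default_` and applies its own settings on top.
default = mappings["_default_"]
effective = {
    name: deep_merge(default, mapping)
    for name, mapping in mappings.items()
    if name != "_default_"
}
print(effective)
```

In this model `user` ends up with `_all` disabled, while `blogpost` keeps its explicit override.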
||||||
|
|
||||||
|
The `_default_` mapping can be used in conjunction with
|
||||||
|
<<indices-templates,Index templates>> to control dynamically created types
|
||||||
|
within automatically created indices:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT _template/logging
|
||||||
|
{
|
||||||
|
"template": "logs-*", <1>
|
||||||
|
"settings": { "number_of_shards": 1 }, <2>
|
||||||
|
"mappings": {
|
||||||
|
"_default_": {
|
||||||
|
"_all": { <3>
|
||||||
|
"enabled": false
|
||||||
|
},
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"strings": { <4>
|
||||||
|
"match_mapping_type": "string",
|
||||||
|
"mapping": {
|
||||||
|
"type": "string",
|
||||||
|
"fields": {
|
||||||
|
"raw": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed",
|
||||||
|
"ignore_above": 256
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT logs-2015.10.01/event/1
|
||||||
|
{ "message": "error:16" }
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `logging` template will match any indices beginning with `logs-`.
|
||||||
|
<2> Matching indices will be created with a single primary shard.
|
||||||
|
<3> The `_all` field will be disabled by default for new type mappings.
|
||||||
|
<4> String fields will be created with an `analyzed` main field, and a `not_analyzed` `.raw` field.
|
|
@ -0,0 +1,139 @@
|
||||||
|
[[dynamic-field-mapping]]
|
||||||
|
=== Dynamic field mapping
|
||||||
|
|
||||||
|
By default, when a previously unseen field is found in a document,
|
||||||
|
Elasticsearch will add the new field to the type mapping. This behaviour can
|
||||||
|
be disabled, both at the document and at the <<object,`object`>> level, by
|
||||||
|
setting the <<dynamic,`dynamic`>> parameter to `false` or to `strict`.
|
||||||
|
|
||||||
|
Assuming `dynamic` field mapping is enabled, some simple rules are used to
|
||||||
|
determine which datatype the field should have:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
*JSON datatype*:: *Elasticsearch datatype*
|
||||||
|
|
||||||
|
`null`:: No field is added.
|
||||||
|
`true` or `false`:: <<boolean,`boolean`>> field
|
||||||
|
floating{nbsp}point{nbsp}number:: <<number,`double`>> field
|
||||||
|
integer:: <<number,`long`>> field
|
||||||
|
object:: <<object,`object`>> field
|
||||||
|
array:: Depends on the first non-`null` value in the array.
|
||||||
|
string:: Either a <<date,`date`>> field
|
||||||
|
(if the value passes <<date-detection,date detection>>),
|
||||||
|
a <<number,`double`>> or <<number,`long`>> field
|
||||||
|
(if the value passes <<numeric-detection,numeric detection>>)
|
||||||
|
or an <<mapping-index,`analyzed`>> <<string,`string`>> field.
|
||||||
|
|
||||||
|
These are the only <<mapping-types,field datatypes>> that are dynamically
|
||||||
|
detected. All other datatypes must be mapped explicitly.
|
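The detection rules above can be sketched as a small Python function that works on a decoded JSON document. This is a simplified model for illustration only: `detect_datatype` is a hypothetical name, and date and numeric detection for strings are deliberately left out, so every string is reported as `string`.

```python
import json

def detect_datatype(value):
    """Guess the datatype for a decoded JSON value (simplified model)."""
    if value is None:
        return None                  # null: no field is added
    if isinstance(value, bool):      # test bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return detect_datatype(item)
        return None                  # empty or all-null array: nothing to map
    if isinstance(value, str):
        return "string"              # date and numeric detection not modelled here
    raise TypeError("unsupported JSON value: %r" % (value,))

doc = json.loads(
    '{"count": 5, "ratio": 0.5, "active": true, '
    '"tags": [null, "a"], "user": {"id": 1}, "note": null}'
)
detected = {field: detect_datatype(value) for field, value in doc.items()}
print(detected)
```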
||||||
|
|
||||||
|
Besides the options listed below, dynamic field mapping rules can be further
|
||||||
|
customised with <<dynamic-templates,`dynamic_templates`>>.
|
||||||
|
|
||||||
|
[[date-detection]]
|
||||||
|
==== Date detection
|
||||||
|
|
||||||
|
If `date_detection` is enabled (default), then new string fields are checked
|
||||||
|
to see whether their contents match any of the date patterns specified in
|
||||||
|
`dynamic_date_formats`. If a match is found, a new <<date,`date`>> field is
|
||||||
|
added with the corresponding format.
|
||||||
|
|
||||||
|
The default value for `dynamic_date_formats` is:
|
||||||
|
|
||||||
|
[ <<strict-date-time,`"strict_date_optional_time"`>>,`"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`]
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"create_date": "2015/09/02"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_mapping <1>
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `create_date` field has been added as a <<date,`date`>>
|
||||||
|
field with the <<mapping-date-format,`format`>>: +
|
||||||
|
`"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`.
|
||||||
|
|
||||||
|
===== Disabling date detection
|
||||||
|
|
||||||
|
Dynamic date detection can be disabled by setting `date_detection` to `false`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"date_detection": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1 <1>
|
||||||
|
{
|
||||||
|
"create_date": "2015/09/02"
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> The `create_date` field has been added as a <<string,`string`>> field.
|
||||||
|
|
||||||
|
===== Customising detected date formats
|
||||||
|
|
||||||
|
Alternatively, the `dynamic_date_formats` can be customised to support your
|
||||||
|
own <<mapping-date-format,date formats>>:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic_date_formats": ["MM/dd/yyyy"]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"create_date": "09/25/2015"
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
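Conceptually, date detection tries each configured format in turn and uses the first one that parses the string. The Python sketch below models that loop. The `JODA_TO_STRPTIME` table is an assumption made for this example (a hand-written translation of two Joda-style patterns into `strptime` patterns), and `detect_date_format` is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime

# Assumption: hand-translated equivalents of two Joda-style patterns.
JODA_TO_STRPTIME = {
    "MM/dd/yyyy": "%m/%d/%Y",
    "yyyy-MM-dd": "%Y-%m-%d",
}

def detect_date_format(value, dynamic_date_formats):
    """Return the first configured format that parses `value`, else None."""
    for joda_pattern in dynamic_date_formats:
        try:
            datetime.strptime(value, JODA_TO_STRPTIME[joda_pattern])
        except ValueError:
            continue                 # this format does not match, try the next one
        return joda_pattern
    return None                      # no match: the field would stay a string

print(detect_date_format("09/25/2015", ["MM/dd/yyyy"]))
print(detect_date_format("2015/09/25", ["MM/dd/yyyy"]))
```

With only `MM/dd/yyyy` configured, a value written as `2015/09/25` no longer qualifies as a date in this model.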
||||||
|
|
||||||
|
|
||||||
|
[[numeric-detection]]
|
||||||
|
==== Numeric detection
|
||||||
|
|
||||||
|
While JSON has support for native floating point and integer datatypes, some
|
||||||
|
applications or languages may sometimes render numbers as strings. Usually the
|
||||||
|
correct solution is to map these fields explicitly, but numeric detection
|
||||||
|
(which is disabled by default) can be enabled to do this automatically:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"numeric_detection": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"my_float": "1.0", <1>
|
||||||
|
"my_integer": "1" <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `my_float` field is added as a <<number,`double`>> field.
|
||||||
|
<2> The `my_integer` field is added as a <<number,`long`>> field.
|
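The effect of numeric detection on string values can be approximated in a few lines of Python. This is a rough sketch, not the parser Elasticsearch uses: `detect_numeric` is a hypothetical helper, and Python's `int()` and `float()` accept some inputs (surrounding whitespace, `inf`, `nan`) that a real implementation may treat differently.

```python
def detect_numeric(text):
    """Classify a string as 'long', 'double' or 'string' (rough approximation)."""
    try:
        int(text)
        return "long"                # whole numbers are tried first
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"              # not numeric: fall back to a string field

for sample in ("1", "1.0", "foo"):
    print(sample, detect_numeric(sample))
```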
||||||
|
|
|
@ -0,0 +1,251 @@
|
||||||
|
[[dynamic-templates]]
|
||||||
|
=== Dynamic templates
|
||||||
|
|
||||||
|
Dynamic templates allow you to define custom mappings that can be applied to
|
||||||
|
dynamically added fields based on:
|
||||||
|
|
||||||
|
* the <<dynamic-mapping,datatype>> detected by Elasticsearch, with <<match-mapping-type,`match_mapping_type`>>.
|
||||||
|
* the name of the field, with <<match-unmatch,`match` and `unmatch`>> or <<match-pattern,`match_pattern`>>.
|
||||||
|
* the full dotted path to the field, with <<path-match-unmatch,`path_match` and `path_unmatch`>>.
|
||||||
|
|
||||||
|
The original field name `{name}` and the detected datatype
|
||||||
|
`{dynamic_type}` <<template-variables,template variables>> can be used in
|
||||||
|
the mapping specification as placeholders.
|
||||||
|
|
||||||
|
IMPORTANT: Dynamic field mappings are only added when a field contains a
|
||||||
|
concrete value -- not `null` or an empty array. This means that if the
|
||||||
|
`null_value` option is used in a `dynamic_template`, it will only be applied
|
||||||
|
after the first document with a concrete value for the field has been
|
||||||
|
indexed.
|
||||||
|
|
||||||
|
Dynamic templates are specified as an array of named objects:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"my_template_name": { <1>
|
||||||
|
... match conditions ... <2>
|
||||||
|
"mapping": { ... } <3>
|
||||||
|
}
|
||||||
|
},
|
||||||
|
...
|
||||||
|
]
|
||||||
|
--------------------------------------------------
|
||||||
|
<1> The template name can be any string value.
|
||||||
|
<2> The match conditions can include any of: `match_mapping_type`, `match`, `match_pattern`, `unmatch`, `path_match`, `path_unmatch`.
|
||||||
|
<3> The mapping that the matched field should use.
|
||||||
|
|
||||||
|
|
||||||
|
Templates are processed in order -- the first matching template wins. New
|
||||||
|
templates can be appended to the end of the list with the
|
||||||
|
<<indices-put-mapping,PUT mapping>> API. If a new template has the same
|
||||||
|
name as an existing template, it will replace the old version.
|
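The "first matching template wins" rule can be modelled with a short Python function. This is an illustrative sketch under stated assumptions: `resolve_template` is a hypothetical helper, only `match_mapping_type`, `match` and `unmatch` are handled, and `fnmatchcase` is used as a stand-in for simple wildcard matching even though it also understands `?` and `[...]`.

```python
from fnmatch import fnmatchcase

def resolve_template(dynamic_templates, field_name, detected_type):
    """Return (template_name, mapping) for the first matching template, else None."""
    for entry in dynamic_templates:              # templates are checked in order
        for template_name, template in entry.items():
            wanted = template.get("match_mapping_type", "*")
            if wanted not in ("*", detected_type):
                continue
            if "match" in template and not fnmatchcase(field_name, template["match"]):
                continue
            if "unmatch" in template and fnmatchcase(field_name, template["unmatch"]):
                continue
            return template_name, template["mapping"]
    return None                                  # no template: default rules apply

templates = [
    {"longs_as_strings": {
        "match_mapping_type": "string",
        "match": "long_*",
        "unmatch": "*_text",
        "mapping": {"type": "long"},
    }},
    {"strings": {
        "match_mapping_type": "string",
        "mapping": {"type": "string"},
    }},
]

print(resolve_template(templates, "long_num", "string"))
print(resolve_template(templates, "long_text", "string"))
print(resolve_template(templates, "count", "long"))
```

Because order matters, the more specific `longs_as_strings` template is listed before the catch-all `strings` template.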
||||||
|
|
||||||
|
[[match-mapping-type]]
|
||||||
|
==== `match_mapping_type`
|
||||||
|
|
||||||
|
The `match_mapping_type` matches on the datatype detected by
|
||||||
|
<<dynamic-field-mapping,dynamic field mapping>>, in other words, the datatype
|
||||||
|
that Elasticsearch thinks the field should have. Only the following datatypes
|
||||||
|
can be automatically detected: `boolean`, `date`, `double`, `long`, `object`,
|
||||||
|
`string`. It also accepts `*` to match all datatypes.
|
||||||
|
|
||||||
|
For example, if we wanted to map all integer fields as `integer` instead of
|
||||||
|
`long`, and all `string` fields as both `analyzed` and `not_analyzed`, we
|
||||||
|
could use the following template:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"integers": {
|
||||||
|
"match_mapping_type": "long",
|
||||||
|
"mapping": {
|
||||||
|
"type": "integer"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"strings": {
|
||||||
|
"match_mapping_type": "string",
|
||||||
|
"mapping": {
|
||||||
|
"type": "string",
|
||||||
|
"fields": {
|
||||||
|
"raw": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed",
|
||||||
|
"ignore_above": 256
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"my_integer": 5, <1>
|
||||||
|
"my_string": "Some string" <2>
|
||||||
|
}
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `my_integer` field is mapped as an `integer`.
|
||||||
|
<2> The `my_string` field is mapped as an analyzed `string`, with a `not_analyzed` <<multi-fields,multi field>>.
|
||||||
|
|
||||||
|
|
||||||
|
[[match-unmatch]]
|
||||||
|
==== `match` and `unmatch`
|
||||||
|
|
||||||
|
The `match` parameter uses a pattern to match on the fieldname, while
|
||||||
|
`unmatch` uses a pattern to exclude fields matched by `match`.
|
||||||
|
|
||||||
|
The following example matches all `string` fields whose name starts with
|
||||||
|
`long_` (except for those which end with `_text`) and maps them as `long`
|
||||||
|
fields:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"longs_as_strings": {
|
||||||
|
"match_mapping_type": "string",
|
||||||
|
"match": "long_*",
|
||||||
|
"unmatch": "*_text",
|
||||||
|
"mapping": {
|
||||||
|
"type": "long"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"long_num": "5", <1>
|
||||||
|
"long_text": "foo" <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `long_num` field is mapped as a `long`.
|
||||||
|
<2> The `long_text` field uses the default `string` mapping.
|
||||||
|
|
||||||
|
[[match-pattern]]
|
||||||
|
==== `match_pattern`
|
||||||
|
|
||||||
|
The `match_pattern` parameter adjusts the behaviour of the `match` parameter
|
||||||
|
so that it supports full Java regular expression matching on the field name
|
||||||
|
instead of simple wildcards, for instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"match_pattern": "regex", "match": "^profit_\d+$"
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[[path-match-unmatch]]
|
||||||
|
==== `path_match` and `path_unmatch`
|
||||||
|
|
||||||
|
The `path_match` and `path_unmatch` parameters work in the same way as `match`
|
||||||
|
and `unmatch`, but operate on the full dotted path to the field, not just the
|
||||||
|
final name, e.g. `some_object.*.some_field`.
|
||||||
|
|
||||||
|
This example copies the values of any fields in the `name` object to the
|
||||||
|
top-level `full_name` field, except for the `middle` field:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"full_name": {
|
||||||
|
"path_match": "name.*",
|
||||||
|
"path_unmatch": "*.middle",
|
||||||
|
"mapping": {
|
||||||
|
"type": "string",
|
||||||
|
"copy_to": "full_name"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"name": {
|
||||||
|
"first": "Alice",
|
||||||
|
"middle": "Mary",
|
||||||
|
"last": "White"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
[[template-variables]]
|
||||||
|
==== `{name}` and `{dynamic_type}`
|
||||||
|
|
||||||
|
The `{name}` and `{dynamic_type}` placeholders are replaced in the `mapping`
|
||||||
|
with the field name and detected dynamic type. The following example sets all
|
||||||
|
string fields to use an <<analyzer,`analyzer`>> with the same name as the
|
||||||
|
field, and disables <<doc-values,`doc_values`>> for all non-string fields:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic_templates": [
|
||||||
|
{
|
||||||
|
"named_analyzers": {
|
||||||
|
"match_mapping_type": "string",
|
||||||
|
"match": "*",
|
||||||
|
"mapping": {
|
||||||
|
"type": "string",
|
||||||
|
"analyzer": "{name}"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"no_doc_values": {
|
||||||
|
"match_mapping_type":"*",
|
||||||
|
"mapping": {
|
||||||
|
"type": "{dynamic_type}",
|
||||||
|
"doc_values": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"english": "Some English text", <1>
|
||||||
|
"count": 5 <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `english` field is mapped as a `string` field with the `english` analyzer.
|
||||||
|
<2> The `count` field is mapped as a `long` field with `doc_values` disabled.
|
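Placeholder substitution can be thought of as a recursive string replacement over the template's `mapping`. The Python sketch below illustrates this with the two templates from the example above; `expand_placeholders` is a hypothetical helper, not part of Elasticsearch.

```python
def expand_placeholders(mapping, name, dynamic_type):
    """Replace {name} and {dynamic_type} in every string value of `mapping`."""
    if isinstance(mapping, dict):
        return {key: expand_placeholders(value, name, dynamic_type)
                for key, value in mapping.items()}
    if isinstance(mapping, list):
        return [expand_placeholders(item, name, dynamic_type) for item in mapping]
    if isinstance(mapping, str):
        return mapping.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return mapping                   # booleans, numbers and null are left untouched

english = expand_placeholders(
    {"type": "string", "analyzer": "{name}"}, "english", "string")
count = expand_placeholders(
    {"type": "{dynamic_type}", "doc_values": False}, "count", "long")
print(english)
print(count)
```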
||||||
|
|
|
@ -1,257 +0,0 @@
|
||||||
[[fielddata-formats]]
|
|
||||||
== Fielddata formats
|
|
||||||
|
|
||||||
The field data format controls how field data should be stored.
|
|
||||||
|
|
||||||
Depending on the field type, there might be several field data types
|
|
||||||
available. In particular, string, geo-point and numeric types support the `doc_values`
|
|
||||||
format which allows for computing the field data data-structures at indexing
|
|
||||||
time and storing them on disk. Although it will make the index larger and may
|
|
||||||
be slightly slower, this implementation will be more near-realtime-friendly
|
|
||||||
and will require much less memory from the JVM than other implementations.
|
|
||||||
|
|
||||||
Here is an example of how to configure the `tag` field to use the `paged_bytes` field
|
|
||||||
data format.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tag": {
|
|
||||||
"type": "string",
|
|
||||||
"fielddata": {
|
|
||||||
"format": "paged_bytes"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
It is possible to change the field data format (and the field data settings
|
|
||||||
in general) on a live index by using the update mapping API.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
=== String field data types
|
|
||||||
|
|
||||||
`paged_bytes` (default on analyzed string fields)::
|
|
||||||
Stores unique terms sequentially in a large buffer and maps documents to
|
|
||||||
the indices of the terms they contain in this large buffer.
|
|
||||||
|
|
||||||
`doc_values` (default when index is set to `not_analyzed`)::
|
|
||||||
Computes and stores field data data-structures on disk at indexing time.
|
|
||||||
Lowers memory usage but only works on non-analyzed strings (`index`: `no` or
|
|
||||||
`not_analyzed`).
|
|
||||||
|
|
||||||
[float]
|
|
||||||
=== Numeric field data types
|
|
||||||
|
|
||||||
`array`::
|
|
||||||
Stores field values in memory using arrays.
|
|
||||||
|
|
||||||
`doc_values` (default unless doc values are disabled)::
|
|
||||||
Computes and stores field data data-structures on disk at indexing time.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
=== Geo point field data types
|
|
||||||
|
|
||||||
`array`::
|
|
||||||
Stores latitudes and longitudes in arrays.
|
|
||||||
|
|
||||||
`doc_values` (default unless doc values are disabled)::
|
|
||||||
Computes and stores field data data-structures on disk at indexing time.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[global-ordinals]]
|
|
||||||
=== Global ordinals
|
|
||||||
|
|
||||||
Global ordinals is a data-structure on top of field data, that maintains an
|
|
||||||
incremental numbering for all the terms in field data in a lexicographic order.
|
|
||||||
Each term has a unique number and the number of term 'A' is lower than the number
|
|
||||||
of term 'B'. Global ordinals are only supported on string fields.
|
|
||||||
|
|
||||||
Field data on string also has ordinals, which is a unique numbering for all terms
|
|
||||||
in a particular segment and field. Global ordinals just build on top of this,
|
|
||||||
by providing a mapping between the segment ordinals and the global ordinals.
|
|
||||||
The latter being unique across the entire shard.
|
|
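The relationship between segment ordinals and global ordinals can be illustrated with a toy model in Python. This is only a sketch of the idea described above: real segments and the compressed structures Elasticsearch builds are far more involved, and the two term lists are made-up sample data.

```python
from bisect import bisect_left

# Each segment holds its own sorted terms; a term's segment ordinal is its index.
segments = [
    ["apple", "cherry"],             # segment 0
    ["banana", "cherry", "date"],    # segment 1
]

# Global ordinals number the sorted union of all terms in the shard.
global_terms = sorted({term for segment in segments for term in segment})

# For each segment: position = segment ordinal, value = global ordinal.
segment_to_global = [
    [bisect_left(global_terms, term) for term in segment]
    for segment in segments
]

print(global_terms)
print(segment_to_global)
```

Once this lookup exists, per-segment results can be combined by comparing integers, and the actual term is only resolved when the response is built.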
||||||
|
|
||||||
Global ordinals can be beneficial in search features that use segment ordinals already
|
|
||||||
such as the terms aggregator to improve the execution time. Often these search features
|
|
||||||
need to merge the segment ordinal results to a cross segment terms result. With
|
|
||||||
global ordinals this mapping happens during field data load time instead of during each
|
|
||||||
query execution. With global ordinals search features only need to resolve the actual
|
|
||||||
term when building the (shard) response, but during the execution there is no need
|
|
||||||
at all to use the actual terms and the unique numbering global ordinals provided is
|
|
||||||
sufficient and improves the execution time.
|
|
||||||
|
|
||||||
Global ordinals for a specified field are tied to all the segments of a shard (Lucene index),
|
|
||||||
which is different than for field data for a specific field which is tied to a single segment.
|
|
||||||
For this reason global ordinals need to be rebuilt in their entirety once new segments
|
|
||||||
become visible. This one time cost would happen anyway without global ordinals, but
|
|
||||||
then it would happen for each search execution instead!
|
|
||||||
|
|
||||||
The loading time of global ordinals depends on the number of terms in a field, but in general
|
|
||||||
it is low, since the source field data has already been loaded. The memory overhead of global
|
|
||||||
ordinals is small because they are very efficiently compressed. Eager loading of global ordinals
|
|
||||||
can move the loading time from the first search request, to the refresh itself.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[fielddata-loading]]
|
|
||||||
=== Fielddata loading
|
|
||||||
|
|
||||||
By default, field data is loaded lazily, i.e. the first time that a query that
|
|
||||||
requires it is executed. However, this can make the first requests that
|
|
||||||
follow a merge operation quite slow since fielddata loading is a heavy
|
|
||||||
operation.
|
|
||||||
|
|
||||||
It is possible to force field data to be loaded and cached eagerly through the
|
|
||||||
`loading` setting of fielddata:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"category": {
|
|
||||||
"type": "string",
|
|
||||||
"fielddata": {
|
|
||||||
"loading": "eager"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Global ordinals can also be eagerly loaded:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"category": {
|
|
||||||
"type": "string",
|
|
||||||
"fielddata": {
|
|
||||||
"loading": "eager_global_ordinals"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
With the above setting both field data and global ordinals for a specific field
|
|
||||||
are eagerly loaded.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Disabling field data loading
|
|
||||||
|
|
||||||
Field data can take a lot of RAM so it makes sense to disable field data
|
|
||||||
loading on the fields that don't need field data, for example those that are
|
|
||||||
used for full-text search only. In order to disable field data loading, just
|
|
||||||
change the field data format to `disabled`. When disabled, all requests that
|
|
||||||
try to load field data, e.g. when they include aggregations and/or sorting,
|
|
||||||
will return an error.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"text": {
|
|
||||||
"type": "string",
|
|
||||||
"fielddata": {
|
|
||||||
"format": "disabled"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The `disabled` format is supported by all field types.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[field-data-filtering]]
|
|
||||||
=== Filtering fielddata
|
|
||||||
|
|
||||||
It is possible to control which field values are loaded into memory,
|
|
||||||
which is particularly useful for string fields. When specifying the
|
|
||||||
<<mapping-core-types,mapping>> for a field, you
|
|
||||||
can also specify a fielddata filter.
|
|
||||||
|
|
||||||
Fielddata filters can be changed using the
|
|
||||||
<<indices-put-mapping,PUT mapping>>
|
|
||||||
API. After changing the filters, use the
|
|
||||||
<<indices-clearcache,Clear Cache>> API
|
|
||||||
to reload the fielddata using the new filters.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Filtering by frequency
|
|
||||||
|
|
||||||
The frequency filter allows you to only load terms whose frequency falls
|
|
||||||
between a `min` and `max` value, which can be expressed as an absolute
|
|
||||||
number (when the number is bigger than 1.0) or as a percentage
|
|
||||||
(e.g. `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
|
|
||||||
*per segment*. Percentages are based on the number of docs which have a
|
|
||||||
value for the field, as opposed to all docs in the segment.
|
|
||||||
|
|
||||||
Small segments can be excluded completely by specifying the minimum
|
|
||||||
number of docs that the segment should contain with `min_segment_size`:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tag": {
|
|
||||||
"type": "string",
|
|
||||||
"fielddata": {
|
|
||||||
"filter": {
|
|
||||||
"frequency": {
|
|
||||||
"min": 0.001,
|
|
||||||
"max": 0.1,
|
|
||||||
"min_segment_size": 500
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
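
The way `min` and `max` are interpreted can be sketched in Python. This is only an illustration of the rules described above, not the Elasticsearch implementation: the function name `frequency_filter` is hypothetical, the bounds are assumed to be inclusive, and `min_segment_size` is deliberately not modelled.

```python
def frequency_filter(term_doc_freqs, docs_with_value, min_freq=0.0, max_freq=1.0):
    """Return the terms of one segment whose frequency is within bounds.

    term_doc_freqs maps each term to the number of docs containing it;
    docs_with_value is the number of docs that have a value for the field.
    """
    def to_count(threshold):
        # Values above 1.0 are absolute doc counts; anything else is a
        # fraction of the docs that have a value for the field.
        return threshold if threshold > 1.0 else threshold * docs_with_value

    lowest, highest = to_count(min_freq), to_count(max_freq)
    # Assumption: both bounds are inclusive.
    return {term for term, freq in term_doc_freqs.items()
            if lowest <= freq <= highest}


# A hypothetical segment in which 1000 docs have a value for the field.
freqs = {"rare": 1, "useful": 50, "common": 900}
as_percentages = frequency_filter(freqs, 1000, min_freq=0.01, max_freq=0.1)
mixed_bounds = frequency_filter(freqs, 1000, min_freq=2, max_freq=0.95)
```

With percentage bounds of 1% and 10%, only `useful` is kept. In the second call the absolute minimum of 2 docs drops `rare`, while the 95% maximum still admits `common`.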
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Filtering by regex
|
|
||||||
|
|
||||||
Terms can also be filtered by regular expression - only values which
|
|
||||||
match the regular expression are loaded. Note: the regular expression is
|
|
||||||
applied to each term in the field, not to the whole field value. For
|
|
||||||
instance, to only load hashtags from a tweet, we can use a regular
|
|
||||||
expression which matches terms beginning with `#`:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet": {
|
|
||||||
"type": "string",
|
|
||||||
"analyzer": "whitespace",
|
|
||||||
"fielddata": {
|
|
||||||
"filter": {
|
|
||||||
"regex": {
|
|
||||||
"pattern": "^#.*"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
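
The difference between matching each term and matching the whole field value can be shown with a small Python sketch. It uses `str.split()` as a stand-in for the `whitespace` analyzer and Python's `re` module instead of the regular expression engine used by Elasticsearch, so it is an approximation; the function name `regex_filter` and the sample tweet are hypothetical.

```python
import re


def regex_filter(field_value, pattern):
    """Keep only the terms of a field value that match the pattern."""
    regex = re.compile(pattern)
    # Splitting on whitespace approximates the `whitespace` analyzer.
    terms = field_value.split()
    # The pattern is tested against each term, not the whole value.
    return [term for term in terms if regex.match(term)]


tweet = "Loving the new release #elasticsearch #search"
hashtags = regex_filter(tweet, "^#.*")

# The same pattern does not match the field value as a whole.
whole_value_matches = re.match("^#.*", tweet) is not None
```

Here `hashtags` contains only the two hashtag terms, even though the tweet itself does not start with `#`.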
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Combining filters
|
|
||||||
|
|
||||||
The `frequency` and `regex` filters can be combined:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet": {
|
|
||||||
"type": "string",
|
|
||||||
"analyzer": "whitespace",
|
|
||||||
"fielddata": {
|
|
||||||
"filter": {
|
|
||||||
"regex": {
|
|
||||||
"pattern": "^#.*"
|
|
||||||
},
|
|
||||||
"frequency": {
|
|
||||||
"min": 0.001,
|
|
||||||
"max": 0.1,
|
|
||||||
"min_segment_size": 500
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
|
@ -5,7 +5,8 @@ Each document has metadata associated with it, such as the `_index`, mapping
|
||||||
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
|
<<mapping-type-field,`_type`>>, and `_id` meta-fields. The behaviour of some of these meta-fields
|
||||||
can be customised when a mapping type is created.
|
can be customised when a mapping type is created.
|
||||||
|
|
||||||
The meta-fields are:
|
[float]
|
||||||
|
=== Identity meta-fields
|
||||||
|
|
||||||
[horizontal]
|
[horizontal]
|
||||||
<<mapping-index-field,`_index`>>::
|
<<mapping-index-field,`_index`>>::
|
||||||
|
@ -18,16 +19,26 @@ The meta-fields are:
|
||||||
|
|
||||||
<<mapping-type-field,`_type`>>::
|
<<mapping-type-field,`_type`>>::
|
||||||
|
|
||||||
The document's <<all-mapping-types,mapping type>>.
|
The document's <<mapping-type,mapping type>>.
|
||||||
|
|
||||||
<<mapping-id-field,`_id`>>::
|
<<mapping-id-field,`_id`>>::
|
||||||
|
|
||||||
The document's ID.
|
The document's ID.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Document source meta-fields
|
||||||
|
|
||||||
<<mapping-source-field,`_source`>>::
|
<<mapping-source-field,`_source`>>::
|
||||||
|
|
||||||
The original JSON representing the body of the document.
|
The original JSON representing the body of the document.
|
||||||
|
|
||||||
|
<<mapping-size-field,`_size`>>::
|
||||||
|
|
||||||
|
The size of the `_source` field in bytes.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Indexing meta-fields
|
||||||
|
|
||||||
<<mapping-all-field,`_all`>>::
|
<<mapping-all-field,`_all`>>::
|
||||||
|
|
||||||
A _catch-all_ field that indexes the values of all other fields.
|
A _catch-all_ field that indexes the values of all other fields.
|
||||||
|
@ -36,18 +47,6 @@ The meta-fields are:
|
||||||
|
|
||||||
All fields in the document which contain non-null values.
|
All fields in the document which contain non-null values.
|
||||||
|
|
||||||
<<mapping-parent-field,`_parent`>>::
|
|
||||||
|
|
||||||
Used to create a parent-child relationship between two mapping types.
|
|
||||||
|
|
||||||
<<mapping-routing-field,`_routing`>>::
|
|
||||||
|
|
||||||
A custom routing value which routes a document to a particular shard.
|
|
||||||
|
|
||||||
<<mapping-size-field,`_size`>>::
|
|
||||||
|
|
||||||
The size of the `_source` field in bytes.
|
|
||||||
|
|
||||||
<<mapping-timestamp-field,`_timestamp`>>::
|
<<mapping-timestamp-field,`_timestamp`>>::
|
||||||
|
|
||||||
A timestamp associated with the document, either specified manually or auto-generated.
|
A timestamp associated with the document, either specified manually or auto-generated.
|
||||||
|
@ -56,27 +55,49 @@ The meta-fields are:
|
||||||
|
|
||||||
How long a document should live before it is automatically deleted.
|
How long a document should live before it is automatically deleted.
|
||||||
|
|
||||||
include::fields/index-field.asciidoc[]
|
[float]
|
||||||
|
=== Routing meta-fields
|
||||||
|
|
||||||
include::fields/uid-field.asciidoc[]
|
<<mapping-parent-field,`_parent`>>::
|
||||||
|
|
||||||
include::fields/type-field.asciidoc[]
|
Used to create a parent-child relationship between two mapping types.
|
||||||
|
|
||||||
|
<<mapping-routing-field,`_routing`>>::
|
||||||
|
|
||||||
|
A custom routing value which routes a document to a particular shard.
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Other meta-field
|
||||||
|
|
||||||
|
<<mapping-meta-field,`_meta`>>::
|
||||||
|
|
||||||
|
Application specific metadata.
|
||||||
|
|
||||||
include::fields/id-field.asciidoc[]
|
|
||||||
|
|
||||||
include::fields/source-field.asciidoc[]
|
|
||||||
|
|
||||||
include::fields/all-field.asciidoc[]
|
include::fields/all-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/field-names-field.asciidoc[]
|
include::fields/field-names-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/id-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/index-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/meta-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/parent-field.asciidoc[]
|
include::fields/parent-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/routing-field.asciidoc[]
|
include::fields/routing-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/size-field.asciidoc[]
|
include::fields/size-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/source-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/timestamp-field.asciidoc[]
|
include::fields/timestamp-field.asciidoc[]
|
||||||
|
|
||||||
include::fields/ttl-field.asciidoc[]
|
include::fields/ttl-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/type-field.asciidoc[]
|
||||||
|
|
||||||
|
include::fields/uid-field.asciidoc[]
|
||||||
|
|
||||||
|
|
|
@ -151,82 +151,18 @@ PUT my_index
|
||||||
<1> The `_all` field is disabled for the `my_type` type.
|
<1> The `_all` field is disabled for the `my_type` type.
|
||||||
<2> The `query_string` query will default to querying the `content` field in this index.
|
<2> The `query_string` query will default to querying the `content` field in this index.
|
||||||
|
|
||||||
[[include-in-all]]
|
[[excluding-from-all]]
|
||||||
==== Including specific fields in `_all`
|
==== Excluding fields from `_all`
|
||||||
|
|
||||||
Individual fields can be included or excluded from the `_all` field with the
|
Individual fields can be included or excluded from the `_all` field with the
|
||||||
`include_in_all` setting, which defaults to `true`:
|
<<include-in-all,`include_in_all`>> setting.
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------
|
|
||||||
PUT my_index
|
|
||||||
{
|
|
||||||
"mappings": {
|
|
||||||
"my_type": {
|
|
||||||
"properties": {
|
|
||||||
"title": { <1>
|
|
||||||
"type": "string"
|
|
||||||
},
|
|
||||||
"content": { <1>
|
|
||||||
"type": "string"
|
|
||||||
},
|
|
||||||
"date": { <2>
|
|
||||||
"type": "date",
|
|
||||||
"include_in_all": false
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------
|
|
||||||
// AUTOSENSE
|
|
||||||
|
|
||||||
<1> The `title` and `content` fields will be included in the `_all` field.
|
|
||||||
<2> The `date` field will not be included in the `_all` field.
|
|
||||||
|
|
||||||
The `include_in_all` parameter can also be set at the type level and on
|
|
||||||
<<mapping-object-type,`object`>> or <<mapping-nested-type,`nested`>> fields,
|
|
||||||
in which case all sub-fields inherit that setting. For instance:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------
|
|
||||||
PUT my_index
|
|
||||||
{
|
|
||||||
"mappings": {
|
|
||||||
"my_type": {
|
|
||||||
"include_in_all": false, <1>
|
|
||||||
"properties": {
|
|
||||||
"title": { "type": "string" },
|
|
||||||
"author": {
|
|
||||||
"include_in_all": true, <2>
|
|
||||||
"properties": {
|
|
||||||
"first_name": { "type": "string" },
|
|
||||||
"last_name": { "type": "string" }
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"editor": {
|
|
||||||
"properties": {
|
|
||||||
"first_name": { "type": "string" }, <3>
|
|
||||||
"last_name": { "type": "string", "include_in_all": true } <3>
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------
|
|
||||||
// AUTOSENSE
|
|
||||||
|
|
||||||
<1> All fields in `my_type` are excluded from `_all`.
|
|
||||||
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
|
|
||||||
<3> Only the `editor.last_name` field is included in `_all`.
|
|
||||||
The `editor.first_name` inherits the type-level setting and is excluded.
|
|
||||||
|
|
||||||
[[all-field-and-boosting]]
|
[[all-field-and-boosting]]
|
||||||
==== Index boosting and the `_all` field
|
==== Index boosting and the `_all` field
|
||||||
|
|
||||||
Individual fields can be _boosted_ at index time, with the `boost` parameter.
|
Individual fields can be _boosted_ at index time, with the <<index-boost,`boost`>>
|
||||||
The `_all` field takes these boosts into account:
|
parameter. The `_all` field takes these boosts into account:
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
|
@ -2,8 +2,8 @@
|
||||||
=== `_id` field
|
=== `_id` field
|
||||||
|
|
||||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
|
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_id` field is not
|
||||||
`_id` field is not indexed as its value can be derived automatically from the
|
indexed as its value can be derived automatically from the
|
||||||
<<mapping-uid-field,`_uid`>> field.
|
<<mapping-uid-field,`_uid`>> field.
|
||||||
|
|
||||||
The value of the `_id` field is accessible in queries and scripts, but _not_
|
The value of the `_id` field is accessible in queries and scripts, but _not_
|
||||||
|
|
|
@ -0,0 +1,30 @@
|
||||||
|
[[mapping-meta-field]]
|
||||||
|
=== `_meta` field
|
||||||
|
|
||||||
|
Each mapping type can have custom meta data associated with it. These are not
|
||||||
|
used at all by Elasticsearch, but can be used to store application-specific
|
||||||
|
metadata, such as the class that a document belongs to:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"user": {
|
||||||
|
"_meta": { <1>
|
||||||
|
"class": "MyApp::User",
|
||||||
|
"version": {
|
||||||
|
"min": "1.0",
|
||||||
|
"max": "1.3"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> This `_meta` info can be retrieved with the
|
||||||
|
<<indices-get-mapping,GET mapping>> API.
|
||||||
|
|
||||||
|
The `_meta` field can be updated on an existing type using the
|
||||||
|
<<indices-put-mapping,PUT mapping>> API.
|
|
@ -78,8 +78,7 @@ stored.
|
||||||
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
||||||
`_source`, especially the fact that you cannot reindex documents from one
|
`_source`, especially the fact that you cannot reindex documents from one
|
||||||
Elasticsearch index to another. Consider using
|
Elasticsearch index to another. Consider using
|
||||||
<<search-request-source-filtering,source filtering>> or a
|
<<search-request-source-filtering,source filtering>> instead.
|
||||||
<<mapping-transform,transform script>> instead.
|
|
||||||
|
|
||||||
The `includes`/`excludes` parameters (which also accept wildcards) can be used
|
The `includes`/`excludes` parameters (which also accept wildcards) can be used
|
||||||
as follows:
|
as follows:
|
||||||
|
|
|
@ -1,5 +1,5 @@
|
||||||
[[mapping-ttl-field]]
|
[[mapping-ttl-field]]
|
||||||
=== `_ttl`
|
=== `_ttl` field
|
||||||
|
|
||||||
Some types of documents, such as session data or special offers, come with an
|
Some types of documents, such as session data or special offers, come with an
|
||||||
expiration date. The `_ttl` field allows you to specify the minimum time a
|
expiration date. The `_ttl` field allows you to specify the minimum time a
|
||||||
|
|
|
@ -2,8 +2,8 @@
|
||||||
=== `_type` field
|
=== `_type` field
|
||||||
|
|
||||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. The
|
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. The `_type` field is
|
||||||
`_type` field is indexed in order to make searching by type name fast.
|
indexed in order to make searching by type name fast.
|
||||||
|
|
||||||
The value of the `_type` field is accessible in queries, aggregations,
|
The value of the `_type` field is accessible in queries, aggregations,
|
||||||
scripts, and when sorting:
|
scripts, and when sorting:
|
||||||
|
|
|
@ -2,8 +2,8 @@
|
||||||
=== `_uid` field
|
=== `_uid` field
|
||||||
|
|
||||||
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
Each document indexed is associated with a <<mapping-type-field,`_type`>> (see
|
||||||
<<all-mapping-types,Mapping Types>>) and an <<mapping-id-field,`_id`>>. These
|
<<mapping-type>>) and an <<mapping-id-field,`_id`>>. These values are
|
||||||
values are combined as `{type}#{id}` and indexed as the `_uid` field.
|
combined as `{type}#{id}` and indexed as the `_uid` field.
|
||||||
|
|
||||||
The value of the `_uid` field is accessible in queries, aggregations, scripts,
|
The value of the `_uid` field is accessible in queries, aggregations, scripts,
|
||||||
and when sorting:
|
and when sorting:
|
||||||
|
|
|
@ -1,25 +0,0 @@
|
||||||
[[mapping-meta]]
|
|
||||||
== Meta
|
|
||||||
|
|
||||||
Each mapping can have custom meta data associated with it. These are
|
|
||||||
simple storage elements that are simply persisted along with the mapping
|
|
||||||
and can be retrieved when fetching the mapping definition. The meta is
|
|
||||||
defined under the `_meta` element, for example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"_meta" : {
|
|
||||||
"attr1" : "value1",
|
|
||||||
"attr2" : {
|
|
||||||
"attr3" : "value3"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Meta can be handy for example for client libraries that perform
|
|
||||||
serialization and deserialization to store its meta model (for example,
|
|
||||||
the class the document maps to).
|
|
|
@ -0,0 +1,100 @@
|
||||||
|
[[mapping-params]]
|
||||||
|
== Mapping parameters
|
||||||
|
|
||||||
|
The following pages provide detailed explanations of the various mapping
|
||||||
|
parameters that are used by <<mapping-types,field mappings>>.
|
||||||
|
|
||||||
|
|
||||||
|
The following mapping parameters are common to some or all field datatypes:
|
||||||
|
|
||||||
|
* <<analyzer,`analyzer`>>
|
||||||
|
* <<index-boost,`boost`>>
|
||||||
|
* <<coerce,`coerce`>>
|
||||||
|
* <<copy-to,`copy_to`>>
|
||||||
|
* <<doc-values,`doc_values`>>
|
||||||
|
* <<dynamic,`dynamic`>>
|
||||||
|
* <<enabled,`enabled`>>
|
||||||
|
* <<fielddata,`fielddata`>>
|
||||||
|
* <<geohash,`geohash`>>
|
||||||
|
* <<geohash-precision,`geohash_precision`>>
|
||||||
|
* <<geohash-prefix,`geohash_prefix`>>
|
||||||
|
* <<mapping-date-format,`format`>>
|
||||||
|
* <<ignore-above,`ignore_above`>>
|
||||||
|
* <<ignore-malformed,`ignore_malformed`>>
|
||||||
|
* <<include-in-all,`include_in_all`>>
|
||||||
|
* <<index-options,`index_options`>>
|
||||||
|
* <<lat-lon,`lat_lon`>>
|
||||||
|
* <<mapping-index,`index`>>
|
||||||
|
* <<multi-fields,`fields`>>
|
||||||
|
* <<norms,`norms`>>
|
||||||
|
* <<null-value,`null_value`>>
|
||||||
|
* <<position-offset-gap,`position_offset_gap`>>
|
||||||
|
* <<properties,`properties`>>
|
||||||
|
* <<search-analyzer,`search_analyzer`>>
|
||||||
|
* <<similarity,`similarity`>>
|
||||||
|
* <<mapping-store,`store`>>
|
||||||
|
* <<term-vector,`term_vector`>>
|
||||||
|
|
||||||
|
|
||||||
|
include::params/analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::params/boost.asciidoc[]
|
||||||
|
|
||||||
|
include::params/coerce.asciidoc[]
|
||||||
|
|
||||||
|
include::params/copy-to.asciidoc[]
|
||||||
|
|
||||||
|
include::params/doc-values.asciidoc[]
|
||||||
|
|
||||||
|
include::params/dynamic.asciidoc[]
|
||||||
|
|
||||||
|
include::params/enabled.asciidoc[]
|
||||||
|
|
||||||
|
include::params/fielddata.asciidoc[]
|
||||||
|
|
||||||
|
include::params/format.asciidoc[]
|
||||||
|
|
||||||
|
include::params/geohash.asciidoc[]
|
||||||
|
|
||||||
|
include::params/geohash-precision.asciidoc[]
|
||||||
|
|
||||||
|
include::params/geohash-prefix.asciidoc[]
|
||||||
|
|
||||||
|
include::params/ignore-above.asciidoc[]
|
||||||
|
|
||||||
|
include::params/ignore-malformed.asciidoc[]
|
||||||
|
|
||||||
|
include::params/include-in-all.asciidoc[]
|
||||||
|
|
||||||
|
include::params/index.asciidoc[]
|
||||||
|
|
||||||
|
include::params/index-options.asciidoc[]
|
||||||
|
|
||||||
|
include::params/lat-lon.asciidoc[]
|
||||||
|
|
||||||
|
include::params/multi-fields.asciidoc[]
|
||||||
|
|
||||||
|
include::params/norms.asciidoc[]
|
||||||
|
|
||||||
|
include::params/null-value.asciidoc[]
|
||||||
|
|
||||||
|
include::params/position-offset-gap.asciidoc[]
|
||||||
|
|
||||||
|
include::params/precision-step.asciidoc[]
|
||||||
|
|
||||||
|
include::params/properties.asciidoc[]
|
||||||
|
|
||||||
|
include::params/search-analyzer.asciidoc[]
|
||||||
|
|
||||||
|
include::params/similarity.asciidoc[]
|
||||||
|
|
||||||
|
include::params/store.asciidoc[]
|
||||||
|
|
||||||
|
include::params/term-vector.asciidoc[]
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
|
@ -0,0 +1,80 @@
|
||||||
|
[[analyzer]]
|
||||||
|
=== `analyzer`
|
||||||
|
|
||||||
|
The values of <<mapping-index,`analyzed`>> string fields are passed through an
|
||||||
|
<<analysis,analyzer>> to convert the string into a stream of _tokens_ or
|
||||||
|
_terms_. For instance, the string `"The quick Brown Foxes."` may, depending
|
||||||
|
on which analyzer is used, be analyzed to the tokens: `quick`, `brown`,
|
||||||
|
`fox`. These are the actual terms that are indexed for the field, which makes
|
||||||
|
it possible to search efficiently for individual words _within_ big blobs of
|
||||||
|
text.
|
||||||
|
|
||||||
|
This analysis process needs to happen not just at index time, but also at
|
||||||
|
query time: the query string needs to be passed through the same (or a
|
||||||
|
similar) analyzer so that the terms that it tries to find are in the same
|
||||||
|
format as those that exist in the index.
|
||||||
|
|
||||||
|
Elasticsearch ships with a number of <<analysis-analyzers,pre-defined analyzers>>,
|
||||||
|
which can be used without further configuration. It also ships with many
|
||||||
|
<<analysis-charfilters,character filters>>, <<analysis-tokenizers,tokenizers>>,
|
||||||
|
and <<analysis-tokenfilters>> which can be combined to configure
|
||||||
|
custom analyzers per index.
|
||||||
|
|
||||||
|
Analyzers can be specified per-query, per-field or per-index. At index time,
|
||||||
|
Elasticsearch will look for an analyzer in this order:
|
||||||
|
|
||||||
|
* The `analyzer` defined in the field mapping.
|
||||||
|
* An analyzer named `default` in the index settings.
|
||||||
|
* The <<analysis-standard-analyzer,`standard`>> analyzer.
|
||||||
|
|
||||||
|
At query time, there are a few more layers:
|
||||||
|
|
||||||
|
* The `analyzer` defined in a <<full-text-queries,full-text query>>.
|
||||||
|
* The `search_analyzer` defined in the field mapping.
|
||||||
|
* The `analyzer` defined in the field mapping.
|
||||||
|
* An analyzer named `default_search` in the index settings.
|
||||||
|
* An analyzer named `default` in the index settings.
|
||||||
|
* The <<analysis-standard-analyzer,`standard`>> analyzer.
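
The two lookup orders above can be expressed as a short Python sketch. It only mirrors the lists in this section and is not taken from the Elasticsearch source; the function name `resolve_analyzer` and its arguments are hypothetical, with the field mapping modelled as a plain dictionary and the index settings as a set of configured analyzer names.

```python
def resolve_analyzer(field_mapping, index_analyzers, query_analyzer=None, search=False):
    """Return the name of the analyzer that would be chosen.

    field_mapping: dict that may contain "analyzer" and "search_analyzer".
    index_analyzers: names of analyzers configured in the index settings.
    query_analyzer: analyzer named in a full-text query, if any.
    search: True for query time, False for index time.
    """
    candidates = []
    if search:
        # Query time adds two layers in front of the field's analyzer.
        candidates.append(query_analyzer)
        candidates.append(field_mapping.get("search_analyzer"))
    candidates.append(field_mapping.get("analyzer"))
    if search and "default_search" in index_analyzers:
        candidates.append("default_search")
    if "default" in index_analyzers:
        candidates.append("default")
    # The standard analyzer is the final fallback in both cases.
    candidates.append("standard")
    # The first candidate that is actually set wins.
    return next(name for name in candidates if name)


mapping = {"analyzer": "english", "search_analyzer": "simple"}
index_time = resolve_analyzer(mapping, {"default"})
query_time = resolve_analyzer(mapping, {"default"}, search=True)
```

For this mapping the sketch picks `english` at index time and `simple` at query time; a field with neither parameter would fall through to `default` and then `standard`.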
|
||||||
|
|
||||||
|
The easiest way to specify an analyzer for a particular field is to define it
|
||||||
|
in the field mapping, as follows:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"text": { <1>
|
||||||
|
"type": "string",
|
||||||
|
"fields": {
|
||||||
|
"english": { <2>
|
||||||
|
"type": "string",
|
||||||
|
"analyzer": "english"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_analyze?field=text <3>
|
||||||
|
{
|
||||||
|
"text": "The quick Brown Foxes."
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_analyze?field=text.english <4>
|
||||||
|
{
|
||||||
|
"text": "The quick Brown Foxes."
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `text` field uses the default `standard` analyzer.
|
||||||
|
<2> The `text.english` <<multi-fields,multi-field>> uses the `english` analyzer, which removes stop words and applies stemming.
|
||||||
|
<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ].
|
||||||
|
<4> This returns the tokens: [ `quick`, `brown`, `fox` ].
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,59 @@
|
||||||
|
[[index-boost]]
|
||||||
|
=== `boost`
|
||||||
|
|
||||||
|
Individual fields can be _boosted_ -- count more towards the relevance score
|
||||||
|
-- at index time, with the `boost` parameter as follows:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"title": {
|
||||||
|
"type": "string",
|
||||||
|
"boost": 2 <1>
|
||||||
|
},
|
||||||
|
"content": {
|
||||||
|
"type": "string"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> Matches on the `title` field will have twice the weight of those on the
|
||||||
|
`content` field, which has the default `boost` of `1.0`.
|
||||||
|
|
||||||
|
Note that a `title` field will usually be shorter than a `content` field. The
|
||||||
|
default relevance calculation takes field length into account, so a short
|
||||||
|
`title` field will have a higher natural boost than a long `content` field.
|
||||||
|
|
||||||
|
[WARNING]
|
||||||
|
.Why index time boosting is a bad idea
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
We advise against using index time boosting for the following reasons:
|
||||||
|
|
||||||
|
* You cannot change index-time `boost` values without reindexing all of your
|
||||||
|
documents.
|
||||||
|
|
||||||
|
* Every query supports query-time boosting which achieves the same effect. The
|
||||||
|
difference is that you can tweak the `boost` value without having to reindex.
|
||||||
|
|
||||||
|
* Index-time boosts are stored as part of the <<norms,`norm`>>, which is only one
|
||||||
|
byte. This reduces the resolution of the field length normalization factor
|
||||||
|
which can lead to lower quality relevance calculations.
|
||||||
|
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
The only advantage that index time boosting has is that it is copied with the
|
||||||
|
value into the <<mapping-all-field,`_all`>> field. This means that, when
|
||||||
|
querying the `_all` field, words that originated from the `title` field will
|
||||||
|
have a higher score than words that originated in the `content` field.
|
||||||
|
This functionality comes at a cost: queries on the `_all` field are slower
|
||||||
|
when index-time boosting is used.
|
||||||
|
|
|
@ -0,0 +1,89 @@
|
||||||
|
[[coerce]]
|
||||||
|
=== `coerce`
|
||||||
|
|
||||||
|
Data is not always clean. Depending on how it is produced a number might be
|
||||||
|
rendered in the JSON body as a true JSON number, e.g. `5`, but it might also
|
||||||
|
be rendered as a string, e.g. `"5"`. Alternatively, a number that should be
|
||||||
|
an integer might instead be rendered as a floating point, e.g. `5.0`, or even
|
||||||
|
`"5.0"`.
|
||||||
|
|
||||||
|
Coercion attempts to clean up dirty values to fit the datatype of a field.
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
* Strings will be coerced to numbers.
|
||||||
|
* Floating points will be truncated for integer values.
|
||||||
|
* Lon/lat geo-points will be normalized to a standard -180:180 / -90:90 coordinate system.
|
||||||
|
|
||||||
|
The following example disables coercion for one of two integer fields:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"number_one": {
|
||||||
|
"type": "integer"
|
||||||
|
},
|
||||||
|
"number_two": {
|
||||||
|
"type": "integer",
|
||||||
|
"coerce": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"number_one": "10" <1>
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{
|
||||||
|
"number_two": "10" <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `number_one` field will contain the integer `10`.
|
||||||
|
<2> This document will be rejected because coercion is disabled.
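
The effect of the `coerce` parameter on an `integer` field can be approximated in Python. This is a rough sketch of the behaviour described above rather than the Elasticsearch implementation: the function name `coerce_integer` is hypothetical, and rejecting every value that is not already an integer when coercion is disabled is a simplifying assumption.

```python
def coerce_integer(value, coerce=True):
    """Return value as an int, or raise ValueError if it is rejected."""
    # A true JSON integer is always accepted (bool is excluded on purpose).
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if not coerce:
        # Assumption: with coercion disabled, anything else is rejected.
        raise ValueError(f"cannot index {value!r} into an integer field")
    # Strings are parsed as numbers and floating points are truncated.
    return int(float(value))


number_one = coerce_integer("10")  # coercion enabled

try:
    coerce_integer("10", coerce=False)  # coercion disabled
    number_two_rejected = False
except ValueError:
    number_two_rejected = True
```

The two calls correspond to the two documents above: the string `"10"` is accepted for `number_one` and rejected for `number_two`.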
|
||||||
|
|
||||||
|
[[coerce-setting]]
|
||||||
|
==== Index-level default
|
||||||
|
|
||||||
|
The `index.mapping.coerce` setting can be set on the index level to disable
|
||||||
|
coercion globally across all mapping types:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"settings": {
|
||||||
|
"index.mapping.coerce": false
|
||||||
|
},
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"number_one": {
|
||||||
|
"type": "integer"
|
||||||
|
},
|
||||||
|
"number_two": {
|
||||||
|
"type": "integer",
|
||||||
|
"coerce": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{ "number_one": "10" } <1>
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{ "number_two": "10" } <2>
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> This document will be rejected because the `number_one` field inherits the index-level coercion setting.
|
||||||
|
<2> The `number_two` field overrides the index level setting to enable coercion.
|
||||||
|
|
|
@ -0,0 +1,64 @@
|
||||||
|
[[copy-to]]
|
||||||
|
=== `copy_to`
|
||||||
|
|
||||||
|
The `copy_to` parameter allows you to create custom
|
||||||
|
<<mapping-all-field,`_all`>> fields. In other words, the values of multiple
|
||||||
|
fields can be copied into a group field, which can then be queried as a single
|
||||||
|
field. For instance, the `first_name` and `last_name` fields can be copied to
|
||||||
|
the `full_name` field as follows:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"first_name": {
|
||||||
|
"type": "string",
|
||||||
|
"copy_to": "full_name" <1>
|
||||||
|
},
|
||||||
|
"last_name": {
|
||||||
|
"type": "string",
|
||||||
|
"copy_to": "full_name" <1>
|
||||||
|
},
|
||||||
|
"full_name": {
|
||||||
|
"type": "string"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT /my_index/my_type/1
|
||||||
|
{
|
||||||
|
"first_name": "John",
|
||||||
|
"last_name": "Smith"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET /my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"full_name": { <2>
|
||||||
|
"query": "John Smith",
|
||||||
|
"operator": "and"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The values of the `first_name` and `last_name` fields are copied to the
|
||||||
|
`full_name` field.
|
||||||
|
|
||||||
|
<2> The `first_name` and `last_name` fields can still be queried for the
|
||||||
|
first name and last name respectively, but the `full_name` field can be
|
||||||
|
queried for both first and last names.
|
||||||
|
|
||||||
|
Some important points:
|
||||||
|
|
||||||
|
* It is the field _value_ which is copied, not the terms (which result from the analysis process).
|
||||||
|
* The original <<mapping-source-field,`_source`>> field will not be modified to show the copied values.
|
||||||
|
* The same value can be copied to multiple fields, with `"copy_to": [ "field_1", "field_2" ]`
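The points above can be illustrated with a small model of indexing. This is a sketch under stated assumptions: `index_document` is a hypothetical helper, and the "index" is just a dictionary of raw values per field (analysis is left out).

```python
def index_document(source, copy_to):
    """Return (stored _source, indexed field values) for one document."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        targets = copy_to.get(field, [])
        if isinstance(targets, str):
            targets = [targets]  # "copy_to" accepts one field name or a list
        for target in targets:
            # The raw value is copied; the target field would analyze it itself.
            indexed.setdefault(target, []).append(value)
    # The stored source is returned unchanged: copied values never appear in it.
    return dict(source), indexed


source, indexed = index_document(
    {"first_name": "John", "last_name": "Smith"},
    {"first_name": "full_name", "last_name": ["full_name"]},
)
```

Here `indexed["full_name"]` holds both values, while `source` still contains only `first_name` and `last_name`.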
|
|
@ -0,0 +1,46 @@
|
||||||
|
[[doc-values]]
|
||||||
|
=== `doc_values`
|
||||||
|
|
||||||
|
Most fields are <<mapping-index,indexed>> by default, which makes them
|
||||||
|
searchable. The inverted index allows queries to look up the search term in
|
||||||
|
a unique sorted list of terms, and from that immediately have access to the list
|
||||||
|
of documents that contain the term.
|
||||||
|
|
||||||
|
Sorting, aggregations, and access to field values in scripts require a
|
||||||
|
different data access pattern. Instead of looking up the term and finding
|
||||||
|
documents, we need to be able to look up the document and find the terms that
|
||||||
|
it has in a field.
|
||||||
|
|
||||||
|
Doc values are the on-disk data structure, built at document index time, which
|
||||||
|
makes this data access pattern possible. Doc values are supported on almost
|
||||||
|
all field types, with the __notable exception of `analyzed` string fields__.
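The two access patterns described above can be contrasted in a toy example. This is only an illustration of the idea, using in-memory dictionaries; the real structures are on-disk and far more compact, and the names below are made up for the sketch.

```python
from collections import defaultdict

docs = {
    1: ["error", "timeout"],
    2: ["ok"],
    3: ["error"],
}

# Inverted index: term -> documents containing it (used for searching).
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for term in docs[doc_id]:
        inverted[term].append(doc_id)

# Doc-values-style column: document -> its terms (used for sorting,
# aggregations, and scripts).
column = {doc_id: sorted(terms) for doc_id, terms in docs.items()}


def terms_counts(doc_ids):
    """Count terms across the given documents by reading the column."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        for term in column[doc_id]:
            counts[term] += 1
    return dict(counts)
```

A search for `error` uses `inverted` to find the matching documents, and an aggregation over those documents then reads `column` for each one.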
|
||||||
|
|
||||||
|
All fields which support doc values have them enabled by default. If you are
|
||||||
|
sure that you don't need to sort or aggregate on a field, or access the field
|
||||||
|
value from a script, you can disable doc values in order to save disk space:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"status_code": { <1>
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed"
|
||||||
|
},
|
||||||
|
"session_id": { <2>
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed",
|
||||||
|
"doc_values": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `status_code` field has `doc_values` enabled by default.
|
||||||
|
<2> The `session_id` has `doc_values` disabled, but can still be queried.
|
||||||
|
|
|
@ -0,0 +1,87 @@
|
||||||
|
[[dynamic]]
|
||||||
|
=== `dynamic`
|
||||||
|
|
||||||
|
By default, fields can be added _dynamically_ to a document, or to
|
||||||
|
<<object,inner objects>> within a document, just by indexing a document
|
||||||
|
containing the new field. For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
DELETE my_index <1>
|
||||||
|
|
||||||
|
PUT my_index/my_type/1 <2>
|
||||||
|
{
|
||||||
|
"username": "johnsmith",
|
||||||
|
"name": {
|
||||||
|
"first": "John",
|
||||||
|
"last": "Smith"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_mapping <3>
|
||||||
|
|
||||||
|
PUT my_index/my_type/2 <4>
|
||||||
|
{
|
||||||
|
"username": "marywhite",
|
||||||
|
"email": "mary@white.com",
|
||||||
|
"name": {
|
||||||
|
"first": "Mary",
|
||||||
|
"middle": "Alice",
|
||||||
|
"last": "White"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_mapping <5>
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> First delete the index, in case it already exists.
|
||||||
|
<2> This document introduces the string field `username`, the object field
|
||||||
|
`name`, and two string fields under the `name` object which can be
|
||||||
|
referred to as `name.first` and `name.last`.
|
||||||
|
<3> Check the mapping to verify the above.
|
||||||
|
<4> This document adds two string fields: `email` and `name.middle`.
|
||||||
|
<5> Check the mapping to verify the changes.
|
||||||
|
|
||||||
|
The details of how new fields are detected and added to the mapping are explained in <<dynamic-mapping>>.
|
||||||
|
|
||||||
|
The `dynamic` setting controls whether new fields can be added dynamically or
|
||||||
|
not. It accepts three settings:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`true`:: Newly detected fields are added to the mapping. (default)
|
||||||
|
`false`:: Newly detected fields are ignored. New fields must be added explicitly.
|
||||||
|
`strict`:: If new fields are detected, an exception is thrown and the document is rejected.
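The three settings can be summarized as a small decision function. This is a hedged sketch: `apply_dynamic` is a hypothetical helper that tracks only field names, not datatypes.

```python
def apply_dynamic(mapping, document, dynamic=True):
    """Return the set of mapped field names after indexing `document`."""
    mapped = set(mapping)
    for field in document:
        if field in mapped:
            continue
        if dynamic == "strict":
            # The whole document is rejected.
            raise ValueError(f"strict mapping: unknown field {field!r}")
        if dynamic is True:
            mapped.add(field)
        # dynamic is False: the new field is ignored, the mapping is unchanged.
    return mapped
```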
|
||||||
|
|
||||||
|
The `dynamic` setting may be set at the mapping type level, and on each
|
||||||
|
<<object,inner object>>. Inner objects inherit the setting from their parent
|
||||||
|
object or from the mapping type. For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"dynamic": false, <1>
|
||||||
|
"properties": {
|
||||||
|
"user": { <2>
|
||||||
|
"properties": {
|
||||||
|
"name": {
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"social_networks": { <3>
|
||||||
|
"dynamic": true,
|
||||||
|
"properties": {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Dynamic mapping is disabled at the type level, so no new top-level fields will be added dynamically.
|
||||||
|
<2> The `user` object inherits the type-level setting.
|
||||||
|
<3> The `user.social_networks` object enables dynamic mapping, so new fields may be added to this inner object.
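The inheritance rule can be sketched as a walk down the mapping that keeps the nearest explicit `dynamic` value. The function `effective_dynamic` is hypothetical and assumes the mapping is given as a plain nested dictionary shaped like the request body above.

```python
def effective_dynamic(mapping, path, default=True):
    """Resolve the `dynamic` setting that applies to the object at `path`."""
    setting = mapping.get("dynamic", default)
    node = mapping
    for name in path:
        node = node.get("properties", {}).get(name, {})
        # An explicit setting overrides the inherited one.
        setting = node.get("dynamic", setting)
    return setting


my_type = {
    "dynamic": False,
    "properties": {
        "user": {
            "properties": {
                "name": {"type": "string"},
                "social_networks": {"dynamic": True, "properties": {}},
            }
        }
    },
}
```

For this mapping, `effective_dynamic(my_type, ["user"])` is `False` (inherited), and `effective_dynamic(my_type, ["user", "social_networks"])` is `True`.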
|
||||||
|
|
|
@ -0,0 +1,94 @@
|
||||||
|
[[enabled]]
|
||||||
|
=== `enabled`
|
||||||
|
|
||||||
|
Elasticsearch tries to index all of the fields you give it, but sometimes you
|
||||||
|
want to just store the field without indexing it. For instance, imagine that
|
||||||
|
you are using Elasticsearch as a web session store. You may want to index the
|
||||||
|
session ID and last update time, but you don't need to query or run
|
||||||
|
aggregations on the session data itself.
|
||||||
|
|
||||||
|
The `enabled` setting, which can be applied only to the mapping type and to
|
||||||
|
<<object,`object`>> fields, causes Elasticsearch to skip parsing of the
|
||||||
|
contents of the field entirely. The JSON can still be retrieved from the
|
||||||
|
<<mapping-source-field,`_source`>> field, but it is not searchable or stored
|
||||||
|
in any other way:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"session": {
|
||||||
|
"properties": {
|
||||||
|
"user_id": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed"
|
||||||
|
},
|
||||||
|
"last_updated": {
|
||||||
|
"type": "date"
|
||||||
|
},
|
||||||
|
"session_data": { <1>
|
||||||
|
"enabled": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/session/session_1
|
||||||
|
{
|
||||||
|
"user_id": "kimchy",
|
||||||
|
"session_data": { <2>
|
||||||
|
"arbitrary_object": {
|
||||||
|
"some_array": [ "foo", "bar", { "baz": 2 } ]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"last_updated": "2015-12-06T18:20:22"
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/session/session_2
|
||||||
|
{
|
||||||
|
"user_id": "jpountz",
|
||||||
|
"session_data": "none", <3>
|
||||||
|
"last_updated": "2015-12-06T18:22:13"
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `session_data` field is disabled.
|
||||||
|
<2> Any arbitrary data can be passed to the `session_data` field as it will be entirely ignored.
|
||||||
|
<3> The `session_data` will also ignore values that are not JSON objects.
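As a rough model of this behaviour, the sketch below flattens a document into the field paths that would be parsed and indexed, skipping any disabled field. The helper `index_fields` is hypothetical and ignores arrays of objects for brevity.

```python
def index_fields(source, disabled):
    """Return the flattened fields that would be parsed and indexed."""
    indexed = {}

    def walk(prefix, value):
        if prefix in disabled:
            return  # skipped entirely, whatever its JSON type
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        else:
            indexed[prefix] = value

    walk("", source)
    return indexed
```

The original document is not modified, which mirrors the fact that the disabled field can still be retrieved from `_source`.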
|
||||||
|
|
||||||
|
The entire mapping type may be disabled as well, in which case the document is
|
||||||
|
stored in the <<mapping-source-field,`_source`>> field, which means it can be
|
||||||
|
retrieved, but none of its contents are indexed in any way:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"session": { <1>
|
||||||
|
"enabled": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/session/session_1
|
||||||
|
{
|
||||||
|
"user_id": "kimchy",
|
||||||
|
"session_data": {
|
||||||
|
"arbitrary_object": {
|
||||||
|
"some_array": [ "foo", "bar", { "baz": 2 } ]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"last_updated": "2015-12-06T18:20:22"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/session/session_1 <2>
|
||||||
|
|
||||||
|
GET my_index/_mapping <3>
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The entire `session` mapping type is disabled.
|
||||||
|
<2> The document can be retrieved.
|
||||||
|
<3> Checking the mapping reveals that no fields have been added.
|
|
@ -0,0 +1,225 @@
|
||||||
|
[[fielddata]]
|
||||||
|
=== `fielddata`
|
||||||
|
|
||||||
|
Most fields are <<mapping-index,indexed>> by default, which makes them
|
||||||
|
searchable. The inverted index allows queries to look up the search term in
|
||||||
|
a unique sorted list of terms, and from that immediately have access to the list
|
||||||
|
of documents that contain the term.
|
||||||
|
|
||||||
|
Sorting, aggregations, and access to field values in scripts require a
|
||||||
|
different data access pattern. Instead of looking up the term and finding
|
||||||
|
documents, we need to be able to look up the document and find the terms that
|
||||||
|
it has in a field.
|
||||||
|
|
||||||
|
Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
|
||||||
|
this type of data access pattern, but `analyzed` string fields do not support
|
||||||
|
`doc_values`.
|
||||||
|
|
||||||
|
Instead, `analyzed` strings use a query-time data structure called
|
||||||
|
`fielddata`. This data structure is built on demand the first time that a
|
||||||
|
field is used for aggregations, sorting, or is accessed in a script. It is built
|
||||||
|
by reading the entire inverted index for each segment from disk, inverting the
|
||||||
|
term ↔︎ document relationship, and storing the result in memory, in the
|
||||||
|
JVM heap.
|
||||||
|
|
||||||
|
|
||||||
|
Loading fielddata is an expensive process so, once it has been loaded, it
|
||||||
|
remains in memory for the lifetime of the segment.
|
||||||
|
|
||||||
|
[WARNING]
|
||||||
|
.Fielddata can fill up your heap space
|
||||||
|
==============================================================================
|
||||||
|
Fielddata can consume a lot of heap space, especially when loading high
|
||||||
|
cardinality `analyzed` string fields. Most of the time, it doesn't make sense
|
||||||
|
to sort or aggregate on `analyzed` string fields (with the notable exception
|
||||||
|
of the
|
||||||
|
<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
|
||||||
|
aggregation). Always think about whether a `not_analyzed` field (which can
|
||||||
|
use `doc_values`) would be a better fit for your use case.
|
||||||
|
==============================================================================
|
||||||
|
|
||||||
|
[[fielddata-format]]
|
||||||
|
==== `fielddata.format`
|
||||||
|
|
||||||
|
For `analyzed` string fields, the fielddata `format` controls whether
|
||||||
|
fielddata should be enabled or not. It accepts: `disabled` and `paged_bytes`
|
||||||
|
(enabled, which is the default). To disable fielddata loading, you can use
|
||||||
|
the following mapping:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"text": {
|
||||||
|
"type": "string",
|
||||||
|
"fielddata": {
|
||||||
|
"format": "disabled" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `text` field cannot be used for sorting, aggregations, or in scripts.
|
||||||
|
|
||||||
|
.Fielddata and other datatypes
|
||||||
|
[NOTE]
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
Historically, other field datatypes also used fielddata, but this has been replaced
|
||||||
|
by index-time, disk-based <<doc-values,`doc_values`>>.
|
||||||
|
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
|
||||||
|
[[fielddata-loading]]
|
||||||
|
==== `fielddata.loading`
|
||||||
|
|
||||||
|
This per-field setting controls when fielddata is loaded into memory. It
|
||||||
|
accepts three options:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`lazy`::
|
||||||
|
|
||||||
|
Fielddata is only loaded into memory when it is needed. (default)
|
||||||
|
|
||||||
|
`eager`::
|
||||||
|
|
||||||
|
Fielddata is loaded into memory before a new search segment becomes
|
||||||
|
visible to search. This can reduce the latency that a user may experience
|
||||||
|
if their search request has to trigger lazy loading from a big segment.
|
||||||
|
|
||||||
|
`eager_global_ordinals`::
|
||||||
|
|
||||||
|
Loading fielddata into memory is only part of the work that is required.
|
||||||
|
After loading the fielddata for each segment, Elasticsearch builds the
|
||||||
|
<<global-ordinals>> data structure to make a list of all unique terms
|
||||||
|
across all the segments in a shard. By default, global ordinals are built
|
||||||
|
lazily. If the field has a very high cardinality, global ordinals may
|
||||||
|
take some time to build, in which case you can use eager loading instead.
|
||||||
|
|
||||||
|
[[global-ordinals]]
|
||||||
|
.Global ordinals
|
||||||
|
*****************************************
|
||||||
|
|
||||||
|
Global ordinals is a data-structure on top of fielddata and doc values, that
|
||||||
|
maintains an incremental numbering for each unique term in a lexicographic
|
||||||
|
order. Each term has a unique number and the number of term 'A' is lower than
|
||||||
|
the number of term 'B'. Global ordinals are only supported on string fields.
|
||||||
|
|
||||||
|
Fielddata and doc values also have ordinals, which is a unique numbering for all terms
|
||||||
|
in a particular segment and field. Global ordinals just build on top of this,
|
||||||
|
by providing a mapping between the segment ordinals and the global ordinals,
|
||||||
|
the latter being unique across the entire shard.
|
||||||
|
|
||||||
|
Global ordinals are used for features that use segment ordinals, such as
|
||||||
|
sorting and the terms aggregation, to improve the execution time. A terms
|
||||||
|
aggregation relies purely on global ordinals to perform the aggregation at the
|
||||||
|
shard level, then converts global ordinals to the real term only for the final
|
||||||
|
reduce phase, which combines results from different shards.
|
||||||
|
|
||||||
|
Global ordinals for a specified field are tied to _all the segments of a
shard_, while fielddata and doc values ordinals are tied to a single segment.
For this reason global ordinals need to be entirely rebuilt
whenever a new segment becomes visible.
|
||||||
|
|
||||||
|
The loading time of global ordinals depends on the number of terms in a field, but in general
it is low, since the source fielddata has already been loaded. The memory overhead of global
ordinals is small because it is very efficiently compressed. Eager loading of global ordinals
can move the loading time from the first search request to the refresh itself.
|
||||||
|
|
||||||
|
*****************************************
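The relationship between segment ordinals and global ordinals can be sketched as follows. This is an illustrative model only: segments are plain sorted lists, and the names `seg_to_global` and `count_by_global_ordinal` are invented for the example.

```python
segments = [
    ["apple", "cherry"],           # segment 0: sorted unique terms
    ["banana", "cherry", "date"],  # segment 1
]

# Global ordinals: one number per unique term across the shard, in sorted order.
global_terms = sorted({term for seg in segments for term in seg})
global_ord = {term: i for i, term in enumerate(global_terms)}

# Per-segment mapping: segment ordinal (list index) -> global ordinal.
seg_to_global = [[global_ord[term] for term in seg] for seg in segments]


def count_by_global_ordinal(doc_ords_per_segment):
    """Count values per term using ordinals only, resolving terms at the end."""
    counts = [0] * len(global_terms)
    for seg_idx, doc_ords in enumerate(doc_ords_per_segment):
        for seg_ord in doc_ords:
            counts[seg_to_global[seg_idx][seg_ord]] += 1
    return {global_terms[g]: c for g, c in enumerate(counts) if c}
```

Adding a segment changes `global_terms`, so every entry in `seg_to_global` has to be recomputed, which is why global ordinals are rebuilt when a new segment becomes visible.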
|
||||||
|
|
||||||
|
[[field-data-filtering]]
|
||||||
|
==== `fielddata.filter`
|
||||||
|
|
||||||
|
Fielddata filtering can be used to reduce the number of terms loaded into
|
||||||
|
memory, and thus reduce memory usage. Terms can be filtered by _frequency_ or
|
||||||
|
by _regular expression_, or a combination of the two:
|
||||||
|
|
||||||
|
Filtering by frequency::
|
||||||
|
+
|
||||||
|
--
|
||||||
|
|
||||||
|
The frequency filter allows you to only load terms whose term frequency falls
|
||||||
|
between a `min` and `max` value, which can be expressed as an absolute
|
||||||
|
number (when the number is bigger than 1.0) or as a percentage
|
||||||
|
(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
|
||||||
|
*per segment*. Percentages are based on the number of docs which have a
|
||||||
|
value for the field, as opposed to all docs in the segment.
|
||||||
|
|
||||||
|
Small segments can be excluded completely by specifying the minimum
|
||||||
|
number of docs that the segment should contain with `min_segment_size`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"tag": {
|
||||||
|
"type": "string",
|
||||||
|
"fielddata": {
|
||||||
|
"filter": {
|
||||||
|
"frequency": {
|
||||||
|
"min": 0.001,
|
||||||
|
"max": 0.1,
|
||||||
|
"min_segment_size": 500
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
--
|
||||||
|
|
||||||
|
Filtering by regex::
|
||||||
|
+
|
||||||
|
--
|
||||||
|
Terms can also be filtered by regular expression - only values which
|
||||||
|
match the regular expression are loaded. Note: the regular expression is
|
||||||
|
applied to each term in the field, not to the whole field value. For
|
||||||
|
instance, to only load hashtags from a tweet, we can use a regular
|
||||||
|
expression which matches terms beginning with `#`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"tweet": {
|
||||||
|
"type": "string",
|
||||||
|
"analyzer": "whitespace",
|
||||||
|
"fielddata": {
|
||||||
|
"filter": {
|
||||||
|
"regex": {
|
||||||
|
"pattern": "^#.*"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
--
|
||||||
|
|
||||||
|
These filters can be updated on an existing field mapping and will take
|
||||||
|
effect the next time the fielddata for a segment is loaded. Use the
|
||||||
|
<<indices-clearcache,Clear Cache>> API
|
||||||
|
to reload the fielddata using the new filters.
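The two filters can be approximated in a short sketch for a single segment. Several things here are assumptions rather than documented behaviour: the helper `filter_terms` is hypothetical, frequency is taken to be the number of documents containing the term, the regular expression is applied with search semantics, and `min_segment_size` is left out.

```python
import re


def filter_terms(term_doc_counts, docs_with_value,
                 min_freq=0.0, max_freq=None, pattern=None):
    """Return the terms of one segment that pass the frequency and regex filters."""

    def threshold(value):
        # Values above 1.0 are absolute counts; others are a fraction of the
        # docs in the segment that have a value for the field.
        return value if value > 1.0 else value * docs_with_value

    low = threshold(min_freq)
    high = threshold(max_freq) if max_freq is not None else float("inf")
    regex = re.compile(pattern) if pattern else None
    return sorted(
        term for term, freq in term_doc_counts.items()
        if low <= freq <= high and (regex is None or regex.search(term))
    )
```

For a segment where 100 documents have a value, `min_freq=0.001` and `max_freq=0.1` keep terms that appear in at most 10 of them, which drops very common terms.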
|
|
@ -0,0 +1,281 @@
|
||||||
|
[[mapping-date-format]]
|
||||||
|
=== `format`
|
||||||
|
|
||||||
|
In JSON documents, dates are represented as strings. Elasticsearch uses a set
|
||||||
|
of preconfigured formats to recognize and parse these strings into a long
|
||||||
|
value representing _milliseconds-since-the-epoch_ in UTC.
|
||||||
|
|
||||||
|
Besides the <<built-in-date-formats,built-in formats>>, your own
|
||||||
|
<<custom-date-formats,custom formats>> can be specified using the familiar
|
||||||
|
`yyyy/MM/dd` syntax:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"date": {
|
||||||
|
"type": "date",
|
||||||
|
"format": "yyyy-MM-dd"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
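The conversion from a formatted date string to _milliseconds-since-the-epoch_ can be sketched like this. It is an analogue only: Python's `%Y-%m-%d` stands in for the Joda pattern `yyyy-MM-dd`, the helper `to_epoch_millis` is hypothetical, and the input is assumed to be in UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(text, fmt="%Y-%m-%d"):
    """Parse `text` with `fmt` and return milliseconds since the epoch (UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)
```

A string that does not match the format raises `ValueError`, which is loosely comparable to a document being rejected because its date cannot be parsed.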
|
||||||
|
|
||||||
|
Many APIs which support date values also support <<date-math,date math>>
|
||||||
|
expressions, such as `now-1M/d` -- the current time, minus one month, rounded
|
||||||
|
down to the nearest day.
|
||||||
|
|
||||||
|
[[custom-date-formats]]
|
||||||
|
==== Custom date formats
|
||||||
|
|
||||||
|
Completely customizable date formats are supported. The syntax for these is explained
|
||||||
|
http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[in the Joda docs].
|
||||||
|
|
||||||
|
[[built-in-date-formats]]
|
||||||
|
==== Built In Formats
|
||||||
|
|
||||||
|
Most of the formats below have a `strict` companion format, which means that
the year, month, and day parts of the date must have leading zeros in order
to be valid. For example, a date like `5/11/1` would not be valid; you would
need to specify the full date, which would be `2005/11/01` in this
example. So instead of `date_optional_time` you would need to specify
`strict_date_optional_time`.
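The difference between a lenient format and its `strict` companion can be illustrated with a digit-count check. This is a deliberately simplified sketch and not how the Joda-based parsers work: the patterns and the `accepts` helper are assumptions, and the slash separator simply follows the example in the text above.

```python
import re

# Lenient: short year, month, and day parts are accepted.
LENIENT = re.compile(r"\d{1,4}/\d{1,2}/\d{1,2}")
# Strict: four-digit year, two-digit month and day (leading zeros required).
STRICT = re.compile(r"\d{4}/\d{2}/\d{2}")


def accepts(value, strict):
    """Return True if `value` has an acceptable number of digits in each part."""
    pattern = STRICT if strict else LENIENT
    return pattern.fullmatch(value) is not None
```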
|
||||||
|
|
||||||
|
The following list shows all the default ISO formats supported:
|
||||||
|
|
||||||
|
`epoch_millis`::
|
||||||
|
|
||||||
|
A formatter for the number of milliseconds since the epoch. Note that
this timestamp allows a max length of 13 chars, so dates before 1653
or after 2286 are not supported. You should use a different date formatter in
that case.
|
||||||
|
|
||||||
|
`epoch_second`::
|
||||||
|
|
||||||
|
A formatter for the number of seconds since the epoch. Note that this
timestamp allows a max length of 10 chars, so dates before 1653 or
after 2286 are not supported. You should use a different date formatter in that
case.
|
||||||
|
|
||||||
|
[[strict-date-time]]`date_optional_time` or `strict_date_optional_time`::
|
||||||
|
|
||||||
|
A generic ISO datetime parser where the date is mandatory and the time is
|
||||||
|
optional.
|
||||||
|
http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[Full details here].
|
||||||
|
|
||||||
|
`basic_date`::
|
||||||
|
|
||||||
|
A basic formatter for a full date as four digit year, two digit month of
|
||||||
|
year, and two digit day of month: `yyyyMMdd`.
|
||||||
|
|
||||||
|
`basic_date_time`::
|
||||||
|
|
||||||
|
A basic formatter that combines a basic date and time, separated by a 'T':
|
||||||
|
`yyyyMMdd'T'HHmmss.SSSZ`.
|
||||||
|
|
||||||
|
`basic_date_time_no_millis`::
|
||||||
|
|
||||||
|
A basic formatter that combines a basic date and time without millis,
|
||||||
|
separated by a 'T': `yyyyMMdd'T'HHmmssZ`.
|
||||||
|
|
||||||
|
`basic_ordinal_date`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date, using a four digit year and three
|
||||||
|
digit dayOfYear: `yyyyDDD`.
|
||||||
|
|
||||||
|
`basic_ordinal_date_time`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date and time, using a four digit year and
|
||||||
|
three digit dayOfYear: `yyyyDDD'T'HHmmss.SSSZ`.
|
||||||
|
|
||||||
|
`basic_ordinal_date_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date and time without millis, using a four
|
||||||
|
digit year and three digit dayOfYear: `yyyyDDD'T'HHmmssZ`.
|
||||||
|
|
||||||
|
`basic_time`::
|
||||||
|
|
||||||
|
A basic formatter for a two digit hour of day, two digit minute of hour,
|
||||||
|
two digit second of minute, three digit millis, and time zone offset:
|
||||||
|
`HHmmss.SSSZ`.
|
||||||
|
|
||||||
|
`basic_time_no_millis`::
|
||||||
|
|
||||||
|
A basic formatter for a two digit hour of day, two digit minute of hour,
|
||||||
|
two digit second of minute, and time zone offset: `HHmmssZ`.
|
||||||
|
|
||||||
|
`basic_t_time`::
|
||||||
|
|
||||||
|
A basic formatter for a two digit hour of day, two digit minute of hour,
|
||||||
|
two digit second of minute, three digit millis, and time zone offset
|
||||||
|
prefixed by 'T': `'T'HHmmss.SSSZ`.
|
||||||
|
|
||||||
|
`basic_t_time_no_millis`::
|
||||||
|
|
||||||
|
A basic formatter for a two digit hour of day, two digit minute of hour,
|
||||||
|
two digit second of minute, and time zone offset prefixed by 'T':
|
||||||
|
`'T'HHmmssZ`.
|
||||||
|
|
||||||
|
`basic_week_date` or `strict_basic_week_date`::
|
||||||
|
|
||||||
|
A basic formatter for a full date as four digit weekyear, two digit week
|
||||||
|
of weekyear, and one digit day of week: `xxxx'W'wwe`.
|
||||||
|
|
||||||
|
`basic_week_date_time` or `strict_basic_week_date_time`::
|
||||||
|
|
||||||
|
A basic formatter that combines a basic weekyear date and time, separated
|
||||||
|
by a 'T': `xxxx'W'wwe'T'HHmmss.SSSZ`.
|
||||||
|
|
||||||
|
`basic_week_date_time_no_millis` or `strict_basic_week_date_time_no_millis`::
|
||||||
|
|
||||||
|
A basic formatter that combines a basic weekyear date and time without
|
||||||
|
millis, separated by a 'T': `xxxx'W'wwe'T'HHmmssZ`.
|
||||||
|
|
||||||
|
`date` or `strict_date`::
|
||||||
|
|
||||||
|
A formatter for a full date as four digit year, two digit month of year,
|
||||||
|
and two digit day of month: `yyyy-MM-dd`.
|
||||||
|
|
||||||
|
`date_hour` or `strict_date_hour`::
|
||||||
|
|
||||||
|
A formatter that combines a full date and two digit hour of day.
|
||||||
|
|
||||||
|
`date_hour_minute` or `strict_date_hour_minute`::
|
||||||
|
|
||||||
|
A formatter that combines a full date, two digit hour of day, and two
|
||||||
|
digit minute of hour.
|
||||||
|
|
||||||
|
`date_hour_minute_second` or `strict_date_hour_minute_second`::
|
||||||
|
|
||||||
|
A formatter that combines a full date, two digit hour of day, two digit
|
||||||
|
minute of hour, and two digit second of minute.
|
||||||
|
|
||||||
|
`date_hour_minute_second_fraction` or `strict_date_hour_minute_second_fraction`::
|
||||||
|
|
||||||
|
A formatter that combines a full date, two digit hour of day, two digit
|
||||||
|
minute of hour, two digit second of minute, and three digit fraction of
|
||||||
|
second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.
|
||||||
|
|
||||||
|
`date_hour_minute_second_millis` or `strict_date_hour_minute_second_millis`::
|
||||||
|
|
||||||
|
A formatter that combines a full date, two digit hour of day, two digit
|
||||||
|
minute of hour, two digit second of minute, and three digit fraction of
|
||||||
|
second: `yyyy-MM-dd'T'HH:mm:ss.SSS`.
|
||||||
|
|
||||||
|
`date_time` or `strict_date_time`::
|
||||||
|
|
||||||
|
A formatter that combines a full date and time, separated by a 'T':
`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`.
|
||||||
|
|
||||||
|
`date_time_no_millis` or `strict_date_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter that combines a full date and time without millis, separated
|
||||||
|
by a 'T': `yyyy-MM-dd'T'HH:mm:ssZZ`.
|
||||||
|
|
||||||
|
`hour` or `strict_hour`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day.
|
||||||
|
|
||||||
|
`hour_minute` or `strict_hour_minute`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day and two digit minute of hour.
|
||||||
|
|
||||||
|
`hour_minute_second` or `strict_hour_minute_second`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, and two
|
||||||
|
digit second of minute.
|
||||||
|
|
||||||
|
`hour_minute_second_fraction` or `strict_hour_minute_second_fraction`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.
|
||||||
|
|
||||||
|
`hour_minute_second_millis` or `strict_hour_minute_second_millis`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.
|
||||||
|
|
||||||
|
`ordinal_date` or `strict_ordinal_date`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date, using a four digit year and three
|
||||||
|
digit dayOfYear: `yyyy-DDD`.
|
||||||
|
|
||||||
|
`ordinal_date_time` or `strict_ordinal_date_time`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date and time, using a four digit year and
|
||||||
|
three digit dayOfYear: `yyyy-DDD'T'HH:mm:ss.SSSZZ`.
|
||||||
|
|
||||||
|
`ordinal_date_time_no_millis` or `strict_ordinal_date_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter for a full ordinal date and time without millis, using a four
|
||||||
|
digit year and three digit dayOfYear: `yyyy-DDD'T'HH:mm:ssZZ`.
|
||||||
|
|
||||||
|
`time` or `strict_time`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, three digit fraction of second, and time zone
|
||||||
|
offset: `HH:mm:ss.SSSZZ`.
|
||||||
|
|
||||||
|
`time_no_millis` or `strict_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, and time zone offset: `HH:mm:ssZZ`.
|
||||||
|
|
||||||
|
`t_time` or `strict_t_time`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, three digit fraction of second, and time zone
|
||||||
|
offset prefixed by 'T': `'T'HH:mm:ss.SSSZZ`.
|
||||||
|
|
||||||
|
`t_time_no_millis` or `strict_t_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter for a two digit hour of day, two digit minute of hour, two
|
||||||
|
digit second of minute, and time zone offset prefixed by 'T': `'T'HH:mm:ssZZ`.
|
||||||
|
|
||||||
|
`week_date` or `strict_week_date`::
|
||||||
|
|
||||||
|
A formatter for a full date as four digit weekyear, two digit week of
|
||||||
|
weekyear, and one digit day of week: `xxxx-'W'ww-e`.
|
||||||
|
|
||||||
|
`week_date_time` or `strict_week_date_time`::
|
||||||
|
|
||||||
|
A formatter that combines a full weekyear date and time, separated by a
|
||||||
|
'T': `xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ`.
|
||||||
|
|
||||||
|
`week_date_time_no_millis` or `strict_week_date_time_no_millis`::
|
||||||
|
|
||||||
|
A formatter that combines a full weekyear date and time without millis,
|
||||||
|
separated by a 'T': `xxxx-'W'ww-e'T'HH:mm:ssZZ`.
|
||||||
|
|
||||||
|
`weekyear` or `strict_weekyear`::
|
||||||
|
|
||||||
|
A formatter for a four digit weekyear.
|
||||||
|
|
||||||
|
`weekyear_week` or `strict_weekyear_week`::
|
||||||
|
|
||||||
|
A formatter for a four digit weekyear and two digit week of weekyear.
|
||||||
|
|
||||||
|
`weekyear_week_day` or `strict_weekyear_week_day`::
|
||||||
|
|
||||||
|
A formatter for a four digit weekyear, two digit week of weekyear, and one
|
||||||
|
digit day of week.
|
||||||
|
|
||||||
|
`year` or `strict_year`::
|
||||||
|
|
||||||
|
A formatter for a four digit year.
|
||||||
|
|
||||||
|
`year_month` or `strict_year_month`::
|
||||||
|
|
||||||
|
A formatter for a four digit year and two digit month of year.
|
||||||
|
|
||||||
|
`year_month_day` or `strict_year_month_day`::
|
||||||
|
|
||||||
|
A formatter for a four digit year, two digit month of year, and two digit
|
||||||
|
day of month.
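
To make the ordinal and week-based patterns above more concrete, here is a minimal Python sketch that produces strings of the same shape with the standard `datetime` module. It is only an illustration and not how Elasticsearch itself formats or parses dates; the helper names `ordinal_date` and `week_date` are hypothetical.

```python
from datetime import date


def ordinal_date(d: date) -> str:
    """Four digit year and three digit day of year, shaped like `yyyy-DDD`."""
    return f"{d.year:04d}-{d.timetuple().tm_yday:03d}"


def week_date(d: date) -> str:
    """Four digit weekyear, two digit week, one digit day, shaped like `xxxx-'W'ww-e`."""
    weekyear, week, weekday = d.isocalendar()
    return f"{weekyear:04d}-W{week:02d}-{weekday}"


sample = date(2015, 8, 12)
print(ordinal_date(sample))  # 2015-224
print(week_date(sample))     # 2015-W33-3
```

Note that the weekyear can differ from the calendar year for dates close to 1 January, which is why the week-based formats use `xxxx` rather than `yyyy`.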
|
||||||
|
|
|
@ -0,0 +1,60 @@
|
||||||
|
[[geohash-precision]]
|
||||||
|
=== `geohash_precision`
|
||||||
|
|
||||||
|
Geohashes are a form of lat/lon encoding which divides the earth up into
|
||||||
|
a grid. Each cell in this grid is represented by a geohash string. Each
|
||||||
|
cell in turn can be further subdivided into smaller cells which are
|
||||||
|
represented by a longer string. So the longer the geohash, the smaller
|
||||||
|
(and thus more accurate) the cell is.
|
||||||
|
|
||||||
|
The `geohash_precision` setting controls the length of the geohash that is
|
||||||
|
indexed when the <<geohash,`geohash`>> option is enabled, and the maximum
|
||||||
|
geohash length when the <<geohash-prefix,`geohash_prefix`>> option is enabled.
|
||||||
|
|
||||||
|
It accepts:
|
||||||
|
|
||||||
|
* a number between 1 and 12, which represents the length of the geohash. The default is 12.
|
||||||
|
* a <<distance-units,distance>>, e.g. `1km`.
|
||||||
|
|
||||||
|
If a distance is specified, it will be translated to the smallest
|
||||||
|
geohash-length that will provide the requested resolution.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "geo_point",
|
||||||
|
"geohash_prefix": true,
|
||||||
|
"geohash_precision": 6 <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"location": {
|
||||||
|
"lat": 41.12,
|
||||||
|
"lon": -71.34
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search?fielddata_fields=location.geohash
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"term": {
|
||||||
|
"location.geohash": "drm3bt"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> A `geohash_precision` of 6 equates to geohash cells of approximately 1.26km x 0.6km
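
The relationship between geohash length and cell size can be sketched in a few lines of Python. This is the standard geohash bit-interleaving algorithm written out for illustration, not the code Elasticsearch runs; `geohash_encode` and `cell_size_degrees` are hypothetical helper names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, length: int) -> str:
    """Encode a point by repeatedly halving the lon/lat ranges."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between lon and lat, starting with lon
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(length: int) -> tuple:
    """Width (lon) and height (lat) of a cell, in degrees."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


print(geohash_encode(41.12, -71.34, 6))  # drm3bt
print(cell_size_degrees(6))
print(cell_size_degrees(7))
```

Each extra character adds five bits, so every additional character shrinks the cell by a factor of 32 in area.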
|
|
@ -0,0 +1,64 @@
|
||||||
|
[[geohash-prefix]]
|
||||||
|
=== `geohash_prefix`
|
||||||
|
|
||||||
|
Geohashes are a form of lat/lon encoding which divides the earth up into
|
||||||
|
a grid. Each cell in this grid is represented by a geohash string. Each
|
||||||
|
cell in turn can be further subdivided into smaller cells which are
|
||||||
|
represented by a longer string. So the longer the geohash, the smaller
|
||||||
|
(and thus more accurate) the cell is.
|
||||||
|
|
||||||
|
While the <<geohash,`geohash`>> option enables indexing the geohash that
|
||||||
|
corresponds to the lat/lon point, at the specified
|
||||||
|
<<geohash-precision,precision>>, the `geohash_prefix` option will also
|
||||||
|
index all the enclosing cells.
|
||||||
|
|
||||||
|
For instance, a geohash of `drm3btev3e86` will index all of the following
|
||||||
|
terms: [ `d`, `dr`, `drm`, `drm3`, `drm3b`, `drm3bt`, `drm3bte`, `drm3btev`,
|
||||||
|
`drm3btev3`, `drm3btev3e`, `drm3btev3e8`, `drm3btev3e86` ].
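
The list of terms above is simply every leading prefix of the full geohash. A minimal Python sketch of that expansion (the function name `geohash_prefixes` is hypothetical):

```python
def geohash_prefixes(geohash: str) -> list:
    """Return every prefix of the geohash, shortest first."""
    return [geohash[:i] for i in range(1, len(geohash) + 1)]


print(geohash_prefixes("drm3btev3e86"))
```

Indexing one term per prefix costs more space at index time, but lets a lookup for any enclosing cell become an exact term match.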
|
||||||
|
|
||||||
|
The geohash prefixes can be used with the
|
||||||
|
<<query-dsl-geohash-cell-query,`geohash_cell` query>> to find points within a
|
||||||
|
particular geohash, or its neighbours:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "geo_point",
|
||||||
|
"geohash_prefix": true,
|
||||||
|
"geohash_precision": 6
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"location": {
|
||||||
|
"lat": 41.12,
|
||||||
|
"lon": -71.34
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search?fielddata_fields=location.geohash
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"geohash_cell": {
|
||||||
|
"location": {
|
||||||
|
"lat": 41.02,
|
||||||
|
"lon": -71.48
|
||||||
|
},
|
||||||
|
"precision": 4, <1>
|
||||||
|
"neighbors": true <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
|
@ -0,0 +1,70 @@
|
||||||
|
[[geohash]]
|
||||||
|
=== `geohash`
|
||||||
|
|
||||||
|
Geohashes are a form of lat/lon encoding which divides the earth up into
|
||||||
|
a grid. Each cell in this grid is represented by a geohash string. Each
|
||||||
|
cell in turn can be further subdivided into smaller cells which are
|
||||||
|
represented by a longer string. So the longer the geohash, the smaller
|
||||||
|
(and thus more accurate) the cell is.
|
||||||
|
|
||||||
|
Because geohashes are just strings, they can be stored in an inverted
|
||||||
|
index like any other string, which makes querying them very efficient.
|
||||||
|
|
||||||
|
If you enable the `geohash` option, a `geohash` ``sub-field'' will be indexed
|
||||||
|
as, e.g., `location.geohash`. The length of the geohash is controlled by the
|
||||||
|
<<geohash-precision,`geohash_precision`>> parameter.
|
||||||
|
|
||||||
|
If the <<geohash-prefix,`geohash_prefix`>> option is enabled, the `geohash`
|
||||||
|
option will be enabled automatically.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "geo_point", <1>
|
||||||
|
"geohash": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"location": {
|
||||||
|
"lat": 41.12,
|
||||||
|
"lon": -71.34
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search?fielddata_fields=location.geohash <2>
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"prefix": {
|
||||||
|
"location.geohash": "drm3b" <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> A `location.geohash` field will be indexed for each geo-point.
|
||||||
|
<2> The geohash can be retrieved with <<doc-values,`doc_values`>>.
|
||||||
|
<3> A <<query-dsl-prefix-query,`prefix`>> query can find all geohashes which start with a particular prefix.
|
||||||
|
|
||||||
|
[WARNING]
|
||||||
|
============================================
|
||||||
|
|
||||||
|
A `prefix` query on geohashes is expensive. Instead, consider using the
|
||||||
|
<<geohash-prefix,`geohash_prefix`>> option to pay the expense once at index time
|
||||||
|
instead of on every query.
|
||||||
|
|
||||||
|
============================================
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,61 @@
|
||||||
|
[[ignore-above]]
|
||||||
|
=== `ignore_above`
|
||||||
|
|
||||||
|
Strings longer than the `ignore_above` setting will not be processed by the
|
||||||
|
<<analyzer,analyzer>> and will not be indexed. This is mainly useful for
|
||||||
|
<<mapping-index,`not_analyzed`>> string fields, which are typically used for
|
||||||
|
filtering, aggregations, and sorting. These are structured fields and it
|
||||||
|
doesn't usually make sense to allow very long terms to be indexed in these
|
||||||
|
fields.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"message": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed",
|
||||||
|
"ignore_above": 20 <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1 <2>
|
||||||
|
{
|
||||||
|
"message": "Syntax error"
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2 <3>
|
||||||
|
{
|
||||||
|
"message": "Syntax error with some long stacktrace"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET _search <4>
|
||||||
|
{
|
||||||
|
"aggs": {
|
||||||
|
"messages": {
|
||||||
|
"terms": {
|
||||||
|
"field": "message"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> This field will ignore any string longer than 20 characters.
|
||||||
|
<2> This document is indexed successfully.
|
||||||
|
<3> This document will be indexed, but without indexing the `message` field.
|
||||||
|
<4> Search returns both documents, but only the first is present in the terms aggregation.
|
||||||
|
|
||||||
|
This option is also useful for protecting against Lucene's term byte-length
|
||||||
|
limit of `32766`.
|
||||||
|
|
||||||
|
NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
|
||||||
|
bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
|
||||||
|
set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
4 bytes.
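
The difference between counting characters and counting bytes can be seen with a small Python sketch. It is only an illustration: `is_ignored` and `utf8_bytes` are hypothetical helpers, and Python's `len` counts code points, which is assumed here to be a close enough stand-in for how `ignore_above` counts characters.

```python
def is_ignored(value: str, ignore_above: int) -> bool:
    """True when the string is longer than the limit, counted in characters."""
    return len(value) > ignore_above


def utf8_bytes(value: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


for text in ["Syntax error", "Syntax error with some long stacktrace", "日本語"]:
    print(repr(text), len(text), utf8_bytes(text), is_ignored(text, 20))
```

The last string is three characters long but occupies nine bytes, which is why a character-based limit has to be set well below the byte-based one for non-ASCII text.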
|
|
@ -0,0 +1,83 @@
|
||||||
|
[[ignore-malformed]]
|
||||||
|
=== `ignore_malformed`
|
||||||
|
|
||||||
|
Sometimes you don't have much control over the data that you receive. One
|
||||||
|
user may send a `login` field that is a <<date,`date`>>, and another sends a
|
||||||
|
`login` field that is an email address.
|
||||||
|
|
||||||
|
Trying to index the wrong datatype into a field throws an exception by
|
||||||
|
default, and rejects the whole document. The `ignore_malformed` parameter, if
|
||||||
|
set to `true`, allows the exception to be ignored. The malformed field is not
|
||||||
|
indexed, but other fields in the document are processed normally.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"number_one": {
|
||||||
|
"type": "integer"
|
||||||
|
},
|
||||||
|
"number_two": {
|
||||||
|
"type": "integer",
|
||||||
|
"ignore_malformed": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"text": "Some text value",
|
||||||
|
"number_one": "foo" <1>
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{
|
||||||
|
"text": "Some text value",
|
||||||
|
"number_two": "foo" <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> This document will be rejected because `number_one` does not allow malformed values.
|
||||||
|
<2> This document will have the `text` field indexed, but not the `number_two` field.
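
The behaviour described by the two callouts can be modelled in Python. This is a simplified simulation under stated assumptions, not Elasticsearch code: `index_document` and `MAPPING` are hypothetical names, and only `integer` fields are checked.

```python
MAPPING = {
    "number_one": {"type": "integer"},
    "number_two": {"type": "integer", "ignore_malformed": True},
}


def index_document(doc: dict, mapping: dict) -> dict:
    """Return the fields that would be indexed, or raise to reject the document."""
    indexed = {}
    for field, value in doc.items():
        spec = mapping.get(field)
        if spec is None or spec["type"] != "integer":
            indexed[field] = value  # unmapped or non-integer fields pass through
            continue
        try:
            indexed[field] = int(value)
        except (TypeError, ValueError):
            if spec.get("ignore_malformed", False):
                continue  # drop only this field, keep the rest of the document
            raise ValueError(f"malformed value for field {field!r}: {value!r}")
    return indexed


try:
    index_document({"text": "Some text value", "number_one": "foo"}, MAPPING)
except ValueError as error:
    print("rejected:", error)

print(index_document({"text": "Some text value", "number_two": "foo"}, MAPPING))
```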
|
||||||
|
|
||||||
|
|
||||||
|
[[ignore-malformed-setting]]
|
||||||
|
==== Index-level default
|
||||||
|
|
||||||
|
The `index.mapping.ignore_malformed` setting can be set at the index level to
ignore malformed content globally across all mapping types.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
"settings": {
"index.mapping.ignore_malformed": true
|
||||||
|
},
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"number_one": { <1>
|
||||||
|
"type": "byte"
|
||||||
|
},
|
||||||
|
"number_two": {
|
||||||
|
"type": "integer",
|
||||||
|
"ignore_malformed": false <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> The `number_one` field inherits the index-level setting.
|
||||||
|
<2> The `number_two` field overrides the index-level setting to turn off `ignore_malformed`.
|
||||||
|
|
|
@ -0,0 +1,83 @@
|
||||||
|
[[include-in-all]]
|
||||||
|
=== `include_in_all`
|
||||||
|
|
||||||
|
The `include_in_all` parameter provides per-field control over which fields
|
||||||
|
are included in the <<mapping-all-field,`_all`>> field. It defaults to `true`, unless <<mapping-index,`index`>> is set to `no`.
|
||||||
|
|
||||||
|
This example demonstrates how to exclude the `date` field from the `_all` field:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"title": { <1>
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"content": { <1>
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"date": { <2>
|
||||||
|
"type": "date",
|
||||||
|
"include_in_all": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> The `title` and `content` fields will be included in the `_all` field.
|
||||||
|
<2> The `date` field will not be included in the `_all` field.
|
||||||
|
|
||||||
|
The `include_in_all` parameter can also be set at the type level and on
|
||||||
|
<<object,`object`>> or <<nested,`nested`>> fields, in which case all
sub-fields inherit that setting. For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"include_in_all": false, <1>
|
||||||
|
"properties": {
|
||||||
|
"title": { "type": "string" },
|
||||||
|
"author": {
|
||||||
|
"include_in_all": true, <2>
|
||||||
|
"properties": {
|
||||||
|
"first_name": { "type": "string" },
|
||||||
|
"last_name": { "type": "string" }
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"editor": {
|
||||||
|
"properties": {
|
||||||
|
"first_name": { "type": "string" }, <3>
|
||||||
|
"last_name": { "type": "string", "include_in_all": true } <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> All fields in `my_type` are excluded from `_all`.
|
||||||
|
<2> The `author.first_name` and `author.last_name` fields are included in `_all`.
|
||||||
|
<3> Only the `editor.last_name` field is included in `_all`.
|
||||||
|
The `editor.first_name` inherits the type-level setting and is excluded.
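
The inheritance rules in the callouts above can be expressed as a short recursive walk over the mapping. The Python sketch below is a model of the described behaviour, not the actual implementation; `effective_include_in_all` is a hypothetical name, and the special case for fields with `index` set to `no` is deliberately left out.

```python
def effective_include_in_all(node: dict, inherited: bool = True, path: str = "") -> dict:
    """Resolve include_in_all for every leaf field, inheriting from parents."""
    setting = node.get("include_in_all", inherited)
    result = {}
    for name, child in node.get("properties", {}).items():
        child_path = f"{path}.{name}" if path else name
        if "properties" in child:
            # Object field: its own setting (if any) is passed down to sub-fields.
            result.update(effective_include_in_all(child, setting, child_path))
        else:
            result[child_path] = child.get("include_in_all", setting)
    return result


my_type = {
    "include_in_all": False,
    "properties": {
        "title": {"type": "string"},
        "author": {
            "include_in_all": True,
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string"},
            },
        },
        "editor": {
            "properties": {
                "first_name": {"type": "string"},
                "last_name": {"type": "string", "include_in_all": True},
            },
        },
    },
}

for field, included in effective_include_in_all(my_type).items():
    print(field, included)
```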
|
||||||
|
|
||||||
|
[NOTE]
|
||||||
|
.Multi-fields and `include_in_all`
|
||||||
|
=================================
|
||||||
|
|
||||||
|
The original field value is added to the `_all` field, not the terms produced
|
||||||
|
by a field's analyzer. For this reason, it makes no sense to set
|
||||||
|
`include_in_all` to `true` on <<multi-fields,multi-fields>>, as each
|
||||||
|
multi-field has exactly the same value as its parent.
|
||||||
|
|
||||||
|
=================================
|
|
@ -0,0 +1,70 @@
|
||||||
|
[[index-options]]
|
||||||
|
=== `index_options`
|
||||||
|
|
||||||
|
The `index_options` parameter controls what information is added to the
|
||||||
|
inverted index, for search and highlighting purposes. It accepts the
|
||||||
|
following settings:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`docs`::
|
||||||
|
|
||||||
|
Only the doc number is indexed. Can answer the question _Does this term
|
||||||
|
exist in this field?_
|
||||||
|
|
||||||
|
`freqs`::
|
||||||
|
|
||||||
|
Doc number and term frequencies are indexed. Term frequencies are used to
|
||||||
|
score repeated terms higher than single terms.
|
||||||
|
|
||||||
|
`positions`::
|
||||||
|
|
||||||
|
Doc number, term frequencies, and term positions (or order) are indexed.
|
||||||
|
Positions can be used for
|
||||||
|
<<query-dsl-match-query-phrase,proximity or phrase queries>>.
|
||||||
|
|
||||||
|
`offsets`::
|
||||||
|
|
||||||
|
Doc number, term frequencies, positions, and start and end character
|
||||||
|
offsets (which map the term back to the original string) are indexed.
|
||||||
|
Offsets are used by the <<postings-highlighter,postings highlighter>>.
|
||||||
|
|
||||||
|
<<mapping-index,Analyzed>> string fields use `positions` as the default, and
|
||||||
|
all other fields use `docs` as the default.
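
To illustrate what each setting adds, the following Python sketch builds a toy postings structure for a single document. It is a conceptual model only: the `postings` function is hypothetical, the tokenizer is a simple regular expression, and Lucene's real on-disk format is very different.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def postings(text: str, index_options: str) -> dict:
    """Map each term to the information kept at the given index_options level."""
    level = LEVELS.index(index_options)
    terms = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        entry = terms.setdefault(
            match.group().lower(), {"freq": 0, "positions": [], "offsets": []}
        )
        entry["freq"] += 1
        entry["positions"].append(position)
        entry["offsets"].append((match.start(), match.end()))
    # "docs" keeps only the fact that the term occurs; each level adds one item.
    keep = ["freq", "positions", "offsets"][:level]
    return {term: {key: entry[key] for key in keep} for term, entry in terms.items()}


for option in LEVELS:
    print(option, postings("Quick brown fox", option)["brown"])
```

With `offsets`, the entry for `brown` carries the character span `(6, 11)`, which is the information a highlighter needs to mark the term in the original string.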
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"text": {
|
||||||
|
"type": "string",
|
||||||
|
"index_options": "offsets"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"text": "Quick brown fox"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"text": "brown fox"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"highlight": {
|
||||||
|
"fields": {
|
||||||
|
"text": {} <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `text` field will use the postings highlighter by default because `offsets` are indexed.
|
|
@ -0,0 +1,48 @@
|
||||||
|
[[mapping-index]]
|
||||||
|
=== `index`
|
||||||
|
|
||||||
|
The `index` option controls how field values are indexed and, thus, how they
|
||||||
|
are searchable. It accepts three values:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`no`::
|
||||||
|
|
||||||
|
Do not add this field value to the index. With this setting, the field
|
||||||
|
will not be queryable.
|
||||||
|
|
||||||
|
`not_analyzed`::
|
||||||
|
|
||||||
|
Add the field value to the index unchanged, as a single term. This is the
|
||||||
|
default for all fields that support this option except for
|
||||||
|
<<string,`string`>> fields. `not_analyzed` fields are usually used with
|
||||||
|
<<term-level-queries,term-level queries>> for structured search.
|
||||||
|
|
||||||
|
`analyzed`::
|
||||||
|
|
||||||
|
This option applies only to `string` fields, for which it is the default.
|
||||||
|
The string field value is first <<analysis,analyzed>> to convert the
|
||||||
|
string into terms (e.g. a list of individual words), which are then
|
||||||
|
indexed. At search time, the query string is passed through
|
||||||
|
(<<search-analyzer,usually>>) the same analyzer to generate terms
|
||||||
|
in the same format as those in the index. It is this process that enables
|
||||||
|
<<full-text-queries,full text search>>.
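
The practical difference between the three values is the set of terms that ends up in the index. A rough Python sketch, assuming a very crude lowercase-and-split tokenizer in place of a real analyzer (`index_terms` is a hypothetical helper):

```python
import re


def index_terms(value: str, index: str) -> list:
    """Return the terms that would be indexed for a string value."""
    if index == "no":
        return []  # nothing is indexed, so the field is not queryable
    if index == "not_analyzed":
        return [value]  # the whole value becomes a single term
    if index == "analyzed":
        return [word.lower() for word in re.findall(r"\w+", value)]
    raise ValueError(f"unknown index option: {index}")


for option in ("no", "not_analyzed", "analyzed"):
    print(option, index_terms("New York", option))
```

This is why a term-level lookup for `New York` only finds a `not_analyzed` field, while a full text search for `york` only finds an `analyzed` one.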
|
||||||
|
|
||||||
|
For example, you can create a `not_analyzed` string field with the following:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"status_code": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
|
@ -0,0 +1,63 @@
|
||||||
|
[[lat-lon]]
|
||||||
|
=== `lat_lon`
|
||||||
|
|
||||||
|
<<geo-queries,Geo-queries>> are usually performed by plugging the value of
|
||||||
|
each <<geo-point,`geo_point`>> field into a formula to determine whether it
|
||||||
|
falls into the required area or not. Unlike most queries, the inverted index
|
||||||
|
is not involved.
|
||||||
|
|
||||||
|
Setting `lat_lon` to `true` causes the latitude and longitude values to be
|
||||||
|
indexed as numeric fields (called `.lat` and `.lon`). These fields can be used
|
||||||
|
by the <<query-dsl-geo-bounding-box-query,`geo_bounding_box`>> and
|
||||||
|
<<query-dsl-geo-distance-query,`geo_distance`>> queries instead of
|
||||||
|
performing in-memory calculations.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "geo_point",
|
||||||
|
"lat_lon": true <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"location": {
|
||||||
|
"lat": 41.12,
|
||||||
|
"lon": -71.34
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"geo_distance": {
|
||||||
|
"location": {
|
||||||
|
"lat": 41,
|
||||||
|
"lon": -71
|
||||||
|
},
|
||||||
|
"distance": "50km",
|
||||||
|
"optimize_bbox": "indexed" <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Setting `lat_lon` to true indexes the geo-point in the `location.lat` and `location.lon` fields.
|
||||||
|
<2> The `indexed` option tells the geo-distance query to use the inverted index instead of the in-memory calculation.
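
The idea behind a bounding-box optimisation can be sketched in Python: a cheap range check on latitude and longitude rejects most points, and the exact distance formula is only evaluated for the survivors. This is a rough model under stated assumptions, not the query's real implementation; all function names are hypothetical, the earth is treated as a sphere, and the box is an approximation that ignores the poles and the date line.

```python
import math

EARTH_RADIUS_KM = 6371.0


def bounding_box(lat: float, lon: float, distance_km: float) -> tuple:
    """Approximate lat/lon box around a point (min_lat, max_lat, min_lon, max_lon)."""
    dlat = math.degrees(distance_km / EARTH_RADIUS_KM)
    dlon = math.degrees(distance_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(point: tuple, origin: tuple, distance_km: float) -> bool:
    lat, lon = point
    min_lat, max_lat, min_lon, max_lon = bounding_box(origin[0], origin[1], distance_km)
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return False  # the cheap range check rejects the point
    return haversine_km(lat, lon, origin[0], origin[1]) <= distance_km


print(within_distance((41.12, -71.34), (41.0, -71.0), 50))
print(within_distance((45.0, -71.0), (41.0, -71.0), 50))
```

When the latitude and longitude are indexed as numeric fields, the range check in the first step can be answered from the index instead of being computed for every document.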
|
||||||
|
|
||||||
|
Whether the in-memory or indexed operation performs better depends both on
|
||||||
|
your dataset and on the types of queries that you are running.
|
||||||
|
|
||||||
|
NOTE: The `lat_lon` option only makes sense for single-value `geo_point`
|
||||||
|
fields. It will not work with arrays of geo-points.
|
||||||
|
|
|
@ -0,0 +1,132 @@
|
||||||
|
[[multi-fields]]
|
||||||
|
=== `fields`
|
||||||
|
|
||||||
|
It is often useful to index the same field in different ways for different
|
||||||
|
purposes. This is the purpose of _multi-fields_. For instance, a `string`
|
||||||
|
field could be <<mapping-index,indexed>> as an `analyzed` field for full-text
|
||||||
|
search, and as a `not_analyzed` field for sorting or aggregations:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"city": {
|
||||||
|
"type": "string",
|
||||||
|
"fields": {
|
||||||
|
"raw": { <1>
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT /my_index/my_type/1
|
||||||
|
{
|
||||||
|
"city": "New York"
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT /my_index/my_type/2
|
||||||
|
{
|
||||||
|
"city": "York"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET /my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"city": "york" <2>
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"sort": {
|
||||||
|
"city.raw": "asc" <3>
|
||||||
|
},
|
||||||
|
"aggs": {
|
||||||
|
"Cities": {
|
||||||
|
"terms": {
|
||||||
|
"field": "city.raw" <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `city.raw` field is a `not_analyzed` version of the `city` field.
|
||||||
|
<2> The analyzed `city` field can be used for full text search.
|
||||||
|
<3> The `city.raw` field can be used for sorting and aggregations.
|
||||||
|
|
||||||
|
NOTE: Multi-fields do not change the original `_source` field.
|
||||||
|
|
||||||
|
==== Multi-fields with multiple analyzers
|
||||||
|
|
||||||
|
Another use case of multi-fields is to analyze the same field in different
|
||||||
|
ways for better relevance. For instance we could index a field with the
|
||||||
|
<<analysis-standard-analyzer,`standard` analyzer>> which breaks text up into
|
||||||
|
words, and again with the <<english-analyzer,`english` analyzer>>
|
||||||
|
which stems words into their root form:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { <1>
          "type": "string",
          "fields": {
            "english": { <2>
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{ "text": "quick brown fox" } <3>
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{ "text": "quick brown foxes" } <3>
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"multi_match": {
|
||||||
|
"query": "quick brown foxes",
|
||||||
|
"fields": [ <4>
|
||||||
|
"text",
|
||||||
|
"text.english"
|
||||||
|
],
|
||||||
|
"type": "most_fields" <4>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> The `text` field uses the `standard` analyzer.
|
||||||
|
<2> The `text.english` field uses the `english` analyzer.
|
||||||
|
<3> Index two documents, one with `fox` and the other with `foxes`.
|
||||||
|
<4> Query both the `text` and `text.english` fields and combine the scores.
|
||||||
|
|
||||||
|
The `text` field contains the term `fox` in the first document and `foxes` in
|
||||||
|
the second document. The `text.english` field contains `fox` for both
|
||||||
|
documents, because `foxes` is stemmed to `fox`.
|
||||||
|
|
||||||
|
The query string is also analyzed by the `standard` analyzer for the `text`
|
||||||
|
field, and by the `english` analyzer for the `text.english` field. The
|
||||||
|
stemmed field allows a query for `foxes` to also match the document containing
|
||||||
|
just `fox`. This allows us to match as many documents as possible. By also
|
||||||
|
querying the unstemmed `text` field, we improve the relevance score of the
|
||||||
|
document which matches `foxes` exactly.
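
The effect of combining a stemmed and an unstemmed field can be modelled with a toy scorer in Python. Everything here is a simplification: the scoring just counts matching query terms per field and adds the counts, and `toy_english_terms` only strips a trailing `es` or `s`, which is a stand-in for real stemming and not the `english` analyzer.

```python
def standard_terms(text: str) -> list:
    """Crude stand-in for the standard analyzer: lowercase and split on spaces."""
    return text.lower().split()


def toy_english_terms(text: str) -> list:
    """Toy stemmer: strip a trailing "es" or "s" from each term."""
    stemmed = []
    for term in standard_terms(text):
        if term.endswith("es"):
            term = term[:-2]
        elif term.endswith("s"):
            term = term[:-1]
        stemmed.append(term)
    return stemmed


def most_fields_score(doc_text: str, query_text: str) -> int:
    """Sum the number of matching query terms across both fields."""
    score = 0
    for analyze in (standard_terms, toy_english_terms):
        doc_terms = set(analyze(doc_text))
        score += sum(1 for term in analyze(query_text) if term in doc_terms)
    return score


query = "quick brown foxes"
print(most_fields_score("quick brown fox", query))    # 5
print(most_fields_score("quick brown foxes", query))  # 6
```

Both documents match through the stemmed field, but the one containing `foxes` also matches all three terms in the unstemmed field and therefore ranks higher.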
|
||||||
|
|
|
@ -0,0 +1,64 @@
|
||||||
|
[[norms]]
|
||||||
|
=== `norms`
|
||||||
|
|
||||||
|
Norms store various normalization factors -- a number to represent the
|
||||||
|
relative field length and the <<index-boost,index time `boost`>> setting --
|
||||||
|
that are later used at query time in order to compute the score of a document
|
||||||
|
relative to a query.
|
||||||
|
|
||||||
|
Although useful for scoring, norms also require quite a lot of memory
|
||||||
|
(typically in the order of one byte per document per field in your index, even
|
||||||
|
for documents that don't have this specific field). As a consequence, if you
|
||||||
|
don't need scoring on a specific field, you should disable norms on that
|
||||||
|
field. In particular, this is the case for fields that are used solely for
|
||||||
|
filtering or aggregations.
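
As a back-of-the-envelope check, the memory cost can be estimated from the figure of roughly one byte per document per field quoted above. The Python sketch below is plain arithmetic; `norms_memory_bytes` is a hypothetical helper and the document and field counts are made-up example values.

```python
def norms_memory_bytes(doc_count: int, fields_with_norms: int, bytes_per_norm: int = 1) -> int:
    """Rough estimate: one norm value per document for every field with norms."""
    return doc_count * fields_with_norms * bytes_per_norm


estimate = norms_memory_bytes(doc_count=100_000_000, fields_with_norms=10)
print(estimate, "bytes, or about", round(estimate / 1024 ** 3, 2), "GiB")
```

Because the cost applies even to documents that lack the field, it grows with the total number of documents in the index rather than with how often the field is actually used.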
|
||||||
|
|
||||||
|
Norms can be disabled (but not reenabled) after the fact, using the
|
||||||
|
<<indices-put-mapping,PUT mapping API>> like so:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
------------
|
||||||
|
PUT my_index/_mapping/my_type
|
||||||
|
{
|
||||||
|
"properties": {
|
||||||
|
"title": {
|
||||||
|
"type": "string",
|
||||||
|
"norms": {
|
||||||
|
"enabled": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
NOTE: Norms will not be removed instantly, but will be removed as old segments
|
||||||
|
are merged into new segments as you continue indexing new documents. Any score
|
||||||
|
computation on a field that has had norms removed might return inconsistent
|
||||||
|
results since some documents won't have norms anymore while other documents
|
||||||
|
might still have norms.
|
||||||
|
|
||||||
|
==== Lazy loading of norms
|
||||||
|
|
||||||
|
Norms can be loaded into memory eagerly (`eager`), whenever a new segment
|
||||||
|
comes online, or they can be loaded lazily (`lazy`, default), only when the field
|
||||||
|
is queried.
|
||||||
|
|
||||||
|
Eager loading can be configured as follows:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
------------
|
||||||
|
PUT my_index/_mapping/my_type
|
||||||
|
{
|
||||||
|
"properties": {
|
||||||
|
"title": {
|
||||||
|
"type": "string",
|
||||||
|
"norms": {
|
||||||
|
"loading": "eager"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
|
@ -0,0 +1,58 @@
|
||||||
|
[[null-value]]
|
||||||
|
=== `null_value`
|
||||||
|
|
||||||
|
A `null` value cannot be indexed or searched. When a field is set to `null`
(or an empty array or an array of `null` values), it is treated as though that
|
||||||
|
field has no values.
|
||||||
|
|
||||||
|
The `null_value` parameter allows you to replace explicit `null` values with
|
||||||
|
the specified value so that it can be indexed and searched. For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"status_code": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed",
|
||||||
|
"null_value": "NULL" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"status_code": null
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{
|
||||||
|
"status_code": [] <2>
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"term": {
|
||||||
|
"status_code": "NULL" <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Replace explicit `null` values with the term `NULL`.
|
||||||
|
<2> An empty array does not contain an explicit `null`, and so won't be replaced with the `null_value`.
|
||||||
|
<3> A query for `NULL` returns document 1, but not document 2.
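
The substitution rule can be modelled in a few lines of Python. This is a sketch of the described behaviour rather than the real implementation; `indexed_values` is a hypothetical name and Python's `None` stands in for a JSON `null`.

```python
def indexed_values(value, null_value=None) -> list:
    """Return the values that would be indexed for a field."""
    values = value if isinstance(value, list) else [value]
    result = []
    for item in values:
        if item is None:
            if null_value is not None:
                result.append(null_value)  # explicit null is replaced
            # without a null_value, an explicit null contributes nothing
        else:
            result.append(item)
    return result


print(indexed_values(None, "NULL"))         # ['NULL']
print(indexed_values([], "NULL"))           # []
print(indexed_values(["a", None], "NULL"))  # ['a', 'NULL']
```

An empty array has no explicit `null` to replace, so it still produces no values at all.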
|
||||||
|
|
||||||
|
IMPORTANT: The `null_value` needs to be the same datatype as the field. For
|
||||||
|
instance, a `long` field cannot have a string `null_value`. String fields
|
||||||
|
which are `analyzed` will also pass the `null_value` through the configured
|
||||||
|
analyzer.
|
||||||
|
|
||||||
|
Also see the <<query-dsl-missing-query,`missing` query>> for its `null_value` support.
|
||||||
|
|
|
@ -0,0 +1,68 @@
|
||||||
|
[[position-offset-gap]]
|
||||||
|
=== `position_offset_gap`
|
||||||
|
|
||||||
|
<<mapping-index,Analyzed>> string fields take term <<index-options,positions>>
|
||||||
|
into account, in order to be able to support
|
||||||
|
<<query-dsl-match-query-phrase,proximity or phrase queries>>.
|
||||||
|
When indexing an array of strings, each string of the array is indexed
|
||||||
|
directly after the previous one, almost as though all the strings in the array
|
||||||
|
had been concatenated into one big string.
|
||||||
|
|
||||||
|
This can result in matches from phrase queries spanning two array elements.
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index/groups/1
|
||||||
|
{
|
||||||
|
"names": [ "John Abraham", "Lincoln Smith"]
|
||||||
|
}
|
||||||
|
|
||||||
|
GET /my_index/groups/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match_phrase": {
|
||||||
|
"names": "Abraham Lincoln" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> This phrase query matches our document, even though `Abraham` and `Lincoln` are in separate strings.
|
||||||
|
|
||||||
|
The `position_offset_gap` can introduce a fake gap between each array element. For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"groups": {
|
||||||
|
"properties": {
|
||||||
|
"names": {
|
||||||
|
"type": "string",
|
||||||
|
"position_offset_gap": 50 <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT /my_index/groups/1
|
||||||
|
{
|
||||||
|
"names": [ "John Abraham", "Lincoln Smith"]
|
||||||
|
}
|
||||||
|
|
||||||
|
GET /my_index/groups/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match_phrase": {
|
||||||
|
"names": "Abraham Lincoln" <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The first term in the next array element will be 50 terms apart from the
|
||||||
|
last term in the previous array element.
|
||||||
|
<2> The phrase query no longer matches our document.
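
The effect of the gap on phrase matching can be sketched with a small model. This is only an illustration, not Elasticsearch or Lucene code: the helper names `index_positions` and `phrase_matches` are hypothetical, and the model assumes terms are produced by lowercasing and splitting on whitespace.

[source,python]
--------------------------------------------------
# Illustrative model only: this is not Elasticsearch or Lucene code.
# Assumption: terms are produced by lowercasing and splitting on whitespace.

def index_positions(values, gap=0):
    """Assign a position to every term in an array of strings."""
    positions = []
    next_position = 0
    for i, value in enumerate(values):
        if i > 0:
            next_position += gap  # the fake gap between array elements
        for term in value.lower().split():
            positions.append((term, next_position))
            next_position += 1
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    by_term = {}
    for term, position in positions:
        by_term.setdefault(term, set()).add(position)
    return any(
        all(start + i in by_term.get(term, ()) for i, term in enumerate(terms))
        for start in by_term.get(terms[0], ())
    )


names = ["John Abraham", "Lincoln Smith"]
print(phrase_matches(index_positions(names, gap=0), "Abraham Lincoln"))   # True
print(phrase_matches(index_positions(names, gap=50), "Abraham Lincoln"))  # False
--------------------------------------------------

In this model, a phrase that lies entirely inside one array element (such as `Lincoln Smith`) still matches with any gap, because the gap is only added between elements.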
|
|
@ -0,0 +1,56 @@
|
||||||
|
[[precision-step]]
|
||||||
|
=== `precision_step`
|
||||||
|
|
||||||
|
Most <<number,numeric>> datatypes index extra terms representing numeric
|
||||||
|
ranges for each number to make <<query-dsl-range-query,`range` queries>>
|
||||||
|
faster. For instance, this `range` query:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"range": {
|
||||||
|
"number": {
|
||||||
|
"gte": 0,
|
||||||
|
"lte": 321
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
might be executed internally as a <<query-dsl-terms-query,`terms` query>> that
|
||||||
|
looks something like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
"terms": {
|
||||||
|
"number": [
|
||||||
|
"0-255",
|
||||||
|
"256-319",
|
||||||
|
"320",
|
||||||
|
"321"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
These extra terms greatly reduce the number of terms that have to be examined,
|
||||||
|
at the cost of increased disk space.
|
||||||
|
|
||||||
|
The default value for `precision_step` depends on the `type` of the numeric field:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`long`, `double`, `date`, `ip`:: `16` (3 extra terms)
|
||||||
|
`integer`, `float`, `short`:: `8` (3 extra terms)
|
||||||
|
`byte`:: `2147483647` (0 extra terms)
|
||||||
|
`token_count`:: `32` (0 extra terms)
|
||||||
|
|
||||||
|
The value of the `precision_step` setting indicates the number of bits that
|
||||||
|
should be compressed into an extra term. A `long` value consists of 64 bits,
|
||||||
|
so a `precision_step` of 16 results in the following terms:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
Bits 0-15:: `value & 1111111111111111 0000000000000000 0000000000000000 0000000000000000`
|
||||||
|
Bits 0-31:: `value & 1111111111111111 1111111111111111 0000000000000000 0000000000000000`
|
||||||
|
Bits 0-47:: `value & 1111111111111111 1111111111111111 1111111111111111 0000000000000000`
|
||||||
|
Bits 0-63:: `value`
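
The masking described above can be sketched in a few lines. This is a simplified model, not Lucene's actual term encoding: the function name `prefix_terms` is hypothetical, only non-negative values are handled, and each term is represented as a `(shift, masked value)` pair.

[source,python]
--------------------------------------------------
# Simplified model of precision_step: not Lucene's real term encoding.
# Assumption: values are non-negative integers that fit into `bits` bits.

def prefix_terms(value, precision_step, bits=64):
    """Return (shift, masked_value) pairs, full precision first.

    Each further term drops another `precision_step` low-order bits,
    so it stands for a wider range of neighbouring values.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit into %d bits" % bits)
    terms = []
    shift = 0
    while shift < bits:
        terms.append((shift, (value >> shift) << shift))
        shift += precision_step
    return terms


for shift, masked in prefix_terms(0x0123456789ABCDEF, 16):
    print(shift, format(masked, "016x"))

# Number of extra terms in this model: everything except the full value.
print(len(prefix_terms(5, 8, bits=32)) - 1)
--------------------------------------------------

Under this model a step that is at least as large as the width of the value (for instance `32` on a 32-bit value) produces only the full-precision term, which is consistent with the "0 extra terms" entries in the table of defaults.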
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,101 @@
|
||||||
|
[[properties]]
|
||||||
|
=== `properties`
|
||||||
|
|
||||||
|
Type mappings, <<object,`object` fields>> and <<nested,`nested` fields>>
|
||||||
|
contain sub-fields, called `properties`. These properties may be of any
|
||||||
|
<<mapping-types,datatype>>, including `object` and `nested`. Properties can
|
||||||
|
be added:
|
||||||
|
|
||||||
|
* explicitly by defining them when <<indices-create-index,creating an index>>.
|
||||||
|
* explicitly by defining them when adding or updating a mapping type with the <<indices-put-mapping,PUT mapping>> API.
|
||||||
|
* <<dynamic-mapping,dynamically>> just by indexing documents containing new fields.
|
||||||
|
|
||||||
|
Below is an example of adding `properties` to a mapping type, an `object`
|
||||||
|
field, and a `nested` field:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": { <1>
|
||||||
|
"properties": {
|
||||||
|
"manager": { <2>
|
||||||
|
"properties": {
|
||||||
|
"age": { "type": "integer" },
|
||||||
|
"name": { "type": "string" }
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"employees": { <3>
|
||||||
|
"type": "nested",
|
||||||
|
"properties": {
|
||||||
|
"age": { "type": "integer" },
|
||||||
|
"name": { "type": "string" }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1 <4>
|
||||||
|
{
|
||||||
|
"region": "US",
|
||||||
|
"manager": {
|
||||||
|
"name": "Alice White",
|
||||||
|
"age": 30
|
||||||
|
},
|
||||||
|
"employees": [
|
||||||
|
{
|
||||||
|
"name": "John Smith",
|
||||||
|
"age": 34
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "Peter Brown",
|
||||||
|
"age": 26
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Properties under the `my_type` mapping type.
|
||||||
|
<2> Properties under the `manager` object field.
|
||||||
|
<3> Properties under the `employees` nested field.
|
||||||
|
<4> An example document which corresponds to the above mapping.
|
||||||
|
|
||||||
|
==== Dot notation
|
||||||
|
|
||||||
|
Inner fields can be referred to in queries, aggregations, etc., using _dot
|
||||||
|
notation_:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"manager.name": "Alice White" <1>
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"aggs": {
|
||||||
|
"Employees": {
|
||||||
|
"nested": {
|
||||||
|
"path": "employees"
|
||||||
|
},
|
||||||
|
"aggs": {
|
||||||
|
"Employee Ages": {
|
||||||
|
"histogram": {
|
||||||
|
"field": "employees.age", <2>
|
||||||
|
"interval": 5
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
IMPORTANT: The full path to the inner field must be specified.
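
To make the path rule concrete, here is a rough sketch of how a dotted path could be resolved against the example document. It only illustrates the notation and is not how Elasticsearch looks up fields internally; the helper name `resolve_path` is hypothetical.

[source,python]
--------------------------------------------------
# Illustration of dot notation only: not Elasticsearch's field lookup.

def resolve_path(doc, path):
    """Collect every value found by following a dotted path through a document."""
    values = [doc]
    for key in path.split("."):
        next_values = []
        for value in values:
            # An array of objects is walked element by element.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten a final array so that single and multiple values look alike.
    result = []
    for value in values:
        result.extend(value if isinstance(value, list) else [value])
    return result


doc = {
    "region": "US",
    "manager": {"name": "Alice White", "age": 30},
    "employees": [
        {"name": "John Smith", "age": 34},
        {"name": "Peter Brown", "age": 26},
    ],
}

print(resolve_path(doc, "manager.name"))
print(resolve_path(doc, "employees.age"))
print(resolve_path(doc, "name"))  # no top-level `name`: the full path is required
--------------------------------------------------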
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,79 @@
|
||||||
|
[[search-analyzer]]
|
||||||
|
=== `search_analyzer`
|
||||||
|
|
||||||
|
Usually, the same <<analyzer,analyzer>> should be applied at index time and at
|
||||||
|
search time, to ensure that the terms in the query are in the same format as
|
||||||
|
the terms in the inverted index.
|
||||||
|
|
||||||
|
Sometimes, though, it can make sense to use a different analyzer at search
|
||||||
|
time, such as when using the <<analysis-edgengram-tokenizer,`edge_ngram`>>
|
||||||
|
tokenizer for autocomplete.
|
||||||
|
|
||||||
|
By default, queries will use the `analyzer` defined in the field mapping, but
|
||||||
|
this can be overridden with the `search_analyzer` setting:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"settings": {
|
||||||
|
"analysis": {
|
||||||
|
"filter": {
|
||||||
|
"autocomplete_filter": {
|
||||||
|
"type": "edge_ngram",
|
||||||
|
"min_gram": 1,
|
||||||
|
"max_gram": 20
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"analyzer": {
|
||||||
|
"autocomplete": { <1>
|
||||||
|
"type": "custom",
|
||||||
|
"tokenizer": "standard",
|
||||||
|
"filter": [
|
||||||
|
"lowercase",
|
||||||
|
"autocomplete_filter"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"text": {
|
||||||
|
"type": "string",
|
||||||
|
"analyzer": "autocomplete", <2>
|
||||||
|
"search_analyzer": "standard" <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"text": "Quick Brown Fox" <3>
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"text": {
|
||||||
|
"query": "Quick Br", <4>
|
||||||
|
"operator": "and"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
<1> Analysis settings to define the custom `autocomplete` analyzer.
|
||||||
|
<2> The `text` field uses the `autocomplete` analyzer at index time, but the `standard` analyzer at search time.
|
||||||
|
<3> This field is indexed as the terms: [ `q`, `qu`, `qui`, `quic`, `quick`, `b`, `br`, `bro`, `brow`, `brown`, `f`, `fo`, `fox` ]
|
||||||
|
<4> The query searches for both of these terms: [ `quick`, `br` ]
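
The interplay of the two analyzers can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the real analysis chain: the `standard` analyzer is approximated by lowercasing and splitting on whitespace, and the helper names (`standard_terms`, `autocomplete_terms`, `matches`) are hypothetical.

[source,python]
--------------------------------------------------
# Rough model of index-time vs search-time analysis: not the real analyzers.
# Assumption: "standard" analysis is lowercasing plus whitespace splitting.

def standard_terms(text):
    return text.lower().split()


def autocomplete_terms(text, min_gram=1, max_gram=20):
    """Edge n-grams of every word, mirroring the autocomplete_filter settings."""
    terms = []
    for word in standard_terms(text):
        for length in range(min_gram, min(max_gram, len(word)) + 1):
            terms.append(word[:length])
    return terms


def matches(indexed_text, query, search_analyzer, operator="and"):
    """Check the analyzed query terms against the edge n-grams in the index."""
    indexed = set(autocomplete_terms(indexed_text))
    hits = [term in indexed for term in search_analyzer(query)]
    return all(hits) if operator == "and" else any(hits)


print(autocomplete_terms("Quick Brown Fox"))
print(matches("Quick Brown Fox", "Quick Br", standard_terms))
# Analyzing the query with the index-time analyzer would also emit `q`,
# which matches almost anything when the operator is `or`.
print(matches("Quick Brown Fox", "Qx", autocomplete_terms, operator="or"))
print(matches("Quick Brown Fox", "Qx", standard_terms, operator="or"))
--------------------------------------------------

The last two calls hint at why a separate search analyzer is useful here: in this model, running the query through the edge n-gram analyzer produces single-letter terms that match unrelated documents.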
|
||||||
|
|
||||||
|
See {defguide}/_index_time_search_as_you_type.html[Index time search-as-you-type]
|
||||||
|
for a full explanation of this example.
|
|
@ -0,0 +1,54 @@
|
||||||
|
[[similarity]]
|
||||||
|
=== `similarity`
|
||||||
|
|
||||||
|
Elasticsearch allows you to configure a scoring algorithm or _similarity_ per
|
||||||
|
field. The `similarity` setting provides a simple way of choosing a similarity
|
||||||
|
algorithm other than the default TF/IDF, such as `BM25`.
|
||||||
|
|
||||||
|
Similarities are mostly useful for <<string,`string`>> fields, especially
|
||||||
|
`analyzed` string fields, but can also apply to other field types.
|
||||||
|
|
||||||
|
Custom similarities can be configured by tuning the parameters of the built-in
|
||||||
|
similarities. For more details about these expert options, see the
|
||||||
|
<<index-modules-similarity,similarity module>>.
|
||||||
|
|
||||||
|
The only similarities which can be used out of the box, without any further
|
||||||
|
configuration, are:
|
||||||
|
|
||||||
|
`default`::
|
||||||
|
The default TF/IDF algorithm used by Elasticsearch and
|
||||||
|
Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function]
|
||||||
|
for more information.
|
||||||
|
|
||||||
|
`BM25`::
|
||||||
|
The Okapi BM25 algorithm.
|
||||||
|
See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms]
|
||||||
|
for more information.
|
||||||
|
|
||||||
|
|
||||||
|
The `similarity` can be set on the field level when a field is first created,
|
||||||
|
as follows:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"default_field": { <1>
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"bm25_field": {
|
||||||
|
"type": "string",
|
||||||
|
"similarity": "BM25" <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `default_field` uses the `default` similarity (i.e. TF/IDF).
|
||||||
|
<2> The `bm25_field` uses the `BM25` similarity.
|
||||||
|
|
|
@ -0,0 +1,73 @@
|
||||||
|
[[mapping-store]]
|
||||||
|
=== `store`
|
||||||
|
|
||||||
|
By default, field values are <<mapping-index,indexed>> to make them searchable,
|
||||||
|
but they are not _stored_. This means that the field can be queried, but the
|
||||||
|
original field value cannot be retrieved.
|
||||||
|
|
||||||
|
Usually this doesn't matter. The field value is already part of the
|
||||||
|
<<mapping-source-field,`_source` field>>, which is stored by default. If you
|
||||||
|
only want to retrieve the value of a single field or of a few fields, instead
|
||||||
|
of the whole `_source`, then this can be achieved with
|
||||||
|
<<search-request-source-filtering,source filtering>>.
|
||||||
|
|
||||||
|
In certain situations it can make sense to `store` a field. For instance, if
|
||||||
|
you have a document with a `title`, a `date`, and a very large `content`
|
||||||
|
field, you may want to retrieve just the `title` and the `date` without having
|
||||||
|
to extract those fields from a large `_source` field:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT /my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"title": {
|
||||||
|
"type": "string",
|
||||||
|
"store": true <1>
|
||||||
|
},
|
||||||
|
"date": {
|
||||||
|
"type": "date",
|
||||||
|
"store": true <1>
|
||||||
|
},
|
||||||
|
"content": {
|
||||||
|
"type": "string"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT /my_index/my_type/1
|
||||||
|
{
|
||||||
|
"title": "Some short title",
|
||||||
|
"date": "2015-01-01",
|
||||||
|
"content": "A very long content field..."
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"fields": [ "title", "date" ] <2>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `title` and `date` fields are stored.
|
||||||
|
<2> This request will retrieve the values of the `title` and `date` fields.
|
||||||
|
|
||||||
|
[NOTE]
|
||||||
|
.Stored fields returned as arrays
|
||||||
|
======================================
|
||||||
|
|
||||||
|
For consistency, stored fields are always returned as an _array_ because there
|
||||||
|
is no way of knowing if the original field value was a single value, multiple
|
||||||
|
values, or an empty array.
|
||||||
|
|
||||||
|
If you need the original value, you should retrieve it from the `_source`
|
||||||
|
field instead.
|
||||||
|
|
||||||
|
======================================
|
||||||
|
|
||||||
|
Another situation where it can make sense to make a field stored is for those
|
||||||
|
that don't appear in the `_source` field (such as <<copy-to,`copy_to` fields>>).
|
||||||
|
|
|
@ -0,0 +1,68 @@
|
||||||
|
[[term-vector]]
|
||||||
|
=== `term_vector`
|
||||||
|
|
||||||
|
Term vectors contain information about the terms produced by the
|
||||||
|
<<analysis,analysis>> process, including:
|
||||||
|
|
||||||
|
* a list of terms.
|
||||||
|
* the position (or order) of each term.
|
||||||
|
* the start and end character offsets mapping the term to its
|
||||||
|
origin in the original string.
|
||||||
|
|
||||||
|
These term vectors can be stored so that they can be retrieved for a
|
||||||
|
particular document.
|
||||||
|
|
||||||
|
The `term_vector` setting accepts:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`no`:: No term vectors are stored. (default)
|
||||||
|
`yes`:: Just the terms in the field are stored.
|
||||||
|
`with_positions`:: Terms and positions are stored.
|
||||||
|
`with_offsets`:: Terms and character offsets are stored.
|
||||||
|
`with_positions_offsets`:: Terms, positions, and character offsets are stored.
|
||||||
|
|
||||||
|
The fast vector highlighter requires `with_positions_offsets`. The term
|
||||||
|
vectors API can retrieve whatever is stored.
|
||||||
|
|
||||||
|
WARNING: Setting `with_positions_offsets` will double the size of a field's
|
||||||
|
index.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"text": {
|
||||||
|
"type": "string",
|
||||||
|
"term_vector": "with_positions_offsets"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"text": "Quick brown fox"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"text": "brown fox"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"highlight": {
|
||||||
|
"fields": {
|
||||||
|
"text": {} <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The fast vector highlighter will be used by default for the `text` field
|
||||||
|
because term vectors are enabled.
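
As a rough picture of what `with_positions_offsets` records, the sketch below computes terms, positions, and character offsets for the example text. It is only a model: the function name `term_vector` and the dictionary keys are hypothetical, and analysis is approximated by lowercasing runs of word characters.

[source,python]
--------------------------------------------------
import re

# Model of the information in a term vector: not the stored Lucene format.
# Assumption: analysis is approximated by lowercasing runs of word characters.

def term_vector(text):
    """Return one entry per term with its position and character offsets."""
    return [
        {
            "term": match.group().lower(),
            "position": position,
            "start_offset": match.start(),
            "end_offset": match.end(),
        }
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]


for entry in term_vector("Quick brown fox"):
    print(entry)
--------------------------------------------------

The offsets point back into the original string, which is the kind of information a highlighter needs in order to mark up a match without re-analyzing the text.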
|
||||||
|
|
|
@ -1,61 +0,0 @@
|
||||||
[[mapping-transform]]
|
|
||||||
== Transform
|
|
||||||
The document can be transformed before it is indexed by registering a
|
|
||||||
script in the `transform` element of the mapping. The result of the
|
|
||||||
transform is indexed but the original source is stored in the `_source`
|
|
||||||
field. Example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"example" : {
|
|
||||||
"transform" : {
|
|
||||||
"script" : {
|
|
||||||
"inline": "if (ctx._source['title']?.startsWith('t')) ctx._source['suggest'] = ctx._source['content']",
|
|
||||||
"params" : {
|
|
||||||
"variable" : "not used but an example anyway"
|
|
||||||
},
|
|
||||||
"lang": "groovy"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"properties": {
|
|
||||||
"title": { "type": "string" },
|
|
||||||
"content": { "type": "string" },
|
|
||||||
"suggest": { "type": "string" }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
It's also possible to specify multiple transforms:
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"example" : {
|
|
||||||
"transform" : [
|
|
||||||
{"script": "ctx._source['suggest'] = ctx._source['content']"},
|
|
||||||
{"script": "ctx._source['foo'] = ctx._source['bar'];"}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Because the result isn't stored in the source it can't normally be fetched by
|
|
||||||
source filtering. It can be highlighted if it is marked as stored.
|
|
||||||
|
|
||||||
=== Get Transformed
|
|
||||||
The get endpoint will retransform the source if the `_source_transform`
|
|
||||||
parameter is set. Example:
|
|
||||||
|
|
||||||
[source,sh]
|
|
||||||
--------------------------------------------------
|
|
||||||
curl -XGET "http://localhost:9200/test/example/3?pretty&_source_transform"
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The transform is performed before any source filtering but it is mostly
|
|
||||||
designed to make it easy to see what was passed to the index for debugging.
|
|
||||||
|
|
||||||
=== Immutable Transformation
|
|
||||||
Once configured the transform script cannot be modified. This is not
|
|
||||||
because that is technically impossible but instead because madness lies
|
|
||||||
down that road.
|
|
|
@ -1,24 +1,71 @@
|
||||||
[[mapping-types]]
|
[[mapping-types]]
|
||||||
== Types
|
== Field datatypes
|
||||||
|
|
||||||
The datatype for each field in a document (eg strings, numbers,
|
Elasticsearch supports a number of different datatypes for the fields in a
|
||||||
objects etc) can be controlled via the type mapping.
|
document:
|
||||||
|
|
||||||
include::types/core-types.asciidoc[]
|
[float]
|
||||||
|
=== Core datatypes
|
||||||
|
|
||||||
include::types/array-type.asciidoc[]
|
<<string>>:: `string`
|
||||||
|
<<number>>:: `long`, `integer`, `short`, `byte`, `double`, `float`
|
||||||
|
<<date>>:: `date`
|
||||||
|
<<boolean>>:: `boolean`
|
||||||
|
<<binary>>:: `binary`
|
||||||
|
|
||||||
include::types/object-type.asciidoc[]
|
[float]
|
||||||
|
=== Complex datatypes
|
||||||
|
|
||||||
include::types/root-object-type.asciidoc[]
|
<<array>>:: Array support does not require a dedicated `type`
|
||||||
|
<<object>>:: `object` for single JSON objects
|
||||||
|
<<nested>>:: `nested` for arrays of JSON objects
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Geo datatypes
|
||||||
|
|
||||||
|
<<geo-point>>:: `geo_point` for lat/lon points
|
||||||
|
<<geo-shape>>:: `geo_shape` for complex shapes like polygons
|
||||||
|
|
||||||
|
[float]
|
||||||
|
=== Specialised datatypes
|
||||||
|
|
||||||
|
<<ip>>:: `ip` for IPv4 addresses
|
||||||
|
<<search-suggesters-completion,Completion datatype>>::
|
||||||
|
`completion` to provide auto-complete suggestions
|
||||||
|
<<token-count>>:: `token_count` to count the number of tokens in a string
|
||||||
|
|
||||||
|
Attachment datatype::
|
||||||
|
|
||||||
|
See the https://github.com/elastic/elasticsearch-mapper-attachments[mapper attachment plugin]
|
||||||
|
which supports indexing ``attachments'' like Microsoft Office formats, Open
|
||||||
|
Document formats, ePub, HTML, etc. into an `attachment` datatype.
|
||||||
|
|
||||||
|
include::types/array.asciidoc[]
|
||||||
|
|
||||||
|
include::types/binary.asciidoc[]
|
||||||
|
|
||||||
|
include::types/boolean.asciidoc[]
|
||||||
|
|
||||||
|
include::types/date.asciidoc[]
|
||||||
|
|
||||||
|
include::types/geo-point.asciidoc[]
|
||||||
|
|
||||||
|
include::types/geo-shape.asciidoc[]
|
||||||
|
|
||||||
|
include::types/ip.asciidoc[]
|
||||||
|
|
||||||
|
include::types/nested.asciidoc[]
|
||||||
|
|
||||||
|
include::types/numeric.asciidoc[]
|
||||||
|
|
||||||
|
include::types/object.asciidoc[]
|
||||||
|
|
||||||
|
include::types/string.asciidoc[]
|
||||||
|
|
||||||
|
include::types/token-count.asciidoc[]
|
||||||
|
|
||||||
include::types/nested-type.asciidoc[]
|
|
||||||
|
|
||||||
include::types/ip-type.asciidoc[]
|
|
||||||
|
|
||||||
include::types/geo-point-type.asciidoc[]
|
|
||||||
|
|
||||||
include::types/geo-shape-type.asciidoc[]
|
|
||||||
|
|
||||||
include::types/attachment-type.asciidoc[]
|
|
||||||
|
|
||||||
|
|
|
@ -1,69 +0,0 @@
|
||||||
[[mapping-array-type]]
|
|
||||||
=== Array Type
|
|
||||||
|
|
||||||
JSON documents allow you to define an array (list) of fields or objects.
|
|
||||||
Mapping array types could not be simpler since arrays are automatically
|
|
||||||
detected and mapping them can be done either with
|
|
||||||
<<mapping-core-types,Core Types>> or
|
|
||||||
<<mapping-object-type,Object Type>> mappings.
|
|
||||||
For example, the following JSON defines several arrays:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"message" : "some arrays in this tweet...",
|
|
||||||
"tags" : ["elasticsearch", "wow"],
|
|
||||||
"lists" : [
|
|
||||||
{
|
|
||||||
"name" : "prog_list",
|
|
||||||
"description" : "programming list"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"name" : "cool_list",
|
|
||||||
"description" : "cool stuff list"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The above JSON has the `tags` property defining a list of a simple
|
|
||||||
`string` type, and the `lists` property is an `object` type array. Here
|
|
||||||
is a sample explicit mapping:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"message" : {"type" : "string"},
|
|
||||||
"tags" : {"type" : "string"},
|
|
||||||
"lists" : {
|
|
||||||
"properties" : {
|
|
||||||
"name" : {"type" : "string"},
|
|
||||||
"description" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The fact that array types are automatically supported can be shown by
|
|
||||||
the fact that the following JSON document is perfectly fine:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"message" : "some arrays in this tweet...",
|
|
||||||
"tags" : "elasticsearch",
|
|
||||||
"lists" : {
|
|
||||||
"name" : "prog_list",
|
|
||||||
"description" : "programming list"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
|
@ -0,0 +1,99 @@
|
||||||
|
[[array]]
|
||||||
|
=== Array datatype
|
||||||
|
|
||||||
|
In Elasticsearch, there is no dedicated `array` type. Any field can contain
|
||||||
|
zero or more values by default. However, all values in the array must be of
|
||||||
|
the same datatype. For instance:
|
||||||
|
|
||||||
|
* an array of strings: [ `"one"`, `"two"` ]
|
||||||
|
* an array of integers: [ `1`, `2` ]
|
||||||
|
* an array of arrays: [ `1`, [ `2`, `3` ]] which is the equivalent of [ `1`, `2`, `3` ]
|
||||||
|
* an array of objects: [ `{ "name": "Mary", "age": 12 }`, `{ "name": "John", "age": 10 }`]
|
||||||
|
|
||||||
|
.Arrays of objects
|
||||||
|
[NOTE]
|
||||||
|
====================================================
|
||||||
|
|
||||||
|
Arrays of objects do not work as you would expect: you cannot query each
|
||||||
|
object independently of the other objects in the array. If you need to be
|
||||||
|
able to do this then you should use the <<nested,`nested`>> datatype instead
|
||||||
|
of the <<object,`object`>> datatype.
|
||||||
|
|
||||||
|
This is explained in more detail in <<nested>>.
|
||||||
|
====================================================
|
||||||
|
|
||||||
|
|
||||||
|
When adding a field dynamically, the first value in the array determines the
|
||||||
|
field `type`. All subsequent values must be of the same datatype or it must
|
||||||
|
at least be possible to <<coerce,coerce>> subsequent values to the same
|
||||||
|
datatype.
|
||||||
|
|
||||||
|
Arrays with a mixture of datatypes are _not_ supported: [ `10`, `"some string"` ]
|
||||||
|
|
||||||
|
An array may contain `null` values, which are either replaced by the
|
||||||
|
configured <<null-value,`null_value`>> or skipped entirely. An empty array
|
||||||
|
`[]` is treated as a missing field -- a field with no values.
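
These rules (nested arrays are flattened, `null` values are skipped or replaced, and an empty array yields no values) can be summarised in a short sketch. It is a model of the behaviour described here, not Elasticsearch's document parser, and the helper name `field_values` is hypothetical.

[source,python]
--------------------------------------------------
# Model of how array values are treated: not Elasticsearch's parser.

def field_values(value, null_value=None):
    """Flatten a JSON value into the list of values seen for one field."""
    if isinstance(value, list):
        values = []
        for item in value:
            values.extend(field_values(item, null_value))
        return values
    if value is None:
        # An explicit null is skipped unless a null_value is configured.
        return [] if null_value is None else [null_value]
    return [value]


print(field_values([1, [2, 3]]))                       # same values as [1, 2, 3]
print(field_values([1, None, 2]))                      # the null is skipped
print(field_values(["a", None], null_value="NULL"))    # the null is replaced
print(field_values([]))                                # no values at all
--------------------------------------------------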
|
||||||
|
|
||||||
|
Nothing needs to be pre-configured in order to use arrays in documents; they
|
||||||
|
are supported out of the box:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"message": "some arrays in this document...",
|
||||||
|
"tags": [ "elasticsearch", "wow" ], <1>
|
||||||
|
"lists": [ <2>
|
||||||
|
{
|
||||||
|
"name": "prog_list",
|
||||||
|
"description": "programming list"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "cool_list",
|
||||||
|
"description": "cool stuff list"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2 <3>
|
||||||
|
{
|
||||||
|
"message": "no arrays in this document...",
|
||||||
|
"tags": "elasticsearch",
|
||||||
|
"lists": {
|
||||||
|
"name": "prog_list",
|
||||||
|
"description": "programming list"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"match": {
|
||||||
|
"tags": "elasticsearch" <4>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `tags` field is dynamically added as a `string` field.
|
||||||
|
<2> The `lists` field is dynamically added as an `object` field.
|
||||||
|
<3> The second document contains no arrays, but can be indexed into the same fields.
|
||||||
|
<4> The query looks for `elasticsearch` in the `tags` field, and matches both documents.
|
||||||
|
|
||||||
|
.Multi-value fields and the inverted index
|
||||||
|
****************************************************
|
||||||
|
|
||||||
|
The fact that all field types support multi-value fields out of the box is a
|
||||||
|
consequence of the origins of Lucene. Lucene was designed to be a full text
|
||||||
|
search engine. In order to be able to search for individual words within a
|
||||||
|
big block of text, Lucene tokenizes the text into individual terms, and
|
||||||
|
adds each term to the inverted index separately.
|
||||||
|
|
||||||
|
This means that even a simple text field must be able to support multiple
|
||||||
|
values by default. When other datatypes were added, such as numbers and
|
||||||
|
dates, they used the same data structure as strings, and so got multi-values
|
||||||
|
for free.
|
||||||
|
|
||||||
|
****************************************************
|
||||||
|
|
|
@ -1,13 +0,0 @@
|
||||||
[[mapping-attachment-type]]
|
|
||||||
=== Attachment Type
|
|
||||||
|
|
||||||
The `attachment` type allows to index different "attachment" type field
|
|
||||||
(encoded as `base64`), for example, Microsoft Office formats, open
|
|
||||||
document formats, ePub, HTML, and so on.
|
|
||||||
|
|
||||||
The `attachment` type is provided as a
|
|
||||||
https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin
|
|
||||||
extension]. It uses http://tika.apache.org/[Apache Tika] behind the scene.
|
|
||||||
|
|
||||||
See https://github.com/elasticsearch/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch[README file]
|
|
||||||
for details.
|
|
|
@ -0,0 +1,52 @@
|
||||||
|
[[binary]]
|
||||||
|
=== Binary datatype
|
||||||
|
|
||||||
|
The `binary` type accepts a binary value as a
|
||||||
|
https://en.wikipedia.org/wiki/Base64[Base64] encoded string. The field is not
|
||||||
|
stored by default and is not searchable:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"name": {
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"blob": {
|
||||||
|
"type": "binary"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"name": "Some binary blob",
|
||||||
|
"blob": "U29tZSBiaW5hcnkgYmxvYg==" <1>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
<1> The Base64 encoded binary value must not have embedded newlines `\n`.
|
||||||
|
|
||||||
|
[[binary-params]]
|
||||||
|
==== Parameters for `binary` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `binary` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` or `false` (default).
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,119 @@
|
||||||
|
[[boolean]]
|
||||||
|
=== Boolean datatype
|
||||||
|
|
||||||
|
Boolean fields accept JSON `true` and `false` values, but can also accept
|
||||||
|
strings and numbers which are interpreted as either true or false:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
False values::
|
||||||
|
|
||||||
|
`false`, `"false"`, `"off"`, `"no"`, `"0"`, `""` (empty string), `0`, `0.0`
|
||||||
|
|
||||||
|
True values::
|
||||||
|
|
||||||
|
Anything that isn't false.
|
||||||
|
|
||||||
|
For example:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"is_published": {
|
||||||
|
"type": "boolean"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
POST my_index/my_type/1
|
||||||
|
{
|
||||||
|
"is_published": true <1>
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"term": {
|
||||||
|
"is_published": 1 <2>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Indexing a document with a JSON `true`.
|
||||||
|
<2> Querying for the document with `1`, which is interpreted as `true`.
|
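The interpretation rule listed under "False values" and "True values" can be sketched as follows. This is only an illustrative model of the documented rule, not Elasticsearch's implementation; the names `FALSE_STRINGS` and `interpret_boolean` are hypothetical:

```python
# String forms that the list above treats as false.
FALSE_STRINGS = {"false", "off", "no", "0", ""}


def interpret_boolean(value) -> bool:
    """Model of the documented rule: listed false values are false, all else true."""
    if isinstance(value, str):
        return value not in FALSE_STRINGS
    # Covers JSON false as well as the numbers 0 and 0.0,
    # since False == 0 == 0.0 in Python.
    return value != 0


for sample in (True, 1, "yes", False, "off", "", 0.0):
    print(repr(sample), "->", interpret_boolean(sample))
```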
||||||
|
|
||||||
|
Aggregations like the <<search-aggregations-bucket-terms-aggregation,`terms`
|
||||||
|
aggregation>> use `1` and `0` for the `key`, and the strings `"true"` and
|
||||||
|
`"false"` for the `key_as_string`. Boolean fields when used in scripts,
|
||||||
|
return `1` and `0`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
POST my_index/my_type/1
|
||||||
|
{
|
||||||
|
"is_published": true
|
||||||
|
}
|
||||||
|
|
||||||
|
POST my_index/my_type/2
|
||||||
|
{
|
||||||
|
"is_published": false
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"aggs": {
|
||||||
|
"publish_state": {
|
||||||
|
"terms": {
|
||||||
|
"field": "is_published"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"script_fields": {
|
||||||
|
"is_published": {
|
||||||
|
"script": "doc['is_published'].value" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Inline scripts must be <<enable-dynamic-scripting,enabled>> for this example to work.
|
||||||
|
|
||||||
|
[[boolean-params]]
|
||||||
|
==== Parameters for `boolean` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `boolean` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts any of the true or false values listed above. The value is
|
||||||
|
substituted for any explicit `null` values. Defaults to `null`, which
|
||||||
|
means the field is treated as missing.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
|
||||||
|
|
|
@ -1,649 +0,0 @@
|
||||||
[[mapping-core-types]]
|
|
||||||
=== Core Types
|
|
||||||
|
|
||||||
Each JSON field can be mapped to a specific core type. JSON itself
|
|
||||||
already provides us with some typing, with its support for `string`,
|
|
||||||
`integer`/`long`, `float`/`double`, `boolean`, and `null`.
|
|
||||||
|
|
||||||
The following sample tweet JSON document will be used to explain the
|
|
||||||
core types:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"user" : "kimchy",
|
|
||||||
"message" : "This is a tweet!",
|
|
||||||
"postDate" : "2009-11-15T14:12:12",
|
|
||||||
"priority" : 4,
|
|
||||||
"rank" : 12.3
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Explicit mapping for the above JSON tweet can be:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"user" : {"type" : "string", "index" : "not_analyzed"},
|
|
||||||
"message" : {"type" : "string", "null_value" : "na"},
|
|
||||||
"postDate" : {"type" : "date"},
|
|
||||||
"priority" : {"type" : "integer"},
|
|
||||||
"rank" : {"type" : "float"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[string]]
|
|
||||||
==== String
|
|
||||||
|
|
||||||
The text based string type is the most basic type, and contains one or
|
|
||||||
more characters. An example mapping can be:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"message" : {
|
|
||||||
"type" : "string",
|
|
||||||
"store" : true,
|
|
||||||
"index" : "analyzed",
|
|
||||||
"null_value" : "na"
|
|
||||||
},
|
|
||||||
"user" : {
|
|
||||||
"type" : "string",
|
|
||||||
"index" : "not_analyzed",
|
|
||||||
"norms" : {
|
|
||||||
"enabled" : false
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The above mapping defines a `string` `message` property/field within the
|
|
||||||
`tweet` type. The field is stored in the index (so it can later be
|
|
||||||
retrieved using selective loading when searching), and it gets analyzed
|
|
||||||
(broken down into searchable terms). If the message has a `null` value,
|
|
||||||
then the value that will be stored is `na`. There is also a `string` `user`
|
|
||||||
which is indexed as-is (not broken down into tokens) and has norms
|
|
||||||
disabled (so that matching this field is a binary decision, no match is
|
|
||||||
better than another one).
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with the
|
|
||||||
`string` type:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Attribute |Description
|
|
||||||
|`index_name` |The name of the field that will be stored in the index.
|
|
||||||
Defaults to the property/field name.
|
|
||||||
|
|
||||||
|`store` |Set to `true` to actually store the field in the index, `false` to not
|
|
||||||
store it. Since by default Elasticsearch stores all fields of the source
|
|
||||||
document in the special `_source` field, this option is primarily useful when
|
|
||||||
the `_source` field has been disabled in the type definition. Defaults to
|
|
||||||
`false`.
|
|
||||||
|
|
||||||
|`index` |Set to `analyzed` for the field to be indexed and searchable
|
|
||||||
after being broken down into tokens using an analyzer. `not_analyzed`
|
|
||||||
means that it's still searchable, but does not go through any analysis
|
|
||||||
process and is not broken down into tokens. `no` means that it won't be
|
|
||||||
searchable at all (as an individual field; it may still be included in
|
|
||||||
`_all`). Setting to `no` disables `include_in_all`. Defaults to
|
|
||||||
`analyzed`.
|
|
||||||
|
|
||||||
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
|
|
||||||
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.
|
|
||||||
|
|
||||||
|`term_vector` |Possible values are `no`, `yes`, `with_offsets`,
|
|
||||||
`with_positions`, `with_positions_offsets`. Defaults to `no`.
|
|
||||||
|
|
||||||
|`boost` |The boost value. Defaults to `1.0`.
|
|
||||||
|
|
||||||
|`null_value` |When there is a (JSON) null value for the field, use the
|
|
||||||
`null_value` as the field value. Defaults to not adding the field at
|
|
||||||
all.
|
|
||||||
|
|
||||||
|`norms: {enabled: <value>}` |Boolean value if norms should be enabled or
|
|
||||||
not. Defaults to `true` for `analyzed` fields, and to `false` for
|
|
||||||
`not_analyzed` fields. See the <<norms,section about norms>>.
|
|
||||||
|
|
||||||
|`norms: {loading: <value>}` |Describes how norms should be loaded, possible values are
|
|
||||||
`eager` and `lazy` (default). It is possible to change the default value to
|
|
||||||
eager for all fields by configuring the index setting `index.norms.loading`
|
|
||||||
to `eager`.
|
|
||||||
|
|
||||||
|`index_options` |Sets the indexing
|
|
||||||
options, possible values are `docs` (only doc numbers are indexed),
|
|
||||||
`freqs` (doc numbers and term frequencies), and `positions` (doc
|
|
||||||
numbers, term frequencies and positions). Defaults to `positions` for
|
|
||||||
`analyzed` fields, and to `docs` for `not_analyzed` fields. It
|
|
||||||
is also possible to set it to `offsets` (doc numbers, term
|
|
||||||
frequencies, positions and offsets).
|
|
||||||
|
|
||||||
|`analyzer` |The analyzer used to analyze the text contents when
|
|
||||||
`analyzed` during indexing and searching.
|
|
||||||
Defaults to the globally configured analyzer.
|
|
||||||
|
|
||||||
|`search_analyzer` |The analyzer used to analyze the field when searching, which
|
|
||||||
overrides the value of `analyzer`. Can be updated on an existing field.
|
|
||||||
|
|
||||||
|`include_in_all` |Should the field be included in the `_all` field (if
|
|
||||||
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
|
|
||||||
defaults to `true` or to the parent `object` type setting.
|
|
||||||
|
|
||||||
|`ignore_above` |The analyzer will ignore strings larger than this size.
|
|
||||||
Useful for generic `not_analyzed` fields that should ignore long text.
|
|
||||||
|
|
||||||
This option is also useful for protecting against Lucene's term byte-length
|
|
||||||
limit of `32766`. Note: the value for `ignore_above` is the _character count_,
|
|
||||||
but Lucene counts bytes, so if you have UTF-8 text, you may want to set the
|
|
||||||
limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most 3
|
|
||||||
bytes.
|
|
||||||
|
|
||||||
|`position_offset_gap` |Position increment gap between field instances
|
|
||||||
with the same field name. Defaults to 0.
|
|
||||||
|=======================================================================
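The `ignore_above` arithmetic from the table above can be checked with a short sketch. Python is used only for illustration; the helper names `max_ignore_above` and `utf8_length` are hypothetical, and the three-bytes-per-character figure is taken from the text above rather than verified independently:

```python
LUCENE_MAX_TERM_BYTES = 32766  # term byte-length limit quoted in the table above


def max_ignore_above(max_bytes_per_char: int = 3) -> int:
    """Largest character count whose encoding stays within the byte limit."""
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(max_ignore_above())
# The character count and the byte count differ for non-ASCII text.
print(len("日本語"), utf8_length("日本語"))
```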
|
|
||||||
|
|
||||||
The `string` type also supports custom indexing parameters associated
|
|
||||||
with the indexed value. For example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"message" : {
|
|
||||||
"_value": "boosted value",
|
|
||||||
"_boost": 2.0
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The mapping is required to disambiguate the meaning of the document.
|
|
||||||
Otherwise, the structure would interpret "message" as a value of type
|
|
||||||
"object". The key `_value` (or `value`) in the inner document specifies
|
|
||||||
the real string content that should eventually be indexed. The `_boost`
|
|
||||||
(or `boost`) key specifies the per field document boost (here 2.0).
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[norms]]
|
|
||||||
===== Norms
|
|
||||||
|
|
||||||
Norms store various normalization factors that are later used (at query time)
|
|
||||||
in order to compute the score of a document relative to a query.
|
|
||||||
|
|
||||||
Although useful for scoring, norms also require quite a lot of memory
|
|
||||||
(typically in the order of one byte per document per field in your index,
|
|
||||||
even for documents that don't have this specific field). As a consequence, if
|
|
||||||
you don't need scoring on a specific field, it is highly recommended to disable
|
|
||||||
norms on it. In particular, this is the case for fields that are used solely
|
|
||||||
for filtering or aggregations.
|
|
||||||
|
|
||||||
In case you would like to disable norms after the fact, it is possible to do so
|
|
||||||
by using the <<indices-put-mapping,PUT mapping API>>, like this:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
------------
|
|
||||||
PUT my_index/_mapping/my_type
|
|
||||||
{
|
|
||||||
"properties": {
|
|
||||||
"title": {
|
|
||||||
"type": "string",
|
|
||||||
"norms": {
|
|
||||||
"enabled": false
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
------------
|
|
||||||
|
|
||||||
Please however note that norms won't be removed instantly, but will be removed
|
|
||||||
as old segments are merged into new segments as you continue indexing new documents.
|
|
||||||
Any score computation on a field that has had
|
|
||||||
norms removed might return inconsistent results since some documents won't have
|
|
||||||
norms anymore while other documents might still have norms.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[number]]
|
|
||||||
==== Number
|
|
||||||
|
|
||||||
A number based type supporting `float`, `double`, `byte`, `short`,
|
|
||||||
`integer`, and `long`. It uses specific constructs within Lucene in
|
|
||||||
order to support numeric values. The number types have the same ranges
|
|
||||||
as corresponding
|
|
||||||
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html[Java
|
|
||||||
types]. An example mapping can be:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"rank" : {
|
|
||||||
"type" : "float",
|
|
||||||
"null_value" : 1.0
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with a
|
|
||||||
numeric type:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Attribute |Description
|
|
||||||
|`type` |The type of the number. Can be `float`, `double`, `integer`,
|
|
||||||
`long`, `short`, `byte`. Required.
|
|
||||||
|
|
||||||
|`index_name` |The name of the field that will be stored in the index.
|
|
||||||
Defaults to the property/field name.
|
|
||||||
|
|
||||||
|`store` |Set to `true` to store the actual field in the index, `false` to not
|
|
||||||
store it. Defaults to `false` (note, the JSON document itself is stored,
|
|
||||||
and it can be retrieved from it).
|
|
||||||
|
|
||||||
|`index` |Set to `no` if the value should not be indexed. Setting to
|
|
||||||
`no` disables `include_in_all`. If set to `no` the field should be either stored
|
|
||||||
in `_source`, have `include_in_all` enabled, or `store` be set to
|
|
||||||
`true` for this to be useful.
|
|
||||||
|
|
||||||
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
|
|
||||||
Automatically set to `true` when the fielddata format is `doc_values`.
|
|
||||||
|
|
||||||
|`precision_step` |The precision step (influences the number of terms
|
|
||||||
generated for each number value). Defaults to `16` for `long`, `double`,
|
|
||||||
`8` for `short`, `integer`, `float`, and `2147483647` for `byte`.
|
|
||||||
|
|
||||||
|`boost` |The boost value. Defaults to `1.0`.
|
|
||||||
|
|
||||||
|`null_value` |When there is a (JSON) null value for the field, use the
|
|
||||||
`null_value` as the field value. Defaults to not adding the field at
|
|
||||||
all.
|
|
||||||
|
|
||||||
|`include_in_all` |Should the field be included in the `_all` field (if
|
|
||||||
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
|
|
||||||
defaults to `true` or to the parent `object` type setting.
|
|
||||||
|
|
||||||
|`ignore_malformed` |Ignores a malformed number. Defaults to `false`.
|
|
||||||
|
|
||||||
|`coerce` |Tries to convert strings to numbers and truncates fractions for integers. Defaults to `true`.
|
|
||||||
|
|
||||||
|=======================================================================
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[token_count]]
|
|
||||||
==== Token Count
|
|
||||||
The `token_count` type maps to the JSON string type but indexes and stores
|
|
||||||
the number of tokens in the string rather than the string itself. For
|
|
||||||
example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"name" : {
|
|
||||||
"type" : "string",
|
|
||||||
"fields" : {
|
|
||||||
"word_count": {
|
|
||||||
"type" : "token_count",
|
|
||||||
"store" : "yes",
|
|
||||||
"analyzer" : "standard"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
All the configuration that can be specified for a number can be specified
|
|
||||||
for a token_count. The only extra configuration is the required
|
|
||||||
`analyzer` field which specifies which analyzer to use to break the string
|
|
||||||
into tokens. For best performance, use an analyzer with no token filters.
|
|
||||||
|
|
||||||
[NOTE]
|
|
||||||
===================================================================
|
|
||||||
Technically the `token_count` type sums position increments rather than
|
|
||||||
counting tokens. This means that even if the analyzer filters out stop
|
|
||||||
words they are included in the count.
|
|
||||||
===================================================================
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[date]]
|
|
||||||
==== Date
|
|
||||||
|
|
||||||
The date type is a special type which maps to JSON string type. It
|
|
||||||
follows a specific format that can be explicitly set. All dates are
|
|
||||||
`UTC`. Internally, a date maps to a number type `long`, with the added
|
|
||||||
parsing stage from string to long and from long to string. An example
|
|
||||||
mapping:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"postDate" : {
|
|
||||||
"type" : "date",
|
|
||||||
"format" : "YYYY-MM-dd"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The date type will also accept a long number representing UTC
|
|
||||||
milliseconds since the epoch, regardless of the format it can handle.
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with a
|
|
||||||
date type:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Attribute |Description
|
|
||||||
|`index_name` |The name of the field that will be stored in the index.
|
|
||||||
Defaults to the property/field name.
|
|
||||||
|
|
||||||
|`format` |The <<mapping-date-format,date
|
|
||||||
format>>. Defaults to `epoch_millis||strictDateOptionalTime`.
|
|
||||||
|
|
||||||
|`store` |Set to `true` to store the actual field in the index, `false` to not
|
|
||||||
store it. Defaults to `false` (note, the JSON document itself is stored,
|
|
||||||
and it can be retrieved from it).
|
|
||||||
|
|
||||||
|`index` |Set to `no` if the value should not be indexed. Setting to
|
|
||||||
`no` disables `include_in_all`. If set to `no` the field should be either stored
|
|
||||||
in `_source`, have `include_in_all` enabled, or `store` be set to
|
|
||||||
`true` for this to be useful.
|
|
||||||
|
|
||||||
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
|
|
||||||
Automatically set to `true` when the fielddata format is `doc_values`.
|
|
||||||
|
|
||||||
|`precision_step` |The precision step (influences the number of terms
|
|
||||||
generated for each number value). Defaults to `16`.
|
|
||||||
|
|
||||||
|`boost` |The boost value. Defaults to `1.0`.
|
|
||||||
|
|
||||||
|`null_value` |When there is a (JSON) null value for the field, use the
|
|
||||||
`null_value` as the field value. Defaults to not adding the field at
|
|
||||||
all.
|
|
||||||
|
|
||||||
|`include_in_all` |Should the field be included in the `_all` field (if
|
|
||||||
enabled). If `index` is set to `no` this defaults to `false`, otherwise,
|
|
||||||
defaults to `true` or to the parent `object` type setting.
|
|
||||||
|
|
||||||
|`ignore_malformed` |Ignores a malformed date value. Defaults to `false`.
|
|
||||||
|
|
||||||
|=======================================================================
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[boolean]]
|
|
||||||
==== Boolean
|
|
||||||
|
|
||||||
The boolean type maps to the JSON boolean type. It ends up storing
|
|
||||||
within the index either `T` or `F`, with automatic translation to `true`
|
|
||||||
and `false` respectively.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"hes_my_special_tweet" : {
|
|
||||||
"type" : "boolean"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The boolean type also supports passing the value as a number or a string
|
|
||||||
(in this case `0`, an empty string, `false`, `off` and `no` are
|
|
||||||
`false`, all other values are `true`).
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with the
|
|
||||||
boolean type:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Attribute |Description
|
|
||||||
|`index_name` |The name of the field that will be stored in the index.
|
|
||||||
Defaults to the property/field name.
|
|
||||||
|
|
||||||
|`store` |Set to `true` to store the actual field in the index, `false` to not
|
|
||||||
store it. Defaults to `false` (note, the JSON document itself is stored,
|
|
||||||
and it can be retrieved from it).
|
|
||||||
|
|
||||||
|`index` |Set to `no` if the value should not be indexed. Setting to
|
|
||||||
`no` disables `include_in_all`. If set to `no` the field should be either stored
|
|
||||||
in `_source`, have `include_in_all` enabled, or `store` be set to
|
|
||||||
`true` for this to be useful.
|
|
||||||
|
|
||||||
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
|
|
||||||
Automatically set to `true` when the fielddata format is `doc_values`.
|
|
||||||
|
|
||||||
|`boost` |The boost value. Defaults to `1.0`.
|
|
||||||
|
|
||||||
|`null_value` |When there is a (JSON) null value for the field, use the
|
|
||||||
`null_value` as the field value. Defaults to not adding the field at
|
|
||||||
all.
|
|
||||||
|=======================================================================
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[binary]]
|
|
||||||
==== Binary
|
|
||||||
|
|
||||||
The binary type is a base64 representation of binary data that can be
|
|
||||||
stored in the index. The field is not stored by default and not indexed at
|
|
||||||
all.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"image" : {
|
|
||||||
"type" : "binary"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with the
|
|
||||||
binary type:
|
|
||||||
|
|
||||||
[horizontal]
|
|
||||||
|
|
||||||
`index_name`::
|
|
||||||
|
|
||||||
The name of the field that will be stored in the index. Defaults to the
|
|
||||||
property/field name.
|
|
||||||
|
|
||||||
`store`::
|
|
||||||
|
|
||||||
Set to `true` to store the actual field in the index, `false` to not store it.
|
|
||||||
Defaults to `false` (note, the JSON document itself is already stored, so
|
|
||||||
the binary field can be retrieved from there).
|
|
||||||
|
|
||||||
`doc_values`::
|
|
||||||
|
|
||||||
Set to `true` to store field values in a column-stride fashion.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[fielddata-filters]]
|
|
||||||
==== Fielddata filters
|
|
||||||
|
|
||||||
It is possible to control which field values are loaded into memory,
|
|
||||||
which is particularly useful for aggregations on string fields, using
|
|
||||||
fielddata filters, which are explained in detail in the
|
|
||||||
<<modules-fielddata,Fielddata>> section.
|
|
||||||
|
|
||||||
Fielddata filters can exclude terms which do not match a regex, or which
|
|
||||||
don't fall between a `min` and `max` frequency range:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
tweet: {
|
|
||||||
type: "string",
|
|
||||||
analyzer: "whitespace",
|
|
||||||
fielddata: {
|
|
||||||
filter: {
|
|
||||||
regex: {
|
|
||||||
"pattern": "^#.*"
|
|
||||||
},
|
|
||||||
frequency: {
|
|
||||||
min: 0.001,
|
|
||||||
max: 0.1,
|
|
||||||
min_segment_size: 500
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
These filters can be updated on an existing field mapping and will take
|
|
||||||
effect the next time the fielddata for a segment is loaded. Use the
|
|
||||||
<<indices-clearcache,Clear Cache>> API
|
|
||||||
to reload the fielddata using the new filters.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Similarity
|
|
||||||
|
|
||||||
Elasticsearch allows you to configure a similarity (scoring algorithm) per field.
|
|
||||||
The `similarity` setting provides a simple way of choosing a similarity algorithm
|
|
||||||
other than the default TF/IDF, such as `BM25`.
|
|
||||||
|
|
||||||
You can configure similarities via the
|
|
||||||
<<index-modules-similarity,similarity module>>
|
|
||||||
|
|
||||||
[float]
|
|
||||||
===== Configuring Similarity per Field
|
|
||||||
|
|
||||||
Defining the Similarity for a field is done via the `similarity` mapping
|
|
||||||
property, as this example shows:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"book":{
|
|
||||||
"properties":{
|
|
||||||
"title":{
|
|
||||||
"type":"string", "similarity":"BM25"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The following Similarities are configured out-of-box:
|
|
||||||
|
|
||||||
`default`::
|
|
||||||
The Default TF/IDF algorithm used by Elasticsearch and
|
|
||||||
Lucene in previous versions.
|
|
||||||
|
|
||||||
`BM25`::
|
|
||||||
The BM25 algorithm.
|
|
||||||
http://en.wikipedia.org/wiki/Okapi_BM25[See Okapi_BM25] for more
|
|
||||||
details.
|
|
||||||
|
|
||||||
|
|
||||||
[[copy-to]]
|
|
||||||
[float]
|
|
||||||
===== Copy to field
|
|
||||||
|
|
||||||
Adding `copy_to` parameter to any field mapping will cause all values of this field to be copied to fields specified in
|
|
||||||
the parameter. In the following example all values from fields `title` and `abstract` will be copied to the field
|
|
||||||
`meta_data`. The field which is being copied to will be indexed (i.e. searchable, and available through `fielddata_field`) but the original source will not be modified.
|
|
||||||
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"book" : {
|
|
||||||
"properties" : {
|
|
||||||
"title" : { "type" : "string", "copy_to" : "meta_data" },
|
|
||||||
"abstract" : { "type" : "string", "copy_to" : "meta_data" },
|
|
||||||
"meta_data" : { "type" : "string" }
|
|
||||||
}
|
|
||||||
}
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Multiple fields are also supported:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"book" : {
|
|
||||||
"properties" : {
|
|
||||||
"title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
|
|
||||||
}
|
|
||||||
}
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
[[multi-fields]]
|
|
||||||
===== Multi fields
|
|
||||||
|
|
||||||
The `fields` option allows several core type fields to be mapped onto a single
|
|
||||||
JSON source field. This can be useful if a single field needs to be
|
|
||||||
used in different ways. For example a single field is to be used for both
|
|
||||||
free text search and sorting.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"name" : {
|
|
||||||
"type" : "string",
|
|
||||||
"index" : "analyzed",
|
|
||||||
"fields" : {
|
|
||||||
"raw" : {"type" : "string", "index" : "not_analyzed"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In the above example the field `name` gets processed twice. The first time it gets
|
|
||||||
processed as an analyzed string and this version is accessible under the field name
|
|
||||||
`name`, this is the main field and is in fact just like any other field. The second time
|
|
||||||
it gets processed as a not analyzed string and is accessible under the name `name.raw`.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Include in All
|
|
||||||
|
|
||||||
The `include_in_all` setting is ignored on any field that is defined in
|
|
||||||
the `fields` options. Setting the `include_in_all` only makes sense on
|
|
||||||
the main field, since the raw field value is copied to the `_all` field,
|
|
||||||
the tokens aren't copied.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Updating a field
|
|
||||||
|
|
||||||
In essence a field cannot be updated. However multi fields can be
|
|
||||||
added to existing fields. This makes it possible, for example, to have a different
|
|
||||||
`analyzer` configuration in addition to the already configured
|
|
||||||
`analyzer` configuration specified in the main and other multi fields.
|
|
||||||
|
|
||||||
Also, the new multi field will only be applied to documents that have been
|
|
||||||
added after the multi field has been added and in fact the new multi field
|
|
||||||
doesn't exist in existing documents.
|
|
||||||
|
|
||||||
Another important note is that new multi fields will be merged into the
|
|
||||||
list of existing multi fields, so when adding new multi fields for a field
|
|
||||||
previously added multi fields don't need to be specified.
|
|
|
@ -0,0 +1,138 @@
|
||||||
|
[[date]]
|
||||||
|
=== Date datatype
|
||||||
|
|
||||||
|
JSON doesn't have a date datatype, so dates in Elasticsearch can either be:
|
||||||
|
|
||||||
|
* strings containing formatted dates, e.g. `"2015-01-01"` or `"2015/01/01 12:10:30"`.
|
||||||
|
* a long number representing _milliseconds-since-the-epoch_.
|
||||||
|
* an integer representing _seconds-since-the-epoch_.
|
||||||
|
|
||||||
|
Internally, dates are converted to UTC (if the time-zone is specified) and
|
||||||
|
stored as a long number representing milliseconds-since-the-epoch.
|
||||||
|
|
||||||
|
Date formats can be customised, but if no `format` is specified then it uses
|
||||||
|
the default: `strictDateOptionalTime||epoch_millis`. This means that it will
|
||||||
|
accept dates with optional timestamps, which conform to the formats supported
|
||||||
|
by <<strict-date-time,`strictDateOptionalTime`>> or milliseconds-since-the-epoch.
|
||||||
|
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"date": {
|
||||||
|
"type": "date" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{ "date": "2015-01-01" } <2>
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{ "date": "2015-01-01T12:10:30Z" } <3>
|
||||||
|
|
||||||
|
PUT my_index/my_type/3
|
||||||
|
{ "date": 1420070400001 } <4>
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"sort": { "date": "asc"} <5>
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `date` field uses the default `format`.
|
||||||
|
<2> This document uses a plain date.
|
||||||
|
<3> This document includes a time.
|
||||||
|
<4> This document uses milliseconds-since-the-epoch.
|
||||||
|
<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch.
|
||||||
|
|
||||||
|
[[multiple-date-formats]]
|
||||||
|
==== Multiple date formats
|
||||||
|
|
||||||
|
Multiple formats can be specified by separating them with `||`.
|
||||||
|
Each format will be tried in turn until a matching format is found. The first
|
||||||
|
format will be used to convert the _milliseconds-since-the-epoch_ value back
|
||||||
|
into a string.
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"date": {
|
||||||
|
"type": "date",
|
||||||
|
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
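
The "try each format in turn" behaviour can be mimicked in a few lines of Python. This is a rough sketch under stated assumptions: it uses Python `strptime` patterns in place of the Joda-style patterns that Elasticsearch accepts, it assumes UTC for values without a time zone, and the names `FORMATS`, `parse_with_formats` and `format_millis` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Python equivalents of "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis".
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "epoch_millis"]


def parse_with_formats(value, formats):
    """Try each format in order and return milliseconds-since-the-epoch."""
    for fmt in formats:
        try:
            if fmt == "epoch_millis":
                return int(value)
            parsed = datetime.strptime(str(value), fmt)
        except (TypeError, ValueError):
            continue  # this format did not match, try the next one
        # Assumption of this sketch: parsed values are in UTC.
        return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)
    raise ValueError("no configured format matches %r" % (value,))


def format_millis(millis, formats):
    """Convert milliseconds back into a string using the first format."""
    first = formats[0]
    if first == "epoch_millis":
        return str(millis)
    return (EPOCH + timedelta(milliseconds=millis)).strftime(first)


print(parse_with_formats("2015-01-01", FORMATS))
print(format_millis(1420070400000, FORMATS))
```

Because only the first format is used for output, the order of the list matters even when every input would be parsed correctly in any order.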
|
||||||
|
|
||||||
|
[[date-params]]
|
||||||
|
==== Parameters for `date` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `date` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<mapping-date-format,`format`>>::
|
||||||
|
|
||||||
|
The date format(s) that can be parsed. Defaults to
|
||||||
|
`strictDateOptionalTime||epoch_millis`.
|
||||||
|
|
||||||
|
<<ignore-malformed,`ignore_malformed`>>::
|
||||||
|
|
||||||
|
If `true`, malformed dates are ignored. If `false` (default), malformed
dates throw an exception and reject the whole document.
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Whether or not the field value should be included in the
|
||||||
|
<<mapping-all-field,`_all`>> field? Accepts `true` or `false`. Defaults
|
||||||
|
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
|
||||||
|
<<object,`object`>> field sets `include_in_all` to `false`.
|
||||||
|
Otherwise defaults to `true`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts a date value in one of the configured `format`s, which is
substituted for any explicit `null` values. Defaults to `null`,
which means the field is treated as missing.
|
||||||
|
|
||||||
|
<<precision-step,`precision_step`>>::
|
||||||
|
|
||||||
|
Controls the number of extra terms that are indexed to make
|
||||||
|
<<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
|
||||||
|
|
||||||
|
|
|
@ -1,215 +0,0 @@
|
||||||
[[mapping-geo-point-type]]
|
|
||||||
=== Geo Point Type
|
|
||||||
|
|
||||||
Mapper type called `geo_point` to support geo based points. The
|
|
||||||
declaration looks as follows:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"properties" : {
|
|
||||||
"location" : {
|
|
||||||
"type" : "geo_point"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Indexed Fields
|
|
||||||
|
|
||||||
The `geo_point` mapping will index a single field with the format of
|
|
||||||
`lat,lon`. The `lat_lon` option can be set to also index the `.lat` and
|
|
||||||
`.lon` as numeric fields, and `geohash` can be set to `true` to also
|
|
||||||
index `.geohash` value.
|
|
||||||
|
|
||||||
A good practice is to enable indexing `lat_lon` as well, since both the
|
|
||||||
geo distance and bounding box filters can either be executed using in
|
|
||||||
memory checks, or using the indexed lat lon values, and it really
|
|
||||||
depends on the data set which one performs better. Note though, that
|
|
||||||
indexed lat lon only make sense when there is a single geo point value
|
|
||||||
for the field, and not multi values.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Geohashes
|
|
||||||
|
|
||||||
Geohashes are a form of lat/lon encoding which divides the earth up into
|
|
||||||
a grid. Each cell in this grid is represented by a geohash string. Each
|
|
||||||
cell in turn can be further subdivided into smaller cells which are
|
|
||||||
represented by a longer string. So the longer the geohash, the smaller
|
|
||||||
(and thus more accurate) the cell is.
|
|
||||||
|
|
||||||
Because geohashes are just strings, they can be stored in an inverted
|
|
||||||
index like any other string, which makes querying them very efficient.
|
|
||||||
|
|
||||||
If you enable the `geohash` option, a `geohash` ``sub-field'' will be
|
|
||||||
indexed as, eg `pin.geohash`. The length of the geohash is controlled by
|
|
||||||
the `geohash_precision` parameter, which can either be set to an absolute
|
|
||||||
length (eg `12`, the default) or to a distance (eg `1km`).
|
|
||||||
|
|
||||||
More usefully, set the `geohash_prefix` option to `true` to not only index
|
|
||||||
the geohash value, but all the enclosing cells as well. For instance, a
|
|
||||||
geohash of `u30` will be indexed as `[u,u3,u30]`. This option can be used
|
|
||||||
by the <<query-dsl-geohash-cell-query>> to find geopoints within a
|
|
||||||
particular cell very efficiently.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Input Structure
|
|
||||||
|
|
||||||
The above mapping defines a `geo_point`, which accepts different
|
|
||||||
formats. The following formats are supported:
|
|
||||||
|
|
||||||
[float]
|
|
||||||
===== Lat Lon as Properties
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"location" : {
|
|
||||||
"lat" : 41.12,
|
|
||||||
"lon" : -71.34
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
===== Lat Lon as String
|
|
||||||
|
|
||||||
Format in `lat,lon`.
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"location" : "41.12,-71.34"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
===== Geohash
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"location" : "drm3btev3e86"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
===== Lat Lon as Array
|
|
||||||
|
|
||||||
Format in `[lon, lat]`, note, the order of lon/lat here in order to
|
|
||||||
conform with http://geojson.org/[GeoJSON].
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"location" : [-71.34, 41.12]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Mapping Options
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Option |Description
|
|
||||||
|`lat_lon` |Set to `true` to also index the `.lat` and `.lon` as fields.
|
|
||||||
Defaults to `false`.
|
|
||||||
|
|
||||||
|`geohash` |Set to `true` to also index the `.geohash` as a field.
|
|
||||||
Defaults to `false`.
|
|
||||||
|
|
||||||
|`geohash_precision` |Sets the geohash precision. It can be set to an
|
|
||||||
absolute geohash length or a distance value (eg 1km, 1m, 1ml) defining
|
|
||||||
the size of the smallest cell. Defaults to an absolute length of 12.
|
|
||||||
|
|
||||||
|`geohash_prefix` |If this option is set to `true`, not only the geohash
|
|
||||||
but also all its parent cells (true prefixes) will be indexed as well. The
|
|
||||||
number of terms that will be indexed depends on the `geohash_precision`.
|
|
||||||
Defaults to `false`. *Note*: This option implicitly enables `geohash`.
|
|
||||||
|
|
||||||
|`validate` |Set to `false` to accept geo points with invalid latitude or
|
|
||||||
longitude (default is `true`). *Note*: Validation only works when
|
|
||||||
normalization has been disabled. This option will be deprecated and removed
|
|
||||||
in upcoming releases.
|
|
||||||
|
|
||||||
|`validate_lat` |Set to `false` to accept geo points with an invalid
|
|
||||||
latitude (default is `true`). This option will be deprecated and removed
|
|
||||||
in upcoming releases.
|
|
||||||
|
|
||||||
|`validate_lon` |Set to `false` to accept geo points with an invalid
|
|
||||||
longitude (default is `true`). This option will be deprecated and removed
|
|
||||||
in upcoming releases.
|
|
||||||
|
|
||||||
|`normalize` |Set to `true` to normalize latitude and longitude (default
|
|
||||||
is `true`).
|
|
||||||
|
|
||||||
|`normalize_lat` |Set to `true` to normalize latitude.
|
|
||||||
|
|
||||||
|`normalize_lon` |Set to `true` to normalize longitude.
|
|
||||||
|
|
||||||
|`precision_step` |The precision step (influences the number of terms
|
|
||||||
generated for each number value) for `.lat` and `.lon` fields
|
|
||||||
if `lat_lon` is set to `true`.
|
|
||||||
Defaults to `16`.
|
|
||||||
|=======================================================================
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Field data
|
|
||||||
|
|
||||||
By default, geo points use the `array` format which loads geo points into two
|
|
||||||
parallel double arrays, making sure there is no precision loss. However, this
|
|
||||||
can require a non-negligible amount of memory (16 bytes per document) which is
|
|
||||||
why Elasticsearch also provides a field data implementation with lossy
|
|
||||||
compression called `compressed`:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"pin" : {
|
|
||||||
"properties" : {
|
|
||||||
"location" : {
|
|
||||||
"type" : "geo_point",
|
|
||||||
"fielddata" : {
|
|
||||||
"format" : "compressed",
|
|
||||||
"precision" : "1cm"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
This field data format comes with a `precision` option which allows to
|
|
||||||
configure how much precision can be traded for memory. The default value is
|
|
||||||
`1cm`. The following table presents values of the memory savings given various
|
|
||||||
precisions:
|
|
||||||
|
|
||||||
|=============================================
|
|
||||||
| Precision | Bytes per point | Size reduction
|
|
||||||
| 1km | 4 | 75%
|
|
||||||
| 3m | 6 | 62.5%
|
|
||||||
| 1cm | 8 | 50%
|
|
||||||
| 1mm | 10 | 37.5%
|
|
||||||
|=============================================
|
|
||||||
|
|
||||||
Precision can be changed on a live index by using the update mapping API.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Usage in Scripts
|
|
||||||
|
|
||||||
When using `doc[geo_field_name]` (in the above mapping,
|
|
||||||
`doc['location']`), the `doc[...].value` returns a `GeoPoint`, which
|
|
||||||
then allows access to `lat` and `lon` (for example,
|
|
||||||
`doc[...].value.lat`). For performance, it is better to access the `lat`
|
|
||||||
and `lon` directly using `doc[...].lat` and `doc[...].lon`.
|
|
|
@ -0,0 +1,167 @@
|
||||||
|
[[geo-point]]
|
||||||
|
=== Geo-point datatype
|
||||||
|
|
||||||
|
Fields of type `geo_point` accept latitude-longitude pairs, which can be used:
|
||||||
|
|
||||||
|
* to find geo-points within a <<query-dsl-geo-bounding-box-query,bounding box>>,
|
||||||
|
within a certain <<query-dsl-geo-distance-query,distance>> of a central point,
|
||||||
|
within a <<query-dsl-geo-polygon-query,polygon>>, or within a
|
||||||
|
<<query-dsl-geohash-cell-query,geohash>> cell.
|
||||||
|
* to aggregate documents <<search-aggregations-bucket-geohashgrid-aggregation,geographically>>
|
||||||
|
or by <<search-aggregations-bucket-geodistance-aggregation,distance>> from a central point.
|
||||||
|
* to integrate distance into a document's <<query-dsl-function-score-query,relevance score>>.
|
||||||
|
* to <<geo-sorting,sort>> documents by distance.
|
||||||
|
|
||||||
|
There are four ways that a geo-point may be specified, as demonstrated below:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"location": {
|
||||||
|
"type": "geo_point"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"text": "Geo-point as an object",
|
||||||
|
"location": { <1>
|
||||||
|
"lat": 41.12,
|
||||||
|
"lon": -71.34
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{
|
||||||
|
"text": "Geo-point as a string",
|
||||||
|
"location": "41.12,-71.34" <2>
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/3
|
||||||
|
{
|
||||||
|
"text": "Geo-point as a geohash",
|
||||||
|
"location": "drm3btev3e86" <3>
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/4
|
||||||
|
{
|
||||||
|
"text": "Geo-point as an array",
|
||||||
|
"location": [ -71.34, 41.12 ] <4>
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"geo_bounding_box": { <5>
|
||||||
|
"location": {
|
||||||
|
"top_left": {
|
||||||
|
"lat": 42,
|
||||||
|
"lon": -72
|
||||||
|
},
|
||||||
|
"bottom_right": {
|
||||||
|
"lat": 40,
|
||||||
|
"lon": -74
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> Geo-point expressed as an object, with `lat` and `lon` keys.
|
||||||
|
<2> Geo-point expressed as a string with the format: `"lat,lon"`.
|
||||||
|
<3> Geo-point expressed as a geohash.
|
||||||
|
<4> Geo-point expressed as an array with the format: `[lon, lat]`.
|
||||||
|
<5> A geo-bounding box query which finds all geo-points that fall inside the box.
|
||||||
|
|
||||||
|
[IMPORTANT]
|
||||||
|
.Geo-points expressed as an array or string
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
Please note that string geo-points are ordered as `lat,lon`, while array
|
||||||
|
geo-points are ordered as the reverse: `lon,lat`.
|
||||||
|
|
||||||
|
Originally, `lat,lon` was used for both array and string, but the array
|
||||||
|
format was changed early on to conform to the format used by GeoJSON.
|
||||||
|
|
||||||
|
==================================================
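
Client code that accepts several of these formats has to be careful about the ordering difference. The Python sketch below is a hypothetical helper (`to_lat_lon` is not part of any Elasticsearch client) that normalises the object, string and array formats into a `(lat, lon)` tuple. Geohash strings are deliberately left undecoded to keep the example short.

```python
def to_lat_lon(value):
    """Normalise an object, string or array geo-point into (lat, lon)."""
    if isinstance(value, dict):
        # Object format: explicit keys, so no ordering ambiguity.
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        if "," not in value:
            raise ValueError("geohash strings are not decoded in this sketch")
        # String format is ordered "lat,lon".
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, (list, tuple)) and len(value) == 2:
        # Array format is ordered [lon, lat], following GeoJSON.
        lon, lat = value
        return float(lat), float(lon)
    raise ValueError("unsupported geo-point format: %r" % (value,))


print(to_lat_lon({"lat": 41.12, "lon": -71.34}))
print(to_lat_lon("41.12,-71.34"))
print(to_lat_lon([-71.34, 41.12]))
```

All three calls describe the same point and return the same tuple.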
|
||||||
|
|
||||||
|
|
||||||
|
[[geo-point-params]]
|
||||||
|
==== Parameters for `geo_point` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `geo_point` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<coerce,`coerce`>>::
|
||||||
|
|
||||||
|
Normalize longitude and latitude values to a standard -180:180 / -90:90
|
||||||
|
coordinate system. Accepts `true` and `false` (default).
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<geohash,`geohash`>>::
|
||||||
|
|
||||||
|
Should the geo-point also be indexed as a geohash in the `.geohash`
|
||||||
|
sub-field? Defaults to `false`, unless `geohash_prefix` is `true`.
|
||||||
|
|
||||||
|
<<geohash-precision,`geohash_precision`>>::
|
||||||
|
|
||||||
|
The maximum length of the geohash to use for the `geohash` and
|
||||||
|
`geohash_prefix` options.
|
||||||
|
|
||||||
|
<<geohash-prefix,`geohash_prefix`>>::
|
||||||
|
|
||||||
|
Should the geo-point also be indexed as a geohash plus all its prefixes?
|
||||||
|
Defaults to `false`.
|
||||||
|
|
||||||
|
<<ignore-malformed,`ignore_malformed`>>::
|
||||||
|
|
||||||
|
If `true`, malformed geo-points are ignored. If `false` (default),
|
||||||
|
malformed geo-points throw an exception and reject the whole document.
|
||||||
|
|
||||||
|
<<lat-lon,`lat_lon`>>::
|
||||||
|
|
||||||
|
Should the geo-point also be indexed as `.lat` and `.lon` sub-fields?
|
||||||
|
Accepts `true` and `false` (default).
|
||||||
|
|
||||||
|
<<precision-step,`precision_step`>>::
|
||||||
|
|
||||||
|
Controls the number of extra terms that are indexed for each lat/lon point.
|
||||||
|
Defaults to `16`. Ignored if `lat_lon` is `false`.
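
To illustrate what "a geohash plus all its prefixes" means for the `geohash_prefix` parameter: a geohash of `u30` would be indexed as the terms `u`, `u3` and `u30`. The Python sketch below only shows how such a term list is derived from a geohash string. The function name is hypothetical and this is not the Elasticsearch implementation.

```python
def geohash_prefixes(geohash):
    """Return the geohash together with all of its enclosing cells."""
    # Each shorter prefix names a larger cell that contains the point.
    return [geohash[:length] for length in range(1, len(geohash) + 1)]


print(geohash_prefixes("u30"))
```

The number of terms grows with the geohash length, which is why `geohash_precision` affects the index size when prefixes are enabled.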
|
||||||
|
|
||||||
|
|
||||||
|
==== Using geo-points in scripts
|
||||||
|
|
||||||
|
When accessing the value of a geo-point in a script, the value is returned as
|
||||||
|
a `GeoPoint` object, which allows access to the `.lat` and `.lon` values
|
||||||
|
respectively:
|
||||||
|
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
geopoint = doc['location'].value;
|
||||||
|
lat = geopoint.lat;
|
||||||
|
lon = geopoint.lon;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
For performance reasons, it is better to access the lat/lon values directly:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
lat = doc['location'].lat;
|
||||||
|
lon = doc['location'].lon;
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
[[mapping-geo-shape-type]]
|
[[geo-shape]]
|
||||||
=== Geo Shape Type
|
=== Geo-Shape datatype
|
||||||
|
|
||||||
The `geo_shape` mapping type facilitates the indexing of and searching
|
The `geo_shape` datatype facilitates the indexing of and searching
|
||||||
with arbitrary geo shapes such as rectangles and polygons. It should be
|
with arbitrary geo shapes such as rectangles and polygons. It should be
|
||||||
used when either the data being indexed or the queries being executed
|
used when either the data being indexed or the queries being executed
|
||||||
contain shapes other than just points.
|
contain shapes other than just points.
|
|
@ -1,40 +0,0 @@
|
||||||
[[mapping-ip-type]]
|
|
||||||
=== IP Type
|
|
||||||
|
|
||||||
An `ip` mapping type allows to store _ipv4_ addresses in a numeric form
|
|
||||||
allowing to easily sort, and range query it (using ip values).
|
|
||||||
|
|
||||||
The following table lists all the attributes that can be used with an ip
|
|
||||||
type:
|
|
||||||
|
|
||||||
[cols="<,<",options="header",]
|
|
||||||
|=======================================================================
|
|
||||||
|Attribute |Description
|
|
||||||
|`index_name` |The name of the field that will be stored in the index.
|
|
||||||
Defaults to the property/field name.
|
|
||||||
|
|
||||||
|`store` |Set to `true` to store actual field in the index, `false` to not
|
|
||||||
store it. Defaults to `false` (note, the JSON document itself is stored,
|
|
||||||
and it can be retrieved from it).
|
|
||||||
|
|
||||||
|`index` |Set to `no` if the value should not be indexed. In this case,
|
|
||||||
`store` should be set to `true`, since if it's not indexed and not
|
|
||||||
stored, there is nothing to do with it.
|
|
||||||
|
|
||||||
|`precision_step` |The precision step (influences the number of terms
|
|
||||||
generated for each number value). Defaults to `16`.
|
|
||||||
|
|
||||||
|`boost` |The boost value. Defaults to `1.0`.
|
|
||||||
|
|
||||||
|`null_value` |When there is a (JSON) null value for the field, use the
|
|
||||||
`null_value` as the field value. Defaults to not adding the field at
|
|
||||||
all.
|
|
||||||
|
|
||||||
|`include_in_all` |Should the field be included in the `_all` field (if
|
|
||||||
enabled). Defaults to `true` or to the parent `object` type setting.
|
|
||||||
|
|
||||||
|`doc_values` |Set to `true` to store field values in a column-stride fashion.
|
|
||||||
Automatically set to `true` when the <<fielddata-formats,`fielddata` format>> is `doc_values`.
|
|
||||||
|
|
||||||
|=======================================================================
|
|
||||||
|
|
|
@ -0,0 +1,89 @@
|
||||||
|
[[ip]]
|
||||||
|
=== IPv4 datatype
|
||||||
|
|
||||||
|
An `ip` field is really a <<number,`long`>> field which accepts
|
||||||
|
https://en.wikipedia.org/wiki/IPv4[IPv4] addresses and indexes them as long
|
||||||
|
values:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"ip_addr": {
|
||||||
|
"type": "ip"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"ip_addr": "192.168.1.1"
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"range": {
|
||||||
|
"ip_addr": {
|
||||||
|
"gte": "192.168.1.0",
|
||||||
|
"lt": "192.168.2.0"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
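
To see why a `range` query works on IP addresses, it can help to compute the long values by hand. The Python sketch below uses the standard `ipaddress` module. The helper names are hypothetical, and the code only mirrors the idea of indexing an IPv4 address as a number, not Elasticsearch internals.

```python
import ipaddress


def ipv4_to_long(address):
    """Return the numeric value of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(address))


def in_range(address, gte, lt):
    """Numeric equivalent of a range query with `gte` and `lt` bounds."""
    return ipv4_to_long(gte) <= ipv4_to_long(address) < ipv4_to_long(lt)


print(ipv4_to_long("192.168.1.1"))
print(in_range("192.168.1.1", "192.168.1.0", "192.168.2.0"))
```

Because the addresses are compared as numbers, every address in `192.168.1.*` falls between the two bounds used in the query above.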
|
||||||
|
|
||||||
|
|
||||||
|
[[ip-params]]
|
||||||
|
==== Parameters for `ip` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `ip` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Whether or not the field value should be included in the
|
||||||
|
<<mapping-all-field,`_all`>> field? Accepts `true` or `false`. Defaults
|
||||||
|
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
|
||||||
|
<<object,`object`>> field sets `include_in_all` to `false`.
|
||||||
|
Otherwise defaults to `true`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts an IPv4 value which is substituted for any explicit `null` values.
|
||||||
|
Defaults to `null`, which means the field is treated as missing.
|
||||||
|
|
||||||
|
<<precision-step,`precision_step`>>::
|
||||||
|
|
||||||
|
Controls the number of extra terms that are indexed to make
|
||||||
|
<<query-dsl-range-query,`range` queries>> faster. Defaults to `16`.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
|
||||||
|
|
||||||
|
|
||||||
|
NOTE: IPv6 addresses are not supported yet.
|
|
@ -1,165 +0,0 @@
|
||||||
[[mapping-nested-type]]
|
|
||||||
=== Nested Type
|
|
||||||
|
|
||||||
The `nested` type works like the <<mapping-object-type,`object` type>> except
|
|
||||||
that an array of `objects` is flattened, while an array of `nested` objects
|
|
||||||
allows each object to be queried independently. To explain, consider this
|
|
||||||
document:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"group" : "fans",
|
|
||||||
"user" : [
|
|
||||||
{
|
|
||||||
"first" : "John",
|
|
||||||
"last" : "Smith"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"first" : "Alice",
|
|
||||||
"last" : "White"
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
If the `user` field is of type `object`, this document would be indexed
|
|
||||||
internally something like this:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"group" : "fans",
|
|
||||||
"user.first" : [ "alice", "john" ],
|
|
||||||
"user.last" : [ "smith", "white" ]
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The `first` and `last` fields are flattened, and the association between
|
|
||||||
`alice` and `white` is lost. This document would incorrectly match a query
|
|
||||||
for `alice AND smith`.
|
|
||||||
|
|
||||||
If the `user` field is of type `nested`, each object is indexed as a separate
|
|
||||||
document, something like this:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{ <1>
|
|
||||||
"user.first" : "alice",
|
|
||||||
"user.last" : "white"
|
|
||||||
}
|
|
||||||
{ <1>
|
|
||||||
"user.first" : "john",
|
|
||||||
"user.last" : "smith"
|
|
||||||
}
|
|
||||||
{ <2>
|
|
||||||
"group" : "fans"
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
<1> Hidden nested documents.
|
|
||||||
<2> Visible ``parent'' document.
|
|
||||||
|
|
||||||
By keeping each nested object separate, the association between the
|
|
||||||
`user.first` and `user.last` fields is maintained. The query for `alice AND
|
|
||||||
smith` would *not* match this document.
|
|
||||||
|
|
||||||
Searching on nested docs can be done using the
|
|
||||||
<<query-dsl-nested-query,nested query>>.
|
|
||||||
|
|
||||||
==== Mapping
|
|
||||||
|
|
||||||
The mapping for `nested` fields is the same as `object` fields, except that it
|
|
||||||
uses type `nested`:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"type1" : {
|
|
||||||
"properties" : {
|
|
||||||
"user" : {
|
|
||||||
"type" : "nested",
|
|
||||||
"properties": {
|
|
||||||
"first" : {"type": "string" },
|
|
||||||
"last" : {"type": "string" }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
NOTE: changing an `object` type to `nested` type requires reindexing.
|
|
||||||
|
|
||||||
You may want to index inner objects both as `nested` fields *and* as flattened
|
|
||||||
`object` fields, eg for highlighting. This can be achieved by setting
|
|
||||||
`include_in_parent` to `true`:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"type1" : {
|
|
||||||
"properties" : {
|
|
||||||
"user" : {
|
|
||||||
"type" : "nested",
|
|
||||||
"include_in_parent": true,
|
|
||||||
"properties": {
|
|
||||||
"first" : {"type": "string" },
|
|
||||||
"last" : {"type": "string" }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The result of indexing our example document would be something like this:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{ <1>
|
|
||||||
"user.first" : "alice",
|
|
||||||
"user.last" : "white"
|
|
||||||
}
|
|
||||||
{ <1>
|
|
||||||
"user.first" : "john",
|
|
||||||
"user.last" : "smith"
|
|
||||||
}
|
|
||||||
{ <2>
|
|
||||||
"group" : "fans",
|
|
||||||
"user.first" : [ "alice", "john" ],
|
|
||||||
"user.last" : [ "smith", "white" ]
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
<1> Hidden nested documents.
|
|
||||||
<2> Visible ``parent'' document.
|
|
||||||
|
|
||||||
|
|
||||||
Nested fields may contain other nested fields. The `include_in_parent` object
|
|
||||||
refers to the direct parent of the field, while the `include_in_root`
|
|
||||||
parameter refers only to the topmost ``root'' object or document.
|
|
||||||
|
|
||||||
NOTE: The `include_in_parent` and `include_in_root` options do not apply
|
|
||||||
to <<mapping-geo-shape-type,`geo_shape` fields>>, which are only ever
|
|
||||||
indexed inside the nested document.
|
|
||||||
|
|
||||||
Nested docs will automatically use the root doc `_all` field only.
|
|
||||||
|
|
||||||
.Internal Implementation
|
|
||||||
*********************************************
|
|
||||||
Internally, nested objects are indexed as additional documents, but,
|
|
||||||
since they can be guaranteed to be indexed within the same "block", it
|
|
||||||
allows for extremely fast joining with parent docs.
|
|
||||||
|
|
||||||
Those internal nested documents are automatically masked away when doing
|
|
||||||
operations against the index (like searching with a match_all query),
|
|
||||||
and they bubble out when using the nested query.
|
|
||||||
|
|
||||||
Because nested docs are always masked to the parent doc, the nested docs
|
|
||||||
can never be accessed outside the scope of the `nested` query. For example
|
|
||||||
stored fields can be enabled on fields inside nested objects, but there is
|
|
||||||
no way of retrieving them, since stored fields are fetched outside of
|
|
||||||
the `nested` query scope.
|
|
||||||
|
|
||||||
The `_source` field is always associated with the parent document and
|
|
||||||
because of that field values via the source can be fetched for nested object.
|
|
||||||
*********************************************
|
|
|
@ -0,0 +1,201 @@
|
||||||
|
[[nested]]
|
||||||
|
=== Nested datatype
|
||||||
|
|
||||||
|
The `nested` type is a specialised version of the <<object,`object`>> datatype
|
||||||
|
that allows arrays of objects to be indexed and queried independently of each
|
||||||
|
other.
|
||||||
|
|
||||||
|
==== How arrays of objects are flattened
|
||||||
|
|
||||||
|
Arrays of inner <<object,`object` fields>> do not work the way you may expect.
|
||||||
|
Lucene has no concept of inner objects, so Elasticsearch flattens object
|
||||||
|
hierarchies into a simple list of field names and values. For instance, the
|
||||||
|
following document:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"group" : "fans",
|
||||||
|
"user" : [ <1>
|
||||||
|
{
|
||||||
|
"first" : "John",
|
||||||
|
"last" : "Smith"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"first" : "Alice",
|
||||||
|
"last" : "White"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `user` field is dynamically added as a field of type `object`.
|
||||||
|
|
||||||
|
would be transformed internally into a document that looks more like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"group" : "fans",
|
||||||
|
"user.first" : [ "alice", "john" ],
|
||||||
|
"user.last" : [ "smith", "white" ]
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
|
||||||
|
The `user.first` and `user.last` fields are flattened into multi-value fields,
|
||||||
|
and the association between `alice` and `white` is lost. This document would
|
||||||
|
incorrectly match a query for `alice AND smith`:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"bool": {
|
||||||
|
"must": [
|
||||||
|
{ "match": { "user.first": "Alice" }},
|
||||||
|
{ "match": { "user.last": "Smith" }}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
==== Using `nested` fields for arrays of objects
|
||||||
|
|
||||||
|
If you need to index arrays of objects and to maintain the independence of
|
||||||
|
each object in the array, you should use the `nested` datatype instead of the
|
||||||
|
<<object,`object`>> datatype. Internally, nested objects index each object in
|
||||||
|
the array as a separate hidden document, meaning that each nested object can be
|
||||||
|
queried independently of the others, with the <<query-dsl-nested-query,`nested` query>>:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"user": {
|
||||||
|
"type": "nested" <1>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{
|
||||||
|
"group" : "fans",
|
||||||
|
"user" : [
|
||||||
|
{
|
||||||
|
"first" : "John",
|
||||||
|
"last" : "Smith"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"first" : "Alice",
|
||||||
|
"last" : "White"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"nested": {
|
||||||
|
"path": "user",
|
||||||
|
"query": {
|
||||||
|
"bool": {
|
||||||
|
"must": [
|
||||||
|
{ "match": { "user.first": "Alice" }},
|
||||||
|
{ "match": { "user.last": "Smith" }} <2>
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"nested": {
|
||||||
|
"path": "user",
|
||||||
|
"query": {
|
||||||
|
"bool": {
|
||||||
|
"must": [
|
||||||
|
{ "match": { "user.first": "Alice" }},
|
||||||
|
{ "match": { "user.last": "White" }} <3>
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"inner_hits": { <4>
|
||||||
|
"highlight": {
|
||||||
|
"fields": {
|
||||||
|
"user.first": {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `user` field is mapped as type `nested` instead of type `object`.
|
||||||
|
<2> This query doesn't match because `Alice` and `Smith` are not in the same nested object.
|
||||||
|
<3> This query matches because `Alice` and `White` are in the same nested object.
|
||||||
|
<4> `inner_hits` allow us to highlight the matching nested documents.
|
||||||
|
|
||||||
|
|
||||||
|
Nested documents can be:
|
||||||
|
|
||||||
|
* queried with the <<query-dsl-nested-query,`nested`>> query.
|
||||||
|
* analyzed with the <<search-aggregations-bucket-nested-aggregation,`nested`>>
|
||||||
|
and <<search-aggregations-bucket-reverse-nested-aggregation, `reverse_nested`>>
|
||||||
|
aggregations.
|
||||||
|
* sorted with <<nested-sorting,nested sorting>>.
|
||||||
|
* retrieved and highlighted with <<nested-inner-hits,nested inner hits>>.
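
As an illustrative sketch of the aggregation case, the request below counts last names across all nested `user` objects. It assumes the `my_index` mapping and document from the example above; the aggregation names `users` and `last_names` are arbitrary choices, not required names. Because `user.last` is dynamically mapped as an `analyzed` string in that example, the buckets would be expected to contain lowercased terms such as `smith` and `white`.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "users": {
      "nested": {
        "path": "user"
      },
      "aggs": {
        "last_names": {
          "terms": {
            "field": "user.last"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE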
|
||||||
|
|
||||||
|
|
||||||
|
[[nested-params]]
|
||||||
|
==== Parameters for `nested` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `nested` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
<<dynamic,`dynamic`>>::
|
||||||
|
|
||||||
|
Whether or not new `properties` should be added dynamically to an existing
|
||||||
|
nested object. Accepts `true` (default), `false` and `strict`.
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Sets the default `include_in_all` value for all the `properties` within
|
||||||
|
the nested object. Nested documents do not have their own `_all` field.
|
||||||
|
Instead, values are added to the `_all` field of the main ``root''
|
||||||
|
document.
|
||||||
|
|
||||||
|
<<properties,`properties`>>::
|
||||||
|
|
||||||
|
The fields within the nested object, which can be of any
|
||||||
|
<<mapping-types,datatype>>, including `nested`. New properties
|
||||||
|
may be added to an existing nested object.
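
A minimal sketch of how these parameters might be combined, assuming a fresh index: the `user` field is mapped as `nested` with explicit `properties`, and `dynamic` is set to `strict` so that a document containing an unmapped field inside `user` is rejected instead of silently extending the mapping. The field names are reused from the example above and are only illustrative.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested",
          "dynamic": "strict",
          "properties": {
            "first": { "type": "string" },
            "last":  { "type": "string" }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE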
|
||||||
|
|
||||||
|
|
||||||
|
[IMPORTANT]
|
||||||
|
=============================================
|
||||||
|
|
||||||
|
Because nested documents are indexed as separate documents, they can only be
|
||||||
|
accessed within the scope of the `nested` query, the
|
||||||
|
`nested`/`reverse_nested` aggregations, or <<nested-inner-hits,nested inner hits>>.
|
||||||
|
|
||||||
|
For instance, if a string field within a nested document has
|
||||||
|
<<index-options,`index_options`>> set to `offsets` to allow use of the postings
|
||||||
|
highlighter, these offsets will not be available during the main highlighting
|
||||||
|
phase. Instead, highlighting needs to be performed via
|
||||||
|
<<nested-inner-hits,nested inner hits>>.
|
||||||
|
|
||||||
|
=============================================
|
||||||
|
|
|
@ -0,0 +1,93 @@
|
||||||
|
[[number]]
|
||||||
|
=== Numeric datatypes
|
||||||
|
|
||||||
|
The following numeric types are supported:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`long`:: A signed 64-bit integer with a minimum value of +-2^63^+ and a maximum value of +2^63^-1+.
|
||||||
|
`integer`:: A signed 32-bit integer with a minimum value of +-2^31^+ and a maximum value of +2^31^-1+.
|
||||||
|
`short`:: A signed 16-bit integer with a minimum value of +-32,768+ and a maximum value of +32,767+.
|
||||||
|
`byte`:: A signed 8-bit integer with a minimum value of +-128+ and a maximum value of +127+.
|
||||||
|
`double`:: A double-precision 64-bit IEEE 754 floating point.
|
||||||
|
`float`:: A single-precision 32-bit IEEE 754 floating point.
|
||||||
|
|
||||||
|
Below is an example of configuring a mapping with numeric fields:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"number_of_bytes": {
|
||||||
|
"type": "integer"
|
||||||
|
},
|
||||||
|
"time_in_seconds": {
|
||||||
|
"type": "float"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
|
||||||
|
[[number-params]]
|
||||||
|
==== Parameters for numeric fields
|
||||||
|
|
||||||
|
The following parameters are accepted by numeric types:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<coerce,`coerce`>>::
|
||||||
|
|
||||||
|
Try to convert strings to numbers and truncate fractions for integers.
|
||||||
|
Accepts `true` (default) and `false`.
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<ignore-malformed,`ignore_malformed`>>::
|
||||||
|
|
||||||
|
If `true`, malformed numbers are ignored. If `false` (default), malformed
|
||||||
|
numbers throw an exception and reject the whole document.
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Whether or not the field value should be included in the
|
||||||
|
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
|
||||||
|
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
|
||||||
|
<<object,`object`>> field sets `include_in_all` to `false`.
|
||||||
|
Otherwise defaults to `true`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts a numeric value of the same `type` as the field which is
|
||||||
|
substituted for any explicit `null` values. Defaults to `null`, which
|
||||||
|
means the field is treated as missing.
|
||||||
|
|
||||||
|
<<precision-step,`precision_step`>>::
|
||||||
|
|
||||||
|
Controls the number of extra terms that are indexed to make
|
||||||
|
<<query-dsl-range-query,`range` queries>> faster. The default depends on the
|
||||||
|
numeric `type`.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
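
The mapping below is a hedged sketch of how a few of these parameters could be set, assuming a fresh index. `number_of_bytes` and `time_in_seconds` are reused from the example above, while `retries` is a hypothetical field added for illustration. With these settings, a string such as `"10"` sent for `number_of_bytes` should be rejected instead of converted, a malformed `time_in_seconds` value should be ignored while the remainder of the document is indexed, and an explicit `null` for `retries` should be indexed as `0`.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {
          "type": "integer",
          "coerce": false
        },
        "time_in_seconds": {
          "type": "float",
          "ignore_malformed": true
        },
        "retries": {
          "type": "short",
          "null_value": 0
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE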
|
||||||
|
|
||||||
|
|
|
@ -1,179 +0,0 @@
|
||||||
[[mapping-object-type]]
|
|
||||||
=== Object Type
|
|
||||||
|
|
||||||
JSON documents are hierarchical in nature, allowing them to define inner
|
|
||||||
"objects" within the actual JSON. Elasticsearch completely understands
|
|
||||||
the nature of these inner objects and can map them easily, providing
|
|
||||||
query support for their inner fields. Because each document can have
|
|
||||||
objects with different fields each time, objects mapped this way are
|
|
||||||
known as "dynamic". Dynamic mapping is enabled by default. Let's take
|
|
||||||
the following JSON as an example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"person" : {
|
|
||||||
"name" : {
|
|
||||||
"first_name" : "Shay",
|
|
||||||
"last_name" : "Banon"
|
|
||||||
},
|
|
||||||
"sid" : "12345"
|
|
||||||
},
|
|
||||||
"message" : "This is a tweet!"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The above shows an example where a tweet includes the actual `person`
|
|
||||||
details. A `person` is an object, with a `sid`, and a `name` object
|
|
||||||
which has `first_name` and `last_name`. It's important to note that
|
|
||||||
`tweet` is also an object, although it is a special
|
|
||||||
<<mapping-root-object-type,root object type>>
|
|
||||||
which allows for additional mapping definitions.
|
|
||||||
|
|
||||||
The following is an example of explicit mapping for the above JSON:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"person" : {
|
|
||||||
"type" : "object",
|
|
||||||
"properties" : {
|
|
||||||
"name" : {
|
|
||||||
"type" : "object",
|
|
||||||
"properties" : {
|
|
||||||
"first_name" : {"type" : "string"},
|
|
||||||
"last_name" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"sid" : {"type" : "string", "index" : "not_analyzed"}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In order to mark a mapping of type `object`, set the `type` to object.
|
|
||||||
This is an optional step, since if there are `properties` defined for
|
|
||||||
it, it will automatically be identified as an `object` mapping.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== properties
|
|
||||||
|
|
||||||
An object mapping can optionally define one or more properties using the
|
|
||||||
`properties` tag for a field. Each property can be either another
|
|
||||||
`object`, or one of the
|
|
||||||
<<mapping-core-types,core_types>>.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== dynamic
|
|
||||||
|
|
||||||
One of the most important features of Elasticsearch is its ability to be
|
|
||||||
schema-less. This means that, in our example above, the `person` object
|
|
||||||
can be indexed later with a new property -- `age`, for example -- and it
|
|
||||||
will automatically be added to the mapping definitions. Same goes for
|
|
||||||
the `tweet` root object.
|
|
||||||
|
|
||||||
This feature is by default turned on, and it's the `dynamic` nature of
|
|
||||||
each object mapped. Each object mapped is automatically dynamic, though
|
|
||||||
it can be explicitly turned off:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"person" : {
|
|
||||||
"type" : "object",
|
|
||||||
"properties" : {
|
|
||||||
"name" : {
|
|
||||||
"dynamic" : false,
|
|
||||||
"properties" : {
|
|
||||||
"first_name" : {"type" : "string"},
|
|
||||||
"last_name" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"sid" : {"type" : "string", "index" : "not_analyzed"}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In the above example, the `name` object mapped is not dynamic, meaning
|
|
||||||
that if, in the future, we try to index JSON with a `middle_name` within
|
|
||||||
the `name` object, it will get discarded and not added.
|
|
||||||
|
|
||||||
There is no performance overhead if an `object` is dynamic; the ability
|
|
||||||
to turn it off is provided as a safety mechanism so "malformed" objects
|
|
||||||
won't, by mistake, index data that we do not wish to be indexed.
|
|
||||||
|
|
||||||
If a dynamic object contains yet another inner `object`, it will be
|
|
||||||
automatically added to the index and mapped as well.
|
|
||||||
|
|
||||||
When processing dynamic new fields, their type is automatically derived.
|
|
||||||
For example, if it is a `number`, it will automatically be treated as
|
|
||||||
number <<mapping-core-types,core_type>>. Dynamic
|
|
||||||
fields default to their default attributes, for example, they are not
|
|
||||||
stored and they are always indexed.
|
|
||||||
|
|
||||||
Date fields are special since they are represented as a `string`. Date
|
|
||||||
fields are detected if they can be parsed as a date when they are first
|
|
||||||
introduced into the system. The set of date formats that are tested
|
|
||||||
against can be configured using the `dynamic_date_formats` on the root object,
|
|
||||||
which is explained later.
|
|
||||||
|
|
||||||
Note, once a field has been added, *its type can not change*. For
|
|
||||||
example, if we added age and its value is a number, then it can't be
|
|
||||||
treated as a string.
|
|
||||||
|
|
||||||
The `dynamic` parameter can also be set to `strict`, meaning that not
|
|
||||||
only will new fields not be introduced into the mapping, but also that parsing
|
|
||||||
(indexing) docs with such new fields will fail.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== enabled
|
|
||||||
|
|
||||||
The `enabled` flag allows you to disable parsing and indexing of a named object
|
|
||||||
completely. This is handy when a portion of the JSON document contains
|
|
||||||
arbitrary JSON which should not be indexed, nor added to the mapping.
|
|
||||||
For example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"properties" : {
|
|
||||||
"person" : {
|
|
||||||
"type" : "object",
|
|
||||||
"properties" : {
|
|
||||||
"name" : {
|
|
||||||
"type" : "object",
|
|
||||||
"enabled" : false
|
|
||||||
},
|
|
||||||
"sid" : {"type" : "string", "index" : "not_analyzed"}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In the above, `name` and its content will not be indexed at all.
|
|
||||||
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== include_in_all
|
|
||||||
|
|
||||||
`include_in_all` can be set on the `object` type level. When set, it
|
|
||||||
propagates down to all the inner mappings defined within the `object`
|
|
||||||
that do not explicitly set it.
|
|
||||||
|
|
|
@ -0,0 +1,105 @@
|
||||||
|
[[object]]
|
||||||
|
=== Object datatype
|
||||||
|
|
||||||
|
JSON documents are hierarchical in nature: the document may contain inner
|
||||||
|
objects which, in turn, may contain inner objects themselves:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{ <1>
|
||||||
|
"region": "US",
|
||||||
|
"manager": { <2>
|
||||||
|
"age": 30,
|
||||||
|
"name": { <3>
|
||||||
|
"first": "John",
|
||||||
|
"last": "Smith"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The outer document is also a JSON object.
|
||||||
|
<2> It contains an inner object called `manager`.
|
||||||
|
<3> Which in turn contains an inner object called `name`.
|
||||||
|
|
||||||
|
Internally, this document is indexed as a simple, flat list of key-value
|
||||||
|
pairs, something like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"region": "US",
|
||||||
|
"manager.age": 30,
|
||||||
|
"manager.name.first": "John",
|
||||||
|
"manager.name.last": "Smith"
|
||||||
|
}
|
||||||
|
--------------------------------------------------
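
Because inner fields are indexed under their full path, queries refer to them by that dotted path. A minimal sketch, assuming the document above has been indexed into `my_index`:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "match": {
      "manager.name.first": "John"
    }
  }
}
--------------------------------------------------
// AUTOSENSE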
|
||||||
|
|
||||||
|
An explicit mapping for the above document could look like this:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": { <1>
|
||||||
|
"properties": {
|
||||||
|
"region": {
|
||||||
|
"type": "string",
|
||||||
|
"index": "not_analyzed"
|
||||||
|
},
|
||||||
|
"manager": { <2>
|
||||||
|
"properties": {
|
||||||
|
"age": { "type": "integer" },
|
||||||
|
"name": { <3>
|
||||||
|
"properties": {
|
||||||
|
"first": { "type": "string" },
|
||||||
|
"last": { "type": "string" }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The mapping type is a type of object, and has a `properties` field.
|
||||||
|
<2> The `manager` field is an inner `object` field.
|
||||||
|
<3> The `manager.name` field is an inner `object` field within the `manager` field.
|
||||||
|
|
||||||
|
You are not required to set the field `type` to `object` explicitly, as this is the default value.
|
||||||
|
|
||||||
|
[[object-params]]
|
||||||
|
==== Parameters for `object` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `object` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
<<dynamic,`dynamic`>>::
|
||||||
|
|
||||||
|
Whether or not new `properties` should be added dynamically
|
||||||
|
to an existing object. Accepts `true` (default), `false`
|
||||||
|
and `strict`.
|
||||||
|
|
||||||
|
<<enabled,`enabled`>>::
|
||||||
|
|
||||||
|
Whether the JSON value given for the object field should be
|
||||||
|
parsed and indexed (`true`, default) or completely ignored (`false`).
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Sets the default `include_in_all` value for all the `properties` within
|
||||||
|
the object. The object itself is not added to the `_all` field.
|
||||||
|
|
||||||
|
<<properties,`properties`>>::
|
||||||
|
|
||||||
|
The fields within the object, which can be of any
|
||||||
|
<<mapping-types,datatype>>, including `object`. New properties
|
||||||
|
may be added to an existing object.
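
As a sketch of how these parameters might be used together, the mapping below assumes a fresh index. The `manager` object sets `dynamic` to `false`, so fields that are not listed under its `properties` are not added to the mapping. The `session_data` field is a hypothetical example of an object whose contents are not parsed or indexed at all because `enabled` is `false`.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "manager": {
          "dynamic": false,
          "properties": {
            "age": { "type": "integer" },
            "name": {
              "properties": {
                "first": { "type": "string" },
                "last":  { "type": "string" }
              }
            }
          }
        },
        "session_data": {
          "type": "object",
          "enabled": false
        }
      }
    }
  }
}
--------------------------------------------------
// AUTOSENSE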
|
||||||
|
|
||||||
|
IMPORTANT: If you need to index arrays of objects instead of single objects,
|
||||||
|
read <<nested>> first.
|
||||||
|
|
|
@ -1,190 +0,0 @@
|
||||||
[[mapping-root-object-type]]
|
|
||||||
=== Root Object Type
|
|
||||||
|
|
||||||
The root object mapping is an <<mapping-object-type,object type mapping>> that
|
|
||||||
maps the root object (the type itself). It supports all of the different
|
|
||||||
mappings that can be set using the <<mapping-object-type,object type mapping>>.
|
|
||||||
|
|
||||||
The root object mapping allows you to index a JSON document that only contains its
|
|
||||||
fields. For example, the following `tweet` JSON can be indexed without
|
|
||||||
specifying the `tweet` type in the document itself:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"message" : "This is a tweet!"
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== dynamic_date_formats
|
|
||||||
|
|
||||||
`dynamic_date_formats` (old setting called `date_formats` still works)
|
|
||||||
is the ability to set one or more date formats that will be used to
|
|
||||||
detect `date` fields. For example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
|
|
||||||
"properties" : {
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
In the above mapping, if a new JSON field of type string is detected,
|
|
||||||
the date formats specified will be used in order to check if it's a date.
|
|
||||||
If it passes parsing, then the field will be declared with `date` type,
|
|
||||||
and will use the matching format as its format attribute. The date
|
|
||||||
format itself is explained
|
|
||||||
<<mapping-date-format,here>>.
|
|
||||||
|
|
||||||
The default formats are: `strictDateOptionalTime` (ISO) and
|
|
||||||
`yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z` and `epoch_millis`.
|
|
||||||
|
|
||||||
*Note:* `dynamic_date_formats` are used *only* for dynamically added
|
|
||||||
date fields, not for `date` fields that you specify in your mapping.
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== date_detection
|
|
||||||
|
|
||||||
Allows you to disable automatic date type detection (if a new field is introduced
|
|
||||||
and matches the provided format), for example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"date_detection" : false,
|
|
||||||
"properties" : {
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== numeric_detection
|
|
||||||
|
|
||||||
Sometimes, even though json has support for native numeric types,
|
|
||||||
numeric values are still provided as strings. In order to try and
|
|
||||||
automatically detect numeric values from string, the `numeric_detection`
|
|
||||||
can be set to `true`. For example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"tweet" : {
|
|
||||||
"numeric_detection" : true,
|
|
||||||
"properties" : {
|
|
||||||
"message" : {"type" : "string"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
[float]
|
|
||||||
==== dynamic_templates
|
|
||||||
|
|
||||||
Dynamic templates allow you to define mapping templates that will be applied
|
|
||||||
when dynamic introduction of fields / objects happens.
|
|
||||||
|
|
||||||
IMPORTANT: Dynamic field mappings are only added when a field contains
|
|
||||||
a concrete value -- not `null` or an empty array. This means that if the `null_value` option
|
|
||||||
is used in a `dynamic_template`, it will only be applied after the first document
|
|
||||||
with a concrete value for the field has been indexed.
|
|
||||||
|
|
||||||
For example, we might want to have all fields to be stored by default,
|
|
||||||
or all `string` fields to be stored, or have `string` fields to always
|
|
||||||
be indexed with multi fields syntax, once analyzed and once not_analyzed.
|
|
||||||
Here is a simple example:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"person" : {
|
|
||||||
"dynamic_templates" : [
|
|
||||||
{
|
|
||||||
"template_1" : {
|
|
||||||
"match" : "multi*",
|
|
||||||
"mapping" : {
|
|
||||||
"type" : "{dynamic_type}",
|
|
||||||
"index" : "analyzed",
|
|
||||||
"fields" : {
|
|
||||||
"org" : {"type": "{dynamic_type}", "index" : "not_analyzed"}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"template_2" : {
|
|
||||||
"match" : "*",
|
|
||||||
"match_mapping_type" : "string",
|
|
||||||
"mapping" : {
|
|
||||||
"type" : "string",
|
|
||||||
"index" : "not_analyzed"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
The above mapping will create a field with multi fields for all field
|
|
||||||
names starting with multi, and will map all `string` types to be
|
|
||||||
`not_analyzed`.
|
|
||||||
|
|
||||||
Dynamic templates are named to allow for simple merge behavior. A new
|
|
||||||
mapping, just with a new template can be "put" and that template will be
|
|
||||||
added, or if it has the same name, the template will be replaced.
|
|
||||||
|
|
||||||
The `match` option allows you to define matching on the field name. An `unmatch`
|
|
||||||
option is also available to exclude fields if they do match on `match`.
|
|
||||||
The `match_mapping_type` controls if this template will be applied only
|
|
||||||
for dynamic fields of the specified type (as guessed by the json
|
|
||||||
format).
|
|
||||||
|
|
||||||
Another option is to use `path_match`, which allows you to match the dynamic
|
|
||||||
template against the "full" dot notation name of the field (for example
|
|
||||||
`obj1.*.value` or `obj1.obj2.*`), with the respective `path_unmatch`.
|
|
||||||
|
|
||||||
All matching uses a simple format, which allows `*` to be used as a
|
|
||||||
matching element supporting simple patterns such as xxx*, *xxx, xxx*yyy
|
|
||||||
(with arbitrary number of pattern types), as well as direct equality.
|
|
||||||
The `match_pattern` can be set to `regex` to allow for regular
|
|
||||||
expression based matching.
|
|
||||||
|
|
||||||
The `mapping` element provides the actual mapping definition. The
|
|
||||||
`{name}` keyword can be used and will be replaced with the actual
|
|
||||||
dynamic field name being introduced. The `{dynamic_type}` (or
|
|
||||||
`{dynamicType}`) can be used and will be replaced with the mapping
|
|
||||||
derived based on the field type (or the derived type, like `date`).
|
|
||||||
|
|
||||||
Complete generic settings can also be applied, for example, to have all
|
|
||||||
mappings be stored, just set:
|
|
||||||
|
|
||||||
[source,js]
|
|
||||||
--------------------------------------------------
|
|
||||||
{
|
|
||||||
"person" : {
|
|
||||||
"dynamic_templates" : [
|
|
||||||
{
|
|
||||||
"store_generic" : {
|
|
||||||
"match" : "*",
|
|
||||||
"mapping" : {
|
|
||||||
"store" : true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
--------------------------------------------------
|
|
||||||
|
|
||||||
Such generic templates should be placed at the end of the
|
|
||||||
`dynamic_templates` list because when two or more dynamic templates
|
|
||||||
match a field, only the first matching one from the list is used.
|
|
|
@ -0,0 +1,170 @@
|
||||||
|
[[string]]
|
||||||
|
=== String datatype
|
||||||
|
|
||||||
|
Fields of type `string` accept text values. Strings may be sub-divided into:
|
||||||
|
|
||||||
|
Full text::
|
||||||
|
+
|
||||||
|
--
|
||||||
|
|
||||||
|
Full text values, like the body of an email, are typically used for text based
|
||||||
|
relevance searches, such as: _Find the most relevant documents that match a
|
||||||
|
query for "quick brown fox"_.
|
||||||
|
|
||||||
|
These fields are `analyzed`, that is they are passed through an
|
||||||
|
<<analysis,analyzer>> to convert the string into a list of individual terms
|
||||||
|
before being indexed. The analysis process allows Elasticsearch to search for
|
||||||
|
individual words _within_ each full text field. Full text fields are not
|
||||||
|
used for sorting and seldom used for aggregations (although the
|
||||||
|
<<search-aggregations-bucket-significantterms-aggregation,significant terms aggregation>> is a notable exception).
|
||||||
|
|
||||||
|
--
|
||||||
|
|
||||||
|
Keywords::
|
||||||
|
|
||||||
|
Keywords are exact values like email addresses, hostnames, status codes, or
|
||||||
|
tags. They are typically used for filtering (_Find me all blog posts where
|
||||||
|
++status++ is ++published++_), for sorting, and for aggregations. Keyword
|
||||||
|
fields are `not_analyzed`. Instead, the exact string value is added to the
|
||||||
|
index as a single term.
|
||||||
|
|
||||||
|
Below is an example of a mapping for a full text (`analyzed`) and a keyword
|
||||||
|
(`not_analyzed`) string field:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"full_name": { <1>
|
||||||
|
"type": "string"
|
||||||
|
},
|
||||||
|
"status": {
|
||||||
|
"type": "string", <2>
|
||||||
|
"index": "not_analyzed"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `full_name` field is an `analyzed` full text field -- `index:analyzed` is the default.
|
||||||
|
<2> The `status` field is a `not_analyzed` keyword field.
|
||||||
|
|
||||||
|
Sometimes it is useful to have both a full text (`analyzed`) and a keyword
|
||||||
|
(`not_analyzed`) version of the same field: one for full text search and the
|
||||||
|
other for aggregations and sorting. This can be achieved with
|
||||||
|
<<multi-fields,multi-fields>>.
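
A minimal sketch of such a mapping, assuming a fresh index: the hypothetical `city` field is indexed as an `analyzed` full text field, and its `city.raw` multi-field holds the `not_analyzed` keyword version. The names `city` and `raw` are illustrative only.

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  }
}
--------------------------------
// AUTOSENSE

Here the `match` query runs against the analyzed `city` field, while sorting uses the exact value stored in `city.raw`.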
|
||||||
|
|
||||||
|
|
||||||
|
[[string-params]]
|
||||||
|
==== Parameters for string fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `string` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<analyzer,`analyzer`>>::
|
||||||
|
|
||||||
|
The <<analysis,analyzer>> which should be used for
|
||||||
|
<<mapping-index,`analyzed`>> string fields, both at index-time
|
||||||
|
and at search-time (unless overridden by the <<search-analyzer,`search_analyzer`>>).
|
||||||
|
Defaults to the default index analyzer, or the
|
||||||
|
<<analysis-standard-analyzer,`standard` analyzer>>.
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field use on-disk index-time doc values for sorting, aggregations,
|
||||||
|
or scripting? Accepts `true` or `false`. Defaults to `true` for
|
||||||
|
`not_analyzed` fields. Analyzed fields do not support doc values.
|
||||||
|
|
||||||
|
<<fielddata,`fielddata`>>::
|
||||||
|
|
||||||
|
Can the field use in-memory fielddata for sorting, aggregations,
|
||||||
|
or scripting? Accepts `disabled` or `paged_bytes` (default).
|
||||||
|
Not analyzed fields will use <<doc-values,doc values>> in preference
|
||||||
|
to fielddata.
|
||||||
|
|
||||||
|
<<multi-fields,`fields`>>::
|
||||||
|
|
||||||
|
Multi-fields allow the same string value to be indexed in multiple ways for
|
||||||
|
different purposes, such as one field for search and a multi-field for
|
||||||
|
sorting and aggregations, or the same string value analyzed by different
|
||||||
|
analyzers.
|
||||||
|
|
||||||
|
<<ignore-above,`ignore_above`>>::
|
||||||
|
|
||||||
|
Do not index or analyze any string longer than this value. Defaults to `0` (disabled).
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Whether or not the field value should be included in the
|
||||||
|
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
|
||||||
|
to `false` if <<mapping-index,`index`>> is set to `no`, or if a parent
|
||||||
|
<<object,`object`>> field sets `include_in_all` to `false`.
|
||||||
|
Otherwise defaults to `true`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `analyzed` (default, treat as full-text field),
|
||||||
|
`not_analyzed` (treat as keyword field) and `no`.
|
||||||
|
|
||||||
|
<<index-options,`index_options`>>::
|
||||||
|
|
||||||
|
What information should be stored in the index, for search and highlighting purposes.
|
||||||
|
Defaults to `positions` for <<mapping-index,`analyzed`>> fields, and to `docs` for
|
||||||
|
`not_analyzed` fields.
|
||||||
|
|
||||||
|
|
||||||
|
<<norms,`norms`>>::
|
||||||
|
+
|
||||||
|
--
|
||||||
|
|
||||||
|
Whether field-length should be taken into account when scoring queries.
|
||||||
|
Defaults depend on the <<mapping-index,`index`>> setting:
|
||||||
|
|
||||||
|
* `analyzed` fields default to `{ "enabled": true, "loading": "lazy" }`.
|
||||||
|
* `not_analyzed` fields default to `{ "enabled": false }`.
|
||||||
|
--
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts a string value which is substituted for any explicit `null`
|
||||||
|
values. Defaults to `null`, which means the field is treated as missing.
|
||||||
|
If the field is `analyzed`, the `null_value` will also be analyzed.
|
||||||
|
|
||||||
|
<<position-offset-gap,`position_offset_gap`>>::
|
||||||
|
|
||||||
|
The number of fake term positions which should be inserted between
|
||||||
|
each element of an array of strings. Defaults to 0.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
|
||||||
|
|
||||||
|
<<search-analyzer,`search_analyzer`>>::
|
||||||
|
|
||||||
|
The <<analyzer,`analyzer`>> that should be used at search time on
|
||||||
|
<<mapping-index,`analyzed`>> fields. Defaults to the `analyzer` setting.
|
||||||
|
|
||||||
|
<<similarity,`similarity`>>::
|
||||||
|
|
||||||
|
Which scoring algorithm or _similarity_ should be used. Defaults
|
||||||
|
to `default`, which uses TF/IDF.
|
||||||
|
|
||||||
|
<<term-vector,`term_vector`>>::
|
||||||
|
|
||||||
|
Whether term vectors should be stored for an <<mapping-index,`analyzed`>>
|
||||||
|
field. Defaults to `no`.
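
To show how a few of these parameters combine, here is a hedged sketch of a keyword field. The field name `status_code` and the values `NULL` and `20` are assumptions chosen only for illustration:

[source,js]
--------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": {
          "type":         "string",
          "index":        "not_analyzed",
          "null_value":   "NULL", <1>
          "ignore_above": 20 <2>
        }
      }
    }
  }
}
--------------------------------
<1> An explicit `null` value is replaced with the term `NULL`, so the field is no longer treated as missing.
<2> Strings longer than 20 characters are not indexed.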
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,107 @@
|
||||||
|
[[token-count]]
|
||||||
|
=== Token count datatype
|
||||||
|
|
||||||
|
A field of type `token_count` is really an <<number,`integer`>> field which
|
||||||
|
accepts string values, analyzes them, then indexes the number of tokens in the
|
||||||
|
string.
|
||||||
|
|
||||||
|
For instance:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
PUT my_index
|
||||||
|
{
|
||||||
|
"mappings": {
|
||||||
|
"my_type": {
|
||||||
|
"properties": {
|
||||||
|
"name": { <1>
|
||||||
|
"type": "string",
|
||||||
|
"fields": {
|
||||||
|
"length": { <2>
|
||||||
|
"type": "token_count",
|
||||||
|
"analyzer": "standard"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
PUT my_index/my_type/1
|
||||||
|
{ "name": "John Smith" }
|
||||||
|
|
||||||
|
PUT my_index/my_type/2
|
||||||
|
{ "name": "Rachel Alice Williams" }
|
||||||
|
|
||||||
|
GET my_index/_search
|
||||||
|
{
|
||||||
|
"query": {
|
||||||
|
"term": {
|
||||||
|
"name.length": 3 <3>
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
// AUTOSENSE
|
||||||
|
<1> The `name` field is an analyzed string field which uses the default `standard` analyzer.
|
||||||
|
<2> The `name.length` field is a `token_count` <<multi-fields,multi-field>> which will index the number of tokens in the `name` field.
|
||||||
|
<3> This query matches only the document containing `Rachel Alice Williams`, as it contains three tokens.
|
||||||
|
|
||||||
|
[NOTE]
|
||||||
|
===================================================================
|
||||||
|
Technically the `token_count` type sums position increments rather than
|
||||||
|
counting tokens. This means that even if the analyzer filters out stop
|
||||||
|
words they are included in the count.
|
||||||
|
===================================================================
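
The effect described in the note can be sketched with an analyzer that removes stop words. This example assumes the built-in `stop` analyzer and a hypothetical `title` field; the expected count follows from the note rather than from a verified run:

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "stop",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "stop" <1>
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "title": "The quick fox" }

GET my_index/_search
{
  "query": {
    "term": {
      "title.length": 3 <2>
    }
  }
}
--------------------------------------------------
<1> The `stop` analyzer drops English stop words such as `the`, leaving two tokens.
<2> Because position increments are summed, the removed stop word should still be counted, so the document is expected to match a count of `3` rather than `2`.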
|
||||||
|
|
||||||
|
[[token-count-params]]
|
||||||
|
==== Parameters for `token_count` fields
|
||||||
|
|
||||||
|
The following parameters are accepted by `token_count` fields:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
|
||||||
|
<<analyzer,`analyzer`>>::
|
||||||
|
|
||||||
|
The <<analysis,analyzer>> which should be used to analyze the string
|
||||||
|
value. Required. For best performance, use an analyzer without token
|
||||||
|
filters.
|
||||||
|
|
||||||
|
<<index-boost,`boost`>>::
|
||||||
|
|
||||||
|
Field-level index time boosting. Accepts a floating point number, defaults
|
||||||
|
to `1.0`.
|
||||||
|
|
||||||
|
<<doc-values,`doc_values`>>::
|
||||||
|
|
||||||
|
Can the field value be used for sorting, aggregations, or scripting?
|
||||||
|
Accepts `true` (default) or `false`.
|
||||||
|
|
||||||
|
<<mapping-index,`index`>>::
|
||||||
|
|
||||||
|
Should the field be searchable? Accepts `not_analyzed` (default) and `no`.
|
||||||
|
|
||||||
|
<<include-in-all,`include_in_all`>>::
|
||||||
|
|
||||||
|
Whether or not the field value should be included in the
|
||||||
|
<<mapping-all-field,`_all`>> field. Accepts `true` or `false`. Defaults
|
||||||
|
to `false`. Note: if `true`, it is the string value that is added to `_all`,
|
||||||
|
not the calculated token count.
|
||||||
|
|
||||||
|
<<null-value,`null_value`>>::
|
||||||
|
|
||||||
|
Accepts a numeric value of the same `type` as the field which is
|
||||||
|
substituted for any explicit `null` values. Defaults to `null`, which
|
||||||
|
means the field is treated as missing.
|
||||||
|
|
||||||
|
<<precision-step,`precision_step`>>::
|
||||||
|
|
||||||
|
Controls the number of extra terms that are indexed to make
|
||||||
|
<<query-dsl-range-query,`range` queries>> faster. Defaults to `32`.
|
||||||
|
|
||||||
|
<<mapping-store,`store`>>::
|
||||||
|
|
||||||
|
Whether the field value should be stored and retrievable separately from
|
||||||
|
the <<mapping-source-field,`_source`>> field. Accepts `true` or `false`
|
||||||
|
(default).
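
Because the token count is indexed as a number, it can also be used in numeric queries. As a sketch, assuming the `name.length` multi-field and the two documents from the example earlier in this section, a `range` query could select names by their number of tokens:

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "range": {
      "name.length": {
        "gte": 2, <1>
        "lte": 3
      }
    }
  }
}
--------------------------------------------------
<1> With these bounds, both `John Smith` (two tokens) and `Rachel Alice Williams` (three tokens) should match.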
|
|
@ -71,7 +71,7 @@ Field statistics can be accessed with a subscript operator like this:
|
||||||
|
|
||||||
Field statistics are computed per shard and therefore these numbers can vary
|
Field statistics are computed per shard and therefore these numbers can vary
|
||||||
depending on the shard the current document resides in.
|
depending on the shard the current document resides in.
|
||||||
The number of terms in a field cannot be accessed using the `_index` variable. See <<mapping-core-types, word count mapping type>> on how to do that.
|
The number of terms in a field cannot be accessed using the `_index` variable. See <<token-count>> for how to do that.
|
||||||
|
|
||||||
[float]
|
[float]
|
||||||
==== Term statistics:
|
==== Term statistics:
|
||||||
|
@ -80,7 +80,7 @@ Term statistics for a field can be accessed with a subscript operator like
|
||||||
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
|
this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist.
|
||||||
If you do not need the term frequency, call `_index['FIELD'].get('TERM', 0)`
|
If you do not need the term frequency, call `_index['FIELD'].get('TERM', 0)`
|
||||||
to avoid unnecessary initialization of the frequencies. The flag will have only
|
to avoid unnecessary initialization of the frequencies. The flag will have only
|
||||||
affect is your set the `index_options` to `docs` (see <<mapping-core-types, mapping documentation>>).
|
an effect if you set the <<index-options,`index_options`>> to `docs`.
|
||||||
|
|
||||||
|
|
||||||
`_index['FIELD']['TERM'].df()`::
|
`_index['FIELD']['TERM'].df()`::
|
||||||
|
@ -176,7 +176,7 @@ return score;
|
||||||
[float]
|
[float]
|
||||||
==== Term vectors:
|
==== Term vectors:
|
||||||
|
|
||||||
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <<mapping-core-types,mapping documentation>>). To access them, call
|
The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <<term-vector>>). To access them, call
|
||||||
`_index.termVectors()` to get a
|
`_index.termVectors()` to get a
|
||||||
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields]
|
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields]
|
||||||
instance. This object can then be used as described in https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[lucene doc] to iterate over fields and then for each field iterate over each term in the field.
|
instance. This object can then be used as described in https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[lucene doc] to iterate over fields and then for each field iterate over each term in the field.
|
||||||
|
|
|
@ -284,7 +284,6 @@ supported operations are:
|
||||||
|=======================================================================
|
|=======================================================================
|
||||||
|Value |Description
|
|Value |Description
|
||||||
| `aggs` |Aggregations (wherever they may be used)
|
| `aggs` |Aggregations (wherever they may be used)
|
||||||
| `mapping` |Mappings (script transform feature)
|
|
||||||
| `search` |Search api, Percolator api and Suggester api (e.g filters, script_fields)
|
| `search` |Search api, Percolator api and Suggester api (e.g filters, script_fields)
|
||||||
| `update` |Update api
|
| `update` |Update api
|
||||||
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category
|
| `plugin` |Any plugin that makes use of scripts under the generic `plugin` category
|
||||||
|
|
|
@ -44,7 +44,7 @@ These documents would *not* match the above query:
|
||||||
[float]
|
[float]
|
||||||
===== `null_value` mapping
|
===== `null_value` mapping
|
||||||
|
|
||||||
If the field mapping includes the `null_value` setting (see <<mapping-core-types>>)
|
If the field mapping includes the <<null-value,`null_value`>> setting
|
||||||
then explicit `null` values are replaced with the specified `null_value`. For
|
then explicit `null` values are replaced with the specified `null_value`. For
|
||||||
instance, if the `user` field were mapped as follows:
|
instance, if the `user` field were mapped as follows:
|
||||||
|
|
||||||
|
|
|
@ -254,7 +254,7 @@ decay function is specified as
|
||||||
<1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`.
|
<1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`.
|
||||||
<2> The specified field must be a numeric, date, or geo-point field.
|
<2> The specified field must be a numeric, date, or geo-point field.
|
||||||
|
|
||||||
In the above example, the field is a <<mapping-geo-point-type>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:
|
In the above example, the field is a <<geo-point,`geo_point`>> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example:
|
||||||
|
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
|
@ -268,7 +268,7 @@ In the above example, the field is a <<mapping-geo-point-type>> and origin can b
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
<1> The date format of the origin depends on the <<mapping-date-format>> defined in
|
<1> The date format of the origin depends on the <<mapping-date-format,`format`>> defined in
|
||||||
your mapping. If you do not define the origin, the current time is used.
|
your mapping. If you do not define the origin, the current time is used.
|
||||||
<2> The `offset` and `decay` parameters are optional.
|
<2> The `offset` and `decay` parameters are optional.
|
||||||
|
|
||||||
|
|
|
@ -112,7 +112,6 @@ Format in `lat,lon`.
|
||||||
[float]
|
[float]
|
||||||
==== geo_point Type
|
==== geo_point Type
|
||||||
|
|
||||||
The filter *requires* the
|
The query *requires* the <<geo-point,`geo_point`>> type to be set on the
|
||||||
<<mapping-geo-point-type,geo_point>> type to be
|
relevant field.
|
||||||
set on the relevant field.
|
|
||||||
|
|
||||||
|
|
|
@ -2,8 +2,8 @@
|
||||||
== Geo queries
|
== Geo queries
|
||||||
|
|
||||||
Elasticsearch supports two types of geo data:
|
Elasticsearch supports two types of geo data:
|
||||||
<<mapping-geo-point-type,`geo_point`>> fields which support lat/lon pairs, and
|
<<geo-point,`geo_point`>> fields which support lat/lon pairs, and
|
||||||
<<mapping-geo-shape-type,`geo_shape`>> fields, which support points,
|
<<geo-shape,`geo_shape`>> fields, which support points,
|
||||||
lines, circles, polygons, multi-polygons etc.
|
lines, circles, polygons, multi-polygons etc.
|
||||||
|
|
||||||
The queries in this group are:
|
The queries in this group are:
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
|
|
||||||
Filter documents indexed using the `geo_shape` type.
|
Filter documents indexed using the `geo_shape` type.
|
||||||
|
|
||||||
Requires the <<mapping-geo-shape-type,geo_shape Mapping>>.
|
Requires the <<geo-shape,`geo_shape` Mapping>>.
|
||||||
|
|
||||||
The `geo_shape` query uses the same grid square representation as the
|
The `geo_shape` query uses the same grid square representation as the
|
||||||
geo_shape mapping to find documents that have a shape that intersects
|
geo_shape mapping to find documents that have a shape that intersects
|
||||||
|
|
|
@ -2,13 +2,13 @@
|
||||||
=== Geohash Cell Query
|
=== Geohash Cell Query
|
||||||
|
|
||||||
The `geohash_cell` query provides access to a hierarchy of geohashes.
|
The `geohash_cell` query provides access to a hierarchy of geohashes.
|
||||||
By defining a geohash cell, only <<mapping-geo-point-type,geopoints>>
|
By defining a geohash cell, only <<geo-point,geopoints>>
|
||||||
within this cell will match this filter.
|
within this cell will match this filter.
|
||||||
|
|
||||||
To get this filter work all prefixes of a geohash need to be indexed. In
|
To get this filter work all prefixes of a geohash need to be indexed. In
|
||||||
example a geohash `u30` needs to be decomposed into three terms: `u30`,
|
example a geohash `u30` needs to be decomposed into three terms: `u30`,
|
||||||
`u3` and `u`. This decomposition must be enabled in the mapping of the
|
`u3` and `u`. This decomposition must be enabled in the mapping of the
|
||||||
<<mapping-geo-point-type,geopoint>> field that's going to be filtered by
|
<<geo-point,geopoint>> field that's going to be filtered by
|
||||||
setting the `geohash_prefix` option:
|
setting the `geohash_prefix` option:
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
|
|
|
@ -7,7 +7,7 @@ which are designed to scale horizontally.
|
||||||
|
|
||||||
<<query-dsl-nested-query,`nested` query>>::
|
<<query-dsl-nested-query,`nested` query>>::
|
||||||
|
|
||||||
Documents may contains fields of type <<mapping-nested-type,`nested`>>. These
|
Documents may contain fields of type <<nested,`nested`>>. These
|
||||||
fields are used to index arrays of objects, where each object can be queried
|
fields are used to index arrays of objects, where each object can be queried
|
||||||
(with the `nested` query) as an independent document.
|
(with the `nested` query) as an independent document.
|
||||||
|
|
||||||
|
|
|
@ -44,7 +44,7 @@ These documents would *not* match the above filter:
|
||||||
[float]
|
[float]
|
||||||
==== `null_value` mapping
|
==== `null_value` mapping
|
||||||
|
|
||||||
If the field mapping includes a `null_value` (see <<mapping-core-types>>) then explicit `null` values
|
If the field mapping includes a <<null-value,`null_value`>> then explicit `null` values
|
||||||
are replaced with the specified `null_value`. For instance, if the `user` field were mapped
|
are replaced with the specified `null_value`. For instance, if the `user` field were mapped
|
||||||
as follows:
|
as follows:
|
||||||
|
|
||||||
|
|
|
@ -2,7 +2,7 @@
|
||||||
=== Nested Query
|
=== Nested Query
|
||||||
|
|
||||||
Nested query allows to query nested objects / docs (see
|
Nested query allows to query nested objects / docs (see
|
||||||
<<mapping-nested-type,nested mapping>>). The
|
<<nested,nested mapping>>). The
|
||||||
query is executed against the nested objects / docs as if they were
|
query is executed against the nested objects / docs as if they were
|
||||||
indexed as separate docs (they are, internally) and resulting in the
|
indexed as separate docs (they are, internally) and resulting in the
|
||||||
root parent doc (or parent nested mapping). Here is a sample mapping we
|
root parent doc (or parent nested mapping). Here is a sample mapping we
|
||||||
|
|
|
@ -29,33 +29,60 @@ The `range` query accepts the following parameters:
|
||||||
`lt`:: Less-than
|
`lt`:: Less-than
|
||||||
`boost`:: Sets the boost value of the query, defaults to `1.0`
|
`boost`:: Sets the boost value of the query, defaults to `1.0`
|
||||||
|
|
||||||
[float]
|
|
||||||
==== Date options
|
|
||||||
|
|
||||||
When applied on `date` fields the `range` filter accepts also a `time_zone` parameter.
|
[[ranges-on-dates]]
|
||||||
The `time_zone` parameter will be applied to your input lower and upper bounds and will
|
==== Ranges on date fields
|
||||||
move them to UTC time based date:
|
|
||||||
|
When running `range` queries on fields of type <<date,`date`>>, ranges can be
|
||||||
|
specified using <<date-math>>:
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
{
|
{
|
||||||
"range" : {
|
"range" : {
|
||||||
"born" : {
|
"date" : {
|
||||||
"gte": "2012-01-01",
|
"gte" : "now-1d/d",
|
||||||
"lte": "now",
|
"lt" : "now/d"
|
||||||
"time_zone": "+01:00"
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
In the above example, `gte` will be actually moved to `2011-12-31T23:00:00` UTC date.
|
===== Date math and rounding
|
||||||
|
|
||||||
NOTE: if you give a date with a timezone explicitly defined and use the `time_zone` parameter, `time_zone` will be
|
When using <<date-math,date math>> to round dates to the nearest day, month,
|
||||||
ignored. For example, setting `gte` to `2012-01-01T00:00:00+01:00` with `"time_zone":"+10:00"` will still use `+01:00` time zone.
|
hour, etc., the rounded dates depend on whether the ends of the ranges are
|
||||||
|
inclusive or exclusive.
|
||||||
|
|
||||||
When applied on `date` fields the `range` query accepts also a `format` parameter.
|
Rounding up moves to the last millisecond of the rounding scope, and rounding
|
||||||
The `format` parameter will help support another date format than the one defined in mapping:
|
down to the first millisecond of the rounding scope. For example:
|
||||||
|
|
||||||
|
[horizontal]
|
||||||
|
`gt`::
|
||||||
|
|
||||||
|
Greater than the date rounded up: `2014-11-18||/M` becomes
|
||||||
|
`2014-11-30T23:59:59.999`, i.e. excluding the entire month.
|
||||||
|
|
||||||
|
`gte`::
|
||||||
|
|
||||||
|
Greater than or equal to the date rounded down: `2014-11-18||/M` becomes
|
||||||
|
`2014-11-01`, i.e. including the entire month.
|
||||||
|
|
||||||
|
`lt`::
|
||||||
|
|
||||||
|
Less than the date rounded down: `2014-11-18||/M` becomes `2014-11-01`, i.e.
|
||||||
|
excluding the entire month.
|
||||||
|
|
||||||
|
`lte`::
|
||||||
|
|
||||||
|
Less than or equal to the date rounded up: `2014-11-18||/M` becomes
|
||||||
|
`2014-11-30T23:59:59.999`, i.e. including the entire month.
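
Putting the two inclusive cases together, the following sketch uses the same rounded value for both bounds (the field name `timestamp` is an assumption). Following the rules above, `gte` rounds down to `2014-11-01` and `lte` rounds up to `2014-11-30T23:59:59.999`, so the query should cover the whole of November 2014:

[source,js]
--------------------------------------------------
{
    "range" : {
        "timestamp" : {
            "gte" : "2014-11-18||/M",
            "lte" : "2014-11-18||/M"
        }
    }
}
--------------------------------------------------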
|
||||||
|
|
||||||
|
===== Date format in range queries
|
||||||
|
|
||||||
|
Formatted dates will be parsed using the <<mapping-date-format,`format`>>
|
||||||
|
specified on the <<date,`date`>> field by default, but it can be overridden by
|
||||||
|
passing the `format` parameter to the `range` query:
|
||||||
|
|
||||||
[source,js]
|
[source,js]
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
@ -69,3 +96,25 @@ The `format` parameter will help support another date format than the one define
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
|
===== Time zone in range queries
|
||||||
|
|
||||||
|
Dates can be converted from another timezone to UTC either by specifying the
|
||||||
|
time zone in the date value itself (if the <<mapping-date-format, `format`>>
|
||||||
|
accepts it), or by passing the `time_zone` parameter:
|
||||||
|
|
||||||
|
[source,js]
|
||||||
|
--------------------------------------------------
|
||||||
|
{
|
||||||
|
"range" : {
|
||||||
|
"timestamp" : {
|
||||||
|
"gte": "2015-01-01 00:00:00", <1>
|
||||||
|
"lte": "now",
|
||||||
|
"time_zone": "+01:00"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
--------------------------------------------------
|
||||||
|
<1> This date will be converted to `2014-12-31T23:00:00 UTC`.
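
The other option, putting the time zone in the date value itself, might look like the sketch below. It assumes the field's <<mapping-date-format,`format`>> accepts an ISO 8601 offset; the `timestamp` field name is reused from the example above:

[source,js]
--------------------------------------------------
{
    "range" : {
        "timestamp" : {
            "gte": "2015-01-01T00:00:00+01:00", <1>
            "lte": "now"
        }
    }
}
--------------------------------------------------
<1> This is the same instant as `2014-12-31T23:00:00 UTC`, so no `time_zone` parameter is needed.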
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -3,7 +3,7 @@
|
||||||
|
|
||||||
experimental[]
|
experimental[]
|
||||||
|
|
||||||
The <<mapping-parent-field, parent/child>> and <<mapping-nested-type, nested>> features allow the return of documents that
|
The <<mapping-parent-field, parent/child>> and <<nested, nested>> features allow the return of documents that
|
||||||
have matches in a different scope. In the parent/child case, parent document are returned based on matches in child
|
have matches in a different scope. In the parent/child case, parent document are returned based on matches in child
|
||||||
documents or child document are returned based on matches in parent documents. In the nested case, documents are returned
|
documents or child document are returned based on matches in parent documents. In the nested case, documents are returned
|
||||||
based on matches in nested inner objects.
|
based on matches in nested inner objects.
|
||||||
|
|
|
@ -71,6 +71,7 @@ curl -XPOST 'localhost:9200/_search' -d '{
|
||||||
}'
|
}'
|
||||||
--------------------------------------------------
|
--------------------------------------------------
|
||||||
|
|
||||||
|
[[nested-sorting]]
|
||||||
==== Sorting within nested objects.
|
==== Sorting within nested objects.
|
||||||
|
|
||||||
Elasticsearch also supports sorting by
|
Elasticsearch also supports sorting by
|
||||||
|
@ -166,6 +167,7 @@ If any of the indices that are queried doesn't have a mapping for `price`
|
||||||
then Elasticsearch will handle it as if there was a mapping of type
|
then Elasticsearch will handle it as if there was a mapping of type
|
||||||
`long`, with all documents in this index having no value for this field.
|
`long`, with all documents in this index having no value for this field.
|
||||||
|
|
||||||
|
[[geo-sorting]]
|
||||||
==== Geo Distance Sorting
|
==== Geo Distance Sorting
|
||||||
|
|
||||||
Allow to sort by `_geo_distance`. Here is an example:
|
Allow to sort by `_geo_distance`. Here is an example:
|
||||||
|
|