diff --git a/.gitignore b/.gitignore index eb18986cf68..82ed94bd318 100644 --- a/.gitignore +++ b/.gitignore @@ -14,7 +14,7 @@ docs/html/ docs/build.log /tmp/ backwards/ - +html_docs ## eclipse ignores (use 'mvn eclipse:eclipse' to build eclipse projects) ## All files (.project, .classpath, .settings/*) should be generated through Maven which ## will correctly set the classpath based on the declared dependencies and write settings diff --git a/docs/reference/aggregations/bucket/geodistance-aggregation.asciidoc b/docs/reference/aggregations/bucket/geodistance-aggregation.asciidoc index 2120c0bec9a..4890f9dca9d 100644 --- a/docs/reference/aggregations/bucket/geodistance-aggregation.asciidoc +++ b/docs/reference/aggregations/bucket/geodistance-aggregation.asciidoc @@ -53,7 +53,7 @@ Response: } -------------------------------------------------- -The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). And it can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. The origin point can accept all formats supported by the `geo_point` <>: +The specified field must be of type `geo_point` (which can only be set explicitly in the mappings). And it can also hold an array of `geo_point` fields, in which case all will be taken into account during aggregation. 
The origin point can accept all formats supported by the <>: * Object format: `{ "lat" : 52.3760, "lon" : 4.894 }` - this is the safest format as it is the most explicit about the `lat` & `lon` values * String format: `"52.3760, 4.894"` - where the first number is the `lat` and the second is the `lon` diff --git a/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc b/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc index b6e9c2cabae..30f403d3f7f 100644 --- a/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc +++ b/docs/reference/aggregations/metrics/tophits-aggregation.asciidoc @@ -200,7 +200,7 @@ and therefore can't be used in the `order` option of the `terms` aggregator. If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator then nested hits are being returned. Nested hits are in a sense hidden mini documents that are part of regular document where in the mapping a nested field type has been configured. The `top_hits` aggregator has the ability to un-hide these documents if it is wrapped in a `nested` -or `reverse_nested` aggregator. Read more about nested in the <>. +or `reverse_nested` aggregator. Read more about nested in the <>. If nested type has been configured a single document is actually indexed as multiple Lucene documents and they share the same id. In order to determine the identity of a nested hit there is more needed than just the id, so that is why diff --git a/docs/reference/api-conventions.asciidoc b/docs/reference/api-conventions.asciidoc index f7fbde44d81..41d25362ed8 100644 --- a/docs/reference/api-conventions.asciidoc +++ b/docs/reference/api-conventions.asciidoc @@ -152,6 +152,33 @@ being consumed by a monitoring tool, rather than intended for human consumption. The default for the `human` flag is `false`. 
+[[date-math]] +=== Date Math + +Most parameters which accept a formatted date value -- such as `gt` and `lt` +in <> `range` queries, or `from` and `to` +in <> -- understand date maths. + +The expression starts with an anchor date, which can either be `now`, or a +date string ending with `||`. This anchor date can optionally be followed by +one or more maths expressions: + +* `+1h` - add one hour +* `-1d` - subtract one day +* `/d` - round down to the nearest day + +The supported <> are: `y` (year), `M` (month), `w` (week), +`d` (day), `h` (hour), `m` (minute), and `s` (second). + +Some examples are: + +[horizontal] +`now+1h`:: The current time plus one hour, with ms resolution. +`now+1h+1m`:: The current time plus one hour plus one minute, with ms resolution. +`now+1h/d`:: The current time plus one hour, rounded down to the nearest day. +`2015-01-01||+1M/d`:: `2015-01-01` plus one month, rounded down to the nearest day. + [float] === Response Filtering @@ -237,10 +264,10 @@ curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version' -------------------------------------------------- Note that elasticsearch sometimes returns directly the raw value of a field, -like the `_source` field. If you want to filter _source fields, you should +like the `_source` field. If you want to filter `_source` fields, you should consider combining the already existing `_source` parameter (see <> for more details) with the `filter_path` - parameter like this: +parameter like this: [source,sh] -------------------------------------------------- @@ -318,8 +345,9 @@ of supporting the native JSON number types. [float] === Time units -Whenever durations need to be specified, eg for a `timeout` parameter, the duration -can be specified as a whole number representing time in milliseconds, or as a time value like `2d` for 2 days. The supported units are: +Whenever durations need to be specified, eg for a `timeout` parameter, the +duration must specify the unit, like `2d` for 2 days. 
The supported units +are: [horizontal] `y`:: Year @@ -329,6 +357,7 @@ can be specified as a whole number representing time in milliseconds, or as a ti `h`:: Hour `m`:: Minute `s`:: Second +`ms`:: Milli-second [[distance-units]] [float] diff --git a/docs/reference/index-modules/mapper.asciidoc b/docs/reference/index-modules/mapper.asciidoc index 4f82e2ffe7e..484aec18466 100644 --- a/docs/reference/index-modules/mapper.asciidoc +++ b/docs/reference/index-modules/mapper.asciidoc @@ -6,53 +6,3 @@ added to an index either when creating it or by using the put mapping api. It also handles the dynamic mapping support for types that have no explicit mappings pre defined. For more information about mapping definitions, check out the <>. - -[float] -=== Dynamic Mappings - -New types and new fields within types can be added dynamically just -by indexing a document. When Elasticsearch encounters a new type, -it creates the type using the `_default_` mapping (see below). - -When it encounters a new field within a type, it autodetects the -datatype that the field contains and adds it to the type mapping -automatically. - -See <> for details of how to control and -configure dynamic mapping. - -[float] -=== Default Mapping - -When a new type is created (at <> time, -using the <> or just by indexing a -document into it), the type uses the `_default_` mapping as its basis. Any -mapping specified in the <> or -<> request override values set in the -`_default_` mapping. - -The default mapping definition is a plain mapping definition that is -embedded within Elasticsearch: - -[source,js] --------------------------------------------------- -{ - _default_ : { - } -} --------------------------------------------------- - -Pretty short, isn't it? Basically, everything is `_default_`ed, including the -dynamic nature of the root object mapping which allows new fields to be added -automatically. 
- -The default mapping can be overridden by specifying the `_default_` type when -creating a new index. - -[float] -=== Mapper settings - -`index.mapper.dynamic` (_dynamic_):: - - Dynamic creation of mappings for unmapped types can be completely - disabled by setting `index.mapper.dynamic` to `false`. diff --git a/docs/reference/index-modules/similarity.asciidoc b/docs/reference/index-modules/similarity.asciidoc index 0c95709a860..82399a8cc79 100644 --- a/docs/reference/index-modules/similarity.asciidoc +++ b/docs/reference/index-modules/similarity.asciidoc @@ -6,8 +6,8 @@ are scored. Similarity is per field, meaning that via the mapping one can define a different similarity per field. Configuring a custom similarity is considered a expert feature and the -builtin similarities are most likely sufficient as is described in the -<> +builtin similarities are most likely sufficient as is described in +<>. [float] [[configuration]] @@ -41,7 +41,7 @@ Here we configure the DFRSimilarity so it can be referenced as "properties" : { "title" : { "type" : "string", "similarity" : "my_similarity" } } -} +} -------------------------------------------------- [float] @@ -52,9 +52,9 @@ Here we configure the DFRSimilarity so it can be referenced as ==== Default similarity The default similarity that is based on the TF/IDF model. This -similarity has the following option: +similarity has the following option: -`discount_overlaps`:: +`discount_overlaps`:: Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. @@ -71,14 +71,14 @@ http://en.wikipedia.org/wiki/Okapi_BM25[Okapi_BM25] for more details. This similarity has the following options: [horizontal] -`k1`:: +`k1`:: Controls non-linear term frequency normalization - (saturation). + (saturation). -`b`:: - Controls to what degree document length normalizes tf values. 
+`b`:: + Controls to what degree document length normalizes tf values. -`discount_overlaps`:: +`discount_overlaps`:: Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. @@ -90,17 +90,17 @@ Type name: `BM25` ==== DFR similarity Similarity that implements the -http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence +http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence from randomness] framework. This similarity has the following options: [horizontal] -`basic_model`:: - Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`. +`basic_model`:: + Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`. `after_effect`:: - Possible values: `no`, `b` and `l`. + Possible values: `no`, `b` and `l`. -`normalization`:: +`normalization`:: Possible values: `no`, `h1`, `h2`, `h3` and `z`. All options but the first option need a normalization value. @@ -111,12 +111,12 @@ Type name: `DFR` [[ib]] ==== IB similarity. -http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information +http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information based model] . This similarity has the following options: [horizontal] -`distribution`:: Possible values: `ll` and `spl`. -`lambda`:: Possible values: `df` and `ttf`. +`distribution`:: Possible values: `ll` and `spl`. +`lambda`:: Possible values: `df` and `ttf`. `normalization`:: Same as in `DFR` similarity. Type name: `IB` @@ -125,7 +125,7 @@ Type name: `IB` [[lm_dirichlet]] ==== LM Dirichlet similarity. 
-http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM +http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMDirichletSimilarity.html[LM Dirichlet similarity] . This similarity has the following options: [horizontal] @@ -137,7 +137,7 @@ Type name: `LMDirichlet` [[lm_jelinek_mercer]] ==== LM Jelinek Mercer similarity. -http://lucene.apache.org/core/4_7_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM +http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM Jelinek Mercer similarity] . This similarity has the following options: [horizontal] diff --git a/docs/reference/mapping.asciidoc b/docs/reference/mapping.asciidoc index ce514e5a1ed..711e36845ee 100644 --- a/docs/reference/mapping.asciidoc +++ b/docs/reference/mapping.asciidoc @@ -3,76 +3,157 @@ [partintro] -- -Mapping is the process of defining how a document should be mapped to -the Search Engine, including its searchable characteristics such as -which fields are searchable and if/how they are tokenized. In -Elasticsearch, an index may store documents of different "mapping -types". Elasticsearch allows one to associate multiple mapping -definitions for each mapping type. -Explicit mapping is defined on an index/type level. By default, there -isn't a need to define an explicit mapping, since one is automatically -created and registered when a new type or new field is introduced (with -no performance overhead) and have sensible defaults. Only when the -defaults need to be overridden must a mapping definition be provided. +Mapping is the process of defining how a document, and the fields it contains, +are stored and indexed. For instance, use mappings to define: + +* which string fields should be treated as full text fields. +* which fields contain numbers, dates, or geolocations. 
+* whether the values of all fields in the document should be + indexed into the catch-all <> field. +* the <> of date values. +* custom rules to control the mapping for + <>. [float] -[[all-mapping-types]] -=== Mapping Types +[[mapping-type]] +== Mapping Types -Mapping types are a way to divide the documents in an index into logical -groups. Think of it as tables in a database. Though there is separation -between types, it's not a full separation (all end up as a document -within the same Lucene index). +Each index has one or more _mapping types_, which are used to divide the +documents in an index into logical groups. User documents might be stored in a +`user` type, and blog posts in a `blogpost` type. -Field names with the same name across types are highly recommended to -have the same type and same mapping characteristics (analysis settings -for example). There is an effort to allow to explicitly "choose" which -field to use by using type prefix (`my_type.my_field`), but it's not -complete, and there are places where it will never work (like -aggregations on the field). +Each mapping type has: -In practice though, this restriction is almost never an issue. The field -name usually ends up being a good indication to its "typeness" (e.g. -"first_name" will always be a string). Note also, that this does not -apply to the cross index case. +<>:: + +Meta-fields are used to customize how a document's associated metadata is +treated. Examples of meta-fields include the document's +<>, <>, +<>, and <> fields. + +<> or _properties_:: + +Each mapping type contains a list of fields or `properties` pertinent to that +type. A `user` type might contain `title`, `name`, and `age` fields, while a +`blogpost` type might contain `title`, `body`, `user_id` and `created` +fields. 
+ +The mapping for the above example could look like this: + +[source,js] +--------------------------------------- +PUT my_index <1> +{ + "mappings": { + "user": { <2> + "_all": { "enabled": false }, <3> + "properties": { <4> + "title": { "type": "string" }, <5> + "name": { "type": "string" }, <5> + "age": { "type": "integer" } <5> + } + }, + "blogpost": { <2> + "properties": { <4> + "title": { "type": "string" }, <5> + "body": { "type": "string" }, <5> + "user_id": { + "type": "string", <5> + "index": "not_analyzed" + }, + "created": { + "type": "date", <5> + "format": "strict_date_optional_time||epoch_millis" + } + } + } + } +} +--------------------------------------- +// AUTOSENSE +<1> Create an index called `my_index`. +<2> Add mapping types called `user` and `blogpost`. +<3> Disable the `_all` <> for the `user` mapping type. +<4> Specify fields or _properties_ in each mapping type. +<5> Specify the data `type` and mapping for each field. [float] -[[mapping-api]] -=== Mapping API +== Field datatypes -To create a mapping, you will need the <>, or you can add multiple mappings when you <>. +Each field has a data `type` which can be: + +* a simple type like <>, <>, <>, + <>, <> or <>. +* a type which supports the hierarchical nature of JSON such as + <> or <>. +* or a specialised type like <>, + <>, or <>. + +[IMPORTANT] +.Fields are shared across mapping types +========================================= + +Mapping types are used to group fields, but the fields in each mapping type +are not independent of each other. Fields with: + +* the _same name_ +* in the _same index_ +* in _different mapping types_ +* map to the _same field_ internally, +* and *must have the same mapping*. + +The `title` field exists in both the `user` and `blogpost` mapping types and +so must have exactly the same mapping in each type. The only exceptions to +this rule are the <>, <>, <>, <>, +<>, and <> parameters, which may have different +settings per field. 
+ +Usually, fields with the same name also contain the same type of data, so +having the same mapping is not a problem. When conflicts do arise, these can +be solved by choosing more descriptive names, such as `user_title` and +`blog_title`. + +========================================= [float] -[[mapping-settings]] -=== Global Settings +== Dynamic mapping -The `index.mapping.ignore_malformed` global setting can be set on the -index level to allow to ignore malformed content globally across all -mapping types (malformed content example is trying to index a text string -value as a numeric type). +Fields and mapping types do not need to be defined before being used. Thanks +to _dynamic mapping_, new mapping types and new field names will be added +automatically, just by indexing a document. New fields can be added both to +the top-level mapping type, and to inner <> and +<> fields. + +The +<> rules can be configured to +customise the mapping that is used for new types and new fields. + +[float] +== Explicit mappings + +You know more about your data than Elasticsearch can guess, so while dynamic +mapping can be useful to get started, at some point you will want to specify +your own explicit mappings. + +You can create mapping types and field mappings when you +<>, and you can add mapping types and +fields to an existing index with the <>. + +[float] +== Updating existing mappings + +Other than where documented, *existing type and field mappings cannot be +updated*. Changing the mapping would mean invalidating already indexed +documents. Instead, you should create a new index with the correct mappings +and reindex your data into that index. 
-The `index.mapping.coerce` global setting can be set on the -index level to coerce numeric content globally across all -mapping types (The default setting is true and coercions attempted are -to convert strings with numbers into numeric types and also numeric values -with fractions to any integer/short/long values minus the fraction part). -When the permitted conversions fail in their attempts, the value is considered -malformed and the ignore_malformed setting dictates what will happen next. -- -include::mapping/fields.asciidoc[] - include::mapping/types.asciidoc[] -include::mapping/date-format.asciidoc[] +include::mapping/fields.asciidoc[] -include::mapping/fielddata_formats.asciidoc[] +include::mapping/params.asciidoc[] include::mapping/dynamic-mapping.asciidoc[] - -include::mapping/meta.asciidoc[] - -include::mapping/transform.asciidoc[] diff --git a/docs/reference/mapping/date-format.asciidoc b/docs/reference/mapping/date-format.asciidoc deleted file mode 100644 index 7f12616cc24..00000000000 --- a/docs/reference/mapping/date-format.asciidoc +++ /dev/null @@ -1,238 +0,0 @@ -[[mapping-date-format]] -== Date Format - -In JSON documents, dates are represented as strings. Elasticsearch uses a set -of pre-configured format to recognize and convert those, but you can change the -defaults by specifying the `format` option when defining a `date` type, or by -specifying `dynamic_date_formats` in the `root object` mapping (which will -be used unless explicitly overridden by a `date` type). There are built in -formats supported, as well as complete custom one. - -The parsing of dates uses http://www.joda.org/joda-time/[Joda]. The -default date parsing used if no format is specified is -http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[ISODateTimeFormat.dateOptionalTimeParser]. - -An extension to the format allow to define several formats using `||` -separator. 
This allows to define less strict formats that can be used, -for example, the `yyyy/MM/dd HH:mm:ss||yyyy/MM/dd` format will parse -both `yyyy/MM/dd HH:mm:ss` and `yyyy/MM/dd`. The first format will also -act as the one that converts back from milliseconds to a string -representation. - -[float] -[[date-math]] -=== Date Math - -The `date` type supports using date math expression when using it in a -query/filter (mainly makes sense in `range` query/filter). - -The expression starts with an "anchor" date, which can be either `now` -or a date string (in the applicable format) ending with `||`. It can -then follow by a math expression, supporting `+`, `-` and `/` -(rounding). The units supported are `y` (year), `M` (month), `w` (week), -`d` (day), `h` (hour), `m` (minute), and `s` (second). - -Here are some samples: `now+1h`, `now+1h+1m`, `now+1h/d`, -`2012-01-01||+1M/d`. - -When doing `range` type searches with rounding, the value parsed -depends on whether the end of the range is inclusive or exclusive, and -whether the beginning or end of the range. Rounding up moves to the -last millisecond of the rounding scope, and rounding down to the -first millisecond of the rounding scope. The semantics work as follows: -* `gt` - round up, and use > that value (`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie excluding the entire month) -* `gte` - round D down, and use >= that value (`2014-11-18||/M` becomes `2014-11-01`, ie including the entire month) -* `lt` - round D down, and use < that value (`2014-11-18||/M` becomes `2014-11-01`, ie excluding the entire month) -* `lte` - round D up, and use <= that value(`2014-11-18||/M` becomes `2014-11-30T23:59:59.999`, ie including the entire month) - -[float] -[[built-in]] -=== Built In Formats - -Most of the below dates have a `strict` companion dates, which means, that -year, month and day parts of the week must have prepending zeros in order -to be valid. 
This means, that a date like `5/11/1` would not be valid, but -you would need to specify the full date, which would be `2005/11/01` in this -example. So instead of `date_optional_time` you would need to specify -`strict_date_optional_time`. - -The following tables lists all the defaults ISO formats supported: - -[cols="<,<",options="header",] -|======================================================================= -|Name |Description -|`basic_date`|A basic formatter for a full date as four digit year, two -digit month of year, and two digit day of month (yyyyMMdd). - -|`basic_date_time`|A basic formatter that combines a basic date and time, -separated by a 'T' (yyyyMMdd'T'HHmmss.SSSZ). - -|`basic_date_time_no_millis`|A basic formatter that combines a basic date -and time without millis, separated by a 'T' (yyyyMMdd'T'HHmmssZ). - -|`basic_ordinal_date`|A formatter for a full ordinal date, using a four -digit year and three digit dayOfYear (yyyyDDD). - -|`basic_ordinal_date_time`|A formatter for a full ordinal date and time, -using a four digit year and three digit dayOfYear -(yyyyDDD'T'HHmmss.SSSZ). - -|`basic_ordinal_date_time_no_millis`|A formatter for a full ordinal date -and time without millis, using a four digit year and three digit -dayOfYear (yyyyDDD'T'HHmmssZ). - -|`basic_time`|A basic formatter for a two digit hour of day, two digit -minute of hour, two digit second of minute, three digit millis, and time -zone offset (HHmmss.SSSZ). - -|`basic_time_no_millis`|A basic formatter for a two digit hour of day, -two digit minute of hour, two digit second of minute, and time zone -offset (HHmmssZ). - -|`basic_t_time`|A basic formatter for a two digit hour of day, two digit -minute of hour, two digit second of minute, three digit millis, and time -zone off set prefixed by 'T' ('T'HHmmss.SSSZ). 
- -|`basic_t_time_no_millis`|A basic formatter for a two digit hour of day, -two digit minute of hour, two digit second of minute, and time zone -offset prefixed by 'T' ('T'HHmmssZ). - -|`basic_week_date`|A basic formatter for a full date as four digit -weekyear, two digit week of weekyear, and one digit day of week -(xxxx'W'wwe). `strict_basic_week_date` is supported. - -|`basic_week_date_time`|A basic formatter that combines a basic weekyear -date and time, separated by a 'T' (xxxx'W'wwe'T'HHmmss.SSSZ). -`strict_basic_week_date_time` is supported. - -|`basic_week_date_time_no_millis`|A basic formatter that combines a basic -weekyear date and time without millis, separated by a 'T' -(xxxx'W'wwe'T'HHmmssZ). `strict_week_date_time` is supported. - -|`date`|A formatter for a full date as four digit year, two digit month -of year, and two digit day of month (yyyy-MM-dd). `strict_date` is supported. -_ -|`date_hour`|A formatter that combines a full date and two digit hour of -day. strict_date_hour` is supported. - - -|`date_hour_minute`|A formatter that combines a full date, two digit hour -of day, and two digit minute of hour. strict_date_hour_minute` is supported. - -|`date_hour_minute_second`|A formatter that combines a full date, two -digit hour of day, two digit minute of hour, and two digit second of -minute. `strict_date_hour_minute_second` is supported. - -|`date_hour_minute_second_fraction`|A formatter that combines a full -date, two digit hour of day, two digit minute of hour, two digit second -of minute, and three digit fraction of second -(yyyy-MM-dd'T'HH:mm:ss.SSS). `strict_date_hour_minute_second_fraction` is supported. - -|`date_hour_minute_second_millis`|A formatter that combines a full date, -two digit hour of day, two digit minute of hour, two digit second of -minute, and three digit fraction of second (yyyy-MM-dd'T'HH:mm:ss.SSS). -`strict_date_hour_minute_second_millis` is supported. 
- -|`date_optional_time`|a generic ISO datetime parser where the date is -mandatory and the time is optional. `strict_date_optional_time` is supported. - -|`date_time`|A formatter that combines a full date and time, separated by -a 'T' (yyyy-MM-dd'T'HH:mm:ss.SSSZZ). `strict_date_time` is supported. - -|`date_time_no_millis`|A formatter that combines a full date and time -without millis, separated by a 'T' (yyyy-MM-dd'T'HH:mm:ssZZ). -`strict_date_time_no_millis` is supported. - -|`hour`|A formatter for a two digit hour of day. `strict_hour` is supported. - -|`hour_minute`|A formatter for a two digit hour of day and two digit -minute of hour. `strict_hour_minute` is supported. - -|`hour_minute_second`|A formatter for a two digit hour of day, two digit -minute of hour, and two digit second of minute. -`strict_hour_minute_second` is supported. - -|`hour_minute_second_fraction`|A formatter for a two digit hour of day, -two digit minute of hour, two digit second of minute, and three digit -fraction of second (HH:mm:ss.SSS). -`strict_hour_minute_second_fraction` is supported. - -|`hour_minute_second_millis`|A formatter for a two digit hour of day, two -digit minute of hour, two digit second of minute, and three digit -fraction of second (HH:mm:ss.SSS). -`strict_hour_minute_second_millis` is supported. - -|`ordinal_date`|A formatter for a full ordinal date, using a four digit -year and three digit dayOfYear (yyyy-DDD). `strict_ordinal_date` is supported. - -|`ordinal_date_time`|A formatter for a full ordinal date and time, using -a four digit year and three digit dayOfYear (yyyy-DDD'T'HH:mm:ss.SSSZZ). -`strict_ordinal_date_time` is supported. - -|`ordinal_date_time_no_millis`|A formatter for a full ordinal date and -time without millis, using a four digit year and three digit dayOfYear -(yyyy-DDD'T'HH:mm:ssZZ). -`strict_ordinal_date_time_no_millis` is supported. 
- -|`time`|A formatter for a two digit hour of day, two digit minute of -hour, two digit second of minute, three digit fraction of second, and -time zone offset (HH:mm:ss.SSSZZ). `strict_time` is supported. - -|`time_no_millis`|A formatter for a two digit hour of day, two digit -minute of hour, two digit second of minute, and time zone offset -(HH:mm:ssZZ). `strict_time_no_millis` is supported. - -|`t_time`|A formatter for a two digit hour of day, two digit minute of -hour, two digit second of minute, three digit fraction of second, and -time zone offset prefixed by 'T' ('T'HH:mm:ss.SSSZZ). -`strict_t_time` is supported. - -|`t_time_no_millis`|A formatter for a two digit hour of day, two digit -minute of hour, two digit second of minute, and time zone offset -prefixed by 'T' ('T'HH:mm:ssZZ). `strict_t_time_no_millis` is supported. - -|`week_date`|A formatter for a full date as four digit weekyear, two -digit week of weekyear, and one digit day of week (xxxx-'W'ww-e). -`strict_week_date` is supported. - -|`week_date_time`|A formatter that combines a full weekyear date and -time, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ). -`strict_week_date_time` is supported. - -|`week_date_time_no_millis`|A formatter that combines a full weekyear date -and time without millis, separated by a 'T' (xxxx-'W'ww-e'T'HH:mm:ssZZ). -`strict_week_date_time` is supported. - -|`weekyear`|A formatter for a four digit weekyear. `strict_week_year` is supported. - -|`weekyear_week`|A formatter for a four digit weekyear and two digit week -of weekyear. `strict_weekyear_week` is supported. - -|`weekyear_week_day`|A formatter for a four digit weekyear, two digit week -of weekyear, and one digit day of week. `strict_weekyear_week_day` is supported. - -|`year`|A formatter for a four digit year. `strict_year` is supported. - -|`year_month`|A formatter for a four digit year and two digit month of -year. `strict_year_month` is supported. 
- -|`year_month_day`|A formatter for a four digit year, two digit month of -year, and two digit day of month. `strict_year_month_day` is supported. - -|`epoch_second`|A formatter for the number of seconds since the epoch. -Note, that this timestamp allows a max length of 10 chars, so dates -older than 1653 and 2286 are not supported. You should use a different -date formatter in that case. - -|`epoch_millis`|A formatter for the number of milliseconds since the epoch. -Note, that this timestamp allows a max length of 13 chars, so dates -older than 1653 and 2286 are not supported. You should use a different -date formatter in that case. -|======================================================================= - -[float] -[[custom]] -=== Custom Format - -Allows for a completely customizable date format explained -http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[here]. diff --git a/docs/reference/mapping/dynamic-mapping.asciidoc b/docs/reference/mapping/dynamic-mapping.asciidoc index 91ecd6b0c2d..0f445ac6152 100644 --- a/docs/reference/mapping/dynamic-mapping.asciidoc +++ b/docs/reference/mapping/dynamic-mapping.asciidoc @@ -1,73 +1,67 @@ -[[mapping-dynamic-mapping]] +[[dynamic-mapping]] == Dynamic Mapping -Default mappings allow generic mapping definitions to be automatically applied -to types that do not have mappings predefined. This is mainly done -thanks to the fact that the -<> and -namely the <> allow for schema-less dynamic addition of unmapped -fields. - -The default mapping definition is a plain mapping definition that is -embedded within the distribution: +One of the most important features of Elasticsearch is that it tries to get +out of your way and let you start exploring your data as quickly as possible. 
+To index a document, you don't have to first create an index, define a mapping +type, and define your fields -- you can just index a document and the index, +type, and fields will spring to life automatically: [source,js] -------------------------------------------------- -{ - "_default_" : { - } -} +PUT data/counters/1 <1> +{ "count": 5 } -------------------------------------------------- +// AUTOSENSE +<1> Creates the `data` index, the `counters` mapping type, and a field + called `count` with datatype `long`. -Pretty short, isn't it? Basically, everything is defaulted, especially the -dynamic nature of the root object mapping. The default mapping can be -overridden by specifying the `_default_` type when creating a new index. +The automatic detection and addition of new types and fields is called +_dynamic mapping_. The dynamic mapping rules can be customised to suit your +purposes with: -The dynamic creation of mappings for unmapped types can be completely -disabled by setting `index.mapper.dynamic` to `false`. +<>:: -The dynamic creation of fields within a type can be completely -disabled by setting the `dynamic` property of the type to `strict`. + Configure the base mapping to be used for new mapping types. -Here is a <> example that -disables dynamic field creation for a `tweet`: +<>:: -[source,js] --------------------------------------------------- -$ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d ' -{ - "tweet" : { - "dynamic": "strict", - "properties" : { - "message" : {"type" : "string", "store" : true } - } - } -} -' --------------------------------------------------- + The rules governing dynamic field detection. -Here is how we can change the default -<> used in the -root and inner object types: +<>:: + + Custom rules to configure the mapping for dynamically added fields. + +TIP: <> allow you to configure the default +mappings, settings, aliases, and warmers for new indices, whether created +automatically or explicitly. 
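+
+As a rough illustration of the tip above, an index template that carries
+default mappings might look like the sketch below. The template name
+`logs_template`, the `logs-*` pattern, and the chosen settings are
+hypothetical and used only for this example.
+
+[source,js]
+--------------------------------------------------
+PUT _template/logs_template <1>
+{
+  "template": "logs-*", <2>
+  "settings": {
+    "number_of_shards": 1
+  },
+  "mappings": {
+    "_default_": { <3>
+      "_all": {
+        "enabled": false
+      }
+    }
+  }
+}
+--------------------------------------------------
+// AUTOSENSE
+<1> `logs_template` is a hypothetical template name.
+<2> The template would apply to new indices whose names match `logs-*`
+    (an assumed naming scheme).
+<3> A `_default_` mapping that new mapping types in matching indices would
+    use as their base.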
-[source,js] --------------------------------------------------- -{ - "_default_" : { - "dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy", "date_optional_time"] - } -} --------------------------------------------------- [float] -=== Unmapped fields in queries +=== Disabling automatic type creation -Queries and filters can refer to fields that don't exist in a mapping. Whether this -is allowed is controlled by the `index.query.parse.allow_unmapped_fields` setting. -This setting defaults to `true`. Setting it to `false` will disallow the usage of -unmapped fields in queries. +Automatic type creation can be disabled by setting the `index.mapper.dynamic` +setting to `false`, either by setting the default value in the +`config/elasticsearch.yml` file, or per-index as an index setting: + +[source,js] +-------------------------------------------------- +PUT /_settings <1> +{ + "index.mapper.dynamic":false +} +-------------------------------------------------- +// AUTOSENSE +<1> Disable automatic type creation for all indices. + +Regardless of the value of this setting, types can still be added explicitly +when <> or with the +<> API. + + +include::dynamic/default-mapping.asciidoc[] + +include::dynamic/field-mapping.asciidoc[] + +include::dynamic/templates.asciidoc[] -When registering a new <> or creating -a <> then the `index.query.parse.allow_unmapped_fields` setting -is forcefully overwritten to disallowed unmapped fields. 
\ No newline at end of file diff --git a/docs/reference/mapping/dynamic/default-mapping.asciidoc b/docs/reference/mapping/dynamic/default-mapping.asciidoc new file mode 100644 index 00000000000..c1e1f8dec66 --- /dev/null +++ b/docs/reference/mapping/dynamic/default-mapping.asciidoc @@ -0,0 +1,82 @@ +[[default-mapping]] +=== `_default_` mapping + +The default mapping, which will be used as the base mapping for any new +mapping types, can be customised by adding a mapping type with the name +`_default_` to an index, either when +<> or later on with the +<> API. + + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "_default_": { <1> + "_all": { + "enabled": false + } + }, + "user": {}, <2> + "blogpost": { <3> + "_all": { + "enabled": true + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `_default_` mapping defaults the <> field to disabled. +<2> The `user` type inherits the settings from `_default_`. +<3> The `blogpost` type overrides the defaults and enables the <> field. + +While the `_default_` mapping can be updated after an index has been created, +the new defaults will only affect mapping types that are created afterwards. 
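One way to picture the inheritance in the `my_index` example above: the `_default_` mapping is a base, and each new type's own mapping is laid on top of it. The Python sketch below only illustrates that model. `deep_merge` and `resolve_type_mapping` are hypothetical helpers, and the recursive merge is an assumption, not Elasticsearch's actual merge code.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` recursively laid on top."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_type_mapping(mappings, type_name):
    """Hypothetical helper: effective mapping for a newly created type."""
    default = mappings.get("_default_", {})
    return deep_merge(default, mappings.get(type_name, {}))


# Same shape as the `my_index` request body in the hunk above.
mappings = {
    "_default_": {"_all": {"enabled": False}},
    "user": {},
    "blogpost": {"_all": {"enabled": True}},
}

user_mapping = resolve_type_mapping(mappings, "user")
blogpost_mapping = resolve_type_mapping(mappings, "blogpost")
```

In this model `user` ends up with `_all` disabled (inherited), while `blogpost` keeps its own override.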
+ +The `_default_` mapping can be used in conjunction with +<> to control dynamically created types +within automatically created indices: + + +[source,js] +-------------------------------------------------- +PUT _template/logging +{ + "template": "logs-*", <1> + "settings": { "number_of_shards": 1 }, <2> + "mappings": { + "_default_": { + "_all": { <3> + "enabled": false + }, + "dynamic_templates": [ + { + "strings": { <4> + "match_mapping_type": "string", + "mapping": { + "type": "string", + "fields": { + "raw": { + "type": "string", + "index": "not_analyzed", + "ignore_above": 256 + } + } + } + } + } + ] + } + } +} + +PUT logs-2015.10.01/event/1 +{ "message": "error:16" } +-------------------------------------------------- +// AUTOSENSE +<1> The `logging` template will match any indices beginning with `logs-`. +<2> Matching indices will be created with a single primary shard. +<3> The `_all` field will be disabled by default for new type mappings. +<4> String fields will be created with an `analyzed` main field, and a `not_analyzed` `.raw` field. diff --git a/docs/reference/mapping/dynamic/field-mapping.asciidoc b/docs/reference/mapping/dynamic/field-mapping.asciidoc new file mode 100644 index 00000000000..585931d5e3f --- /dev/null +++ b/docs/reference/mapping/dynamic/field-mapping.asciidoc @@ -0,0 +1,139 @@ +[[dynamic-field-mapping]] +=== Dynamic field mapping + +By default, when a previously unseen field is found in a document, +Elasticsearch will add the new field to the type mapping. This behaviour can +be disabled, both at the document and at the <> level, by +setting the <> parameter to `false` or to `strict`. + +Assuming `dynamic` field mapping is enabled, some simple rules are used to +determine which datatype the field should have: + +[horizontal] +*JSON datatype*:: *Elasticsearch datatype* + +`null`:: No field is added. 
+`true` or `false`:: <> field +floating{nbsp}point{nbsp}number:: <> field +integer:: <> field +object:: <> field +array:: Depends on the first non-`null` value in the array. +string:: Either a <> field + (if the value passes <>), + a <> or <> field + (if the value passes <>) + or an <> <> field. + +These are the only <> that are dynamically +detected. All other datatypes must be mapped explicitly. + +Besides the options listed below, dynamic field mapping rules can be further +customised with <>. + +[[date-detection]] +==== Date detection + +If `date_detection` is enabled (default), then new string fields are checked +to see whether their contents match any of the date patterns specified in +`dynamic_date_formats`. If a match is found, a new <> field is +added with the corresponding format. + +The default value for `dynamic_date_formats` is: + +[ <>,`"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`] + +For example: + + +[source,js] +-------------------------------------------------- +PUT my_index/my_type/1 +{ + "create_date": "2015/09/02" +} + +GET my_index/_mapping <1> +-------------------------------------------------- +// AUTOSENSE +<1> The `create_date` field has been added as a <> + field with the <>: + + `"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"`. + +===== Disabling date detection + +Dynamic date detection can be disabled by setting `date_detection` to `false`: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "date_detection": false + } + } +} + +PUT my_index/my_type/1 <1> +{ + "create_date": "2015/09/02" +} +-------------------------------------------------- +// AUTOSENSE + +<1> The `create_date` field has been added as a <> field.
+ +===== Customising detected date formats + +Alternatively, the `dynamic_date_formats` can be customised to support your +own <>: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic_date_formats": ["MM/dd/yyyy"] + } + } +} + +PUT my_index/my_type/1 +{ + "create_date": "09/25/2015" +} +-------------------------------------------------- +// AUTOSENSE + + +[[numeric-detection]] +==== Numeric detection + +While JSON has support for native floating point and integer datatypes, some +applications or languages may sometimes render numbers as strings. Usually the +correct solution is to map these fields explicitly, but numeric detection +(which is disabled by default) can be enabled to do this automatically: + + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "numeric_detection": true + } + } +} + +PUT my_index/my_type/1 +{ + "my_float": "1.0", <1> + "my_integer": "1" <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `my_float` field is added as a <> field. +<2> The `my_integer` field is added as a <> field. + diff --git a/docs/reference/mapping/dynamic/templates.asciidoc b/docs/reference/mapping/dynamic/templates.asciidoc new file mode 100644 index 00000000000..e38fc31cb37 --- /dev/null +++ b/docs/reference/mapping/dynamic/templates.asciidoc @@ -0,0 +1,251 @@ +[[dynamic-templates]] +=== Dynamic templates + +Dynamic templates allow you to define custom mappings that can be applied to +dynamically added fields based on: + +* the <> detected by Elasticsearch, with <>. +* the name of the field, with <> or <>. +* the full dotted path to the field, with <>. + +The original field name `{name}` and the detected datatype +`{dynamic_type}` <> can be used in +the mapping specification as placeholders.
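To make the placeholder idea concrete, here is a minimal Python sketch of how `{name}` and `{dynamic_type}` might be filled in. `fill_placeholders` is a hypothetical helper written for illustration, not Elasticsearch code, and as a simplification it only rewrites string values, not keys.

```python
import re

# Matches exactly the two placeholders described in the text.
_PLACEHOLDER = re.compile(r"\{(name|dynamic_type)\}")


def fill_placeholders(mapping, name, dynamic_type):
    """Recursively substitute {name} and {dynamic_type} in string values."""
    values = {"name": name, "dynamic_type": dynamic_type}
    if isinstance(mapping, dict):
        return {
            key: fill_placeholders(value, name, dynamic_type)
            for key, value in mapping.items()
        }
    if isinstance(mapping, list):
        return [fill_placeholders(item, name, dynamic_type) for item in mapping]
    if isinstance(mapping, str):
        # Single pass, so substituted text is never re-scanned.
        return _PLACEHOLDER.sub(lambda m: values[m.group(1)], mapping)
    return mapping  # booleans, numbers and None pass through unchanged


analyzed = fill_placeholders(
    {"type": "string", "analyzer": "{name}"}, name="english", dynamic_type="string"
)
no_doc_values = fill_placeholders(
    {"type": "{dynamic_type}", "doc_values": False}, name="count", dynamic_type="long"
)
```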
+ +IMPORTANT: Dynamic field mappings are only added when a field contains a +concrete value -- not `null` or an empty array. This means that if the +`null_value` option is used in a `dynamic_template`, it will only be applied +after the first document with a concrete value for the field has been +indexed. + +Dynamic templates are specified as an array of named objects: + +[source,js] +-------------------------------------------------- + "dynamic_templates": [ + { + "my_template_name": { <1> + ... match conditions ... <2> + "mapping": { ... } <3> + } + }, + ... + ] +-------------------------------------------------- +<1> The template name can be any string value. +<2> The match conditions can include any of: `match_mapping_type`, `match`, `match_pattern`, `unmatch`, `path_match`, `path_unmatch`. +<3> The mapping that the matched field should use. + + +Templates are processed in order -- the first matching template wins. New +templates can be appended to the end of the list with the +<> API. If a new template has the same +name as an existing template, it will replace the old version. + +[[match-mapping-type]] +==== `match_mapping_type` + +The `match_mapping_type` matches on the datatype detected by +<>, in other words, the datatype +that Elasticsearch thinks the field should have. Only the following datatypes +can be automatically detected: `boolean`, `date`, `double`, `long`, `object`, +`string`. It also accepts `*` to match all datatypes.
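The "first matching template wins" rule and `match_mapping_type` can be sketched together in a few lines of Python. This is an illustration under simplifying assumptions, not the real implementation: `detect_type` and `first_matching_template` are hypothetical names, date and numeric detection on strings is left out, arrays are not handled, and only the `match_mapping_type` condition is checked.

```python
def detect_type(value):
    """Simplified JSON-to-datatype detection (no date/numeric detection)."""
    if isinstance(value, bool):  # bool must be tested before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "string"
    return None  # null adds no field; arrays are out of scope for this sketch


def first_matching_template(dynamic_templates, value):
    """Return (template_name, mapping) for the first template that matches."""
    detected = detect_type(value)
    if detected is None:
        return None
    for entry in dynamic_templates:  # list order is the priority order
        for name, template in entry.items():
            wanted = template.get("match_mapping_type", "*")
            if wanted in ("*", detected):
                return name, template["mapping"]
    return None


dynamic_templates = [
    {"integers": {"match_mapping_type": "long", "mapping": {"type": "integer"}}},
    {"catch_all": {"match_mapping_type": "*", "mapping": {"doc_values": False}}},
]

integer_hit = first_matching_template(dynamic_templates, 5)
string_hit = first_matching_template(dynamic_templates, "Some string")
null_hit = first_matching_template(dynamic_templates, None)
```

Because `integers` comes first in the list, the value `5` never reaches `catch_all`, even though `*` would also match it.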
+ +For example, if we wanted to map all integer fields as `integer` instead of +`long`, and all `string` fields as both `analyzed` and `not_analyzed`, we +could use the following template: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic_templates": [ + { + "integers": { + "match_mapping_type": "long", + "mapping": { + "type": "integer" + } + } + }, + { + "strings": { + "match_mapping_type": "string", + "mapping": { + "type": "string", + "fields": { + "raw": { + "type": "string", + "index": "not_analyzed", + "ignore_above": 256 + } + } + } + } + } + ] + } + } +} + +PUT my_index/my_type/1 +{ + "my_integer": 5, <1> + "my_string": "Some string" <2> +} + +-------------------------------------------------- +// AUTOSENSE +<1> The `my_integer` field is mapped as an `integer`. +<2> The `my_string` field is mapped as an analyzed `string`, with a `not_analyzed` <>. + + +[[match-unmatch]] +==== `match` and `unmatch` + +The `match` parameter uses a pattern to match on the fieldname, while +`unmatch` uses a pattern to exclude fields matched by `match`. + +The following example matches all `string` fields whose name starts with +`long_` (except for those which end with `_text`) and maps them as `long` +fields: + + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic_templates": [ + { + "longs_as_strings": { + "match_mapping_type": "string", + "match": "long_*", + "unmatch": "*_text", + "mapping": { + "type": "long" + } + } + } + ] + } + } +} + +PUT my_index/my_type/1 +{ + "long_num": "5", <1> + "long_text": "foo" <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `long_num` field is mapped as a `long`. +<2> The `long_text` field uses the default `string` mapping. 
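The interaction of `match` and `unmatch` in the `longs_as_strings` example can be approximated in Python. Note the assumptions: `name_matches` is a hypothetical helper, and `fnmatchcase` from the standard library is only a stand-in for the simple wildcard matching described here (it also understands `?` and `[...]`, which goes beyond plain `*` wildcards).

```python
from fnmatch import fnmatchcase


def name_matches(field_name, match=None, unmatch=None):
    """True if the field name passes `match` and is not excluded by `unmatch`."""
    if match is not None and not fnmatchcase(field_name, match):
        return False
    if unmatch is not None and fnmatchcase(field_name, unmatch):
        return False
    return True


# Patterns taken from the `longs_as_strings` template above.
num_result = name_matches("long_num", match="long_*", unmatch="*_text")
text_result = name_matches("long_text", match="long_*", unmatch="*_text")
other_result = name_matches("short_num", match="long_*", unmatch="*_text")
```

`long_num` passes both conditions, `long_text` is excluded by `unmatch`, and `short_num` never satisfies `match`.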
+ +[[match-pattern]] +==== `match_pattern` + +The `match_pattern` parameter adjusts the behaviour of the `match` parameter +so that it supports full Java regular expression matching on the field name +instead of simple wildcards, for instance: + +[source,js] +-------------------------------------------------- + "match_pattern": "regex", + "match": "^profit_\d+$" +-------------------------------------------------- + +[[path-match-unmatch]] +==== `path_match` and `path_unmatch` + +The `path_match` and `path_unmatch` parameters work in the same way as `match` +and `unmatch`, but operate on the full dotted path to the field, not just the +final name, e.g. `some_object.*.some_field`. + +This example copies the values of any fields in the `name` object to the +top-level `full_name` field, except for the `middle` field: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic_templates": [ + { + "full_name": { + "path_match": "name.*", + "path_unmatch": "*.middle", + "mapping": { + "type": "string", + "copy_to": "full_name" + } + } + } + ] + } + } +} + +PUT my_index/my_type/1 +{ + "name": { + "first": "Alice", + "middle": "Mary", + "last": "White" + } +} +-------------------------------------------------- +// AUTOSENSE + +[[template-variables]] +==== `{name}` and `{dynamic_type}` + +The `{name}` and `{dynamic_type}` placeholders are replaced in the `mapping` +with the field name and detected dynamic type.
The following example sets all +string fields to use an <> with the same name as the +field, and disables <> for all non-string fields: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic_templates": [ + { + "named_analyzers": { + "match_mapping_type": "string", + "match": "*", + "mapping": { + "type": "string", + "analyzer": "{name}" + } + } + }, + { + "no_doc_values": { + "match_mapping_type":"*", + "mapping": { + "type": "{dynamic_type}", + "doc_values": false + } + } + } + ] + } + } +} + +PUT my_index/my_type/1 +{ + "english": "Some English text", <1> + "count": 5 <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `english` field is mapped as a `string` field with the `english` analyzer. +<2> The `count` field is mapped as a `long` field with `doc_values` disabled + diff --git a/docs/reference/mapping/fielddata_formats.asciidoc b/docs/reference/mapping/fielddata_formats.asciidoc deleted file mode 100644 index eda3566258e..00000000000 --- a/docs/reference/mapping/fielddata_formats.asciidoc +++ /dev/null @@ -1,257 +0,0 @@ -[[fielddata-formats]] -== Fielddata formats - -The field data format controls how field data should be stored. - -Depending on the field type, there might be several field data types -available. In particular, string, geo-point and numeric types support the `doc_values` -format which allows for computing the field data data-structures at indexing -time and storing them on disk. Although it will make the index larger and may -be slightly slower, this implementation will be more near-realtime-friendly -and will require much less memory from the JVM than other implementations. - -Here is an example of how to configure the `tag` field to use the `paged_bytes` field -data format. 
- -[source,js] --------------------------------------------------- -{ - "tag": { - "type": "string", - "fielddata": { - "format": "paged_bytes" - } - } -} --------------------------------------------------- - -It is possible to change the field data format (and the field data settings -in general) on a live index by using the update mapping API. - -[float] -=== String field data types - -`paged_bytes` (default on analyzed string fields):: - Stores unique terms sequentially in a large buffer and maps documents to - the indices of the terms they contain in this large buffer. - -`doc_values` (default when index is set to `not_analyzed`):: - Computes and stores field data data-structures on disk at indexing time. - Lowers memory usage but only works on non-analyzed strings (`index`: `no` or - `not_analyzed`). - -[float] -=== Numeric field data types - -`array`:: - Stores field values in memory using arrays. - -`doc_values` (default unless doc values are disabled):: - Computes and stores field data data-structures on disk at indexing time. - -[float] -=== Geo point field data types - -`array`:: - Stores latitudes and longitudes in arrays. - -`doc_values` (default unless doc values are disabled):: - Computes and stores field data data-structures on disk at indexing time. - -[float] -[[global-ordinals]] -=== Global ordinals - -Global ordinals is a data-structure on top of field data, that maintains an -incremental numbering for all the terms in field data in a lexicographic order. -Each term has a unique number and the number of term 'A' is lower than the number -of term 'B'. Global ordinals are only supported on string fields. - -Field data on string also has ordinals, which is a unique numbering for all terms -in a particular segment and field. Global ordinals just build on top of this, -by providing a mapping between the segment ordinals and the global ordinals. -The latter being unique across the entire shard. 
- -Global ordinals can be beneficial in search features that use segment ordinals already -such as the terms aggregator to improve the execution time. Often these search features -need to merge the segment ordinal results to a cross segment terms result. With -global ordinals this mapping happens during field data load time instead of during each -query execution. With global ordinals search features only need to resolve the actual -term when building the (shard) response, but during the execution there is no need -at all to use the actual terms and the unique numbering global ordinals provided is -sufficient and improves the execution time. - -Global ordinals for a specified field are tied to all the segments of a shard (Lucene index), -which is different than for field data for a specific field which is tied to a single segment. -For this reason global ordinals need to be rebuilt in its entirety once new segments -become visible. This one time cost would happen anyway without global ordinals, but -then it would happen for each search execution instead! - -The loading time of global ordinals depends on the number of terms in a field, but in general -it is low, since it source field data has already been loaded. The memory overhead of global -ordinals is a small because it is very efficiently compressed. Eager loading of global ordinals -can move the loading time from the first search request, to the refresh itself. - -[float] -[[fielddata-loading]] -=== Fielddata loading - -By default, field data is loaded lazily, ie. the first time that a query that -requires them is executed. However, this can make the first requests that -follow a merge operation quite slow since fielddata loading is a heavy -operation. 
- -It is possible to force field data to be loaded and cached eagerly through the -`loading` setting of fielddata: - -[source,js] --------------------------------------------------- -{ - "category": { - "type": "string", - "fielddata": { - "loading": "eager" - } - } -} --------------------------------------------------- - -Global ordinals can also be eagerly loaded: - -[source,js] --------------------------------------------------- -{ - "category": { - "type": "string", - "fielddata": { - "loading": "eager_global_ordinals" - } - } -} --------------------------------------------------- - -With the above setting both field data and global ordinals for a specific field -are eagerly loaded. - -[float] -==== Disabling field data loading - -Field data can take a lot of RAM so it makes sense to disable field data -loading on the fields that don't need field data, for example those that are -used for full-text search only. In order to disable field data loading, just -change the field data format to `disabled`. When disabled, all requests that -will try to load field data, e.g. when they include aggregations and/or sorting, -will return an error. - -[source,js] --------------------------------------------------- -{ - "text": { - "type": "string", - "fielddata": { - "format": "disabled" - } - } -} --------------------------------------------------- - -The `disabled` format is supported by all field types. - -[float] -[[field-data-filtering]] -=== Filtering fielddata - -It is possible to control which field values are loaded into memory, -which is particularly useful for string fields. When specifying the -<> for a field, you -can also specify a fielddata filter. - -Fielddata filters can be changed using the -<> -API. After changing the filters, use the -<> API -to reload the fielddata using the new filters. 
- -[float] -==== Filtering by frequency: - -The frequency filter allows you to only load terms whose frequency falls -between a `min` and `max` value, which can be expressed an absolute -number (when the number is bigger than 1.0) or as a percentage -(eg `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated -*per segment*. Percentages are based on the number of docs which have a -value for the field, as opposed to all docs in the segment. - -Small segments can be excluded completely by specifying the minimum -number of docs that the segment should contain with `min_segment_size`: - -[source,js] --------------------------------------------------- -{ - "tag": { - "type": "string", - "fielddata": { - "filter": { - "frequency": { - "min": 0.001, - "max": 0.1, - "min_segment_size": 500 - } - } - } - } -} --------------------------------------------------- - -[float] -==== Filtering by regex - -Terms can also be filtered by regular expression - only values which -match the regular expression are loaded. Note: the regular expression is -applied to each term in the field, not to the whole field value. 
For -instance, to only load hashtags from a tweet, we can use a regular -expression which matches terms beginning with `#`: - -[source,js] --------------------------------------------------- -{ - "tweet": { - "type": "string", - "analyzer": "whitespace" - "fielddata": { - "filter": { - "regex": { - "pattern": "^#.*" - } - } - } - } -} --------------------------------------------------- - -[float] -==== Combining filters - -The `frequency` and `regex` filters can be combined: - -[source,js] --------------------------------------------------- -{ - "tweet": { - "type": "string", - "analyzer": "whitespace" - "fielddata": { - "filter": { - "regex": { - "pattern": "^#.*", - }, - "frequency": { - "min": 0.001, - "max": 0.1, - "min_segment_size": 500 - } - } - } - } -} --------------------------------------------------- diff --git a/docs/reference/mapping/fields.asciidoc b/docs/reference/mapping/fields.asciidoc index e2d42a93558..dd267a84410 100644 --- a/docs/reference/mapping/fields.asciidoc +++ b/docs/reference/mapping/fields.asciidoc @@ -5,7 +5,8 @@ Each document has metadata associated with it, such as the `_index`, mapping <>, and `_id` meta-fields. The behaviour of some of these meta-fields can be customised when a mapping type is created. -The meta-fields are: +[float] +=== Identity meta-fields [horizontal] <>:: @@ -18,16 +19,26 @@ The meta-fields are: <>:: - The document's <>. + The document's <>. <>:: The document's ID. +[float] +=== Document source meta-fields + <>:: The original JSON representing the body of the document. +<>:: + + The size of the `_source` field in bytes. + +[float] +=== Indexing meta-fields + <>:: A _catch-all_ field that indexes the values of all other fields. @@ -36,18 +47,6 @@ The meta-fields are: All fields in the document which contain non-null values. -<>:: - - Used to create a parent-child relationship between two mapping types. - -<>:: - - A custom routing value which routes a document to a particular shard. 
- -<>:: - - The size of the `_source` field in bytes. - <>:: A timestamp associated with the document, either specified manually or auto-generated. @@ -56,27 +55,49 @@ The meta-fields are: How long a document should live before it is automatically deleted. -include::fields/index-field.asciidoc[] +[float] +=== Routing meta-fields -include::fields/uid-field.asciidoc[] +<>:: -include::fields/type-field.asciidoc[] + Used to create a parent-child relationship between two mapping types. + +<>:: + + A custom routing value which routes a document to a particular shard. + +[float] +=== Other meta-field + +<>:: + + Application specific metadata. -include::fields/id-field.asciidoc[] -include::fields/source-field.asciidoc[] include::fields/all-field.asciidoc[] include::fields/field-names-field.asciidoc[] +include::fields/id-field.asciidoc[] + +include::fields/index-field.asciidoc[] + +include::fields/meta-field.asciidoc[] + include::fields/parent-field.asciidoc[] include::fields/routing-field.asciidoc[] include::fields/size-field.asciidoc[] +include::fields/source-field.asciidoc[] + include::fields/timestamp-field.asciidoc[] include::fields/ttl-field.asciidoc[] +include::fields/type-field.asciidoc[] + +include::fields/uid-field.asciidoc[] + diff --git a/docs/reference/mapping/fields/all-field.asciidoc b/docs/reference/mapping/fields/all-field.asciidoc index d6037f4e804..00c8d3b245b 100644 --- a/docs/reference/mapping/fields/all-field.asciidoc +++ b/docs/reference/mapping/fields/all-field.asciidoc @@ -151,82 +151,18 @@ PUT my_index <1> The `_all` field is disabled for the `my_type` type. <2> The `query_string` query will default to querying the `content` field in this index. -[[include-in-all]] -==== Including specific fields in `_all` +[[excluding-from-all]] +==== Excluding fields from `_all` Individual fields can be included or excluded from the `_all` field with the -`include_in_all` setting, which defaults to `true`: +<> setting. 
-[source,js] --------------------------------- -PUT my_index -{ - "mappings": { - "my_type": { - "properties": { - "title": { <1> - "type": "string" - } - "content": { <1> - "type": "string" - }, - "date": { <2> - "type": "date", - "include_in_all": false - } - } - } - } -} --------------------------------- -// AUTOSENSE - -<1> The `title` and `content` fields with be included in the `_all` field. -<2> The `date` field will not be included in the `_all` field. - -The `include_in_all` parameter can also be set at the type level and on -<> or <> fields, -in which case all sub-fields inherit that setting. For instance: - -[source,js] --------------------------------- -PUT my_index -{ - "mappings": { - "my_type": { - "include_in_all": false, <1> - "properties": { - "title": { "type": "string" }, - "author": { - "include_in_all": true, <2> - "properties": { - "first_name": { "type": "string" }, - "last_name": { "type": "string" } - } - }, - "editor": { - "properties": { - "first_name": { "type": "string" }, <3> - "last_name": { "type": "string", "include_in_all": true } <3> - } - } - } - } - } -} --------------------------------- -// AUTOSENSE - -<1> All fields in `my_type` are excluded from `_all`. -<2> The `author.first_name` and `author.last_name` fields are included in `_all`. -<3> Only the `editor.last_name` field is included in `_all`. - The `editor.first_name` inherits the type-level setting and is excluded. [[all-field-and-boosting]] ==== Index boosting and the `_all` field -Individual fields can be _boosted_ at index time, with the `boost` parameter. -The `_all` field takes these boosts into account: +Individual fields can be _boosted_ at index time, with the <> +parameter. 
The `_all` field takes these boosts into account: [source,js] -------------------------------- diff --git a/docs/reference/mapping/fields/id-field.asciidoc b/docs/reference/mapping/fields/id-field.asciidoc index 3aa6b927128..994ff2ef3da 100644 --- a/docs/reference/mapping/fields/id-field.asciidoc +++ b/docs/reference/mapping/fields/id-field.asciidoc @@ -2,8 +2,8 @@ === `_id` field Each document indexed is associated with a <> (see -<>) and an <>. The -`_id` field is not indexed as its value can be derived automatically from the +<>) and an <>. The `_id` field is not +indexed as its value can be derived automatically from the <> field. The value of the `_id` field is accessible in queries and scripts, but _not_ diff --git a/docs/reference/mapping/fields/meta-field.asciidoc b/docs/reference/mapping/fields/meta-field.asciidoc new file mode 100644 index 00000000000..4bcab15b4c0 --- /dev/null +++ b/docs/reference/mapping/fields/meta-field.asciidoc @@ -0,0 +1,30 @@ +[[mapping-meta-field]] +=== `_meta` field + +Each mapping type can have custom meta data associated with it. These are not +used at all by Elasticsearch, but can be used to store application-specific +metadata, such as the class that a document belongs to: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "user": { + "_meta": { <1> + "class": "MyApp::User", + "version": { + "min": "1.0", + "max": "1.3" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> This `_meta` info can be retrieved with the + <> API. + +The `_meta` field can be updated on an existing type using the +<> API. diff --git a/docs/reference/mapping/fields/source-field.asciidoc b/docs/reference/mapping/fields/source-field.asciidoc index f33b4a72c82..767b6afe21e 100644 --- a/docs/reference/mapping/fields/source-field.asciidoc +++ b/docs/reference/mapping/fields/source-field.asciidoc @@ -78,8 +78,7 @@ stored. 
WARNING: Removing fields from the `_source` has similar downsides to disabling `_source`, especially the fact that you cannot reindex documents from one Elasticsearch index to another. Consider using -<> or a -<> instead. +<> instead. The `includes`/`excludes` parameters (which also accept wildcards) can be used as follows: diff --git a/docs/reference/mapping/fields/ttl-field.asciidoc b/docs/reference/mapping/fields/ttl-field.asciidoc index 446b1d96502..26bc3e74417 100644 --- a/docs/reference/mapping/fields/ttl-field.asciidoc +++ b/docs/reference/mapping/fields/ttl-field.asciidoc @@ -1,5 +1,5 @@ [[mapping-ttl-field]] -=== `_ttl` +=== `_ttl` field Some types of documents, such as session data or special offers, come with an expiration date. The `_ttl` field allows you to specify the minimum time a diff --git a/docs/reference/mapping/fields/type-field.asciidoc b/docs/reference/mapping/fields/type-field.asciidoc index bc6c578922d..c8f3817bcbf 100644 --- a/docs/reference/mapping/fields/type-field.asciidoc +++ b/docs/reference/mapping/fields/type-field.asciidoc @@ -2,8 +2,8 @@ === `_type` field Each document indexed is associated with a <> (see -<>) and an <>. The -`_type` field is indexed in order to make searching by type name fast. +<>) and an <>. The `_type` field is +indexed in order to make searching by type name fast. The value of the `_type` field is accessible in queries, aggregations, scripts, and when sorting: diff --git a/docs/reference/mapping/fields/uid-field.asciidoc b/docs/reference/mapping/fields/uid-field.asciidoc index a6dc6a9a27e..cab5f4f8216 100644 --- a/docs/reference/mapping/fields/uid-field.asciidoc +++ b/docs/reference/mapping/fields/uid-field.asciidoc @@ -2,8 +2,8 @@ === `_uid` field Each document indexed is associated with a <> (see -<>) and an <>. These -values are combined as `{type}#{id}` and indexed as the `_uid` field. +<>) and an <>. These values are +combined as `{type}#{id}` and indexed as the `_uid` field. 
The value of the `_uid` field is accessible in queries, aggregations, scripts, and when sorting: diff --git a/docs/reference/mapping/meta.asciidoc b/docs/reference/mapping/meta.asciidoc deleted file mode 100644 index 5cb0c14eaad..00000000000 --- a/docs/reference/mapping/meta.asciidoc +++ /dev/null @@ -1,25 +0,0 @@ -[[mapping-meta]] -== Meta - -Each mapping can have custom meta data associated with it. These are -simple storage elements that are simply persisted along with the mapping -and can be retrieved when fetching the mapping definition. The meta is -defined under the `_meta` element, for example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "_meta" : { - "attr1" : "value1", - "attr2" : { - "attr3" : "value3" - } - } - } -} --------------------------------------------------- - -Meta can be handy for example for client libraries that perform -serialization and deserialization to store its meta model (for example, -the class the document maps to). 
diff --git a/docs/reference/mapping/params.asciidoc b/docs/reference/mapping/params.asciidoc new file mode 100644 index 00000000000..119ce820ee4 --- /dev/null +++ b/docs/reference/mapping/params.asciidoc @@ -0,0 +1,100 @@ +[[mapping-params]] +== Mapping parameters + +The following pages provide detailed explanations of the various mapping +parameters that are used by <>: + + +The following mapping parameters are common to some or all field datatypes: + +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> + + +include::params/analyzer.asciidoc[] + +include::params/boost.asciidoc[] + +include::params/coerce.asciidoc[] + +include::params/copy-to.asciidoc[] + +include::params/doc-values.asciidoc[] + +include::params/dynamic.asciidoc[] + +include::params/enabled.asciidoc[] + +include::params/fielddata.asciidoc[] + +include::params/format.asciidoc[] + +include::params/geohash.asciidoc[] + +include::params/geohash-precision.asciidoc[] + +include::params/geohash-prefix.asciidoc[] + +include::params/ignore-above.asciidoc[] + +include::params/ignore-malformed.asciidoc[] + +include::params/include-in-all.asciidoc[] + +include::params/index.asciidoc[] + +include::params/index-options.asciidoc[] + +include::params/lat-lon.asciidoc[] + +include::params/multi-fields.asciidoc[] + +include::params/norms.asciidoc[] + +include::params/null-value.asciidoc[] + +include::params/position-offset-gap.asciidoc[] + +include::params/precision-step.asciidoc[] + +include::params/properties.asciidoc[] + +include::params/search-analyzer.asciidoc[] + +include::params/similarity.asciidoc[] + +include::params/store.asciidoc[] + +include::params/term-vector.asciidoc[] + + +[source,js] +-------------------------------------------------- +-------------------------------------------------- +// AUTOSENSE + diff --git a/docs/reference/mapping/params/analyzer.asciidoc 
b/docs/reference/mapping/params/analyzer.asciidoc new file mode 100644 index 00000000000..6c48ebd17d9 --- /dev/null +++ b/docs/reference/mapping/params/analyzer.asciidoc @@ -0,0 +1,80 @@ +[[analyzer]] +=== `analyzer` + +The values of <> string fields are passed through an +<> to convert the string into a stream of _tokens_ or +_terms_. For instance, the string `"The quick Brown Foxes."` may, depending +on which analyzer is used, be analyzed to the tokens: `quick`, `brown`, +`fox`. These are the actual terms that are indexed for the field, which makes +it possible to search efficiently for individual words _within_ big blobs of +text. + +This analysis process needs to happen not just at index time, but also at +query time: the query string needs to be passed through the same (or a +similar) analyzer so that the terms that it tries to find are in the same +format as those that exist in the index. + +Elasticsearch ships with a number of <>, +which can be used without further configuration. It also ships with many +<>, <>, +and <> which can be combined to configure +custom analyzers per index. + +Analyzers can be specified per-query, per-field or per-index. At index time, +Elasticsearch will look for an analyzer in this order: + +* The `analyzer` defined in the field mapping. +* An analyzer named `default` in the index settings. +* The <> analyzer. + +At query time, there are a few more layers: + +* The `analyzer` defined in a <>. +* The `search_analyzer` defined in the field mapping. +* The `analyzer` defined in the field mapping. +* An analyzer named `default_search` in the index settings. +* An analyzer named `default` in the index settings. +* The <> analyzer. 
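The two lookup orders listed above can be sketched in a few lines of Python. This is only an illustration of the precedence rules, not Elasticsearch code: the function and parameter names are hypothetical, and the final `standard` fallback is assumed from the callout in this file that describes the default analyzer.

```python
from typing import Optional


def _first_set(*candidates: Optional[str]) -> str:
    """Return the first candidate that is set, else the assumed default."""
    for name in candidates:
        if name is not None:
            return name
    # Assumption: the last entry in both lists is the `standard` analyzer.
    return "standard"


def resolve_index_analyzer(
    field_analyzer: Optional[str] = None,
    index_default: Optional[str] = None,
) -> str:
    """Index-time order: field mapping, then index `default`."""
    return _first_set(field_analyzer, index_default)


def resolve_search_analyzer(
    query_analyzer: Optional[str] = None,
    field_search_analyzer: Optional[str] = None,
    field_analyzer: Optional[str] = None,
    index_default_search: Optional[str] = None,
    index_default: Optional[str] = None,
) -> str:
    """Query-time order: query, `search_analyzer`, `analyzer`,
    index `default_search`, then index `default`."""
    return _first_set(
        query_analyzer,
        field_search_analyzer,
        field_analyzer,
        index_default_search,
        index_default,
    )
```

The point of the sketch is that a more specific setting always wins: a `search_analyzer` on the field is only consulted when the query itself does not name an analyzer.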
+ +The easiest way to specify an analyzer for a particular field is to define it +in the field mapping, as follows: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "text": { <1> + "type": "string", + "fields": { + "english": { <2> + "type": "string", + "analyzer": "english" + } + } + } + } + } + } +} + +GET my_index/_analyze?field=text <3> +{ + "text": "The quick Brown Foxes." +} + +GET my_index/_analyze?field=text.english <4> +{ + "text": "The quick Brown Foxes." +} +-------------------------------------------------- +// AUTOSENSE +<1> The `text` field uses the default `standard` analyzer. +<2> The `text.english` <> uses the `english` analyzer, which removes stop words and applies stemming. +<3> This returns the tokens: [ `the`, `quick`, `brown`, `foxes` ]. +<4> This returns the tokens: [ `quick`, `brown`, `fox` ]. + + + diff --git a/docs/reference/mapping/params/boost.asciidoc b/docs/reference/mapping/params/boost.asciidoc new file mode 100644 index 00000000000..b92e081ca72 --- /dev/null +++ b/docs/reference/mapping/params/boost.asciidoc @@ -0,0 +1,59 @@ +[[index-boost]] +=== `boost` + +Individual fields can be _boosted_ -- count more towards the relevance score +-- at index time, with the `boost` parameter as follows: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "title": { + "type": "string", + "boost": 2 <1> + }, + "content": { + "type": "string" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +<1> Matches on the `title` field will have twice the weight of those on the + `content` field, which has the default `boost` of `1.0`. + +Note that a `title` field will usually be shorter than a `content` field.
The +default relevance calculation takes field length into account, so a short +`title` field will have a higher natural boost than a long `content` field. + +[WARNING] +.Why index time boosting is a bad idea +================================================== + +We advise against using index time boosting for the following reasons: + +* You cannot change index-time `boost` values without reindexing all of your + documents. + +* Every query supports query-time boosting which achieves the same effect. The + difference is that you can tweak the `boost` value without having to reindex. + +* Index-time boosts are stored as part of the <>, which is only one + byte. This reduces the resolution of the field length normalization factor + which can lead to lower quality relevance calculations. + +================================================== + +The only advantage that index time boosting has is that it is copied with the +value into the <> field. This means that, when +querying the `_all` field, words that originated from the `title` field will +have a higher score than words that originated in the `content` field. +This functionality comes at a cost: queries on the `_all` field are slower +when index-time boosting is used. + diff --git a/docs/reference/mapping/params/coerce.asciidoc b/docs/reference/mapping/params/coerce.asciidoc new file mode 100644 index 00000000000..66add2aa990 --- /dev/null +++ b/docs/reference/mapping/params/coerce.asciidoc @@ -0,0 +1,89 @@ +[[coerce]] +=== `coerce` + +Data is not always clean. Depending on how it is produced a number might be +rendered in the JSON body as a true JSON number, e.g. `5`, but it might also +be rendered as a string, e.g. `"5"`. Alternatively, a number that should be +an integer might instead be rendered as a floating point, e.g. `5.0`, or even +`"5.0"`. + +Coercion attempts to clean up dirty values to fit the datatype of a field. +For instance: + +* Strings will be coerced to numbers. 
+* Floating points will be truncated for integer values. +* Lon/lat geo-points will be normalized to a standard -180:180 / -90:90 coordinate system. + +For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "number_one": { + "type": "integer" + }, + "number_two": { + "type": "integer", + "coerce": false + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "number_one": "10" <1> +} + +PUT my_index/my_type/2 +{ + "number_two": "10" <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `number_one` field will contain the integer `10`. +<2> This document will be rejected because coercion is disabled. + +[[coerce-setting]] +==== Index-level default + +The `index.mapping.coerce` setting can be set on the index level to disable +coercion globally across all mapping types: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "settings": { + "index.mapping.coerce": false + }, + "mappings": { + "my_type": { + "properties": { + "number_one": { + "type": "integer" + }, + "number_two": { + "type": "integer", + "coerce": true + } + } + } + } +} + +PUT my_index/my_type/1 +{ "number_one": "10" } <1> + +PUT my_index/my_type/2 +{ "number_two": "10" } <2> +-------------------------------------------------- +// AUTOSENSE +<1> This document will be rejected because the `number_one` field inherits the index-level coercion setting. +<2> The `number_two` field overrides the index level setting to enable coercion. + diff --git a/docs/reference/mapping/params/copy-to.asciidoc b/docs/reference/mapping/params/copy-to.asciidoc new file mode 100644 index 00000000000..b437a87424a --- /dev/null +++ b/docs/reference/mapping/params/copy-to.asciidoc @@ -0,0 +1,64 @@ +[[copy-to]] +=== `copy_to` + +The `copy_to` parameter allows you to create custom +<> fields. 
In other words, the values of multiple +fields can be copied into a group field, which can then be queried as a single +field. For instance, the `first_name` and `last_name` fields can be copied to +the `full_name` field as follows: + +[source,js] +-------------------------------------------------- +PUT /my_index +{ + "mappings": { + "my_type": { + "properties": { + "first_name": { + "type": "string", + "copy_to": "full_name" <1> + }, + "last_name": { + "type": "string", + "copy_to": "full_name" <1> + }, + "full_name": { + "type": "string" + } + } + } + } +} + +PUT /my_index/my_type/1 +{ + "first_name": "John", + "last_name": "Smith" +} + +GET /my_index/_search +{ + "query": { + "match": { + "full_name": { <2> + "query": "John Smith", + "operator": "and" + } + } + } +} + +-------------------------------------------------- +// AUTOSENSE +<1> The values of the `first_name` and `last_name` fields are copied to the + `full_name` field. + +<2> The `first_name` and `last_name` fields can still be queried for the + first name and last name respectively, but the `full_name` field can be + queried for both first and last names. + +Some important points: + +* It is the field _value_ which is copied, not the terms (which result from the analysis process). +* The original <> field will not be modified to show the copied values. +* The same value can be copied to multiple fields, with `"copy_to": [ "field_1", "field_2" ]` diff --git a/docs/reference/mapping/params/doc-values.asciidoc b/docs/reference/mapping/params/doc-values.asciidoc new file mode 100644 index 00000000000..4a495e6be92 --- /dev/null +++ b/docs/reference/mapping/params/doc-values.asciidoc @@ -0,0 +1,46 @@ +[[doc-values]] +=== `doc_values` + +Most fields are <> by default, which makes them +searchable. The inverted index allows queries to look up the search term in a +unique sorted list of terms, and from that immediately have access to the list +of documents that contain the term.
+ +Sorting, aggregations, and access to field values in scripts require a +different data access pattern. Instead of looking up the term and finding +documents, we need to be able to look up the document and find the terms that +it has in a field. + +Doc values are the on-disk data structure, built at document index time, which +makes this data access pattern possible. Doc values are supported on almost +all field types, with the __notable exception of `analyzed` string fields__. + +All fields which support doc values have them enabled by default. If you are +sure that you don't need to sort or aggregate on a field, or access the field +value from a script, you can disable doc values in order to save disk space: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "status_code": { <1> + "type": "string", + "index": "not_analyzed" + }, + "session_id": { <2> + "type": "string", + "index": "not_analyzed", + "doc_values": false + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `status_code` field has `doc_values` enabled by default. +<2> The `session_id` has `doc_values` disabled, but can still be queried. + diff --git a/docs/reference/mapping/params/dynamic.asciidoc b/docs/reference/mapping/params/dynamic.asciidoc new file mode 100644 index 00000000000..3ea3c9cb922 --- /dev/null +++ b/docs/reference/mapping/params/dynamic.asciidoc @@ -0,0 +1,87 @@ +[[dynamic]] +=== `dynamic` + +By default, fields can be added _dynamically_ to a document, or to +<> within a document, just by indexing a document +containing the new field.
For instance: + +[source,js] +-------------------------------------------------- +DELETE my_index <1> + +PUT my_index/my_type/1 <2> +{ + "username": "johnsmith", + "name": { + "first": "John", + "last": "Smith" + } +} + +GET my_index/_mapping <3> + +PUT my_index/my_type/2 <4> +{ + "username": "marywhite", + "email": "mary@white.com", + "name": { + "first": "Mary", + "middle": "Alice", + "last": "White" + } +} + +GET my_index/_mapping <5> +-------------------------------------------------- +// AUTOSENSE +<1> First delete the index, in case it already exists. +<2> This document introduces the string field `username`, the object field + `name`, and two string fields under the `name` object which can be + referred to as `name.first` and `name.last`. +<3> Check the mapping to verify the above. +<4> This document adds two string fields: `email` and `name.middle`. +<5> Check the mapping to verify the changes. + +The details of how new fields are detected and added to the mapping are explained in <>. + +The `dynamic` setting controls whether new fields can be added dynamically or +not. It accepts three settings: + +[horizontal] +`true`:: Newly detected fields are added to the mapping. (default) +`false`:: Newly detected fields are ignored. New fields must be added explicitly. +`strict`:: If new fields are detected, an exception is thrown and the document is rejected. + +The `dynamic` setting may be set at the mapping type level, and on each +<>. Inner objects inherit the setting from their parent +object or from the mapping type.
For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "dynamic": false, <1> + "properties": { + "user": { <2> + "properties": { + "name": { + "type": "string" + }, + "social_networks": { <3> + "dynamic": true, + "properties": {} + } + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Dynamic mapping is disabled at the type level, so no new top-level fields will be added dynamically. +<2> The `user` object inherits the type-level setting. +<3> The `user.social_networks` object enables dynamic mapping, so new fields may be added to this inner object. + diff --git a/docs/reference/mapping/params/enabled.asciidoc b/docs/reference/mapping/params/enabled.asciidoc new file mode 100644 index 00000000000..d9d162e358e --- /dev/null +++ b/docs/reference/mapping/params/enabled.asciidoc @@ -0,0 +1,94 @@ +[[enabled]] +=== `enabled` + +Elasticsearch tries to index all of the fields you give it, but sometimes you +want to just store the field without indexing it. For instance, imagine that +you are using Elasticsearch as a web session store. You may want to index the +session ID and last update time, but you don't need to query or run +aggregations on the session data itself. + +The `enabled` setting, which can be applied only to the mapping type and to +<> fields, causes Elasticsearch to skip parsing of the +contents of the field entirely. 
The JSON can still be retrieved from the +<> field, but it is not searchable or stored +in any other way: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "session": { + "properties": { + "user_id": { + "type": "string", + "index": "not_analyzed" + }, + "last_updated": { + "type": "date" + }, + "session_data": { <1> + "enabled": false + } + } + } + } +} + +PUT my_index/session/session_1 +{ + "user_id": "kimchy", + "session_data": { <2> + "arbitrary_object": { + "some_array": [ "foo", "bar", { "baz": 2 } ] + } + }, + "last_updated": "2015-12-06T18:20:22" +} + +PUT my_index/session/session_2 +{ + "user_id": "jpountz", + "session_data": "none", <3> + "last_updated": "2015-12-06T18:22:13" +} +-------------------------------------------------- +// AUTOSENSE +<1> The `session_data` field is disabled. +<2> Any arbitrary data can be passed to the `session_data` field as it will be entirely ignored. +<3> The `session_data` will also ignore values that are not JSON objects. + +The entire mapping type may be disabled as well, in which case the document is +stored in the <> field, which means it can be +retrieved, but none of its contents are indexed in any way: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "session": { <1> + "enabled": false + } + } +} + +PUT my_index/session/session_1 +{ + "user_id": "kimchy", + "session_data": { + "arbitrary_object": { + "some_array": [ "foo", "bar", { "baz": 2 } ] + } + }, + "last_updated": "2015-12-06T18:20:22" +} + +GET my_index/session/session_1 <2> + +GET my_index/_mapping <3> +-------------------------------------------------- +// AUTOSENSE +<1> The entire `session` mapping type is disabled. +<2> The document can be retrieved. +<3> Checking the mapping reveals that no fields have been added. 
diff --git a/docs/reference/mapping/params/fielddata.asciidoc b/docs/reference/mapping/params/fielddata.asciidoc new file mode 100644 index 00000000000..33dd7156d98 --- /dev/null +++ b/docs/reference/mapping/params/fielddata.asciidoc @@ -0,0 +1,225 @@ +[[fielddata]] +=== `fielddata` + +Most fields are <> by default, which makes them +searchable. The inverted index allows queries to look up the search term in a +unique sorted list of terms, and from that immediately have access to the list +of documents that contain the term. + +Sorting, aggregations, and access to field values in scripts require a +different data access pattern. Instead of looking up the term and finding +documents, we need to be able to look up the document and find the terms that +it has in a field. + +Most fields can use index-time, on-disk <> to support +this type of data access pattern, but `analyzed` string fields do not support +`doc_values`. + +Instead, `analyzed` strings use a query-time data structure called +`fielddata`. This data structure is built on demand the first time that a +field is used for aggregations, sorting, or is accessed in a script. It is built +by reading the entire inverted index for each segment from disk, inverting the +term ↔︎ document relationship, and storing the result in memory, in the +JVM heap. + + +Loading fielddata is an expensive process so, once it has been loaded, it +remains in memory for the lifetime of the segment. + +[WARNING] +.Fielddata can fill up your heap space +============================================================================== +Fielddata can consume a lot of heap space, especially when loading high +cardinality `analyzed` string fields. Most of the time, it doesn't make sense +to sort or aggregate on `analyzed` string fields (with the notable exception +of the +<> +aggregation). Always think about whether a `not_analyzed` field (which can +use `doc_values`) would be a better fit for your use case.
+============================================================================== + +[[fielddata-format]] +==== `fielddata.format` + +For `analyzed` string fields, the fielddata `format` controls whether +fielddata should be enabled or not. It accepts: `disabled` and `paged_bytes` +(enabled, which is the default). To disable fielddata loading, you can use +the following mapping: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "text": { + "type": "string", + "fielddata": { + "format": "disabled" <1> + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `text` field cannot be used for sorting, aggregations, or in scripts. + +.Fielddata and other datatypes +[NOTE] +================================================== + +Historically, other field datatypes also used fielddata, but this has been replaced +by index-time, disk-based <>. + +================================================== + + +[[fielddata-loading]] +==== `fielddata.loading` + +This per-field setting controls when fielddata is loaded into memory. It +accepts three options: + +[horizontal] +`lazy`:: + + Fielddata is only loaded into memory when it is needed. (default) + +`eager`:: + + Fielddata is loaded into memory before a new search segment becomes + visible to search. This can reduce the latency that a user may experience + if their search request has to trigger lazy loading from a big segment. + +`eager_global_ordinals`:: + + Loading fielddata into memory is only part of the work that is required. + After loading the fielddata for each segment, Elasticsearch builds the + <> data structure to make a list of all unique terms + across all the segments in a shard. By default, global ordinals are built + lazily. If the field has a very high cardinality, global ordinals may + take some time to build, in which case you can use eager loading instead. 
+ +[[global-ordinals]] +.Global ordinals +***************************************** + +Global ordinals are a data structure on top of fielddata and doc values that +maintains an incremental numbering for each unique term in +lexicographic order. Each term has a unique number and the number of term 'A' is lower than +the number of term 'B'. Global ordinals are only supported on string fields. + +Fielddata and doc values also have ordinals, which are a unique numbering for all terms +in a particular segment and field. Global ordinals just build on top of this, +by providing a mapping between the segment ordinals and the global ordinals, +the latter being unique across the entire shard. + +Global ordinals are used for features that use segment ordinals, such as +sorting and the terms aggregation, to improve the execution time. A terms +aggregation relies purely on global ordinals to perform the aggregation at the +shard level, then converts global ordinals to the real term only for the final +reduce phase, which combines results from different shards. + +Global ordinals for a specified field are tied to _all the segments of a +shard_, while fielddata and doc values ordinals are tied to a single segment. +For this reason global ordinals need to be entirely rebuilt +whenever a new segment becomes visible. + +The loading time of global ordinals depends on the number of terms in a field, but in general +it is low, since the source fielddata has already been loaded. The memory overhead of global +ordinals is small because they are very efficiently compressed. Eager loading of global ordinals +can move the loading time from the first search request to the refresh itself.
+ +***************************************** + +[[field-data-filtering]] +==== `fielddata.filter` + +Fielddata filtering can be used to reduce the number of terms loaded into +memory, and thus reduce memory usage. Terms can be filtered by _frequency_ or +by _regular expression_, or a combination of the two: + +Filtering by frequency:: ++ +-- + +The frequency filter allows you to only load terms whose term frequency falls +between a `min` and `max` value, which can be expressed as an absolute +number (when the number is bigger than 1.0) or as a percentage +(e.g. `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated +*per segment*. Percentages are based on the number of docs which have a +value for the field, as opposed to all docs in the segment. + +Small segments can be excluded completely by specifying the minimum +number of docs that the segment should contain with `min_segment_size`: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "tag": { + "type": "string", + "fielddata": { + "filter": { + "frequency": { + "min": 0.001, + "max": 0.1, + "min_segment_size": 500 + } + } + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +-- + +Filtering by regex:: ++ +-- +Terms can also be filtered by regular expression - only values which +match the regular expression are loaded. Note: the regular expression is +applied to each term in the field, not to the whole field value.
For +instance, to only load hashtags from a tweet, we can use a regular +expression which matches terms beginning with `#`: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "tweet": { + "type": "string", + "analyzer": "whitespace", + "fielddata": { + "filter": { + "regex": { + "pattern": "^#.*" + } + } + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +-- + +These filters can be updated on an existing field mapping and will take +effect the next time the fielddata for a segment is loaded. Use the +<> API +to reload the fielddata using the new filters. diff --git a/docs/reference/mapping/params/format.asciidoc b/docs/reference/mapping/params/format.asciidoc new file mode 100644 index 00000000000..a740858721e --- /dev/null +++ b/docs/reference/mapping/params/format.asciidoc @@ -0,0 +1,281 @@ +[[mapping-date-format]] +=== `format` + +In JSON documents, dates are represented as strings. Elasticsearch uses a set +of preconfigured formats to recognize and parse these strings into a long +value representing _milliseconds-since-the-epoch_ in UTC. + +Besides the <>, your own +<> can be specified using the familiar +`yyyy/MM/dd` syntax: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "date": { + "type": "date", + "format": "yyyy-MM-dd" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +Many APIs which support date values also support <> +expressions, such as `now-1M/d` -- the current time, minus one month, rounded +down to the nearest day. + +[[custom-date-formats]] +==== Custom date formats + +Completely customizable date formats are supported. The syntax for these is explained +http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html[in the Joda docs].
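As a rough illustration of what parsing a date string into _milliseconds-since-the-epoch_ in UTC means, here is a minimal Python sketch. It is not how Elasticsearch implements parsing: the function name is hypothetical, and Python's `strptime` directives (`%Y-%m-%d`) stand in for the Joda pattern `yyyy-MM-dd`, which uses a different syntax.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str, pattern: str = "%Y-%m-%d") -> int:
    """Parse `value` with `pattern` and return milliseconds since the epoch.

    Assumption: the input carries no zone information and is treated as UTC.
    """
    parsed = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)
```

A string that does not match the pattern raises `ValueError` in this sketch, which loosely mirrors a document being rejected when its date does not match the configured `format`.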
+ +[[built-in-date-formats]] +==== Built In Formats + +Most of the formats below have a `strict` companion format, which means that +the year, month and day parts must have leading zeros in order +to be valid. This means that a date like `5/11/1` would not be valid; +you would need to specify the full date, which would be `2005/11/01` in this +example. So instead of `date_optional_time` you would need to specify +`strict_date_optional_time`. + +The following lists all the default ISO formats supported: + +`epoch_millis`:: + + A formatter for the number of milliseconds since the epoch. Note that + this timestamp allows a max length of 13 chars, so dates before 1653 + or after 2286 are not supported. You should use a different date formatter in + that case. + +`epoch_second`:: + + A formatter for the number of seconds since the epoch. Note that this + timestamp allows a max length of 10 chars, so dates before 1653 or after + 2286 are not supported. You should use a different date formatter in that + case. + +[[strict-date-time]]`date_optional_time` or `strict_date_optional_time`:: + + A generic ISO datetime parser where the date is mandatory and the time is + optional. + http://www.joda.org/joda-time/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateOptionalTimeParser--[Full details here]. + +`basic_date`:: + + A basic formatter for a full date as four digit year, two digit month of + year, and two digit day of month: `yyyyMMdd`. + +`basic_date_time`:: + + A basic formatter that combines a basic date and time, separated by a 'T': + `yyyyMMdd'T'HHmmss.SSSZ`. + +`basic_date_time_no_millis`:: + + A basic formatter that combines a basic date and time without millis, + separated by a 'T': `yyyyMMdd'T'HHmmssZ`. + +`basic_ordinal_date`:: + + A formatter for a full ordinal date, using a four digit year and three + digit dayOfYear: `yyyyDDD`.
+ +`basic_ordinal_date_time`:: + + A formatter for a full ordinal date and time, using a four digit year and + three digit dayOfYear: `yyyyDDD'T'HHmmss.SSSZ`. + +`basic_ordinal_date_time_no_millis`:: + + A formatter for a full ordinal date and time without millis, using a four + digit year and three digit dayOfYear: `yyyyDDD'T'HHmmssZ`. + +`basic_time`:: + + A basic formatter for a two digit hour of day, two digit minute of hour, + two digit second of minute, three digit millis, and time zone offset: + `HHmmss.SSSZ`. + +`basic_time_no_millis`:: + + A basic formatter for a two digit hour of day, two digit minute of hour, + two digit second of minute, and time zone offset: `HHmmssZ`. + +`basic_t_time`:: + + A basic formatter for a two digit hour of day, two digit minute of hour, + two digit second of minute, three digit millis, and time zone offset + prefixed by 'T': `'T'HHmmss.SSSZ`. + +`basic_t_time_no_millis`:: + + A basic formatter for a two digit hour of day, two digit minute of hour, + two digit second of minute, and time zone offset prefixed by 'T': + `'T'HHmmssZ`. + +`basic_week_date` or `strict_basic_week_date`:: + + A basic formatter for a full date as four digit weekyear, two digit week + of weekyear, and one digit day of week: `xxxx'W'wwe`. + +`basic_week_date_time` or `strict_basic_week_date_time`:: + + A basic formatter that combines a basic weekyear date and time, separated + by a 'T': `xxxx'W'wwe'T'HHmmss.SSSZ`. + +`basic_week_date_time_no_millis` or `strict_basic_week_date_time_no_millis`:: + + A basic formatter that combines a basic weekyear date and time without + millis, separated by a 'T': `xxxx'W'wwe'T'HHmmssZ`. + +`date` or `strict_date`:: + + A formatter for a full date as four digit year, two digit month of year, + and two digit day of month: `yyyy-MM-dd`. + +`date_hour` or `strict_date_hour`:: + + A formatter that combines a full date and two digit hour of day.
+ +`date_hour_minute` or `strict_date_hour_minute`:: + + A formatter that combines a full date, two digit hour of day, and two + digit minute of hour. + +`date_hour_minute_second` or `strict_date_hour_minute_second`:: + + A formatter that combines a full date, two digit hour of day, two digit + minute of hour, and two digit second of minute. + +`date_hour_minute_second_fraction` or `strict_date_hour_minute_second_fraction`:: + + A formatter that combines a full date, two digit hour of day, two digit + minute of hour, two digit second of minute, and three digit fraction of + second: `yyyy-MM-dd'T'HH:mm:ss.SSS`. + +`date_hour_minute_second_millis` or `strict_date_hour_minute_second_millis`:: + + A formatter that combines a full date, two digit hour of day, two digit + minute of hour, two digit second of minute, and three digit fraction of + second: `yyyy-MM-dd'T'HH:mm:ss.SSS`. + +`date_time` or `strict_date_time`:: + + A formatter that combines a full date and time, separated by a 'T': + `yyyy-MM-dd'T'HH:mm:ss.SSSZZ`. + +`date_time_no_millis` or `strict_date_time_no_millis`:: + + A formatter that combines a full date and time without millis, separated + by a 'T': `yyyy-MM-dd'T'HH:mm:ssZZ`. + +`hour` or `strict_hour`:: + + A formatter for a two digit hour of day. + +`hour_minute` or `strict_hour_minute`:: + + A formatter for a two digit hour of day and two digit minute of hour. + +`hour_minute_second` or `strict_hour_minute_second`:: + + A formatter for a two digit hour of day, two digit minute of hour, and two + digit second of minute. + +`hour_minute_second_fraction` or `strict_hour_minute_second_fraction`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`.
+ +`hour_minute_second_millis` or `strict_hour_minute_second_millis`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, and three digit fraction of second: `HH:mm:ss.SSS`. + +`ordinal_date` or `strict_ordinal_date`:: + + A formatter for a full ordinal date, using a four digit year and three + digit dayOfYear: `yyyy-DDD`. + +`ordinal_date_time` or `strict_ordinal_date_time`:: + + A formatter for a full ordinal date and time, using a four digit year and + three digit dayOfYear: `yyyy-DDD'T'HH:mm:ss.SSSZZ`. + +`ordinal_date_time_no_millis` or `strict_ordinal_date_time_no_millis`:: + + A formatter for a full ordinal date and time without millis, using a four + digit year and three digit dayOfYear: `yyyy-DDD'T'HH:mm:ssZZ`. + +`time` or `strict_time`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, three digit fraction of second, and time zone + offset: `HH:mm:ss.SSSZZ`. + +`time_no_millis` or `strict_time_no_millis`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, and time zone offset: `HH:mm:ssZZ`. + +`t_time` or `strict_t_time`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, three digit fraction of second, and time zone + offset prefixed by 'T': `'T'HH:mm:ss.SSSZZ`. + +`t_time_no_millis` or `strict_t_time_no_millis`:: + + A formatter for a two digit hour of day, two digit minute of hour, two + digit second of minute, and time zone offset prefixed by 'T': `'T'HH:mm:ssZZ`. + +`week_date` or `strict_week_date`:: + + A formatter for a full date as four digit weekyear, two digit week of + weekyear, and one digit day of week: `xxxx-'W'ww-e`. + +`week_date_time` or `strict_week_date_time`:: + + A formatter that combines a full weekyear date and time, separated by a + 'T': `xxxx-'W'ww-e'T'HH:mm:ss.SSSZZ`. 
+ +`week_date_time_no_millis` or `strict_week_date_time_no_millis`:: + + A formatter that combines a full weekyear date and time without millis, + separated by a 'T': `xxxx-'W'ww-e'T'HH:mm:ssZZ`. + +`weekyear` or `strict_weekyear`:: + + A formatter for a four digit weekyear. + +`weekyear_week` or `strict_weekyear_week`:: + + A formatter for a four digit weekyear and two digit week of weekyear. + +`weekyear_week_day` or `strict_weekyear_week_day`:: + + A formatter for a four digit weekyear, two digit week of weekyear, and one + digit day of week. + +`year` or `strict_year`:: + + A formatter for a four digit year. + +`year_month` or `strict_year_month`:: + + A formatter for a four digit year and two digit month of year. + +`year_month_day` or `strict_year_month_day`:: + + A formatter for a four digit year, two digit month of year, and two digit + day of month. + diff --git a/docs/reference/mapping/params/geohash-precision.asciidoc b/docs/reference/mapping/params/geohash-precision.asciidoc new file mode 100644 index 00000000000..c571a70a9de --- /dev/null +++ b/docs/reference/mapping/params/geohash-precision.asciidoc @@ -0,0 +1,60 @@ +[[geohash-precision]] +=== `geohash_precision` + +Geohashes are a form of lat/lon encoding which divides the earth up into +a grid. Each cell in this grid is represented by a geohash string. Each +cell in turn can be further subdivided into smaller cells which are +represented by a longer string. So the longer the geohash, the smaller +(and thus more accurate) the cell is. + +The `geohash_precision` setting controls the length of the geohash that is +indexed when the <> option is enabled, and the maximum +geohash length when the <> option is enabled. + +It accepts: + +* a number between 1 and 12 (default), which represents the length of the geohash. +* a <>, e.g. `1km`. + +If a distance is specified, it will be translated to the smallest +geohash-length that will provide the requested resolution. 
+ +For example: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "location": { + "type": "geo_point", + "geohash_prefix": true, + "geohash_precision": 6 <1> + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "location": { + "lat": 41.12, + "lon": -71.34 + } +} + +GET my_index/_search?fielddata_fields=location.geohash +{ + "query": { + "term": { + "location.geohash": "drm3bt" + } + } +} + +-------------------------------------------------- +// AUTOSENSE +<1> A `geohash_precision` of 6 equates to geohash cells of approximately 1.26km x 0.6km diff --git a/docs/reference/mapping/params/geohash-prefix.asciidoc b/docs/reference/mapping/params/geohash-prefix.asciidoc new file mode 100644 index 00000000000..da4708f89b0 --- /dev/null +++ b/docs/reference/mapping/params/geohash-prefix.asciidoc @@ -0,0 +1,64 @@ +[[geohash-prefix]] +=== `geohash_prefix` + +Geohashes are a form of lat/lon encoding which divides the earth up into +a grid. Each cell in this grid is represented by a geohash string. Each +cell in turn can be further subdivided into smaller cells which are +represented by a longer string. So the longer the geohash, the smaller +(and thus more accurate) the cell is. + +While the <> option enables indexing the geohash that +corresponds to the lat/lon point, at the specified +<>, the `geohash_prefix` option will also +index all the enclosing cells as well. + +For instance, a geohash of `drm3btev3e86` will index all of the following +terms: [ `d`, `dr`, `drm`, `drm3`, `drm3b`, `drm3bt`, `drm3bte`, `drm3btev`, +`drm3btev3`, `drm3btev3e`, `drm3btev3e8`, `drm3btev3e86` ]. 
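As a minimal illustration of the prefix expansion described above (a sketch in Python, not Elasticsearch's actual implementation; the function name `geohash_prefixes` is hypothetical), the enclosing cells of a geohash are simply its leading substrings:

```python
def geohash_prefixes(geohash):
    """Return every leading substring of a geohash, shortest first.

    Hypothetical helper for illustration only: each prefix names a
    larger cell that encloses the cell named by the full geohash.
    """
    return [geohash[:length] for length in range(1, len(geohash) + 1)]


prefixes = geohash_prefixes("drm3btev3e86")
print(prefixes)
```

Each of these prefixes is indexed as its own term, which is why looking up an enclosing cell becomes a plain term lookup rather than a prefix scan.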
+ +The geohash prefixes can be used with the +<> to find points within a +particular geohash, or its neighbours: + + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "location": { + "type": "geo_point", + "geohash_prefix": true, + "geohash_precision": 6 + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "location": { + "lat": 41.12, + "lon": -71.34 + } +} + +GET my_index/_search?fielddata_fields=location.geohash +{ + "query": { + "geohash_cell": { + "location": { + "lat": 41.02, + "lon": -71.48 + }, + "precision": 4, <1> + "neighbors": true <1> + } + } +} +-------------------------------------------------- +// AUTOSENSE + diff --git a/docs/reference/mapping/params/geohash.asciidoc b/docs/reference/mapping/params/geohash.asciidoc new file mode 100644 index 00000000000..a1319c071a3 --- /dev/null +++ b/docs/reference/mapping/params/geohash.asciidoc @@ -0,0 +1,70 @@ +[[geohash]] +=== `geohash` + +Geohashes are a form of lat/lon encoding which divides the earth up into +a grid. Each cell in this grid is represented by a geohash string. Each +cell in turn can be further subdivided into smaller cells which are +represented by a longer string. So the longer the geohash, the smaller +(and thus more accurate) the cell is. + +Because geohashes are just strings, they can be stored in an inverted +index like any other string, which makes querying them very efficient. + +If you enable the `geohash` option, a `geohash` ``sub-field'' will be indexed +as, eg `.geohash`. The length of the geohash is controlled by the +<> parameter. + +If the <> option is enabled, the `geohash` +option will be enabled automatically. 
+ +For example: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "location": { + "type": "geo_point", <1> + "geohash": true + } + } + } + } +} + + +PUT my_index/my_type/1 +{ + "location": { + "lat": 41.12, + "lon": -71.34 + } +} + +GET my_index/_search?fielddata_fields=location.geohash <2> +{ + "query": { + "prefix": { + "location.geohash": "drm3b" <3> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> A `location.geohash` field will be indexed for each geo-point. +<2> The geohash can be retrieved with <>. +<3> A <> query can find all geohashes which start with a particular prefix. + +[WARNING] +============================================ + +A `prefix` query on geohashes is expensive. Instead, consider using the +<> to pay the expense once at index time +instead of on every query. + +============================================ + + diff --git a/docs/reference/mapping/params/ignore-above.asciidoc b/docs/reference/mapping/params/ignore-above.asciidoc new file mode 100644 index 00000000000..61d6e8f8edf --- /dev/null +++ b/docs/reference/mapping/params/ignore-above.asciidoc @@ -0,0 +1,61 @@ +[[ignore-above]] +=== `ignore_above` + +Strings longer than the `ignore_above` setting will not be processed by the +<> and will not be indexed. This is mainly useful for +<> string fields, which are typically used for +filtering, aggregations, and sorting. These are structured fields and it +doesn't usually make sense to allow very long terms to be indexed in these +fields. 
+ +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "message": { + "type": "string", + "index": "not_analyzed", + "ignore_above": 20 <1> + } + } + } + } +} + +PUT my_index/my_type/1 <2> +{ + "message": "Syntax error" +} + +PUT my_index/my_type/2 <3> +{ + "message": "Syntax error with some long stacktrace" +} + +GET _search <4> +{ + "aggs": { + "messages": { + "terms": { + "field": "message" + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> This field will ignore any string longer than 20 characters. +<2> This document is indexed successfully. +<3> This document will be indexed, but without indexing the `message` field. +<4> Search returns both documents, but only the first is present in the terms aggregation. + +This option is also useful for protecting against Lucene's term byte-length +limit of `32766`. + +NOTE: The value for `ignore_above` is the _character count_, but Lucene counts +bytes. If you use UTF-8 text with many non-ASCII characters, you may want to +set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most +3 bytes. diff --git a/docs/reference/mapping/params/ignore-malformed.asciidoc b/docs/reference/mapping/params/ignore-malformed.asciidoc new file mode 100644 index 00000000000..359c51ef958 --- /dev/null +++ b/docs/reference/mapping/params/ignore-malformed.asciidoc @@ -0,0 +1,83 @@ +[[ignore-malformed]] +=== `ignore_malformed` + +Sometimes you don't have much control over the data that you receive. One +user may send a `login` field that is a <>, and another sends a +`login` field that is an email address. + +Trying to index the wrong datatype into a field throws an exception by +default, and rejects the whole document. The `ignore_malformed` parameter, if +set to `true`, allows the exception to be ignored. The malformed field is not +indexed, but other fields in the document are processed normally. 
+ +For example: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "number_one": { + "type": "integer" + }, + "number_two": { + "type": "integer", + "ignore_malformed": true + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "text": "Some text value", + "number_one": "foo" <1> +} + +PUT my_index/my_type/2 +{ + "text": "Some text value", + "number_two": "foo" <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> This document will be rejected because `number_one` does not allow malformed values. +<2> This document will have the `text` field indexed, but not the `number_two` field. + + +[[ignore-malformed-setting]] +==== Index-level default + +The `index.mapping.ignore_malformed` setting can be set at the index level to +ignore malformed content globally across all mapping types. + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "settings": { + "index.mapping.ignore_malformed": true <1> + }, + "mappings": { + "my_type": { + "properties": { + "number_one": { <1> + "type": "byte" + }, + "number_two": { + "type": "integer", + "ignore_malformed": false <2> + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +<1> The `number_one` field inherits the index-level setting. +<2> The `number_two` field overrides the index-level setting to turn off `ignore_malformed`. + diff --git a/docs/reference/mapping/params/include-in-all.asciidoc b/docs/reference/mapping/params/include-in-all.asciidoc new file mode 100644 index 00000000000..09e42668239 --- /dev/null +++ b/docs/reference/mapping/params/include-in-all.asciidoc @@ -0,0 +1,83 @@ +[[include-in-all]] +=== `include_in_all` + +The `include_in_all` parameter provides per-field control over which fields +are included in the <> field. It defaults to `true`, unless <> is set to `no`.
+ +This example demonstrates how to exclude the `date` field from the `_all` field: + +[source,js] +-------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "title": { <1> + "type": "string" + }, + "content": { <1> + "type": "string" + }, + "date": { <2> + "type": "date", + "include_in_all": false + } + } + } + } +} +-------------------------------- +// AUTOSENSE + +<1> The `title` and `content` fields will be included in the `_all` field. +<2> The `date` field will not be included in the `_all` field. + +The `include_in_all` parameter can also be set at the type level and on +<> or <> fields, in which case all +sub-fields inherit that setting. For instance: + +[source,js] +-------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "include_in_all": false, <1> + "properties": { + "title": { "type": "string" }, + "author": { + "include_in_all": true, <2> + "properties": { + "first_name": { "type": "string" }, + "last_name": { "type": "string" } + } + }, + "editor": { + "properties": { + "first_name": { "type": "string" }, <3> + "last_name": { "type": "string", "include_in_all": true } <3> + } + } + } + } + } +} +-------------------------------- +// AUTOSENSE + +<1> All fields in `my_type` are excluded from `_all`. +<2> The `author.first_name` and `author.last_name` fields are included in `_all`. +<3> Only the `editor.last_name` field is included in `_all`. + The `editor.first_name` inherits the type-level setting and is excluded. + +[NOTE] +.Multi-fields and `include_in_all` +================================= + +The original field value is added to the `_all` field, not the terms produced +by a field's analyzer. For this reason, it makes no sense to set +`include_in_all` to `true` on <>, as each +multi-field has exactly the same value as its parent.
+ +================================= diff --git a/docs/reference/mapping/params/index-options.asciidoc b/docs/reference/mapping/params/index-options.asciidoc new file mode 100644 index 00000000000..9e096604b76 --- /dev/null +++ b/docs/reference/mapping/params/index-options.asciidoc @@ -0,0 +1,70 @@ +[[index-options]] +=== `index_options` + +The `index_options` parameter controls what information is added to the +inverted index, for search and highlighting purposes. It accepts the +following settings: + +[horizontal] +`docs`:: + + Only the doc number is indexed. Can answer the question _Does this term + exist in this field?_ + +`freqs`:: + + Doc number and term frequencies are indexed. Term frequencies are used to + score repeated terms higher than single terms. + +`positions`:: + + Doc number, term frequencies, and term positions (or order) are indexed. + Positions can be used for + <>. + +`offsets`:: + + Doc number, term frequencies, positions, and start and end character + offsets (which map the term back to the original string) are indexed. + Offsets are used by the <>. + +<> string fields use `positions` as the default, and +< + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `text` field will use the postings highlighter by default because `offsets` are indexed. diff --git a/docs/reference/mapping/params/index.asciidoc b/docs/reference/mapping/params/index.asciidoc new file mode 100644 index 00000000000..979ead036f0 --- /dev/null +++ b/docs/reference/mapping/params/index.asciidoc @@ -0,0 +1,48 @@ +[[mapping-index]] +=== `index` + +The `index` option controls how field values are indexed and, thus, how they +are searchable. It accepts three values: + +[horizontal] +`no`:: + + Do not add this field value to the index. With this setting, the field + will not be queryable. + +`not_analyzed`:: + + Add the field value to the index unchanged, as a single term. 
This is the + default for all fields that support this option except for + <> fields. `not_analyzed` fields are usually used with + <> for structured search. + +`analyzed`:: + + This option applies only to `string` fields, for which it is the default. + The string field value is first <> to convert the + string into terms (e.g. a list of individual words), which are then + indexed. At search time, the query string is passed through + (<>) the same analyzer to generate terms + in the same format as those in the index. It is this process that enables + <>. + +For example, you can create a `not_analyzed` string field with the following: + +[source,js] +-------------------------------------------------- +PUT /my_index +{ + "mappings": { + "my_type": { + "properties": { + "status_code": { + "type": "string", + "index": "not_analyzed" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE \ No newline at end of file diff --git a/docs/reference/mapping/params/lat-lon.asciidoc b/docs/reference/mapping/params/lat-lon.asciidoc new file mode 100644 index 00000000000..0a5cce48437 --- /dev/null +++ b/docs/reference/mapping/params/lat-lon.asciidoc @@ -0,0 +1,63 @@ +[[lat-lon]] +=== `lat_lon` + +<> are usually performed by plugging the value of +each <> field into a formula to determine whether it +falls into the required area or not. Unlike most queries, the inverted index +is not involved. + +Setting `lat_lon` to `true` causes the latitude and longitude values to be +indexed as numeric fields (called `.lat` and `.lon`). These fields can be used +by the <> and +<> queries instead of +performing in-memory calculations.
+ +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "location": { + "type": "geo_point", + "lat_lon": true <1> + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "location": { + "lat": 41.12, + "lon": -71.34 + } +} + + +GET my_index/_search +{ + "query": { + "geo_distance": { + "location": { + "lat": 41, + "lon": -71 + }, + "distance": "50km", + "optimize_bbox": "indexed" <2> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Setting `lat_lon` to true indexes the geo-point in the `location.lat` and `location.lon` fields. +<2> The `indexed` option tells the geo-distance query to use the inverted index instead of the in-memory calculation. + +Whether the in-memory or indexed operation performs better depends both on +your dataset and on the types of queries that you are running. + +NOTE: The `lat_lon` option only makes sense for single-value `geo_point` +fields. It will not work with arrays of geo-points. + diff --git a/docs/reference/mapping/params/multi-fields.asciidoc b/docs/reference/mapping/params/multi-fields.asciidoc new file mode 100644 index 00000000000..1287fa3d67c --- /dev/null +++ b/docs/reference/mapping/params/multi-fields.asciidoc @@ -0,0 +1,132 @@ +[[multi-fields]] +=== `fields` + +It is often useful to index the same field in different ways for different +purposes. This is the purpose of _multi-fields_. 
For instance, a `string` +field could be <> as an `analyzed` field for full-text +search, and as a `not_analyzed` field for sorting or aggregations: + +[source,js] +-------------------------------------------------- +PUT /my_index +{ + "mappings": { + "my_type": { + "properties": { + "city": { + "type": "string", + "fields": { + "raw": { <1> + "type": "string", + "index": "not_analyzed" + } + } + } + } + } + } +} + +PUT /my_index/my_type/1 +{ + "city": "New York" +} + +PUT /my_index/my_type/2 +{ + "city": "York" +} + +GET /my_index/_search +{ + "query": { + "match": { + "city": "york" <2> + } + }, + "sort": { + "city.raw": "asc" <3> + }, + "aggs": { + "Cities": { + "terms": { + "field": "city.raw" <3> + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `city.raw` field is a `not_analyzed` version of the `city` field. +<2> The analyzed `city` field can be used for full text search. +<3> The `city.raw` field can be used for sorting and aggregations + +NOTE: Multi-fields do not change the original `_source` field. + +==== Multi-fields with multiple analyzers + +Another use case of multi-fields is to analyze the same field in different +ways for better relevance. 
For instance we could index a field with the +<> which breaks text up into +words, and again with the <> +which stems words into their root form: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "text": { <1> + "type": "string", + "fields": { + "english": { <2> + "type": "string", + "analyzer": "english" + } + } + } + } + } + } +} + +PUT my_index/my_type/1 +{ "text": "quick brown fox" } <3> + +PUT my_index/my_type/2 +{ "text": "quick brown foxes" } <3> + +GET my_index/_search +{ + "query": { + "multi_match": { + "query": "quick brown foxes", + "fields": [ <4> + "text", + "text.english" + ], + "type": "most_fields" <4> + } + } +} +-------------------------------------------------- +// AUTOSENSE + +<1> The `text` field uses the `standard` analyzer. +<2> The `text.english` field uses the `english` analyzer. +<3> Index two documents, one with `fox` and the other with `foxes`. +<4> Query both the `text` and `text.english` fields and combine the scores. + +The `text` field contains the term `fox` in the first document and `foxes` in +the second document. The `text.english` field contains `fox` for both +documents, because `foxes` is stemmed to `fox`. + +The query string is also analyzed by the `standard` analyzer for the `text` +field, and by the `english` analyzer for the `text.english` field. The +stemmed field allows a query for `foxes` to also match the document containing +just `fox`. This allows us to match as many documents as possible. By also +querying the unstemmed `text` field, we improve the relevance score of the +document which matches `foxes` exactly.
+ diff --git a/docs/reference/mapping/params/norms.asciidoc b/docs/reference/mapping/params/norms.asciidoc new file mode 100644 index 00000000000..0a28f5af4d6 --- /dev/null +++ b/docs/reference/mapping/params/norms.asciidoc @@ -0,0 +1,64 @@ +[[norms]] +=== `norms` + +Norms store various normalization factors -- a number to represent the +relative field length and the <> setting -- +that are later used at query time in order to compute the score of a document +relative to a query. + +Although useful for scoring, norms also require quite a lot of memory +(typically in the order of one byte per document per field in your index, even +for documents that don't have this specific field). As a consequence, if you +don't need scoring on a specific field, you should disable norms on that +field. In particular, this is the case for fields that are used solely for +filtering or aggregations. + +Norms can be disabled (but not reenabled) after the fact, using the +<> like so: + +[source,js] +------------ +PUT my_index/_mapping/my_type +{ + "properties": { + "title": { + "type": "string", + "norms": { + "enabled": false + } + } + } +} +------------ +// AUTOSENSE + +NOTE: Norms will not be removed instantly, but will be removed as old segments +are merged into new segments as you continue indexing new documents. Any score +computation on a field that has had norms removed might return inconsistent +results since some documents won't have norms anymore while other documents +might still have norms. + +==== Lazy loading of norms + +Norms can be loaded into memory eagerly (`eager`), whenever a new segment +comes online, or they can be loaded lazily (`lazy`, default), only when the field +is queried.
+ +Eager loading can be configured as follows: + +[source,js] +------------ +PUT my_index/_mapping/my_type +{ + "properties": { + "title": { + "type": "string", + "norms": { + "loading": "eager" + } + } + } +} +------------ +// AUTOSENSE + diff --git a/docs/reference/mapping/params/null-value.asciidoc b/docs/reference/mapping/params/null-value.asciidoc new file mode 100644 index 00000000000..09584ed0fa2 --- /dev/null +++ b/docs/reference/mapping/params/null-value.asciidoc @@ -0,0 +1,58 @@ +[[null-value]] +=== `null_value` + +A `null` value cannot be indexed or searched. When a field is set to `null`, +(or an empty array or an array of `null` values) it is treated as though that +field has no values. + +The `null_value` parameter allows you to replace explicit `null` values with +the specified value so that it can be indexed and searched. For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "status_code": { + "type": "string", + "index": "not_analyzed", + "null_value": "NULL" <1> + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "status_code": null +} + +PUT my_index/my_type/2 +{ + "status_code": [] <2> +} + +GET my_index/_search +{ + "query": { + "term": { + "status_code": "NULL" <3> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Replace explicit `null` values with the term `NULL`. +<2> An empty array does not contain an explicit `null`, and so won't be replaced with the `null_value`. +<3> A query for `NULL` returns document 1, but not document 2. + +IMPORTANT: The `null_value` needs to be the same datatype as the field. For +instance, a `long` field cannot have a string `null_value`. String fields +which are `analyzed` will also pass the `null_value` through the configured +analyzer. + +Also see the <> for its `null_value` support. 
+ diff --git a/docs/reference/mapping/params/position-offset-gap.asciidoc b/docs/reference/mapping/params/position-offset-gap.asciidoc new file mode 100644 index 00000000000..0e908dd1f14 --- /dev/null +++ b/docs/reference/mapping/params/position-offset-gap.asciidoc @@ -0,0 +1,68 @@ +[[position-offset-gap]] +=== `position_offset_gap` + +<> string fields take term <> +into account, in order to be able to support +<>. +When indexing an array of strings, each string of the array is indexed +directly after the previous one, almost as though all the strings in the array +had been concatenated into one big string. + +This can result in matches from phrase queries spanning two array elements. +For instance: + +[source,js] +-------------------------------------------------- +PUT /my_index/groups/1 +{ + "names": [ "John Abraham", "Lincoln Smith"] +} + +GET /my_index/groups/_search +{ + "query": { + "match_phrase": { + "names": "Abraham Lincoln" <1> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> This phrase query matches our document, even though `Abraham` and `Lincoln` are in separate strings. + +The `position_offset_gap` can introduce a fake gap between each array element. For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "groups": { + "properties": { + "names": { + "type": "string", + "position_offset_gap": 50 <1> + } + } + } + } +} + +PUT /my_index/groups/1 +{ + "names": [ "John Abraham", "Lincoln Smith"] +} + +GET /my_index/groups/_search +{ + "query": { + "match_phrase": { + "names": "Abraham Lincoln" <2> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The first term in the next array element will be 50 terms apart from the + last term in the previous array element. +<2> The phrase query no longer matches our document.
\ No newline at end of file diff --git a/docs/reference/mapping/params/precision-step.asciidoc b/docs/reference/mapping/params/precision-step.asciidoc new file mode 100644 index 00000000000..a7ea4d342e7 --- /dev/null +++ b/docs/reference/mapping/params/precision-step.asciidoc @@ -0,0 +1,56 @@ +[[precision-step]] +=== `precision_step` + +Most <> datatypes index extra terms representing numeric +ranges for each number to make <> +faster. For instance, this `range` query: + +[source,js] +-------------------------------------------------- + "range": { + "number": { + "gte": 0, + "lte": 321 + } + } +-------------------------------------------------- + +might be executed internally as a <> that +looks something like this: + +[source,js] +-------------------------------------------------- + "terms": { + "number": [ + "0-255", + "256-319", + "320", + "321" + ] + } +-------------------------------------------------- + +These extra terms greatly reduce the number of terms that have to be examined, +at the cost of increased disk space. + +The default value for `precision_step` depends on the `type` of the numeric field: + +[horizontal] +`long`, `double`, `date`, `ip`:: `16` (3 extra terms) +`integer`, `float`, `short`:: `8` (3 extra terms) +`byte`:: `2147483647` (0 extra terms) +`token_count`:: `32` (0 extra terms) + +The value of the `precision_step` setting indicates the number of bits that +should be compressed into an extra term.
A `long` value consists of 64 bits, +so a `precision_step` of 16 results in the following terms: + +[horizontal] +Bits 0-15:: `value & 1111111111111111 0000000000000000 0000000000000000 0000000000000000` +Bits 0-31:: `value & 1111111111111111 1111111111111111 0000000000000000 0000000000000000` +Bits 0-47:: `value & 1111111111111111 1111111111111111 1111111111111111 0000000000000000` +Bits 0-63:: `value` + + + + diff --git a/docs/reference/mapping/params/properties.asciidoc b/docs/reference/mapping/params/properties.asciidoc new file mode 100644 index 00000000000..c340ce47aa7 --- /dev/null +++ b/docs/reference/mapping/params/properties.asciidoc @@ -0,0 +1,101 @@ +[[properties]] +=== `properties` + +Type mappings, <> and <> +contain sub-fields, called `properties`. These properties may be of any +<>, including `object` and `nested`. Properties can +be added: + +* explicitly by defining them when <>. +* explicitly by defining them when adding or updating a mapping type with the <> API. +* <> just by indexing documents containing new fields. + +Below is an example of adding `properties` to a mapping type, an `object` +field, and a `nested` field: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { <1> + "properties": { + "manager": { <2> + "properties": { + "age": { "type": "integer" }, + "name": { "type": "string" } + } + }, + "employees": { <3> + "type": "nested", + "properties": { + "age": { "type": "integer" }, + "name": { "type": "string" } + } + } + } + } + } +} + +PUT my_index/my_type/1 <4> +{ + "region": "US", + "manager": { + "name": "Alice White", + "age": 30 + }, + "employees": [ + { + "name": "John Smith", + "age": 34 + }, + { + "name": "Peter Brown", + "age": 26 + } + ] +} +-------------------------------------------------- +// AUTOSENSE +<1> Properties under the `my_type` mapping type. +<2> Properties under the `manager` object field. +<3> Properties under the `employees` nested field.
+<4> An example document which corresponds to the above mapping. + +==== Dot notation + +Inner fields can be referred to in queries, aggregations, etc., using _dot +notation_: + +[source,js] +-------------------------------------------------- +GET my_index/_search +{ + "query": { + "match": { + "manager.name": "Alice White" <1> + } + }, + "aggs": { + "Employees": { + "nested": { + "path": "employees" + }, + "aggs": { + "Employee Ages": { + "histogram": { + "field": "employees.age", <2> + "interval": 5 + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +IMPORTANT: The full path to the inner field must be specified. + + diff --git a/docs/reference/mapping/params/search-analyzer.asciidoc b/docs/reference/mapping/params/search-analyzer.asciidoc new file mode 100644 index 00000000000..2ebf3f811a1 --- /dev/null +++ b/docs/reference/mapping/params/search-analyzer.asciidoc @@ -0,0 +1,79 @@ +[[search-analyzer]] +=== `search_analyzer` + +Usually, the same <> should be applied at index time and at +search time, to ensure that the terms in the query are in the same format as +the terms in the inverted index. + +Sometimes, though, it can make sense to use a different analyzer at search +time, such as when using the <> +tokenizer for autocomplete. 
+ +By default, queries will use the `analyzer` defined in the field mapping, but +this can be overridden with the `search_analyzer` setting: + +[source,js] +-------------------------------------------------- +PUT /my_index +{ + "settings": { + "analysis": { + "filter": { + "autocomplete_filter": { + "type": "edge_ngram", + "min_gram": 1, + "max_gram": 20 + } + }, + "analyzer": { + "autocomplete": { <1> + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "autocomplete_filter" + ] + } + } + } + }, + "mappings": { + "my_type": { + "properties": { + "text": { + "type": "string", + "analyzer": "autocomplete", <2> + "search_analyzer": "standard" <2> + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "text": "Quick Brown Fox" <3> +} + +GET my_index/_search +{ + "query": { + "match": { + "text": { + "query": "Quick Br", <4> + "operator": "and" + } + } + } +} + +-------------------------------------------------- +// AUTOSENSE + +<1> Analysis settings to define the custom `autocomplete` analyzer. +<2> The `text` field uses the `autocomplete` analyzer at index time, but the `standard` analyzer at search time. +<3> This field is indexed as the terms: [ `q`, `qu`, `qui`, `quic`, `quick`, `b`, `br`, `bro`, `brow`, `brown`, `f`, `fo`, `fox` ] +<4> The query searches for both of these terms: [ `quick`, `br` ] + +See {defguide}/_index_time_search_as_you_type.html[Index time search-as-you- +type] for a full explanation of this example. diff --git a/docs/reference/mapping/params/similarity.asciidoc b/docs/reference/mapping/params/similarity.asciidoc new file mode 100644 index 00000000000..393f654bcf1 --- /dev/null +++ b/docs/reference/mapping/params/similarity.asciidoc @@ -0,0 +1,54 @@ +[[similarity]] +=== `similarity` + +Elasticsearch allows you to configure a scoring algorithm or _similarity_ per +field. The `similarity` setting provides a simple way of choosing a similarity +algorithm other than the default TF/IDF, such as `BM25`. 
+ +Similarities are mostly useful for <> fields, especially +`analyzed` string fields, but can also apply to other field types. + +Custom similarities can be configured by tuning the parameters of the built-in +similarities. For more details about these expert options, see the +<>. + +The only similarities which can be used out of the box, without any further +configuration, are: + +`default`:: + The default TF/IDF algorithm used by Elasticsearch and + Lucene. See {defguide}/practical-scoring-function.html[Lucene’s Practical Scoring Function] + for more information. + +`BM25`:: + The Okapi BM25 algorithm. + See {defguide}/pluggable-similarites.html[Pluggable Similarity Algorithms] + for more information. + + +The `similarity` can be set on the field level when a field is first created, +as follows: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "default_field": { <1> + "type": "string" + }, + "bm25_field": { + "type": "string", + "similarity": "BM25" <2> + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `default_field` uses the `default` similarity (i.e. TF/IDF). +<2> The `bm25_field` uses the `BM25` similarity. + diff --git a/docs/reference/mapping/params/store.asciidoc b/docs/reference/mapping/params/store.asciidoc new file mode 100644 index 00000000000..b81208aed77 --- /dev/null +++ b/docs/reference/mapping/params/store.asciidoc @@ -0,0 +1,73 @@ +[[mapping-store]] +=== `store` + +By default, field values <> to make them searchable, +but they are not _stored_. This means that the field can be queried, but the +original field value cannot be retrieved. + +Usually this doesn't matter. The field value is already part of the +<>, which is stored by default. If you +only want to retrieve the value of a single field or of a few fields, instead +of the whole `_source`, then this can be achieved with +<>.
+ +In certain situations it can make sense to `store` a field. For instance, if +you have a document with a `title`, a `date`, and a very large `content` +field, you may want to retrieve just the `title` and the `date` without having +to extract those fields from a large `_source` field: + +[source,js] +-------------------------------------------------- +PUT /my_index +{ + "mappings": { + "my_type": { + "properties": { + "title": { + "type": "string", + "store": true <1> + }, + "date": { + "type": "date", + "store": true <1> + }, + "content": { + "type": "string" + } + } + } + } +} + +PUT /my_index/my_type/1 +{ + "title": "Some short title", + "date": "2015-01-01", + "content": "A very long content field..." +} + +GET my_index/_search +{ + "fields": [ "title", "date" ] <2> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `title` and `date` fields are stored. +<2> This request will retrieve the values of the `title` and `date` fields. + +[NOTE] +.Stored fields returned as arrays +====================================== + +For consistency, stored fields are always returned as an _array_ because there +is no way of knowing if the original field value was a single value, multiple +values, or an empty array. + +If you need the original value, you should retrieve it from the `_source` +field instead. + +====================================== + +Another situation where it can make sense to make a field stored is for those +that don't appear in the `_source` field (such as <>). + diff --git a/docs/reference/mapping/params/term-vector.asciidoc b/docs/reference/mapping/params/term-vector.asciidoc new file mode 100644 index 00000000000..74c4c416d95 --- /dev/null +++ b/docs/reference/mapping/params/term-vector.asciidoc @@ -0,0 +1,68 @@ +[[term-vector]] +=== `term_vector` + +Term vectors contain information about the terms produced by the +<> process, including: + +* a list of terms. +* the position (or order) of each term. 
+* the start and end character offsets mapping the term to its + origin in the original string. + +These term vectors can be stored so that they can be retrieved for a +particular document. + +The `term_vector` setting accepts: + +[horizontal] +`no`:: No term vectors are stored. (default) +`yes`:: Just the terms in the field are stored. +`with_positions`:: Terms and positions are stored. +`with_offsets`:: Terms and character offsets are stored. +`with_positions_offsets`:: Terms, positions, and character offsets are stored. + +The fast vector highlighter requires `with_positions_offsets`. The term +vectors API can retrieve whatever is stored. + +WARNING: Setting `with_positions_offsets` will double the size of a field's +index. + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "text": { + "type": "string", + "term_vector": "with_positions_offsets" + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "text": "Quick brown fox" +} + +GET my_index/_search +{ + "query": { + "match": { + "text": "brown fox" + } + }, + "highlight": { + "fields": { + "text": {} <1> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The fast vector highlighter will be used by default for the `text` field + because term vectors are enabled. + diff --git a/docs/reference/mapping/transform.asciidoc b/docs/reference/mapping/transform.asciidoc deleted file mode 100644 index 0fc8aab3204..00000000000 --- a/docs/reference/mapping/transform.asciidoc +++ /dev/null @@ -1,61 +0,0 @@ -[[mapping-transform]] -== Transform -The document can be transformed before it is indexed by registering a -script in the `transform` element of the mapping. The result of the -transform is indexed but the original source is stored in the `_source` -field. 
Example: - -[source,js] --------------------------------------------------- -{ - "example" : { - "transform" : { - "script" : { - "inline": "if (ctx._source['title']?.startsWith('t')) ctx._source['suggest'] = ctx._source['content']", - "params" : { - "variable" : "not used but an example anyway" - }, - "lang": "groovy" - } - }, - "properties": { - "title": { "type": "string" }, - "content": { "type": "string" }, - "suggest": { "type": "string" } - } - } -} --------------------------------------------------- - -Its also possible to specify multiple transforms: -[source,js] --------------------------------------------------- -{ - "example" : { - "transform" : [ - {"script": "ctx._source['suggest'] = ctx._source['content']"} - {"script": "ctx._source['foo'] = ctx._source['bar'];"} - ] - } -} --------------------------------------------------- - -Because the result isn't stored in the source it can't normally be fetched by -source filtering. It can be highlighted if it is marked as stored. - -=== Get Transformed -The get endpoint will retransform the source if the `_source_transform` -parameter is set. Example: - -[source,sh] --------------------------------------------------- -curl -XGET "http://localhost:9200/test/example/3?pretty&_source_transform" --------------------------------------------------- - -The transform is performed before any source filtering but it is mostly -designed to make it easy to see what was passed to the index for debugging. - -=== Immutable Transformation -Once configured the transform script cannot be modified. This is not -because that is technically impossible but instead because madness lies -down that road. 
diff --git a/docs/reference/mapping/types.asciidoc b/docs/reference/mapping/types.asciidoc index 0cc967e77e1..f7c93b04718 100644 --- a/docs/reference/mapping/types.asciidoc +++ b/docs/reference/mapping/types.asciidoc @@ -1,24 +1,71 @@ [[mapping-types]] -== Types +== Field datatypes -The datatype for each field in a document (eg strings, numbers, -objects etc) can be controlled via the type mapping. +Elasticsearch supports a number of different datatypes for the fields in a +document: -include::types/core-types.asciidoc[] +[float] +=== Core datatypes -include::types/array-type.asciidoc[] +<>:: `string` +<>:: `long`, `integer`, `short`, `byte`, `double`, `float` +<>:: `date` +<>:: `boolean` +<>:: `binary` -include::types/object-type.asciidoc[] +[float] +=== Complex datatypes -include::types/root-object-type.asciidoc[] +<>:: Array support does not require a dedicated `type` +<>:: `object` for single JSON objects +<>:: `nested` for arrays of JSON objects + +[float] +=== Geo datatypes + +<>:: `geo_point` for lat/lon points +<>:: `geo_shape` for complex shapes like polygons + +[float] +=== Specialised datatypes + +<>:: `ip` for IPv4 addresses +<>:: + `completion` to provide auto-complete suggestions +<>:: `token_count` to count the number of tokens in a string + +Attachment datatype:: + + See the https://github.com/elastic/elasticsearch-mapper-attachments[mapper attachment plugin] + which supports indexing ``attachments'' like Microsoft Office formats, Open + Document formats, ePub, HTML, etc. into an `attachment` datatype.
+ +include::types/array.asciidoc[] + +include::types/binary.asciidoc[] + +include::types/boolean.asciidoc[] + +include::types/date.asciidoc[] + +include::types/geo-point.asciidoc[] + +include::types/geo-shape.asciidoc[] + +include::types/ip.asciidoc[] + +include::types/nested.asciidoc[] + +include::types/numeric.asciidoc[] + +include::types/object.asciidoc[] + +include::types/string.asciidoc[] + +include::types/token-count.asciidoc[] -include::types/nested-type.asciidoc[] -include::types/ip-type.asciidoc[] -include::types/geo-point-type.asciidoc[] -include::types/geo-shape-type.asciidoc[] -include::types/attachment-type.asciidoc[] diff --git a/docs/reference/mapping/types/array-type.asciidoc b/docs/reference/mapping/types/array-type.asciidoc deleted file mode 100644 index f2dc40ed094..00000000000 --- a/docs/reference/mapping/types/array-type.asciidoc +++ /dev/null @@ -1,69 +0,0 @@ -[[mapping-array-type]] -=== Array Type - -JSON documents allow to define an array (list) of fields or objects. -Mapping array types could not be simpler since arrays gets automatically -detected and mapping them can be done either with -<> or -<> mappings. -For example, the following JSON defines several arrays: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "message" : "some arrays in this tweet...", - "tags" : ["elasticsearch", "wow"], - "lists" : [ - { - "name" : "prog_list", - "description" : "programming list" - }, - { - "name" : "cool_list", - "description" : "cool stuff list" - } - ] - } -} --------------------------------------------------- - -The above JSON has the `tags` property defining a list of a simple -`string` type, and the `lists` property is an `object` type array. 
Here -is a sample explicit mapping: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "message" : {"type" : "string"}, - "tags" : {"type" : "string"}, - "lists" : { - "properties" : { - "name" : {"type" : "string"}, - "description" : {"type" : "string"} - } - } - } - } -} --------------------------------------------------- - -The fact that array types are automatically supported can be shown by -the fact that the following JSON document is perfectly fine: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "message" : "some arrays in this tweet...", - "tags" : "elasticsearch", - "lists" : { - "name" : "prog_list", - "description" : "programming list" - } - } -} --------------------------------------------------- - diff --git a/docs/reference/mapping/types/array.asciidoc b/docs/reference/mapping/types/array.asciidoc new file mode 100644 index 00000000000..2422e48e3d1 --- /dev/null +++ b/docs/reference/mapping/types/array.asciidoc @@ -0,0 +1,99 @@ +[[array]] +=== Array datatype + +In Elasticsearch, there is no dedicated `array` type. Any field can contain +zero or more values by default. However, all values in the array must be of +the same datatype. For instance: + +* an array of strings: [ `"one"`, `"two"` ] +* an array of integers: [ `1`, `2` ] +* an array of arrays: [ `1`, [ `2`, `3` ]] which is the equivalent of [ `1`, `2`, `3` ] +* an array of objects: [ `{ "name": "Mary", "age": 12 }`, `{ "name": "John", "age": 10 }`] + +.Arrays of objects +[NOTE] +==================================================== + +Arrays of objects do not work as you would expect: you cannot query each +object independently of the other objects in the array. If you need to be +able to do this then you should use the <> datatype instead +of the <> datatype. + +This is explained in more detail in <>.
+==================================================== + + +When adding a field dynamically, the first value in the array determines the +field `type`. All subsequent values must be of the same datatype or it must +at least be possible to <> subsequent values to the same +datatype. + +Arrays with a mixture of datatypes are _not_ supported: [ `10`, `"some string"` ] + +An array may contain `null` values, which are either replaced by the +configured <> or skipped entirely. An empty array +`[]` is treated as a missing field -- a field with no values. + +Nothing needs to be pre-configured in order to use arrays in documents, they +are supported out of the box: + + +[source,js] +-------------------------------------------------- +PUT my_index/my_type/1 +{ + "message": "some arrays in this document...", + "tags": [ "elasticsearch", "wow" ], <1> + "lists": [ <2> + { + "name": "prog_list", + "description": "programming list" + }, + { + "name": "cool_list", + "description": "cool stuff list" + } + ] +} + +PUT my_index/my_type/2 <3> +{ + "message": "no arrays in this document...", + "tags": "elasticsearch", + "lists": { + "name": "prog_list", + "description": "programming list" + } +} + +GET my_index/_search +{ + "query": { + "match": { + "tags": "elasticsearch" <4> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `tags` field is dynamically added as a `string` field. +<2> The `lists` field is dynamically added as an `object` field. +<3> The second document contains no arrays, but can be indexed into the same fields. +<4> The query looks for `elasticsearch` in the `tags` field, and matches both documents. + +.Multi-value fields and the inverted index +**************************************************** + +The fact that all field types support multi-value fields out of the box is a +consequence of the origins of Lucene. Lucene was designed to be a full text +search engine. 
In order to be able to search for individual words within a +big block of text, Lucene tokenizes the text into individual terms, and +adds each term to the inverted index separately. + +This means that even a simple text field must be able to support multiple +values by default. When other datatypes were added, such as numbers and +dates, they used the same data structure as strings, and so got multi-values +for free. + +**************************************************** + diff --git a/docs/reference/mapping/types/attachment-type.asciidoc b/docs/reference/mapping/types/attachment-type.asciidoc deleted file mode 100644 index a8e49b44bbd..00000000000 --- a/docs/reference/mapping/types/attachment-type.asciidoc +++ /dev/null @@ -1,13 +0,0 @@ -[[mapping-attachment-type]] -=== Attachment Type - -The `attachment` type allows to index different "attachment" type field -(encoded as `base64`), for example, Microsoft Office formats, open -document formats, ePub, HTML, and so on. - -The `attachment` type is provided as a -https://github.com/elasticsearch/elasticsearch-mapper-attachments[plugin -extension]. It uses http://tika.apache.org/[Apache Tika] behind the scene. - -See https://github.com/elasticsearch/elasticsearch-mapper-attachments#mapper-attachments-type-for-elasticsearch[README file] -for details. diff --git a/docs/reference/mapping/types/binary.asciidoc b/docs/reference/mapping/types/binary.asciidoc new file mode 100644 index 00000000000..ff76fbebf90 --- /dev/null +++ b/docs/reference/mapping/types/binary.asciidoc @@ -0,0 +1,52 @@ +[[binary]] +=== Binary datatype + +The `binary` type accepts a binary value as a +https://en.wikipedia.org/wiki/Base64[Base64] encoded string. 
The field is not +stored by default and is not searchable: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "name": { + "type": "string" + }, + "blob": { + "type": "binary" + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "name": "Some binary blob", + "blob": "U29tZSBiaW5hcnkgYmxvYg==" <1> +} +-------------------------------------------------- +<1> The Base64 encoded binary value must not have embedded newlines `\n`. + +[[binary-params]] +==== Parameters for `binary` fields + +The following parameters are accepted by `binary` fields: + +[horizontal] + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` or `false` (default). + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). + + diff --git a/docs/reference/mapping/types/boolean.asciidoc b/docs/reference/mapping/types/boolean.asciidoc new file mode 100644 index 00000000000..5ebcc651d09 --- /dev/null +++ b/docs/reference/mapping/types/boolean.asciidoc @@ -0,0 +1,119 @@ +[[boolean]] +=== Boolean datatype + +Boolean fields accept JSON `true` and `false` values, but can also accept +strings and numbers which are interpreted as either true or false: + +[horizontal] +False values:: + + `false`, `"false"`, `"off"`, `"no"`, `"0"`, `""` (empty string), `0`, `0.0` + +True values:: + + Anything that isn't false. + +For example: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "is_published": { + "type": "boolean" + } + } + } + } +} + +POST my_index/my_type/1 +{ + "is_published": true <1> +} + +GET my_index/_search +{ + "query": { + "term": { + "is_published": 1 <2> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Indexing a document with a JSON `true`. 
+<2> Querying for the document with `1`, which is interpreted as `true`. + +Aggregations like the <> use `1` and `0` for the `key`, and the strings `"true"` and +`"false"` for the `key_as_string`. When used in scripts, boolean fields +return `1` and `0`: + +[source,js] +-------------------------------------------------- +POST my_index/my_type/1 +{ + "is_published": true +} + +POST my_index/my_type/2 +{ + "is_published": false +} + +GET my_index/_search +{ + "aggs": { + "publish_state": { + "terms": { + "field": "is_published" + } + } + }, + "script_fields": { + "is_published": { + "script": "doc['is_published'].value" <1> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Inline scripts must be <> for this example to work. + +[[boolean-params]] +==== Parameters for `boolean` fields + +The following parameters are accepted by `boolean` fields: + +[horizontal] + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + Should the field be searchable? Accepts `not_analyzed` (default) and `no`. + +<>:: + + Accepts any of the true or false values listed above. The value is + substituted for any explicit `null` values. Defaults to `null`, which + means the field is treated as missing. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). + diff --git a/docs/reference/mapping/types/core-types.asciidoc b/docs/reference/mapping/types/core-types.asciidoc deleted file mode 100644 index f848fbd290e..00000000000 --- a/docs/reference/mapping/types/core-types.asciidoc +++ /dev/null @@ -1,649 +0,0 @@ -[[mapping-core-types]] -=== Core Types - -Each JSON field can be mapped to a specific core type.
JSON itself -already provides us with some typing, with its support for `string`, -`integer`/`long`, `float`/`double`, `boolean`, and `null`. - -The following sample tweet JSON document will be used to explain the -core types: - -[source,js] --------------------------------------------------- -{ - "tweet" { - "user" : "kimchy", - "message" : "This is a tweet!", - "postDate" : "2009-11-15T14:12:12", - "priority" : 4, - "rank" : 12.3 - } -} --------------------------------------------------- - -Explicit mapping for the above JSON tweet can be: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "user" : {"type" : "string", "index" : "not_analyzed"}, - "message" : {"type" : "string", "null_value" : "na"}, - "postDate" : {"type" : "date"}, - "priority" : {"type" : "integer"}, - "rank" : {"type" : "float"} - } - } -} --------------------------------------------------- - -[float] -[[string]] -==== String - -The text based string type is the most basic type, and contains one or -more characters. An example mapping can be: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "message" : { - "type" : "string", - "store" : true, - "index" : "analyzed", - "null_value" : "na" - }, - "user" : { - "type" : "string", - "index" : "not_analyzed", - "norms" : { - "enabled" : false - } - } - } - } -} --------------------------------------------------- - -The above mapping defines a `string` `message` property/field within the -`tweet` type. The field is stored in the index (so it can later be -retrieved using selective loading when searching), and it gets analyzed -(broken down into searchable terms). If the message has a `null` value, -then the value that will be stored is `na`. There is also a `string` `user` -which is indexed as-is (not broken down into tokens) and has norms -disabled (so that matching this field is a binary decision, no match is -better than another one). 
- -The following table lists all the attributes that can be used with the -`string` type: - -[cols="<,<",options="header",] -|======================================================================= -|Attribute |Description -|`index_name` |The name of the field that will be stored in the index. -Defaults to the property/field name. - -|`store` |Set to `true` to actually store the field in the index, `false` to not -store it. Since by default Elasticsearch stores all fields of the source -document in the special `_source` field, this option is primarily useful when -the `_source` field has been disabled in the type definition. Defaults to -`false`. - -|`index` |Set to `analyzed` for the field to be indexed and searchable -after being broken down into token using an analyzer. `not_analyzed` -means that its still searchable, but does not go through any analysis -process or broken down into tokens. `no` means that it won't be -searchable at all (as an individual field; it may still be included in -`_all`). Setting to `no` disables `include_in_all`. Defaults to -`analyzed`. - -|`doc_values` |Set to `true` to store field values in a column-stride fashion. -Automatically set to `true` when the <> is `doc_values`. - -|`term_vector` |Possible values are `no`, `yes`, `with_offsets`, -`with_positions`, `with_positions_offsets`. Defaults to `no`. - -|`boost` |The boost value. Defaults to `1.0`. - -|`null_value` |When there is a (JSON) null value for the field, use the -`null_value` as the field value. Defaults to not adding the field at -all. - -|`norms: {enabled: }` |Boolean value if norms should be enabled or -not. Defaults to `true` for `analyzed` fields, and to `false` for -`not_analyzed` fields. See the <>. - -|`norms: {loading: }` |Describes how norms should be loaded, possible values are -`eager` and `lazy` (default). It is possible to change the default value to -eager for all fields by configuring the index setting `index.norms.loading` -to `eager`. 
- -|`index_options` | Allows to set the indexing -options, possible values are `docs` (only doc numbers are indexed), -`freqs` (doc numbers and term frequencies), and `positions` (doc -numbers, term frequencies and positions). Defaults to `positions` for -`analyzed` fields, and to `docs` for `not_analyzed` fields. It -is also possible to set it to `offsets` (doc numbers, term -frequencies, positions and offsets). - -|`analyzer` |The analyzer used to analyze the text contents when -`analyzed` during indexing and searching. -Defaults to the globally configured analyzer. - -|`search_analyzer` |The analyzer used to analyze the field when searching, which -overrides the value of `analyzer`. Can be updated on an existing field. - -|`include_in_all` |Should the field be included in the `_all` field (if -enabled). If `index` is set to `no` this defaults to `false`, otherwise, -defaults to `true` or to the parent `object` type setting. - -|`ignore_above` |The analyzer will ignore strings larger than this size. -Useful for generic `not_analyzed` fields that should ignore long text. - -This option is also useful for protecting against Lucene's term byte-length -limit of `32766`. Note: the value for `ignore_above` is the _character count_, -but Lucene counts bytes, so if you have UTF-8 text, you may want to set the -limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most 3 -bytes. - -|`position_offset_gap` |Position increment gap between field instances -with the same field name. Defaults to 0. -|======================================================================= - -The `string` type also support custom indexing parameters associated -with the indexed value. For example: - -[source,js] --------------------------------------------------- -{ - "message" : { - "_value": "boosted value", - "_boost": 2.0 - } -} --------------------------------------------------- - -The mapping is required to disambiguate the meaning of the document. 
-Otherwise, the structure would interpret "message" as a value of type -"object". The key `_value` (or `value`) in the inner document specifies -the real string content that should eventually be indexed. The `_boost` -(or `boost`) key specifies the per field document boost (here 2.0). - -[float] -[[norms]] -===== Norms - -Norms store various normalization factors that are later used (at query time) -in order to compute the score of a document relatively to a query. - -Although useful for scoring, norms also require quite a lot of memory -(typically in the order of one byte per document per field in your index, -even for documents that don't have this specific field). As a consequence, if -you don't need scoring on a specific field, it is highly recommended to disable -norms on it. In particular, this is the case for fields that are used solely -for filtering or aggregations. - -In case you would like to disable norms after the fact, it is possible to do so -by using the <>, like this: - -[source,js] ------------- -PUT my_index/_mapping/my_type -{ - "properties": { - "title": { - "type": "string", - "norms": { - "enabled": false - } - } - } -} ------------- - -Please however note that norms won't be removed instantly, but will be removed -as old segments are merged into new segments as you continue indexing new documents. -Any score computation on a field that has had -norms removed might return inconsistent results since some documents won't have -norms anymore while other documents might still have norms. - -[float] -[[number]] -==== Number - -A number based type supporting `float`, `double`, `byte`, `short`, -`integer`, and `long`. It uses specific constructs within Lucene in -order to support numeric values. The number types have the same ranges -as corresponding -http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html[Java -types]. 
An example mapping can be: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "rank" : { - "type" : "float", - "null_value" : 1.0 - } - } - } -} --------------------------------------------------- - -The following table lists all the attributes that can be used with a -numbered type: - -[cols="<,<",options="header",] -|======================================================================= -|Attribute |Description -|`type` |The type of the number. Can be `float`, `double`, `integer`, -`long`, `short`, `byte`. Required. - -|`index_name` |The name of the field that will be stored in the index. -Defaults to the property/field name. - -|`store` |Set to `true` to store actual field in the index, `false` to not -store it. Defaults to `false` (note, the JSON document itself is stored, -and it can be retrieved from it). - -|`index` |Set to `no` if the value should not be indexed. Setting to -`no` disables `include_in_all`. If set to `no` the field should be either stored -in `_source`, have `include_in_all` enabled, or `store` be set to -`true` for this to be useful. - -|`doc_values` |Set to `true` to store field values in a column-stride fashion. -Automatically set to `true` when the fielddata format is `doc_values`. - -|`precision_step` |The precision step (influences the number of terms -generated for each number value). Defaults to `16` for `long`, `double`, -`8` for `short`, `integer`, `float`, and `2147483647` for `byte`. - -|`boost` |The boost value. Defaults to `1.0`. - -|`null_value` |When there is a (JSON) null value for the field, use the -`null_value` as the field value. Defaults to not adding the field at -all. - -|`include_in_all` |Should the field be included in the `_all` field (if -enabled). If `index` is set to `no` this defaults to `false`, otherwise, -defaults to `true` or to the parent `object` type setting. - -|`ignore_malformed` |Ignored a malformed number. Defaults to `false`. 
- -|`coerce` |Try convert strings to numbers and truncate fractions for integers. Defaults to `true`. - -|======================================================================= - -[float] -[[token_count]] -==== Token Count -The `token_count` type maps to the JSON string type but indexes and stores -the number of tokens in the string rather than the string itself. For -example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "name" : { - "type" : "string", - "fields" : { - "word_count": { - "type" : "token_count", - "store" : "yes", - "analyzer" : "standard" - } - } - } - } - } -} --------------------------------------------------- - -All the configuration that can be specified for a number can be specified -for a token_count. The only extra configuration is the required -`analyzer` field which specifies which analyzer to use to break the string -into tokens. For best performance, use an analyzer with no token filters. - -[NOTE] -=================================================================== -Technically the `token_count` type sums position increments rather than -counting tokens. This means that even if the analyzer filters out stop -words they are included in the count. -=================================================================== - -[float] -[[date]] -==== Date - -The date type is a special type which maps to JSON string type. It -follows a specific format that can be explicitly set. All dates are -`UTC`. Internally, a date maps to a number type `long`, with the added -parsing stage from string to long and from long to string. 
An example -mapping: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "postDate" : { - "type" : "date", - "format" : "YYYY-MM-dd" - } - } - } -} --------------------------------------------------- - -The date type will also accept a long number representing UTC -milliseconds since the epoch, regardless of the format it can handle. - -The following table lists all the attributes that can be used with a -date type: - -[cols="<,<",options="header",] -|======================================================================= -|Attribute |Description -|`index_name` |The name of the field that will be stored in the index. -Defaults to the property/field name. - -|`format` |The <>. Defaults to `epoch_millis||strictDateOptionalTime`. - -|`store` |Set to `true` to store actual field in the index, `false` to not -store it. Defaults to `false` (note, the JSON document itself is stored, -and it can be retrieved from it). - -|`index` |Set to `no` if the value should not be indexed. Setting to -`no` disables `include_in_all`. If set to `no` the field should be either stored -in `_source`, have `include_in_all` enabled, or `store` be set to -`true` for this to be useful. - -|`doc_values` |Set to `true` to store field values in a column-stride fashion. -Automatically set to `true` when the fielddata format is `doc_values`. - -|`precision_step` |The precision step (influences the number of terms -generated for each number value). Defaults to `16`. - -|`boost` |The boost value. Defaults to `1.0`. - -|`null_value` |When there is a (JSON) null value for the field, use the -`null_value` as the field value. Defaults to not adding the field at -all. - -|`include_in_all` |Should the field be included in the `_all` field (if -enabled). If `index` is set to `no` this defaults to `false`, otherwise, -defaults to `true` or to the parent `object` type setting. - -|`ignore_malformed` |Ignored a malformed number. Defaults to `false`. 
- -|======================================================================= - -[float] -[[boolean]] -==== Boolean - -The boolean type Maps to the JSON boolean type. It ends up storing -within the index either `T` or `F`, with automatic translation to `true` -and `false` respectively. - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "hes_my_special_tweet" : { - "type" : "boolean" - } - } - } -} --------------------------------------------------- - -The boolean type also supports passing the value as a number or a string -(in this case `0`, an empty string, `false`, `off` and `no` are -`false`, all other values are `true`). - -The following table lists all the attributes that can be used with the -boolean type: - -[cols="<,<",options="header",] -|======================================================================= -|Attribute |Description -|`index_name` |The name of the field that will be stored in the index. -Defaults to the property/field name. - -|`store` |Set to `true` to store actual field in the index, `false` to not -store it. Defaults to `false` (note, the JSON document itself is stored, -and it can be retrieved from it). - -|`index` |Set to `no` if the value should not be indexed. Setting to -`no` disables `include_in_all`. If set to `no` the field should be either stored -in `_source`, have `include_in_all` enabled, or `store` be set to -`true` for this to be useful. - -|`doc_values` |Set to `true` to store field values in a column-stride fashion. -Automatically set to `true` when the fielddata format is `doc_values`. - -|`boost` |The boost value. Defaults to `1.0`. - -|`null_value` |When there is a (JSON) null value for the field, use the -`null_value` as the field value. Defaults to not adding the field at -all. 
-|======================================================================= - -[float] -[[binary]] -==== Binary - -The binary type is a base64 representation of binary data that can be -stored in the index. The field is not stored by default and not indexed at -all. - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "image" : { - "type" : "binary" - } - } - } -} --------------------------------------------------- - -The following table lists all the attributes that can be used with the -binary type: - -[horizontal] - -`index_name`:: - - The name of the field that will be stored in the index. Defaults to the - property/field name. - -`store`:: - - Set to `true` to store actual field in the index, `false` to not store it. - Defaults to `false` (note, the JSON document itself is already stored, so - the binary field can be retrieved from there). - -`doc_values`:: - - Set to `true` to store field values in a column-stride fashion. - -[float] -[[fielddata-filters]] -==== Fielddata filters - -It is possible to control which field values are loaded into memory, -which is particularly useful for aggregations on string fields, using -fielddata filters, which are explained in detail in the -<> section. - -Fielddata filters can exclude terms which do not match a regex, or which -don't fall between a `min` and `max` frequency range: - -[source,js] --------------------------------------------------- -{ - tweet: { - type: "string", - analyzer: "whitespace" - fielddata: { - filter: { - regex: { - "pattern": "^#.*" - }, - frequency: { - min: 0.001, - max: 0.1, - min_segment_size: 500 - } - } - } - } -} --------------------------------------------------- - -These filters can be updated on an existing field mapping and will take -effect the next time the fielddata for a segment is loaded. Use the -<> API -to reload the fielddata using the new filters. 
- -[float] -==== Similarity - -Elasticsearch allows you to configure a similarity (scoring algorithm) per field. -The `similarity` setting provides a simple way of choosing a similarity algorithm -other than the default TF/IDF, such as `BM25`. - -You can configure similarities via the -<> - -[float] -===== Configuring Similarity per Field - -Defining the Similarity for a field is done via the `similarity` mapping -property, as this example shows: - -[source,js] --------------------------------------------------- -{ - "book":{ - "properties":{ - "title":{ - "type":"string", "similarity":"BM25" - } - } - } -} --------------------------------------------------- - -The following Similarities are configured out-of-box: - -`default`:: - The Default TF/IDF algorithm used by Elasticsearch and - Lucene in previous versions. - -`BM25`:: - The BM25 algorithm. - http://en.wikipedia.org/wiki/Okapi_BM25[See Okapi_BM25] for more - details. - - -[[copy-to]] -[float] -===== Copy to field - -Adding `copy_to` parameter to any field mapping will cause all values of this field to be copied to fields specified in -the parameter. In the following example all values from fields `title` and `abstract` will be copied to the field -`meta_data`. The field which is being copied to will be indexed (i.e. searchable, and available through `fielddata_field`) but the original source will not be modified. 
- - -[source,js] --------------------------------------------------- -{ - "book" : { - "properties" : { - "title" : { "type" : "string", "copy_to" : "meta_data" }, - "abstract" : { "type" : "string", "copy_to" : "meta_data" }, - "meta_data" : { "type" : "string" } - } -} --------------------------------------------------- - -Multiple fields are also supported: - -[source,js] --------------------------------------------------- -{ - "book" : { - "properties" : { - "title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] } - } -} --------------------------------------------------- - -[float] -[[multi-fields]] -===== Multi fields - -The `fields` options allows to map several core types fields into a single -json source field. This can be useful if a single field need to be -used in different ways. For example a single field is to be used for both -free text search and sorting. - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "name" : { - "type" : "string", - "index" : "analyzed", - "fields" : { - "raw" : {"type" : "string", "index" : "not_analyzed"} - } - } - } - } -} --------------------------------------------------- - -In the above example the field `name` gets processed twice. The first time it gets -processed as an analyzed string and this version is accessible under the field name -`name`, this is the main field and is in fact just like any other field. The second time -it gets processed as a not analyzed string and is accessible under the name `name.raw`. - -[float] -==== Include in All - -The `include_in_all` setting is ignored on any field that is defined in -the `fields` options. Setting the `include_in_all` only makes sense on -the main field, since the raw field value is copied to the `_all` field, -the tokens aren't copied. - -[float] -==== Updating a field - -In essence a field cannot be updated. However multi fields can be -added to existing fields. 
This allows for example to have a different -`analyzer` configuration in addition to the already configured -`analyzer` configuration specified in the main and other multi fields. - -Also the new multi field will only be applied on document that have been -added after the multi field has been added and in fact the new multi field -doesn't exist in existing documents. - -Another important note is that new multi fields will be merged into the -list of existing multi fields, so when adding new multi fields for a field -previous added multi fields don't need to be specified. diff --git a/docs/reference/mapping/types/date.asciidoc b/docs/reference/mapping/types/date.asciidoc new file mode 100644 index 00000000000..1eccc414a9c --- /dev/null +++ b/docs/reference/mapping/types/date.asciidoc @@ -0,0 +1,138 @@ +[[date]] +=== Date datatype + +JSON doesn't have a date datatype, so dates in Elasticsearch can either be: + +* strings containing formatted dates, e.g. `"2015-01-01"` or `"2015/01/01 12:10:30"`. +* a long number representing _milliseconds-since-the-epoch_. +* an integer representing _seconds-since-the-epoch_. + +Internally, dates are converted to UTC (if the time-zone is specified) and +stored as a long number representing milliseconds-since-the-epoch. + +Date formats can be customised, but if no `format` is specified then it uses +the default: `strictDateOptionalTime||epoch_millis`. This means that it will +accept dates with optional timestamps, which conform to the formats supported +by <> or milliseconds-since-the-epoch.
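The internal conversion described above can be sketched in Python. This is an illustration only, not Elasticsearch's parser: `to_epoch_millis` is a hypothetical helper, it relies on Python's `datetime.fromisoformat` rather than the `strictDateOptionalTime` format, and it assumes that a date without a time zone is treated as UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Hypothetical helper: return milliseconds-since-the-epoch for a date value."""
    if isinstance(value, int):
        # Numbers are taken to be milliseconds-since-the-epoch already.
        return value
    # fromisoformat() in older Python versions rejects a trailing "Z".
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a date without a time zone is interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division keeps the result exact (no float rounding).
    return (parsed - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2015-01-01"))                 # 1420070400000
print(to_epoch_millis("2015-01-01T12:10:30Z"))       # 1420114230000
print(to_epoch_millis("2015-01-01T01:00:00+01:00"))  # 1420070400000
print(to_epoch_millis(1420070400001))                # 1420070400001
```

A date that carries a time zone offset and a UTC date naming the same instant produce the same long value, which is why values stored this way sort and compare consistently.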
+ +For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "date": { + "type": "date" <1> + } + } + } + } +} + +PUT my_index/my_type/1 +{ "date": "2015-01-01" } <2> + +PUT my_index/my_type/2 +{ "date": "2015-01-01T12:10:30Z" } <3> + +PUT my_index/my_type/3 +{ "date": 1420070400001 } <4> + +GET my_index/_search +{ + "sort": { "date": "asc"} <5> +} +-------------------------------------------------- +// AUTOSENSE +<1> The `date` field uses the default `format`. +<2> This document uses a plain date. +<3> This document includes a time. +<4> This document uses milliseconds-since-the-epoch. +<5> Note that the `sort` values that are returned are all in milliseconds-since-the-epoch. + +[[multiple-date-formats]] +==== Multiple date formats + +Multiple formats can be specified by separating them with `||`. +Each format will be tried in turn until a matching format is found. The first +format will be used to convert the _milliseconds-since-the-epoch_ value back +into a string. + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "date": { + "type": "date", + "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +[[date-params]] +==== Parameters for `date` fields + +The following parameters are accepted by `date` fields: + +[horizontal] + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + The date format(s) that can be parsed. Defaults to + `strictDateOptionalTime||epoch_millis`. + +<>:: + + If `true`, malformed numbers are ignored. If `false` (default), malformed + numbers throw an exception and reject the whole document.
+ +<>:: + + Whether or not the field value should be included in the + <> field? Accepts `true` or `false`. Defaults + to `false` if <> is set to `no`, or if a parent + <> field sets `include_in_all` to `false`. + Otherwise defaults to `true`. + +<>:: + + Should the field be searchable? Accepts `not_analyzed` (default) and `no`. + +<>:: + + Accepts a date value in one of the configured +format+'s as the field + which is substituted for any explicit `null` values. Defaults to `null`, + which means the field is treated as missing. + +<>:: + + Controls the number of extra terms that are indexed to make + <> faster. Defaults to `16`. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). + + diff --git a/docs/reference/mapping/types/geo-point-type.asciidoc b/docs/reference/mapping/types/geo-point-type.asciidoc deleted file mode 100644 index 35f3600468e..00000000000 --- a/docs/reference/mapping/types/geo-point-type.asciidoc +++ /dev/null @@ -1,215 +0,0 @@ -[[mapping-geo-point-type]] -=== Geo Point Type - -Mapper type called `geo_point` to support geo based points. The -declaration looks as follows: - -[source,js] --------------------------------------------------- -{ - "pin" : { - "properties" : { - "location" : { - "type" : "geo_point" - } - } - } -} --------------------------------------------------- - -[float] -==== Indexed Fields - -The `geo_point` mapping will index a single field with the format of -`lat,lon`. The `lat_lon` option can be set to also index the `.lat` and -`.lon` as numeric fields, and `geohash` can be set to `true` to also -index `.geohash` value. - -A good practice is to enable indexing `lat_lon` as well, since both the -geo distance and bounding box filters can either be executed using in -memory checks, or using the indexed lat lon values, and it really -depends on the data set which one performs better. 
Note though, that -indexed lat lon only make sense when there is a single geo point value -for the field, and not multi values. - -[float] -==== Geohashes - -Geohashes are a form of lat/lon encoding which divides the earth up into -a grid. Each cell in this grid is represented by a geohash string. Each -cell in turn can be further subdivided into smaller cells which are -represented by a longer string. So the longer the geohash, the smaller -(and thus more accurate) the cell is. - -Because geohashes are just strings, they can be stored in an inverted -index like any other string, which makes querying them very efficient. - -If you enable the `geohash` option, a `geohash` ``sub-field'' will be -indexed as, eg `pin.geohash`. The length of the geohash is controlled by -the `geohash_precision` parameter, which can either be set to an absolute -length (eg `12`, the default) or to a distance (eg `1km`). - -More usefully, set the `geohash_prefix` option to `true` to not only index -the geohash value, but all the enclosing cells as well. For instance, a -geohash of `u30` will be indexed as `[u,u3,u30]`. This option can be used -by the <> to find geopoints within a -particular cell very efficiently. - -[float] -==== Input Structure - -The above mapping defines a `geo_point`, which accepts different -formats. The following formats are supported: - -[float] -===== Lat Lon as Properties - -[source,js] --------------------------------------------------- -{ - "pin" : { - "location" : { - "lat" : 41.12, - "lon" : -71.34 - } - } -} --------------------------------------------------- - -[float] -===== Lat Lon as String - -Format in `lat,lon`. 
- -[source,js] --------------------------------------------------- -{ - "pin" : { - "location" : "41.12,-71.34" - } -} --------------------------------------------------- - -[float] -===== Geohash - -[source,js] --------------------------------------------------- -{ - "pin" : { - "location" : "drm3btev3e86" - } -} --------------------------------------------------- - -[float] -===== Lat Lon as Array - -Format in `[lon, lat]`, note, the order of lon/lat here in order to -conform with http://geojson.org/[GeoJSON]. - -[source,js] --------------------------------------------------- -{ - "pin" : { - "location" : [-71.34, 41.12] - } -} --------------------------------------------------- - -[float] -==== Mapping Options - -[cols="<,<",options="header",] -|======================================================================= -|Option |Description -|`lat_lon` |Set to `true` to also index the `.lat` and `.lon` as fields. -Defaults to `false`. - -|`geohash` |Set to `true` to also index the `.geohash` as a field. -Defaults to `false`. - -|`geohash_precision` |Sets the geohash precision. It can be set to an -absolute geohash length or a distance value (eg 1km, 1m, 1ml) defining -the size of the smallest cell. Defaults to an absolute length of 12. - -|`geohash_prefix` |If this option is set to `true`, not only the geohash -but also all its parent cells (true prefixes) will be indexed as well. The -number of terms that will be indexed depends on the `geohash_precision`. -Defaults to `false`. *Note*: This option implicitly enables `geohash`. - -|`validate` |Set to `false` to accept geo points with invalid latitude or -longitude (default is `true`). *Note*: Validation only works when -normalization has been disabled. This option will be deprecated and removed -in upcoming releases. - -|`validate_lat` |Set to `false` to accept geo points with an invalid -latitude (default is `true`). This option will be deprecated and removed -in upcoming releases. 
- -|`validate_lon` |Set to `false` to accept geo points with an invalid -longitude (default is `true`). This option will be deprecated and removed -in upcoming releases. - -|`normalize` |Set to `true` to normalize latitude and longitude (default -is `true`). - -|`normalize_lat` |Set to `true` to normalize latitude. - -|`normalize_lon` |Set to `true` to normalize longitude. - -|`precision_step` |The precision step (influences the number of terms -generated for each number value) for `.lat` and `.lon` fields -if `lat_lon` is set to `true`. -Defaults to `16`. -|======================================================================= - -[float] -==== Field data - -By default, geo points use the `array` format which loads geo points into two -parallel double arrays, making sure there is no precision loss. However, this -can require a non-negligible amount of memory (16 bytes per document) which is -why Elasticsearch also provides a field data implementation with lossy -compression called `compressed`: - -[source,js] --------------------------------------------------- -{ - "pin" : { - "properties" : { - "location" : { - "type" : "geo_point", - "fielddata" : { - "format" : "compressed", - "precision" : "1cm" - } - } - } - } -} --------------------------------------------------- - -This field data format comes with a `precision` option which allows to -configure how much precision can be traded for memory. The default value is -`1cm`. The following table presents values of the memory savings given various -precisions: - -|============================================= -| Precision | Bytes per point | Size reduction -| 1km | 4 | 75% -| 3m | 6 | 62.5% -| 1cm | 8 | 50% -| 1mm | 10 | 37.5% -|============================================= - -Precision can be changed on a live index by using the update mapping API. 
- -[float] -==== Usage in Scripts - -When using `doc[geo_field_name]` (in the above mapping, -`doc['location']`), the `doc[...].value` returns a `GeoPoint`, which -then allows access to `lat` and `lon` (for example, -`doc[...].value.lat`). For performance, it is better to access the `lat` -and `lon` directly using `doc[...].lat` and `doc[...].lon`. diff --git a/docs/reference/mapping/types/geo-point.asciidoc b/docs/reference/mapping/types/geo-point.asciidoc new file mode 100644 index 00000000000..0049d8f93ac --- /dev/null +++ b/docs/reference/mapping/types/geo-point.asciidoc @@ -0,0 +1,167 @@ +[[geo-point]] +=== Geo-point datatype + +Fields of type `geo_point` accept latitude-longitude pairs, which can be used: + +* to find geo-points within a <>, + within a certain <> of a central point, + within a <>, or within a + <> cell. +* to aggregate documents by <> + or by <> from a central point. +* to integrate distance into a document's <>. +* to <> documents by distance. + +There are four ways that a geo-point may be specified, as demonstrated below: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "location": { + "type": "geo_point" + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "text": "Geo-point as an object", + "location": { <1> + "lat": 41.12, + "lon": -71.34 + } +} + +PUT my_index/my_type/2 +{ + "text": "Geo-point as a string", + "location": "41.12,-71.34" <2> +} + +PUT my_index/my_type/3 +{ + "text": "Geo-point as a geohash", + "location": "drm3btev3e86" <3> +} + +PUT my_index/my_type/4 +{ + "text": "Geo-point as an array", + "location": [ -71.34, 41.12 ] <4> +} + +GET my_index/_search +{ + "query": { + "geo_bounding_box": { <5> + "location": { + "top_left": { + "lat": 42, + "lon": -72 + }, + "bottom_right": { + "lat": 40, + "lon": -74 + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> Geo-point expressed as an object, with `lat` and
`lon` keys. +<2> Geo-point expressed as a string with the format: `"lat,lon"`. +<3> Geo-point expressed as a geohash. +<4> Geo-point expressed as an array with the format: [ `lon`, `lat`] +<5> A geo-bounding box query which finds all geo-points that fall inside the box. + +[IMPORTANT] +.Geo-points expressed as an array or string +================================================== + +Please note that string geo-points are ordered as `lat,lon`, while array +geo-points are ordered as the reverse: `lon,lat`. + +Originally, `lat,lon` was used for both array and string, but the array +format was changed early on to conform to the format used by GeoJSON. + +================================================== + + +[[geo-point-params]] +==== Parameters for `geo_point` fields + +The following parameters are accepted by `geo_point` fields: + +[horizontal] + +<>:: + + Normalize longitude and latitude values to a standard -180:180 / -90:90 + coordinate system. Accepts `true` and `false` (default). + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + Should the geo-point also be indexed as a geohash in the `.geohash` + sub-field? Defaults to `false`, unless `geohash_prefix` is `true`. + +<>:: + + The maximum length of the geohash to use for the `geohash` and + `geohash_prefix` options. + +<>:: + + Should the geo-point also be indexed as a geohash plus all its prefixes? + Defaults to `false`. + +<>:: + + If `true`, malformed geo-points are ignored. If `false` (default), + malformed geo-points throw an exception and reject the whole document. + +<>:: + + Should the geo-point also be indexed as `.lat` and `.lon` sub-fields? + Accepts `true` and `false` (default). + +<>:: + + Controls the number of extra terms that are indexed for each lat/lon point. + Defaults to `16`. Ignored if `lat_lon` is `false`. 
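To make the `geohash_prefix` parameter above more concrete, here is a minimal Python sketch of which terms such an option would produce. It is not Elasticsearch code: `geohash_prefixes` and its `precision` argument are hypothetical names, and the behaviour is modelled on the example in the older page removed by this same change, where a geohash of `u30` is indexed as `[u,u3,u30]`.

```python
def geohash_prefixes(geohash, precision=None):
    """Hypothetical helper: list a geohash and all of its prefixes, shortest first."""
    if precision is not None:
        # Mirrors the idea of a maximum geohash length (an assumption).
        geohash = geohash[:precision]
    return [geohash[:length] for length in range(1, len(geohash) + 1)]


print(geohash_prefixes("u30"))                        # ['u', 'u3', 'u30']
print(geohash_prefixes("drm3btev3e86", precision=3))  # ['d', 'dr', 'drm']
print(len(geohash_prefixes("drm3btev3e86")))          # 12
```

Each prefix names a larger enclosing cell, so one indexed point yields one term per level of the grid, up to the chosen maximum length.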
+ + +==== Using geo-points in scripts + +When accessing the value of a geo-point in a script, the value is returned as +a `GeoPoint` object, which allows access to the `.lat` and `.lon` values +respectively: + + +[source,js] +-------------------------------------------------- +geopoint = doc['location'].value; +lat = geopoint.lat; +lon = geopoint.lon; +-------------------------------------------------- + +For performance reasons, it is better to access the lat/lon values directly: + +[source,js] +-------------------------------------------------- +lat = doc['location'].lat; +lon = doc['location'].lon; +-------------------------------------------------- + + diff --git a/docs/reference/mapping/types/geo-shape-type.asciidoc b/docs/reference/mapping/types/geo-shape.asciidoc similarity index 99% rename from docs/reference/mapping/types/geo-shape-type.asciidoc rename to docs/reference/mapping/types/geo-shape.asciidoc index 0e2074365b3..563d04d530e 100644 --- a/docs/reference/mapping/types/geo-shape-type.asciidoc +++ b/docs/reference/mapping/types/geo-shape.asciidoc @@ -1,7 +1,7 @@ -[[mapping-geo-shape-type]] -=== Geo Shape Type +[[geo-shape]] +=== Geo-Shape datatype -The `geo_shape` mapping type facilitates the indexing of and searching +The `geo_shape` datatype facilitates the indexing of and searching with arbitrary geo shapes such as rectangles and polygons. It should be used when either the data being indexed or the queries being executed contain shapes other than just points. diff --git a/docs/reference/mapping/types/ip-type.asciidoc b/docs/reference/mapping/types/ip-type.asciidoc deleted file mode 100644 index e2575344b6e..00000000000 --- a/docs/reference/mapping/types/ip-type.asciidoc +++ /dev/null @@ -1,40 +0,0 @@ -[[mapping-ip-type]] -=== IP Type - -An `ip` mapping type allows to store _ipv4_ addresses in a numeric form -allowing to easily sort, and range query it (using ip values). 
- -The following table lists all the attributes that can be used with an ip -type: - -[cols="<,<",options="header",] -|======================================================================= -|Attribute |Description -|`index_name` |The name of the field that will be stored in the index. -Defaults to the property/field name. - -|`store` |Set to `true` to store actual field in the index, `false` to not -store it. Defaults to `false` (note, the JSON document itself is stored, -and it can be retrieved from it). - -|`index` |Set to `no` if the value should not be indexed. In this case, -`store` should be set to `true`, since if it's not indexed and not -stored, there is nothing to do with it. - -|`precision_step` |The precision step (influences the number of terms -generated for each number value). Defaults to `16`. - -|`boost` |The boost value. Defaults to `1.0`. - -|`null_value` |When there is a (JSON) null value for the field, use the -`null_value` as the field value. Defaults to not adding the field at -all. - -|`include_in_all` |Should the field be included in the `_all` field (if -enabled). Defaults to `true` or to the parent `object` type setting. - -|`doc_values` |Set to `true` to store field values in a column-stride fashion. -Automatically set to `true` when the <> is `doc_values`. 
- -|======================================================================= - diff --git a/docs/reference/mapping/types/ip.asciidoc b/docs/reference/mapping/types/ip.asciidoc new file mode 100644 index 00000000000..9610466acc2 --- /dev/null +++ b/docs/reference/mapping/types/ip.asciidoc @@ -0,0 +1,89 @@ +[[ip]] +=== IPv4 datatype + +An `ip` field is really a <> field which accepts +https://en.wikipedia.org/wiki/IPv4[IPv4] addresses and indexes them as long +values: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "ip_addr": { + "type": "ip" + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "ip_addr": "192.168.1.1" +} + +GET my_index/_search +{ + "query": { + "range": { + "ip_addr": { + "gte": "192.168.1.0", + "lt": "192.168.2.0" + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + + +[[ip-params]] +==== Parameters for `ip` fields + +The following parameters are accepted by `ip` fields: + +[horizontal] + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + Whether or not the field value should be included in the + <> field? Accepts `true` or `false`. Defaults + to `false` if <> is set to `no`, or if a parent + <> field sets `include_in_all` to `false`. + Otherwise defaults to `true`. + +<>:: + + Should the field be searchable? Accepts `not_analyzed` (default) and `no`. + +<>:: + + Accepts an IPv4 value which is substituted for any explicit `null` values. + Defaults to `null`, which means the field is treated as missing. + +<>:: + + Controls the number of extra terms that are indexed to make + <> faster. Defaults to `16`. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). 
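The reason a `range` query works on `ip` fields can be sketched in Python. This is only an illustration, assuming the stored long is the address read as an unsigned 32-bit integer (which is what Python's standard `ipaddress` module produces); `ipv4_to_long` and `in_range` are hypothetical helper names, not Elasticsearch functions.

```python
import ipaddress


def ipv4_to_long(address):
    """Hypothetical helper: convert a dotted IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(address))


def in_range(address, gte, lt):
    """Check gte <= address < lt by comparing the integer forms."""
    return ipv4_to_long(gte) <= ipv4_to_long(address) < ipv4_to_long(lt)


print(ipv4_to_long("192.168.1.1"))                            # 3232235777
print(in_range("192.168.1.1", "192.168.1.0", "192.168.2.0"))  # True
print(in_range("192.168.2.0", "192.168.1.0", "192.168.2.0"))  # False
```

Because numeric order matches address order, the `gte`/`lt` bounds in the search request above select the whole `192.168.1.x` block.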
+ + +NOTE: IPv6 addresses are not supported yet. diff --git a/docs/reference/mapping/types/nested-type.asciidoc b/docs/reference/mapping/types/nested-type.asciidoc deleted file mode 100644 index 1427c93b8f3..00000000000 --- a/docs/reference/mapping/types/nested-type.asciidoc +++ /dev/null @@ -1,165 +0,0 @@ -[[mapping-nested-type]] -=== Nested Type - -The `nested` type works like the <> except -that an array of `objects` is flattened, while an array of `nested` objects -allows each object to be queried independently. To explain, consider this -document: - -[source,js] --------------------------------------------------- -{ - "group" : "fans", - "user" : [ - { - "first" : "John", - "last" : "Smith" - }, - { - "first" : "Alice", - "last" : "White" - }, - ] -} --------------------------------------------------- - -If the `user` field is of type `object`, this document would be indexed -internally something like this: - -[source,js] --------------------------------------------------- -{ - "group" : "fans", - "user.first" : [ "alice", "john" ], - "user.last" : [ "smith", "white" ] -} --------------------------------------------------- - -The `first` and `last` fields are flattened, and the association between -`alice` and `white` is lost. This document would incorrectly match a query -for `alice AND smith`. - -If the `user` field is of type `nested`, each object is indexed as a separate -document, something like this: - -[source,js] --------------------------------------------------- -{ <1> - "user.first" : "alice", - "user.last" : "white" -} -{ <1> - "user.first" : "john", - "user.last" : "smith" -} -{ <2> - "group" : "fans" -} --------------------------------------------------- -<1> Hidden nested documents. -<2> Visible ``parent'' document. - -By keeping each nested object separate, the association between the -`user.first` and `user.last` fields is maintained. The query for `alice AND -smith` would *not* match this document. 
- -Searching on nested docs can be done using either the -<>. - -==== Mapping - -The mapping for `nested` fields is the same as `object` fields, except that it -uses type `nested`: - -[source,js] --------------------------------------------------- -{ - "type1" : { - "properties" : { - "user" : { - "type" : "nested", - "properties": { - "first" : {"type": "string" }, - "last" : {"type": "string" } - } - } - } - } -} --------------------------------------------------- - -NOTE: changing an `object` type to `nested` type requires reindexing. - -You may want to index inner objects both as `nested` fields *and* as flattened -`object` fields, eg for highlighting. This can be achieved by setting -`include_in_parent` to `true`: - -[source,js] --------------------------------------------------- -{ - "type1" : { - "properties" : { - "user" : { - "type" : "nested", - "include_in_parent": true, - "properties": { - "first" : {"type": "string" }, - "last" : {"type": "string" } - } - } - } - } -} --------------------------------------------------- - -The result of indexing our example document would be something like this: - -[source,js] --------------------------------------------------- -{ <1> - "user.first" : "alice", - "user.last" : "white" -} -{ <1> - "user.first" : "john", - "user.last" : "smith" -} -{ <2> - "group" : "fans", - "user.first" : [ "alice", "john" ], - "user.last" : [ "smith", "white" ] -} --------------------------------------------------- -<1> Hidden nested documents. -<2> Visible ``parent'' document. - - -Nested fields may contain other nested fields. The `include_in_parent` object -refers to the direct parent of the field, while the `include_in_root` -parameter refers only to the topmost ``root'' object or document. - -NOTE: The `include_in_parent` and `include_in_root` options do not apply -to <>, which are only ever -indexed inside the nested document. - -Nested docs will automatically use the root doc `_all` field only. 
- -.Internal Implementation -********************************************* -Internally, nested objects are indexed as additional documents, but, -since they can be guaranteed to be indexed within the same "block", it -allows for extremely fast joining with parent docs. - -Those internal nested documents are automatically masked away when doing -operations against the index (like searching with a match_all query), -and they bubble out when using the nested query. - -Because nested docs are always masked to the parent doc, the nested docs -can never be accessed outside the scope of the `nested` query. For example -stored fields can be enabled on fields inside nested objects, but there is -no way of retrieving them, since stored fields are fetched outside of -the `nested` query scope. - -The `_source` field is always associated with the parent document and -because of that field values via the source can be fetched for nested object. -********************************************* diff --git a/docs/reference/mapping/types/nested.asciidoc b/docs/reference/mapping/types/nested.asciidoc new file mode 100644 index 00000000000..07f87037b07 --- /dev/null +++ b/docs/reference/mapping/types/nested.asciidoc @@ -0,0 +1,201 @@ +[[nested]] +=== Nested datatype + +The `nested` type is a specialised version of the <> datatype +that allows arrays of objects to be indexed and queried independently of each +other. + +==== How arrays of objects are flattened + +Arrays of inner <> do not work the way you may expect. +Lucene has no concept of inner objects, so Elasticsearch flattens object +hierarchies into a simple list of field names and values. 
For instance, the +following document: + +[source,js] +-------------------------------------------------- +PUT my_index/my_type/1 +{ + "group" : "fans", + "user" : [ <1> + { + "first" : "John", + "last" : "Smith" + }, + { + "first" : "Alice", + "last" : "White" + } + ] +} +-------------------------------------------------- +// AUTOSENSE +<1> The `user` field is dynamically added as a field of type `object`. + +would be transformed internally into a document that looks more like this: + +[source,js] +-------------------------------------------------- +{ + "group" : "fans", + "user.first" : [ "alice", "john" ], + "user.last" : [ "smith", "white" ] +} +-------------------------------------------------- + +The `user.first` and `user.last` fields are flattened into multi-value fields, +and the association between `alice` and `white` is lost. This document would +incorrectly match a query for `alice AND smith`: + +[source,js] +-------------------------------------------------- +GET my_index/_search +{ + "query": { + "bool": { + "must": [ + { "match": { "user.first": "Alice" }}, + { "match": { "user.last": "Smith" }} + ] + } + } +} +-------------------------------------------------- +// AUTOSENSE + +==== Using `nested` fields for arrays of objects + +If you need to index arrays of objects and to maintain the independence of +each object in the array, you should use the `nested` datatype instead of the +<> datatype. 
Internally, nested objects index each object in +the array as a separate hidden document, meaning that each nested object can be +queried independently of the others, with the <>: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "user": { + "type": "nested" <1> + } + } + } + } +} + +PUT my_index/my_type/1 +{ + "group" : "fans", + "user" : [ + { + "first" : "John", + "last" : "Smith" + }, + { + "first" : "Alice", + "last" : "White" + } + ] +} + +GET my_index/_search +{ + "query": { + "nested": { + "path": "user", + "query": { + "bool": { + "must": [ + { "match": { "user.first": "Alice" }}, + { "match": { "user.last": "Smith" }} <2> + ] + } + } + } + } +} + +GET my_index/_search +{ + "query": { + "nested": { + "path": "user", + "query": { + "bool": { + "must": [ + { "match": { "user.first": "Alice" }}, + { "match": { "user.last": "White" }} <3> + ] + } + }, + "inner_hits": { <4> + "highlight": { + "fields": { + "user.first": {} + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `user` field is mapped as type `nested` instead of type `object`. +<2> This query doesn't match because `Alice` and `Smith` are not in the same nested object. +<3> This query matches because `Alice` and `White` are in the same nested object. +<4> `inner_hits` allow us to highlight the matching nested documents. + + +Nested documents can be: + +* queried with the <> query. +* analyzed with the <> + and <> + aggregations. +* sorted with <>. +* retrieved and highlighted with <>. + + +[[nested-params]] +==== Parameters for `nested` fields + +The following parameters are accepted by `nested` fields: + +[horizontal] +<>:: + + Whether or not new `properties` should be added dynamically to an existing + nested object. Accepts `true` (default), `false` and `strict`. + +<>:: + + Sets the default `include_in_all` value for all the `properties` within + the nested object. 
Nested documents do not have their own `_all` field. + Instead, values are added to the `_all` field of the main ``root'' + document. + +<>:: + + The fields within the nested object, which can be of any + <>, including `nested`. New properties + may be added to an existing nested object. + + +[IMPORTANT] +============================================= + +Because nested documents are indexed as separate documents, they can only be +accessed within the scope of the `nested` query, the +`nested`/`reverse_nested`, or <>. + +For instance, if a string field within a nested document has +<> set to `offsets` to allow use of the postings +highlighter, these offsets will not be available during the main highlighting +phase. Instead, highlighting needs to be performed via +<>. + +============================================= + diff --git a/docs/reference/mapping/types/numeric.asciidoc b/docs/reference/mapping/types/numeric.asciidoc new file mode 100644 index 00000000000..f04efa16583 --- /dev/null +++ b/docs/reference/mapping/types/numeric.asciidoc @@ -0,0 +1,93 @@ +[[number]] +=== Numeric datatypes + +The following numeric types are supported: + +[horizontal] +`long`:: A signed 64-bit integer with a minimum value of +-2^63^+ and a maximum value of +2^63^-1+. +`integer`:: A signed 32-bit integer with a minimum value of +-2^31^+ and a maximum value of +2^31^-1+. +`short`:: A signed 16-bit integer with a minimum value of +-32,768+ and a maximum value of +32,767+. +`byte`:: A signed 8-bit integer with a minimum value of +-128+ and a maximum value of +127+. +`double`:: A double-precision 64-bit IEEE 754 floating point. +`float`:: A single-precision 32-bit IEEE 754 floating point. 
+ +Below is an example of configuring a mapping with numeric fields: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "number_of_bytes": { + "type": "integer" + }, + "time_in_seconds": { + "type": "float" + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE + +[[number-params]] +==== Parameters for numeric fields + +The following parameters are accepted by numeric types: + +[horizontal] + +<>:: + + Try to convert strings to numbers and truncate fractions for integers. + Accepts `true` (default) and `false`. + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + If `true`, malformed numbers are ignored. If `false` (default), malformed + numbers throw an exception and reject the whole document. + +<>:: + + Whether or not the field value should be included in the + <> field? Accepts `true` or `false`. Defaults + to `false` if <> is set to `no`, or if a parent + <> field sets `include_in_all` to `false`. + Otherwise defaults to `true`. + +<>:: + + Should the field be searchable? Accepts `not_analyzed` (default) and `no`. + +<>:: + + Accepts a numeric value of the same `type` as the field which is + substituted for any explicit `null` values. Defaults to `null`, which + means the field is treated as missing. + +<>:: + + Controls the number of extra terms that are indexed to make + <> faster. The default depends on the + numeric `type`. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). 
+ + diff --git a/docs/reference/mapping/types/object-type.asciidoc b/docs/reference/mapping/types/object-type.asciidoc deleted file mode 100644 index b1ef623cd3d..00000000000 --- a/docs/reference/mapping/types/object-type.asciidoc +++ /dev/null @@ -1,179 +0,0 @@ -[[mapping-object-type]] -=== Object Type - -JSON documents are hierarchical in nature, allowing them to define inner -"objects" within the actual JSON. Elasticsearch completely understands -the nature of these inner objects and can map them easily, providing -query support for their inner fields. Because each document can have -objects with different fields each time, objects mapped this way are -known as "dynamic". Dynamic mapping is enabled by default. Let's take -the following JSON as an example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "person" : { - "name" : { - "first_name" : "Shay", - "last_name" : "Banon" - }, - "sid" : "12345" - }, - "message" : "This is a tweet!" - } -} --------------------------------------------------- - -The above shows an example where a tweet includes the actual `person` -details. A `person` is an object, with a `sid`, and a `name` object -which has `first_name` and `last_name`. It's important to note that -`tweet` is also an object, although it is a special -<> -which allows for additional mapping definitions. - -The following is an example of explicit mapping for the above JSON: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "person" : { - "type" : "object", - "properties" : { - "name" : { - "type" : "object", - "properties" : { - "first_name" : {"type" : "string"}, - "last_name" : {"type" : "string"} - } - }, - "sid" : {"type" : "string", "index" : "not_analyzed"} - } - }, - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -In order to mark a mapping of type `object`, set the `type` to object. 
-This is an optional step, since if there are `properties` defined for -it, it will automatically be identified as an `object` mapping. - -[float] -==== properties - -An object mapping can optionally define one or more properties using the -`properties` tag for a field. Each property can be either another -`object`, or one of the -<>. - -[float] -==== dynamic - -One of the most important features of Elasticsearch is its ability to be -schema-less. This means that, in our example above, the `person` object -can be indexed later with a new property -- `age`, for example -- and it -will automatically be added to the mapping definitions. Same goes for -the `tweet` root object. - -This feature is by default turned on, and it's the `dynamic` nature of -each object mapped. Each object mapped is automatically dynamic, though -it can be explicitly turned off: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "person" : { - "type" : "object", - "properties" : { - "name" : { - "dynamic" : false, - "properties" : { - "first_name" : {"type" : "string"}, - "last_name" : {"type" : "string"} - } - }, - "sid" : {"type" : "string", "index" : "not_analyzed"} - } - }, - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -In the above example, the `name` object mapped is not dynamic, meaning -that if, in the future, we try to index JSON with a `middle_name` within -the `name` object, it will get discarded and not added. - -There is no performance overhead if an `object` is dynamic, the ability -to turn it off is provided as a safety mechanism so "malformed" objects -won't, by mistake, index data that we do not wish to be indexed. - -If a dynamic object contains yet another inner `object`, it will be -automatically added to the index and mapped as well. - -When processing dynamic new fields, their type is automatically derived. 
-For example, if it is a `number`, it will automatically be treated as -number <>. Dynamic -fields default to their default attributes, for example, they are not -stored and they are always indexed. - -Date fields are special since they are represented as a `string`. Date -fields are detected if they can be parsed as a date when they are first -introduced into the system. The set of date formats that are tested -against can be configured using the `dynamic_date_formats` on the root object, -which is explained later. - -Note, once a field has been added, *its type can not change*. For -example, if we added age and its value is a number, then it can't be -treated as a string. - -The `dynamic` parameter can also be set to `strict`, meaning that not -only will new fields not be introduced into the mapping, but also that parsing -(indexing) docs with such new fields will fail. - -[float] -==== enabled - -The `enabled` flag allows to disable parsing and indexing a named object -completely. This is handy when a portion of the JSON document contains -arbitrary JSON which should not be indexed, nor added to the mapping. -For example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "properties" : { - "person" : { - "type" : "object", - "properties" : { - "name" : { - "type" : "object", - "enabled" : false - }, - "sid" : {"type" : "string", "index" : "not_analyzed"} - } - }, - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -In the above, `name` and its content will not be indexed at all. - - -[float] -==== include_in_all - -`include_in_all` can be set on the `object` type level. When set, it -propagates down to all the inner mappings defined within the `object` -that do not explicitly set it. 
- diff --git a/docs/reference/mapping/types/object.asciidoc b/docs/reference/mapping/types/object.asciidoc new file mode 100644 index 00000000000..0d159d7e1ef --- /dev/null +++ b/docs/reference/mapping/types/object.asciidoc @@ -0,0 +1,105 @@ +[[object]] +=== Object datatype + +JSON documents are hierarchical in nature: the document may contain inner +objects which, in turn, may contain inner objects themselves: + +[source,js] +-------------------------------------------------- +PUT my_index/my_type/1 +{ <1> + "region": "US", + "manager": { <2> + "age": 30, + "name": { <3> + "first": "John", + "last": "Smith" + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The outer document is also a JSON object. +<2> It contains an inner object called `manager`. +<3> Which in turn contains an inner object called `name`. + +Internally, this document is indexed as a simple, flat list of key-value +pairs, something like this: + +[source,js] +-------------------------------------------------- +{ + "region": "US", + "manager.age": 30, + "manager.name.first": "John", + "manager.name.last": "Smith" +} +-------------------------------------------------- + +An explicit mapping for the above document could look like this: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { <1> + "properties": { + "region": { + "type": "string", + "index": "not_analyzed" + }, + "manager": { <2> + "properties": { + "age": { "type": "integer" }, + "name": { <3> + "properties": { + "first": { "type": "string" }, + "last": { "type": "string" } + } + } + } + } + } + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The mapping type is a type of object, and has a `properties` field. +<2> The `manager` field is an inner `object` field. +<3> The `manager.name` field is an inner `object` field within the `manager` field. 
+ +You are not required to set the field `type` to `object` explicitly, as this is the default value. + +[[object-params]] +==== Parameters for `object` fields + +The following parameters are accepted by `object` fields: + +[horizontal] +<>:: + + Whether or not new `properties` should be added dynamically + to an existing object. Accepts `true` (default), `false` + and `strict`. + +<>:: + + Whether the JSON value given for the object field should be + parsed and indexed (`true`, default) or completely ignored (`false`). + +<>:: + + Sets the default `include_in_all` value for all the `properties` within + the object. The object itself is not added to the `_all` field. + +<>:: + + The fields within the object, which can be of any + <>, including `object`. New properties + may be added to an existing object. + +IMPORTANT: If you need to index arrays of objects instead of single objects, +read <> first. + diff --git a/docs/reference/mapping/types/root-object-type.asciidoc b/docs/reference/mapping/types/root-object-type.asciidoc deleted file mode 100644 index 090f88bc846..00000000000 --- a/docs/reference/mapping/types/root-object-type.asciidoc +++ /dev/null @@ -1,190 +0,0 @@ -[[mapping-root-object-type]] -=== Root Object Type - -The root object mapping is an <> that -maps the root object (the type itself). It supports all of the different -mappings that can be set using the <>. - -The root object mapping allows to index a JSON document that only contains its -fields. For example, the following `tweet` JSON can be indexed without -specifying the `tweet` type in the document itself: - -[source,js] --------------------------------------------------- -{ - "message" : "This is a tweet!" -} --------------------------------------------------- - -[float] -==== dynamic_date_formats - -`dynamic_date_formats` (old setting called `date_formats` still works) -is the ability to set one or more date formats that will be used to -detect `date` fields. 
For example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "dynamic_date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"], - "properties" : { - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -In the above mapping, if a new JSON field of type string is detected, -the date formats specified will be used in order to check if its a date. -If it passes parsing, then the field will be declared with `date` type, -and will use the matching format as its format attribute. The date -format itself is explained -<>. - -The default formats are: `strictDateOptionalTime` (ISO) and -`yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z` and `epoch_millis`. - -*Note:* `dynamic_date_formats` are used *only* for dynamically added -date fields, not for `date` fields that you specify in your mapping. - -[float] -==== date_detection - -Allows to disable automatic date type detection (if a new field is introduced -and matches the provided format), for example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "date_detection" : false, - "properties" : { - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -[float] -==== numeric_detection - -Sometimes, even though json has support for native numeric types, -numeric values are still provided as strings. In order to try and -automatically detect numeric values from string, the `numeric_detection` -can be set to `true`. For example: - -[source,js] --------------------------------------------------- -{ - "tweet" : { - "numeric_detection" : true, - "properties" : { - "message" : {"type" : "string"} - } - } -} --------------------------------------------------- - -[float] -==== dynamic_templates - -Dynamic templates allow to define mapping templates that will be applied -when dynamic introduction of fields / objects happens. 
- -IMPORTANT: Dynamic field mappings are only added when a field contains -a concrete value -- not `null` or an empty array. This means that if the `null_value` option -is used in a `dynamic_template`, it will only be applied after the first document -with a concrete value for the field has been indexed. - -For example, we might want to have all fields to be stored by default, -or all `string` fields to be stored, or have `string` fields to always -be indexed with multi fields syntax, once analyzed and once not_analyzed. -Here is a simple example: - -[source,js] --------------------------------------------------- -{ - "person" : { - "dynamic_templates" : [ - { - "template_1" : { - "match" : "multi*", - "mapping" : { - "type" : "{dynamic_type}", - "index" : "analyzed", - "fields" : { - "org" : {"type": "{dynamic_type}", "index" : "not_analyzed"} - } - } - } - }, - { - "template_2" : { - "match" : "*", - "match_mapping_type" : "string", - "mapping" : { - "type" : "string", - "index" : "not_analyzed" - } - } - } - ] - } -} --------------------------------------------------- - -The above mapping will create a field with multi fields for all field -names starting with multi, and will map all `string` types to be -`not_analyzed`. - -Dynamic templates are named to allow for simple merge behavior. A new -mapping, just with a new template can be "put" and that template will be -added, or if it has the same name, the template will be replaced. - -The `match` allow to define matching on the field name. An `unmatch` -option is also available to exclude fields if they do match on `match`. -The `match_mapping_type` controls if this template will be applied only -for dynamic fields of the specified type (as guessed by the json -format). - -Another option is to use `path_match`, which allows to match the dynamic -template against the "full" dot notation name of the field (for example -`obj1.*.value` or `obj1.obj2.*`), with the respective `path_unmatch`. 
- -The format of all the matching is simple format, allowing to use * as a -matching element supporting simple patterns such as xxx*, *xxx, xxx*yyy -(with arbitrary number of pattern types), as well as direct equality. -The `match_pattern` can be set to `regex` to allow for regular -expression based matching. - -The `mapping` element provides the actual mapping definition. The -`{name}` keyword can be used and will be replaced with the actual -dynamic field name being introduced. The `{dynamic_type}` (or -`{dynamicType}`) can be used and will be replaced with the mapping -derived based on the field type (or the derived type, like `date`). - -Complete generic settings can also be applied, for example, to have all -mappings be stored, just set: - -[source,js] --------------------------------------------------- -{ - "person" : { - "dynamic_templates" : [ - { - "store_generic" : { - "match" : "*", - "mapping" : { - "store" : true - } - } - } - ] - } -} --------------------------------------------------- - -Such generic templates should be placed at the end of the -`dynamic_templates` list because when two or more dynamic templates -match a field, only the first matching one from the list is used. diff --git a/docs/reference/mapping/types/string.asciidoc b/docs/reference/mapping/types/string.asciidoc new file mode 100644 index 00000000000..edad8d8e7ae --- /dev/null +++ b/docs/reference/mapping/types/string.asciidoc @@ -0,0 +1,170 @@ +[[string]] +=== String datatype + +Fields of type `string` accept text values. Strings may be sub-divided into: + +Full text:: ++ +-- + +Full text values, like the body of an email, are typically used for text based +relevance searches, such as: _Find the most relevant documents that match a +query for "quick brown fox"_. + +These fields are `analyzed`, that is they are passed through an +<> to convert the string into a list of individual terms +before being indexed. 
The analysis process allows Elasticsearch to search for +individual words _within_ each full text field. Full text fields are not +used for sorting and seldom used for aggregations (although the +<> is a notable exception). + +-- + +Keywords:: + +Keywords are exact values like email addresses, hostnames, status codes, or +tags. They are typically used for filtering (_Find me all blog posts where +++status++ is ++published++_), for sorting, and for aggregations. Keyword +fields are `not_analyzed`. Instead, the exact string value is added to the +index as a single term. + +Below is an example of a mapping for a full text (`analyzed`) and a keyword +(`not_analyzed`) string field: + +[source,js] +-------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "full_name": { <1> + "type": "string" + }, + "status": { + "type": "string", <2> + "index": "not_analyzed" + } + } + } + } +} +-------------------------------- +// AUTOSENSE +<1> The `full_name` field is an `analyzed` full text field -- `index:analyzed` is the default. +<2> The `status` field is a `not_analyzed` keyword field. + +Sometimes it is useful to have both a full text (`analyzed`) and a keyword +(`not_analyzed`) version of the same field: one for full text search and the +other for aggregations and sorting. This can be achieved with +<>. + + +[[string-params]] +==== Parameters for string fields + +The following parameters are accepted by `string` fields: + +[horizontal] + +<>:: + + The <> which should be used for + <> string fields, both at index-time + and at search-time (unless overridden by the <>). + Defaults to the default index analyzer, or the + <>. + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field use on-disk index-time doc values for sorting, aggregations, + or scripting? Accepts `true` or `false`. Defaults to `true` for + `not_analyzed` fields. 
Analyzed fields do not support doc values. + +<>:: + + Can the field use in-memory fielddata for sorting, aggregations, + or scripting? Accepts `disabled` or `paged_bytes` (default). + Not analyzed fields will use <> in preference + to fielddata. + +<>:: + + Multi-fields allow the same string value to be indexed in multiple ways for + different purposes, such as one field for search and a multi-field for + sorting and aggregations, or the same string value analyzed by different + analyzers. + +<>:: + + Do not index or analyze any string longer than this value. Defaults to `0` (disabled). + +<>:: + + Whether or not the field value should be included in the + <> field? Accepts `true` or `false`. Defaults + to `false` if <> is set to `no`, or if a parent + <> field sets `include_in_all` to `false`. + Otherwise defaults to `true`. + +<>:: + + Should the field be searchable? Accepts `analyzed` (default, treat as full-text field), + `not_analyzed` (treat as keyword field) and `no`. + +<>:: + + What information should be stored in the index, for search and highlighting purposes. + Defaults to `positions` for <> fields, and to `docs` for + `not_analyzed` fields. + + +<>:: ++ +-- + +Whether field-length should be taken into account when scoring queries. +Defaults depend on the <> setting: + +* `analyzed` fields default to `{ "enabled": true, "loading": "lazy" }`. +* `not_analyzed` fields default to `{ "enabled": false }`. +-- + +<>:: + + Accepts a string value which is substituted for any explicit `null` + values. Defaults to `null`, which means the field is treated as missing. + If the field is `analyzed`, the `null_value` will also be analyzed. + +<>:: + + The number of fake term positions which should be inserted between + each element of an array of strings. Defaults to 0. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). 
+ +<>:: + + The <> that should be used at search time on + <> fields. Defaults to the `analyzer` setting. + +<>:: + + Which scoring algorithm or _similarity_ should be used. Defaults + to `default`, which uses TF/IDF. + +<>:: + + Whether term vectors should be stored for an <> + field. Defaults to `no`. + + diff --git a/docs/reference/mapping/types/token-count.asciidoc b/docs/reference/mapping/types/token-count.asciidoc new file mode 100644 index 00000000000..6c1b93c34d9 --- /dev/null +++ b/docs/reference/mapping/types/token-count.asciidoc @@ -0,0 +1,107 @@ +[[token-count]] +=== Token count datatype + +A field of type `token_count` is really an <> field which +accepts string values, analyzes them, then indexes the number of tokens in the +string. + +For instance: + +[source,js] +-------------------------------------------------- +PUT my_index +{ + "mappings": { + "my_type": { + "properties": { + "name": { <1> + "type": "string", + "fields": { + "length": { <2> + "type": "token_count", + "analyzer": "standard" + } + } + } + } + } + } +} + +PUT my_index/my_type/1 +{ "name": "John Smith" } + +PUT my_index/my_type/2 +{ "name": "Rachel Alice Williams" } + +GET my_index/_search +{ + "query": { + "term": { + "name.length": 3 <3> + } + } +} +-------------------------------------------------- +// AUTOSENSE +<1> The `name` field is an analyzed string field which uses the default `standard` analyzer. +<2> The `name.length` field is a `token_count` <> which will index the number of tokens in the `name` field. +<3> This query matches only the document containing `Rachel Alice Williams`, as it contains three tokens. + +[NOTE] +=================================================================== +Technically the `token_count` type sums position increments rather than +counting tokens. This means that even if the analyzer filters out stop +words they are included in the count. 
+=================================================================== + +[[token-count-params]] +==== Parameters for `token_count` fields + +The following parameters are accepted by `token_count` fields: + +[horizontal] + +<>:: + + The <> which should be used to analyze the string + value. Required. For best performance, use an analyzer without token + filters. + +<>:: + + Field-level index time boosting. Accepts a floating point number, defaults + to `1.0`. + +<>:: + + Can the field value be used for sorting, aggregations, or scripting? + Accepts `true` (default) or `false`. + +<>:: + + Should the field be searchable? Accepts `not_analyzed` (default) and `no`. + +<>:: + + Whether or not the field value should be included in the + <> field? Accepts `true` or `false`. Defaults + to `false`. Note: if `true`, it is the string value that is added to `_all`, + not the calculated token count. + +<>:: + + Accepts a numeric value of the same `type` as the field which is + substituted for any explicit `null` values. Defaults to `null`, which + means the field is treated as missing. + +<>:: + + Controls the number of extra terms that are indexed to make + <> faster. Defaults to `32`. + +<>:: + + Whether the field value should be stored and retrievable separately from + the <> field. Accepts `true` or `false` + (default). diff --git a/docs/reference/modules/advanced-scripting.asciidoc b/docs/reference/modules/advanced-scripting.asciidoc index b4e907cb313..206a2fec50d 100644 --- a/docs/reference/modules/advanced-scripting.asciidoc +++ b/docs/reference/modules/advanced-scripting.asciidoc @@ -71,7 +71,7 @@ Field statistics can be accessed with a subscript operator like this: Field statistics are computed per shard and therefore these numbers can vary depending on the shard the current document resides in. -The number of terms in a field cannot be accessed using the `_index` variable. See <> on how to do that. 
+The number of terms in a field cannot be accessed using the `_index` variable. See <> for how to do that. [float] ==== Term statistics: @@ -80,7 +80,7 @@ Term statistics for a field can be accessed with a subscript operator like this: `_index['FIELD']['TERM']`. This will never return null, even if term or field does not exist. If you do not need the term frequency, call `_index['FIELD'].get('TERM', 0)` to avoid unnecessary initialization of the frequencies. The flag will have only -affect is your set the `index_options` to `docs` (see <>). +effect if you set the <> to `docs`. `_index['FIELD']['TERM'].df()`:: @@ -176,7 +176,7 @@ return score; [float] ==== Term vectors: -The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (set `term_vector` in the mapping as described in the <>). To access them, call +The `_index` variable can only be used to gather statistics for single terms. If you want to use information on all terms in a field, you must store the term vectors (see <>). To access them, call `_index.termVectors()` to get a https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[Fields] instance. This object can then be used as described in https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/index/Fields.html[lucene doc] to iterate over fields and then for each field iterate over each term in the field. 
diff --git a/docs/reference/modules/scripting.asciidoc b/docs/reference/modules/scripting.asciidoc index ea31b2d010c..7729ce20030 100644 --- a/docs/reference/modules/scripting.asciidoc +++ b/docs/reference/modules/scripting.asciidoc @@ -284,7 +284,6 @@ supported operations are: |======================================================================= |Value |Description | `aggs` |Aggregations (wherever they may be used) -| `mapping` |Mappings (script transform feature) | `search` |Search api, Percolator api and Suggester api (e.g filters, script_fields) | `update` |Update api | `plugin` |Any plugin that makes use of scripts under the generic `plugin` category diff --git a/docs/reference/query-dsl/exists-query.asciidoc b/docs/reference/query-dsl/exists-query.asciidoc index d25ebdecd89..ca72887d048 100644 --- a/docs/reference/query-dsl/exists-query.asciidoc +++ b/docs/reference/query-dsl/exists-query.asciidoc @@ -44,7 +44,7 @@ These documents would *not* match the above query: [float] ===== `null_value` mapping -If the field mapping includes the `null_value` setting (see <>) +If the field mapping includes the <> setting then explicit `null` values are replaced with the specified `null_value`. For instance, if the `user` field were mapped as follows: diff --git a/docs/reference/query-dsl/function-score-query.asciidoc b/docs/reference/query-dsl/function-score-query.asciidoc index ab1d7d7a3de..08e5e575f20 100644 --- a/docs/reference/query-dsl/function-score-query.asciidoc +++ b/docs/reference/query-dsl/function-score-query.asciidoc @@ -254,7 +254,7 @@ decay function is specified as <1> The `DECAY_FUNCTION` should be one of `linear`, `exp`, or `gauss`. <2> The specified field must be a numeric, date, or geo-point field. -In the above example, the field is a <> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. 
Example: +In the above example, the field is a <> and origin can be provided in geo format. `scale` and `offset` must be given with a unit in this case. If your field is a date field, you can set `scale` and `offset` as days, weeks, and so on. Example: [source,js] @@ -268,7 +268,7 @@ In the above example, the field is a <> and origin can b } } -------------------------------------------------- -<1> The date format of the origin depends on the <> defined in +<1> The date format of the origin depends on the <> defined in your mapping. If you do not define the origin, the current time is used. <2> The `offset` and `decay` parameters are optional. diff --git a/docs/reference/query-dsl/geo-polygon-query.asciidoc b/docs/reference/query-dsl/geo-polygon-query.asciidoc index b9b624bda4c..6c28a05b445 100644 --- a/docs/reference/query-dsl/geo-polygon-query.asciidoc +++ b/docs/reference/query-dsl/geo-polygon-query.asciidoc @@ -112,7 +112,6 @@ Format in `lat,lon`. [float] ==== geo_point Type -The filter *requires* the -<> type to be -set on the relevant field. +The query *requires* the <> type to be set on the +relevant field. diff --git a/docs/reference/query-dsl/geo-queries.asciidoc b/docs/reference/query-dsl/geo-queries.asciidoc index 01f42831e00..ca7064fda72 100644 --- a/docs/reference/query-dsl/geo-queries.asciidoc +++ b/docs/reference/query-dsl/geo-queries.asciidoc @@ -2,8 +2,8 @@ == Geo queries Elasticsearch supports two types of geo data: -<> fields which support lat/lon pairs, and -<> fields, which support points, +<> fields which support lat/lon pairs, and +<> fields, which support points, lines, circles, polygons, multi-polygons etc. 
The queries in this group are: diff --git a/docs/reference/query-dsl/geo-shape-query.asciidoc b/docs/reference/query-dsl/geo-shape-query.asciidoc index 7a11677f0b1..8ab2f13a22c 100644 --- a/docs/reference/query-dsl/geo-shape-query.asciidoc +++ b/docs/reference/query-dsl/geo-shape-query.asciidoc @@ -3,7 +3,7 @@ Filter documents indexed using the `geo_shape` type. -Requires the <>. +Requires the <>. The `geo_shape` query uses the same grid square representation as the geo_shape mapping to find documents that have a shape that intersects diff --git a/docs/reference/query-dsl/geohash-cell-query.asciidoc b/docs/reference/query-dsl/geohash-cell-query.asciidoc index 8b75f3c60b4..c3c83de1866 100644 --- a/docs/reference/query-dsl/geohash-cell-query.asciidoc +++ b/docs/reference/query-dsl/geohash-cell-query.asciidoc @@ -2,13 +2,13 @@ === Geohash Cell Query The `geohash_cell` query provides access to a hierarchy of geohashes. -By defining a geohash cell, only <> +By defining a geohash cell, only <> within this cell will match this filter. To get this filter work all prefixes of a geohash need to be indexed. In example a geohash `u30` needs to be decomposed into three terms: `u30`, `u3` and `u`. This decomposition must be enabled in the mapping of the -<> field that's going to be filtered by +<> field that's going to be filtered by setting the `geohash_prefix` option: [source,js] diff --git a/docs/reference/query-dsl/joining-queries.asciidoc b/docs/reference/query-dsl/joining-queries.asciidoc index a230dedae92..ae66db8ce81 100644 --- a/docs/reference/query-dsl/joining-queries.asciidoc +++ b/docs/reference/query-dsl/joining-queries.asciidoc @@ -7,7 +7,7 @@ which are designed to scale horizontally. <>:: -Documents may contains fields of type <>. These +Documents may contains fields of type <>. These fields are used to index arrays of objects, where each object can be queried (with the `nested` query) as an independent document. 
diff --git a/docs/reference/query-dsl/missing-query.asciidoc b/docs/reference/query-dsl/missing-query.asciidoc index 276722bd4c6..648da068189 100644 --- a/docs/reference/query-dsl/missing-query.asciidoc +++ b/docs/reference/query-dsl/missing-query.asciidoc @@ -44,7 +44,7 @@ These documents would *not* match the above filter: [float] ==== `null_value` mapping -If the field mapping includes a `null_value` (see <>) then explicit `null` values +If the field mapping includes a <> then explicit `null` values are replaced with the specified `null_value`. For instance, if the `user` field were mapped as follows: diff --git a/docs/reference/query-dsl/nested-query.asciidoc b/docs/reference/query-dsl/nested-query.asciidoc index 0d8f5ceb6f5..d32705a0a7a 100644 --- a/docs/reference/query-dsl/nested-query.asciidoc +++ b/docs/reference/query-dsl/nested-query.asciidoc @@ -2,7 +2,7 @@ === Nested Query Nested query allows to query nested objects / docs (see -<>). The +<>). The query is executed against the nested objects / docs as if they were indexed as separate docs (they are, internally) and resulting in the root parent doc (or parent nested mapping). Here is a sample mapping we @@ -48,7 +48,7 @@ And here is a sample nested query usage: The query `path` points to the nested object path, and the `query` (or `filter`) includes the query that will run on the nested docs matching the direct path, and joining with the root parent docs. Note that any -fields referenced inside the query must use the complete path (fully +fields referenced inside the query must use the complete path (fully qualified). 
The `score_mode` allows to set how inner children matching affects diff --git a/docs/reference/query-dsl/range-query.asciidoc b/docs/reference/query-dsl/range-query.asciidoc index d5b271215c2..3c216ca58f7 100644 --- a/docs/reference/query-dsl/range-query.asciidoc +++ b/docs/reference/query-dsl/range-query.asciidoc @@ -29,33 +29,60 @@ The `range` query accepts the following parameters: `lt`:: Less-than `boost`:: Sets the boost value of the query, defaults to `1.0` -[float] -==== Date options -When applied on `date` fields the `range` filter accepts also a `time_zone` parameter. -The `time_zone` parameter will be applied to your input lower and upper bounds and will -move them to UTC time based date: +[[ranges-on-dates]] +==== Ranges on date fields + +When running `range` queries on fields of type <>, ranges can be +specified using <>:: [source,js] -------------------------------------------------- { "range" : { - "born" : { - "gte": "2012-01-01", - "lte": "now", - "time_zone": "+01:00" + "date" : { + "gte" : "now-1d/d", + "lt" : "now/d" } } } -------------------------------------------------- -In the above example, `gte` will be actually moved to `2011-12-31T23:00:00` UTC date. +===== Date math and rounding -NOTE: if you give a date with a timezone explicitly defined and use the `time_zone` parameter, `time_zone` will be -ignored. For example, setting `gte` to `2012-01-01T00:00:00+01:00` with `"time_zone":"+10:00"` will still use `+01:00` time zone. +When using <> to round dates to the nearest day, month, +hour, etc, the rounded dates depend on whether the ends of the ranges are +inclusive or exclusive. -When applied on `date` fields the `range` query accepts also a `format` parameter. -The `format` parameter will help support another date format than the one defined in mapping: +Rounding up moves to the last millisecond of the rounding scope, and rounding +down to the first millisecond of the rounding scope. 
For example: + +[horizontal] +`gt`:: + + Greater than the date rounded up: `2014-11-18||/M` becomes + `2014-11-30T23:59:59.999`, ie excluding the entire month. + +`gte`:: + + Greater than or equal to the date rounded down: `2014-11-18||/M` becomes + `2014-11-01`, ie including the entire month. + +`lt`:: + + Less than the date rounded down: `2014-11-18||/M` becomes `2014-11-01`, ie + excluding the entire month. + +`lte`:: + + Less than or equal to the date rounded up: `2014-11-18||/M` becomes + `2014-11-30T23:59:59.999`, ie including the entire month. + +===== Date format in range queries + +Formatted dates will be parsed using the <> +specified on the <> field by default, but it can be overridden by +passing the `format` parameter to the `range` query: [source,js] -------------------------------------------------- @@ -69,3 +96,25 @@ The `format` parameter will help support another date format than the one define } } -------------------------------------------------- + +===== Time zone in range queries + +Dates can be converted from another timezone to UTC either by specifying the +time zone in the date value itself (if the <> +accepts it), or it can be specified as the `time_zone` parameter: + +[source,js] +-------------------------------------------------- +{ + "range" : { + "timestamp" : { + "gte": "2015-01-01 00:00:00", <1> + "lte": "now", + "time_zone": "+01:00" + } + } +} +-------------------------------------------------- +<1> This date will be converted to `2014-12-31T23:00:00 UTC`. + + diff --git a/docs/reference/search/request/inner-hits.asciidoc b/docs/reference/search/request/inner-hits.asciidoc index 5e38ac024c2..bf7d184111b 100644 --- a/docs/reference/search/request/inner-hits.asciidoc +++ b/docs/reference/search/request/inner-hits.asciidoc @@ -3,7 +3,7 @@ experimental[] -The <> and <> features allow the return of documents that +The <> and <> features allow the return of documents that have matches in a different scope. 
In the parent/child case, parent document are returned based on matches in child documents or child document are returned based on matches in parent documents. In the nested case, documents are returned based on matches in nested inner objects. diff --git a/docs/reference/search/request/sort.asciidoc b/docs/reference/search/request/sort.asciidoc index c529955b5c5..0a8f3682b72 100644 --- a/docs/reference/search/request/sort.asciidoc +++ b/docs/reference/search/request/sort.asciidoc @@ -71,6 +71,7 @@ curl -XPOST 'localhost:9200/_search' -d '{ }' -------------------------------------------------- +[[nested-sorting]] ==== Sorting within nested objects. Elasticsearch also supports sorting by @@ -166,6 +167,7 @@ If any of the indices that are queried doesn't have a mapping for `price` then Elasticsearch will handle it as if there was a mapping of type `long`, with all documents in this index having no value for this field. +[[geo-sorting]] ==== Geo Distance Sorting Allow to sort by `_geo_distance`. Here is an example: