Commit Graph

252 Commits

Author SHA1 Message Date
LeonardGC 0b8be7f894 Update field-mapping.asciidoc (#17670) 2016-04-15 09:22:38 +02:00
Adrien Grand d84c643f58 Use the new points API to index numeric fields. #17746
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.

Notes about how the change works:
 - Numeric mappers have been split into a legacy version that is essentially
   the current mapper, and a new version that uses points, eg.
   LegacyDateFieldMapper and DateFieldMapper.
 - Since new and old fields have the same names, the decision about which one
   to use is made based on the index creation version.
 - If you try to force using a legacy field on a new index or a field that uses
   points on an old index, you will get an exception.
 - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
   in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
   and sortable).
 - The internal MappedFieldType that is stored by the new mappers does not have
   any of the points-related properties set. Instead, it keeps setting the index
   options when parsing the `index` property of mappings and does
   `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
   when parsing documents.

Known issues that won't fix:
 - You can't use numeric fields in significant terms aggregations anymore since
   this requires document frequencies, which points do not record.
 - Term queries on numeric fields will now return constant scores instead of
   giving better scores to the rare values.

Known issues that we could work around (in follow-up PRs, this one is too large
already):
 - Range queries on `ip` addresses only work if both the lower and upper bounds
   are inclusive (exclusive bounds are not exposed in Lucene). We could either
   decide to implement it, or drop range support entirely and tell users to
   query subnets using the CIDR notation instead.
 - Since IP addresses now use a different representation for doc values,
   aggregations will fail when running a terms aggregation on an ip field on a
   list of indices that contains both pre-5.0 and 5.0 indices.
 - The ip range aggregation does not work on the new ip field. We need to either
   implement range aggs for SORTED_SET doc values or drop support for ip ranges
   and tell users to use filters instead. #17700

Closes #16751
Closes #17007
Closes #11513
2016-04-14 17:56:23 +02:00
Nik Everett 0f9804b0e2 reindex: gracefully handle when _source is disabled
Closes #17666
2016-04-13 08:19:58 -04:00
Ibrahim Awwal 5121060e75 Fix typo in templates.asciidoc
The doc mentions match_path in one place but the correct syntax is path_match which is mentioned everywhere else. Using the wrong string leads to errors because the mapping becomes too greedy, and matches things it shouldn't.
2016-04-06 16:40:20 -06:00
Sergii Golubev 8430b379d8 string.asciidoc: fix for `position_increment_gap`
Remove  outdated and duplicate description for the `position_increment_gap` parameter.
2016-04-05 16:23:42 -04:00
Adrien Grand 26a0fb37a4 Add examples of useful dynamic templates to the docs. #17413 2016-03-31 09:45:11 +02:00
Adrien Grand fc47007e17 Add a soft limit on the mapping depth. #17400
This commit adds the new `index.mapping.depth.limit` setting which controls the
maximum mapping depth that is allowed. It has a default value of 20.
2016-03-30 14:37:00 +02:00
Yanjun Huang 361adcf387 Add limit to total number of fields in mapping. #17357
This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable

Closes #11443
2016-03-29 19:39:46 +02:00
Adrien Grand b42f66c8ac Document 5.0 mapping changes. 2016-03-22 16:22:58 +01:00
Clinton Gormley 2fa573bc58 Missing word in docs 2016-03-10 14:34:05 +01:00
Nicholas Knize 55635d5de1 update coerce and breaking changes documentation 2016-03-09 16:09:44 -06:00
Nicholas Knize 61f39e6c92 GeoPointV2 update docs and query builders
This commit updates the documentation for GeoPointField by removing all references to the coerce and doc_values parameters. DocValues are enabled in lucene GeoPointField by default (required for boundary filtering). The QueryBuilders are updated to automatically normalize points (ignoring the coerce parameter) for any index created onOrAfter version 2.2.
2016-03-09 16:09:44 -06:00
Jim Ferenczi 927303e7a9 Change the field mapping index time boost into a query time boost.
Index time boost will still be applied for indices created before 5.0.0.
2016-03-04 11:47:35 +01:00
Clinton Gormley 05e3cd6b97 Merge pull request #16878 from peschlowp/patch-8
Update index-options.asciidoc
2016-03-02 10:52:44 +01:00
Clinton Gormley 812f03a33f Merge pull request #16842 from anhlqn/patch-1
Fix minor spelling
2016-02-29 01:32:42 +01:00
Clinton Gormley 00b9640208 Merge pull request #16672 from teuneboon/patch-1
Clarify text about date format range
2016-02-15 16:16:19 +01:00
Dongjoon Hyun 21ea552070 Fix typos in docs. 2016-02-09 02:07:32 -08:00
Adrien Grand 209860854d Make the `index` property a boolean.
With the split of `string` into `text` and `keyword`, the `index` property can
only have two values and should be a boolean.
2016-01-27 09:06:00 +01:00
Clinton Gormley 6aa1a4930e Added back deprecation notices for _ttl and _timestamp 2016-01-26 11:56:36 +01:00
Robert Muir 6e7e3a2274 Update lucene to r1725675
Adds DFI (divergence from independence) provider.
Fixes test bugs passing invalid values for BM25 parameters.
2016-01-20 03:32:51 -05:00
Rachit Gupta 5b2ded5c96 Fix typo in doc values docs
Closes #16067
2016-01-19 05:58:39 -05:00
Yannick Welsch a1b8dd2de9 Add per-index setting to limit number of nested fields
Closes #14983
2016-01-19 10:03:48 +01:00
Felipe Forbeck 9965c83ae4 Documented how to define custom mappings for all indexes and all types
Closes #15557
2016-01-12 13:35:29 +01:00
Clinton Gormley 9773cca58e Merge pull request #15870 from rjruizes/patch-1
fix nested multi-value query
2016-01-10 10:06:40 +01:00
Adrien Grand 67d233cecd Remove warmers and the warmer API.
Warmers are now barely useful and will be removed in 3.0. Note that this only
removes the warmer API and query-based warmers. We still have warmers internally
for eg. global ordinals.

Close #15607
2016-01-07 09:57:07 +01:00
Imran Azad 8081c782ef Documented search_quote_analyzer in mapping types and detailed how to disable stop words as a potential use case. 2016-01-06 10:40:51 +01:00
Jim Ferenczi 81fd2169cf Renames "default" similarity into "classic".
Replaces deprecated DefaultSimilarity by ClassicSimilarity.
Fixes #15102
2015-12-21 16:22:53 +01:00
umeku 0ce88b5887 Fix inaccurate docs for nested datatype
Closes #15436
2015-12-15 15:15:00 +01:00
Clinton Gormley 061446b25a Merge pull request #15304 from cjohansen/patch-1
Fix typo
2015-12-15 10:57:38 +01:00
Clinton Gormley 83ee1fc903 Merge pull request #15400 from TheDude05/fix-match_pattern-docs
Fix docs with `match_pattern` in dynamic templates
2015-12-14 14:18:59 +01:00
Nicholas Knize 5f3d807f61 Update geo_shape/query docs, fix TermStrategy defaults
This commit adds the following:
* SpatialStrategy documentation to the geo-shape reference docs.
* Updates relation documentation to geo-shape-query reference docs.
* Updates GeoShapeFiledMapper to set points_only to true if TERM strategy is used (to be consistent with documentation)
2015-12-11 17:14:22 -06:00
Andrew Williams e7127c9f6f Fix docs with `match_pattern` in dynamic templates 2015-12-11 14:03:54 -06:00
Jim Ferenczi 9ab168dbf6 Removes all the reference of the query in the docs 2015-12-11 20:07:57 +01:00
Ben Tse 3cede749f9 fixed minor typo 2015-12-03 23:53:48 -05:00
Clinton Gormley 72be42d742 Document that _index is a virtual field and only supports term queries
Closes #15070
Closes #15081
2015-11-30 08:43:23 +01:00
Jason Tedor b6da075505 Fix typo in TTL field docs
Closes #14994
2015-11-24 22:57:35 -05:00
David Pilato 5b0e2823b1 Merge branch 'docs/mapper-attachments' 2015-11-23 12:14:31 +01:00
Clinton Gormley 2293c0d8c8 Update token-count.asciidoc
Fix typo
2015-11-20 19:00:52 +01:00
Clinton Gormley 728cc5137a Merge pull request #14738 from petmit/patch-1
Update error in documentation for multi-fields
2015-11-17 17:33:53 +01:00
Adrien Grand 35c0b50879 Reword some documentation to make it more obvious that doc values are a columnar representation of the data.
Some users may already be familiar with column stores, so saying more explicitly
that doc values are a columnar representation of the data may help them better
and/or more quickly understand what doc values are about.
2015-11-09 23:32:47 +01:00
David Pilato e993c6a862 Migrate mapper attachements plugin to asciidoc
Followup for #14605
2015-11-09 15:35:06 +01:00
Clinton Gormley c49aaa1284 Merge pull request #14608 from jimmyjones2/patch-1
Update all-field.asciidoc
2015-11-09 13:43:25 +01:00
Clinton Gormley dc018cf622 Updated docs for 3.0.0-beta 2015-10-07 13:27:46 +02:00
xuzha a77c68ba0e Fix position-increment-gap doc example 2015-09-23 08:04:43 -07:00
Nik Everett b205875c43 Merge pull request #13515 from elastic/docsfix
Fix for mappings->_source example in docs
2015-09-11 11:02:55 -04:00
Shane Connelly d86c1e8769 Fixes #13417 2015-09-11 07:34:14 -07:00
Nicholas Knize e4e71d8a9a add points_only option to GeoShapeFieldMapper for optimizing indexing performance on geo_shape indexes designed to store only points. Includes updated documentation and exception handling for ensuring index integrity on points only data. 2015-09-08 16:17:50 -05:00
Clinton Gormley 2c20658204 Docs: Added deprecation notice for _timestamp and _ttl 2015-09-07 21:16:19 +02:00
Nik Everett da16dcf527 [docs] Fix docs for position_increment_gap
Closes #13207
2015-08-31 14:05:55 -04:00
Nik Everett 9eb684da51 Default detect_noop to true
detect_noop is pretty cheap and noop updates compartively expensive so this
feels like a sensible default.

Also had to do some testing and documentation around how _ttl works with
detect_noop.

Closes #11282
2015-08-27 10:34:18 -04:00
xuzha 9bd4a7b72e Fix doc build 2015-08-26 16:02:36 -07:00
xuzha fb2be6d6a1 The name "position_offset_gap" is confusing because Lucene has three
similar sounding things:

* Analyzer#getPositionIncrementGap
* Analyzer#getOffsetGap
* IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and
* FieldType#storeTermVectorOffsets

Rename position_offset_gap to position_increment_gap
closes #13056
2015-08-26 14:56:35 -07:00
Nik Everett 4b9664beeb Mapping: Default position_offset_gap to 100
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.

Also postition_offset_gaps less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through

Closes #7268
2015-08-25 14:21:50 -04:00
Adrien Grand a91b3fcbb9 Move the `murmur3` field to a plugin and fix defaults.
This move the `murmur3` field to the `mapper-murmur3` plugin and fixes its
defaults so that values will not be indexed by default, as the only purpose
of this field is to speed up `cardinality` aggregations on high-cardinality
string fields, which only requires doc values.

I also removed the `rehash` option from the `cardinality` aggregation as it
doesn't bring much value (rehashing is cheap) and allowed to remove the
coupling between the `cardinality` aggregation and the `murmur3` field.

Close #12874
2015-08-18 11:41:52 +02:00
Clinton Gormley 5df5ab0451 Docs: Another bad asciidoc link 2015-08-15 18:25:34 +02:00
Clinton Gormley b67741f5f3 Docs: Another bad asciidoc link 2015-08-15 18:22:28 +02:00
Clinton Gormley 43936c5fcd Docs: Removed the _size field include 2015-08-15 18:12:31 +02:00
Clinton Gormley e143c6e460 Docs: Prepare plugin and integration docs for 2.0
* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community

Closes #11734
Closes #11724
Closes #11636
Closes #11635
Closes #11632
Closes #11630
Closes #12046
Closes #12438
Closes #12579
2015-08-15 18:02:43 +02:00
Clinton Gormley c6c3a40cb6 Docs: Updated annotations for 2.0.0-beta1 2015-08-14 10:51:09 +02:00
Clinton Gormley f8b9ede81f Documented the update_all_types setting on PUT mapping
Added docs to each mapping param to specify which ones can be updated when
2015-08-12 21:21:37 +02:00
Clinton Gormley 9da8822aed Docs: Made multi-fields more prominent 2015-08-06 20:09:42 +02:00
Clinton Gormley 0eb2ab915d Docs: Fixed date format default option 2015-08-06 19:05:09 +02:00
Clinton Gormley 08687dfa3d Docs: Fixed typo on string datatype page 2015-08-06 18:59:37 +02:00
Clinton Gormley 52663071c0 Docs: Removed redundant docs from field datatypes page. 2015-08-06 18:52:54 +02:00
Clinton Gormley 7977979146 Docs: Reorganised the mapping home page 2015-08-06 18:44:07 +02:00
Clinton Gormley ac2b8951c6 Docs: Mapping docs completely rewritten for 2.0 2015-08-06 17:24:51 +02:00
loopmachine 5de2044c5b Update nested-type.asciidoc mapping example 2015-08-04 14:02:03 -04:00
Ryan Ernst 8cd03cce5e Merge branch 'master' into fix/12329 2015-07-21 00:29:34 -07:00
Ryan Ernst 1c99626b84 Mappings: Remove ability to configure _index
The `_index` field is now a completely virtual field thanks
to #12027. It is no longer necessary to index the actual value
of the index name.

closes #12329
2015-07-20 23:54:35 -07:00
Clinton Gormley c56ce0e242 Docs: Refactored the mapping meta-fields docs 2015-07-20 01:26:27 +02:00
Clinton Gormley 2b512f1f29 Docs: Use "js" instead of "json" and "sh" instead of "shell" for source highlighting 2015-07-14 18:14:09 +02:00
John Roesler f86e8c33c1 Docfix: ignore_above uses string length, not utf-8
ignore_above is used to guard against the lucene limitation
that a term cannot exceed 32766 bytes.

However, the implementation just used the character count, which
doesn't take into account the fact that some characters have
multi-byte utf-8 encodings.

This commit updates the docs to make this relationship clear.

Closes #11563
2015-07-10 18:47:21 +02:00
Clinton Gormley 6c0badd0b3 Docs: Updated the source field docs to remove deprecation of includes/excludes
Also provide warnings about why disabling source is probably something
you don't want to do

Closes #12141
2015-07-10 15:52:30 +02:00
Alexander Reelsen b612cab96a Dates: More strict parsing of ISO dates
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.

Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.

In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.

Closes #6158
2015-07-07 09:34:37 +02:00
Martijn van Groningen 53874bf5a6 aliases: Parse aliases at search time and never cache parsed alias filters
The work around for resolving `now` doesn't need to be used for aliases, becuase alias filters are parsed at search time. However it can't be removed, because the percolator relies on it.

Parent/child can be specified again in alias filters, this now works again because alias filters are parsed at search time. Parent/child will also use the late query parse work around, to make sure to do the final preparations when the search context is around. This allows the aliases api to validate the parent/child queries without failing because there is no search context.

Closes #10485
2015-07-01 21:20:54 +02:00
Christoph Büscher f5f73259e4 Docs: Update Joda URLs in documentation. 2015-06-26 10:23:02 +02:00
Christoph Büscher ba9bbf7e66 Docs: Update date-format.asciidoc
Joda documentation moved from http://joda-time.sourceforge.net/ to http://www.joda.org/joda-time/. Updated the links in the documentation accordingly.
2015-06-26 09:49:29 +02:00
Alexander Reelsen 23cf9af495 Dates: Be backwards compatible with pre 2.x indices
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.

Relates #10971
2015-06-25 17:21:29 +02:00
Clinton Gormley 3105b4edbe Update core-types.asciidoc
Added an anchor for multi-fields in mappinggs
2015-06-24 21:36:37 +02:00
Clinton Gormley f123a53d72 Docs: Refactored modules and index modules sections 2015-06-22 23:49:45 +02:00
Ryan Ernst 12e7cbe92b Mappings: Lockdown _timestamp
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0.  Users who need to
adjust these settings can use a date field.
2015-06-22 10:21:03 -07:00
Alexander Reelsen 38ddc8159c Dates: Allow for negative unix timestamps
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation

Fixes #11478
2015-06-22 11:56:31 +02:00
Robin Clarke f13c216aa2 More information about 'Copy field to' 2015-06-09 16:35:49 +02:00
Alexander Reelsen 01e8eaf181 Date Parsing: Add parsing for epoch and epoch in milliseconds
This commit changes the date handling. First and foremost Elasticsearch
does not try to convert every date to a unix timestamp first and then
uses the configured date. This now allows for dates like `2015121212` to
be parsed correctly.

Instead it is now explicit by adding a `epoch_second` and `epoch_millis`
date format. This also means, that the default date format now is
`epoch_millis||dateOptionalTime` to remain backwards compatible.

Closes #5328
Relates #10971
2015-06-03 18:07:47 +02:00
Martijn van Groningen 359d9ac0d0 docs: added missing ids 2015-05-29 22:45:01 +02:00
Martijn van Groningen 1cfb6a79f1 Parent/child: refactored _parent field mapper and parent/child queries
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.

Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.

All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.

It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.

Closes #6107
Closes #8134
2015-05-29 21:44:17 +02:00
Colin Goodheart-Smithe 35a58d874e Scripting: Unify script and template requests across codebase
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs

Closes #11091
2015-05-29 16:52:04 +01:00
Adrien Grand 461683ac58 Mappings: Remove the `compress`/`compress_threshold` options of the BinaryFieldMapper.
This option is broken currently since it potentially interprets an incoming
binary value as compressed while it just happens that the first bytes are the
same as the LZF header.
2015-05-22 14:20:42 +02:00
Ryan Ernst e29492ce94 Docs: Cleanup meta field docs
Meta fields were locked down to not allow exotic options to the
underlying field types in #8143. This change fixes the docs
to no longer refer to the old settings.

closes #10879
2015-05-07 11:26:49 -07:00
Adrien Grand a0af88e996 Query DSL: Remove filter parsers.
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
2015-05-07 20:14:34 +02:00
Ryan Ernst 7a7bd6086a Mappings: Remove ability to disable _source field
Current features (eg. update API) and future features (eg. reindex API)
depend on _source. This change locks down the field so that
it can no longer be disabled. It also removes legacy settings
compress/compress_threshold.

closes #8142
closes #10915
2015-05-05 22:04:18 -07:00
Ryan Ernst d2b12e4fc2 Mappings: Remove docs for type level analyzer defaults
These settings were removed in #9430.
2015-04-30 13:57:55 -07:00
Ryan Ernst 4ef9f3ca63 Mappings: Remove file based default mappings
Using files that must be specified on each node is an anti-pattern
from the API based goal of ES. This change removes the ability
to specify the default mapping with a file on each node.

closes #10620
2015-04-30 13:50:35 -07:00
Adrien Grand 6e076efdb9 Docs: Add documentation for the `doc_values` setting on the `boolean` field type.
Close #10431
2015-04-29 15:59:24 +02:00
Clinton Gormley 7aa4c7e256 Docs: Removed a reference to index_name from the array mapping page 2015-04-29 15:12:31 +02:00
Ryan Ernst bf09e58cb3 Mappings: Remove includes and excludes from _source
Regardless of the outcome of #8142, we should at least enforce that
when _source is enabled, it is sufficient to reindex. This change
removes the excludes and includes settings, since these modify
the source, causing us to lose the ability to reindex some fields.

closes #10814
2015-04-28 15:03:51 -07:00
Clinton Gormley 2579cc31b1 Docs: Note that include_in_parent/root does not apply to geo-shape fields
Closes #10653
2015-04-25 16:49:49 +02:00
Nicholas Knize 453217fd7a [GEO] Prioritize tree_level and precision parameters over default distance_error_pct
If a user explicitly defined the tree_level or precision parameter in a geo_shape mapping their specification was always overridden by the default_error_pct parameter (even though our docs say this parameter is a 'hint'). This lead to unexpected accuracy problems in the results of a geo_shape filter. (example provided in issue #9691)

This simple patch fixes the unexpected behavior by setting the default distance_error_pct parameter to zero when the tree_level or precision parameters are provided by the user. Under the covers the quadtree will now use the tree level defined by the user. The docs will be updated to alert the user to exercise caution with these parameters.  Specifying a precision of "1m" for an index using large complex shapes can quickly lead to OOM issues.

closes #9691
2015-04-21 14:42:10 -05:00
Clinton Gormley abc7de96ae Docs: Updated version annotations in master 2015-04-09 14:50:11 +02:00
Adrien Grand fae124103a Merge pull request #10420 from jpountz/feature/numeric_resolution
Mappings: Bring back numeric_resolution.

Close #10420
2015-04-09 12:28:33 +02:00