Commit Graph

511 Commits

Author SHA1 Message Date
Martijn van Groningen 6aa9aaa2c6
Add validation for dynamic templates (#52890)
Backport of #51233 to the seven dot x branch.

Tries to load a `Mapper` instance for the mapping snippet of a dynamic template.
This should catch things like using an analyzer that is undefined or mapping attributes that are unused.

This is best effort:
* If `{{name}}` placeholder is used in the mapping snippet then validation is skipped.
* If `match_mapping_type` is not specified then validation is performed for all mapping types.
  If parsing succeeds with a single mapping type then this the dynamic mapping is considered valid.

If is detected that a dynamic template mapping snippet is invalid at mapping update time then the mapping update is failed for indices created on 8.0.0-alpha1 and later. For indices created on prior version a deprecation warning is omitted instead. In 7.x clusters the mapping update will never fail in case of an invalid dynamic template mapping snippet and a deprecation warning will always be omitted.

Closes #17411
Closes #24419

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2020-02-28 10:35:04 +01:00
James Rodewig 1a14ae4e1b [DOCS] Document `include_in_*` nested mapping parms (#52648)
Adds documentation for the `include_in_parent` and `include_in_root`
mapping parameters for the `nested` mapping datatype.
2020-02-25 07:13:49 -05:00
Ignacio Vera ba9d3c6389
Add support for multipoint shape queries (#52564) (#52705) 2020-02-24 13:46:51 +01:00
Ignacio Vera 107f00a4ec
Add support for multipoint geoshape queries (#52133) (#52553)
Currently multi-point queries are not supported when indexing your data using BKD-backed geoshape strategy. This commit removes this limitation.
2020-02-21 07:45:53 +01:00
Valentin Crettaz a68fafd64b [DOCS] Clarify that "now" cannot be used in `date_range` at index time (#52446)
`date_range` fields do not accept `"now"` as a value of either bounds at indexing time.

This corrects an error in the range data type mapping docs.
2020-02-19 12:40:58 -05:00
James Rodewig 9128106b4c [DOCS] Remove 'analyzed string' references (#51946)
The `string` field datatype was replaced by the `text` and `keyword`
field datatypes in [5.0][0].

This removes several outdated references to 'analyzed string' fields.

[0]:https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_mapping_changes.html#_string_fields_replaced_by_textkeyword_fields
2020-02-14 12:34:37 -05:00
Igor Motov a66988281f
Add histogram field type support to boxplot aggs (#52265)
Add support for the histogram field type to boxplot aggs.

Closes #52233
Relates to #33112
2020-02-13 18:09:26 -05:00
Ryan Ernst 12e378b3ac Fix incorrect date nanos docs example (#52249)
The example of how to access the nano value of a date_nanos field has
been broken since it was created. This commit fixes it to use the
correct scripting methods.

closes #51931
2020-02-12 15:55:41 -08:00
Marios Trivyzas dac720d7a1
Add a cluster setting to disallow expensive queries (#51385) (#52279)
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.

- Queries that need to do linear scans to identify matches:
  - Script queries
- Queries that have a high up-front cost:
  - Fuzzy queries
  - Regexp queries
  - Prefix queries (without index_prefixes enabled
  - Wildcard queries
  - Range queries on text and keyword fields
- Joining queries
  - HasParent queries
  - HasChild queries
  - ParentId queries
  - Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
  - Script score queries
  - Percolate queries

Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
2020-02-12 22:56:14 +01:00
Yannick Welsch fa212fe60b Stricter checks of setup and teardown in docs tests (#51430)
Adds extra checks due to 7.x backport
2020-01-28 16:52:23 +01:00
Mayya Sharipova a29deecbda Revert "Make it clear this is boost at index time (#51390)"
This reverts commit 3d5238bd95.
2020-01-24 11:05:42 -05:00
Jonas F. Henriksen 3d5238bd95 Make it clear this is boost at index time (#51390)
The way it was originally written, it sounds like 
we are boosting at query time.
 Of course, the effect is at query time, 
but the point here is that boosting is done at index time
2020-01-24 10:37:07 -05:00
junmuz 6718ce0f62 [DOCS] Correct typo in `ignore_malformed` mapping parm docs (#50780) 2020-01-13 09:49:53 -05:00
Adrien Grand 31158ab3d5
Add per-field metadata. (#50333)
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.

In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
 - keys can't be longer than 20 chars,
 - values can only be numbers or strings of no more than 50 chars - no inner
   arrays or objects,
 - the metadata can't have more than 5 keys in total.

Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.

Here is how the meta might look like in mappings:

```json
{
  "properties": {
    "latency": {
      "type": "long",
      "meta": {
        "unit": "ms"
      }
    }
  }
}
```

And then in the field capabilities response:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms" ]
      }
    }
  }
}
```

When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms", "ns" ]
      }
    }
  }
}
```

Closes #33267
2020-01-08 16:21:18 +01:00
James Rodewig d3094f9d23 [DOCS] Fix typo in mapping date format docs 2020-01-08 07:55:51 -06:00
arkel-s d5f4790f90 [DOCS] Add example format for `date_optional_time` (#50458)
Adds an example format for `date_optional_time` to the `format` mapping
parameter docs.

Closes #50457
2020-01-07 10:13:34 -06:00
James Rodewig ef467cc6f5 [DOCS] Remove unneeded redirects (#50476)
The docs/reference/redirects.asciidoc file stores a list of relocated or
deleted pages for the Elasticsearch Reference documentation.

This prunes several older redirects that are no longer needed and
don't require work to fix broken links in other repositories.
2019-12-26 08:29:28 -05:00
Nik Everett 01293ebad5
Fix docs typos (#50365) (#50464)
Fixes a few typos in the docs.

Co-authored-by: Xiang Dai <764524258@qq.com>
2019-12-23 12:38:17 -05:00
James Rodewig 726c35dfd0 [DOCS] Add identifier mapping tip to numeric and keyword datatype docs (#49933)
Users often mistakenly map numeric IDs to numeric datatypes. However,
this is often slow for the `term` and other term-level queries.

The "Tune for search speed" docs includes advice for mapping numeric
IDs to `keyword` fields. However, this tip is not included in the
`numeric` or `keyword` field datatype doc pages.

This rewords the tip in the "Tune for search speed" docs, relocates it
to the `numeric` field docs, and reuses it using tagged regions.
2019-12-17 09:34:32 -05:00
Ignacio Vera 3717c733ff
"CONTAINS" support for BKD-backed geo_shape and shape fields (#50141) (#50213)
Lucene 8.4 added support for "CONTAINS", therefore in this commit those
changes are integrated in Elasticsearch. This commit contains as well a
bug fix when querying with a geometry collection with "DISJOINT" relation.
2019-12-16 09:17:51 +01:00
Adrien Grand 87e72156ce
Upgrade to lucene 8.4.0-snapshot-662c455. (#50016) (#50039)
Lucene 8.4 is about to be released so we should check it doesn't cause problems
with Elasticsearch.
2019-12-10 18:04:58 +01:00
Ignacio Vera 326fe7566e
New Histogram field mapper that supports percentiles aggregations. (#48580) (#49683)
This commit adds  a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.
2019-11-28 15:06:26 +01:00
Jim Ferenczi d6445fae4b Add a cluster setting to disallow loading fielddata on _id field (#49166)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.

Closes #43599
2019-11-28 09:35:28 +01:00
Mayya Sharipova e3da60c23d Increase the number of vector dims to 2048 (#46895) 2019-11-20 07:47:33 -05:00
Julie Tibshirani 81a9d98a47 Remove the 'experimental' marking from vector fields. (#49120)
We wrapped up the API changes we wanted to make, and vector fields can now be
considered GA.
2019-11-18 12:42:46 -08:00
Antoine Garcia 288217e82b [Docs] Specify field types not supporting doc values (#49041)
The `string` type (with option `analyzed`) has been replaced by `text` after `6.0`, 
also the `annonated_text` field do not support doc values and should be mentioned.
2019-11-18 16:38:31 +01:00
Julie Tibshirani 37fa3fb4ff
Ensure parameters are updated when merging flattened mappings. (#48971) (#49014)
This PR makes the following two fixes around updating flattened fields:

* Make sure that the new value for ignore_above is immediately taken into
  affect. Previously we recorded the new value but did not use it when parsing
  documents.
* Allow depth_limit to be updated dynamically. It seems plausible that a user
  might want to tweak this setting as they encounter more data.
2019-11-12 21:50:39 -05:00
lgypro abddf51672 [Docs] Fix syntax error leading to wrong doc ID (#48554)
In order to index a document with id 2, the "&" should be replaced by "?"
2019-10-29 10:27:23 +01:00
Julie Tibshirani b2974e3816 Correct outdated information in _index docs. (#48436)
This PR makes the following updates:
* Update the supported query types to include `prefix` and `wildcard`.
* Specify that queries accept index aliases.
* Clarify that when querying on a remote index name, the separator `:` must be
  present.
2019-10-24 11:02:25 -07:00
Julie Tibshirani 2664cbd20b
Deprecate the sparse_vector field type. (#48368)
We have not seen much adoption of this experimental field type, and don't see a
clear use case as it's currently designed. This PR deprecates the field type in
7.x. It will be removed from 8.0 in a follow-up PR.
2019-10-23 16:35:03 -07:00
Christoph Büscher 3ea666d694 Clarify mapping types that support ignore_malformed (#48206)
The `ignore_malformed` setting only works on selected mapping types, otherwise
we throw an mapper_parsing_exception. We should add a list of all the mapping
types that support it, since the number of types not supporting it seems larger.

Closes #47166
2019-10-18 20:39:38 +02:00
Julie Tibshirani 4faba9cbbf Mention ip fields in the global ordinals docs. (#47045)
Although they do not support eager_global_ordinals, ip fields use global
ordinals for certain aggregations like 'terms'.

This commit also corrects a reference to the sampler aggregation.
2019-09-24 12:39:11 -07:00
James Rodewig 2831535cf9 [DOCS] Replace "// CONSOLE" comments with [source,console] (#46679) 2019-09-13 11:44:54 -04:00
Christoph Büscher aa0c586b73 Deprecate `_field_names` disabling (#42854)
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.

Closes #27239
2019-09-11 14:58:08 +02:00
Julie Tibshirani 10da998dfa Expand documentation around global ordinals. (#46517)
This commit updates the eager_global_ordinals documentation to give more
background on what global ordinals are and when they are used. The docs also now
mention that global ordinal loading may be expensive, and describes the cases
where in which loading them can be avoided.
2019-09-10 11:04:07 -07:00
Julie Tibshirani 1b9bd9a0a8 Use a literal block in the field data docs. (#46469)
Currently we use `quote`, which renders a bit strangely on the website.
2019-09-10 11:04:07 -07:00
James Rodewig f04573f8e8
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) (#46459) 2019-09-06 16:09:09 -04:00
Anton 6ae1ae9c9a [Docs] Fix typo in field-names-field.asciidoc (#46430) 2019-09-06 18:04:28 +02:00
James Rodewig c46c57d439
[DOCS] Change // CONSOLE comments to [source,console] (#46441) (#46451) 2019-09-06 11:31:13 -04:00
James Rodewig bb7bff5e30
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) (#46418) 2019-09-06 09:22:08 -04:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors and while computing the vector function. (#46103) 
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Nick Knize 647a8308c3
[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363)
* Introduce Spatial Plugin (#44389)

Introduce a skeleton Spatial plugin that holds new licensed features coming to 
Geo/Spatial land!

* [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923)

Refactor DeprecatedParameters specific to legacy geo_shape out of
AbstractGeometryFieldMapper.TypeParser#parse.

* [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980)

Add a new ShapeFieldMapper to the xpack spatial module for
indexing arbitrary cartesian geometries using a new field type called shape.
The indexing approach leverages lucene's new XYShape field type which is
backed by BKD in the same manner as LatLonShape but without the WGS84
latitude longitude restrictions. The new field mapper builds on and
extends the refactoring effort in AbstractGeometryFieldMapper and accepts
shapes in either GeoJSON or WKT format (both of which support non geospatial
geometries).

Tests are provided in the ShapeFieldMapperTest class in the same manner
as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests.
Documentation for how to use the new field type and what parameters are
accepted is included. The QueryBuilder for searching indexed shapes is
provided in a separate commit.

* [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108)

Add a new ShapeQueryBuilder to the xpack spatial module for
querying arbitrary Cartesian geometries indexed using the new shape field
type.

The query builder extends AbstractGeometryQueryBuilder and leverages the
ShapeQueryProcessor added in the previous field mapper commit.

Tests are provided in ShapeQueryTests in the same manner as
GeoShapeQueryTests and docs are updated to explain how the query works.
2019-08-14 16:35:10 -05:00
Julie Tibshirani 9318192578 Correct a code snippet in removal_of_types. (#45118)
Previously, the reindex examples did not include `_doc` as the destination type.
This would result in the reindex failing with the error "Rejecting mapping
update to [users] as the final mapping would have more than 1 type: [_doc,
user]".

Relates to #43100.
2019-08-06 14:09:21 -07:00
James Rodewig 8dd74dfe0b Rename "indices APIs" to "index APIs" (#44863) 2019-08-02 14:10:09 -04:00
David Turner 8516fb0f3b Expand docs on force-merge and global ordinals (#44684)
Some small clarifications about force-merging and global ordinals, particularly
that global ordinals are cheap on a single-segment index and how this relates
to frozen indices.

Fixes #41687
2019-07-23 07:33:33 +01:00
James Rodewig 8d7392de35 [DOCS] Make field datatype titles consistent (#43933)
* [DOCS] Make field datatype titles consistent

* Add titleabbrev for array
2019-07-22 08:52:23 -04:00
James Rodewig d46545f729 [DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:18:23 -04:00
Mayya Sharipova 3220709b0a Add positions info into term_vector doc (#44379) 2019-07-16 16:24:50 -04:00
Mark Walkom 4a5215d22a [DOCS] Update id-field.asciidoc (#42482)
Adding a note around the size limit for `_id`
2019-07-16 14:57:33 +02:00
Nikita Glashenko d187fcb9de Support WKT point conversion to geo_point type (#44107)
This PR adds support for parsing geo_point values from WKT POINT format.
Also, a few minor bugs in geo_point parsing were fixed.

Closes #41821
2019-07-12 14:31:07 -04:00