515 Commits

Author SHA1 Message Date
Mayya Sharipova b5d532f9e3
Vector field (#33022)
1. Dense vector

PUT dindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_vector": {
          "type": "dense_vector"
        },
        "my_text" : {
          "type" : "keyword"
        }
      }
    }
  }
}

PUT dindex/_doc/1
{
  "my_text" : "text1",
  "my_vector" : [ 0.5, 10, 6 ]
}

2. Sparse vector

PUT sindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_vector": {
          "type": "sparse_vector"
        },
        "my_text" : {
          "type" : "keyword"
        }
      }
    }
  }
}

PUT sindex/_doc/1
{
  "my_text" : "text1",
  "my_vector" : {"1": 0.5, "99": -0.5,  "5": 1}
}
2018-12-12 21:20:53 -05:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equal to `eq`)
or a lower bound of the total (in which case it is equal to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
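
A client that has to cope with both response shapes could normalize them as in the minimal sketch below. The helper name `total_hits` is hypothetical and not part of any Elasticsearch client; the sketch assumes the integer form (as returned with `rest_total_hits_as_int`) is always an exact count.

```python
def total_hits(response):
    """Return (value, relation) for either format of hits.total."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # New format: {"value": ..., "relation": "eq" | "gte"}
        return total["value"], total["relation"]
    # Old format (or rest_total_hits_as_int): a plain number, assumed exact
    return total, "eq"


new_style = {"hits": {"total": {"value": 5, "relation": "gte"}}}
old_style = {"hits": {"total": 3}}
print(total_hits(new_style))
print(total_hits(old_style))
```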

Relates #33028
2018-12-05 19:49:06 +01:00
Alan Woodward 73ceaad03a
Update to lucene-8.0.0-snapshot-c78429a554 (#36212)
Includes:

* A fix for a bug in Intervals.or() (https://issues.apache.org/jira/browse/LUCENE-8586)
* The ability to disable offset mangling in WordDelimiterGraphFilter
        (https://issues.apache.org/jira/browse/LUCENE-8509)
* BM25Similarity no longer multiplies scores by k1 + 1
2018-12-05 12:43:56 +00:00
Guido Lena Cota 89fae42833 (Minor) Fix some typos (#36180) 2018-12-04 11:10:30 +01:00
Peter Dyson 1f25a0bd31 [Docs] Add example for updating meta field (#35893) 2018-11-28 12:04:57 +01:00
Alan Woodward be8097f9ce
Improve docs for index_prefixes option (#35778)
This commit moves the documentation and examples for the `index_prefixes`
option on text fields to its own file, to bring it in line with other mapping 
parameters, and expands a bit on both.
2018-11-22 09:20:46 +00:00
Alan Woodward 26cc8ff8c3
Add pointer to the index-phrases option in shingle filter docs (#35771)
We should be discouraging the use of shingle filters and instead pointing users to the
index-phrases parameter on text fields.
2018-11-21 15:27:11 +00:00
Takuro Wada 7b2d547e8e [Docs] Delete inappropriate backtick (#35722) 2018-11-20 10:08:32 +01:00
Julie Tibshirani ec53288fc0
Remove include_type_name from the relevant APIs. (#35192)
We've decided that the bulk, delete, get, index, update, and search APIs should not
contain this request parameter, and we will instead accept both typed and typeless calls.
2018-11-06 14:33:48 -08:00
Julie Tibshirani 70da490f34
Remove some documentation that only makes sense with multiple types. (#35066)
* Remove a tip about ignore_above that only makes sense with multiple types.
* Remove a line from the percolator documentation that refers to multiple types.
2018-10-30 10:19:12 -07:00
Julie Tibshirani f854330e06
Make sure to use the type _doc in the REST documentation. (#34662)
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
2018-10-22 11:54:04 -07:00
Igor Motov 94bde37bcf
Geo: Don't flip longitude of envelopes crossing dateline (#34535)
When an envelope that crosses the dateline is specified as part of a
geo_shape query, its left and right points shouldn't be flipped when it
is parsed.

Fixes #34418
2018-10-19 13:53:54 -04:00
Daniel Mitterdorfer 02fb5aa4ec
Remove leftover doc about format being updatable
With this commit we remove a leftover in the docs about the `format`
field being updatable. This is not true since we removed support for
updates in #25285.

Closes #33986
Relates #25285
Relates #34006
2018-09-25 10:13:23 +02:00
markharwood 2fa09f062e
New plugin - Annotated_text field type (#30364)
New plugin for annotated_text field type.
Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text.
The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens.
The annotation token values are injected unchanged alongside the regular text tokens to provide a
form of additional indexed overlay useful in positional searches and highlighting.
Annotated_text fields do not support fielddata as we want to phase this out.
Also includes a new "annotated" highlighter type that retains annotations and merges in search
hits as additional annotation markup.

Closes #29467
2018-09-18 10:25:27 +01:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However, this optimization is not activated in this change; another issue has been opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Pablo Musa a88f8789a0
Highlight that index_phrases only works if no slop is used (#33303)
Highlight that `index_phrases` only works if no slop is used at query time.
2018-08-31 14:48:55 +02:00
Luca Cavanna 393eec1482
Set maxScore for empty TopDocs to NaN rather than 0 (#32938)
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.NaN` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with Lucene, which sets `maxScore` to `Float.NaN` when merging empty `TopDocs` (see `TopDocs#merge`).
2018-08-22 17:23:54 +02:00
Dimitrios Liappis abb4c183f1
Clarify ignore_above behavior with arrays of strings
Currently docs don't explain how `ignore_above` behaves with arrays of
strings.

Clarify how `ignore_above` applies for arrays of strings and
also note that all string(s) will still be visible in the
`_source` field.

Relates #33057
2018-08-22 18:18:30 +03:00
Julie Tibshirani 815c56b677
Fix an inaccuracy in the dynamic templates documentation. (#32890) 2018-08-20 11:00:11 -07:00
Julie Tibshirani 0f0068b91c
Ensure that field aliases cannot be used in multi-fields. (#32219) 2018-07-20 00:18:54 -07:00
Julie Tibshirani 15ff3da653
Add support for field aliases. (#32172)
* Add basic support for field aliases in index mappings. (#31287)
* Allow for aliases when fetching stored fields. (#31411)
* Add tests around accessing field aliases in scripts. (#31417)
* Add documentation around field aliases. (#31538)
* Add validation for field alias mappings. (#31518)
* Return both concrete fields and aliases in DocumentFieldMappers#getMapper. (#31671)
* Make sure that field-level security is enforced when using field aliases. (#31807)
* Add more comprehensive tests for field aliases in queries + aggregations. (#31565)
* Remove the deprecated method DocumentFieldMappers#getFieldMapper. (#32148)
2018-07-18 09:33:09 -07:00
Nik Everett 0522c6644d Docs: Remove duplicate test setup
The range docs had an introductory section that described how to set up
an index *and* a test setup section in `docs/build.gradle` that
duplicated that section. This is bad because these sections can (and do)
drift from one another. This change removes the setup in build.gradle
and marks the introductory snippet with `// TESTSETUP` so it is used on
all the snippets.
2018-06-28 10:59:35 -04:00
Peter Dyson e7a7b9689d [Docs] Mention ip_range datatypes on ip type page (#31416)
A link to the ip_range datatype page provides a way for newer users to know
it exists if they land directly on the ip datatype page first via a search.
2018-06-20 13:04:03 +02:00
Julie Tibshirani 3f5ebb862d
Clarify that IP range data can be specified in CIDR notation. (#31374) 2018-06-18 08:21:41 -07:00
David Turner 6ad7217656
Remove reference to multiple fields with one name (#31127)
If there is only one type per index then each field's name is unique.
2018-06-07 12:38:57 +01:00
Rafał Bigaj 749d39061a [Docs] Correct minor typos in templates.asciidoc (#31167) 2018-06-07 10:44:57 +02:00
Adrien Grand 458bca11bc
Add a `feature_vector` field. (#31102)
This field is similar to the `feature` field but is better suited to index
sparse feature vectors. A use-case for this field could be to record topics
associated with every document alongside a metric that quantifies how well
the topic is connected to this document, and then boost queries based on the
topics that the logged-in user is interested in.

Relates #27552
2018-06-07 10:05:37 +02:00
Colin Goodheart-Smithe d09d60858a
[DOCS] Clarify nested datatype introduction (#31055) 2018-06-06 09:32:45 +01:00
Christoph Büscher 1cee45e768
[Docs] Delete superfluous callouts (#31111)
Those callouts create rendering problems on the subsequent page.

Closes #30532
2018-06-06 09:53:14 +02:00
Adrien Grand 500094f5c8
Improve documentation of dynamic mappings. (#30952)
Closes #30939
2018-06-05 08:51:52 +02:00
Jim Ferenczi fa6b7266eb Remove wrong link in index phrases doc
Relates #30450
2018-06-04 12:13:55 +02:00
Colin Goodheart-Smithe 1efb1aae28
[DOCS] Rewords _field_names documentation (#31029)
* [DOCS] Rewords _field_names documentation

Corrects the language around when we write to `_field_names` and when you might want to disable it given that in recent versions it does not carry the indexing overhead it once did.

Relates to #30862

* Update wording following review
2018-06-04 09:17:11 +01:00
Alan Woodward 0427339ab0
Index phrases (#30450)
Specifying `index_phrases: true` on a text field mapping will add a subsidiary
[field]._index_phrase field, indexing two-term shingles from the parent field.
The parent analysis chain is re-used, wrapped with a FixedShingleFilter.

At query time, if a phrase match query is executed, the mapping will redirect it
to run against the subsidiary field.

This should trade faster phrase querying for a larger index and longer indexing
times.
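
As a rough illustration of what "two-term shingles" means here, the sketch below builds them from an already-analyzed token list. It is plain Python, not the Lucene FixedShingleFilter, and the space separator is an assumption made for readability.

```python
def two_term_shingles(tokens):
    """Pair each token with its successor, as a two-term shingle."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


# A phrase query for "quick brown fox" can then be answered by looking up
# whole shingles instead of checking the positions of individual terms.
print(two_term_shingles(["quick", "brown", "fox"]))
```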

Relates to #27049
2018-06-04 08:50:35 +01:00
Igor Motov 7376c35960
[DOCS] Make geoshape docs less memory hungry (#31014)
Reduces shape size and precision in geo shape mapper examples to reduce
amount of memory required to check docs.

Fixes #23836
2018-06-01 15:05:37 -04:00
Jim Ferenczi 0791f93dbd
Add an option to split keyword field on whitespace at query time (#30691)
This change adds an option named `split_queries_on_whitespace` to the `keyword`
field type. When set to `true`, full text queries (`match`, `multi_match`, `query_string`, ...) that target the field will split the input on whitespace to build the query terms. Defaults to `false`.
Closes #30393
2018-06-01 09:47:03 +02:00
Alan Woodward 67905c85a5
Rename index_prefix to index_prefixes (#30932)
This commit also adds index_prefixes tests to TextFieldMapperTests to ensure that cloning and wire-serialization work correctly
2018-05-30 08:32:31 +01:00
Adrien Grand 886db84ad2
Expose Lucene's FeatureField. (#30618)
Lucene has a new `FeatureField` which gives the ability to record numeric
features as term frequencies. Its main benefit is that it allows boosting
queries with the values of these features while efficiently skipping
non-competitive documents using block-max WAND and indexed impacts.
2018-05-23 08:55:21 +02:00
Jason Tedor 4a4e3d70d5
Default to one shard (#30539)
This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the split API in place they no longer need to resort only to
reindexing.

Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).
2018-05-14 12:22:35 -04:00
Sue Gallagher 09a6ba4fea
Change quad tree max levels to 29. Closes #21191 (#29663)
* [DOCS] Changed quad tree max levels to 29. Clears 21191

* Changed QuadPrefixTree max levels to 29 and added defaults. Closes #21191
2018-05-03 09:48:21 -07:00
wmellouli c8d8407012 [Docs] Add term query with normalizer example 2018-05-03 10:23:14 +02:00
Adrien Grand 5991ede9ef Fix docs of the `_ignored` meta field.
Relates #29658
2018-05-02 11:43:50 +02:00
Adrien Grand 7358946bda
Add a new `_ignored` meta field. (#29658)
This adds a new `_ignored` meta field which indexes and stores fields that have
been ignored at index time because of the `ignore_malformed` option. It makes
malformed documents easier to identify by using `exists` or `term(s)` queries
on the `_ignored` field.

Closes #29494
2018-05-02 10:47:02 +02:00
Adrien Grand 0a5a9a2086 Remove reference to `not_analyzed`.
Relates #30122.
2018-04-25 15:00:53 +02:00
Adrien Grand 6e62b481b4
Update plan for the removal of mapping types. (#29586)
8.x will no longer allow types in APIs and 7.x will issue deprecation warnings
when `include_type_name` is set to `false`.
2018-04-19 15:09:14 +02:00
Igor Motov 983d6c15a2
Add null_value support to geo_point type (#29451)
Adds support for null_value attribute to the geo_point types.

Closes #12998
2018-04-17 10:19:54 -04:00
Adrien Grand 3367948be6
Add documentation about the include_type_name option. (#29555)
This option will be useful in 7.x to prepare for upgrade to 8.0 which won't
know about types anymore.
2018-04-17 15:04:46 +02:00
Igor Motov e334baf6fc
Fix overflow error in parsing of long geohashes (#29418)
Fixes a possible overflow error that geohashes longer than 12 characters
can cause during parsing.
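
The 12-character boundary follows from simple arithmetic: each geohash character is a base-32 digit and therefore carries 5 bits. The sketch below assumes the overflow comes from packing the hash into a 64-bit long; the commit message does not spell out the exact encoding, and the helper names are hypothetical.

```python
BITS_PER_GEOHASH_CHAR = 5  # base-32 alphabet: 2**5 == 32
LONG_BITS = 64             # assumed width of the packed representation


def geohash_bits(geohash):
    """Number of bits needed to hold every character of the geohash."""
    return len(geohash) * BITS_PER_GEOHASH_CHAR


def fits_in_long(geohash):
    """True if the geohash can be packed into LONG_BITS bits."""
    return geohash_bits(geohash) <= LONG_BITS


print(geohash_bits("u" * 12), fits_in_long("u" * 12))
print(geohash_bits("u" * 13), fits_in_long("u" * 13))
```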

Fixes #24616
2018-04-16 12:37:38 -04:00
Adrien Grand 3a147b442a Fix docs build. 2018-04-11 13:48:53 +02:00
Adrien Grand 4918924fae
Remove legacy mapping code. (#29224)
Some features have been deprecated since `6.0`, like the `_parent` field or the
ability to have multiple types per index. This allows us to remove quite a bit
of code, which in turn will hopefully make it easier to proceed with the removal
of types.
2018-04-11 09:41:37 +02:00
Adrien Grand 569d0c0e89
Improve similarity integration. (#29187)
This improves the way similarities are plugged in in order to:
 - reject the classic similarity on 7.x indices and emit a deprecation
   warning otherwise
 - reject unknown parameters on 7.x indices and emit a deprecation
   warning otherwise

Even though this breaks the plugin API, I'd like to backport to 7.x so
that users can get deprecation warnings when they are doing something
that will become unsupported in the future.

Closes #23208
Closes #29035
2018-04-03 16:45:25 +02:00
David Turner 40d19532bc
Clarify expectations of false positives/negatives (#27964)
Today this part of the documentation just says that Geo queries are not 100% 
accurate, but in fact we can be more precise about which kinds of queries see
which kinds of error. This commit clarifies this point.
2018-04-02 10:03:42 +01:00
David Turner 3ca9310aee
Update docs on vertex ordering (#27963)
At time of writing, GeoJSON did not enforce a specific ordering of vertices in
a polygon, but it now does. We occasionally get reports of Elasticsearch
rejecting apparently-valid GeoJSON because of badly oriented polygons, and it's
helpful to be able to point at this bit of the documentation when responding.
2018-04-02 09:59:12 +01:00
Sue Gallagher 5518640d46
[DOCS] Added info on WGS-84. Closes issue #3590 (#29305) 2018-03-29 15:50:05 -07:00
Nicholas Knize d400a08788 [DOCS] Remove ignore_z_value parameter link
Removes invalid ignore_z_value parameter link in geo-point.asciidoc.
2018-03-23 11:07:24 -05:00
Nicholas Knize fede633563 Add Z value support to geo_shape
This enhancement adds Z value support (source only) to geo_shape fields. If vertices are provided with a third dimension, the third dimension is ignored for indexing but returned as part of source. Like before, any values beyond the third dimension are ignored.

closes #23747
2018-03-23 08:50:55 -05:00
Adrien Grand 8f9d2ee4e2
Reject updates to the `_default_` mapping. (#29165)
This will reject mapping updates to the `_default_` mapping with 7.x indices
and still emit a deprecation warning with 6.x indices.

Relates #15613
Supersedes #28248
2018-03-21 10:44:11 +01:00
Adrien Grand 0755ff425f
Clarify requirements of strict date formats. (#29090)
Closes #29014
2018-03-16 14:39:36 +01:00
Adrien Grand 695ec05160
Clarify that dates are always rendered as strings. (#29093)
Even in the case that the date was originally supplied as a long in the
JSON document.

Closes #26504
2018-03-16 14:34:33 +01:00
Cladis 3234fb1369 Grammar: "by geographically" → "geographically" (#28595) 2018-02-15 16:12:58 -08:00
Alex Moros Marco abe1e05ba4 [Docs] Add missing word in nested.asciidoc (#28507) 2018-02-15 14:56:02 +01:00
Christoph Büscher bc10334f7a
[Docs] Move callouts in range.asciidoc (#28264)
Currently the callouts for this section are below all the examples, making it
harder to relate them to the snippets. Instead they should be moved closer 
to the examples.
2018-02-02 11:00:07 +01:00
Adrien Grand 3f5716b9b8
Clarify that the `null_value` option doesn't modify the `_source` document. (#28374)
Closes #15959
2018-01-31 15:04:11 +01:00
Adrien Grand 9163c9b8d1
Clarify the defaults for `ignore_above`. (#28372)
Closes #27992
2018-01-31 15:03:20 +01:00
Alan Woodward 424ecb3c7d
Add ability to index prefixes on text fields (#28290)
This adds the ability to index term prefixes into a hidden subfield, enabling prefix queries to be run without multitermquery rewrites. The subfield reuses the analysis chain of its parent text field, appending an EdgeNGramTokenFilter. It can be configured with minimum and maximum ngram lengths. Query terms with lengths outside this min-max range fall back to using prefix queries against the parent text field.

The mapping looks like this:

"my_text_field" : {
"type" : "text",
"analyzer" : "english",
"index_prefix" : { "min_chars" : 1, "max_chars" : 10 }
}
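
The behaviour described above can be sketched in plain Python: which prefixes an edge n-gram filter would index for a term, and when a prefix query can use the subfield. This is an illustration only, not the Lucene EdgeNGramTokenFilter, and both function names are hypothetical.

```python
def edge_ngrams(term, min_chars, max_chars):
    """Prefixes of `term` whose length lies in [min_chars, max_chars]."""
    longest = min(len(term), max_chars)
    return [term[:length] for length in range(min_chars, longest + 1)]


def uses_prefix_subfield(prefix, min_chars, max_chars):
    """Prefixes outside the configured range fall back to the parent field."""
    return min_chars <= len(prefix) <= max_chars


print(edge_ngrams("quick", 1, 3))
print(uses_prefix_subfield("qu", 1, 10))
print(uses_prefix_subfield("a" * 11, 1, 10))
```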

Relates to #27049
2018-01-30 08:26:56 +00:00
David Kemp 531c58cf81 Documents applicability of term query to range type (#28166)
Closes #27030
2018-01-18 17:19:01 -05:00
Christoph Büscher d4ac0026fc
[Docs] Clarify numeric datatype ranges (#28240)
Since #25826 we reject infinite values for float, double and half_float
datatypes. This change adds this restriction to the documentation for the
supported datatypes.

Closes #27653
2018-01-16 15:53:28 +01:00
Martijn van Groningen cef7bd2079
docs: add best practises for wildcard queries inside percolator queries 2017-12-15 10:49:59 +01:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Ryan Ernst c51e48bec0
Correct docs for binary fields and their default for doc values (#27680)
closes #27240
2017-12-05 15:10:18 -08:00
Nicholas Knize 8bcf5393f2 [Geo] Add Well Known Text (WKT) Parsing Support to ShapeBuilders
This commit adds WKT support to Geo ShapeBuilders.

This supports the following format:

POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
BBOX (-10, 10, 10, -10)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))
GEOMETRYCOLLECTION (POINT (30 10), MULTIPOINT ((10 40), (40 30), (20 20), (30 10)))
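
To make the format concrete, here is a minimal parser sketch that handles only the two simplest shapes from the list above, POINT and LINESTRING. It is an illustration in Python with a hypothetical `parse_wkt` helper, not the ShapeBuilders parser added by this commit.

```python
import re

# Matches "POINT (...)" or "LINESTRING (...)" and captures the coordinate text.
_WKT_PATTERN = re.compile(r"\s*(POINT|LINESTRING)\s*\((.*)\)\s*", re.IGNORECASE)


def parse_wkt(text):
    """Parse a POINT or LINESTRING into (kind, coordinates)."""
    match = _WKT_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    kind = match.group(1).upper()
    # Coordinates are comma-separated; the numbers in each are space-separated.
    coords = [
        tuple(float(value) for value in pair.split())
        for pair in match.group(2).split(",")
    ]
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT must have exactly one coordinate")
        return kind, coords[0]
    return kind, coords


print(parse_wkt("POINT (30 10)"))
print(parse_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```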

closes #9120
2017-12-05 10:56:41 -06:00
Clinton Gormley 0bba2a8438 Update removal_of_types.asciidoc
Corrected  `include_in_type` to `include_type_name`
2017-12-05 10:44:48 +01:00
Christoph Büscher 0d11b9fe34
[Docs] Unify spelling of Elasticsearch (#27567)
Removes occurrences of "elasticsearch" or "ElasticSearch" in favour of
"Elasticsearch" where appropriate.
2017-11-29 09:44:25 +01:00
David Turner a165d1df40
Minor improvements to docs for numeric types (#27553)
* Caps
* Fix awkward wording that took multiple passes to parse
* Floating point _number_
* Something more descriptive about the `scaled_float` scaling factor.
2017-11-28 11:36:07 +00:00
Mayya Sharipova 57e4d10007
Limit the number of nested documents (#27405)
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceeds this
limit during the parsing of a document.

Closes #26962
2017-11-22 10:16:28 -05:00
Jim Ferenczi bf72858ce8
[Docs] Restore section about multi-level parent/child relation in parent-join (#27392)
This section was removed to hide this ability to new users.
This change restores the section and adds a warning regarding the expected performance.

Closes #27336
2017-11-16 11:29:16 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.
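
The candidate selection described above can be pictured with the sketch below. It is a simplification in plain Python rather than Lucene's CoveringQuery: each stored query is assumed to carry its extracted terms plus the stored clause count, and a query becomes a candidate only when at least that many of its terms occur in the document being percolated. The dictionary layout is hypothetical.

```python
def select_candidates(stored_queries, document_terms):
    """Return ids of stored queries whose required number of terms match."""
    present = set(document_terms)
    candidates = []
    for query in stored_queries:
        matched = len(present & set(query["terms"]))
        if matched >= query["minimum_should_match"]:
            candidates.append(query["id"])
    return candidates


stored = [
    {"id": "q1", "terms": ["quick", "fox"], "minimum_should_match": 2},
    {"id": "q2", "terms": ["lazy", "dog"], "minimum_should_match": 2},
]
# Only q1 has both of its terms in the document, so q2 is never validated.
print(select_candidates(stored, ["the", "quick", "brown", "fox", "dog"]))
```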

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the rarest clause (based on term
length) so that fewer candidate queries would be selected in the first
place. However, with this method there is still a very high chance that
candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Nicholas Knize 06ff92d237 Add ignore_malformed to geo_shape fields
This commit adds ignore_malformed support to geo_shape field types to skip malformed geoJson fields.

closes #23747
2017-11-09 17:59:05 -06:00
Holger Bartnick aa03fb72b7 [Docs] Correct link target for datatype murmur3 (#27143) 2017-10-30 09:31:55 +01:00
Martijn van Groningen f1e944a675
docs: describe parent/child performances 2017-10-26 11:49:13 +02:00
markwalkom 2b864156ca [Docs] Clarify mapping `index` option default (#27104) 2017-10-25 12:42:29 +02:00
David Turner 559fc5a4de Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
You need 4 bytes for characters outside the BMP, which includes many emoji and
a bunch of less-common writing characters too.
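
The byte counts are easy to check in Python; the sample characters below are chosen for illustration and do not come from the documentation being updated.

```python
def utf8_length(character):
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(character.encode("utf-8"))


samples = {
    "a": "ASCII letter",
    "\u00e9": "Latin small e with acute",
    "\u20ac": "euro sign (inside the BMP)",
    "\U0001F600": "emoji (outside the BMP)",
}
for character, description in samples.items():
    print(description, utf8_length(character))
```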
2017-10-24 09:50:47 +01:00
Adrien Grand 4e1ff8d086 Add documentation about disabling `_field_names`. (#26813)
This field has significant index-time overhead.

Closes #26779
2017-10-06 16:49:15 +02:00
Clinton Gormley eb3ead6561 Update type-field.asciidoc
Fixed asciidoc syntax on deprecated annotation
2017-10-06 11:57:27 +02:00
Christoph Büscher 6189c54c84 Reject the `index_options` parameter for numeric fields (#26668)
Numeric fields no longer support the index_options parameter. This changes the parameter
to be rejected in numeric field types after it was deprecated in 6.0.

Closes #21475
2017-09-25 23:43:14 +02:00
Michael Basnight f385e0cf26 Add bad_request to the rest-api-spec catch params (#26539)
This adds another request to the catch params. It also makes sure that
the generic request param does not allow 400 either.
2017-09-14 14:24:03 -05:00
Bernd 59600dfe2d [Docs] Correct typo in removal_of_types.asciidoc (#26646) 2017-09-14 15:34:07 +02:00
Daniel A. Ochoa 914416e9f4 [Docs] Update link in removal_of_types.asciidoc (#26614)
Fix link to [parent-child relationship].
2017-09-14 10:11:03 +02:00
Jim Ferenczi c709b8d6ac Fix incomplete sentences in parent-join docs (#26623)
* Fix incomplete sentences in parent-join docs

Closes #26590
2017-09-13 16:09:00 +02:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
Martijn van Groningen a4d5c6418e
percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text
`index.percolator.map_unmapped_fields_as_text` is a better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
2017-09-04 14:12:44 +02:00
Jim Ferenczi 86d97971a4 Remove the _all metadata field (#26356)
* Remove the _all metadata field

This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
2017-08-28 17:43:59 +02:00
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
To help the default extraction behavior, boost fields can be configured so that fields known not to be
selective enough can be ignored in favor of other fields, or so that clauses on specific fields forcefully take precedence over other clauses.
This can help select clauses for fields that don't match many percolator queries over other clauses, thus improving the performance of the percolate query.

For example, a status-like field is something that should be configured as an ignore field.
Queries on this field tend to match more documents, so if clauses for this field
get selected as the best clause, that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are unlikely to match.
2017-08-11 15:32:01 +02:00
Martijn van Groningen b88cfe2008
docs: Use stackexchange based example to make documentation easier to understand 2017-08-04 16:04:26 +02:00
Martijn van Groningen ec7ac32772
docs: document work around for the percolator if query time text analysis is expensive. 2017-07-28 15:04:15 +02:00
Martijn van Groningen 7c3735bdc4
percolator: Store the QueryBuilder's Writable representation instead of its XContent representation.
The Writable representation is less heavy to parse and that will benefit percolate performance and throughput.

The query builder's binary format now has the same bwc guarantees as the xcontent format.

Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
2017-07-28 12:24:10 +02:00
Martijn van Groningen 5cf56a846a
docs: Remove incorrect warning
Closes #25935
2017-07-28 10:53:47 +02:00
Colin Goodheart-Smithe f1f1725fcf [DOCS] improve explanation of dynamic mapping setting (#25829)
Closes #25825
2017-07-21 12:24:38 +01:00
Clinton Gormley febb4bf7bc Update removal_of_types.asciidoc
Fixed `include_in_type` -> `include_type_name`
2017-07-20 19:18:51 +02:00
Clinton Gormley f69decf509 NOCONSOLE -> NOTCONSOLE in removal-of-types 2017-07-19 14:06:04 +02:00
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today if we search across a large number of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices where filtering will exclude
all results outside a certain time range, e.g. `now-3d`. While the search can potentially hit
hundreds of shards, the majority of the shards might yield 0 results since there is no document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and its upcoming removal, a single dashboard in Kibana can potentially turn into searches hitting hundreds or thousands of shards, and that can easily cause search rejections even though most of the requests are very likely super cheap and only need a query rewrite to terminate early with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards is higher than the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the Kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further, it's completely transparent to the user and improves scalability of Elasticsearch in general on large clusters.
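
The idea can be sketched for the time-range case as follows. This is an illustration in Python, not the search phase added by the commit: each shard is assumed to expose the minimum and maximum timestamp it holds, and the dictionary layout and function names are hypothetical.

```python
PRE_FILTER_SHARD_SIZE = 128  # default threshold named in the commit message


def can_match(shard, query_min, query_max):
    """False only when the shard's value range cannot overlap the query range."""
    return shard["max"] >= query_min and shard["min"] <= query_max


def shards_to_query(shards, query_min, query_max, threshold=PRE_FILTER_SHARD_SIZE):
    """Skip the pre-filter for small searches; otherwise drop shards that cannot match."""
    if len(shards) <= threshold:
        return list(shards)
    return [shard for shard in shards if can_match(shard, query_min, query_max)]


shards = [
    {"name": "logs-1", "min": 0, "max": 10},
    {"name": "logs-2", "min": 11, "max": 20},
    {"name": "logs-3", "min": 21, "max": 30},
]
# A low threshold is passed here only so the pre-filter kicks in for three shards.
print([s["name"] for s in shards_to_query(shards, 15, 25, threshold=2)])
```

A shard that survives the filter may still return no hits (a false positive), but a shard that is dropped cannot contain a match, which is the guarantee the commit message relies on.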
2017-07-12 22:19:20 +02:00
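The pre-filter phase described in e81804cfa4 can be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch implementation: the `Shard` class, its `min_ts`/`max_ts` fields, and the `can_match` and `shards_to_query` helpers are hypothetical names, and the only values taken from the commit message are the `pre_filter_shard_size` idea and its default of 128.

```python
from dataclasses import dataclass

# Default threshold named in the commit message.
PRE_FILTER_SHARD_SIZE = 128


@dataclass(frozen=True)
class Shard:
    """Hypothetical shard summary: the smallest and largest timestamp it holds."""
    name: str
    min_ts: int
    max_ts: int


def can_match(shard, gte, lte):
    """Cheap check: could any document in this shard fall inside [gte, lte]?

    A True answer may be a false positive; a False answer means no match is possible.
    """
    return shard.max_ts >= gte and shard.min_ts <= lte


def shards_to_query(shards, gte, lte, threshold=PRE_FILTER_SHARD_SIZE):
    """Skip the pre-filter for small searches, otherwise drop shards that cannot match."""
    if len(shards) <= threshold:
        return list(shards)
    return [s for s in shards if can_match(s, gte, lte)]
```

With 200 time-based shards and a narrow date range, only the shards whose timestamp bounds overlap the range remain; a search over 100 shards skips the pre-filter and hits every shard.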
James Baiera 847378a43b Add another parent value option to join documentation (#25609)
Indexing a join field on a document requires a value of type "object" and two sub fields "name" 
and "parent". The "parent" field is only required on child documents, but the "name" field which 
denotes the name of the relation is always needed. Previously, only the short-hand version of the 
join field was documented. This adds documentation for the long-hand join field data, and 
explicitly points out that just specifying the name of the relation for the field value is a 
convenience shortcut.
2017-07-11 15:36:59 -04:00
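As an illustration of the short-hand and long-hand join values described in 847378a43b, the sketch below expands a bare relation name into the object form. The field name `my_join_field`, the relation names `question` and `answer`, and the helper `to_long_hand` are assumptions made for the example; only the `name` and `parent` sub fields come from the commit message.

```python
def to_long_hand(join_value):
    """Expand the short-hand join value (a bare relation name) into the object form."""
    if isinstance(join_value, str):
        return {"name": join_value}
    if "name" not in join_value:
        # The commit message states that "name" is always needed.
        raise ValueError('the "name" sub field is always required')
    return dict(join_value)


# Parent document using the short-hand: just the relation name.
parent_doc = {"my_join_field": "question"}

# Child document: "parent" is only required here.
child_doc = {"my_join_field": {"name": "answer", "parent": "1"}}
```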
Martijn van Groningen d0f9f425bd
parent/child: Removed ParentJoinFieldSubFetchPhase 2017-07-06 13:15:02 +02:00
Adrien Grand 26de905f1e Fix the documentation to state that the `_id` field is indexed. (#25540) 2017-07-05 16:09:31 +02:00
Clinton Gormley 0170e0e8d3 Remove usage of multi-types from the docs and added a page explaining type removal (#25543)
Closes #25401
2017-07-05 12:30:19 +02:00
Martijn van Groningen 9ce9c21b83
docs: added percolator script query limitation 2017-06-28 17:10:30 +02:00
Nathan Taylor 645bb9d0fb Docs: Removed duplicated line in mapping docs 2017-06-21 10:47:19 +02:00
Jim Ferenczi afada69ea9 [Docs] more fix for the parent-join docs 2017-06-16 12:49:16 +02:00
Jim Ferenczi 664193185e [Docs] Fix cross reference for parent-join field 2017-06-16 11:53:16 +02:00
Jim Ferenczi ccb3c9aae7 Add documentation for the new parent-join field (#25227)
* Add documentation for the new parent-join field

This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.

Relates #20257
2017-06-16 11:13:23 +02:00
Russ Cam f6821c41d8 Add half_float and scaled float (#22988)
to numeric datatypes
(cherry picked from commit 67ea06145a80d5ec52ba55d1f2e1e8287e1882b1)
2017-06-13 09:54:44 +10:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
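The key rename in a03b6c2fa5 can be pictured as a small migration helper. This is a sketch under the assumption that a script is represented as a plain dictionary; `migrate_script` and `DEPRECATED_TO_CURRENT` are hypothetical names, and the mapping of `inline` to `source` and `stored` to `id` is taken from the commit title.

```python
# Deprecated key -> replacement key, per the commit title.
DEPRECATED_TO_CURRENT = {"inline": "source", "stored": "id"}


def migrate_script(script):
    """Return a copy of the script dict with deprecated keys renamed."""
    migrated = {}
    for key, value in script.items():
        migrated[DEPRECATED_TO_CURRENT.get(key, key)] = value
    return migrated
```

Keys that are not deprecated, such as `lang` or `params`, pass through unchanged.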
Jim Ferenczi 8250aa4267 Remove the postings highlighter and make unified the default highlighter choice (#25028)
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815

It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remains the same as before: it checks `term_vectors` first, then `postings`, and ultimately it re-analyzes the text.
Finally, it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are a few features that the `unified` highlighter is not able to handle, which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features has been added to the `unified`.
2017-06-09 14:09:57 +02:00
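The fallback order described in 8250aa4267 (term vectors, then postings, then re-analysis) could be sketched like this. The function `offset_source` is hypothetical, and the assumption that term vectors are usable when `term_vector` is `with_positions_offsets` is drawn from the mapping parameters rather than from the commit message; the `index_options: offsets` condition is the one the commit mentions.

```python
def offset_source(field_mapping):
    """Pick where a unified-style highlighter would read offsets from.

    field_mapping is a plain dict of mapping parameters for one field.
    """
    # Assumption: term vectors need positions and offsets to be usable.
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "term_vectors"
    # Offsets stored in the postings list.
    if field_mapping.get("index_options") == "offsets":
        return "postings"
    # Last resort: analyze the text again at highlight time.
    return "reanalyze"
```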
Andrey Groshev e4fd8485ce Made the same length of opening and closing lines (#23583) 2017-06-09 00:50:43 -07:00
Jim Ferenczi ad905924ae update docs that claim that classic is the default similarity 2017-06-09 09:22:48 +02:00
Adrien Grand ebf806d38f Reorganize docs of global ordinals. (#24982)
Currently global ordinals are documented under `fielddata`. This change moves them to
their own file since they also work with doc values and fielddata is on the way
out.

Closes #23101
2017-06-01 16:47:44 +02:00
markharwood b7197f5e21 SignificantText aggregation - like significant_terms, but for text (#24432)
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true. Recommended for use with the `sampler` agg to limit the expense of tokenizing docs; takes an optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.

Closes #23674
2017-05-24 13:46:43 +01:00
Adrien Grand a72eaa8e0f Identify documents by their `_id`. (#24460)
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.

One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differentiate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
2017-05-09 16:33:52 +02:00
Nicholas Knize 0c4eb0a029 Add new ip_range field type
This commit adds support for indexing and searching a new ip_range field type. Both IPv4 and IPv6 formats are supported. Tests are updated and docs are added.
2017-05-05 09:43:42 -05:00
Nik Everett a01f846226 CONSOLEify a few more docs
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.

Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.

Relates to #18160
2017-05-04 21:01:14 -04:00
Adrien Grand 1be2800120 Only allow one type on 7.0 indices (#24317)
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.

Relates #15613
2017-04-27 08:43:20 +02:00
Danilo Akamine 0adaf9fb4c Drop `search_analyzer` parameter from keyword.asciidoc (#24221)
`search_analyzer` isn't supported by `keyword` fields so this removes
it from the documentation for them.
2017-04-25 12:49:50 -04:00
Nik Everett e429d66956 CONSOLEify some more docs
Relates to #18160
2017-04-24 16:08:19 -04:00
Fabien Baligand 4a45579506 token_count type : add an option to count tokens (fix #23227) (#24175)
Add option "enable_position_increments" with default value true.
If option is set to false, indexed value is the number of tokens
(not position increments count)
2017-04-21 00:53:28 +02:00
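The difference that `enable_position_increments` makes in 4a45579506 shows up when an analyzer drops tokens. In the sketch below, the token list is a hand-written assumption of what a stop filter might emit (each pair is a term and its position increment), and `token_count` is a hypothetical helper, not the field mapper itself.

```python
def token_count(tokens, enable_position_increments=True):
    """tokens: list of (term, position_increment) pairs emitted by an analyzer."""
    if enable_position_increments:
        # Default: count position increments, so gaps left by removed tokens are included.
        return sum(increment for _term, increment in tokens)
    # Option set to false: count only the tokens that were actually emitted.
    return len(tokens)


# Assumed analysis of "the quick fox" with a stop filter that removes "the":
# "quick" carries an increment of 2 to account for the removed token.
tokens = [("quick", 2), ("fox", 1)]
```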
Loek van Gool e11d892562 Update field-names-field.asciidoc (#24178)
fix typo in field name
2017-04-19 11:57:37 +02:00
Martijn van Groningen 3d9671a668
[PERCOLATOR] Allowing range queries with now ranges inside percolator queries.
Previously, `now` ranges were forbidden, because the percolator query itself could get cached and then the percolator queries with `now` ranges that should no longer match would incorrectly continue to match.
By disabling caching when the `percolator` is being used, the percolator can now correctly support range queries with `now` based ranges.

 I think this is the right tradeoff. The percolator query is likely to not be the same between search requests, and disabling range queries with `now` ranges really prevented people from using the percolator for their use cases.

 Also fixed an issue that existed in the percolator fieldmapper, it was unable to find forbidden queries inside `dismax` queries.

 Closes #23859
2017-04-07 08:44:43 +02:00
Lee Hinman b6b9ef8e26 [DOCS] Remove line about eager loading global ordinals
Fielddata can no longer be configured to be loaded eagerly (it only accepts
`true` and `false`), so this line is a little misleading because it talks about
a procedure we can no longer do.
2017-04-03 12:56:21 -06:00
Nik Everett 653f50973a CONSOLEify geo-shape docs
`CONSOLE`ify geo-shape type and geo-shape query docs.

Relates to #18160
2017-03-31 09:11:54 -04:00
Nik Everett 5f91241f57 CONSOLEify geo aggregation docs
Turns the top example in each of the geo aggregation docs into a working
example that can be opened in CONSOLE. Subsequent examples can all also
be opened in console and will work after you've run the first example.
All examples are tested as part of the build.
2017-03-30 21:28:52 -04:00
Ali Beyad 8359dd05c9 Adds boolean similarity to Elasticsearch (#23637)
This commit adds the boolean similarity scoring from Lucene to
Elasticsearch.  The boolean similarity provides a means to specify that
a field should not be scored with typical full-text ranking algorithms,
but rather just whether the query terms match the document or not.
Boolean similarity scores a query term equal to its query boost only.
Boolean similarity is available as a default similarity option and thus
a field can be specified to have boolean similarity by declaring in its
mapping:
    "similarity": "boolean"

Closes #6731
2017-03-28 10:17:23 -04:00
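A rough picture of the scoring described in 8359dd05c9: each matching query term contributes its boost and nothing else. The helper `boolean_score` is hypothetical, and summing the contributions of several terms is an assumption about how the per-term scores combine; term frequency and document length play no part, which is why the document is modelled as a set.

```python
def boolean_score(query_boosts, doc_terms):
    """Score a document under a boolean-style similarity.

    query_boosts: dict mapping query term -> boost.
    doc_terms: set of terms present in the field (frequency is ignored).
    """
    return sum(boost for term, boost in query_boosts.items() if term in doc_terms)
```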
Martijn van Groningen b116b8f0cb
[DOCS] Update the docs about the fact that global ordinals for _parent field are loaded eagerly instead of lazily by default.
Relates to #8053
2017-03-22 10:39:39 +01:00
Lee Hinman b3c27a7fdd Disallow include_in_all for 6.0+ indices
Since `_all` is now deprecated and cannot be set for new indices, we should also
disallow any field that has the `include_in_all` parameter set.

Resolves #22923
2017-02-07 19:31:51 -07:00
AlexNodex fb8bdbc57a Update typo in date (#22955)
your example has yyy and it should be yyyy
2017-02-03 13:16:17 +01:00
Clinton Gormley 19ce039d2d Update type-field.asciidoc
Wildcard type names are not supported
2017-01-27 17:50:28 +01:00
Yannick Welsch 881993de3a [Docs] Remove outdated info about enabling/disabling doc_values (#22694) 2017-01-19 17:33:40 +01:00
Daniel Mitterdorfer aece89d6a1 Make boolean conversion strict (#22200)
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
2017-01-19 07:59:18 +01:00
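The strict conversion described in aece89d6a1 amounts to the following. `parse_boolean` is a hypothetical stand-in written for illustration, and the error message is invented for the sketch.

```python
def parse_boolean(value):
    """Convert a string to a boolean with no leniency."""
    if value == "true":
        return True
    if value == "false":
        return False
    # Everything else, including "on", "1", "yes" and "True", is rejected.
    raise ValueError(f"cannot parse {value!r} as a boolean; only 'true' or 'false' are allowed")
```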
Scott Somerville 372812da98 Allow an index to be partitioned with custom routing (#22274)
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.

This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there are so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough

Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches break down for high cardinality fields.

A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.

Closes #21585
2017-01-18 08:51:23 +01:00
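How a routing value can map to a subset of shards, as described in 372812da98, can be sketched as below. The formula is assumed from the `_routing` field documentation rather than from the commit message, `shard_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash chosen so the example runs with the standard library.

```python
import zlib


def _hash(value):
    # Stand-in hash for illustration only; an assumption, not the hash Elasticsearch uses.
    return zlib.crc32(value.encode("utf-8"))


def shard_for(routing, doc_id, num_primary_shards, routing_partition_size):
    """Pick a shard: the routing value selects a window, the id selects a slot in it."""
    offset = _hash(doc_id) % routing_partition_size
    return (_hash(routing) + offset) % num_primary_shards
```

With a partition size of 3 on a 10-shard index, every document of one tenant lands on at most 3 shards; with a partition size of 1 the behaviour falls back to a single shard per routing value.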
Alex a0c83c4511 Minor doc changes to clarify mapping index param for string type (#22652)
* Grammatical correction

* Add note for legacy string mapping type

* Update truncate token filter to not mention the keyword tokenizer

The advice predates the existence of the keyword field

Closes #22650
2017-01-17 16:43:11 +01:00
Lee Hinman 7a18bb50fc Disable _all by default
This change disables the _all meta field by default.

Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.

_all can no longer be configured for indices created after 6.0.

Relates to #20925 and #21341
Resolves #19784
2017-01-11 16:47:13 -07:00
Nik Everett 75d5b3d9eb Fix parent_id example in docs
And fix some indentation I noticed while looking up the query.
2017-01-10 10:01:31 -05:00
Clinton Gormley cb7952e71d Docs: Parent field is no longer indexed and should use parent_id instead of term query
Closes #22517
2017-01-10 13:48:07 +01:00
Jason Veatch 20f90178fe Docs: Detail on false/strict dynamic mapping setting (#22451)
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/master/dynamic-mapping.html
2017-01-05 14:36:18 -05:00
Adrien Grand 3f805d68cb Add the ability to set an analyzer on keyword fields. (#21919)
This adds a new `normalizer` property to `keyword` fields that pre-processes the
field value prior to indexing, but without altering the `_source`. Note that
only the normalization components that work on a per-character basis are
applied, so for instance stemming filters will be ignored while lowercasing or
ascii folding will be applied.

Closes #18064
2016-12-30 09:36:10 +01:00
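The effect of a `normalizer` as described in 3f805d68cb can be approximated as follows: the indexed term is lowercased and folded, while the `_source` value is left alone. This is a sketch, not the analysis chain itself; `normalize` and `index_keyword` are hypothetical helpers, and stripping combining marks after Unicode decomposition is only an approximation of ASCII folding.

```python
import unicodedata


def normalize(value):
    """Per-character normalization: strip accents, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


def index_keyword(source_value):
    """Return what is stored versus what is indexed for a keyword value."""
    return {"_source": source_value, "indexed_term": normalize(source_value)}
```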
Adrien Grand 84edf36f11 Make `-0` compare less than `+0` consistently. (#22173)
Our `float`/`double` fields generally assume that `-0` compares less than `+0`,
except when bounds are exclusive: an exclusive lower bound on `-0` excludes
`+0` and an exclusive upper bound on `+0` excludes `-0`.

Closes #22167
2016-12-21 16:51:45 +01:00
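The consistent ordering that 84edf36f11 asks for can be illustrated with a comparison key that takes the sign bit into account. The helpers `_key`, `matches_gt` and `matches_lt` are hypothetical, and NaN handling is deliberately left out of this sketch.

```python
import math


def _key(x):
    # Tuples compare element by element: -0.0 == 0.0 on the first element,
    # so the sign bit in the second element breaks the tie.
    return (x, 0 if math.copysign(1.0, x) < 0 else 1)


def matches_gt(value, lower):
    """Exclusive lower bound where -0.0 sorts strictly before +0.0."""
    return _key(value) > _key(lower)


def matches_lt(value, upper):
    """Exclusive upper bound where -0.0 sorts strictly before +0.0."""
    return _key(value) < _key(upper)
```

Plain floating-point comparison treats the two zeros as equal, so `0.0 > -0.0` is false, whereas `matches_gt(0.0, -0.0)` is true under this ordering.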
Adrien Grand 9524c81af9 Document the `locale` option of the `date` field. (#22050)
This also adds another level of protection against using the default locale.
Relates to https://discuss.elastic.co/t/mapping-for-12h-date-format/68433/3.
2016-12-09 09:45:53 +01:00
Nicholas Knize af1ab68b64 Add RangeFieldMapper for numeric and date range types
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.

Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.

When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
2016-11-29 10:10:14 -06:00
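The relation parameter introduced in af1ab68b64 can be pictured with closed intervals. The sketch assumes that the relation is read from the point of view of the indexed range (WITHIN means the indexed range lies inside the query range); `matches` is a hypothetical helper, and DISJOINT is expressed as the negation of INTERSECTS, as the commit message suggests.

```python
def matches(field_range, query_range, relation):
    """Compare an indexed range with a query range; both are (low, high), inclusive."""
    f_lo, f_hi = field_range
    q_lo, q_hi = query_range
    intersects = f_lo <= q_hi and q_lo <= f_hi
    if relation == "INTERSECTS":
        return intersects
    if relation == "WITHIN":
        # Indexed range lies entirely inside the query range.
        return q_lo <= f_lo and f_hi <= q_hi
    if relation == "CONTAINS":
        # Indexed range entirely covers the query range.
        return f_lo <= q_lo and q_hi <= f_hi
    if relation == "DISJOINT":
        # Equivalent to a MUST_NOT of an INTERSECTS query.
        return not intersects
    raise ValueError(f"unknown relation: {relation}")
```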
Clinton Gormley 5555e85619 Document that the PUT mapping API with the _default_ type overwrites instead of merging
Closes #8215
2016-11-26 12:43:56 +01:00
Clinton Gormley a4e88bb64a Fixed bad asciidoc in boolean mapping docs 2016-11-15 17:50:23 +00:00
Lee Hinman 96122aa518 Be strict when parsing values searching for booleans (#21555)
This changes only the query parsing behavior to be strict when searching on
boolean values. We continue to accept the variety of values during index time,
but searches will only be parsed using `"true"` or `"false"`.

Resolves #21545
2016-11-15 10:36:57 -07:00
Alexander Lin 0219a211d3 Allows multiple patterns to be specified for index templates (#21009)
* Allows for an array of index template patterns to be provided to an
index template, and rename the field from 'template' to 'index_pattern'.

Closes #20690
2016-11-10 18:00:30 -05:00