Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.
This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
Indexing a join field on a document requires a value of type "object" and two sub fields "name"
and "parent". The "parent" field is only required on child documents, but the "name" field which
denotes the name of the relation is always needed. Previously, only the short-hand version of the
join field was documented. This adds documentation for the long-hand join field data, and
explicitly points out that just specifying the name of the relation for the field value is a
convenience shortcut.
* Add documentation for the new parent-join field
This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.
Relates #20257
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815
It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
Currently global ordinals are documented under `fielddata`. It moves them to
their own file since they also work with doc values and fielddata is on the way
out.
Closes#23101
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended used with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.
Closes#23674
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.
One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
This commit adds support for indexing and searching a new ip_range field type. Both IPv4 and IPv6 formats are supported. Tests are updated and docs are added.
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.
Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.
Relates to #18160
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
Add option "enable_position_increments" with default value true.
If option is set to false, indexed value is the number of tokens
(not position increments count)
Before now ranges where forbidden, because the percolator query itself could get cached and then the percolator queries with now ranges that should no longer match, incorrectly will continue to match.
By disabling caching when the `percolator` is being used, the percolator can now correctly support range queries with now based ranges.
I think this is the right tradeoff. The percolator query is likely to not be the same between search requests and disabling range queries with now ranges really disabled people using the percolator for their use cases.
Also fixed an issue that existed in the percolator fieldmapper, it was unable to find forbidden queries inside `dismax` queries.
Closes#23859
Fielddata can no longer be configured to be loaded eagerly (it only accepts
`true` and `false`), so this line is a little misleading because it talks about
a procedure we can no longer do.
Turns the top example in each of the geo aggregation docs into a working
example that can be opened in CONSOLE. Subsequent examples can all also
be opened in console and will work after you've run the first example.
All examples are tested as part of the build.
This commit adds the boolean similarity scoring from Lucene to
Elasticsearch. The boolean similarity provides a means to specify that
a field should not be scored with typical full-text ranking algorithms,
but rather just whether the query terms match the document or not.
Boolean similarity scores a query term equal to its query boost only.
Boolean similarity is available as a default similarity option and thus
a field can be specified to have boolean similarity by declaring in its
mapping:
"similarity": "boolean"
Closes#6731
Since `_all` is now deprecated and cannot be set for new indices, we should also
disallow any field that has the `include_in_all` parameter set.
Resolves#22923
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
This change makes it possible for custom routing values to go to a subset of shards rather than
just a single shard. This enables the ability to utilize the spatial locality that custom routing can
provide while mitigating the likelihood of ending up with an imbalanced cluster or suffering
from a hot shard.
This is ideal for large multi-tenant indices with custom routing that suffer from one or both of
the following:
- The big tenants cannot fit into a single shard or there is so many of them that they will likely
end up on the same shard
- Tenants often have a surge in write traffic and a single shard cannot process it fast enough
Beyond that, this should also be useful for use cases where most queries are done under the context
of a specific field (e.g. a category) since it gives a hint at how the data can be stored to minimize
the number of shards to check per query. While a similar solution can be achieved with multiple
concrete indices or aliases per value today, those approaches breakdown for high cardinality fields.
A partitioned index enforces that mappings have routing required, that the partition size does not
change when shrinking an index (the partitions will shrink proportionally), and rejects mappings
that have parent/child relationships.
Closes#21585
* Grammatical correction
* Add note for legacy string mapping type
* Update truncate token filter to not mention the keyword tokenizer
The advice predates the existence of the keyword field
Closes#22650
This change disables the _all meta field by default.
Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.
_all can no longer be configured for indices created after 6.0.
Relates to #20925 and #21341Resolves#19784
This adds a new `normalizer` property to `keyword` fields that pre-processes the
field value prior to indexing, but without altering the `_source`. Note that
only the normalization components that work on a per-character basis are
applied, so for instance stemming filters will be ignored while lowercasing or
ascii folding will be applied.
Closes#18064
Our `float`/`double` fields generally assume that `-0` compares less than `+0`,
except when bounds are exclusive: an exclusive lower bound on `-0` excludes
`+0` and an exclusive upper bound on `+0` excludes `-0`.
Closes#22167
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
This changes only the query parsing behavior to be strict when searching on
boolean values. We continue to accept the variety of values during index time,
but searches will only be parsed using `"true"` or `"false"`.
Resolves#21545
* Allows for an array of index template patterns to be provided to an
index template, and rename the field from 'template' to 'index_pattern'.
Closes#20690
With the cut over to LatLonPoint the geohash, geohash_precision, lat_lon, and geohash_prefix parameters have been removed. This commit fixes the doc build by removing the remaining dangling references to these removed parameters.
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
This changes Elasticsearch to automatically downgrade `text` and
`keyword` fields into appropriate `string` fields when changing the
mapping of indexes imported from 2.x. This allows users to use the
modern, documented syntax against 2.x indexes. It also makes it clear
that reindexing in order to recreate the index in 5.0 is required for
any long lived indexes. This change is useful for the times when you
can't (cluster is just starting, not stable enough for reindex) or
shouldn't (index will only live 90 days or something).
Fix field examples to make documents actually visible
This commit adds refresh calls to field examples an removes not working
`_routing` and `_field_names` script access.
Closes#20118
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
Adds `warnings` syntax to the yaml test that allows you to expect
a `Warning` header that looks like:
```
- do:
warnings:
- '[index] is deprecated'
- quotes are not required because yaml
- but this argument is always a list, never a single string
- no matter how many warnings you expect
get:
index: test
type: test
id: 1
```
These are accessible from the docs with:
```
// TEST[warning:some warning]
```
This should help to force you to update the docs if you deprecate
something. You *must* add the warnings marker to the docs or the build
will fail. While you are there you *should* update the docs to add
deprecation warnings visible in the rendered results.
This is a tentative to revive #15939 motivated by elastic/beats#1941.
Half-floats are a pretty bad option for storing percentages. They would likely
require 2 bytes all the time while they don't need more than one byte.
So this PR exposes a new `scaled_float` type that requires a `scaling_factor`
and internally indexes `value*scaling_factor` in a long field. Compared to the
original PR it exposes a lower-level API so that the trade-offs are clearer and
avoids any reference to fixed precision that might imply that this type is more
accurate (actually it is *less* accurate).
In addition to being more space-efficient for some use-cases that beats is
interested in, this is also faster that `half_float` unless we can improve the
efficiency of decoding half-float bits (which is currently done using software)
or until Java gets first-class support for half-floats.
If there are percolator queries containing `range` queries with ranges based on the current time then this can lead to incorrect results if the `percolate` query gets cached. These ranges are changing each time the `percolate` query gets executed and if this query gets cached then the results will be based on how the range was at the time when the `percolate` query got cached.
The ExtractQueryTermsService has been renamed `QueryAnalyzer` and now only deals with analyzing the query (extracting terms and deciding if the entire query is a verified match) . The `PercolatorFieldMapper` is responsible for adding the right fields based on the analysis the `QueryAnalyzer` has performed, because this is highly dependent on the field mappings. Also the `PercolatorFieldMapper` is responsible for creating the percolate query.
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
They have been implemented in https://issues.apache.org/jira/browse/LUCENE-7289.
Ranges are implemented so that the accuracy loss only occurs at index time,
which means that if you are searching for values between A and B, the query will
match exactly all documents whose value rounded to the closest half-float point
is between A and B.
`doc_values` for _type field are created but any attempt to load them throws an IAE.
This PR re-enables `doc_values` loading for _type, it also enables `fielddata` loading for indices created between 2.0 and 2.1 since doc_values were disabled during that period.
It also restores the old docs that gives example on how to sort or aggregate on _type field.
Remove the arbitrary limit on epoch_millis and epoch_seconds of 13 and 10
characters, respectively. Instead allow any character combination that can
be converted to a Java Long.
Update the docs to reflect this change.
* Docs: First pass at improving analyzer docs
I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.
I've also removed:
* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)
Next steps will be tokenizers, token filters, char filters
* Fixed two typos
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.
By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.
Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.
See docs/README.asciidoc for lots more.
Closes#12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
* Added an extra `field` parameter to the `percolator` query to indicate what percolator field should be used. This must be an existing field in the mapping of type `percolator`.
* The `.percolator` type is now forbidden. (just like any type that starts with a `.`)
This only applies for new indices created on 5.0 and later. Indices created on previous versions the .percolator type is still allowed to exist.
The new `percolator` field type isn't active in such indices and the `PercolatorQueryCache` knows how to load queries from these legacy indices.
The `PercolatorQueryBuilder` will not enforce that the `field` parameter is of type `percolator`.
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.
Notes about how the change works:
- Numeric mappers have been split into a legacy version that is essentially
the current mapper, and a new version that uses points, eg.
LegacyDateFieldMapper and DateFieldMapper.
- Since new and old fields have the same names, the decision about which one
to use is made based on the index creation version.
- If you try to force using a legacy field on a new index or a field that uses
points on an old index, you will get an exception.
- IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
and sortable).
- The internal MappedFieldType that is stored by the new mappers does not have
any of the points-related properties set. Instead, it keeps setting the index
options when parsing the `index` property of mappings and does
`if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
when parsing documents.
Known issues that won't fix:
- You can't use numeric fields in significant terms aggregations anymore since
this requires document frequencies, which points do not record.
- Term queries on numeric fields will now return constant scores instead of
giving better scores to the rare values.
Known issues that we could work around (in follow-up PRs, this one is too large
already):
- Range queries on `ip` addresses only work if both the lower and upper bounds
are inclusive (exclusive bounds are not exposed in Lucene). We could either
decide to implement it, or drop range support entirely and tell users to
query subnets using the CIDR notation instead.
- Since IP addresses now use a different representation for doc values,
aggregations will fail when running a terms aggregation on an ip field on a
list of indices that contains both pre-5.0 and 5.0 indices.
- The ip range aggregation does not work on the new ip field. We need to either
implement range aggs for SORTED_SET doc values or drop support for ip ranges
and tell users to use filters instead. #17700Closes#16751Closes#17007Closes#11513
The doc mentions match_path in one place but the correct syntax is path_match which is mentioned everywhere else. Using the wrong string leads to errors because the mapping becomes too greedy, and matches things it shouldn't.
This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable
Closes#11443
This commit updates the documentation for GeoPointField by removing all references to the coerce and doc_values parameters. DocValues are enabled in lucene GeoPointField by default (required for boundary filtering). The QueryBuilders are updated to automatically normalize points (ignoring the coerce parameter) for any index created onOrAfter version 2.2.
Warmers are now barely useful and will be removed in 3.0. Note that this only
removes the warmer API and query-based warmers. We still have warmers internally
for eg. global ordinals.
Close#15607
This commit adds the following:
* SpatialStrategy documentation to the geo-shape reference docs.
* Updates relation documentation to geo-shape-query reference docs.
* Updates GeoShapeFiledMapper to set points_only to true if TERM strategy is used (to be consistent with documentation)
Some users may already be familiar with column stores, so saying more explicitly
that doc values are a columnar representation of the data may help them better
and/or more quickly understand what doc values are about.
detect_noop is pretty cheap and noop updates compartively expensive so this
feels like a sensible default.
Also had to do some testing and documentation around how _ttl works with
detect_noop.
Closes#11282
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.
Also postition_offset_gaps less than 0 aren't allowed at all.
New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through
Closes#7268
This move the `murmur3` field to the `mapper-murmur3` plugin and fixes its
defaults so that values will not be indexed by default, as the only purpose
of this field is to speed up `cardinality` aggregations on high-cardinality
string fields, which only requires doc values.
I also removed the `rehash` option from the `cardinality` aggregation as it
doesn't bring much value (rehashing is cheap) and allowed to remove the
coupling between the `cardinality` aggregation and the `murmur3` field.
Close#12874
* Centralised plugin docs in docs/plugins/
* Moved integrations into same docs
* Moved community clients into the clients section of the docs
* Removed docs/community
Closes#11734Closes#11724Closes#11636Closes#11635Closes#11632Closes#11630Closes#12046Closes#12438Closes#12579
The `_index` field is now a completely virtual field thanks
to #12027. It is no longer necessary to index the actual value
of the index name.
closes#12329
ignore_above is used to guard against the lucene limitation
that a term cannot exceed 32766 bytes.
However, the implementation just used the character count, which
doesn't take into account the fact that some characters have
multi-byte utf-8 encodings.
This commit updates the docs to make this relationship clear.
Closes#11563
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.
Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.
In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.
Closes#6158
The work around for resolving `now` doesn't need to be used for aliases, becuase alias filters are parsed at search time. However it can't be removed, because the percolator relies on it.
Parent/child can be specified again in alias filters, this now works again because alias filters are parsed at search time. Parent/child will also use the late query parse work around, to make sure to do the final preparations when the search context is around. This allows the aliases api to validate the parent/child queries without failing because there is no search context.
Closes#10485
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.
Relates #10971
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0. Users who need to
adjust these settings can use a date field.
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation
Fixes#11478
This commit changes the date handling. First and foremost Elasticsearch
does not try to convert every date to a unix timestamp first and then
uses the configured date. This now allows for dates like `2015121212` to
be parsed correctly.
Instead it is now explicit by adding a `epoch_second` and `epoch_millis`
date format. This also means, that the default date format now is
`epoch_millis||dateOptionalTime` to remain backwards compatible.
Closes#5328
Relates #10971
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.
Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.
All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.
It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.
Closes#6107Closes#8134
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs
Closes#11091
This option is broken currently since it potentially interprets an incoming
binary value as compressed while it just happens that the first bytes are the
same as the LZF header.
Meta fields were locked down to not allow exotic options to the
underlying field types in #8143. This change fixes the docs
to no longer refer to the old settings.
closes#10879
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
Current features (eg. update API) and future features (eg. reindex API)
depend on _source. This change locks down the field so that
it can no longer be disabled. It also removes legacy settings
compress/compress_threshold.
closes#8142closes#10915
Using files that must be specified on each node is an anti-pattern
from the API based goal of ES. This change removes the ability
to specify the default mapping with a file on each node.
closes#10620
Regardless of the outcome of #8142, we should at least enforce that
when _source is enabled, it is sufficient to reindex. This change
removes the excludes and includes settings, since these modify
the source, causing us to lose the ability to reindex some fields.
closes#10814
If a user explicitly defined the tree_level or precision parameter in a geo_shape mapping their specification was always overridden by the default_error_pct parameter (even though our docs say this parameter is a 'hint'). This lead to unexpected accuracy problems in the results of a geo_shape filter. (example provided in issue #9691)
This simple patch fixes the unexpected behavior by setting the default distance_error_pct parameter to zero when the tree_level or precision parameters are provided by the user. Under the covers the quadtree will now use the tree level defined by the user. The docs will be updated to alert the user to exercise caution with these parameters. Specifying a precision of "1m" for an index using large complex shapes can quickly lead to OOM issues.
closes#9691
We had an undocumented parameter called `numeric_resolution` which allows to
configure how to deal with dates when provided as a number. The default is to
handle them as milliseconds, but you can also opt-on for eg. seconds.
Close#10072
Documentation states false as the default for "validate", "validate_lon", and "validate_lat" leading to confusion as described in issue #9539. This simple fix corrects the documentation and communicates that these fields will be deprecated and removed in upcoming versions.
closes#9539
As explained in elasticsearch/elasticsearch-mapper-attachments#101, we should have consistent documentation.
The best option is to link the documentation in elasticsearch guide to the most recent README in the plugin repo.
Closes#9756
_id and _routing now no longer support the 'path' setting on indexes
created with 2.0. Indexes created before 2.0 still support this
setting for backcompat.
closes#6730
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer. When setting
`search_analyzer`, `analyzer` must be set.
closes#9371
Related to #9049.
By default, the default value for `timestamp` is `now` which means the date the document was processed by the indexing chain.
You can now reject documents which not provide a `timestamp` value by setting `ignore_missing` to false (default to `true`):
```js
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"ignore_missing" : false
}
}
}
```
When you update the cluster to 1.5 or master, this index created with 1.4 we automatically migrate an index created with 1.4 to the 1.5 syntax.
Let say you have defined this in elasticsearch 1.4.x:
```js
DELETE test
PUT test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
PUT test/type/_mapping
{
"type" : {
"_timestamp" : {
"enabled" : true,
"default" : null
}
}
}
```
After migration, the mapping become:
```js
{
"test": {
"mappings": {
"type": {
"_timestamp": {
"enabled": true,
"store": false,
"ignore_missing": false
},
"properties": {}
}
}
}
}
```
Closes#8882.
This feature adds an optional orientation parameter to the GeoJSON document and geo_shape mapping enabling users to explicitly define how they want Elasticsearch to interpret vertex ordering. The default uses the right-hand rule (counterclockwise for outer ring, clockwise for inner ring) complying with OGC Simple Feature Access standards. The parameter can be explicitly specified for an entire index using the geo_shape mapping by adding "orientation":{"left"|"right"|"cw"|"ccw"|"clockwise"|"counterclockwise"} and/or overridden on each insert by adding the same parameter to the GeoJSON document.
closes#8764
The setting `mapping.date.round_ceil` (and the undocumented setting
`index.mapping.date.parse_upper_inclusive`) affect how date ranges using
`lte` are parsed. In #8556 the semantics of date rounding were
solidified, eliminating the need to have different parsing functions
whether the date is inclusive or exclusive.
This change removes these legacy settings and improves the tests
for the date math parser (now at 100% coverage!). It also removes the
unnecessary function `DateMathParser.parseTimeZone` for which
the existing `DateTimeZone.forID` handles all use cases.
Any user previously using these settings can refer to the changed
semantics and change their query accordingly. This is a breaking change
because even dates without datemath previously used the different
parsing functions depending on context.
closes#8598closes#8889
Storing `_timestamp` by default means that under the default configuration, you
would have all the information you need in order to reindex into a different
index.
Close#8139
It is strange to provide an example with `"store" : false` when talking about possibility of enabling the field to be stored.
Broke the line in the mapping in two lines for better readability.
More verbose sentence above the mapping.
Closes#7894
This documentation was dangerous because it felt like it was possible to gain
substantial performance by just switching the codec of the index.
However, non-default codecs are dangerous to use since they are not supported
in terms of backward compatibility, and most improvements that they bring have
been folded into the default codec anyway (for example, the default codec
"pulses" postings lists that contain a single document).
Index process fails when having `_timestamp` enabled and `path` option is set.
It fails with a `TimestampParsingException[failed to parse timestamp [null]]` message.
Reproduction:
```
DELETE test
PUT test
{
"mappings": {
"test": {
"_timestamp" : {
"enabled" : "yes",
"path" : "post_date"
}
}
}
}
PUT test/test/1
{
"foo": "bar"
}
```
You can define a default value for when timestamp is not provided
within the index request or in the `_source` document.
By default, the default value is `now` which means the date the document was processed by the indexing chain.
You can disable that default value by setting `default` to `null`. It means that `timestamp` is mandatory:
```
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"default" : null
}
}
}
```
If you don't provide any timestamp value, indexation will fail.
You can also set the default value to any date respecting timestamp format:
```
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"format" : "YYYY-MM-dd",
"default" : "1970-01-01"
}
}
}
```
If you don't provide any timestamp value, indexation will fail.
Closes#4718.
Closes#7036.
The `exists` and `missing` filters need to merge postings lists of all existing
terms, which can be very costly, especially on high-cardinality fields. This
commit indexes the field names of a document under `_field_names` and reuses it
to speed up the `exists` and `missing` filters.
This is only enabled for indices that are created on or after Elasticsearch
1.3.0.
Close#5659
Update `geo-shape-type.asciidoc` to include all `GeoShapeType`s supported by the `org.elasticsearch.common.geo.builders.ShapeBuilder`.
Changes include:
1. A tabular mapping of GeoJSON types to Elasticsearch types
2. Listing all types, with brief examples, for all support Elasticsearch types
3. Putting non-standard types to the bottom (really just moving Envelope to the bottom)
4. Linking to all GeoJSON types.
5. Adding whitespace around tightly nested arrays (particularly `multipolygon`) for readability
Change the default numeric precision_step to 16 for 64-bit types,
8 for 32-bit and 16-bit types. Disable precision_step for the 8-bit
byte type.
Closes#5905
The `field_value_factor` function uses the value of a field in the
document to influence the score.
A query that looks like:
{
"query": {
"function_score": {
"query": {"match": { "body": "foo" }},
"functions": [
{
"field_value_factor": {
"field": "popularity",
"factor": 1.1,
"modifier": "square"
}
}
],
"score_mode": "max",
"boost_mode": "sum"
}
}
}
Would have the score modified by:
square(1.1 * doc['popularity'].value)
Closes#5519
Currently, boosting on `copy_to` is misleading and does not work as originally specified in #4520. Instead of boosting just the terms from the origin field, it boosts the whole destination field. If two fields copy_to a third field, one with a boost of 2 and another with a boost of 3, all the terms in the third field end up with a boost of 6. This was not the intention.
The alternative: to store the boost in a payload for every term, results in poor performance and inflexibility. Instead, users should either (1) query the common field AND the field that requires boosting, or (2) the multi_match query will soon be able to perform term-centric cross-field matching that will allow per-field boosting at query time (coming in 1.1).
Removed unused misc.asciidoc file
Added plugins directory to directory layout
Fixed transport.tcp.connect_timeout value to match the code found in NetworkService.TcpSettings
Clarified that phrase query does not preserve order of terms
Clarified merge page
Added instructions on how to build documentation to docs/README
When set to false a new strict mode of parsing is employed which
a) does not permit numbers to be passed as JSON strings in quotes
b) rejects numbers with fractions that are passed to integer, short or long fields.
Closes#4117
When upgrading to ES 1.0 the existing mappings with a multi-field type automatically get replaced to a core field with the new `fields` option.
If a `multi_field` type-ed field doesn't have a main / default field, a default field will be chosen for the multi fields syntax. The new main field type
will be equal to the first `multi_field` fields' field or type string if no fields have been configured for the `multi_field` field and in both cases
the default index will not be indexed (`index=no` is set on the default field).
If a `multi_field` typed field has a default field, that field will replace the `multi_field` typed field.
Closes to #4521