Commit Graph

93 Commits

Author SHA1 Message Date
Alexander Reelsen b612cab96a Dates: More strict parsing of ISO dates
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.

Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.

In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.

Closes #6158
2015-07-07 09:34:37 +02:00
Martijn van Groningen c6ae6fc6d9 percolator: `getTime` -> `time` 2015-06-30 18:44:58 +02:00
Alexander Reelsen 23cf9af495 Dates: Be backwards compatible with pre 2.x indices
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.

Relates #10971
2015-06-25 17:21:29 +02:00
David Pursehouse b49e66c3a1 Replace references to ImmutableSettings with Settings
ImmutableSettings was merged into Settings in commit 4873070.

Change-Id: I06bd0150381d131593920c2328c46beacf49661f
2015-06-24 14:54:53 +09:00
Ryan Ernst 12e7cbe92b Mappings: Lockdown _timestamp
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0.  Users who need to
adjust these settings can use a date field.
2015-06-22 10:21:03 -07:00
Alexander Reelsen 38ddc8159c Dates: Allow for negative unix timestamps
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation

Fixes #11478
2015-06-22 11:56:31 +02:00
Adrien Grand ac7ce2b899 Rivers removal.
While we had initially planned to keep rivers around in 2.0 to ease migration,
keeping support for rivers is challenging as it conflicts with other important
changes that we want to bring to 2.0 like synchronous dynamic mappings updates.
Nothing impossible to fix, but it would increase the complexity of how we
deal with dynamic mappings updates and manage rivers, while handling dynamic
mappings updates correctly is important for resiliency and rivers are on the go.
So removing rivers in 2.0 may well be a better trade-off.
2015-06-10 09:22:09 +02:00
Alexander Reelsen 3bda78e43b ResourceWatcher: Rename settings to prevent watcher clash
The ResourceWatcher used settings prefixed `watcher.`, which
potentially could clash with the watcher plugin.

In order to prevent confusion, the settings have been renamed to
`resource.reload` prefixes.

This also uses the deprecation logging infrastructure introduced
in #11033 to log deprecated settings and their alternative at
startup.

Closes #11175
2015-06-09 10:02:49 +02:00
javanna 2ef0fcfd6a Plugins: one single (global) way to register custom query parsers
There are different ways to register custom query parsers through plugins, a couple of them work per index via index settings, which is probably even too flexible. There also three different ways to add a global custom query parser through either IndicesQueriesModule or IndicesQueriesRegistry. This commit consolidates the registration of custom query parsers via IndicesQueriesModule#addQuery(Class<? extends QueryParser>). The complexity of supporting parsers per index is not needed hence it got removed. Also the other ways of registering global custom parsers are dropped in favour of the one mentioned above.

Closes #11481
2015-06-08 12:19:53 +02:00
Adrien Grand 7c698146f5 Rest: Add all meta fields to the top level json document.
Some of our meta fields (such as _id, _version, ...) are returned as top-level
properties of the json document, while other properties (_timestamp, _routing,
...) are returned under `fields`. This commit makes all meta fields returned
as top-level properties.

So eg. `GET test/test/1?fields=_timestamp,foo` would now return

```json
{
   "_index": "test",
   "_type": "test",
   "_id": "1",
   "_version": 1,
   "_timestamp": 10000000,
   "found": true,
   "fields": {
     "foo": [ "bar" ]
   }
}
```

while it used to return

```json
{
   "_index": "test",
   "_type": "test",
   "_id": "1",
   "_version": 1,
   "found": true,
   "fields": {
     "_timestamp": 10000000,
     "foo": [ "bar" ]
   }
}
```
2015-06-04 23:42:17 +02:00
Lee Hinman 65f43970da Default to binding to loopback address
Binds to the address returned by `InetAddress.getLoopbackAddress()`.

Closes #11300
2015-06-04 10:25:49 -06:00
Martijn van Groningen 1cfb6a79f1 Parent/child: refactored _parent field mapper and parent/child queries
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.

Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.

All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.

It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.

Closes #6107
Closes #8134
2015-05-29 21:44:17 +02:00
javanna 6c81a8daf3 Internal: count api to become a shortcut to the search api
The count api used to have its own execution path, although it would do the same (up to bugs!) of the search api. This commit makes it a shortcut to the search api with size set to 0. The change is made in a backwards compatible manner, by leaving all of the java api code around too, given that you may not want to get back a whole SearchResponse when asking only for number of hits matching a query, also cause migrating from countResponse.getCount() to searchResponse.getHits().totalHits() doesn't look great from a user perspective. We can always decide to drop more code around the count api if we want to break backwards compatibility on the java api, making it a shortcut on the rest layer only.

Closes #9117
Closes #11198
2015-05-26 19:12:11 +02:00
Adrien Grand 461683ac58 Mappings: Remove the `compress`/`compress_threshold` options of the BinaryFieldMapper.
This option is broken currently since it potentially interprets an incoming
binary value as compressed while it just happens that the first bytes are the
same as the LZF header.
2015-05-22 14:20:42 +02:00
Igor Motov dd41c68741 Snapshot/Restore: fix FSRepository location configuration
Closes #11068
2015-05-20 22:14:31 -04:00
Adrien Grand 2c241e8a36 Mappings: Remove the `ignore_conflicts` option.
Mappings conflicts should not be ignored. If I read the history correctly, this
option was added when a mapping update to an existing field was considered a
conflict, even if the new mapping was exactly the same. Now that mapping updates
are smart enough to detect conflicting options, we don't need an option to
ignore conflicts.
2015-05-18 15:28:23 +02:00
javanna a843008b17 Highlighting: require_field_match set to true by default
The default `false` for `require_field_match` is a bit odd and confusing for users, given that field names get ignored by default and every field gets highlighted if it contains terms extracted out of the query, regardless of which fields were queries. Changed the default to `true`, it can always be changed per request.

Closes #10627
Closes #11067
2015-05-15 21:38:45 +02:00
javanna 46c521f7ec Highlighting: nuke XPostingsHighlighter
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
 make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.

One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
 things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite, term
s are pulled out of the automata for multi term queries).

Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- some custom es queries won't be supported anymore, meaning they won't be highlighted. The only one I found up until now is the phrase_prefix. Postings highlighter rewrites against an empty reader to avoid slow operations (like the ones that we were performing with the fork that we are removing here), thus the prefix will not be expanded to any term. What the postings highlighter does instead is pulling the automata out of multi term queries, but this is not supported at the moment with our MultiPhrasePrefixQuery.

Closes #10625
Closes #11077
2015-05-15 20:41:33 +02:00
Jun Ohtani 597c53a0bb Add migrationi note for AnalyzeRequest 2015-05-16 00:25:53 +09:00
Martijn van Groningen ece18f162e Removed `id_cache` from stats and cat apis.
Also removed the `id_cache` option from the clear cache api.

Closes #5269
2015-05-15 14:06:18 +02:00
Adrien Grand 630757906a Query DSL: Add `filter` clauses to `bool` queries.
These clauses filter the document space without affecting scoring and map to
Lucene's BooleanClause.Occur.FILTER. The `filtered` query is now deprecated and

```json
{
  "filtered": {
    "query": { //query },
    "filter": { //filter }
  }
}
```
should be replaced with
```json
{
  "bool": {
    "must": { //query },
    "filter": { //filter }
  }
}
```
2015-05-13 12:04:56 +02:00
Ryan Ernst f766b260ba Add tests for includeInObject backcompat 2015-05-12 23:11:15 -07:00
Ryan Ernst 565ffb16f1 Mappings: Remove ability to set meta fields inside documents
A few meta fields can currently be set within a document's source.
However, the recommended way to set meta fields like this is through
the api, and setting within the document can be a performance trap
(e.g. needing to find _id in order to route the document).

This change removes the ability to set meta fields within
a document source for 2.0+ indexes.

closes #11051
closes #11074
2015-05-12 23:09:03 -07:00
Ryan Ernst e7618b8528 Settings: Remove file based index templates
As a follow up to #10870, this removes support for
index templates on disk. It also removes a missed
place still allowing disk based mappings.

closes #11052
2015-05-11 12:51:22 -07:00
Martijn van Groningen acdd9a5dd9 parent/child: Removed the `top_children` query. 2015-05-10 16:30:19 +02:00
Lee Hinman c6747ded16 Truncate log messages at 10,000 characters 2015-05-08 10:10:44 -06:00
Adrien Grand a0af88e996 Query DSL: Remove filter parsers.
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
2015-05-07 20:14:34 +02:00
Alex Ksikes 4787cf701f More Like This: remove percent_terms_to_match
Users should use minimum_should_match instead.

Closes #11030
2015-05-07 14:21:29 +02:00
Alex Ksikes ec4f12f9ef More Like This: removal of the MLT API
Removes the More Like This API, users should now use the More Like This query.
The MLT API tests were converted to their query equivalent. Also some clean
ups in MLT tests.

Closes #10736
Closes #11003
2015-05-06 18:11:11 +02:00
Ryan Ernst 7a7bd6086a Mappings: Remove ability to disable _source field
Current features (eg. update API) and future features (eg. reindex API)
depend on _source. This change locks down the field so that
it can no longer be disabled. It also removes legacy settings
compress/compress_threshold.

closes #8142
closes #10915
2015-05-05 22:04:18 -07:00
Clinton Gormley e28ad853c7 Docs: Fixed bad asciidoc in migrate_2_0 2015-05-05 11:17:21 +02:00
Pascal Borreli af6d890ad5 Docs: Fixed typos
Closes #10973
2015-05-05 10:38:05 +02:00
Shay Banon 187d79b6df Centralize admin implementations and action execution
This change removes the multiple implementations of different admin interfaces and centralizes it with AbstractClient. It also makes sure *all* executions of actions now go through a single AbstractClient#execute method, taking care of copying headers and wrapping listener.
This also has the side benefit of removing all the code around differnet possible clients, and removes quite a bit of code (most of the + code is actually removal of generics and such).

This change also changes how TransportClient is constructed, requiring a Builder to create it, its a breaking change and its noted in the migration guide.

Yea another step towards simplifying the action infra and making it simpler...
2015-05-04 23:40:17 +02:00
Robert Muir 4b3672b7df Add migration note for hunspell dictionaries 2015-05-04 10:00:05 -04:00
Adrien Grand b72f27a410 Core: Cut over to the Lucene filter cache.
This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
 - custom cache keys (`_cache_key`) are unsupported
 - decisions are made internally and can't be overridden by users ('_cache`)
 - not only filters can be cached but also all queries that do not need scores
 - parent/child queries can now be cached, however cached entries are only
   valid for the current top-level reader so in practice it will likely only
   be used on read-only indices
 - the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
 - better stats: we already had ram usage and evictions, but now also hit count,
   miss count, lookup count, number of cached doc id sets and current number of
   doc id sets in the cache
 - dynamically changing the filter cache size is not supported anymore

Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).

Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
2015-05-04 09:02:15 +02:00
Ryan Ernst 4ef9f3ca63 Mappings: Remove file based default mappings
Using files that must be specified on each node is an anti-pattern
from the API based goal of ES. This change removes the ability
to specify the default mapping with a file on each node.

closes #10620
2015-04-30 13:50:35 -07:00
Adrien Grand e5be85d586 Aggs: Change the default `min_doc_count` to 0 on histograms.
The assumption is that gaps in histogram are generally undesirable, for instance
if you want to build a visualization from it. Additionally, we are building new
aggregations that require that there are no gaps to work correctly (eg.
derivatives).
2015-04-30 15:48:23 +02:00
Simon Willnauer 94d8b20611 Add multi data.path to migration guide
this commit removes the obsolete settings for distributors and updates
the documentation on multiple data.path. It also adds an explain to the
migration guide.

Relates to #9498
Closes #10770
2015-04-29 11:51:37 +02:00
Ryan Ernst bf09e58cb3 Mappings: Remove includes and excludes from _source
Regardless of the outcome of #8142, we should at least enforce that
when _source is enabled, it is sufficient to reindex. This change
removes the excludes and includes settings, since these modify
the source, causing us to lose the ability to reindex some fields.

closes #10814
2015-04-28 15:03:51 -07:00
javanna c914134355 Scripting: remove groovy sandbox
Groovy sandboxing was disabled by default from 1.4.3 on though since we found out that it could be worked around, so it makes little sense to keep it and maintain it.

Closes #10156
Closes #10480
2015-04-28 11:27:50 +02:00
Jun Ohtani 933edf7bcc Analysis: Fix wrong position number by analyze API
Add breaking chages comment to migrate docs
Fix the stopword included text using stopword filter
2015-04-28 17:44:41 +09:00
Simon Willnauer d164526d27 Remove `_shutdown` API
Thsi commit removes the `_shutdown` API entirely without any replacement.
Nodes should be managed from the operating system not via REST APIs
2015-04-27 17:19:36 +02:00
markharwood 1b8b993912 Query enhancement: Enable Lucene ranking behaviour for queries on numeric fields.
This changes the default ranking behaviour of single-term queries on numeric fields to use the usual Lucene TermQuery scoring logic rather than a constant-scoring wrapper.

Closes #10628
2015-04-27 09:42:55 +01:00
Ryan Ernst 1f5bdca8cc Mappings: Restrict murmur3 field type to sane options
Disabling doc values or trying to index hash values are not
correct uses of this the murmur3 field type, and just cause
problems.  This disallows changing doc values or index options
for 2.0+.

closes #10465
2015-04-23 21:48:42 -07:00
Igor Motov 60721b2a17 Snapshot/Restore: remove obsolete expand_wildcards_open and expand_wildcards_close options
In #6097 we made snapshot/restore index option consistent with other API. Now we can remove old style options from master.

Closes #10743
2015-04-23 13:29:24 -04:00
Adrien Grand d7abb12100 Replace deprecated filters with equivalent queries.
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
 - replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
   TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
   wrapped in a QueryWrapperFilter
 - replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
   QueryWrapperFilter
 - removes DocIdSets.isBroken: the new two-phase iteration API will now help
   execute slow filters efficiently
 - replaces FilterCachingPolicy with QueryCachingPolicy

Close #8960
2015-04-21 15:32:43 +02:00
Clinton Gormley ab3fa78ae0 Docs: Reverte migration docs mentioning parent removal from update request
Relates to #9612
2015-04-13 16:35:21 +02:00
Adrien Grand 5b3cc2f07c Search: deprecate the limit filter.
This is really a Collector instead of a filter. This commit deprecates the
`limit` filter, makes it a no-op and recommends to use the `terminate_after`
parameter instead that we introduced in the meantime.
2015-04-10 17:18:50 +02:00
Adrien Grand 919589b908 Queries: Remove fuzzy-like-this support.
The fuzzy-like-this query builds very expensive queries and only serves esoteric
use-cases.
2015-04-10 17:16:02 +02:00
Adrien Grand aecd9ac515 Aggregations: Speed up include/exclude in terms aggregations with regexps.
Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and even the bottleneck
if your query is selective.

This commit switches to Lucene regular expressions instead of Java (not exactly
the same syntax yet most existing regular expressions should keep working) and
uses the same logic as RegExpQuery to intersect the regular expression with the
terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made
things faster and the same request that took 750ms on master now takes 74ms with
this change.

Close #7526
2015-04-09 12:12:56 +02:00