10606 Commits

Author SHA1 Message Date
Adrien Grand ce11e0ee6d Filter cache: add a `_cache: auto` option and make it the default.
Up to now, all filters could be cached using the `_cache` flag that could be
set to `true` or `false` and the default was set depending on the type of the
`filter`. For instance, `script` filters are not cached by default while
`terms` are. For some filters the default is more complicated: for example, date
range filters are cached unless they use `now` in a non-rounded fashion.

This commit adds a 3rd option called `auto`, which becomes the default for
all filters. So for all filters a cache wrapper will be returned, and the
decision will be made at caching time, per-segment. Here is the default logic:
 - if there is already a cache entry for this filter in the current segment,
   then return the cache entry.
 - else if the doc id set cannot iterate (eg. script filter) then do not cache.
 - else if the doc id set is already cacheable and it has been used twice or
   more in the last 1000 filters then cache it.
 - else if the filter is costly (eg. multi-term) and has been used twice or more
   in the last 1000 filters then cache it.
 - else if the doc id set is not cacheable and it has been used 5 times or more
   in the last 1000 filters, then load it into a cacheable set and cache it.
 - else return the uncached set.
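
The default logic listed above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code from this commit: `DocIdSetInfo`, `UsageTracker` and `decide` are hypothetical names, and the sketch assumes the caller records each filter use before asking for a decision.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DocIdSetInfo:
    """Hypothetical summary of the doc id set a filter produced for one segment."""
    can_iterate: bool  # False for e.g. script filters
    cacheable: bool    # True if the set can be stored in the cache as-is
    costly: bool       # True for expensive filters such as multi-term ones


class UsageTracker:
    """Counts how often each filter appeared among the last `window` filters."""

    def __init__(self, window=1000):
        self.window = window
        self.recent = deque()
        self.counts = Counter()

    def record(self, filter_key):
        self.recent.append(filter_key)
        self.counts[filter_key] += 1
        if len(self.recent) > self.window:
            # Forget the oldest use so counts only cover the sliding window.
            oldest = self.recent.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def frequency(self, filter_key):
        return self.counts[filter_key]


def decide(filter_key, info, segment_cache, tracker):
    """Return 'hit', 'cache', 'load_and_cache' or 'no_cache' for one segment."""
    if filter_key in segment_cache:
        return "hit"
    if not info.can_iterate:
        return "no_cache"
    uses = tracker.frequency(filter_key)
    if info.cacheable and uses >= 2:
        return "cache"
    if info.costly and uses >= 2:
        return "cache"
    if not info.cacheable and uses >= 5:
        return "load_and_cache"
    return "no_cache"
```

The checks are evaluated in the same order as the list, so a filter whose set cannot iterate is never cached no matter how often it is reused.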

So for instance geo-distance filters and script filters are going to use this
new default and are not going to be cached because of their iterators.

Similarly, date range filters are going to use this default all the time, but
it is very unlikely that those that use `now` in a not rounded fashion will get
reused so in practice they won't be cached.

`terms`, `range`, ... filters produce cacheable doc id sets with good iterators
so they will be cached as soon as they have been used twice.

Filters that don't produce cacheable doc id sets such as the `term` filter will
need to be used 5 times before being cached. This ensures that we don't spend
CPU iterating over all documents matching such filters unless we have good
evidence of reuse.

One last interesting point about this change is that it also applies to compound
filters. So if you keep on repeating the same `bool` filter with the same
underlying clauses, it will be cached on its own while up to now it used to
never be cached by default.

`_cache: true` has been changed to only cache on large segments, in order to not
pollute the cache since small segments should not be the bottleneck anyway.
However `_cache: false` still has the same semantics.

Close #8449
2014-12-18 15:51:36 +01:00
Boaz Leskes b9db5b178c Internal: PlainTransportFuture should not set currentThread().interrupt()
We use PlainTransportFuture as a future for our transport calls. If someone blocks on it and it is interrupted, we throw an ElasticsearchIllegalStateException. We should not call Thread.currentThread().interrupt() in this case because we already communicate the interrupt through an exception.

Closes #9001
2014-12-18 11:57:12 +01:00
javanna d17db85794 [TEST] upgrade randomized runner to 2.1.11
2.1.11 contains the fix for this issue: https://github.com/carrotsearch/randomizedtesting/issues/179

Closes #8930
2014-12-18 10:40:05 +01:00
Adrien Grand 6d253aba08 Upgrade to lucene-5.0.0-snapshot-1646179. 2014-12-18 09:51:20 +01:00
Boaz Leskes ee7ed387d4 Test: use less shards in SimpleQueryTests 2014-12-18 09:02:51 +01:00
Michael McCandless 242e631e95 Core: ignore known idle threads by default in /_nodes/hot_threads
Add a new ignore_idle_threads boolean option (default true) to
/_nodes/hot_threads, to filter out threads in known idle places like
waiting on a socket select or on pulling the next task from an empty
queue.

Closes #8985

Closes #8908
2014-12-17 11:59:31 -05:00
Adrien Grand f1da788211 Aggregations: reduce histogram buckets on the fly using a priority queue.
This commit makes histogram reduction a bit cleaner by expecting buckets
returned from shards to be sorted by key and merging them on-the-fly on the
coordinating node using a priority queue.
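
As a rough illustration of the idea (not the implementation in this commit), merging per-shard bucket lists that are already sorted by key is a k-way merge driven by a priority queue. The function name `reduce_histograms` and the `(key, doc_count)` tuple layout are assumptions made for this sketch.

```python
import heapq


def reduce_histograms(shard_buckets):
    """Merge per-shard bucket lists, each sorted by key, into one sorted list.

    Each bucket is a (key, doc_count) tuple; buckets sharing a key are summed.
    """
    # One heap entry per non-empty shard: (current key, shard index, position).
    heap = [(buckets[0][0], i, 0) for i, buckets in enumerate(shard_buckets) if buckets]
    heapq.heapify(heap)
    merged = []
    while heap:
        key, shard, pos = heapq.heappop(heap)
        count = shard_buckets[shard][pos][1]
        if merged and merged[-1][0] == key:
            # Same key as the previous bucket: fold the counts together.
            merged[-1] = (key, merged[-1][1] + count)
        else:
            merged.append((key, count))
        if pos + 1 < len(shard_buckets[shard]):
            # Advance this shard's cursor to its next bucket.
            heapq.heappush(heap, (shard_buckets[shard][pos + 1][0], shard, pos + 1))
    return merged
```

Because the heap only ever holds one entry per shard, buckets can be reduced as they are consumed instead of collecting and sorting everything first.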

Close #8797
2014-12-17 16:46:16 +01:00
Alex Ksikes 86e1655e4b Term Vectors: support for version and version_type
This commit adds support for version and version_type to the Term Vectors API.
This could be useful in the following case whereby the user gets a document
and later wants to generate its TVs. With version, this would ensure that only
the TVs of that particular document are generated, and error out if the
document has been updated in between.

Closes #7480
2014-12-17 15:43:15 +01:00
Adrien Grand c2695d3d77 Revert "Aggregations: reduce histogram buckets on the fly using a priority queue."
This reverts commit 5694626f79.
2014-12-17 15:41:23 +01:00
Martijn Laarman bc76032fdd Documented the new terminate_after querystring option on search as implemented in #6885 2014-12-17 14:49:05 +01:00
Adrien Grand 5694626f79 Aggregations: reduce histogram buckets on the fly using a priority queue.
This commit makes histogram reduction a bit cleaner by expecting buckets
returned from shards to be sorted by key and merging them on-the-fly on the
coordinating node using a priority queue.

Close #8797
2014-12-17 14:21:00 +01:00
Yasir Bamarni 5059d6fe1c Update percolate.asciidoc
wrong type used in the -GET request

Closes #8942
2014-12-17 14:05:27 +01:00
Pablo Díaz-López adb1a5b43b Update getting-started.asciidoc
Missing -X flag at the curl template

Closes #8977
2014-12-17 14:03:38 +01:00
Peter Johnson a.k.a. insertcoffee 4b5e6b2de0 [docs] pedantry
Closes #8982
2014-12-17 13:46:39 +01:00
Joao Duarte d73f7c90aa doc: transport sniff only adds data nodes 2014-12-17 11:29:01 +00:00
Lee Hinman ddf83a90dd [TEST] Inject IndexSettings, not node Settings objects
Guice was injecting the wrong Settings object
2014-12-17 10:55:13 +01:00
Lee Hinman 853879a121 Revert "Add index.data_path setting"
This reverts commit b2ec19ab36.
2014-12-17 09:39:19 +01:00
Boaz Leskes 8f146f9ab0 Discovery: only retry join when other node is not (yet) a master
When a node tries to join a master, the master may not yet be ready to accept the join request. In such cases we retry sending the join request up to 3 times before going back to ping. To detect this, the current logic uses ExceptionsHelper.unwrapCause(t) to unwrap the incoming RemoteTransportException and inspect its source, looking for ElasticsearchIllegalStateException. However, a local ElasticsearchIllegalStateException can also be thrown when the join process should be cancelled (i.e., node shut down). In this case we shouldn't retry.

This commit adds an explicit NotMasterException to indicate the remote node is not a master. A similarly named exception (but meaning something else) in the master fault detection code was given a better name. Also clean up some other exceptions while at it.
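
The retry rule described above might look roughly like the following. This is a hedged sketch and not the project's Java code: the Python `NotMasterException` class is a stand-in for the exception named in this commit, `send_join` is a hypothetical callable, and treating "up to 3 times" as three attempts in total is an assumption.

```python
class NotMasterException(Exception):
    """Stand-in for the remote signal that the target node is not (yet) a master."""


def join_with_retry(send_join, max_attempts=3):
    """Return True once joined, or False if the caller should go back to pinging.

    Only NotMasterException triggers a retry. Any other exception (for example
    a local failure raised because the node is shutting down) propagates
    immediately, so the join is not retried in that case.
    """
    for _ in range(max_attempts):
        try:
            send_join()
            return True
        except NotMasterException:
            continue  # the remote node may become master shortly; try again
    return False
```

Catching one dedicated exception type keeps the retry decision independent of what other code paths happen to throw.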

Closes #8972
2014-12-16 23:12:46 +01:00
Lee Hinman 154e9d90cd [TEST] Mute IndicesCustomDataPathTests 2014-12-16 23:02:36 +01:00
Adrien Grand a50e3930c9 Terms aggs: Validate the aggregation order on unmapped terms too.
Close #8946
2014-12-16 18:50:37 +01:00
Lee Hinman b2ec19ab36 Add index.data_path setting
This allows specifying the path an index will be at.

`index.data_path` is specified in the settings when creating an index,
and can not be dynamically changed.

An example request would look like:

POST /myindex
{
  "settings": {
    "number_of_shards": 2,
    "data_path": "/tmp/myindex"
  }
}

And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1

Since this can be used to write data to arbitrary locations on disk, it
requires enabling the `node.enable_custom_paths` setting in
elasticsearch.yml on all nodes.
2014-12-16 18:25:21 +01:00
Nicholas Knize 18d56f154c Adding unit tests for clockwise non-OGC ordering
Adding unit tests to validate cw defined polys not-crossing and crossing the dateline, respectively
2014-12-16 10:54:51 -06:00
Nicholas Knize ac0e37449e Adding unit test for self intersecting polygons. Relevant to #7751 even/odd discussion
Updating documentation to describe polygon ambiguity and vertex ordering.
2014-12-16 10:54:39 -06:00
Nicholas Knize 437afd6f45 Adding dateline test with valid lat/lon pairs
Cleanup: Removing unnecessary logic checks
2014-12-16 10:54:28 -06:00
Nicholas Knize 85502ac40a Updating translation gate check to disregard order of hole vertices for non dateline crossing polys.
Updating comments and code readability

Correcting code formatting
2014-12-16 10:54:13 -06:00
Nicholas Knize e9e13d5cfc Computational geometry logic changes to support OGC standards
This commit adds the logic necessary for supporting polygon vertex ordering per OGC standards. Exterior rings will be treated in ccw (right-handed rule) and interior rings will be treated in cw (left-handed rule).  This feature change supports polygons that cross the dateline, and those that span the globe/map.  The unit tests have been updated and corrected to test various situations.  Greater test coverage will be provided in future commits.
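
For readers unfamiliar with ring orientation, a common way to tell ccw from cw rings is the sign of the shoelace (signed area) formula. The sketch below is a generic planar illustration, not the logic added by this commit: it treats coordinates as plain x/y pairs, does not handle dateline crossing, and the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula over a ring given as a list of (x, y) vertices.

    The result is positive for counter-clockwise rings and negative for
    clockwise rings, with x pointing right and y pointing up. A ring that
    repeats its first vertex at the end gives the same result, because the
    extra zero-length edge contributes nothing.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_ccw(ring):
    """True if the ring winds counter-clockwise (the exterior-ring convention above)."""
    return signed_area(ring) > 0
```

Under the convention described in the commit message, an exterior ring would be expected to satisfy `is_ccw(ring)` and a hole would not.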

Addresses #8672
2014-12-16 10:54:02 -06:00
Nicholas Knize 9466e16e24 Updating connect method to prevent duplicate edges 2014-12-16 10:53:46 -06:00
Nicholas Knize f8f92f816a [GEO] OGC compliant polygons fail with ambiguity
This feature branch implements OGC compliance for Polygon/Multi-polygon.  That is, vertex order for the exterior ring follows the right-hand rule (ccw) and all holes follow the left-hand rule (cw).  While GeoJSON imposes no restrictions, a user that wants to specify a complex poly across the dateline must do so in compliance with the OGC spec, otherwise a polygon that spans the globe will be assumed.

Reference issue #8672

Fix orientation of outer and inner ring for polygon with holes.  Updated unit tests.  Bug exists in boundary condition on negative side of dateline.
2014-12-16 10:53:34 -06:00
Michael McCandless 5910b17ece Add 1.4.3 2014-12-16 09:54:56 -05:00
mikemccand 8017f788e6 Add 1.3.8 version 2014-12-16 09:40:54 -05:00
Alex Ksikes dda33155d6 Indices API: Fix wrong search stats groups
This provides a fix to issue #7644. A new Stats object must be created, and
not a reference to the retrieved stats, before we can add stats to it.
Otherwise, we would keep on adding to the same object on subsequent calls to
IndicesStatsResponse#getPrimaries() or IndicesStatsResponse#getTotal().
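
The aliasing problem described above can be reproduced in miniature. The `Stats` class and both functions below are hypothetical stand-ins written for this sketch, not the actual Elasticsearch classes.

```python
class Stats:
    """Minimal stand-in for a per-group search stats holder."""

    def __init__(self, query_count=0):
        self.query_count = query_count

    def add(self, other):
        self.query_count += other.query_count


def total_buggy(shards):
    """Sums group stats but stores a reference to the first shard's object."""
    total = {}
    for groups in shards:
        for name, stats in groups.items():
            if name in total:
                total[name].add(stats)  # mutates the first shard's own Stats
            else:
                total[name] = stats     # reference, not a copy
    return total


def total_fixed(shards):
    """Sums group stats into a freshly created Stats object per group."""
    total = {}
    for groups in shards:
        for name, stats in groups.items():
            total.setdefault(name, Stats()).add(stats)
    return total
```

With the buggy version, each call inflates the first shard's counters, so repeated calls return ever larger totals. The fixed version leaves the inputs untouched.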

Closes #7644 and #8950
2014-12-16 14:31:41 +01:00
Lee Hinman 54f2eae4d8 [TEST] Remove "compressed" field data from numeric formats
The "compressed" format was removed, so this caused warnings in the log
like:

```
[WARN ][index.fielddata          ] [node_0] [test] failed to find format
[compressed] for field [test-num], will use default
```
2014-12-16 12:38:59 +01:00
Lee Hinman 63ee24982f [TEST] Call .cleanUp() on field data cache
Now that we do not automatically call .cleanUp() when clearing the field
data cache, we need to call it after the cache clear in
RandomExceptionCircuitBreakerTests
2014-12-16 12:38:47 +01:00
Simon Willnauer af64a02ed1 Add toString() to IndexShardGateway 2014-12-15 22:53:58 +01:00
Simon Willnauer a834cc0e0f Shutdown indices service last
We wait for shards to be closed in IndicesService for 30 seconds.
Yet if somebody holds on to a store reference, e.g. an open scroll request,
the 30-second timeout is hit and node shutdown takes very long. We should
release all other resources first before we shut down IndicesService.

Closes #8940
2014-12-15 22:43:37 +01:00
Ryan Ernst 37287284e6 Settings: Remove `mapping.date.round_ceil` setting for date math parsing
The setting `mapping.date.round_ceil` (and the undocumented setting
`index.mapping.date.parse_upper_inclusive`) affect how date ranges using
`lte` are parsed.  In #8556 the semantics of date rounding were
solidified, eliminating the need to have different parsing functions
whether the date is inclusive or exclusive.

This change removes these legacy settings and improves the tests
for the date math parser (now at 100% coverage!). It also removes the
unnecessary function `DateMathParser.parseTimeZone` for which
the existing `DateTimeZone.forID` handles all use cases.

Any user previously using these settings can refer to the changed
semantics and change their query accordingly. This is a breaking change
because even dates without datemath previously used the different
parsing functions depending on context.

closes #8598
closes #8889
2014-12-15 13:13:45 -08:00
Timothy Perisho ceafde41e9 Docs: typo on "frequent"
I replaced "high frequent terms" with "high frequency terms" and "low frequent terms" with "low frequency terms".

Alternatively, we could write, "highly frequent terms" and "minimally frequent terms" (or just "rare terms").

Closes #8962
2014-12-15 19:59:50 +01:00
Clinton Gormley fcb83055de Update repositories.asciidoc
Update formatting of PGP key
2014-12-15 18:04:17 +01:00
Lee Hinman 8fbf45ef2b [TEST] Make parent breaker check less strict
In cases of heavy contention, it's possible for more than 2 threads
to race to a circuit breaking exception.

Essentially this means that if we have 3 threads all trying to add 3 and
simultaneously cause a circuit breaking exception (due to retry), when
adjusting after circuit breaking we can "rewind" past what this test
expects the child breaker to be at.

This adds leeway into the check, where it's okay to be within
NUM_THREADS from the parentLimit, because each thread should only add 1
to the breaker at a time.
2014-12-15 17:06:21 +01:00
Robert Muir 01fc84dbb3 Wire utf-8 encoding, so unicode filenames work
Sets -Dfile.encoding=UTF-8 by default.

Closes #8847
2014-12-15 10:36:43 -05:00
Simon Willnauer 3bba45289e Remove unused code 2014-12-15 16:26:48 +01:00
Simon Willnauer 1247774ff1 Remove Gateway abstraction
We have only had a single gateway since ES 1.3. There is no need to keep all
these abstractions and nested packages. We can fold most of it into simpler
structures.
2014-12-15 15:53:02 +01:00
Lee Hinman a8fa650ee6 [CORE] Remove IndexEngine
IndexEngine was an abstraction where we had index-level engines (instead
of shard-level) that could store meta information about the index. It
was never actually used by Elasticsearch, and only there for plugins.

This removes it, because it is a confusing abstraction and not needed,
no plugins should be implementing their own IndexEngines.
2014-12-15 14:30:44 +01:00
spapin ad747ba67f Docs: fix a typo in cluster stats documentation example
Closes #8898
2014-12-15 14:14:38 +01:00
Ayush 23dbecf3e7 Update percolate.asciidoc
Updating the `associated` spelling

Closes #8907
2014-12-15 14:12:03 +01:00
Simone Scarduzio fff483d612 Docs: Adding REST ACL plugin
Closes #8925
2014-12-15 14:09:22 +01:00
Boaz Leskes d62bf5f67f Discovery: concurrent node failures can cause unneeded cluster state publishing
When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes #8804
Closes #8933
2014-12-15 14:01:25 +01:00
Lee Hinman 9b18c44b67 Default _cat APIs to verbose
`?v=false` can be used if the headers are not desired.

Resolves #8922

Fixes #8927
2014-12-15 12:51:59 +00:00
Simon Willnauer e47b753617 [SEARCH] close active contexts on SearchService#close()
When we close a node, all pending / active search requests need to be
cleared; otherwise a node will wait up to 30 sec for shutdown since there
could be open scroll requests. This behavior was introduced in 1.5, so
versions <= 1.4.x are not affected.

Closes #8940
2014-12-15 09:41:31 +01:00
Boaz Leskes a63a055f63 Test: missing {} from log command in indexRandom 2014-12-13 17:24:46 +01:00