Commit Graph

8741 Commits

Author SHA1 Message Date
Simon Willnauer 76fab9d42a [TEST] consistently omit norms in test otherwise scoring will be dependent on merges etc. 2014-06-16 10:54:12 +02:00
Volker Fröhlich 06192686a2 [DOCS] Fixed typo in http.asciidoc 2014-06-16 10:42:34 +02:00
Shay Banon 4c579d6c8d Better default size for global index -> alias map
The alias -> (index -> alias) map, specifically the index -> alias one, typically holds just one entry, yet we eagerly initialize it to the number of indices. When there are many indices, each with many aliases, this is a very large overhead per alias...
closes #6504
2014-06-16 00:52:33 +02:00
Simon Willnauer 6d77a248fb [TEST] Stabilize test - wait for yellow to ensure all primaries are allocated 2014-06-14 21:52:15 +02:00
Adrien Grand 7bcabf9481 Fielddata: Don't expose hashes anymore.
Our field data currently exposes hashes of the bytes values. That takes roughly
4 bytes per unique value, which is definitely not negligible on high-cardinality
fields.

These hashes have been used for 3 different purposes:
 - term-based aggregations,
 - parent/child queries,
 - the percolator _id -> Query cache.

Both aggregations and parent/child queries have been moved to ordinals which
provide a greater speedup and lower memory usage. In the case of the percolator
it is used in conjunction with HashedBytesRef to not recompute the hash value
when resolving a query given its ID. However, removing this has no
impact on PercolatorStressBenchmark.

Close #6500
2014-06-13 23:05:02 +02:00
Adrien Grand 232394e3a8 Aggregations: Remove `ordinals` execution hint.
This was how terms aggregations managed to not be too slow initially by caching
reads into the terms dictionary using ordinals. However, this doesn't behave
nicely on high-cardinality fields since the reads into the terms dict are
random and this execution mode loads all unique terms into memory.

The `global_ordinals` execution mode (default since 1.2) is expected to be
better in all cases.

Close #6499
2014-06-13 23:02:20 +02:00
Adrien Grand fbd7c9aa5d Aggregations: Fix reducing of range aggregations.
Under some rare circumstances:
 - local transport,
 - the range aggregation has both a parent and a child aggregation,
 - the range aggregation got no documents on one shard or more and several
   documents on one shard or more.
the range aggregation could return incorrect counts and sub aggregations.

The root cause is that since the reduce happens in-place and since the range
aggregation uses the same instance for all sub-aggregations in case of an
empty bucket, sometimes non-empty buckets would have been reduced into this
shared instance.

In order to avoid similar bugs in the future, aggregations have been updated
to return a new instance when reducing instead of doing it in-place.

Close #6435
2014-06-13 23:01:43 +02:00
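The shared-instance problem described in the commit above can be illustrated with a minimal sketch. This is not the Elasticsearch code: `Bucket`, `SHARED_EMPTY`, `reduce_in_place` and `reduce_to_new` are hypothetical names, and the reduce is simplified to summing document counts.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical stand-in for one aggregation bucket on one shard."""
    doc_count: int = 0


# One instance reused for every empty bucket, as the commit message describes.
SHARED_EMPTY = Bucket()


def reduce_in_place(buckets):
    """Buggy pattern: accumulate into the first bucket, mutating it."""
    target = buckets[0]
    for other in buckets[1:]:
        target.doc_count += other.doc_count
    return target


def reduce_to_new(buckets):
    """Fixed pattern: build a fresh result and leave the inputs untouched."""
    return Bucket(doc_count=sum(b.doc_count for b in buckets))


# Shard 1 saw no documents (shared empty instance); shard 2 saw three.
shard_results = [SHARED_EMPTY, Bucket(doc_count=3)]

fresh = reduce_to_new(shard_results)
shared_after_new = SHARED_EMPTY.doc_count       # shared instance is unchanged

polluted = reduce_in_place(shard_results)
shared_after_in_place = SHARED_EMPTY.doc_count  # shared instance now holds shard 2's count
```

Once the in-place reduce has run, every other empty bucket that points at `SHARED_EMPTY` reports the wrong count, which is the kind of corruption that returning a new instance rules out.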
Martijn van Groningen 52be3748ff [TEST] Fix assert 2014-06-13 18:03:25 +02:00
javanna b9ffb2b0a5 Java API: Make sure afterBulk is always called in BulkProcessor after beforeBulk
Moved BulkProcessor tests from BulkTests to newly added BulkProcessorTests class.
Strengthened BulkProcessorTests by adding randomizations to existing tests and new tests for concurrent requests and exceptions.
Also made sure that afterBulk is called only once per request if concurrentRequests==0.

Closes #5038
2014-06-13 17:40:06 +02:00
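The guarantee in the commit above (every `beforeBulk` is followed by exactly one `afterBulk`, even when the request fails) can be sketched as follows. This is an illustration only, not the Java `BulkProcessor`: `RecordingListener`, `execute_bulk` and `failing_send` are hypothetical names.

```python
class RecordingListener:
    """Hypothetical listener that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def before_bulk(self, execution_id):
        self.events.append(("before", execution_id))

    def after_bulk(self, execution_id, failure=None):
        failure_name = type(failure).__name__ if failure else None
        self.events.append(("after", execution_id, failure_name))


def execute_bulk(execution_id, send, listener):
    """Run one bulk request, pairing before_bulk with exactly one after_bulk."""
    listener.before_bulk(execution_id)
    try:
        send()
    except Exception as exc:
        # Failure path: report the exception through the listener.
        listener.after_bulk(execution_id, failure=exc)
    else:
        # Success path: reached only when send() did not raise.
        listener.after_bulk(execution_id)


def failing_send():
    raise ConnectionError("simulated transport failure")


listener = RecordingListener()
execute_bulk(1, lambda: None, listener)
execute_bulk(2, failing_send, listener)
```

Using `try`/`except`/`else` rather than `finally` keeps the success and failure callbacks mutually exclusive, so neither path can fire `after_bulk` twice.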
Boaz Leskes 44097b358d [Test] set search request size testGeohashCellFilter
The default of 10 is not good enough as previously thought.
2014-06-13 16:11:13 +02:00
stephlag 13d910f016 Added missing comma in suggester example 2014-06-13 16:01:04 +02:00
Alexander Reelsen b252aacc67 Packaging: Remove java-6 directories from debian init script
Closes #6350
2014-06-13 13:48:22 +02:00
Adrien Grand 7a34702925 [DOCS] Clarify the trade-off of the `disk` doc values format. 2014-06-13 13:24:53 +02:00
Adrien Grand 01327d7136 Facets: deprecation.
Users are encouraged to move to the new aggregation framework that was
introduced in Elasticsearch 1.0.

Close #6485
2014-06-13 13:13:44 +02:00
Clinton Gormley eb6c9fe111 Docs: Linked to fielddata formats from core types
Closes #6489
2014-06-13 12:58:03 +02:00
Daniel Winterstein 99549cab00 Added link to "native" ES client 2014-06-13 12:49:48 +02:00
Martijn van Groningen 59ff05020f [TEST] Removed incorrect assertion (it is expected that the flush doesn't execute on all shard copies, because we don't wait for green status) 2014-06-13 12:25:43 +02:00
mikemccand 9620aa315e [TEST] Add FailureMarker to test listeners so -Dtests.failfast works 2014-06-13 06:04:33 -04:00
Martijn van Groningen 77e0429089 [TEST] Verify the flush response 2014-06-13 11:40:05 +02:00
Boaz Leskes 7fb16c783d Added caching support to geohash_filter
Caching is turned off by default.

Closes #6478
2014-06-12 22:19:34 +02:00
Clinton Gormley eabd4abf57 Rest-Spec: search was missing the track_scores param 2014-06-12 18:25:20 +02:00
Shay Banon 2330421816 Wait till node is part of cluster state for join process
When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today a join request can return while the master, since it's under load, has not yet added the node to its cluster state, so the node that joined starts fault detection against the master and fails since it's not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.
closes #6480
2014-06-12 18:15:51 +02:00
Lee Hinman 3a3f81d59b Enable DiskThresholdDecider by default, change default limits to 85/90%
Fixes #6200
Fixes #6201
2014-06-12 16:35:29 +02:00
Alex Ksikes 35cba50fce More Like This Query: creates only one MLT query per field for all queried items.
Previously, one MLT query per field was created for each item. One issue with
this method is that the maximum number of selected terms was equal to the
number of items times 'max_query_terms'. Instead, users should have direct control
over the maximum number of selected terms allowed, regardless of the number of
queried items.

Another issue related to the previous method is that it could lead to the
selection of rather uninteresting terms, merely because they were found in a
particular queried item. Instead, this new procedure enforces the selection of
interesting terms across ALL items, not within each item. This could lead to
search results where the best matching items share commonalities amongst the
best characteristics of all the items.

Closes #6404
2014-06-12 14:19:33 +02:00
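The difference between the two selection strategies in the commit above can be sketched roughly as below. The function names are hypothetical, and raw term frequency stands in for whatever scoring the real More Like This query applies; only the per-item versus pooled structure is taken from the commit message.

```python
from collections import Counter


def top_terms(tokens, max_query_terms):
    """Pick the most frequent terms; raw frequency is a stand-in for a real score."""
    return [term for term, _ in Counter(tokens).most_common(max_query_terms)]


def select_per_item(items, max_query_terms):
    """Old behaviour: one selection per item, so the total grows with the item count."""
    selected = []
    for tokens in items:
        selected.extend(top_terms(tokens, max_query_terms))
    return selected


def select_across_items(items, max_query_terms):
    """New behaviour: pool the terms of all items first, then select once."""
    pooled = [token for tokens in items for token in tokens]
    return top_terms(pooled, max_query_terms)


items = [
    ["lucene", "lucene", "shard", "typo"],
    ["lucene", "shard", "shard", "merge"],
    ["lucene", "shard", "alias", "alias"],
]

per_item = select_per_item(items, max_query_terms=2)
across = select_across_items(items, max_query_terms=2)
```

With three items and `max_query_terms=2`, the per-item variant selects six terms, including `alias`, which is prominent in only one item. The pooled variant stays within the limit of two and keeps the terms shared by all items.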
Clinton Gormley c41e63c2f9 Docs: Updated index-modules/store and setup/configuration
Explain how to set different index storage types, and
added the vm settings required to stop mmapfs from running
out of memory

Closes #6327
2014-06-12 13:56:06 +02:00
Clinton Gormley be2b8066d1 Docs: Updated CONTRIBUTING.md to provide more details about the PR process
Closes #6323
2014-06-12 13:07:04 +02:00
shadow000fire 1b45b216fd Update nested-query.asciidoc
Added note that fields inside a nested query must be fully qualified.
2014-06-12 12:48:23 +02:00
Simon Willnauer b8537d0e95 [BUILD] exclude target dir from tab validation 2014-06-12 12:31:18 +02:00
Luke Fender f9da5259bc [DOCS] Fixed typo in post-filter.asciidoc
Remove 'be' where it is not needed
2014-06-12 12:09:19 +02:00
Nik Everett 29c10ed1bb [BUILD] Generate source jars for tests
Closes #6125
2014-06-12 12:05:54 +02:00
Simon Willnauer 5575ba1a12 [BUILD] Check for tabs and nocommits in the code on validate
This commit adds checks for nocommit and tabs in the source code.
The task is executed during the validate phase and can be disabled via
`-Dvalidate.skip`
2014-06-12 11:11:23 +02:00
Igor Motov 56a264cf6d [DOCS] Snapshot/restore: add more information about snapshot and restore monitoring 2014-06-11 20:52:45 -04:00
Honza Král 681e9fa522 [TEST] remove accidental double escape in yaml tests 2014-06-12 01:03:36 +02:00
Clinton Gormley f546662e8f Docs: Hunspell tidied
Tidied some formatting
2014-06-11 21:49:02 +02:00
Israel Tsadok 32f87f8617 Highlighting: make HighlightQuery class public 2014-06-11 17:23:43 +02:00
Clinton Gormley 04dacaaf27 Docs: Use the "stemmer" token filter for the english analyzer, to be consistent 2014-06-11 13:47:07 +02:00
Clinton Gormley 8a94b71b75 Docs: Corrected the use of keyword_marker on the lang analyzers 2014-06-11 13:43:02 +02:00
Clinton Gormley 673ef3db3f The StemmerTokenFilter had a number of issues:
* `english` returned the slow snowball English stemmer
* `porter2` returned the snowball Porter stemmer (v1)
* `portuguese` was used twice, preventing the second version from working

Changes:

* `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards)
* `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards)
* `light_english` now returns the `kstem` stemmer (`kstem` still works)
* `portuguese_rslp` returns the PortugueseStemmer
* `dutch_kp` is a synonym for `kp`

Tests and docs updated

Fixes #6345
Fixes #6213
Fixes #6330
2014-06-11 12:30:16 +02:00
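The version-dependent name resolution listed in the commit above can be summarised as a small lookup. This is a sketch built only from the bullet points in the commit message: `resolve_stemmer` is a hypothetical function, the version is modelled as a tuple, and the return values are descriptive labels, not real class names.

```python
V_1_3_0 = (1, 3, 0)


def resolve_stemmer(name, index_created_version):
    """Map a stemmer name to a label for the stemmer it selects."""
    on_or_after_1_3 = index_created_version >= V_1_3_0
    if name == "english":
        # Older indices keep the slower snowball English stemmer.
        return "PorterStemmer" if on_or_after_1_3 else "snowball English"
    if name == "porter2":
        # Older indices keep the snowball Porter (v1) stemmer.
        return "snowball English" if on_or_after_1_3 else "snowball Porter"
    if name in ("light_english", "kstem"):
        return "kstem"
    if name in ("dutch_kp", "kp"):
        return "kp"
    if name == "portuguese_rslp":
        return "PortugueseStemmer"
    raise ValueError(f"stemmer name not covered by this sketch: {name}")


old_english = resolve_stemmer("english", (1, 2, 0))
new_english = resolve_stemmer("english", (1, 3, 0))
new_porter2 = resolve_stemmer("porter2", (1, 3, 0))
```

Keying the behaviour on the version the index was created with means existing indices keep producing the same tokens they were built with, while new indices get the corrected mapping.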
Clinton Gormley c25de57d5d Tests: Fixed CompletionSuggester test which relied on a bug 2014-06-10 21:34:03 +02:00
Clinton Gormley 0859f2e588 Tests: Java test runner can't handle array responses 2014-06-10 20:33:04 +02:00
Clinton Gormley eb3e0fb931 Tests: Fixed indices.stats types test when run with replicas 2014-06-10 19:10:18 +02:00
Clinton Gormley bb15def36e Stats: Bugfixes and enhancements to indices stats API
Bugs:
* "groups" and "types" were being ignored
* "completion_fields" as wildcards were not being resolved to fieldnames

Enhancements:
* Made "groups" and "types" support wildcards
* Added missing tests

Closes #6390
2014-06-10 17:35:49 +02:00
Alexander Reelsen d3dc158458 TransportClient: Improve logging, fix minor issue
In order to return more information to the client, in case a TransportClient
can not connect to the cluster, this commit adds logging and also returns the
configured nodes in the NoNodeAvailableException

Also a minor bug has been fixed, which propagated exceptions incorrectly, so that an
invalid request was actually tried on every node, if a regular connection failure
on the first node had happened.

Closes #6376
2014-06-10 13:15:59 +02:00
Martijn van Groningen 38be1e0dde Aggregations: if maxOrd is 0 then use noop collector
Before, the OrdinalsCollector was used, which led to an ArrayIndexOutOfBoundsException.

Closes #6413
2014-06-10 09:14:06 +02:00
Martijn van Groningen e15d2e2514 Fielddata: EmptyOrdinals#getMaxOrd() should return 0 instead of 1, since ordinals are zero based since #5871. 2014-06-10 09:13:27 +02:00
Martijn van Groningen 5e408f3d40 Change top_hits to be a metric aggregation (which can't have any sub aggs) instead of a bucket aggregation
Closes #6395
Closes #6434
2014-06-10 09:09:50 +02:00
Clinton Gormley e323e577e8 Docs: Fixed bad ref on cjk_width/bigram pages 2014-06-09 23:36:58 +02:00
Clinton Gormley 5e40868f44 Docs: Fixed a bad ref on lang analyzers page 2014-06-09 23:03:12 +02:00
Clinton Gormley 5c5c1da06c Docs: Fixed some errors on the language analyzers page 2014-06-09 22:51:28 +02:00
Clinton Gormley 585b0ef730 Docs: Added custom-analyzer equivalents of all the language analyzers 2014-06-09 22:41:25 +02:00