Commit Graph

7605 Commits

Author SHA1 Message Date
uboness 7d6ad8d91c Added extended_bounds support for date_/histogram aggs
By default the date_/histogram returns all the buckets within the range of the data itself, that is, the documents with the smallest values (on which with histogram) will determine the min bucket (the bucket with the smallest key) and the documents with the highest values will determine the max bucket (the bucket with the highest key). Often, when when requesting empty buckets (min_doc_count : 0), this causes a confusion, specifically, when the data is also filtered.

To understand why, let's look at an example:

Lets say the you're filtering your request to get all docs from the last month, and in the date_histogram aggs you'd like to slice the data per day. You also specify min_doc_count:0 so that you'd still get empty buckets for those days to which no document belongs. By default, if the first document that fall in this last month also happen to fall on the first day of the **second week** of the month, the date_histogram will **not** return empty buckets for all those days prior to that second week. The reason for that is that by default the histogram aggregations only start building buckets when they encounter documents (hence, missing on all the days of the first week in our example).

With extended_bounds, you now can "force" the histogram aggregations to start building buckets on a specific min values and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if the min_doc_count is greater than 0).

Note that (as the name suggest) extended_bounds is **not** filtering buckets. Meaning, if the min bounds is higher than the values extracted from the documents, the documents will still dictate what the min bucket will be (and the same goes to the extended_bounds.max and the max bucket). For filtering buckets, one should nest the histogram agg under a range filter agg with the appropriate min/max.

Closes #5224
2014-03-20 14:48:27 +01:00
uboness 07af39364e Changed the DateHistogram.Bucket to return the date key in UTC timezone
Closes #5477
2014-03-20 13:30:08 +01:00
Shay Banon 6a2bb9a35e clean the query parse context after usage
that will make sure we don't have any parsers lying around that are no longer used
2014-03-20 13:11:15 +01:00
Simon Willnauer 047119d92e Add BoolFilterBuilder#hasClauses to be consitent with BoolQueryBuilder
Closes #5472
2014-03-20 13:05:04 +01:00
Simon Willnauer bd003fe5b6 [TEST] add @Slow annotations to bad apples 2014-03-20 12:58:28 +01:00
Simon Willnauer 876b2592ac [Build] Skip topN hints if tests are skipped 2014-03-20 12:45:35 +01:00
Clinton Gormley 1fff379742 [DOCS] Documented the fact that binary fields are not stored by default 2014-03-20 12:43:43 +01:00
Simon Willnauer 7c494461f3 [BUILD] Add top execution hints to test phase
Added a ant task that prints the Top N most expensive tests after
each test run.
2014-03-20 12:29:48 +01:00
Britta Weber be3c5b44e0 category type should be called "category" instead of "field" in context suggester
Change according to documentation.
2014-03-20 12:25:45 +01:00
Florian Schilling c0a092aa92 [Doc] Updated docs for distance scripting
Updated docs for distance scripting and
added missing geohash distance functions
Closes #5397
2014-03-20 12:18:25 +01:00
markharwood 0cd184ef3c Fix for Jenkins advice on SignificantTermsAggregatorFactory, changing a small switch statement to an if-else 2014-03-20 11:02:06 +00:00
David Smiley 644fdfc4aa Upgrade to Spatial4j 0.4.1 and JTS 1.13
Fixes #5279
2014-03-20 11:17:51 +01:00
Simon Willnauer 34077e3121 [TEST] remove time upperbound from test - only for testing 2014-03-20 11:12:35 +01:00
Clinton Gormley 96655d2505 PUT /_aliases should accept a numeric routing value
Also added REST tests for setting index/search/routing
via the PUT /_aliases endpoint

Fixes #5465
2014-03-20 10:38:32 +01:00
Boaz Leskes a3f57c176e [Test] IndicesLifecycleListenerTests - added busy waiting for expected results (and improved error message)
The test currently uses ensureGreen to guaranty all shard allocations has happened. This only guaranties it from the cluster perspective and in some cases the nodes are not fast enough to implement the changes (which is what the test is about).
2014-03-20 10:34:44 +01:00
Simon Willnauer adfa82b2ed Allow iteration over MultiGetRequest#Item instances
Closes #3061
2014-03-20 10:24:12 +01:00
Adrien Grand ecdcc2df92 Fix cardinality memory-usage considerations.
Default precision was computed based on the number of MULTI_BUCKET parents
instead of PER_BUCKET.

The ordinals-based execution mode was almost always used although ordinals
might have non-negligible memory usage compared to the counters.

Close #5452
2014-03-20 10:02:40 +01:00
Martijn van Groningen 85c3c6fe62 [TEST] Disable wildcard query in testDfsSearchType for now 2014-03-20 14:56:25 +07:00
Andrew Raines 37b59d196b Remove shard-level info from _cat/segments and add pri/rep and ip address to segments
Also fix typo

Closes #4711
2014-03-19 22:56:06 -05:00
colings86 a7b3fdf3bf Added segments action to _cat API
Currently displays the same information as the _segments API
2014-03-19 22:56:06 -05:00
Florian Schilling 689fd15d78 Add exceptions to `GeoPointFieldMapper` when parsing `geo_point` object
* moved `geo_point` parsing to GeoUtils
* cleaned up `gzipped.json` for bulktest
* merged `GeoPointFieldMapper` and `GeoPoint` parsing methods

Closes #5390
2014-03-19 17:30:49 +01:00
markharwood 12d1bf8485 Significant_terms agg only creates term frequency cache when necessary and uses new TermsEnum wrapper to cache frequencies. Long and String-based aggs no longer need to pass an IndexReader as parameter when looking up frequencies of terms.
Closes #5459
2014-03-19 15:49:23 +00:00
Shay Banon d24600830b Use BytesReference to write to translog files
Instead of using byte arrays, pass the BytesReference to the actual translog file, and use the new copyTo(channel) method to write. This will improve by not potentially having to convert the data to a byte array
closes #5463
2014-03-19 14:12:42 +01:00
Clinton Gormley 4c34615686 [DOCS] Fixed some bad UTF8 2014-03-19 12:46:06 +01:00
Clinton Gormley 1f497c6678 [DOCS] Updated Drupal integration 2014-03-19 11:49:39 +01:00
Martijn van Groningen 9001874a47 Invoke super.clone() instead of creating a new instance in the clone methods. 2014-03-19 11:12:14 +07:00
Boaz Leskes 7380a0a65a [Test] RecoveryWhileUnderLoadTests: smarter waiting for background indexers as it sometimes times out. 2014-03-19 00:12:03 +01:00
Benjamin Devèze f38d6f8a1b Minor improvements to Table class and add tests 2014-03-18 17:23:07 -05:00
Shay Banon 1f15c1e7de BytesReference usage to properly work when hasArray is not available
fix spelling in comment, + remove overcautious assert
2014-03-18 21:06:04 +01:00
Shay Banon 0f6c24d0c5 BytesReference usage to properly work when hasArray is not available
when a BytesReference doesn't have a backing array, properly handle the case in places where its applicable
closes #5455
2014-03-18 21:02:31 +01:00
Martijn van Groningen 7d3f49c43b [TEST] Added the option to specify on what ES version a node should run with. Useful for testing. 2014-03-18 21:36:46 +07:00
Alexander Reelsen 0ca7fddb66 Geo Point Fieldmapper: Allow distance for geohash precision
Even though mentioned differently in the docs, the geohash precision needed to
be an integer instead of a DistanceUnit.

Closes #5448
2014-03-18 14:09:57 +01:00
Shay Banon 0ef3b03be1 Move to use serial merge schedule by default
Today, we use ConcurrentMergeScheduler, and this can be painful since it is concurrent on a shard level, with a max of 3 threads doing concurrent merges. If there are several shards being indexed, then there will be a minor explosion of threads trying to do merges, all being throttled by our merge throttling.
Moving to serial merge scheduler will still maintain concurrency of merges across shards, as we have the merge thread pool that schedules those merges. It will just be a serial one on a specific shard.
Also, on serial merge scheduler, we now have a limit of how many merges it will do at one go, so it will let other shards get their fair chance of merging. We use the pending merges on IW to check if merges are needed or not for it.
Note, that if a merge is happening, it will not block due to a sync on the maybeMerge call at indexing (flush) time, since we wrap our merge scheduler with the EnabledMergeScheduler, where maybeMerge is not activated during indexing, only with explicit calls to IW#maybeMerge (see Merges).
closes #5447
2014-03-18 13:17:00 +01:00
Nik Everett 917c93d7ee Speed up phrase suggestion scoring
Two changes:
1.  In the StupidBackoffScorer only look for the trigram if there is a bigram.
2.  Cache the frequencies in WordScorer so we don't look them up again and
again and again.  This is implemented by wrapping the TermsEnum in a special
purpose wrapper that really only works in context of the WordScorer.

This provides a pretty substantial speedup when there are many candidates.

Closes #5395
2014-03-18 12:16:32 +01:00
Nik Everett d88ac0a95a Make indexRandom handle many documents better
* Index one at a time only rarely if doing more then 300.
* When launching async actions, take some care to make sure you don't already
have more then 150 other async actions in flight.
* When indexing in bulk split into chunks of 1000 documents.
2014-03-18 12:16:32 +01:00
Martijn van Groningen c501d9960a Made p/c override the clone() method. This is necessary since by default clone will make a shallow copy of the original object, while for p/c queries we need to make sure that the wrapped queries are also cloned. 2014-03-18 17:30:45 +07:00
Igor Motov a1192044f2 Add ability to get snapshot status for running snapshots
Closes #4946
2014-03-17 20:13:49 -04:00
Bill Hwang fe487373e6 Revert "Findbug warning supression"
This reverts commit 744eabad03.
2014-03-17 13:55:39 -07:00
Bill Hwang 744eabad03 Findbug warning supression
Added logic to enable findbug warnings supression via annotations
2014-03-17 13:35:37 -07:00
David Pilato 8dfdc6f647 [TEST] pre check download service working
Seen during CI tests, it could appears that the download service is not available for any reason.

This fix in test will check before each test which requires an internet access (annotated with @Network) that the download service we are testing is still working.

It won't fail the test but will mark the test as `Ignored` in case of failure.
2014-03-17 21:31:17 +01:00
David Pilato 0805c01984 [DOCS] Add Azure storage repositories 2014-03-17 19:40:28 +01:00
Simon Willnauer b17e074f07 [TEST] Print tests.jvms and tests.client.ratio if set
We need to print options that can modify stream of events we need
to print it otherwise a seed might not reproduce the failure in
the tests.
2014-03-17 17:14:13 +01:00
uboness 35696aeb75 cleanup formatting in significant aggs package 2014-03-17 14:43:10 +01:00
Dridi Boukelmoune 9500dddad3 Move systemd files from /etc to /usr/lib
As documented in systemd's manual pages tmpfiles.d(5) and systemd.unit(5),
a package should install its default configuration in /usr/lib, which can
be overriden by system administrators in /etc.

New locations in the rpm:
/usr/lib/systemd/system/elasticsearch.service
/usr/lib/tmpfiles.d/elasticsearch.conf
2014-03-17 14:06:34 +01:00
Simon Willnauer ff039019c5 [TEST] Use client node such that we always have to do a round-trip to do a fetch. 2014-03-17 13:55:37 +01:00
markharwood 5f1d9af9fe Documentation fix for significant_terms heading levels 2014-03-17 12:17:54 +00:00
Randy Stauner 933852768d [DOCS] Fixing contributing.md indentation
Add a whitespace to make these separate paragraphs inside of a list.
2014-03-17 12:20:48 +01:00
Randy Stauner 1486188a3b [DOCS] Reword clear-scroll sentence 2014-03-17 12:08:49 +01:00
lzhoucs 5a5171cb70 [DOCS] Fix typo in the reference doc. SuSe -> SUSE
SUSE, as a Linux distribution, is never lower cased

fixes #5354
2014-03-17 12:03:25 +01:00
Justin Etheredge 36219a1786 [DOCS] Updating scripting docs for geo functions
Added a few functions are corrected the default unit where necessary
2014-03-17 11:59:02 +01:00