866 Commits

Author SHA1 Message Date
markharwood
767bef0596 Significant_terms aggregation identifies terms that are significant rather than merely popular in a set.
Significance is related to the changes in document frequency observed between everyday use in the corpus and
frequency observed in the result set. The asciidocs include extensive details on the applications of this feature.

Closes #5146
2014-03-14 10:34:24 +00:00
Adrien Grand
5821fa042c Cardinality aggregation.
This aggregation computes unique term counts using the hyperloglog++ algorithm
which uses linear counting to estimate low cardinalities and hyperloglog on
higher cardinalities.

Since this algorithm works on hashes, it is useful for high-cardinality fields
to store the hash of values directly in the index, which is the purpose of
the new `murmur3` field type. This is less necessary on low-cardinality
string fields because the aggregator is smart enough to only compute the hash
once per unique value per segment thanks to ordinals, or on numeric fields
since hashing them is very fast.
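
The linear-counting part of the approach described above can be sketched as follows. This is a minimal illustration and not the code from this commit: the `linear_count` function, the bitmap size `m`, and the use of `blake2b` as the hash function are all assumptions made for the example (the commit itself refers to a `murmur3` field type), and the HyperLogLog side for higher cardinalities is left out.

```python
import hashlib
import math


def linear_count(values, m=4096):
    """Estimate the number of distinct values with linear counting.

    Hypothetical helper for illustration only. `m` is the bitmap size;
    the estimate degrades as the bitmap fills up.
    """
    bitmap = [False] * m
    for v in values:
        # Hash the value to a 64-bit integer (blake2b is an assumption here).
        digest = hashlib.blake2b(str(v).encode("utf-8"), digest_size=8).digest()
        bitmap[int.from_bytes(digest, "big") % m] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        # Bitmap saturated: linear counting can only report a lower bound.
        return float(m)
    # Standard linear-counting estimate: m * ln(m / number_of_empty_slots).
    return m * math.log(m / zeros)
```

Because only the hash of each value is used, hashing once and reusing the result (as the commit describes for ordinals and for the stored hash) does not change the estimate.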

Close #5426
2014-03-13 19:19:56 +01:00
Florian Schilling
81e537bd5e ContextSuggester
================

This commit extends the `CompletionSuggester` with context
information. For example, such context information can
be a simple string representing a category, restricting the
suggestions to that category.

Three base implementations of this context information
have been set up in this commit.

- a Category Context
- a Geo Context

All the mappings for this context information are
specified within a context field in the completion
field that should use this kind of information.
2014-03-13 11:24:46 +01:00
Kurt Hurtado
ca6a2bb790 [DOCS] Various aggregation doc fixes 2014-03-13 09:05:25 +01:00
Boaz Leskes
b7a95d11a7 Introduced VersionType.FORCE & VersionType.EXTERNAL_GTE
Also added "external_gt" as an alias name for VersionType.EXTERNAL, accessible from the REST layer.

Closes #4213, Closes #2946
2014-03-10 21:07:17 +01:00
Simon Willnauer
fbb8c0fafa [DOCS] Add coming tag to multiple rescores
Closes #5365
2014-03-10 09:27:44 +01:00
Benjamin Devèze
2affa5004f Fix small typo in percentiles doc 2014-03-07 10:10:19 +01:00
Adrien Grand
f359b7f38b [DOC] The percentiles aggregation is coming in 1.1.0. 2014-03-07 10:03:15 +01:00
uboness
9d0fc76f54 Added support for sorting buckets based on sub aggregations
Supports sorting on sub-aggs down the current hierarchy. This is supported as long as the aggregations in the specified order path are of a single-bucket type, where the last aggregation in the path points to either a single-bucket aggregation or a metrics one. If it's a single-bucket aggregation, the sort will be applied on the document count in the bucket (i.e. doc_count), and if it is a metrics type, the sort will be applied on the pointed-out metric (in the case of a single-metric aggregation, such as avg, the sort will be applied on the single metric value).
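
The ordering rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from this commit: buckets are modelled as plain dictionaries, the order path is a list of names rather than any particular path syntax, and `sort_buckets` is a hypothetical helper.

```python
def sort_buckets(buckets, path, descending=True):
    """Sort buckets by the value reached by following `path`.

    Hypothetical model: a single-bucket sub-aggregation is a dict with a
    "doc_count" entry, and a single-value metric is a plain number.
    """
    def sort_key(bucket):
        node = bucket
        for name in path:
            node = node[name]
        if isinstance(node, dict):
            # Path ends on a single-bucket aggregation: sort on doc_count.
            return node["doc_count"]
        # Path ends on a metric: sort on the metric value itself.
        return node

    return sorted(buckets, key=sort_key, reverse=descending)


buckets = [
    {"key": "a", "in_stock": {"doc_count": 2, "avg_price": 5.0}},
    {"key": "b", "in_stock": {"doc_count": 7, "avg_price": 1.0}},
]
by_count = sort_buckets(buckets, ["in_stock"])
by_metric = sort_buckets(buckets, ["in_stock", "avg_price"])
```

Here `by_count` orders on the document count of the `in_stock` bucket, while `by_metric` orders on the `avg_price` metric nested inside it.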

 NOTE: this commit adds a constraint on what is considered a valid aggregation name. Aggregation names must be alphanumeric and may contain '-' and '_'.

 Closes #5253
2014-03-06 00:05:27 +01:00
Zachary Tong
7b16c5857d Percentiles aggregation.
A new metric aggregation that can compute approximate values of arbitrary
percentiles.

Close #5323
2014-03-03 18:06:14 +01:00
Binh Ly
7e49848697 Clarify range aggregations 2014-02-28 14:38:57 -05:00
Clinton Gormley
53ce0e8e27 [DOCS] Fixed added[] tag version number 2014-02-28 15:29:43 +01:00
Luca Cavanna
4e6610a798 Fixed multi term queries support in postings highlighter for non top-level queries
In #4052 we added support for highlighting multi term queries using the postings highlighter. That worked only for top-level queries though, and not for multi term queries that are nested for instance within a bool query, or filtered query, or a constant score query.

The way we make this work is by walking the query structure and temporarily overriding the query rewrite method with a method that allows for multi terms extraction.

Closes #5102
2014-02-21 21:43:40 +01:00
Britta Weber
db3c6c2a8e Enable percolation for nested documents
closes #5082
2014-02-14 22:42:33 +01:00
uboness
d335630e57 [docs] fixed errors in aggs docs
- error in nested aggs example
- error in terms aggs example
2014-02-13 20:36:02 +01:00
Luca Cavanna
179750f0f5 [DOCS] fixed count docs, it now requires a top-level query object, same as other apis
Relates to #4074
2014-02-13 13:36:20 +01:00
Luca Cavanna
01abea5945 [DOCS] fixed count and validate query docs, they now require a top-level query object, same as other apis
Relates to #4074
Closes #5111
2014-02-13 11:42:04 +01:00
Simon Willnauer
990ce658a4 [Docs] Remove custom_score from documentation and add a migration
section.
2014-02-11 14:59:15 +01:00
Clinton Gormley
93930d6dc7 Removed 0.90.* deprecation and addition notifications
Closes #5052
2014-02-07 20:52:49 +01:00
Adrien Grand
9cb17408cb Make size=0 return all buckets for the geohash_grid aggregation.
Close #4875
2014-02-07 09:55:10 +01:00
Boaz Leskes
9bf263c741 [DOCS] Fix terms agg value script example 2014-02-06 16:35:49 +01:00
Boaz Leskes
ae4ed29f9b [Docs] value_count supports script per 1.1 2014-02-06 15:04:50 +01:00
Clinton Gormley
6238d406b5 [DOCS] Removed the experimental label from Tribe, Hot Threads
and Completion Suggester
2014-02-06 14:19:17 +01:00
Adrien Grand
6777be60ce Add script support to value_count aggregations.
Close #5001
2014-02-04 14:29:32 +01:00
Clinton Gormley
238b26a466 [DOC] Tidied up geohashgrid aggregations 2014-02-04 11:54:32 +01:00
Jun Ohtani
ba415b8ad2 Does not support "script" in value_count aggregation. 2014-02-04 10:26:07 +01:00
Adrien Grand
cc1ff560df Rename geohashgrid to geohash_grid in documentation.
It was renamed in fc6bc4c4776a2f710f57616e3495aaf6a230c4d3.

Close #4997
2014-02-04 09:39:55 +01:00
Lars Francke
1bd9dc129b Fix confusing sentence
The original sentence didn't make much sense. I hope this is a bit better. Took heavy inspiration from c63d8c4fb5
2014-02-03 17:20:40 +01:00
Lars Francke
7cbd0962b5 Improve Aggregations documentation
* Mostly minor things like typos and grammar stuff
* Some clarifications
* The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated
2014-02-03 17:16:52 +01:00
uboness
d3f2173ef9 fixed date_/histogram aggregation documentation - added documentation for the min_doc_count setting
Closes #4944
2014-01-29 20:55:26 +01:00
uboness
9f04e5fe38 fixed nested example response in docs
Closes #4935
2014-01-29 13:09:12 +01:00
uboness
dd389d1cc5 Made all multi-bucket aggs return consistent response format
Closes #4926
2014-01-28 17:46:57 +01:00
Nik Everett
93a8e80aff Support multiple rescores
Detects if rescores arrive as an array instead of a plain object.  If so
then parse each element of the array as a separate rescore to be executed
one after another.  It looks like this:
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": "log10(doc['numeric'].value + 2)"
               }
            }
         }
      }
   } ]

Rescores as a single object are still supported.
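
The array-versus-object detection described above amounts to a small normalization step, sketched here. This is an illustration only and not the parser from this commit: `parse_rescores` is a hypothetical helper operating on an already decoded request body.

```python
def parse_rescores(rescore):
    """Normalize the "rescore" element into a list of rescore definitions.

    Hypothetical helper: accepts a single object (the older form), an
    array of objects (the newer form), or None when no rescore is given.
    """
    if rescore is None:
        return []
    if isinstance(rescore, list):
        # Array form: each element is a separate rescore, kept in order.
        return list(rescore)
    if isinstance(rescore, dict):
        # Single-object form is still supported: treat it as a list of one.
        return [rescore]
    raise TypeError("rescore must be an object or an array of objects")
```

Downstream code can then run the returned rescores one after another without caring which form the request used.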

Closes #4748
2014-01-23 16:29:07 +01:00
Nik Everett
37f80c8d80 Documentation for score_mode
Closes #4742
2014-01-23 16:24:48 +01:00
Clinton Gormley
8685818ad3 [DOCS] Moved termvector and mtermvectors from search to docs 2014-01-22 14:10:26 +01:00
Simon Willnauer
cb3bcb05be [DOCS]: Fix added version termvectors.asciidoc 2014-01-22 12:08:13 +01:00
Adrien Grand
9282ae4ffd Terms aggregations: make size=0 return all terms.
Terms aggregations return up to `size` terms, so up to now, the way to get all
matching terms back was to set `size` to an arbitrary high number that would be
larger than the number of unique terms.

Terms aggregators already made sure to not allocate memory based on the `size`
parameter so this commit mostly consists in making `0` an alias for the
maximum integer value in the TermsParser.
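
The aliasing described above can be sketched as follows. This is a minimal illustration, not the `TermsParser` code: `resolve_size` and `top_terms` are hypothetical helpers, and `INT_MAX` assumes a 32-bit signed integer maximum.

```python
from collections import Counter

INT_MAX = 2**31 - 1  # assumed stand-in for the maximum integer value


def resolve_size(size):
    """Treat size=0 as "return everything" by mapping it to INT_MAX."""
    if size < 0:
        raise ValueError("size must be >= 0")
    return INT_MAX if size == 0 else size


def top_terms(values, size):
    """Return up to `size` (term, count) pairs, most frequent first."""
    counts = Counter(values)
    # No memory is allocated based on the requested size, so a very
    # large value is harmless here.
    return counts.most_common(resolve_size(size))
```

For example, `top_terms(["a", "b", "a"], 0)` returns every distinct term without the caller having to guess an arbitrarily high `size`.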

Close #4837
2014-01-22 11:05:10 +01:00
Lee Hinman
2c289fb538 Add the ability to retrieve fields from field data
Adds a new FetchSubPhase, FieldDataFieldsFetchSubPhase, which loads the
field data cache for a field and returns an array of values for the
field.

Also removes `doc['<field>']` and `_source.<field>` workaround no longer
needed in field name resolving.

Closes #4492
2014-01-21 09:13:32 -07:00
Martijn van Groningen
9bc3d996ff [SPECS] Updated percolator specs. 2014-01-20 18:18:27 +01:00
Florian Gilcher
eed079aaac Reference docs fixes
* Make it clearer that `aggs` is an allowed synonym
  for the `aggregations` key
* Fix broken example for datehistogram, `1.5M` is
  not an allowed interval
* Make use of colon before examples consistent
* Fix typos
2014-01-20 12:14:17 +01:00
Dawid Weiss
ae71b25145 Documentation typo. 2014-01-20 11:51:08 +01:00
Luca Cavanna
4126ae2631 [DOCS] updated json responses after #4310 and #4480
- Removed "ok": true from response examples
- Added "created" flag to index response examples
- Replaced exists flag with found in delete response examples
2014-01-16 12:01:39 +01:00
markharwood
2795f4e55d Standardized use of “*_length” for parameter names rather than “*_len”.
Java Builder APIs drop the old “len” methods in favour of the new “length” ones.
REST APIs support both the old “len” and new “length” forms using the new ParseField class to provide a) compiler-checked consistency between Builder and Parser classes and
b) a common means of handling deprecated syntax in the DSL.
Documentation and REST specs only document the new “*_length” forms.
Closes #4083
2014-01-13 15:59:15 +00:00
Adrien Grand
5c237fe834 Add new option min_doc_count to terms and histogram aggregations.
`min_doc_count` is the minimum number of hits that a term or histogram key
should match in order to appear in the response.

`min_doc_count=0` replaces `compute_empty_buckets` for histograms and will
behave exactly like facets' `all_terms=true` for terms aggregations.
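
The effect of `min_doc_count` on a histogram can be sketched as follows. This is an illustration under assumptions, not the aggregator from this commit: `histogram` is a hypothetical helper over integer values, the default of `1` is assumed for the example, and "empty buckets" is taken to mean the gaps between the smallest and largest populated keys.

```python
from collections import Counter


def histogram(values, interval, min_doc_count=1):
    """Bucket integer values by `interval`, keeping buckets that have at
    least `min_doc_count` documents."""
    counts = Counter((v // interval) * interval for v in values)
    if min_doc_count == 0 and counts:
        # Emit every key between the extremes, including empty buckets.
        keys = range(min(counts), max(counts) + interval, interval)
    else:
        keys = sorted(counts)
    return [(k, counts.get(k, 0)) for k in keys if counts.get(k, 0) >= min_doc_count]
```

With `histogram([1, 2, 11, 35], 10, min_doc_count=0)` the empty bucket at key `20` is included, whereas the default leaves it out.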

Close #4662
2014-01-13 10:09:38 +01:00
Martijn van Groningen
943b62634c Replaced the multi-field type with the multi fields option that can be set on any core field.
When upgrading to ES 1.0, existing mappings with a multi-field type are automatically replaced with a core field that uses the new `fields` option.

If a `multi_field` typed field doesn't have a main / default field, a default field will be chosen for the multi fields syntax. The new main field type
will be equal to the type of the first field of the `multi_field`, or to type string if no fields have been configured for the `multi_field` field, and in both cases
the default field will not be indexed (`index=no` is set on the default field).

If a `multi_field` typed field has a default field, that field will replace the `multi_field` typed field.

Closes to #4521
2014-01-13 09:21:53 +01:00
Florian Schilling
464037e0c1 Geo clean Up
============
The default unit for measuring distances is *MILES* in most cases. This commit moves ES
over to the *International System of Units* and makes it work with a default that relates
to *METERS*. Also, the structure of the `GeoBoundingBox Filter` changed in
order to define the bounding box by setting arbitrary corners.

Distances
---------
Since the default unit for measuring distances has changed to a default unit
`DistanceUnit.DEFAULT` relating to *meters*, the **REST API** has changed at the
following places:

  * `ScriptDocValues.factorDistance()` returns *meters* instead of *miles*
  * `ScriptDocValues.factorDistanceWithDefault()` returns *meters* instead of *miles*
  * `ScriptDocValues.arcDistance()` returns *meters* instead of *miles*
    (one might use `ScriptDocValues.arcDistanceInMiles()`)
  * `ScriptDocValues.arcDistanceWithDefault()` returns *meters* instead of *miles*
  * `ScriptDocValues.distance()` returns *meters* instead of *miles*
    (one might use `ScriptDocValues.distanceInMiles()`)
  * `ScriptDocValues.distanceWithDefault()` returns *meters* instead of *miles*
    (one might use `ScriptDocValues.distanceInMilesWithDefault()`)
  * `GeoDistanceFilter` default unit changes from *kilometers* to *meters*
  * `GeoDistanceRangeFilter` default unit changes from *miles* to *meters*
  * `GeoDistanceFacet` default unit changes from *miles* to *meters*

Geo Bounding Box Filter
-----------------------
The naming of the GeoBoundingBoxFilter properties allows setting arbitrary corners
(see #4084), namely `top_right`, `top_left`, `bottom_right` and `bottom_left`. This
change also includes the fields `topRight` and `bottomLeft`. It is also possible to
set the single values by using just the `top`, `bottom`, `left` and `right` parameters.

Closes #4515, #4084
2014-01-11 21:30:29 +09:00
Simon Willnauer
bc5a9ca342 Rename edit_distance/min_similarity to fuzziness
A lot of different APIs currently use different names for the
same logical parameter. Since Lucene moved away from the notion
of a `similarity` and now uses a `fuzziness`, we should generalize
this and encapsulate the generation, parsing and creation of these
settings across all queries.

This commit adds a new `Fuzziness` class that handles the renaming
and generalization in a backwards compatible manner.

This commit also adds a ParseField class to better support deprecated
Query DSL parameters.

The ParseField class allows specifying parameters that have been deprecated.
Those parameters can be more easily tracked and removed in a future version.
This also allows running queries in `strict` mode per index to throw
exceptions if a query is executed with deprecated keys.
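
The idea of a field that knows its deprecated names can be sketched as follows. This is a rough illustration only: the Python class below borrows the `ParseField` name from the commit, but its constructor, the `match` method, and the `strict` flag are hypothetical and are not the Java API introduced here.

```python
class ParseField:
    """Hypothetical sketch of a field name with deprecated aliases."""

    def __init__(self, name, *deprecated_names):
        self.name = name
        self.deprecated_names = deprecated_names

    def match(self, key, strict=False):
        """Return True if `key` names this field.

        In strict mode, a deprecated alias raises instead of matching.
        """
        if key == self.name:
            return True
        if key in self.deprecated_names:
            if strict:
                raise ValueError(
                    f"deprecated field [{key}] used, expected [{self.name}]"
                )
            return True
        return False


# Assumed example: the new name plus the two older names it replaces.
FUZZINESS = ParseField("fuzziness", "min_similarity", "edit_distance")
```

Keeping the deprecated names next to the current one gives the builder and the parser a single place to agree on, which is the consistency benefit the commit message points at.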

Closes #4082
2014-01-09 15:14:51 +01:00
Martijn van Groningen
7e341cefd0 Change the sort boolean option in percolate api to the sort dsl available in search api.
Closes #4625
2014-01-09 09:58:34 +01:00
Clinton Gormley
2e4b70d40f [DOCS] Fixed duplicate ID in highlighting 2014-01-09 00:37:18 +01:00
Nik Everett
bbf0ec52de Add warning about the phrase suggester's max_errors
A large number can badly impact performance.
2014-01-08 23:06:41 +01:00