OpenSearch

Author	SHA1	Message	Date
Adrien Grand	7ea490dfd1	Aggregations: Return the sum of the doc counts of other buckets. This commit adds a new field to the response of the terms aggregation called `sum_other_doc_count` which is equal to the sum of the doc counts of the buckets that did not make it to the list of top buckets. It is typically useful to have a sector called eg. `other` when using terms aggregations to build pie charts. Example query and response: ```json GET test/_search?search_type=count { "aggs": { "colors": { "terms": { "field": "color", "size": 3 } } } } ``` ```json { [...], "aggregations": { "colors": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 4, "buckets": [ { "key": "blue", "doc_count": 65 }, { "key": "red", "doc_count": 14 }, { "key": "brown", "doc_count": 3 } ] } } } ``` Close #8213	2014-10-27 12:11:26 +01:00
Andrew O'Brien	33097d901b	Docs: Typo: s/by/be/ Closes #8114	2014-10-16 20:51:58 +02:00
Clinton Gormley	cb00d4a542	Docs: Removed all the added/deprecated tags from 1.x	2014-09-26 21:04:42 +02:00
Jordan Snodgrass	6246aac9ab	Docs: Indicate that the Children Aggregation is coming in 1.4.0	2014-09-17 09:22:02 +02:00
Colin Goodheart-Smithe	d4e83df3b8	Aggregations: Adds ability to sort on multiple criteria The terms aggregation can now support sorting on multiple criteria by replacing the sort object with an array or sort object whose order signifies the priority of the sort. The existing syntax for sorting on a single criteria also still works. Contributes to #6917 Replaces #7588	2014-09-15 11:08:29 +01:00
markharwood	3c8f8cc090	Aggs enhancement - allow Include/Exclude clauses to use array of terms as alternative to a regex Closes #6782	2014-09-12 15:28:03 +01:00
smayzak	65a0ca021d	The description was incorrect Looked like a copy and paste from another aggregation	2014-09-10 16:05:03 +02:00
smayzak	6416f5d3d0	Fixing some grammar	2014-09-10 16:05:03 +02:00
David Pilato	7fdd3651fa	[docs] Fix typo: resonable - reasonable	2014-09-10 15:57:57 +02:00
Colin Goodheart-Smithe	b127b52fd3	Revert "Aggregations: Adds ability to sort on multiple criteria" This reverts commit `bfedd11ffa`.	2014-09-08 20:27:19 +01:00
Colin Goodheart-Smithe	bfedd11ffa	Aggregations: Adds ability to sort on multiple criteria The terms aggregation can now support sorting on multiple criteria by replacing the sort object with an array or sort object whose order signifies the priority of the sort. The existing syntax for sorting on a single criteria also still works. Contributes to #6917	2014-09-08 15:20:33 +01:00
Clinton Gormley	1bdf79e527	Docs: Added explanation of how to do multi-field terms agg Closes #5100	2014-09-07 11:09:52 +02:00
Clinton Gormley	a059a6574a	Update reverse-nested-aggregation.asciidoc Fixed reverse nested example Closes #7463	2014-09-02 11:40:41 +01:00
Adrien Grand	8e1d3d56b3	Docs: Replace added[1.4.0] with coming[1.4.0] since 1.4 is not released yet.	2014-08-29 11:57:22 +02:00
Martijn van Groningen	383e64bd5c	Aggregations: Add `children` bucket aggregator that is able to map buckets between parent types and child types using the already builtin parent/child support. Closes #6936	2014-08-19 12:40:51 +02:00
Konrad Feldmeier	3b3e2ed5e9	Docs: Remove the 'Factor' paragraph to reflect #6490 The current implementation of 'date_histogram' does not understand the `factor` parameter. Since the docs shouldn't raise false hopes, I removed the section. Closes #7277	2014-08-18 13:02:15 +02:00
Colin Goodheart-Smithe	e6632ec63e	[DOCS] fixed title for filters aggregation documentation	2014-08-07 08:37:43 +01:00
Clinton Gormley	7b0b315b71	Tidied up the filters agg docs and added a coming[] tag	2014-08-07 09:03:23 +02:00
Britta Weber	a3cefd919e	significant terms: add google normalized distance, add chi square closes #6858	2014-08-04 08:15:26 +02:00
uboness	3c9c9f33e2	Aggregations Added Filters aggregation A multi-bucket aggregation where multiple filters can be defined (each filter defines a bucket). The buckets will collect all the documents that match their associated filter. This aggregation can be very useful when one wants to compare analytics between different criterias. It can also be accomplished using multiple definitions of the single filter aggregation, but here, the user will only need to define the sub-aggregations only once. Closes #6118	2014-08-01 16:01:08 +01:00
Colin Goodheart-Smithe	655157c83a	Aggregations: Added an option to show the upper bound of the error for the terms aggregation. This is only applicable when the order is set to _count. The upper bound of the error in the doc count is calculated by summing the doc count of the last term on each shard which did not return the term. The implementation calculates the error by summing the doc count for the last term on each shard for which the term IS returned and then subtracts this value from the sum of the doc counts for the last term from ALL shards. Closes #6696	2014-07-25 14:24:24 +01:00
Simon Willnauer	5bfea56457	[DOCS] move all coming tags to added in master	2014-07-23 16:37:19 +02:00
Adrien Grand	abeefbddea	Docs: Update documentation about execution hints for the terms aggregation.	2014-07-21 11:55:57 +02:00
Britta Weber	74927adced	significant terms: infrastructure for changing easily the significance heuristic This commit adds the infrastructure to allow pluging in different measures for computing the significance of a term. Significance measures can be provided externally by overriding - SignificanceHeuristic - SignificanceHeuristicBuilder - SignificanceHeuristicParser closes #6561	2014-07-14 11:00:50 +02:00
Florian Hopf	3689f67a76	Docs: Fixed invalid word count in geodistance agg doc Closes #6838	2014-07-11 18:35:36 +02:00
Andrii Gakhov	80321d89d9	Docs: Update histogram-aggregation.asciidoc filter in a filtered query should be under "filter" key Closes #6738	2014-07-07 10:44:11 +02:00
Chris	011e20678d	[DOCS] Fixed json example in nested-aggregation.asciidoc	2014-06-18 19:38:02 +02:00
Martijn van Groningen	5e408f3d40	Change the top_hits to be a metric aggregation instead of a bucket aggregation (which can't have an sub aggs) Closes #6395 Closes #6434	2014-06-10 09:09:50 +02:00
markharwood	724129e6ce	Aggregations optimisation for memory usage. Added changes to core Aggregator class to support a new mode of deferred collection. A new "breadth_first" results collection mode allows upper branches of aggregation tree to be calculated and then pruned to a smaller selection before advancing into executing collection on child branches. Closes #6128	2014-06-06 15:59:51 +01:00
Martijn van Groningen	aab38fb2e6	Aggregations: added pagination support to `top_hits` aggregation by adding `from` option. Closes #6299	2014-05-30 11:45:31 +02:00
Martijn van Groningen	5fafd2451a	Added `top_hits` aggregation that keeps track of the most relevant document being aggregated per bucket. Closes #6124	2014-05-23 16:01:18 +02:00
Simon Willnauer	9d5507047f	Update Documentation Feature Flags [1.2.0]	2014-05-22 15:06:42 +02:00
Britta Weber	08e57890f8	use shard_min_doc_count also in TermsAggregation This was discussed in issue #6041 and #5998 . closes #6143	2014-05-14 14:10:04 +02:00
David Pilato	1cb2c3bdd3	[DOCS] reverse-nested aggs are added in 1.2.0	2014-05-13 20:00:42 +02:00
markharwood	1e560b0d92	Significant_terms agg: added option for a background_filter to define background context for analysis of term frequencies Closes #5944	2014-05-13 09:10:30 +01:00
Clinton Gormley	5b93255ec8	[DOCS] Added "Aggregation" to all aggs titles	2014-05-13 01:35:58 +02:00
Rashid Khan	233aaa63c9	Change key to keyed	2014-05-12 13:15:07 -07:00
Britta Weber	7944369fd1	Add `shard_min_doc_count` parameter for significant terms similar to `shard_size` Significant terms internally maintain a priority queue per shard with a size potentially lower than the number of terms. This queue uses the score as criterion to determine if a bucket is kept or not. If many terms with low subsetDF score very high but the `min_doc_count` is set high, this might result in no terms being returned because the pq is filled with low frequent terms which are all sorted out in the end. This can be avoided by increasing the `shard_size` parameter to a higher value. However, it is not immediately clear to which value this parameter must be set because we can not know how many terms with low frequency are scored higher that the high frequent terms that we are actually interested in. On the other hand, if there is no routing of docs to shards involved, we can maybe assume that the documents of classes and also the terms therein are distributed evenly across shards. In that case it might be easier to not add documents to the pq that have subsetDF <= `shard_min_doc_count` which can be set to something like `min_doc_count`/number of shards because we would assume that even when summing up the subsetDF across shards `min_doc_count` will not be reached. closes #5998 closes #6041	2014-05-07 18:02:56 +02:00
gabriel-tessier	7b0efcbd96	fix typo	2014-05-06 15:54:36 +02:00
Martijn van Groningen	013b319415	Added `reverse_nested` aggregation. The `reverse_nested` aggregation allows to aggregate on properties outside of the nested scope of a `nested` aggregation. Closes #5507	2014-05-01 00:23:05 +07:00
gabriel-tessier	000c33aac3	fix typo	2014-04-07 09:23:46 +02:00
Martijn van Groningen	ade1d0ef57	Added global ordinals (unique incremental numbering for terms) to fielddata. Added a terms aggregation implementations that work on global ordinals, which is also the default. Closes #5672	2014-04-07 11:06:41 +07:00
bleskes	5d832374dd	Update Documentation Feature Flags [1.1.0]	2014-03-25 17:51:30 +01:00
uboness	7d6ad8d91c	Added extended_bounds support for date_/histogram aggs By default the date_/histogram returns all the buckets within the range of the data itself, that is, the documents with the smallest values (on which with histogram) will determine the min bucket (the bucket with the smallest key) and the documents with the highest values will determine the max bucket (the bucket with the highest key). Often, when when requesting empty buckets (min_doc_count : 0), this causes a confusion, specifically, when the data is also filtered. To understand why, let's look at an example: Lets say the you're filtering your request to get all docs from the last month, and in the date_histogram aggs you'd like to slice the data per day. You also specify min_doc_count:0 so that you'd still get empty buckets for those days to which no document belongs. By default, if the first document that fall in this last month also happen to fall on the first day of the second week of the month, the date_histogram will not return empty buckets for all those days prior to that second week. The reason for that is that by default the histogram aggregations only start building buckets when they encounter documents (hence, missing on all the days of the first week in our example). With extended_bounds, you now can "force" the histogram aggregations to start building buckets on a specific min values and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if the min_doc_count is greater than 0). Note that (as the name suggest) extended_bounds is not filtering buckets. Meaning, if the min bounds is higher than the values extracted from the documents, the documents will still dictate what the min bucket will be (and the same goes to the extended_bounds.max and the max bucket). For filtering buckets, one should nest the histogram agg under a range filter agg with the appropriate min/max. Closes #5224	2014-03-20 14:48:27 +01:00
markharwood	5f1d9af9fe	Documentation fix for significant_terms heading levels	2014-03-17 12:17:54 +00:00
Boaz Leskes	ee8743f3f2	[Docs] added a missing reference to significantterms-aggergations Also fix header level mismatch issue reported by the build	2014-03-17 11:45:55 +01:00
rphadake	36a0cb99d7	[Doc] doc updates for date histogram interval Close #5308	2014-03-14 18:55:32 +01:00
markharwood	767bef0596	Significant_terms aggregation identifies terms that are significant rather than merely popular in a set. Significance is related to the changes in document frequency observed between everyday use in the corpus and frequency observed in the result set. The asciidocs include extensive details on the applications of this feature. Closes #5146	2014-03-14 10:34:24 +00:00
uboness	9d0fc76f54	Added support for sorting buckets based on sub aggregations Supports sorting on sub-aggs down the current hierarchy. This is supported as long as the aggregation in the specified order path are of a single-bucket type, where the last aggregation in the path points to either a single-bucket aggregation or a metrics one. If it's a single-bucket aggregation, the sort will be applied on the document count in the bucket (i.e. doc_count), and if it is a metrics type, the sort will be applied on the pointed out metric (in case of a single-metric aggregations, such as avg, the sort will be applied on the single metric value) NOTE: this commit adds a constraint on what should be considered a valid aggregation name. Aggregations names must be alpha-numeric and may contain '-' and '_'. Closes #5253	2014-03-06 00:05:27 +01:00
Binh Ly	7e49848697	Clarify range aggregations	2014-02-28 14:38:57 -05:00

70 Commits