The assumption is that gaps in a histogram are generally undesirable, for instance
if you want to build a visualization from it. Additionally, we are building new
aggregations that require that there are no gaps in order to work correctly (e.g.
derivatives).
* Removed the docs for `index.compound_format` and `index.compound_on_flush` - these are expert settings which should probably be removed (see https://github.com/elastic/elasticsearch/issues/10778)
* Removed the docs for `index.index_concurrency` - another expert setting
* Labelled the segments verbose output as experimental
* Marked the `compression`, `precision_threshold` and `rehash` options as experimental in the cardinality and percentile aggs
* Improved the experimental text on `significant_terms`, `execution_hint` in the terms agg, and `terminate_after` param on count and search
* Removed the experimental flag on the `geobounds` agg
* Marked the settings in the `merge` and `store` modules as experimental, rather than the modules themselves
Closes #10782
Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and can even become the
bottleneck if your query is selective.
This commit switches to Lucene regular expressions instead of Java (not exactly
the same syntax, yet most existing regular expressions should keep working) and
uses the same logic as RegExpQuery to intersect the regular expression with the
terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made
things faster: the same request that took 750ms on master now takes 74ms with
this change.
Close #7526
This commit adds scripting capability to significant_terms.
Custom heuristics can be implemented with a script that has access to the
parameters subset_freq, superset_freq, subset_size and superset_size.
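As a rough illustration of the kind of score such a script might compute, the following Python sketch takes the four values named above and returns a simple ratio. The function name and the formula are hypothetical, chosen only to show how the parameters relate to each other; this is not the script syntax that Elasticsearch expects.

```python
def custom_score(subset_freq, subset_size, superset_freq, superset_size):
    """Hypothetical heuristic (an assumption for illustration only).

    Compares how often a term occurs in the subset (foreground documents)
    with how often it occurs in the superset (background documents).
    """
    # Guard against empty sets and terms that never occur in the superset.
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    subset_rate = subset_freq / subset_size
    superset_rate = superset_freq / superset_size
    # A value above 1 means the term is over-represented in the subset.
    return subset_rate / superset_rate
```

A term seen in 1 of 4 subset documents but only 1 of 8 superset documents would score 2 under this made-up formula.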
closes #7850
Removed the existing `pre_zone` and `post_zone` options in `date_histogram` in favor of
the simpler `time_zone` option. Previously, specifying different values for these could
lead to confusing scenarios where ES would return bucket keys that are not UTC.
Now `time_zone` is the only option: the calculation of date buckets takes place in the
preferred time zone, and after rounding the bucket key values are converted back to UTC.
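The round-in-local-time, report-in-UTC behaviour can be sketched in Python as follows. This is a minimal illustration and not the Elasticsearch implementation; the function name `day_bucket_key`, the daily interval and the fixed `+01:00` offset are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_key(ts_utc, tz):
    """Round a UTC timestamp down to the start of its day in `tz`,
    then express that day boundary in UTC again."""
    # Step 1: move the timestamp into the preferred time zone.
    local = ts_utc.astimezone(tz)
    # Step 2: round down to local midnight (a daily bucket is assumed).
    local_midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # Step 3: convert the bucket key back to UTC.
    return local_midnight.astimezone(timezone.utc)


# Fixed +01:00 offset, used here instead of a named zone (assumption).
plus_one = timezone(timedelta(hours=1))
key = day_bucket_key(datetime(2015, 1, 1, 23, 30, tzinfo=timezone.utc), plus_one)
```

Here 23:30 UTC on 1 January is already 00:30 on 2 January at `+01:00`, so the document lands in the bucket for 2 January local time, whose key in UTC is 2015-01-01T23:00:00.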
Closes #9062
Closes #9637
Add offset option to 'date_histogram', replacing and simplifying the previous 'pre_offset' and 'post_offset' options.
This change is part of a larger cleanup task for `date_histogram` from issue #9062.
We now have a very useful annotation to mark features or parameters as
experimental. Let's use it! This commit replaces some custom text warnings with
this annotation and adds this annotation to some existing features/parameters:
- inner_hits (not yet released)
- terminate_after (released in 1.4)
- per-bucket doc count errors in the terms agg (released in 1.4)
I also tagged with this annotation settings which should either not be needed
(like the ability to evict entries from the filter cache based on time) or that
reach too deep into the way that Elasticsearch works, like the Directory
implementation or merge settings.
Close #9563
These aggregations are not experimental anymore but some of their parameters
still are:
- `precision_threshold` and `rehash` on `cardinality`
- `compression` on percentiles(-ranks)
Close #9560
Histogram aggregation supports an 'offset' option to move bucket boundaries.
In a histogram with buckets of size X these can be moved from 0, X, 2X, 3X,...
by an offset value of Y to Y, X+Y, 2X+Y, 3X+Y... by using the 'offset' option.
The previous 'pre_offset' and 'post_offset' options are removed in favour of
the simplified 'offset' option.
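The shifted boundaries described above can be sketched as a small Python function. This is an illustrative assumption about how a bucket key could be derived from a value, an interval X and an offset Y, not a copy of the Elasticsearch code; the name `bucket_key` is hypothetical.

```python
import math


def bucket_key(value, interval, offset=0):
    """Return the lower boundary of the bucket that contains `value`.

    With offset 0 the boundaries are 0, X, 2X, ...; with offset Y they
    become Y, X+Y, 2X+Y, ... (and continue below Y in the same pattern).
    """
    # Shift the value, round down to a multiple of the interval, shift back.
    return math.floor((value - offset) / interval) * interval + offset
```

With an interval of 10 and an offset of 5, values 5 through 14 fall into the bucket keyed 5, while 4 falls into the bucket keyed -5.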
Closes #9417
Closes #9505
Extended_stats now displays the upper and lower bounds on standard deviations (e.g. avg +/- std).
The default is to show 2 standard deviations above/below the average, but this can be changed
using the `sigma` parameter, which accepts non-negative doubles.
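The bounds amount to `avg - sigma * std` and `avg + sigma * std`. The Python sketch below works this out for a list of values; the function name is hypothetical, and the use of the population standard deviation is an assumption made for the example.

```python
import math


def std_deviation_bounds(values, sigma=2.0):
    """Return (lower, upper) = (avg - sigma * std, avg + sigma * std)."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if not values:
        raise ValueError("values must not be empty")
    n = len(values)
    avg = sum(values) / n
    # Population variance (divide by n); an assumption for this sketch.
    variance = sum((v - avg) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return avg - sigma * std, avg + sigma * std
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the average is 5 and the standard deviation is 2, so the default `sigma` of 2 gives bounds of 1 and 9.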
Closes #9356
This commit adds a new field to the response of the terms aggregation called
`sum_other_doc_count` which is equal to the sum of the doc counts of the buckets
that did not make it to the list of top buckets. This is typically useful for adding
a sector called e.g. `other` when using terms aggregations to build pie charts.
Example query and response:
```json
GET test/_search?search_type=count
{
  "aggs": {
    "colors": {
      "terms": {
        "field": "color",
        "size": 3
      }
    }
  }
}
```
```json
{
  [...],
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 4,
      "buckets": [
        {
          "key": "blue",
          "doc_count": 65
        },
        {
          "key": "red",
          "doc_count": 14
        },
        {
          "key": "brown",
          "doc_count": 3
        }
      ]
    }
  }
}
```
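The relationship between the top buckets and `sum_other_doc_count` can be reproduced with a small Python sketch. It assumes a single in-memory list of values, so it ignores the per-shard approximation that a real terms aggregation involves; the function name `terms_agg` and the extra colors `green` and `yellow` are made up for the example.

```python
from collections import Counter


def terms_agg(values, size):
    """Count values, keep the `size` most frequent ones as buckets and
    sum the doc counts of everything else."""
    counts = Counter(values)
    top = counts.most_common(size)
    # Everything that did not make it into the top buckets.
    other = sum(counts.values()) - sum(count for _, count in top)
    return {
        "sum_other_doc_count": other,
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }


# Hypothetical data matching the counts in the response above, plus
# four documents with colors outside the top three.
colors = ["blue"] * 65 + ["red"] * 14 + ["brown"] * 3 + ["green"] * 2 + ["yellow"] * 2
result = terms_agg(colors, size=3)
```

With this data `result["sum_other_doc_count"]` is 4, which is the size of the `other` sector in a pie chart built from the three buckets.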
Close #8213
By letting the fetch phase understand the nested docs structure we can serve nested docs as hits.
As a result of this commit, the `top_hits` aggregation can be placed in a `nested` or `reverse_nested` aggregation.
Closes #7164
This change removes the script_type parameter from the Scripted Metric Aggregation and adds support for _file and _id suffixes to the init_script, map_script, combine_script and reduce_script parameters. This makes defining the source of the script consistent with the other APIs which use the ScriptService.
It also changes the name of the field in the scripted metrics aggregation from 'aggregation' to 'value', to be more in line with the other metrics aggregations like 'avg'.