GeoDistance query, sort, and scripts make use of a crazy GeoDistance enum for handling 4 different ways of computing geo distance: SLOPPY_ARC, ARC, FACTOR, and PLANE. Only two of these are necessary: ARC, PLANE. This commit removes SLOPPY_ARC, and FACTOR and cleans up the way Geo distance is computed.
Added missing CONSOLE scripts to documentation for sampler and diversified_sampler aggs.
Includes new StackOverflow index setup in build.gradle
Closes#22746
* Formatting tweaks
This adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the
snippets in the docs for the `date_range` aggregation and tests
those snippets as part of the build.
Relates to #18160
This adds the `VIEW IN SENSE` and `COPY AS CURL` links and has
the build automatically execute the snippets and verify that they
work.
Relates to #18160
Adds the `VIEW IN CONSOLE` and `COPY AS CURL` links to the example
`global` aggregation. Also improves the example by adding a
non-`global` aggregation to compare it to.
Relates to #18160
Similar to the Filters aggregation but only supports "keyed" filter buckets and automatically "ANDs" pairs of filters to produce a form of adjacency matrix.
The intersection of buckets "A" and "B" is named "A&B" (the choice of separator is configurable). Empty intersection buckets are removed from the final results.
Closes#22169
* Promote longs to doubles when a terms agg mixes decimal and non-decimal number
This change makes the terms aggregation work when the buckets coming from different indices are a mix of decimal numbers and non-decimal numbers. In this case non-decimal number (longs) are promoted to decimal (double) which can result in a loss of precision for big numbers.
Fixes#22232
The use of the avg aggregation for sorting the terms aggregation is not encouraged since it has unbounded error. This changes the examples to use the max aggregation which does not suffer the same issues
and be much more stingy about what we consider a console candidate.
* Add `// CONSOLE` to check-running
* Fix version in some snippets
* Mark groovy snippets as groovy
* Fix versions in plugins
* Fix language marker errors
* Fix language parsing in snippets
This adds support for snippets who's language is written like
`[source, txt]` and `["source","js",subs="attributes,callouts"]`.
This also makes language required for snippets which is nice because
then we can be sure we can grep for snippets in a particular language.
Currently both aggregations really share the same implementation. This commit
splits the implementations so that regular histograms can support decimal
intervals/offsets and compute correct buckets for negative decimal values.
However the response API is still the same. So for intance both regular
histograms and date histograms will produce an
`org.elasticsearch.search.aggregations.bucket.histogram.Histogram`
aggregation.
The optimization to compute an identifier of the rounded value and the
rounded value itself has been removed since it was only used by regular
histograms, which now do the rounding themselves instead of relying on the
Rounding abstraction.
Closes#8082Closes#4847
The current heuristic to compute a default shard size is pretty aggressive,
it returns `max(10, number_of_shards * size)` as a value for the shard size.
I think making it less aggressive has the benefit that it would reduce the
likelyness of running into OOME when there are many shards (yearly
aggregations with time-based indices can make numbers of shards in the
thousands) and make the use of breadth-first more likely/efficient.
This commit replaces the heuristic with `size * 1.5 + 10`, which is enough
to have good accuracy on zipfian distributions.
The filters aggregation now has an option to add an 'other' bucket which will, when turned on, contain all documents which do not match any of the defined filters. There is also an option to change the name of the 'other' bucket from the default of '_other_'
Closes#11289
Replace the previous example which leveraged a range filter, which causes unnecessary confusion about when to use a range filter to create a single bucket or a range aggregation with exactly one member in ranges.
Closes#11704
This change unifies the way scripts and templates are specified for all instances in the codebase. It builds on the Script class added previously and adds request building and parsing support as well as the ability to transfer script objects between nodes. It also adds a Template class which aims to provide the same functionality for template APIs
Closes#11091
Most aggregations (terms, histogram, stats, percentiles, geohash-grid) now
support a new `missing` option which defines the value to consider when a
field does not have a value. This can be handy if you eg. want a terms
aggregation to handle the same way documents that have "N/A" or no value
for a `tag` field.
This works in a very similar way to the `missing` option on the `sort`
element.
One known issue is that this option sometimes cannot make the right decision
in the unmapped case: it needs to replace all values with the `missing` value
but might not know what kind of values source should be produced (numerics,
strings, geo points?). For this reason, we might want to add an `unmapped_type`
option in the future like we did for sorting.
Related to #5324
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.