druid/extensions-core/datasketches
Magnus Henoch 67f45fa7bf
Fix histograms for sketches where min and max are equal (#15381)
There is a problem with Quantiles sketches and KLL Quantiles sketches.
Queries using the histogram post-aggregator fail if:

- the sketch contains at least one value, and
- the values in the sketch are all equal, and
- the splitPoints argument is not passed to the post-aggregator, and
- the numBins argument is greater than 2 (or not specified, which
  leads to the default of 10 being used)

In that case, the query fails and returns this error:

    {
      "error": "Unknown exception",
      "errorClass": "org.apache.datasketches.common.SketchesArgumentException",
      "host": null,
      "errorCode": "legacyQueryException",
      "persona": "OPERATOR",
      "category": "RUNTIME_FAILURE",
      "errorMessage": "Values must be unique, monotonically increasing and not NaN.",
      "context": {
        "host": null,
        "errorClass": "org.apache.datasketches.common.SketchesArgumentException",
        "legacyErrorCode": "Unknown exception"
      }
    }

This behaviour is undesirable, since the caller doesn't necessarily
know in advance whether the sketch has values that are diverse
enough. With this change, the post-aggregators return [N, 0, 0...]
instead of crashing, where N is the number of values in the sketch,
and the length of the list is equal to numBins. That is what they
already returned for numBins = 2.

Here is an example of a query that would fail:

    {"queryType":"timeseries",
     "dataSource": {
       "type": "inline",
       "columnNames": ["foo", "bar"],
       "rows": [
          ["abc", 42.0],
          ["def", 42.0]
       ]
     },
     "intervals":["0000/3000"],
     "granularity":"all",
     "aggregations":[
       {"name":"the_sketch", "fieldName":"bar", "type":"quantilesDoublesSketch"}],
     "postAggregations":[
       {"name":"the_histogram",
        "type":"quantilesDoublesSketchToHistogram",
        "field":{"type":"fieldAccess","fieldName":"the_sketch"},
        "numBins": 3}]}

I believe this also fixes issue #10585.
2023-11-16 12:31:22 +05:30
..
src Fix histograms for sketches where min and max are equal (#15381) 2023-11-16 12:31:22 +05:30
README.md update links datasketches.github.io to datasketches.apache.org (#10107) 2020-07-01 14:56:17 -07:00
pom.xml Prepare master for Druid 29 (#15121) 2023-10-11 10:33:45 +05:30

README.md

This module provides Druid aggregators based on https://datasketches.apache.org/.

Credits: This module is a result of feedback and work done by following people.

https://github.com/cheddar https://github.com/himanshug https://github.com/leerho https://github.com/will-lauer