* Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.
These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.
This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.
As to impact, Druid finalizes aggregators in two cases:
- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
post-aggregator (not any other kind of post-aggregator).
With this patch, the functions will no longer be finalized in these cases.
The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.
So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.
Other changes:
1) Move various argument-checking logic from runtime to planning time in
DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.
2) Add various JsonIgnores to the sketches to simplify their JSON representations.
3) Allow chaining of ExpressionPostAggregators and other PostAggregators
in the SQL layer.
4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
now that expressions can operate on complex inputs.
5) Adjust return type to thetaSketch (instead of OTHER) in
ThetaSketchSetBaseOperatorConversion.
* Fix benchmark class.
* Fix compilation error.
* Fix ThetaSketchSqlAggregatorTest.
* Hopefully fix ITAutoCompactionTest.
* Adjustment to ITAutoCompactionTest.
* Use lookup memory footprint in MSQ memory computations.
Two main changes:
1) Add estimateHeapFootprint to LookupExtractor.
2) Use this in MSQ's IndexerWorkerContext when determining the total
amount of available memory. It's taken off the top.
This prevents MSQ tasks from running out of memory when there are lookups
defined in the cluster.
* Updates from code review.
* Support for middle manager less druid, tasks launch as k8s jobs
* Fixing forking task runner test
* Test cleanup, dependency cleanup, intellij inspections cleanup
* Changes per PR review
Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support
* Removing un-needed log lines
* Small changes per PR review
* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly
* Merge conflict fix
* Fixing tests and docs
* update tiny-cluster.yaml
changed `enableTaskLevelLogPush` to `encapsulatedTask`
* Apply suggestions from code review
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Minor changes per PR request
* Cleanup, adding test to AbstractTask
* Add comment in peon.sh
* Bumping code coverage
* More tests to make code coverage happy
* Doh a duplicate dependnecy
* Integration test setup is weird for k8s, will do this in a different PR
* Reverting back all integration test changes, will do in anotbher PR
* use StringUtils.base64 instead of Base64
* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* introduce a "tree" type to the flattenSpec
* feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard
* feedback - expand docs to more clearly capture limitations of "tree" flattenSpec
* feedback - fix for typo on docs
* introduce a comment to explain defensive copy, tweak null handling
* fix: part of rebase
* mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type
* fix: objectflattener restore previous behavior to call getRootField for root type
* docs: ingestion/data-formats add note that ORC only supports path expressions
* chore: linter remove unused import
* fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark
* add FrontCodedIndexed for delta string encoding
* now for actual segments
* fix indexOf
* fixes and thread safety
* add bucket size 4, which seems generally better
* fixes
* fixes maybe
* update indexes to latest interfaces
* utf8 support
* adjust
* oops
* oops
* refactor, better, faster
* more test
* fixes
* revert
* adjustments
* fix prefixing
* more chill
* sql nested benchmark too
* refactor
* more comments and javadocs
* better get
* remove base class
* fix
* hot rod
* adjust comments
* faster still
* minor adjustments
* spatial index support
* spotbugs
* add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs
* fix docs
* push into constructor
* use base buffer instead of copy
* oops
Tracking additional improvements requested by @paul-rogers: #13239
* api: refactor page so that indented bullet is child and unindented portion is parent
* get rid of post etc headings and combine them with the endpoint
* Update docs/operations/api-reference.md
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
* fix broken links
* fix typo
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Async reads for JDBC:
Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.
Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
* add a note to the documentation about pre-built HLLSketches
Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.
* copyedit from the King's English to American English
* add suggested style changes
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* update log4j example
* fix some style issues
* Update docs/configuration/logging.md
Co-authored-by: Frank Chen <frankchen@apache.org>
Co-authored-by: Frank Chen <frankchen@apache.org>
* Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns
* Update docs/querying/sql-aggregations.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Various documentation updates.
1) Split out "data management" from "ingestion". Break it into thematic pages.
2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
all conceptual content is in concepts.md and all syntax content is in reference.md.
Shorten the known issues page to the most interesting ones.
3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.
4) Rename various mentions of "Druid console" to "web console".
5) Add additional information to ingestion/partitioning.md.
6) Remove a mention of Tranquility.
7) Remove a note about upgrading to Druid 0.10.1.
8) Remove no-longer-relevant task types from ingestion/tasks.md.
9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.
10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
in the sidebar.
11) Make all br tags self-closing.
12) Certain other cosmetic changes.
13) Update to node-sass 7.
* make travis use node12 for docs
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
* remove things that do not apply
* fix more things
* pin node to a working version
* fix
* fixes
* known issues tidy up
* revert auto formatting changes
* remove management-uis page which is 100% lies
* don't mention the Coordinator console (that no longer exits)
* goodies
* fix typo
* prometheus-emitter supports sending metrics to pushgateway regularly and continuously
* spell check fix
* Optimization variable name and related documents
* Update docs/development/extensions-contrib/prometheus.md
OK, it looks more conspicuous
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update doc
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* When PrometheusEmitter is closed, close the scheduler
* Ensure that registeredMetrics is thread safe.
* Local variable name optimization
* Remove unnecessary white space characters
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update tutorial-kafka.md
Added missing command to the doc for zookeeper before starting kafka
* Update docs/tutorials/tutorial-kafka.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* Add interpolation to JsonConfigurator
* Fix checkstyle
* Fix tests by removing common-text override
* Add back commons-text without version
* Remove unused hadoopDir configs
* Move some stuff to hopefully pass coverage
* more consistent expression error messages
* review stuff
* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff
* fixes
* add expression transform name to transformer failure, better parse_json error messaging
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: brian.le <brian.le@imply.io>