Commit Graph

2648 Commits

Author SHA1 Message Date
Gian Merlino 8f90589ce5
Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247)
* Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.

These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.

This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.

As to impact, Druid finalizes aggregators in two cases:

- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
  post-aggregator (not any other kind of post-aggregator).

With this patch, the functions will no longer be finalized in these cases.

The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.

So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.

Other changes:

1) Move various argument-checking logic from runtime to planning time in
   DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.

2) Add various JsonIgnores to the sketches to simplify their JSON representations.

3) Allow chaining of ExpressionPostAggregators and other PostAggregators
   in the SQL layer.

4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
   now that expressions can operate on complex inputs.

5) Adjust return type to thetaSketch (instead of OTHER) in
   ThetaSketchSetBaseOperatorConversion.

* Fix benchmark class.

* Fix compilation error.

* Fix ThetaSketchSqlAggregatorTest.

* Hopefully fix ITAutoCompactionTest.

* Adjustment to ITAutoCompactionTest.
2022-11-03 09:43:00 -07:00
Gian Merlino d1877e41ec
Use lookup memory footprint in MSQ memory computations. (#13271)
* Use lookup memory footprint in MSQ memory computations.

Two main changes:

1) Add estimateHeapFootprint to LookupExtractor.

2) Use this in MSQ's IndexerWorkerContext when determining the total
   amount of available memory. It's taken off the top.

This prevents MSQ tasks from running out of memory when there are lookups
defined in the cluster.

* Updates from code review.
2022-11-03 07:36:54 -07:00
317brian ae638e338c
docs(msq): update insert vs replace for dimension-based segment pruning (#13228)
* docs(msq): update insert vs replace to mention dimension-based segment pruning

* make suggested changes
2022-11-03 14:17:44 +05:30
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependnecy

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in anotbher PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00
Jason Koch 0d03ce435f
introduce a "tree" type to the flattenSpec (#12177)
* introduce a "tree" type to the flattenSpec

* feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard

* feedback - expand docs to more clearly capture limitations of "tree" flattenSpec

* feedback - fix for typo on docs

* introduce a comment to explain defensive copy, tweak null handling

* fix: part of rebase

* mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type

* fix: objectflattener restore previous behavior to call getRootField for root type

* docs: ingestion/data-formats add note that ORC only supports path expressions

* chore: linter remove unused import

* fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark
2022-11-01 14:49:30 +08:00
Gian Merlino d851985cf5
MSQ: Add support for indexSpec. (#13275) 2022-10-28 14:27:50 -07:00
Adarsh Sanjeev 4775427e2c
Add task start status to worker report (#13263)
* Add task start status to worker report

* Address review comments

* Address review comments

* Update documentation

* Update spelling checks
2022-10-28 12:00:15 +05:30
Tejaswini Bandlamudi 49e54a0ec6
Docs: Update inputSegmentSizeBytes description (#13266) 2022-10-28 09:33:52 +05:30
Clint Wylie 77e4246598
add support for 'front coded' string dictionaries for smaller string columns (#12277)
* add FrontCodedIndexed for delta string encoding

* now for actual segments

* fix indexOf

* fixes and thread safety

* add bucket size 4, which seems generally better

* fixes

* fixes maybe

* update indexes to latest interfaces

* utf8 support

* adjust

* oops

* oops

* refactor, better, faster

* more test

* fixes

* revert

* adjustments

* fix prefixing

* more chill

* sql nested benchmark too

* refactor

* more comments and javadocs

* better get

* remove base class

* fix

* hot rod

* adjust comments

* faster still

* minor adjustments

* spatial index support

* spotbugs

* add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs

* fix docs

* push into constructor

* use base buffer instead of copy

* oops
2022-10-25 18:05:38 -07:00
317brian c83115e4e1
api: change API page formatting (#13213)
Tracking additional improvements requested by @paul-rogers: #13239

* api: refactor page so that indented bullet is child and unindented portion is parent

* get rid of post etc headings and combine them with the endpoint

* Update docs/operations/api-reference.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* fix broken links

* fix typo

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-10-18 13:22:26 -07:00
Paul Rogers b34b4353f4
Async reads for JDBC (#13196)
Async reads for JDBC:
Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.

Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
2022-10-18 11:40:57 -07:00
cristian-popa cc10350870
Collocated processes instructions (#13224)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
2022-10-17 11:56:00 -07:00
Gian Merlino 3bbb76f17b
Docs: Add query/cpu/time to real-time metrics. (#13229) 2022-10-15 18:26:44 +05:30
arvindanugula 42384d85e7
Update nested-columns.md (#13227)
typo error corrected.
2022-10-14 16:15:46 -07:00
Victoria Lim 02ad62a08c
Docs: update description of query priority default value (#13191)
* update description of default for query priority

* update order

* update terms

* standardize to query context parameters
2022-10-14 14:28:04 -07:00
Karan Kumar 9d51e466b1
Minor doc update for BroadcastTablesTooLarge (#13218)
Minor doc update for `BroadcastTablesTooLarge`. Now the user will know what to do
in case this fault is encountered.
2022-10-14 09:06:55 +05:30
Tejaswini Bandlamudi 3e13584e0e
Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144)
* Idle Seekable stream supervisor changes.

* nit

* nit

* nit

* Adds unit tests

* Supervisor decides it's idle state instead of AutoScaler

* docs update

* nit

* nit

* docs update

* Adds Kafka unit test

* Adds Kafka Integration test.

* Updates travis config.

* Updates kafka-indexing-service dependencies.

* updates previous offsets snapshot & doc

* Doesn't act if supervisor is suspended.

* Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes.

* Reverts Kinesis Supervisor idle behaviour changes.

* nit

* nit

* Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests.

* Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too

* Adds Kafka Supervisor UT

* Improves test coverage in druid-server

* Corrects IT override config

* Doc updates and Syntactic changes

* nit

* supervisorSpec.ioConfig.idleConfig changes
2022-10-12 18:31:08 +05:30
Jonathan Wei 9b8e69c99a
Add inline descriptor Protobuf bytes decoder (#13192)
* Add inline descriptor Protobuf bytes decoder

* PR comments

* Update tests, check for IllegalArgumentException

* Fix license, add equals test

* Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-10-11 13:37:28 -05:00
Charles Smith 25c1d55dd6
Clarify behavior when decommissioningMaxPercentOfMaxSegmentsToMove = 0 (#13157)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-10-07 09:01:32 -07:00
317brian 0edceead80
msq: update known issue about GROUPING SETS and COUNT DISTINCT (#13185)
* msq: update known issue about GROUPING SETS and COUNT DISTINCT

* address feedback from Gian
2022-10-05 19:47:03 -07:00
AmatyaAvadhanula 41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
Adarsh Sanjeev 92d2633ae6
Update ClusterByStatisticsCollectorImpl to use bytes instead of keys (#12998)
* Update clusterByStatistics to use bytes instead of keys

* Address review comments

* Resolve checkstyle

* Increase test coverage

* Update test

* Update thresholds

* Update retained keys function

* Update docs

* Fix spelling
2022-10-03 12:08:23 +05:30
Jill Osborne 548d810baa
Correct nested columns example (#13150) 2022-09-28 10:39:56 +05:30
David Palmer 0d7bf66578
Add a note to the documentation about pre-built HLLSketches (#13088)
* add a note to the documentation about pre-built HLLSketches

Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.

* copyedit from the King's English to American English

* add suggested style changes

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-09-27 10:29:39 +08:00
Apoorv Gupta c8f4d72fb1
Fix documentation bug about injective lookups (#13147)
replace mapping to `unique keys` with mapping to `unique values`.
2022-09-27 10:16:48 +08:00
Jonathan Wei 1f1fced6d4
Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089)
* Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON

* Fix serde and docs

* Add PR comment check
2022-09-26 19:51:04 -05:00
Charles Smith eb760c3d1d
update log4j example (#13095)
* update log4j example

* fix some style issues

* Update docs/configuration/logging.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-22 09:46:49 +08:00
317brian 12f12a13a9
fix: fix broken postgres link (#13135) 2022-09-22 09:46:20 +08:00
317brian 7fa35839c0
fix: follow naming convention for msq task engine (#13127)
* fix: follow naming convention for msq task engine

* more fixes

* add back in experimental

* fix anchor
2022-09-21 18:46:06 -07:00
Gian Merlino 2f731f356e
Update pull-deps docs with correct repo list. (#13134)
There is only one default remote repo at this time.
2022-09-21 12:16:57 -07:00
Katya Macedo 90d14f629a
spatial-filters (#13124) 2022-09-20 22:48:36 -07:00
hosswald 5ed5c83aab
Clarified the behaviour of SQL COUNT(DISTINCT dim) on multi-value dimensions (#13128)
* Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns

* Update docs/querying/sql-aggregations.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-09-20 18:03:34 -07:00
Vadim Ogievetsky edc444a4bc
fix quickstart (#13126) 2022-09-20 17:44:21 -07:00
Vadim Ogievetsky b9edfe34a4
be consistent about referring to the web console by its name (#13118) 2022-09-19 15:02:17 -07:00
Vadim Ogievetsky bb0b810b1d
fix html tags in docs (#13117)
* fix html tags in docs

* revert not null
2022-09-18 19:40:33 -07:00
Gian Merlino d9b2968edb
Docs: Clarify the situation with SELECT. (#13109) 2022-09-17 10:47:57 -07:00
Charles Smith b366a6c5a4
Add clarification around docker environment #8926 (#13084)
* Add clarification around docker environment #8926

* fix spelling

* Update docs/tutorials/docker.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/tutorials/docker.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* fix nano quickstart

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-17 20:44:24 +08:00
Gian Merlino d4967c38f8
Various documentation updates. (#13107)
* Various documentation updates.

1) Split out "data management" from "ingestion". Break it into thematic pages.

2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
   all conceptual content is in concepts.md and all syntax content is in reference.md.
   Shorten the known issues page to the most interesting ones.

3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
   index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.

4) Rename various mentions of "Druid console" to "web console".

5) Add additional information to ingestion/partitioning.md.

6) Remove a mention of Tranquility.

7) Remove a note about upgrading to Druid 0.10.1.

8) Remove no-longer-relevant task types from ingestion/tasks.md.

9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.

10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
    places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
    in the sidebar.

11) Make all br tags self-closing.

12) Certain other cosmetic changes.

13) Update to node-sass 7.

* make travis use node12 for docs

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2022-09-16 21:58:11 -07:00
Vadim Ogievetsky 2493eb17bf
Doc fixes around msq (#13090)
* remove things that do not apply

* fix more things

* pin node to a working version

* fix

* fixes

* known issues tidy up

* revert auto formatting changes

* remove management-uis page which is 100% lies

* don't mention the Coordinator console (that no longer exits)

* goodies

* fix typo
2022-09-16 02:15:26 -07:00
Katya Macedo 2218c8d23c
Documentation: Update spatial indexing example (#12555)
* fix spatial indexing example

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update text and example

* Format JSON example

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/development/geo.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Accept review suggestions

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-16 10:32:19 +08:00
Atul Mohan a8fd3a9077
Provide service specific log4j overrides in containerized deployments (#13020)
* Provide service specific log4j overrides

* Clarify comments

* Add docs
2022-09-14 11:47:11 +08:00
Benedict Jin 4bde50e683
Bump the version of Druid docker image from 0.16.0-incubating to latest (#13058) 2022-09-10 14:06:00 +05:30
Vadim Ogievetsky 4fc43670e5
adjust docs and images (#13067) 2022-09-10 14:05:19 +05:30
DENNIS dced61645f
prometheus-emitter supports sending metrics to pushgateway regularly … (#13034)
* prometheus-emitter supports sending metrics to pushgateway regularly and continuously

* spell check fix

* Optimization variable name and related documents

* Update docs/development/extensions-contrib/prometheus.md

OK, it looks more conspicuous

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update doc

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* When PrometheusEmitter is closed, close the scheduler

* Ensure that registeredMetrics is thread safe.

* Local variable name optimization

* Remove unnecessary white space characters

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-09 20:46:14 +08:00
sachidananda007 48c99054d0
Update tutorial-kafka.md (#13056)
* Update tutorial-kafka.md

Added missing command to the doc for zookeeper before starting kafka

* Update docs/tutorials/tutorial-kafka.md

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-09 10:06:19 +08:00
Frank Chen d57557d51d
Improve doc and configuration of prometheus emitter (#13028)
* Improve doc and validation

* Add configuration for peon tasks

* Update doc

* Update test case

* Fix typo

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-09-09 02:20:34 +08:00
Rohan Garg 7aa8d7f987
Add query/time metric for SQL queries from router (#12867)
* Add query/time metric for SQL queries from router

* Fix query cancel bug when user has overriden native query-id in a SQL query
2022-09-07 13:54:46 +05:30
Adam Peck ee22663dd3
Add interpolation to JsonConfigurator (#13023)
* Add interpolation to JsonConfigurator

* Fix checkstyle

* Fix tests by removing common-text override

* Add back commons-text without version

* Remove unused hadoopDir configs

* Move some stuff to hopefully pass coverage
2022-09-07 12:48:01 +05:30
Clint Wylie a3a377e570
more consistent expression error messages (#12995)
* more consistent expression error messages

* review stuff

* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff

* fixes

* add expression transform name to transformer failure, better parse_json error messaging
2022-09-06 23:21:38 -07:00
Jill Osborne 1f69140623
Nested columns documentation (#12946)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: brian.le <brian.le@imply.io>
2022-09-06 14:42:18 -07:00