druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation.. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Gian Merlino	2912a36a20	Use nonzero default value of maxQueuedBytes. (#12840 ) * Use nonzero default value of maxQueuedBytes. The purpose of this parameter is to prevent the Broker from running out of memory. The prior default is unlimited; this patch changes it to a relatively conservative 25MB. This may be too low for larger clusters. The risk is that throughput can decrease for queries with large resultsets or large amounts of intermediate data. However, I think this is better than the risk of the prior default, which is that these queries can cause the Broker to go OOM. * Alter calculation.	2022-08-02 17:57:27 -07:00
Charles Smith	efbb58e90e	docs: remove maxRowsPerSegment where appropriate (#12071 ) * remove maxRowsPerSegment where appropriate * fix tutorial, accept suggestions * Update docs/design/coordinator.md * additional tutorial file * fix initial index spec * accept comments * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add back comment on maxrows per segment * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rm duplicate entry * Update native-batch-simple-task.md remove ref to `maxrowspersegment` * Update native-batch.md remove ref to `maxrowspersegment` * final tenticles * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-07-28 16:52:13 +05:30
Atul Mohan	93a9a4b1c5	Add retention for file request logs (#12559 ) * Add retention for file request logs * Spelling	2022-07-27 08:17:02 -07:00
TSFenwick	8c02880d5f	Emit metrics for distribution of number of rows per segment (#12730 ) * initial commit of bucket dimensions for metrics return counts of segments that have rowcount in a bucket size for a datasource return average value of rowcount per segment in a datasource added unit test naming could use a lot of work buckets right now are not finalized added javadocs altered metrics.md * fix checkstyle issues * addressed review comments add monitor test move added functionality to new monitor update docs * address comments renamed monitor handle tombstones better update docs added javadocs * Add support for tombstones in the segment distribution * undo changes to tombstone segmentizer factory * fix accidental whitespacing changes * address comments regarding metrics documentation and rename variable to be more accurate * fix tests * fix checkstyle issues * fix broken test * undo removal of timeout	2022-07-12 07:04:42 -07:00
Rui Chen	068bea6334	deps: upgrade mysql-connector-java to v5.1.49 (#12704 )	2022-06-29 23:15:46 +08:00
Gian Merlino	d29343cbe3	Disable autokill of segments by default. (#12693 ) Also add clarifying commentary to the documentation about how durationToRetain works.	2022-06-23 17:17:11 -07:00
Tejaswini Bandlamudi	99e1b4efee	Update default value of `inputSegmentSizeBytes` in configuration docs (#12678 )	2022-06-22 09:05:03 +05:30
Gian Merlino	0099940808	Add TIME_IN_INTERVAL SQL operator. (#12662 ) * Add TIME_IN_INTERVAL SQL operator. The operator is implemented as a convertlet rather than an OperatorConversion, because this allows it to be equivalent to using the >= and < operators directly. * SqlParserPos cannot be null here. * Remove unused import. * Doc updates. * Add words to dictionary.	2022-06-21 13:05:37 -07:00
Jill Osborne	f050069767	Segments doc update (#12344 ) * Corrected heading levels in segments doc * IMPLY-18394: Updated Segments doc * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/segments.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update segments.md * Updated links to changed headings in Segments doc * Corrected spelling error * Update segments.md Incorporated suggestions from Paul Rogers. * Update index.md * Update segments.md * Update segments.md * Update segments.md * Update compaction.md * Update docs/design/segments.md fix typo * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/design/segments.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-06-16 13:25:17 -07:00
Victoria Lim	353475bd36	Docs for automatic compaction (#12569 ) * docs for auto-compaction * fix broken links * another link * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Suneet Saldanha <suneet@apache.org> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> * reorg content for skipOffset * Update docs/ingestion/automatic-compaction.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-06-09 14:55:12 -07:00
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
Gian Merlino	a27f4f5740	Service stdout log files, move logs to log/. (#12570 ) * Service stdout log files, move logs to log/. Two changes that make log behavior cleaner: 1) Redirect messages from the Java runtime to their own log files. Otherwise, they would get jumbled up in the output of the all-in-one start command. 2) Use log/ instead of bin/log/ for the default log directory. Makes them easier to find. Additionally, add documentation about how to avoid the reflective access warnings in Java 11. * Spelling. * See if code formatting affects spelling.	2022-06-03 10:44:29 +05:30
Katya Macedo	5073cee73f	Fix zookeeper spelling (#12556 )	2022-05-21 16:14:02 +08:00
317brian	351e57bdb6	docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459 ) * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works" This reverts commit `cadd1fdc60`. * docs(fix): clarify how worker.version and minWorkerVersion comparison works * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md fix spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-05-16 07:48:33 -07:00
Frank Chen	c33ff1c745	Enforce console logging for peon process (#12067 ) Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log. But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage. So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.	2022-05-16 15:07:21 +05:30
Lucas Capistrant	deb69d1bc0	Allow coordinator to be configured to kill segments in future (#10877 ) Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system. A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.	2022-05-11 07:35:15 +05:30
Victoria Lim	0206a2da5c	Update automatic compaction docs with consistent terminology (#12416 ) * specify automatic compaction where applicable * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update for style and consistency * implement suggested feedback * remove duplicate example * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/ingestion/compaction.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/operations/api-reference.md * update .spelling * Adopt review suggestions Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2022-05-03 16:22:25 -07:00
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * * fix metric name inconsistency * * add task slot metrics for middle managers * * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * * more stuff * * remove unused variable * * more stuff * * add javadocs * * fix checkstyle * * fix hadoop test failure * * cleanup * * add more code coverage in tests * * fix test failure * * add docs * * increase code coverage * * fix spelling * * fix failing tests * * remove dead code * * fix spelling	2022-04-26 11:44:44 -05:00
Maytas Monsereenusorn	5d37d9f9d8	Add docs to metric spec for auto compaction (#12415 ) * add docs * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update index.md * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-13 13:27:00 -07:00
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
mark-imply	bf96ddf5ba	Update index.md (#12390 ) Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.	2022-04-08 18:01:54 +05:30
Victoria Lim	d326c681c1	Document config for ingesting null columns (#12389 ) * config for ingesting null columns * add link * edit .spelling * what happens if storeEmptyColumns is disabled	2022-04-05 09:15:42 -07:00
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
somu-imply	a1ea658115	Introducing a new config to ignore nulls while computing String Cardinality (#12345 ) * Counting nulls in String cardinality with a config * Adding tests for the new config * Wrapping the vectorize part to allow backward compatibility * Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes * Updating testcase and code * Adding null handling test to improve coverage * Checkstyle fix * Adding 1 more change in docs * Making docs clearer	2022-03-29 14:31:36 -07:00
Victoria Lim	9ed7aa33ec	Docs for request logging (#12363 ) * add docs for request logging * remove stray character * Update docs/operations/request-logging.md Co-authored-by: TSFenwick <tsfenwick@gmail.com> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: TSFenwick <tsfenwick@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-28 14:09:41 -07:00
Dr. Sizzles	69f928f50e	Adding k8s support for human readable parsing (#12316 ) * Adding k8s support for human readable parsing * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java Co-authored-by: Frank Chen <frankchen@apache.org> * Changes per review Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-03-16 11:18:47 +08:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were consider equal due to their id's being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. * Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	875e0696e0	GroupBy: Cap dictionary-building selector memory usage. (#12309 ) * GroupBy: Cap dictionary-building selector memory usage. New context parameter "maxSelectorDictionarySize" controls when the per-segment processing code should return early and trigger a trip to the merge buffer. Includes: - Vectorized and nonvectorized implementations. - Adjustments to GroupByQueryRunnerTest to exercise this code in the v2SmallDictionary suite. (Both the selector dictionary and the merging dictionary will be small in that suite.) - Tests for the new config parameter. * Fix issues from tests. * Add "pre-existing" to dictionary. * Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods. * Adjustments from review comments.	2022-03-08 13:13:11 -08:00
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a speling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that, this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
AmatyaAvadhanula	393e9b68a8	Add config to limit task slots for parallel indexing tasks (#12221 ) In extreme cases where many parallel indexing jobs are submitted together, it is possible that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule their own sub-tasks thus stalling progress of all the indexing jobs. Key changes: - Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots for `ParallelIndexSupervisorTasks` per worker - `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior) - `ratio = 0` implies supervisor tasks can not use any slot on a worker (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs) - `ImmutableWorkerInfo.canRunTask()` - `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`	2022-02-15 23:15:09 +05:30
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Maytas Monsereenusorn	fac6a48a8f	add impl (#12201 )	2022-01-27 11:39:59 -08:00
Suneet Saldanha	2b32d86f3b	Enable automatic metdata cleanup by default (#12188 )	2022-01-24 20:04:17 -08:00
somu-imply	c267b65f97	Removing unused processing threadpool on broker (#12070 ) * Thread pool for broker * Updating two tests to improve coverage for new method added * Updating druidProcessingConfigTest to cover coverage * Adding missed spelling errors caused in doc * Adding test to cover lines of new function added	2021-12-21 13:07:53 -08:00
Lucas Capistrant	150902b95c	clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960 ) * clean up the balancing code around the batched vs deprecated way of sampling segments to balance * fix docs, clarify comments, add deprecated annotations to legacy code * remove unused variable * update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated * fix dynamic config text for percentOfSegmentsToConsiderPerMove * run prettier to cleanup coordinator-dynamic-config.tsx changes * update jest snapshot * update documentation per review feedback	2021-12-07 14:47:46 -08:00
Peter Marshall	0b3f0bbbd8	Docs - Metrics docs layout and info about query/bytes (#11481 ) * Metrics docs layout and info about query/bytes Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages. Update index.md Amended typo * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Feedback applied Http --> HTTP and moved content / removed > * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:45:24 -08:00
Frank Chen	4631a66723	Support rolling log files (#10147 ) * apply log file rolling strategy * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * Use absolute log path and allow spaces in log path * Update log4j2 configuration * apply FileAppender to ZooKeeper * DO NOT redirect application's console log to file in supervisor	2021-12-03 21:32:01 +08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might now how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * dont reqrewite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Kashif Faraz	a22687ecbe	Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to skip the realtime nodes and thus it would query only Historical processes for any given segment.	2021-11-02 12:38:42 +05:30
Maytas Monsereenusorn	ba2874ee1f	Support changing query granularity in Auto Compaction (#11856 ) * add queryGranularity * fix checkstyle * fix test	2021-11-01 15:18:44 -07:00
Maytas Monsereenusorn	33d9d9bd74	Add rollup config to auto and manual compaction (#11850 ) * add rollup to auto and manual compaction * add unit tests * add unit tests * add IT * fix checkstyle	2021-10-29 10:22:25 -07:00
Gian Merlino	fc95c92806	Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124 ) * Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. This patch does the following: - Removes OffheapIncrementalIndex. - Clarifies that Aggregators are required to be thread safe. - Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe. - Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.) - Makes enabling "useOffheap" with groupBy v1 an error. Rationale for removing the offheap incremental index: - It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion. - Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them. - It doesn't have meaningful testing. By the way, I do believe that the given way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway. * Remove things that are now unused. * Revert removal of getFloat, getLong, getDouble from BufferAggregator. * OAK-related warnings, suppressions. * Unused item suppressions.	2021-10-26 08:05:56 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Arun Ramani	b6b42d3936	Minor processor quota computation fix + docs (#11783 ) * cpu/cpuset cgroup and procfs data gathering * Renames and default values * Formatting * Trigger Build * Add cgroup monitors * Return 0 if no period * Update * Minor processor quota computation fix + docs * Address comments * Address comments * Fix spellcheck Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>	2021-10-08 22:52:03 -05:00
Lucas Capistrant	1930ad1f47	Implement configurable internally generated query context (#11429 ) * Add the ability to add a context to internally generated druid broker queries * fix docs * changes after first CI failure * cleanup after merge with master * change default to empty map and improve unit tests * add doc info and fix checkstyle * refactor DruidSchema#runSegmentMetadataQuery and add a unit test	2021-10-06 09:02:41 -07:00
Kashif Faraz	b688db790b	Add Broker config `druid.broker.segment.ignoredTiers` (#11766 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to ignore the segments being served by the specified historical tiers. By default, no tier is ignored. This config is useful when you want a completely isolated tier amongst many other tiers. Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn and there are several brokers Broker B1, Broker B2 .... Broker Bm If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"] for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]	2021-10-06 10:06:32 +05:30
Charles Smith	621e5ac63f	docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565 ) * docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor * Update docs/configuration/index.md	2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn	f60b3b3bab	fix doc (#11772 )	2021-10-05 15:42:11 -07:00
Caroline1000	ffbe303828	Update balancer strategy recommendations (#11759 ) * Update balancer strategy recommendations * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-10-05 09:47:37 -07:00
Maytas Monsereenusorn	129911a20e	Add documentations for config to filter internal Druid-related messages from error response (#11755 ) * add doc * add doc * address comments * fix typo * address comments	2021-10-01 17:49:02 +07:00
Kashif Faraz	c641657bae	Fix router documentation for `druid.router.sql.enable` (#11716 ) * Rename field, fix router documentation * Add more lines to doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-28 22:54:13 +05:30
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Peter Marshall	ee009ec18e	Docs - ingestion task log config and process (#11678 ) * Update index.md Moved H4s underneath the H3 for the task log location and added hyperlinks. * Update tasks.md Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead. * Update tasks.md .html > .md * Update docs/ingestion/tasks.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2021-09-13 15:49:09 -07:00
Charles Smith	f9329fbf9e	add clarification for maxSubqueryRows (#11687 ) * add clarification for maxSubqueryRows	2021-09-13 11:49:30 -07:00
Suneet Saldanha	531d11abaf	Update description of batchProcessingMode (#11686 ) * Update description of batchProcessingMode Update the description to explicitly mention a released version of Druid that the original version was referencing * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-10 16:55:48 -07:00
Agustin Gonzalez	9efa6cc9c8	Make persists concurrent with adding rows in batch ingestion (#11536 ) * Make persists concurrent with ingestion * Remove semaphore but keep concurrent persists (with add) and add push in the backround as well * Go back to documented default persists (zero) * Move to debug * Remove unnecessary Atomics * Comments on synchronization (or not) for sinks & sinkMetadata * Some cleanup for unit tests but they still need further work * Shutdown & wait for persists and push on close * Provide support for three existing batch appenderators using batchProcessingMode flag * Fix reference to wrong appenderator * Fix doc typos * Add BatchAppenderators class test coverage * Add log message to batchProcessingMode final value, fix typo in enum name * Another typo and minor fix to log message * LEGACY->OPEN_SEGMENTS, Edit docs * Minor update legacy->open segments log message * More code comments, mostly small adjustments to naming etc * fix spelling * Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage * Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments" Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-09-08 13:31:52 -07:00
Clint Wylie	ec334a641b	MySQL extension with MariaDB connector docs (#11608 ) * add docs for mariadb support via mysql extensions * add logging so you know what druid knows * homogenize * spelling * missed a couple	2021-08-19 01:52:26 -07:00
Peter Marshall	8aaefb91e3	Docs - MiddleManager Affinity "strong" definition (#11480 ) * Affinity "strong" definition Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800 * Spelling corrections * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-13 19:17:16 -07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Kashif Faraz	aaf0aaad8f	Enable routing of SQL queries at Router (#11566 ) This PR adds a new property druid.router.sql.enable which allows the Router to handle SQL queries when set to true. This change does not affect Avatica JDBC requests and they are still routed by hashing the Connection ID. To allow parsing of the request object as a SqlQuery (contained in module druid-sql), some classes have been moved from druid-server to druid-services with the same package name.	2021-08-13 18:44:39 +05:30
Charles Smith	941c5ffb05	clarify JVM tmp dir requires execute on files (#11542 ) * clarify JVM tmp dir requires execute on files * code SysMonitor for spellcheck	2021-08-09 17:25:10 -07:00
Suneet Saldanha	e423e99997	Update default maxSegmentsInNodeLoadingQueue (#11540 ) * Update default maxSegmentsInNodeLoadingQueue Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100. An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability. Since this is the default druid operators need to run into this instability and then look through the docs to see that the recommended value for a large cluster is 1000. This change makes it so the default will prevent clusters from falling over as they grow over time. * update tests * codestyle	2021-08-05 11:26:58 -07:00
frank chen	55a01a030a	Clarify that Broker caching for groupBy v2 queries does not work (#11370 ) * Add a note * Update docs/configuration/index.md Co-authored-by: sthetland <steve.hetland@imply.io> * clarify that both of non-result level cache and result level cache are not supported Co-authored-by: sthetland <steve.hetland@imply.io>	2021-08-03 10:01:15 -07:00
Yuanli Han	b83742179a	Reduce method invocation of reservoir sampling (#11257 ) * reduce method invocation of reservoir sampling * add a dynamic parameter and add benchmark * rebase	2021-07-30 22:09:50 +08:00
Maytas Monsereenusorn	c068906fca	Make intermediate store for shuffle tasks an extension point (#11492 ) * add interface * add docs * fix errors * fix injection * fix injection * update javadoc	2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn	6ce3b6ca2d	Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444 ) * fix doc * address comments * Update docs/configuration/index.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-07-21 12:48:56 +07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * Intelli-j check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
frank chen	60843bd11f	Add configuration suggestion to `druid.indexer.storage.type` (#11304 )	2021-05-27 06:44:47 -07:00
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00
Charles Smith	fae7ebf489	change errant 'none' configuration to 'manual': (#11218 )	2021-05-10 22:04:18 -07:00
frank chen	fa113fb4a9	Fix default value (#11220 )	2021-05-10 10:11:26 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn	d73f72e508	Add feature to automatically remove supervisor based on retention period (#11200 ) * add auto clean up * add test * add test * fix test * Address comments * Address comments	2021-05-06 22:25:23 -07:00
Lucas Capistrant	bb3c810b36	Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135 ) * lay the groundwork for throttling replicant loads per RunRules execution * Add dynamic coordinator config to control new replicant threshold. * remove redundant line * add some unit tests * fix checkstyle error * add documentation for new dynamic config * improve docs and logs * Alter how null is handled for new config. If null, manually set as default	2021-05-05 07:39:36 -05:00
Maytas Monsereenusorn	84aac4832d	Add feature to automatically remove rules based on retention period (#11164 ) * Add feature to automatically remove rules based on retention period * Add feature to automatically remove rules based on retention period * address comments	2021-05-03 11:50:45 -07:00
benkrug	fdab95ea99	Update index.md (#11174 ) tiny change for readability	2021-04-30 09:40:19 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
bergmt2000	f60d8ea1c3	Update index.md (#11105 ) Fix json typo in readme for granularitySpec in compaction config example	2021-04-13 16:26:36 +08:00
Maytas Monsereenusorn	4576152e4a	Make dropExisting flag for Compaction configurable and add warning documentations (#11070 ) * Make dropExisting flag for Compaction configurable * fix checkstyle * fix checkstyle * fix test * add tests * fix spelling * fix docs * add IT * fix test * fix doc * fix doc	2021-04-09 00:12:28 -07:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Jihoon Son	cc12a57034	Enforce allow list for JDBC properties by default (#11063 ) * Enforce allow list for JDBC properties by default * fix tests	2021-04-06 19:46:19 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00

1 2 3 4 5

222 Commits