druid

Commit Graph

Author	SHA1	Message	Date
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool. Description: Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch: Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR. Algorithm: The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were considered equal due to their ids being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. * Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close to or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	875e0696e0	GroupBy: Cap dictionary-building selector memory usage. (#12309 ) * GroupBy: Cap dictionary-building selector memory usage. New context parameter "maxSelectorDictionarySize" controls when the per-segment processing code should return early and trigger a trip to the merge buffer. Includes: - Vectorized and nonvectorized implementations. - Adjustments to GroupByQueryRunnerTest to exercise this code in the v2SmallDictionary suite. (Both the selector dictionary and the merging dictionary will be small in that suite.) - Tests for the new config parameter. * Fix issues from tests. * Add "pre-existing" to dictionary. * Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods. * Adjustments from review comments.	2022-03-08 13:13:11 -08:00
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and hiding this feature by default. The number of IN filters can grow up to a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a spelling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
AmatyaAvadhanula	393e9b68a8	Add config to limit task slots for parallel indexing tasks (#12221 ) In extreme cases where many parallel indexing jobs are submitted together, it is possible that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule their own sub-tasks thus stalling progress of all the indexing jobs. Key changes: - Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots for `ParallelIndexSupervisorTasks` per worker - `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior) - `ratio = 0` implies supervisor tasks can not use any slot on a worker (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs) - `ImmutableWorkerInfo.canRunTask()` - `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`	2022-02-15 23:15:09 +05:30
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Maytas Monsereenusorn	fac6a48a8f	add impl (#12201 )	2022-01-27 11:39:59 -08:00
Suneet Saldanha	2b32d86f3b	Enable automatic metadata cleanup by default (#12188 )	2022-01-24 20:04:17 -08:00
somu-imply	c267b65f97	Removing unused processing threadpool on broker (#12070 ) * Thread pool for broker * Updating two tests to improve coverage for new method added * Updating druidProcessingConfigTest to cover coverage * Adding missed spelling errors caused in doc * Adding test to cover lines of new function added	2021-12-21 13:07:53 -08:00
Lucas Capistrant	150902b95c	clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960 ) * clean up the balancing code around the batched vs deprecated way of sampling segments to balance * fix docs, clarify comments, add deprecated annotations to legacy code * remove unused variable * update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated * fix dynamic config text for percentOfSegmentsToConsiderPerMove * run prettier to cleanup coordinator-dynamic-config.tsx changes * update jest snapshot * update documentation per review feedback	2021-12-07 14:47:46 -08:00
Peter Marshall	0b3f0bbbd8	Docs - Metrics docs layout and info about query/bytes (#11481 ) * Metrics docs layout and info about query/bytes Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages. Update index.md Amended typo * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Feedback applied Http --> HTTP and moved content / removed > * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:45:24 -08:00
Frank Chen	4631a66723	Support rolling log files (#10147 ) * apply log file rolling strategy * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * Use absolute log path and allow spaces in log path * Update log4j2 configuration * apply FileAppender to ZooKeeper * DO NOT redirect application's console log to file in supervisor	2021-12-03 21:32:01 +08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them. The purpose of this is to sanitize JDBC errors, but it can sanitize other errors if they implement the SanitizableError interface. Add a class to log errors and sanitize them. Added a simple test that checks that the error gets sanitized. Add @NonNull annotation to serverconfig's ErrorResponseTransformStrategy * return less information as part of too many connections, and instead only log specific details. This is so an end user gets relevant information but not too much info, since they might otherwise know how many brokers they have * return only runtime exceptions. Added new error types that need to be sanitized; also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions. Add docs. Avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. Add more javadocs. Add support for UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. Move log error back to druid meta. Clean up bad formatting and commented code. Add missed catch for NoSuchStatementException. Clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Kashif Faraz	a22687ecbe	Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to skip the realtime nodes and thus it would query only Historical processes for any given segment.	2021-11-02 12:38:42 +05:30
Maytas Monsereenusorn	ba2874ee1f	Support changing query granularity in Auto Compaction (#11856 ) * add queryGranularity * fix checkstyle * fix test	2021-11-01 15:18:44 -07:00
Maytas Monsereenusorn	33d9d9bd74	Add rollup config to auto and manual compaction (#11850 ) * add rollup to auto and manual compaction * add unit tests * add unit tests * add IT * fix checkstyle	2021-10-29 10:22:25 -07:00
Gian Merlino	fc95c92806	Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124 ) * Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. This patch does the following: - Removes OffheapIncrementalIndex. - Clarifies that Aggregators are required to be thread safe. - Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe. - Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.) - Makes enabling "useOffheap" with groupBy v1 an error. Rationale for removing the offheap incremental index: - It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion. - Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them. - It doesn't have meaningful testing. By the way, I do believe that, given the way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway. * Remove things that are now unused. * Revert removal of getFloat, getLong, getDouble from BufferAggregator. * OAK-related warnings, suppressions. * Unused item suppressions.	2021-10-26 08:05:56 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Arun Ramani	b6b42d3936	Minor processor quota computation fix + docs (#11783 ) * cpu/cpuset cgroup and procfs data gathering * Renames and default values * Formatting * Trigger Build * Add cgroup monitors * Return 0 if no period * Update * Minor processor quota computation fix + docs * Address comments * Address comments * Fix spellcheck Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>	2021-10-08 22:52:03 -05:00
Lucas Capistrant	1930ad1f47	Implement configurable internally generated query context (#11429 ) * Add the ability to add a context to internally generated druid broker queries * fix docs * changes after first CI failure * cleanup after merge with master * change default to empty map and improve unit tests * add doc info and fix checkstyle * refactor DruidSchema#runSegmentMetadataQuery and add a unit test	2021-10-06 09:02:41 -07:00
Kashif Faraz	b688db790b	Add Broker config `druid.broker.segment.ignoredTiers` (#11766 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to ignore the segments being served by the specified historical tiers. By default, no tier is ignored. This config is useful when you want a completely isolated tier amongst many other tiers. Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn and there are several brokers Broker B1, Broker B2 .... Broker Bm If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"] for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]	2021-10-06 10:06:32 +05:30
Charles Smith	621e5ac63f	docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565 ) * docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor * Update docs/configuration/index.md	2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn	f60b3b3bab	fix doc (#11772 )	2021-10-05 15:42:11 -07:00
Caroline1000	ffbe303828	Update balancer strategy recommendations (#11759 ) * Update balancer strategy recommendations * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-10-05 09:47:37 -07:00
Maytas Monsereenusorn	129911a20e	Add documentations for config to filter internal Druid-related messages from error response (#11755 ) * add doc * add doc * address comments * fix typo * address comments	2021-10-01 17:49:02 +07:00
Kashif Faraz	c641657bae	Fix router documentation for `druid.router.sql.enable` (#11716 ) * Rename field, fix router documentation * Add more lines to doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-28 22:54:13 +05:30
Clint Wylie	5de26cf6d9	add optional system schema authorization (#11720 ) * add optional system schema authorization * remove unused * adjust docs * doc fixes, missing ldap config change for integration tests * style	2021-09-21 13:28:26 -07:00
Peter Marshall	ee009ec18e	Docs - ingestion task log config and process (#11678 ) * Update index.md Moved H4s underneath the H3 for the task log location and added hyperlinks. * Update tasks.md Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead. * Update tasks.md .html > .md * Update docs/ingestion/tasks.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2021-09-13 15:49:09 -07:00
Charles Smith	f9329fbf9e	add clarification for maxSubqueryRows (#11687 ) * add clarification for maxSubqueryRows	2021-09-13 11:49:30 -07:00
Suneet Saldanha	531d11abaf	Update description of batchProcessingMode (#11686 ) * Update description of batchProcessingMode Update the description to explicitly mention a released version of Druid that the original version was referencing * Update docs/configuration/index.md * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-10 16:55:48 -07:00
Agustin Gonzalez	9efa6cc9c8	Make persists concurrent with adding rows in batch ingestion (#11536 ) * Make persists concurrent with ingestion * Remove semaphore but keep concurrent persists (with add) and add push in the background as well * Go back to documented default persists (zero) * Move to debug * Remove unnecessary Atomics * Comments on synchronization (or not) for sinks & sinkMetadata * Some cleanup for unit tests but they still need further work * Shutdown & wait for persists and push on close * Provide support for three existing batch appenderators using batchProcessingMode flag * Fix reference to wrong appenderator * Fix doc typos * Add BatchAppenderators class test coverage * Add log message to batchProcessingMode final value, fix typo in enum name * Another typo and minor fix to log message * LEGACY->OPEN_SEGMENTS, Edit docs * Minor update legacy->open segments log message * More code comments, mostly small adjustments to naming etc * fix spelling * Exclude BatchAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage * Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments" Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-09-08 13:31:52 -07:00
Clint Wylie	ec334a641b	MySQL extension with MariaDB connector docs (#11608 ) * add docs for mariadb support via mysql extensions * add logging so you know what druid knows * homogenize * spelling * missed a couple	2021-08-19 01:52:26 -07:00
Peter Marshall	8aaefb91e3	Docs - MiddleManager Affinity "strong" definition (#11480 ) * Affinity "strong" definition Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800 * Spelling corrections * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-13 19:17:16 -07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description: Description and design in the proposal #11297. Key changed/added classes in this PR: DataSegmentPusher, ShuffleClient, PartitionStat, PartitionLocation, *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Kashif Faraz	aaf0aaad8f	Enable routing of SQL queries at Router (#11566 ) This PR adds a new property druid.router.sql.enable which allows the Router to handle SQL queries when set to true. This change does not affect Avatica JDBC requests and they are still routed by hashing the Connection ID. To allow parsing of the request object as a SqlQuery (contained in module druid-sql), some classes have been moved from druid-server to druid-services with the same package name.	2021-08-13 18:44:39 +05:30
Charles Smith	941c5ffb05	clarify JVM tmp dir requires execute on files (#11542 ) * clarify JVM tmp dir requires execute on files * code SysMonitor for spellcheck	2021-08-09 17:25:10 -07:00
Suneet Saldanha	e423e99997	Update default maxSegmentsInNodeLoadingQueue (#11540 ) * Update default maxSegmentsInNodeLoadingQueue Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100. An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability. Since unbounded is the default, Druid operators need to run into this instability and then look through the docs to see that the recommended value for a large cluster is 1000. This change makes it so the default will prevent clusters from falling over as they grow over time. * update tests * codestyle	2021-08-05 11:26:58 -07:00
frank chen	55a01a030a	Clarify that Broker caching for groupBy v2 queries does not work (#11370 ) * Add a note * Update docs/configuration/index.md Co-authored-by: sthetland <steve.hetland@imply.io> * clarify that both of non-result level cache and result level cache are not supported Co-authored-by: sthetland <steve.hetland@imply.io>	2021-08-03 10:01:15 -07:00
Yuanli Han	b83742179a	Reduce method invocation of reservoir sampling (#11257 ) * reduce method invocation of reservoir sampling * add a dynamic parameter and add benchmark * rebase	2021-07-30 22:09:50 +08:00
Maytas Monsereenusorn	c068906fca	Make intermediate store for shuffle tasks an extension point (#11492 ) * add interface * add docs * fix errors * fix injection * fix injection * update javadoc	2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn	6ce3b6ca2d	Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444 ) * fix doc * address comments * Update docs/configuration/index.md Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-07-21 12:48:56 +07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to better handle the case when there are no currently running worker nodes (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * Intelli-j check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, rename feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
frank chen	60843bd11f	Add configuration suggestion to `druid.indexer.storage.type` (#11304 )	2021-05-27 06:44:47 -07:00
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" to make the code fall back to the previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00
Charles Smith	fae7ebf489	change errant 'none' configuration to 'manual': (#11218 )	2021-05-10 22:04:18 -07:00
frank chen	fa113fb4a9	Fix default value (#11220 )	2021-05-10 10:11:26 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn	d73f72e508	Add feature to automatically remove supervisor based on retention period (#11200 ) * add auto clean up * add test * add test * fix test * Address comments * Address comments	2021-05-06 22:25:23 -07:00
Lucas Capistrant	bb3c810b36	Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135 ) * lay the groundwork for throttling replicant loads per RunRules execution * Add dynamic coordinator config to control new replicant threshold. * remove redundant line * add some unit tests * fix checkstyle error * add documentation for new dynamic config * improve docs and logs * Alter how null is handled for new config. If null, manually set as default	2021-05-05 07:39:36 -05:00
Maytas Monsereenusorn	84aac4832d	Add feature to automatically remove rules based on retention period (#11164 ) * Add feature to automatically remove rules based on retention period * Add feature to automatically remove rules based on retention period * address comments	2021-05-03 11:50:45 -07:00
benkrug	fdab95ea99	Update index.md (#11174 ) tiny change for readability	2021-04-30 09:40:19 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
bergmt2000	f60d8ea1c3	Update index.md (#11105 ) Fix json typo in readme for granularitySpec in compaction config example	2021-04-13 16:26:36 +08:00
Maytas Monsereenusorn	4576152e4a	Make dropExisting flag for Compaction configurable and add warning documentations (#11070 ) * Make dropExisting flag for Compaction configurable * fix checkstyle * fix checkstyle * fix test * add tests * fix spelling * fix docs * add IT * fix test * fix doc * fix doc	2021-04-09 00:12:28 -07:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Jihoon Son	cc12a57034	Enforce allow list for JDBC properties by default (#11063 ) * Enforce allow list for JDBC properties by default * fix tests	2021-04-06 19:46:19 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Atul Mohan	3d7e7c2c83	Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213 ) * Skip queue removal on timeout * Clarify error * Add new config to control replication Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2021-03-17 11:34:05 -07:00
Mohammadamin Karbasforushan	dfad38d561	Fix unclear documentation of human readable byte (#10825 ) * Fix unclear documentation of human readable byte Follows https://github.com/apache/druid/pull/10203 ; See https://github.com/apache/druid/pull/10203#issuecomment-771080634 . * Fix sentence style Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-11 00:01:38 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
Clint Wylie	f34c6eb3c0	add druid jdbc handler config for minimum number of rows per frame (#10880 ) * add druid jdbc handler config for minimum number of rows per frame * javadocs and docs adjustments * spelling * adjust docs per review with minor tweaks * adjust more	2021-02-23 02:11:04 -08:00
Jihoon Son	1ec3f0bd73	Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535 )" (#10871 ) This reverts commit `6b14bdb3a5`.	2021-02-09 17:51:26 -08:00
zhangyue19921010	bf1d1d583b	modify (#10778 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-22 09:20:13 -08:00
zhangyue19921010	2837a9b62f	[Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661 ) * done * done * done Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-08 15:23:08 -08:00
Lucas Capistrant	58ce2e55d8	Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284 ) * dynamic coord config adding more balancing control add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This config caps the number of segments that are iterated over when selecting a segment to move. The default value combined with current balancing strategies will still iterate over all provided segments. However, setting this value to something > 0 will cap the number of segments visited. This could make sense in cases where a cluster has a very large number of segments and the admins prefer less iterations vs a thorough consideration of all segments provided. * fix checkstyle failure * Make doc more detailed for admin to understand when/why to use new config * refactor PR to use a % of segments instead of raw number * update the docs * remove bad doc line * fix typo in name of new dynamic config * update RservoirSegmentSampler to gracefully deal with values > 100% * add handler for <= 0 in ReservoirSegmentSampler * fixup CoordinatorDynamicConfigTest naming and argument ordering * fix items in docs after spellcheck flags * Fix lgtm flag on missing space in string literal * improve documentation for new config * Add default value to config docs and add advice in cluster tuning doc * Add percentOfSegmentsToConsiderPerMove to web console coord config dialog * update jest snapshot after console change * fix spell checker errors * Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider * add new config back to web console module after merge with master * fix ReservoirSegmentSamplerTest * fix line breaks in coordinator console dialog * Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove * Make improvements based off of feedback in review * additional cleanup coming from review * Add a warning log if limit on segments to consider for move can't be calcluated * remove unused import * fix tests for CoordinatorDynamicConfig * remove precondition test that is redundant in CoordinatorDynamicConfig Builder class	2020-12-22 08:27:55 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
frank chen	c410648630	fix injection failure of StorageLocationSelectorStrategy objects (#10363 ) * fix to allow customer storage location selector strategy * add test cases to check instance of selector strategy * update doc * code format * resolve code review comments * inject StorageLocation * fix CI * fix mismatched license item reported by CI * change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy * using a helper method to bind to correct property path	2020-12-08 09:48:31 -08:00
zhangyue19921010	e7e07eab11	[Improve Doc] : Modify the disadvantages of the lazyLoadOnStart feature. (#10608 ) * modify docs * modify docs Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-01 18:33:22 -08:00
frank chen	fe693a4f01	Improve doc and exception message for invalid user configurations (#10598 ) * improve doc and exception message * add spelling check rules and remove unused import * add a test to improve test coverage	2020-11-23 15:03:13 -08:00
zhangyue19921010	1272fb17e5	modify druid.historical.cache.maxEntrySize property in Unified format (#10590 ) Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-11-17 16:36:50 -06:00
Mainak Ghosh	d8e5a159e8	Update index.md (#10549 ) Removing the extra `_` in the default for middlemanager category	2020-11-03 13:44:47 +05:30
Husky Zeng	9286153145	doc wrong description of configuration (#10546 )	2020-11-02 17:57:16 -08:00
Nishant Bangarwa	6b14bdb3a5	Add support for Blacklisting some domains for HTTPInputSource (#10535 ) fix inspections refactor class name change name add allowList as well distinguish between empty and null list Fix CI	2020-11-02 21:47:25 +05:30
Mainak Ghosh	14072d3ab0	Adding more dimensions to the audit log entry (#10373 ) * Adding more dimensions to the audit log entry * Making adding payload in audit metric optional * Changing the name of the parameter to includePayloadAsDimensionInMetric. Adding a unit test * Fixing the intellij code introspection issues	2020-09-17 18:36:28 -07:00
Jihoon Son	8657b23ab2	Integration tests and docs for auto compaction with different partitioning (#10354 ) * Working * add test * doc * fix test * split other integration test * exclude other-index from other tests * doc anchor fix * adjust task slots and number of merge tasks * spell check * reduce maxNumConcurrentSubTasks to 1 * maxNumConcurrentSubtasks for range partitinoing * reduce memory for historical * change group name	2020-09-15 11:28:09 -07:00
Lucas Capistrant	690e070c43	Fix doc for name of dynamic config to pause coordination (#10345 )	2020-09-11 08:40:06 -05:00
Atul Mohan	06539bc828	Set default server.maxsize to the sum of segment cache (#10255 ) * Default server.maxsize * Remove maxsize refs from config Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-10 09:21:22 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203 ) * support unit suffix on byte-related properties * add doc * change default value of byte-related properites in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Antoine Huret	88d20a61a6	renamed authenticationChain to authenticatorChain (#10143 )	2020-07-08 19:58:21 -07:00
Atul Mohan	367eaedbb4	Clarify change in behavior for druid.server.maxSize (#10105 ) * Clarify maxSize docs * Add info about maxSize Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-07-01 22:22:18 -07:00
Suneet Saldanha	15a0b4ffe2	Filter http requests by http method (#10085 ) * Filter http requests by http method Add a config that allows a user which http methods to allow against their Druid server. Druid will only accept http requests with the method: GET, PUT, POST, DELETE and OPTIONS. If a Druid admin wants to allow other methods, they can do so by using the ServerConfig#allowedHttpMethods config. If a Druid user would like to disallow OPTIONS, this can be done by changing the AuthConfig#allowUnauthenticatedHttpOptions config * Exclude OPTIONS from always supported HTTP methods Add HEAD as an allowed method for web console e2e tests * fix docs * fix security IT * Actually fix the web console e2e tests * Ignore icode coverage for nitialization classes * code review	2020-06-29 16:59:31 -07:00
Jian Wang	20fd72bd13	Fix NPE when brokers use custom priority list (#9878 )	2020-06-26 17:28:54 -07:00
Maytas Monsereenusorn	9be5039f68	Enable query vectorization by default (#10065 ) * Enable query vectorization by default * update docs	2020-06-24 13:08:49 -07:00

195 Commits