Commit Graph

116 Commits

Author SHA1 Message Date
Maytas Monsereenusorn f60b3b3bab
fix doc (#11772) 2021-10-05 15:42:11 -07:00
Caroline1000 ffbe303828
Update balancer strategy recommendations (#11759)
* Update balancer strategy recommendations

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-10-05 09:47:37 -07:00
Maytas Monsereenusorn 129911a20e
Add documentations for config to filter internal Druid-related messages from error response (#11755)
* add doc

* add doc

* address comments

* fix typo

* address comments
2021-10-01 17:49:02 +07:00
Kashif Faraz c641657bae
Fix router documentation for `druid.router.sql.enable` (#11716)
* Rename field, fix router documentation

* Add more lines to doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-28 22:54:13 +05:30
Clint Wylie 5de26cf6d9
add optional system schema authorization (#11720)
* add optional system schema authorization

* remove unused

* adjust docs

* doc fixes, missing ldap config change for integration tests

* style
2021-09-21 13:28:26 -07:00
Peter Marshall ee009ec18e
Docs - ingestion task log config and process (#11678)
* Update index.md

Moved H4s underneath the H3 for the task log location and added hyperlinks.

* Update tasks.md

Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead.

* Update tasks.md

.html > .md

* Update docs/ingestion/tasks.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2021-09-13 15:49:09 -07:00
Charles Smith f9329fbf9e
add clarification for maxSubqueryRows (#11687)
* add clarification for maxSubqueryRows
2021-09-13 11:49:30 -07:00
Suneet Saldanha 531d11abaf
Update description of batchProcessingMode (#11686)
* Update description of batchProcessingMode 

Update the description to explicitly mention a released version of Druid that the original version was referencing

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-10 16:55:48 -07:00
Agustin Gonzalez 9efa6cc9c8
Make persists concurrent with adding rows in batch ingestion (#11536)
* Make persists concurrent with ingestion

* Remove semaphore but keep concurrent persists (with add) and add push in the backround as well

* Go back to documented default persists (zero)

* Move to debug

* Remove unnecessary Atomics

* Comments on synchronization (or not) for sinks & sinkMetadata

* Some cleanup for unit tests but they still need further work

* Shutdown & wait for persists and push on close

* Provide support for three existing batch appenderators using batchProcessingMode flag

* Fix reference to wrong appenderator

* Fix doc typos

* Add BatchAppenderators class test coverage

* Add log message to batchProcessingMode final value, fix typo in enum name

* Another typo and minor fix to log message

* LEGACY->OPEN_SEGMENTS, Edit docs

* Minor update legacy->open segments log message

* More code comments, mostly small adjustments to naming etc

* fix spelling

* Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage

* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-09-08 13:31:52 -07:00
Clint Wylie ec334a641b
MySQL extension with MariaDB connector docs (#11608)
* add docs for mariadb support via mysql extensions

* add logging so you know what druid knows

* homogenize

* spelling

* missed a couple
2021-08-19 01:52:26 -07:00
Peter Marshall 8aaefb91e3
Docs - MiddleManager Affinity "strong" definition (#11480)
* Affinity "strong" definition

Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800

* Spelling corrections

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-13 19:17:16 -07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
Charles Smith 6524d838d7
Docs refactor of ingestion. Carries #11541 (#11576)
* Docs refactor of ingestion. Carries #11541

* Update docs/misc/math-expr.md

* add Apache license

* fix header, add topics to sidebar

* Update docs/ingestion/partitioning.md

* pick up changes to  and  md from c7fdf1d, #11479

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-13 08:42:03 -07:00
Kashif Faraz aaf0aaad8f
Enable routing of SQL queries at Router (#11566)
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.

This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.

To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
2021-08-13 18:44:39 +05:30
Charles Smith 941c5ffb05
clarify JVM tmp dir requires execute on files (#11542)
* clarify JVM tmp dir requires execute on files

* code SysMonitor for spellcheck
2021-08-09 17:25:10 -07:00
Suneet Saldanha e423e99997
Update default maxSegmentsInNodeLoadingQueue (#11540)
* Update default maxSegmentsInNodeLoadingQueue

Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100.

An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability.
Since this is the default druid operators need to run into this instability
and then look through the docs to see that the recommended value for a large
cluster is 1000. This change makes it so the default will prevent clusters
from falling over as they grow over time.

* update tests

* codestyle
2021-08-05 11:26:58 -07:00
frank chen 55a01a030a
Clarify that Broker caching for groupBy v2 queries does not work (#11370)
* Add a note

* Update docs/configuration/index.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* clarify that both of non-result level cache and result level cache are not supported

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-08-03 10:01:15 -07:00
Yuanli Han b83742179a
Reduce method invocation of reservoir sampling (#11257)
* reduce method invocation of reservoir sampling

* add a dynamic parameter and add benchmark

* rebase
2021-07-30 22:09:50 +08:00
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn 6ce3b6ca2d
Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444)
* fix doc

* address comments

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-07-21 12:48:56 +07:00
Maytas Monsereenusorn 8d7d60d18e
Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440)
* fix pendingTaskBased

* fix doc

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments
2021-07-15 06:52:25 +07:00
Agustin Gonzalez 7e61042794
Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294)
* Bound memory in native batch ingest create segments

* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that

* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged

* Changed name from RealtimeAppenderator to StreamAppenderator

* Style

* Incorporating tests from StreamAppenderatorTest

* Keep totalRows and cleanup code

* Added missing dep

* Fix unit test

* Checkstyle

* allowIncrementalPersists should always be true for batch

* Added sinks metadata

* clear sinks metadata when closing appenderator

* Style + minor edits to log msgs

* Update sinks metadata & totalRows when dropping a sink (segment)

* Remove max

* Intelli-j check

* Keep a count of hydrants persisted by sink for sanity check before merge

* Move out sanity

* Add previous hydrant count to sink metadata

* Remove redundant field from SinkMetadata

* Remove unneeded functions

* Cleanup unused code

* Removed unused code

* Remove unused field

* Exclude it from jacoco because it is very hard to get branch coverage

* Remove segment announcement and some other minor cleanup

* Add fallback flag

* Minor code cleanup

* Checkstyle

* Code review changes

* Update batchMemoryMappedIndex name

* Code review comments

* Exclude class from coverage, will include again when packaging gets fixed

* Moved test classes to server module

* More BatchAppenderator cleanup

* Fix bug in wrong counting of totalHydrants plus minor cleanup in add

* Removed left over comments

* Have BatchAppenderator follow the Appenderator contract for push & getSegments

* Fix LGTM violations

* Review comments

* Add stats after push is done

* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)

* Update javadocs

* Add thread safety notice to BatchAppenderator

* Further cleanup config

* More config cleanup
2021-07-09 00:10:29 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
frank chen 60843bd11f
Add configuration suggestion to `druid.indexer.storage.type` (#11304) 2021-05-27 06:44:47 -07:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Agustin Gonzalez 8e5048e643
Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123)
* Avoid mapping hydrants in create segments phase for native ingestion

* Drop queriable indices after a given sink is fully merged

* Do not drop memory mappings for realtime ingestion

* Style fixes

* Renamed to match use case better

* Rollback memoization code and use the real time flag instead

* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations

* Style

* Log some count stats

* Make sure sinks size is obtained at the right time

* BatchAppenderator unit test

* Fix comment typos

* Renamed methods to make them more readable

* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator

* Missing dependency

* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.

* Replaced concurrent variables with normal ones

* Added   batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path.

* Style fix.

* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.

* Forgot to commit this edited documentation message
2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Charles Smith fae7ebf489
change errant 'none' configuration to 'manual': (#11218) 2021-05-10 22:04:18 -07:00
frank chen fa113fb4a9
Fix default value (#11220) 2021-05-10 10:11:26 -07:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn d73f72e508
Add feature to automatically remove supervisor based on retention period (#11200)
* add auto clean up

* add test

* add test

* fix test

* Address comments

* Address comments
2021-05-06 22:25:23 -07:00
Lucas Capistrant bb3c810b36
Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135)
* lay the groundwork for throttling replicant loads per RunRules execution

* Add dynamic coordinator config to control new replicant threshold.

* remove redundant line

* add some unit tests

* fix checkstyle error

* add documentation for new dynamic config

* improve docs and logs

* Alter how null is handled for new config. If null, manually set as default
2021-05-05 07:39:36 -05:00
Maytas Monsereenusorn 84aac4832d
Add feature to automatically remove rules based on retention period (#11164)
* Add feature to automatically remove rules based on retention period

* Add feature to automatically remove rules based on retention period

* address comments
2021-05-03 11:50:45 -07:00
benkrug fdab95ea99
Update index.md (#11174)
tiny change for readability
2021-04-30 09:40:19 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn f968400170
Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078)
* Add config to skip storing audit payload if exceed limit

* fix checkstyle

* change config name

* skip null fields for audit payload

* fix checkstyle

* address comments

* fix guice

* fix test

* add tests

* address comments

* address comments

* address comments

* fix checkstyle

* address comments

* fix test

* fix test

* address comments

* Address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-04-13 20:18:28 -07:00
bergmt2000 f60d8ea1c3
Update index.md (#11105)
Fix json typo in readme for granularitySpec in compaction config example
2021-04-13 16:26:36 +08:00
Maytas Monsereenusorn 4576152e4a
Make dropExisting flag for Compaction configurable and add warning documentations (#11070)
* Make dropExisting flag for Compaction configurable

* fix checkstyle

* fix checkstyle

* fix test

* add tests

* fix spelling

* fix docs

* add IT

* fix test

* fix doc

* fix doc
2021-04-09 00:12:28 -07:00
sthetland fb6751fa45
Fix old broken link (#11048)
* link check fixes

* updated link target

* Update aggregations.md

* spelling error
2021-04-07 20:40:50 -07:00
Abhishek Agarwal 0df0bff44b
Enable multiple distinct aggregators in same query (#11014)
* Enable multiple distinct count

* Add more tests

* fix sql test

* docs fix

* Address nits
2021-04-07 00:52:19 -07:00
Jihoon Son cc12a57034
Enforce allow list for JDBC properties by default (#11063)
* Enforce allow list for JDBC properties by default

* fix tests
2021-04-06 19:46:19 -07:00
Jihoon Son cfcebc40f6
Allow list for JDBC connection properties to address CVE-2021-26919 (#11047)
* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11
2021-04-01 17:30:47 -07:00
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* upadtes from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularityspec to example
2021-03-24 11:41:44 -07:00
Atul Mohan 3d7e7c2c83
Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213)
* Skip queue removal on timeout

* Clarify error

* Add new config to control replication

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2021-03-17 11:34:05 -07:00
Mohammadamin Karbasforushan dfad38d561
Fix unclear documentation of human readable byte (#10825)
* Fix unclear documentation of human readable byte

Follows https://github.com/apache/druid/pull/10203 ;
See https://github.com/apache/druid/pull/10203#issuecomment-771080634 .

* Fix sentence style

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-11 00:01:38 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
Clint Wylie f34c6eb3c0
add druid jdbc handler config for minimum number of rows per frame (#10880)
* add druid jdbc handler config for minimum number of rows per frame

* javadocs and docs adjustments

* spelling

* adjust docs per review with minor tweaks

* adjust more
2021-02-23 02:11:04 -08:00
Jihoon Son 1ec3f0bd73
Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535)" (#10871)
This reverts commit 6b14bdb3a5.
2021-02-09 17:51:26 -08:00
zhangyue19921010 bf1d1d583b
modify (#10778)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 09:20:13 -08:00