Commit Graph

236 Commits

Author SHA1 Message Date
Charles Smith 25c1d55dd6
Clarify behavior when decommissioningMaxPercentOfMaxSegmentsToMove = 0 (#13157)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-10-07 09:01:32 -07:00
AmatyaAvadhanula 41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
Charles Smith eb760c3d1d
update log4j example (#13095)
* update log4j example

* fix some style issues

* Update docs/configuration/logging.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-22 09:46:49 +08:00
Vadim Ogievetsky bb0b810b1d
fix html tags in docs (#13117)
* fix html tags in docs

* revert not null
2022-09-18 19:40:33 -07:00
Gian Merlino d4967c38f8
Various documentation updates. (#13107)
* Various documentation updates.

1) Split out "data management" from "ingestion". Break it into thematic pages.

2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
   all conceptual content is in concepts.md and all syntax content is in reference.md.
   Shorten the known issues page to the most interesting ones.

3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
   index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.

4) Rename various mentions of "Druid console" to "web console".

5) Add additional information to ingestion/partitioning.md.

6) Remove a mention of Tranquility.

7) Remove a note about upgrading to Druid 0.10.1.

8) Remove no-longer-relevant task types from ingestion/tasks.md.

9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.

10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
    places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
    in the sidebar.

11) Make all br tags self-closing.

12) Certain other cosmetic changes.

13) Update to node-sass 7.

* make travis use node12 for docs

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2022-09-16 21:58:11 -07:00
Vadim Ogievetsky 2493eb17bf
Doc fixes around msq (#13090)
* remove things that do not apply

* fix more things

* pin node to a working version

* fix

* fixes

* known issues tidy up

* revert auto formatting changes

* remove management-uis page which is 100% lies

* don't mention the Coordinator console (that no longer exits)

* goodies

* fix typo
2022-09-16 02:15:26 -07:00
Adam Peck ee22663dd3
Add interpolation to JsonConfigurator (#13023)
* Add interpolation to JsonConfigurator

* Fix checkstyle

* Fix tests by removing common-text override

* Add back commons-text without version

* Remove unused hadoopDir configs

* Move some stuff to hopefully pass coverage
2022-09-07 12:48:01 +05:30
zemin 6805a7f9c2
Ease of hidding sensitive properties from /status/proper… (#12950)
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

using one property for hiding properties, updated the index.md to document hiddenProperties

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

Added java docs

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties

fixed typo and removed redundant space

Co-authored-by: zemin <zemin.piao@adyen.com>
2022-09-02 08:51:25 -05:00
Adam Peck 21b73bde20
Update Curator to 5.3.0 (#12939)
* Update Curator to 5.3.0

* Update licenses.yaml

* Fix inspections + add tests.

* Fix checkstyle

* Another intellij inspection fix

* Update curator exclusions

* Cleanup new exhibitor references

* Remove unused dep and checkstyle fix
2022-08-26 18:23:40 -07:00
Karan Kumar f7c6316992
Setting useNativeQueryExplain to true (#12936)
* Setting useNativeQueryExplain to true

* Update docs/querying/sql-query-context.md

Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com>

* Fixing tests

* Fixing broken tests

Co-authored-by: Santosh Pingale <pingalesantosh@gmail.com>
2022-08-24 17:39:55 +05:30
Rohan Garg 3c129f6728
Add sql planning time metric (#12923) 2022-08-22 11:09:44 +05:30
Lucas Capistrant ec8bdeb9f6
Document missing property - druid.announcer.skipSegmentAnnouncementOnZk (#12891)
* document missing config related to segment announcement

* improve wording

* improve wording

* update docs
2022-08-16 12:32:56 +05:30
Lucas Capistrant 3a3271eddc
Introduce defaultOnDiskStorage config for Group By (#12833)
* Introduce defaultOnDiskStorage config for groupBy

* add debug log to groupby query config

* Apply config change suggestion from review

* Remove accidental new lines

* update default value of new default disk storage config

* update debug log to have more descriptive text

* Make maxOnDiskStorage and defaultOnDiskStorage HumanRedadableBytes

* improve test coverage

* Provide default implementation to new default method on advice of reviewer
2022-08-12 09:40:21 -07:00
Suneet Saldanha 267b32c2e2
Set druid.processing.fifo to true by default (#12571) 2022-08-08 10:18:24 -07:00
Gian Merlino ef6811ef88
Improved Java 17 support and Java runtime docs. (#12839)
* Improved Java 17 support and Java runtime docs.

1) Add a "Java runtime" doc page with information about supported
   Java versions, garbage collection, and strong encapsulation..

2) Update asm and equalsverifier to versions that support Java 17.

3) Add additional "--add-opens" lines to surefire configuration, so
   tests can pass successfully under Java 17.

4) Switch openjdk15 tests to openjdk17.

5) Update FrameFile to specifically mention Java runtime incompatibility
   as the cause of not being able to use Memory.map.

6) Update SegmentLoadDropHandler to log an error for Errors too, not
   just Exceptions. This is important because an IllegalAccessError is
   encountered when the correct "--add-opens" line is not provided,
   which would otherwise be silently ignored.

7) Update example configs to use druid.indexer.runner.javaOptsArray
   instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)

* Adjustments.

* Use run-java in more places.

* Add run-java.

* Update .gitignore.

* Exclude hadoop-client-api.

Brought in when building on Java 17.

* Swap one more usage of java.

* Fix the run-java script.

* Fix flag.

* Include link to Temurin.

* Spelling.

* Update examples/bin/run-java

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2022-08-03 23:16:05 -07:00
Gian Merlino 2912a36a20
Use nonzero default value of maxQueuedBytes. (#12840)
* Use nonzero default value of maxQueuedBytes.

The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default is unlimited; this patch changes it to a
relatively conservative 25MB.

This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large resultsets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.

* Alter calculation.
2022-08-02 17:57:27 -07:00
Charles Smith efbb58e90e
docs: remove maxRowsPerSegment where appropriate (#12071)
* remove maxRowsPerSegment where appropriate

* fix tutorial, accept suggestions

* Update docs/design/coordinator.md

* additional tutorial file

* fix initial index spec

* accept comments

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add back comment on maxrows per segment

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/tutorials/tutorial-compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rm duplicate entry

* Update native-batch-simple-task.md

remove ref to `maxrowspersegment`

* Update native-batch.md

remove ref to `maxrowspersegment`

* final tenticles

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-07-28 16:52:13 +05:30
Atul Mohan 93a9a4b1c5
Add retention for file request logs (#12559)
* Add retention for file request logs

* Spelling
2022-07-27 08:17:02 -07:00
TSFenwick 8c02880d5f
Emit metrics for distribution of number of rows per segment (#12730)
* initial commit of bucket dimensions for metrics

return counts of segments that have rowcount in a bucket size for a datasource
return average value of rowcount per segment in a datasource
added unit test
naming could use a lot of work
buckets right now are not finalized
added javadocs
altered metrics.md

* fix checkstyle issues

* addressed review comments

add monitor test
move added functionality to new monitor
update docs

* address comments

renamed monitor
handle tombstones better
update docs
added javadocs

* Add support for tombstones in the segment distribution

* undo changes to tombstone segmentizer factory

* fix accidental whitespacing changes

* address comments regarding metrics documentation

and rename variable to be more accurate

* fix tests

* fix checkstyle issues

* fix broken test

* undo removal of timeout
2022-07-12 07:04:42 -07:00
Rui Chen 068bea6334
deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
Gian Merlino d29343cbe3
Disable autokill of segments by default. (#12693)
Also add clarifying commentary to the documentation about how durationToRetain works.
2022-06-23 17:17:11 -07:00
Tejaswini Bandlamudi 99e1b4efee
Update default value of `inputSegmentSizeBytes` in configuration docs (#12678) 2022-06-22 09:05:03 +05:30
Gian Merlino 0099940808
Add TIME_IN_INTERVAL SQL operator. (#12662)
* Add TIME_IN_INTERVAL SQL operator.

The operator is implemented as a convertlet rather than an
OperatorConversion, because this allows it to be equivalent to using
the >= and < operators directly.

* SqlParserPos cannot be null here.

* Remove unused import.

* Doc updates.

* Add words to dictionary.
2022-06-21 13:05:37 -07:00
Jill Osborne f050069767
Segments doc update (#12344)
* Corrected heading levels in segments doc

* IMPLY-18394: Updated Segments doc

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/design/segments.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update segments.md

* Updated links to changed headings in Segments doc

* Corrected spelling error

* Update segments.md

Incorporated suggestions from Paul Rogers.

* Update index.md

* Update segments.md

* Update segments.md

* Update segments.md

* Update compaction.md

* Update docs/design/segments.md

fix typo

* Update docs/ingestion/compaction.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/design/segments.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-06-16 13:25:17 -07:00
Victoria Lim 353475bd36
Docs for automatic compaction (#12569)
* docs for auto-compaction

* fix broken links

* another link

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Suneet Saldanha <suneet@apache.org>

* reorg content for skipOffset

* Update docs/ingestion/automatic-compaction.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-06-09 14:55:12 -07:00
Gian Merlino a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Gian Merlino a27f4f5740
Service stdout log files, move logs to log/. (#12570)
* Service stdout log files, move logs to log/.

Two changes that make log behavior cleaner:

1) Redirect messages from the Java runtime to their own log files.
   Otherwise, they would get jumbled up in the output of the all-in-one
   start command.

2) Use log/ instead of bin/log/ for the default log directory. Makes them
   easier to find.

Additionally, add documentation about how to avoid the reflective
access warnings in Java 11.

* Spelling.

* See if code formatting affects spelling.
2022-06-03 10:44:29 +05:30
Katya Macedo 5073cee73f
Fix zookeeper spelling (#12556) 2022-05-21 16:14:02 +08:00
317brian 351e57bdb6
docs(fix): clarify how worker.version and minWorkerVersion comparison works (#12459)
* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Revert "docs(fix): clarify how worker.version and minWorkerVersion comparison works"

This reverts commit cadd1fdc60.

* docs(fix): clarify how worker.version and minWorkerVersion comparison works

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

fix spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-05-16 07:48:33 -07:00
Frank Chen c33ff1c745
Enforce console logging for peon process (#12067)
Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log.

But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage.

So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.
2022-05-16 15:07:21 +05:30
Lucas Capistrant deb69d1bc0
Allow coordinator to be configured to kill segments in future (#10877)
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.

A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
2022-05-11 07:35:15 +05:30
Victoria Lim 0206a2da5c
Update automatic compaction docs with consistent terminology (#12416)
* specify automatic compaction where applicable

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* update for style and consistency

* implement suggested feedback

* remove duplicate example

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/compaction.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/api-reference.md

* update .spelling

* Adopt review suggestions

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2022-05-03 16:22:25 -07:00
zachjsh 564d6defd4
Worker level task metrics (#12446)
* * fix metric name inconsistency

* * add task slot metrics for middle managers

* * add new WorkerTaskCountStatsMonitor to report task count metrics
  from worker

* * more stuff

* * remove unused variable

* * more stuff

* * add javadocs

* * fix checkstyle

* * fix hadoop test failure

* * cleanup

* * add more code coverage in tests

* * fix test failure

* * add docs

* * increase code coverage

* * fix spelling

* * fix failing tests

* * remove dead code

* * fix spelling
2022-04-26 11:44:44 -05:00
Maytas Monsereenusorn 5d37d9f9d8
Add docs to metric spec for auto compaction (#12415)
* add docs

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update index.md

* Update docs/configuration/index.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2022-04-13 13:27:00 -07:00
Parag Jain 2c79d28bb7
Copy of #11309 with fixes (#12402)
* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
2022-04-11 21:05:24 +05:30
mark-imply bf96ddf5ba
Update index.md (#12390)
Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.
2022-04-08 18:01:54 +05:30
Victoria Lim d326c681c1
Document config for ingesting null columns (#12389)
* config for ingesting null columns

* add link

* edit .spelling

* what happens if storeEmptyColumns is disabled
2022-04-05 09:15:42 -07:00
Tejaswini Bandlamudi 984904779b
Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381)
The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.

The default value is now increased to Long.MAX_VALUE.
2022-04-04 16:28:53 +05:30
somu-imply a1ea658115
Introducing a new config to ignore nulls while computing String Cardinality (#12345)
* Counting nulls in String cardinality with a config

* Adding tests for the new config

* Wrapping the vectorize part to allow backward compatibility

* Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes

* Updating testcase and code

* Adding null handling test to improve coverage

* Checkstyle fix

* Adding 1 more change in docs

* Making docs clearer
2022-03-29 14:31:36 -07:00
Victoria Lim 9ed7aa33ec
Docs for request logging (#12363)
* add docs for request logging

* remove stray character

* Update docs/operations/request-logging.md

Co-authored-by: TSFenwick <tsfenwick@gmail.com>

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: TSFenwick <tsfenwick@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-03-28 14:09:41 -07:00
Dr. Sizzles 69f928f50e
Adding k8s support for human readable parsing (#12316)
* Adding k8s support for human readable parsing

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java

Co-authored-by: Frank Chen <frankchen@apache.org>

* Changes per review

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
2022-03-16 11:18:47 +08:00
AmatyaAvadhanula 7bf1d8c5c0
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)
Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
2022-03-09 23:17:43 +05:30
Agustin Gonzalez abe76ccb90
Batch ingestion replace (#12137)
* Tombstone support for replace functionality

* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec

* Update compaction test to match replace behavior

* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.

* Style plus simple queriableindex test

* Add segment cache loader tombstone test

* Add more tests

* Add a method to the LogicalSegment to test whether it has any data

* Test filter with some empty logical segments

* Refactor more compaction/dropexisting tests

* Code coverage

* Support for all empty segments

* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.

* Fix null ptr when segment does not have a queriable index

* Add support for empty replace interval (all input data has been filtered out)

* Fixed coverage & style

* Find tombstone versions from lock versions

* Test failures & style

* Interner was making this fail since the two segments were consider equal due to their id's being equal

* Cleanup tombstone version code

* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used

* Reject replace spec when input intervals are empty

* Documentation

* Style and unit test

* Restore test code deleted by mistake

* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.

* Unused imports. Dead code. Test coverage.

* Coverage.

* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.

* Fix OmniKiller + more test coverage.

* Tombstones are now marked using a shard spec

* Drop a segment factory.json in the segment cache for tombstones

* Style

* Style + coverage

* style

* Add TombstoneLoadSpec.class to mapper in test

* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java

Typo

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/configuration/index.md

Missing

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Typo

* Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold.

* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
2022-03-08 20:07:02 -07:00
Gian Merlino 875e0696e0
GroupBy: Cap dictionary-building selector memory usage. (#12309)
* GroupBy: Cap dictionary-building selector memory usage.

New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.

Includes:

- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
  the v2SmallDictionary suite. (Both the selector dictionary and
  the merging dictionary will be small in that suite.)
- Tests for the new config parameter.

* Fix issues from tests.

* Add "pre-existing" to dictionary.

* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.

* Adjustments from review comments.
2022-03-08 13:13:11 -08:00
somu-imply eae163a797
Moving in filter check to broker (#12195)
* Moving in filter check to broker

* Adding more unit tests, making error message meaningful

* Spelling and doc changes

* Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100

* Removing upper limit of 100, updated docs

* Making documentation more meaningful

* Moving check outside to PlannerConfig, updating test cases and adding back max limit

* Updated with some additional code comments

* Missed removing one line during the checkin

* Addressing doc changes and one forbidden API correction

* Final doc change

* Adding a speling exception, correcting a testcase

* Reading entire filter tree to address combinations of ANDs and ORs

* Specifying in docs that, this case works only for ORs

* Revert "Reading entire filter tree to address combinations of ANDs and ORs"

This reverts commit 81ca8f8496.

* Covering a class cast exception and updating docs

* Counting changed

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2022-02-15 20:45:07 -08:00
AmatyaAvadhanula 393e9b68a8
Add config to limit task slots for parallel indexing tasks (#12221)
In extreme cases where many parallel indexing jobs are submitted together, it is possible
that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule
their own sub-tasks thus stalling progress of all the indexing jobs.

Key changes:
- Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots
  for `ParallelIndexSupervisorTasks` per worker
- `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior)
- `ratio = 0` implies supervisor tasks can not use any slot on a worker
   (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs)
- `ImmutableWorkerInfo.canRunTask()`
- `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`
2022-02-15 23:15:09 +05:30
Victoria Lim c61b19d443
Refactor SQL docs (#12239)
* refactor and link fixes

* add sql docs to left nav

* code format for needle

* updated web console script

* link fixes

* update earliest/latest functions

* edits for grammar and style

* more link fixes

* another link

* update with #12226

* update .spelling file
2022-02-11 14:43:30 -08:00
Suneet Saldanha ced1389d4c
Enable auto kill segments by default (#12187)
* Enable auto-kill by default

* tests

* wip

* test

* fix IT

* fix it

* remove from docs

* make coverage bot happy
2022-02-07 06:57:54 -08:00
Suneet Saldanha 159f97dcb0
Update docs for druid.processing.numThreads in brokers (#12231)
* Update docs for druid.processing.numThreads

* error msg

* one more reference
2022-02-04 17:34:21 -08:00
Maytas Monsereenusorn fac6a48a8f
add impl (#12201) 2022-01-27 11:39:59 -08:00
Suneet Saldanha 2b32d86f3b
Enable automatic metdata cleanup by default (#12188) 2022-01-24 20:04:17 -08:00
somu-imply c267b65f97
Removing unused processing threadpool on broker (#12070)
* Thread pool for broker

* Updating two tests to improve coverage for new method added

* Updating druidProcessingConfigTest to cover coverage

* Adding missed spelling errors caused in doc

* Adding test to cover lines of new function added
2021-12-21 13:07:53 -08:00
Lucas Capistrant 150902b95c
clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960)
* clean up the balancing code around the batched vs deprecated way of sampling segments to balance

* fix docs, clarify comments, add deprecated annotations to legacy code

* remove unused variable

* update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated

* fix dynamic config text for percentOfSegmentsToConsiderPerMove

* run prettier to cleanup coordinator-dynamic-config.tsx changes

* update jest snapshot

* update documentation per review feedback
2021-12-07 14:47:46 -08:00
Peter Marshall 0b3f0bbbd8
Docs - Metrics docs layout and info about query/bytes (#11481)
* Metrics docs layout and info about query/bytes

Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages.

Update index.md

Amended typo

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/metrics.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/metrics.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/metrics.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Feedback applied

Http --> HTTP and moved content / removed >

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-12-07 09:45:24 -08:00
Frank Chen 4631a66723
Support rolling log files (#10147)
* apply log file rolling strategy

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Use absolute log path and allow spaces in log path

* Update log4j2 configuration

* apply FileAppender to ZooKeeper

* DO NOT redirect application's console log to file in supervisor
2021-12-03 21:32:01 +08:00
Charles Smith 7ed46800c3
Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983)
Adds documentation for multi-dimension partitioning. cc: @kfaraz
Refactors the native batch partitioning topic as follows:

Native batch ingestion covers parallel-index
Native batch simple task indexing covers index
Native batch input sources covers ioSource
Native batch ingestion with firehose covers deprecated firehose
2021-12-03 16:37:14 +05:30
Clint Wylie 84b4bf56d8
vectorize logical operators and boolean functions (#11184)
changes:
* adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
2021-12-02 16:40:23 -08:00
Laksh Singla c381cae51b
Improve the output of SQL explain message (#11908)
Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side.

This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).
2021-11-25 21:08:33 +05:30
Maytas Monsereenusorn bb3d2a433a
Support filtering data in Auto Compaction (#11922)
* add impl

* fix checkstyle

* add test

* add test

* add unit tests

* fix unit tests

* fix unit tests

* fix unit tests

* add IT

* add IT

* add comments

* fix spelling
2021-11-24 10:56:38 -08:00
TSFenwick 1487f558b1
Use a simple class to sanitize JDBC exceptions and also log them (#11843)
* Use a simple class to sanitize sanitizable errors and log them

The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface

add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy

* return less information as part of too many connections, and instead only log specific details

This is so an end user gets relevant information but not too much info since they might now how
many brokers they have

* return only runtime exceptions

added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.

* dont reqrewite exceptions unless necessary for checked exceptions

add docs
avoid blanket turning all exceptions into runtime exceptions

* address comments, to fix up docs.

add more javadocs
add support UOE sanitization

* use try catch instead and sanitize at public methods

* checkstyle fixes

* throw noSuchStatement and NoSuchConnection as Avatica is affected by those

* address comments. move log error back to druid meta

clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions

* alter test to reflect new error message
2021-11-16 13:13:03 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Kashif Faraz a22687ecbe
Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to skip the realtime nodes and
thus it would query only Historical processes for any given segment.
2021-11-02 12:38:42 +05:30
Maytas Monsereenusorn ba2874ee1f
Support changing query granularity in Auto Compaction (#11856)
* add queryGranularity

* fix checkstyle

* fix test
2021-11-01 15:18:44 -07:00
Maytas Monsereenusorn 33d9d9bd74
Add rollup config to auto and manual compaction (#11850)
* add rollup to auto and manual compaction

* add unit tests

* add unit tests

* add IT

* fix checkstyle
2021-10-29 10:22:25 -07:00
Gian Merlino fc95c92806
Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124)
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.

This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not
  required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that
  had it. (Not all of them did, and that's OK, because it wasn't necessary
  anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default)
  in "useOffheap" mode (also non-default). So you have to go pretty deep
  into the wilderness to get this code to activate in production. It is
  never used during ingestion.
- Its existence complicates developer efforts to reason about how
  aggregators get used, because the way it uses buffer aggregators is so
  different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I do believe that the given way the offheap incremental index
works, it actually didn't require buffer aggregators to be thread-safe.
It synchronizes on "aggregate" and doesn't call "get" until it has
stopped calling "aggregate". Nevertheless, this is a bother to think about,
and for the above reasons I think it makes sense to remove the code anyway.

* Remove things that are now unused.

* Revert removal of getFloat, getLong, getDouble from BufferAggregator.

* OAK-related warnings, suppressions.

* Unused item suppressions.
2021-10-26 08:05:56 -07:00
Gian Merlino 8276c031c5
Add druid.sql.approxCountDistinct.function property. (#11181)
* Add druid.sql.approxCountDistinct.function property.

The new property allows admins to configure the implementation for
APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode.

The motivation for adding this setting is to enable site admins to
switch the default HLL implementation to DataSketches.

For example, an admin can set:

  druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL

* Fixes

* Fix tests.

* Remove erroneous cannotVectorize.

* Remove unused import.

* Remove unused test imports.
2021-10-25 12:16:21 -07:00
Arun Ramani b6b42d3936
Minor processor quota computation fix + docs (#11783)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

* Minor processor quota computation fix + docs

* Address comments

* Address comments

* Fix spellcheck

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-08 22:52:03 -05:00
Lucas Capistrant 1930ad1f47
Implement configurable internally generated query context (#11429)
* Add the ability to add a context to internally generated druid broker queries

* fix docs

* changes after first CI failure

* cleanup after merge with master

* change default to empty map and improve unit tests

* add doc info and fix checkstyle

* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
2021-10-06 09:02:41 -07:00
Kashif Faraz b688db790b
Add Broker config `druid.broker.segment.ignoredTiers` (#11766)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.

This config is useful when you want a completely isolated tier amongst many other tiers.

Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm

If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]
2021-10-06 10:06:32 +05:30
Charles Smith 621e5ac63f
docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565)
* docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor

* Update docs/configuration/index.md
2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn f60b3b3bab
fix doc (#11772) 2021-10-05 15:42:11 -07:00
Caroline1000 ffbe303828
Update balancer strategy recommendations (#11759)
* Update balancer strategy recommendations

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-10-05 09:47:37 -07:00
Maytas Monsereenusorn 129911a20e
Add documentations for config to filter internal Druid-related messages from error response (#11755)
* add doc

* add doc

* address comments

* fix typo

* address comments
2021-10-01 17:49:02 +07:00
Kashif Faraz c641657bae
Fix router documentation for `druid.router.sql.enable` (#11716)
* Rename field, fix router documentation

* Add more lines to doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-28 22:54:13 +05:30
Clint Wylie 5de26cf6d9
add optional system schema authorization (#11720)
* add optional system schema authorization

* remove unused

* adjust docs

* doc fixes, missing ldap config change for integration tests

* style
2021-09-21 13:28:26 -07:00
Peter Marshall ee009ec18e
Docs - ingestion task log config and process (#11678)
* Update index.md

Moved H4s underneath the H3 for the task log location and added hyperlinks.

* Update tasks.md

Added process information around log file generation, and subsumed text from the configuration guide into this explanatory text instead.

* Update tasks.md

.html > .md

* Update docs/ingestion/tasks.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2021-09-13 15:49:09 -07:00
Charles Smith f9329fbf9e
add clarification for maxSubqueryRows (#11687)
* add clarification for maxSubqueryRows
2021-09-13 11:49:30 -07:00
Suneet Saldanha 531d11abaf
Update description of batchProcessingMode (#11686)
* Update description of batchProcessingMode 

Update the description to explicitly mention a released version of Druid that the original version was referencing

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-10 16:55:48 -07:00
Agustin Gonzalez 9efa6cc9c8
Make persists concurrent with adding rows in batch ingestion (#11536)
* Make persists concurrent with ingestion

* Remove semaphore but keep concurrent persists (with add) and add push in the backround as well

* Go back to documented default persists (zero)

* Move to debug

* Remove unnecessary Atomics

* Comments on synchronization (or not) for sinks & sinkMetadata

* Some cleanup for unit tests but they still need further work

* Shutdown & wait for persists and push on close

* Provide support for three existing batch appenderators using batchProcessingMode flag

* Fix reference to wrong appenderator

* Fix doc typos

* Add BatchAppenderators class test coverage

* Add log message to batchProcessingMode final value, fix typo in enum name

* Another typo and minor fix to log message

* LEGACY->OPEN_SEGMENTS, Edit docs

* Minor update legacy->open segments log message

* More code comments, mostly small adjustments to naming etc

* fix spelling

* Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage

* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-09-08 13:31:52 -07:00
Clint Wylie ec334a641b
MySQL extension with MariaDB connector docs (#11608)
* add docs for mariadb support via mysql extensions

* add logging so you know what druid knows

* homogenize

* spelling

* missed a couple
2021-08-19 01:52:26 -07:00
Peter Marshall 8aaefb91e3
Docs - MiddleManager Affinity "strong" definition (#11480)
* Affinity "strong" definition

Reworded "strong" to emphasise meaning and consequences - OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1609558156092800

* Spelling corrections

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-13 19:17:16 -07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
Charles Smith 6524d838d7
Docs refactor of ingestion. Carries #11541 (#11576)
* Docs refactor of ingestion. Carries #11541

* Update docs/misc/math-expr.md

* add Apache license

* fix header, add topics to sidebar

* Update docs/ingestion/partitioning.md

* pick up changes to  and  md from c7fdf1d, #11479

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-13 08:42:03 -07:00
Kashif Faraz aaf0aaad8f
Enable routing of SQL queries at Router (#11566)
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.

This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.

To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
2021-08-13 18:44:39 +05:30
Charles Smith 941c5ffb05
clarify JVM tmp dir requires execute on files (#11542)
* clarify JVM tmp dir requires execute on files

* code SysMonitor for spellcheck
2021-08-09 17:25:10 -07:00
Suneet Saldanha e423e99997
Update default maxSegmentsInNodeLoadingQueue (#11540)
* Update default maxSegmentsInNodeLoadingQueue

Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100.

An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability.
Since this is the default druid operators need to run into this instability
and then look through the docs to see that the recommended value for a large
cluster is 1000. This change makes it so the default will prevent clusters
from falling over as they grow over time.

* update tests

* codestyle
2021-08-05 11:26:58 -07:00
frank chen 55a01a030a
Clarify that Broker caching for groupBy v2 queries does not work (#11370)
* Add a note

* Update docs/configuration/index.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* clarify that both of non-result level cache and result level cache are not supported

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-08-03 10:01:15 -07:00
Yuanli Han b83742179a
Reduce method invocation of reservoir sampling (#11257)
* reduce method invocation of reservoir sampling

* add a dynamic parameter and add benchmark

* rebase
2021-07-30 22:09:50 +08:00
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn 6ce3b6ca2d
Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444)
* fix doc

* address comments

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-07-21 12:48:56 +07:00
Maytas Monsereenusorn 8d7d60d18e
Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440)
* fix pendingTaskBased

* fix doc

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments
2021-07-15 06:52:25 +07:00
Agustin Gonzalez 7e61042794
Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294)
* Bound memory in native batch ingest create segments

* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that

* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged

* Changed name from RealtimeAppenderator to StreamAppenderator

* Style

* Incorporating tests from StreamAppenderatorTest

* Keep totalRows and cleanup code

* Added missing dep

* Fix unit test

* Checkstyle

* allowIncrementalPersists should always be true for batch

* Added sinks metadata

* clear sinks metadata when closing appenderator

* Style + minor edits to log msgs

* Update sinks metadata & totalRows when dropping a sink (segment)

* Remove max

* Intelli-j check

* Keep a count of hydrants persisted by sink for sanity check before merge

* Move out sanity

* Add previous hydrant count to sink metadata

* Remove redundant field from SinkMetadata

* Remove unneeded functions

* Cleanup unused code

* Removed unused code

* Remove unused field

* Exclude it from jacoco because it is very hard to get branch coverage

* Remove segment announcement and some other minor cleanup

* Add fallback flag

* Minor code cleanup

* Checkstyle

* Code review changes

* Update batchMemoryMappedIndex name

* Code review comments

* Exclude class from coverage, will include again when packaging gets fixed

* Moved test classes to server module

* More BatchAppenderator cleanup

* Fix bug in wrong counting of totalHydrants plus minor cleanup in add

* Removed left over comments

* Have BatchAppenderator follow the Appenderator contract for push & getSegments

* Fix LGTM violations

* Review comments

* Add stats after push is done

* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)

* Update javadocs

* Add thread safety notice to BatchAppenderator

* Further cleanup config

* More config cleanup
2021-07-09 00:10:29 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
frank chen 60843bd11f
Add configuration suggestion to `druid.indexer.storage.type` (#11304) 2021-05-27 06:44:47 -07:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Agustin Gonzalez 8e5048e643
Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123)
* Avoid mapping hydrants in create segments phase for native ingestion

* Drop queriable indices after a given sink is fully merged

* Do not drop memory mappings for realtime ingestion

* Style fixes

* Renamed to match use case better

* Rollback memoization code and use the real time flag instead

* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations

* Style

* Log some count stats

* Make sure sinks size is obtained at the right time

* BatchAppenderator unit test

* Fix comment typos

* Renamed methods to make them more readable

* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator

* Missing dependency

* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.

* Replaced concurrent variables with normal ones

* Added   batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path.

* Style fix.

* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.

* Forgot to commit this edited documentation message
2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Charles Smith fae7ebf489
change errant 'none' configuration to 'manual': (#11218) 2021-05-10 22:04:18 -07:00
frank chen fa113fb4a9
Fix default value (#11220) 2021-05-10 10:11:26 -07:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00