2325 Commits

Author SHA1 Message Date
sthetland a366753ba5
Consolidate multi-value dimension doc and highlight configurability (#11428)
* Clarify options for multi-value dims
* Add first example
2021-07-15 10:19:10 -07:00
Maytas Monsereenusorn 8d7d60d18e
Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440)
* fix pendingTaskBased

* fix doc

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments
2021-07-15 06:52:25 +07:00
Maytas Monsereenusorn 05d5dd9289
compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded (#11426)
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* fix test

* fix test
2021-07-13 09:48:06 +07:00
Agustin Gonzalez 7e61042794
Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294)
* Bound memory in native batch ingest create segments

* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that

* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged

* Changed name from RealtimeAppenderator to StreamAppenderator

* Style

* Incorporating tests from StreamAppenderatorTest

* Keep totalRows and cleanup code

* Added missing dep

* Fix unit test

* Checkstyle

* allowIncrementalPersists should always be true for batch

* Added sinks metadata

* clear sinks metadata when closing appenderator

* Style + minor edits to log msgs

* Update sinks metadata & totalRows when dropping a sink (segment)

* Remove max

* Intelli-j check

* Keep a count of hydrants persisted by sink for sanity check before merge

* Move out sanity

* Add previous hydrant count to sink metadata

* Remove redundant field from SinkMetadata

* Remove unneeded functions

* Cleanup unused code

* Removed unused code

* Remove unused field

* Exclude it from jacoco because it is very hard to get branch coverage

* Remove segment announcement and some other minor cleanup

* Add fallback flag

* Minor code cleanup

* Checkstyle

* Code review changes

* Update batchMemoryMappedIndex name

* Code review comments

* Exclude class from coverage, will include again when packaging gets fixed

* Moved test classes to server module

* More BatchAppenderator cleanup

* Fix bug in wrong counting of totalHydrants plus minor cleanup in add

* Removed left over comments

* Have BatchAppenderator follow the Appenderator contract for push & getSegments

* Fix LGTM violations

* Review comments

* Add stats after push is done

* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, rename feature flag,
remove real time flag stuff from stream appenderator, etc.)

* Update javadocs

* Add thread safety notice to BatchAppenderator

* Further cleanup config

* More config cleanup
2021-07-09 00:10:29 -07:00
Joseph Glanville d5e8d4d680
Avro union support (#10505)
* Avro union support

* Document new union support

* Add support for AvroStreamInputFormat and fix checkstyle

* Extend multi-member union test schema and format

* Some additional docs and add Enums to spelling

* Rename explodeUnions -> extractUnions

* explode -> extract

* ByType

* Correct spelling error
2021-07-06 22:05:41 -07:00
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
Clint Wylie df9b57aa1a
bitwise aggregators, better null handling options for expression agg (#11280)
* bitwise aggregators, better nulls for expression agg

* correct behavior

* rework deserialize, better names

* fix json, share mask
2021-06-25 16:51:16 -07:00
sthetland fd0931d35e
Azure data lake input source (#11153)
* Mention Azure Data Lake

* Make consistent with other entries

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-06-25 15:54:34 -07:00
Hoseung Lee ed0a57e106
Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-06-24 21:54:48 -07:00
Yi Yuan de8daf8139
Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351)
* delete_buildV9Directly_in_kafka_and_kinesis_indexing_service

* delete

* delete them from server

* delete buildV9Directly from hadoop indexing

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-23 16:36:46 -07:00
Clint Wylie bfbd7ec432
fix a bugs related to SQL type inference return type nullability (#11327)
* fix a bunch of type inference nullability bugs

* fixes

* style

* fix test

* fix concat
2021-06-15 12:26:59 -07:00
Charles Smith a1ed3a407d
clarify bySegment is native only (#11331) 2021-06-11 13:48:17 -07:00
Yi Yuan 8de0d36c52
Allow query through router when load moving average extension (#11276)
* init commit

* change NoopQuerySegmentWalker name

* change doc

* move NoopQuerySegmentWalker and add document

* fix doc

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-10 18:46:53 +08:00
Egor Riashin 9047fa3d9c
S3 ingestion can assume role (#10995)
* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* feature s3 assume role

* tests fix

* spelling fix

* sts fix

Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>
2021-06-09 16:02:35 +05:30
Yi Yuan 145cf9e5c3
fix document about input format (#11342)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-08 23:44:54 +08:00
frank chen 2ee7e31e5b
Fix syntax error (#11332) 2021-06-07 22:35:02 -07:00
frank chen d5139c9543
Fix permission problems in docker (#11299)
* Create /opt/data to fix permission problem

* eliminate symlink to avoid compatibility problem on AWS Fargate

* Add a workaround section

* Update instruction for named volume

* Use named volume in docker-compose

* Revert some doc change

* Resolve review comments
2021-06-01 17:33:27 -07:00
frank chen e664bfd433
Improve doc of movingAverage (#11262)
* Make doc more directive

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Add limitation

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Suppress spelling check error
2021-05-28 13:10:55 +08:00
frank chen 60843bd11f
Add configuration suggestion to `druid.indexer.storage.type` (#11304) 2021-05-27 06:44:47 -07:00
Xavier Léauté b517c3339b
remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073)
With this change, Druid will only support ZooKeeper 3.5.x and later.

In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x
(see #10780 for the detailed reasons) 

* remove ZooKeeper 3.4.x compatibility
* exclude additional ZK 3.5.x netty dependencies to ensure we use our version
* keep ZooKeeper version used for integration tests in sync with client library version
* remove the need to specify ZK version at runtime for docker
* add support to run integration tests with JDK 15
* build and run unit tests with Java 15 in travis
2021-05-25 12:49:49 -07:00
Agustin Gonzalez 4ba5738ffb
Add an issues section to deal with common issues when building druid (#11271) 2021-05-21 09:04:51 -07:00
Charles Smith 403dcf5cfb
fixes some typos, edits for style (#11258) 2021-05-21 08:58:39 -07:00
Charles Smith fcb4eaa3d4
add docs for high-churn datasource cleanup (#11245)
* add docs for high-churn datasource cleanup

* fix most comments except for task log

* address comments

* update strategy recommendation

* address additional comments

* fix

* address comments

* address comments from @sthetland
2021-05-20 09:48:42 -07:00
Clint Wylie 3649c608d2
array handling improvements (#11233)
* fix jdbc array handling, split handling for some array and multi value operator, split and add more tests

* formatting
2021-05-13 18:50:32 -07:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Agustin Gonzalez 8e5048e643
Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123)
* Avoid mapping hydrants in create segments phase for native ingestion

* Drop queriable indices after a given sink is fully merged

* Do not drop memory mappings for realtime ingestion

* Style fixes

* Renamed to match use case better

* Rollback memoization code and use the real time flag instead

* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations

* Style

* Log some count stats

* Make sure sinks size is obtained at the right time

* BatchAppenderator unit test

* Fix comment typos

* Renamed methods to make them more readable

* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator

* Missing dependency

* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.

* Replaced concurrent variables with normal ones

* Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" to make the code fall back to the previous code path.

* Style fix.

* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.

* Forgot to commit this edited documentation message
2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Charles Smith fae7ebf489
change errant 'none' configuration to 'manual': (#11218) 2021-05-10 22:04:18 -07:00
Clint Wylie 691d7a1d54
SQL timeseries no longer skip empty buckets with all granularity (#11188)
* SQL timeseries no longer skip empty buckets with all granularity

* add comment, fix tests

* the ol switcheroo

* revert unintended change

* docs and more tests

* style

* make checkstyle happy

* docs fixes and more tests

* add docs, tests for array_agg

* fixes

* oops

* doc stuffs

* fix compile, match doc style
2021-05-10 10:13:37 -07:00
frank chen fa113fb4a9
Fix default value (#11220) 2021-05-10 10:11:26 -07:00
Yuanli Han 14f1f2aa76
Fix a broken link in the development doc (#11226) 2021-05-10 16:14:06 +08:00
Yuanli Han 8647040f4d
Allow user to set group.id for Kafka ingestion task (#11147)
* allow user to set group.id for Kafka ingestion task

* fix test coverage by removing deprecated code and add doc

* fix typo

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: frank chen <frankchen@apache.org>
2021-05-09 11:56:19 +08:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Charles Smith cf2cde1d2d
add links to release notes, light refactor of landing page (#11051)
* add links to release notes, light refactor of landing page

* Update docs/design/index.md
2021-05-07 14:26:47 -07:00
benkrug 49c8307b72
Update datasource.md (#10864)
* Update datasource.md

Change "table" to "datasource" in join discussion: This means that all datasources
other than the leftmost "base" table must fit in memory.

According to docs on datasources, "datasource" is the more general term, and a table is a kind of datasource.  In the context here, then, "datasource" is applicable.

* left-hand table -> left-hand datasource

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-05-07 01:14:45 -07:00
Lasse Krogh Mammen 9be2a5cdc2
Add documentation re alphabetical sorted of MV dimensions (#10695) 2021-05-07 01:12:32 -07:00
Maytas Monsereenusorn d73f72e508
Add feature to automatically remove supervisor based on retention period (#11200)
* add auto clean up

* add test

* add test

* fix test

* Address comments

* Address comments
2021-05-06 22:25:23 -07:00
imply-jbalik 4adb121234
Fix example of prefixes for Cloud Input Sources(eg. S3) (#11192)
Fixed a syntax error in "prefix" lines in docs/ingestion/native-batch.md

S3 requires a trailing slash for directory like structures, so this updates the examples to include the trailing slashes.
2021-05-05 21:19:31 -07:00
Yuanli Han 34169c8550
fix doc (#11202)
(cherry picked from commit ffb3c049726b5e461c6f7f8b6f4b75d2cb907dcc)
2021-05-05 06:17:07 -07:00
Lucas Capistrant bb3c810b36
Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135)
* lay the groundwork for throttling replicant loads per RunRules execution

* Add dynamic coordinator config to control new replicant threshold.

* remove redundant line

* add some unit tests

* fix checkstyle error

* add documentation for new dynamic config

* improve docs and logs

* Alter how null is handled for new config. If null, manually set as default
2021-05-05 07:39:36 -05:00
Clint Wylie 554f1ffeee
ARRAY_AGG sql aggregator function (#11157)
* ARRAY_AGG sql aggregator function

* add javadoc

* spelling

* review stuff, return null instead of empty when nil input

* review stuff

* Update sql.md

* use type inference for finalize, refactor some things
2021-05-03 22:17:10 -07:00
imply-jbalik 6f7701e742
fixed array syntax (#11191) 2021-05-03 21:38:16 -07:00
sthetland ca1412d574
Reduce visibility of Tranquility documentation (#11134)
* reduce visibility of tranquility doc

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-05-03 16:48:24 -07:00
Maytas Monsereenusorn 84aac4832d
Add feature to automatically remove rules based on retention period (#11164)
* Add feature to automatically remove rules based on retention period

* Add feature to automatically remove rules based on retention period

* address comments
2021-05-03 11:50:45 -07:00
benkrug fdab95ea99
Update index.md (#11174)
tiny change for readability
2021-04-30 09:40:19 -07:00
Jeet Patel 7139c60868
Change the `id` for `kubernetes` doc link to work (#11176)
* Change the `id` for doc link to work

* Added `druid-kubernetes-extensions` to the list
2021-04-28 10:12:28 -07:00
Jeet Patel 31042cddf5
Fix `defaultMetricDimensions.json` path link (#11156) 2021-04-24 11:08:03 +08:00
Gian Merlino a47c0d2579
Clarify meaning of "root-level fields" in the documentation. (#11143) 2021-04-24 11:06:08 +08:00
Clint Wylie 57ff1f9cdb
expression aggregator (#11104)
* add experimental expression aggregator

* add test

* fix lgtm

* fix test

* adjust test

* use not null constant

* array_set_concat docs

* add equals and hashcode and tostring

* fix it

* spelling

* do multi-value magic for expression agg, more javadocs, tests

* formatting

* fix inspection

* more better

* nullable
2021-04-22 18:30:16 -07:00