druid

Commit Graph

Author	SHA1	Message	Date
Paul Rogers	aa8c615ac2	Updates to source and doc build pages (#11464 ) * Updates to source and doc build pages. Clarifies a few points for newbies. * Fixed spelling error And added spellcheck info to website README file.	2021-07-20 18:07:34 -07:00
Abhishek Agarwal	94c1671eaf	Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466 ) This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager. SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. This class will be used in SegmentManager to construct Segment objects. SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See "Make SegmentLoader extensible and customizable", #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.	2021-07-21 00:14:19 +05:30
jerryleooo	c7fdf1d685	Fix typo in ingestion spec sample (#11433 ) * Update index.md Fix typo in the ingestion spec sample * fixed more typos	2021-07-19 22:02:21 -07:00
sthetland	a366753ba5	Consolidate multi-value dimension doc and highlight configurability (#11428 ) * Clarify options for multi-value dims * Add first example	2021-07-15 10:19:10 -07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Maytas Monsereenusorn	05d5dd9289	compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded (#11426 ) * compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded * compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded * compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded * fix test * fix test	2021-07-13 09:48:06 +07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * Intelli-j check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
Joseph Glanville	d5e8d4d680	Avro union support (#10505 ) * Avro union support * Document new union support * Add support for AvroStreamInputFormat and fix checkstyle * Extend multi-member union test schema and format * Some additional docs and add Enums to spelling * Rename explodeUnions -> extractUnions * explode -> extract * ByType * Correct spelling error	2021-07-06 22:05:41 -07:00
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
sthetland	fd0931d35e	Azure data lake input source (#11153 ) * Mention Azure Data Lake * Make consistent with other entries Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-06-25 15:54:34 -07:00
Hoseung Lee	ed0a57e106	Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-06-24 21:54:48 -07:00
Yi Yuan	de8daf8139	Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351 ) * delete_buildV9Directly_in_kafka_and_kinesis_indexing_service * delete * delete them from server * delete buildV9Directly from hadoop indexing * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-23 16:36:46 -07:00
Clint Wylie	bfbd7ec432	fix a bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Charles Smith	a1ed3a407d	clarify bySegment is native only (#11331 )	2021-06-11 13:48:17 -07:00
Yi Yuan	8de0d36c52	Allow query through router when load moving average extension (#11276 ) * init commit * change NoopQuerySegmentWalker name * change doc * move NoopQuerySegmentWalker and add document * fix doc Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-10 18:46:53 +08:00
Egor Riashin	9047fa3d9c	S3 ingestion can assume role (#10995 ) * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * feature s3 assume role * tests fix * spelling fix * sts fix Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>	2021-06-09 16:02:35 +05:30
Yi Yuan	145cf9e5c3	fix document about input format (#11342 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-08 23:44:54 +08:00
frank chen	2ee7e31e5b	Fix syntax error (#11332 )	2021-06-07 22:35:02 -07:00
frank chen	d5139c9543	Fix permission problems in docker (#11299 ) * Create /opt/data to fix permission problem * eliminate symlink to avoid compatibility problem on AWS Fargate * Add a workaround section * Update instruction for named volume * Use named volume in docker-compose * Revert some doc change * Resolve review comments	2021-06-01 17:33:27 -07:00
frank chen	e664bfd433	Improve doc of movingAverage (#11262 ) * Make doc more directive Signed-off-by: frank chen <frank.chen021@outlook.com> * Add limitation Signed-off-by: frank chen <frank.chen021@outlook.com> * Suppress spelling check error	2021-05-28 13:10:55 +08:00
frank chen	60843bd11f	Add configuration suggestion to `druid.indexer.storage.type` (#11304 )	2021-05-27 06:44:47 -07:00
Xavier Léauté	b517c3339b	remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073 ) With this change, Druid will only support ZooKeeper 3.5.x and later. In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x (see #10780 for the detailed reasons) * remove ZooKeeper 3.4.x compatibility * exclude additional ZK 3.5.x netty dependencies to ensure we use our version * keep ZooKeeper version used for integration tests in sync with client library version * remove the need to specify ZK version at runtime for docker * add support to run integration tests with JDK 15 * build and run unit tests with Java 15 in travis	2021-05-25 12:49:49 -07:00
Agustin Gonzalez	4ba5738ffb	Add an issues section to deal with common issues when building druid (#11271 )	2021-05-21 09:04:51 -07:00
Charles Smith	403dcf5cfb	fixes some typos, edits for style (#11258 )	2021-05-21 08:58:39 -07:00
Charles Smith	fcb4eaa3d4	add docs for high-churn datasource cleanup (#11245 ) * add docs for high-churn datasource cleanup * fix most comments except for task log * address comments * update strategy recommendation * address addtional comments * fix * address comments * address comments from @sthetland	2021-05-20 09:48:42 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00
Charles Smith	fae7ebf489	change errant 'none' configuration to 'manual': (#11218 )	2021-05-10 22:04:18 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
frank chen	fa113fb4a9	Fix default value (#11220 )	2021-05-10 10:11:26 -07:00
Yuanli Han	14f1f2aa76	Fix a broken link in the development doc (#11226 )	2021-05-10 16:14:06 +08:00
Yuanli Han	8647040f4d	Allow user to set group.id for Kafka ingestion task (#11147 ) * allow user to set group.id for Kafka ingestion task * fix test coverage by removing deprecated code and add doc * fix typo * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: frank chen <frankchen@apache.org> Co-authored-by: frank chen <frankchen@apache.org>	2021-05-09 11:56:19 +08:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
Charles Smith	cf2cde1d2d	add links to release notes, light refactor of landing page (#11051 ) * add links to release notes, light refactor of landing page * Update docs/design/index.md	2021-05-07 14:26:47 -07:00
benkrug	49c8307b72	Update datasource.md (#10864 ) * Update datasource.md Change "table" to "datasource" in join discussion: This means that all datasources other than the leftmost "base" table must fit in memory. According to docs on datasources, "datasource" is the more general term, and a table is a kind of datasource. In the context here, then, "datasource" is applicable. * left-hand table -> left-hand datasource Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-05-07 01:14:45 -07:00
Lasse Krogh Mammen	9be2a5cdc2	Add documentation re alphabetical sorted of MV dimensions (#10695 )	2021-05-07 01:12:32 -07:00
Maytas Monsereenusorn	d73f72e508	Add feature to automatically remove supervisor based on retention period (#11200 ) * add auto clean up * add test * add test * fix test * Address comments * Address comments	2021-05-06 22:25:23 -07:00
imply-jbalik	4adb121234	Fix example of prefixes for Cloud Input Sources(eg. S3) (#11192 ) Fixed a syntax error in "prefix" lines in docs/ingestion/native-batch.md S3 requires a trailing slash for directory like structures, so this updates the examples to include the trailing slashes.	2021-05-05 21:19:31 -07:00
Yuanli Han	34169c8550	fix doc (#11202 ) (cherry picked from commit ffb3c049726b5e461c6f7f8b6f4b75d2cb907dcc)	2021-05-05 06:17:07 -07:00
Lucas Capistrant	bb3c810b36	Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135 ) * lay the groundwork for throttling replicant loads per RunRules execution * Add dynamic coordinator config to control new replicant threshold. * remove redundant line * add some unit tests * fix checkstyle error * add documentation for new dynamic config * improve docs and logs * Alter how null is handled for new config. If null, manually set as default	2021-05-05 07:39:36 -05:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
imply-jbalik	6f7701e742	fixed array syntax (#11191 )	2021-05-03 21:38:16 -07:00
sthetland	ca1412d574	Reduce visibility of Tranquility documentation (#11134 ) * reduce visibility of tranquility doc Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-05-03 16:48:24 -07:00
Maytas Monsereenusorn	84aac4832d	Add feature to automatically remove rules based on retention period (#11164 ) * Add feature to automatically remove rules based on retention period * Add feature to automatically remove rules based on retention period * address comments	2021-05-03 11:50:45 -07:00
benkrug	fdab95ea99	Update index.md (#11174 ) tiny change for readability	2021-04-30 09:40:19 -07:00
Jeet Patel	7139c60868	Change the `id` for `kubernetes` doc link to work (#11176 ) * Change the `id` for doc link to work * Added `druid-kubernetes-extensions` to the list	2021-04-28 10:12:28 -07:00
Jeet Patel	31042cddf5	Fix `defaultMetricDimensions.json` path link (#11156 )	2021-04-24 11:08:03 +08:00
Gian Merlino	a47c0d2579	Clarify meaning of "root-level fields" in the documentation. (#11143 )	2021-04-24 11:06:08 +08:00
Clint Wylie	57ff1f9cdb	expression aggregator (#11104 ) * add experimental expression aggregator * add test * fix lgtm * fix test * adjust test * use not null constant * array_set_concat docs * add equals and hashcode and tostring * fix it * spelling * do multi-value magic for expression agg, more javadocs, tests * formatting * fix inspection * more better * nullable	2021-04-22 18:30:16 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Charles Smith	09dcf6aa36	fix syntax error for loadstatus api (#11136 )	2021-04-20 14:17:20 +08:00
Gian Merlino	cb7c6ac314	Doc updates for union datasources. (#11103 ) The main one is updating datasources.md to talk about SQL. (It still said that table unions are not supported in SQL.) Also, this doc update adds some clarifying details on limitations.	2021-04-14 18:18:14 -07:00
Charles Smith	b51632b0bf	Update security overview with additional recommendations (#11016 ) * updatee security overview with additional recommendations for improved security * address first set of review questions * Update docs/operations/security-overview.md * Update docs/operations/security-overview.md * apply changes from review * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update security-overview.md fix additional comments & typos cc: @suneet-s, @jihoonsoon Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-04-14 08:58:17 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
bergmt2000	f60d8ea1c3	Update index.md (#11105 ) Fix json typo in readme for granularitySpec in compaction config example	2021-04-13 16:26:36 +08:00
Yi Yuan	0e0c1a1aaf	add protobuf inputformat (#11018 ) * add protobuf inputformat * repair pom * alter intermediateRow to type of Dynamicmessage * add document * refine test * fix document * add protoBytesDecoder * refine document and add ser test * add hash * add schema registry ser test Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-04-12 22:03:13 -07:00
Yi Yuan	d0a94a8c14	add avro stream input format (#11040 ) * add avro stream input format * bug fixed * add document * doc fix * change doc * add integretion test * bug fixed * bug fixed * add string as binary getter Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-04-12 21:53:41 -07:00
zhangyue19921010	95b82dd325	Add missing API references for coordinator (#10967 ) * add miss API references for coordinator * add miss API references for coordinator * add miss API references for coordinator Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-04-09 18:20:47 -07:00
Maytas Monsereenusorn	4576152e4a	Make dropExisting flag for Compaction configurable and add warning documentations (#11070 ) * Make dropExisting flag for Compaction configurable * fix checkstyle * fix checkstyle * fix test * add tests * fix spelling * fix docs * add IT * fix test * fix doc * fix doc	2021-04-09 00:12:28 -07:00
Lucas Capistrant	8264203cee	Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676 ) * Add ability to wait for segment availability for batch jobs * IT updates * fix queries in legacy hadoop IT * Fix broken indexing integration tests * address an lgtm flag * spell checker still flagging for hadoop doc. adding under that file header too * fix compaction IT * Updates to wait for availability method * improve unit testing for patch * fix bad indentation * refactor waitForSegmentAvailability * Fixes based off of review comments * cleanup to get compile after merging with master * fix failing test after previous logic update * add back code that must have gotten deleted during conflict resolution * update some logging code * fixes to get compilation working after merge with master * reset interrupt flag in catch block after code review pointed it out * small changes following self-review * fixup some issues brought on by merge with master * small changes after review * cleanup a little bit after merge with master * Fix potential resource leak in AbstractBatchIndexTask * syntax fix * Add a Compcation TuningConfig type * add docs stipulating the lack of support by Compaction tasks for the new config * Fixup compilation errors after merge with master * Remove erreneous newline	2021-04-08 21:03:00 -07:00
sthetland	dd4c5f2a17	Update using-caching.md (#11069 )	2021-04-08 16:48:26 -05:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
Himanshu	a0d52c3def	k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value (#10961 ) * k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value * update doc * fix test	2021-04-07 17:45:28 -07:00
Cameron Teasdale	786207995e	add minimal documentation for expression filters (#11045 ) * add minimal documentation for expression filters * Update docs/querying/filters.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/querying/filters.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/filters.md Co-authored-by: Alejandro Lujan <andanthor@gmail.com> * Update docs/querying/filters.md Co-authored-by: Alejandro Lujan <andanthor@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Alejandro Lujan <andanthor@gmail.com>	2021-04-07 16:58:28 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Jihoon Son	cc12a57034	Enforce allow list for JDBC properties by default (#11063 ) * Enforce allow list for JDBC properties by default * fix tests	2021-04-06 19:46:19 -07:00
zachjsh	8cf1e83543	Add paramter to loadstatus API to compute underdeplication against cluster view (#11056 ) * Add paramter to loadstatus API to compute underdeplication against cluster view This change adds a query parameter `computeUsingClusterView` to loadstatus apis that if specified have the coordinator compute undereplication for segments based on the number of services available within cluster that the segment can be replicated on, instead of the configured replication count configured in load rule. A default load rule is created in all clusters that specified that all segments should be replicated 2 times. As replicas are forced to be on separate nodes in the cluster, this causes the loadstatus api to report that there are under-replicated segments when there is only 1 data server in the cluster. In this case, calling loadstatus api without this new query parameter will always result in a response indicating under-replication of segments * * fix exception mapper * * Address review comments * * update external API docs * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * * update more external docs * * update javadoc * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-04-05 00:02:43 -04:00
Clint Wylie	470d659ca0	add documentation for coordinator dynamic configuration (#11052 )	2021-04-02 22:01:43 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Maytas Monsereenusorn	d7f5293364	Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (#11025 ) * Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer. * Add option to drop segments after ingestion * fix checkstyle * add tests * add tests * add tests * fix test * add tests * fix checkstyle * fix checkstyle * add docs * fix docs * address comments * address comments * fix spelling	2021-04-01 12:29:36 -07:00
Charles Smith	67dd61e6e4	remove outdated info from faq (#11053 ) * remove outdated info from faq	2021-04-01 08:13:29 -07:00
Parag Jain	b35486fa81	request logs through kafka emitter (#11036 ) * request logs through kafka emitter * travis fixes * review comments * kafka emitter unit test * new line * travis checks * checkstyle fix * count request lost when request topic is null	2021-04-01 11:31:32 +05:30
Lasse Krogh Mammen	782a1d4e6c	Add Calcite Avatica protobuf handler (#10543 )	2021-03-31 12:46:25 -07:00
Tushar Raj	6789ed0a05	Update reset-cluster.md (#10990 ) fixed Error: Could not find or load main class org.apache.druid.cli.Main	2021-03-29 20:38:35 -07:00
Charles Smith	8544d29bc7	remove experimental from Kinesis with caveats (#10998 ) * remove experimental from Kinesis with caveats * add suggested known issue * spelling fixes	2021-03-29 13:57:58 -07:00
Parag Jain	2fdc313e4d	GCS lookup support (#11026 ) * GCS lookup support * checkstyle fix * review comments * review comments * remove unused import	2021-03-30 01:40:41 +05:30
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Atul Mohan	3d7e7c2c83	Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213 ) * Skip queue removal on timeout * Clarify error * Add new config to control replication Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2021-03-17 11:34:05 -07:00
Mohammadamin Karbasforushan	dfad38d561	Fix unclear documentation of human readable byte (#10825 ) * Fix unclear documentation of human readable byte Follows https://github.com/apache/druid/pull/10203 ; See https://github.com/apache/druid/pull/10203#issuecomment-771080634 . * Fix sentence style Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-11 00:01:38 -08:00
benkrug	7f96ca8f5e	Update topnquery.md (#10944 ) minor edits of the English, no meanings changed (imo)	2021-03-09 15:19:02 -08:00
Yi Yuan	36e86a2880	Add protobuf schema registry (#10839 ) * dd_protobuf_schema_registry * change licese * delete some annotation * nodify tests * delete extra exception * add licenses * add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version * seperate kafka-protobuf-provider * modify protobuf.md * refine protobuf.md * add config and header * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-03-09 15:15:51 -08:00
Tianxin Zhao	a57c28e9ce	prometheus metric exporter (#10412 ) * prometheus-emitter * use existing jetty server to expose prometheus collection endpoint * unused variables * better variable names * removed unused dependencies * more metric definitions * reorganize * use prometheus HTTPServer instead of hooking into Jetty server * temporary empty help string * temporary non-empty help. fix incorrect dimension value in JSON (also updated statsd json) * added full help text. added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation * added documentation for prometheus emitter * safety for invalid labelNames * fix travis checks * Unit test and better sanitization of metrics names and label values * add precondition to check namespace against regex * use precompiled regex * remove static imports. fix metric types * better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start() * Update regex for label-value replacements to allow internal numeric values. Additional tests * Adds missing license header updates website/.spelling to add words used in prometheus-emitter docs. updates docs/operations/metrics.md to correct the spelling of bufferPoolName * fixes version in extensions-contrib/prometheus-emitter * fix style guide errors * update import ordering * add another word to website/.spelling * remove unthrown declared exception * remove unused import * Pushgateway strategy for metrics * typo * Format fix and nullable strategy * Update pom file for prometheus-emitter * code review comments. 
Counter to gauge for cache metrics, periodical task to pushGateway * Syntax fix * Dimension label regex include numeric character back, fix previous commit * bump prometheus-emitter pom dev version * Remove scheduled task inside poen that push metrics * Fix checkstyle * Unit test coverage * Unit test coverage * Spelling * Doc fix * spelling Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com> Co-authored-by: Michael Schiff <schiff.michael@gmail.com> Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com> Co-authored-by: Tianxin Zhao <tizhao@adobe.com>	2021-03-09 14:37:31 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Charles Smith	0f81ce32a0	refactor query caching docs (#10848 ) * refactor query caching * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * add description for context link * accept suggestions * reword, rework some awkward language * incorporate feedback, fix errors * add back perf considerations * Apply suggestions from code review applying @suneet-s 's changes Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update caching.md fix link Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-03-08 22:25:48 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Jihoon Son	16acd6686a	Remove stale 'namespace' config for JDBC lookups from doc (#10886 ) * Remove stale 'namespace' config for JDBC lookups from doc and web-console * revert webconsole change * address comments	2021-03-04 17:16:34 -08:00
Atul Mohan	be2ac8d6ce	Document type inference issues with dynamic params in SQL (#10801 ) * Clarify docs * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-04 03:48:11 -08:00
spinatelli	99198c02af	Add config and header support for confluent schema registry. (#10314 ) * Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096) * Add Eclipse Public License 2.0 to license check * Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests * Add spelling exception and remove unused dependency * Use non-deprecated getSchemaById() and remove duplicated license entry * Update docs/ingestion/data-formats.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Added check for schema being null, as per Confluent code * Missing imports and whitespace * Updated unit tests with AvroSchema Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de> Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-02-27 14:25:35 -08:00
Charles Smith	573de3bc0d	clarify security requirements around HTTPInputSource (#10914 ) * clarify security requirements around HTTPInputSource * explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk * Update docs/operations/security-overview.md Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-02-26 09:37:47 -08:00
zachjsh	67eff4110d	Improve Druid ldap auth documentation (#10915 ) * Improve Druid ldap auth documentation Improved the ldap auth docs by clarifying that the object classes and attributes noted are specific to Microsoft Active Directory, and could be different depending on the specific ldap server being used. Also emphasized the importance of the memberOf field and noted that the step about adding users to roles is only needed in certain circumstances. * * add another note * Apply suggestions from code review Co-authored-by: sthetland <steve.hetland@imply.io> * * simplify * * Address review comments Co-authored-by: sthetland <steve.hetland@imply.io>	2021-02-24 15:28:41 -08:00
Clint Wylie	f34c6eb3c0	add druid jdbc handler config for minimum number of rows per frame (#10880 ) * add druid jdbc handler config for minimum number of rows per frame * javadocs and docs adjustments * spelling * adjust docs per review with minor tweaks * adjust more	2021-02-23 02:11:04 -08:00
Clint Wylie	cbbef80c7f	add SQL operators for bitwise expressions (#10823 ) * add SQL operators for bitwise expressions * more test * fix spelling * more tests	2021-02-18 20:56:33 -08:00
sthetland	1e40f51e65	Fix example names of security artifacts in docs (#10882 ) * replacing example names * unrelated typos * unintended changes * a few more typo fixes	2021-02-16 14:58:50 -08:00
Jihoon Son	1ec3f0bd73	Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535 )" (#10871 ) This reverts commit `6b14bdb3a5`.	2021-02-09 17:51:26 -08:00

2378 Commits