druid

Commit Graph

Author	SHA1	Message	Date
Charles Smith	797371598d	update syntax for golbal cached uri lookups (#10629 )	2020-12-24 09:49:01 -08:00
Xavier Léauté	b7a16d08a6	Update Apache Kafka to 2.7.0 (#10701 ) - align scala versions to match Kafka	2020-12-22 13:56:00 -08:00
Lucas Capistrant	58ce2e55d8	Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284 ) * dynamic coord config adding more balancing control add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This config caps the number of segments that are iterated over when selecting a segment to move. The default value combined with current balancing strategies will still iterate over all provided segments. However, setting this value to something > 0 will cap the number of segments visited. This could make sense in cases where a cluster has a very large number of segments and the admins prefer less iterations vs a thorough consideration of all segments provided. * fix checkstyle failure * Make doc more detailed for admin to understand when/why to use new config * refactor PR to use a % of segments instead of raw number * update the docs * remove bad doc line * fix typo in name of new dynamic config * update RservoirSegmentSampler to gracefully deal with values > 100% * add handler for <= 0 in ReservoirSegmentSampler * fixup CoordinatorDynamicConfigTest naming and argument ordering * fix items in docs after spellcheck flags * Fix lgtm flag on missing space in string literal * improve documentation for new config * Add default value to config docs and add advice in cluster tuning doc * Add percentOfSegmentsToConsiderPerMove to web console coord config dialog * update jest snapshot after console change * fix spell checker errors * Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider * add new config back to web console module after merge with master * fix ReservoirSegmentSamplerTest * fix line breaks in coordinator console dialog * Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove * Make improvements based off of feedback in review * additional cleanup coming from review * Add a warning log if limit on segments to consider for move can't be calcluated * remove unused import * fix tests for CoordinatorDynamicConfig * remove precondition test that is redundant in CoordinatorDynamicConfig Builder class	2020-12-22 08:27:55 -08:00
Maytas Monsereenusorn	5bd7924296	Fix kinesis integration test (#10696 ) * fix kinesis IT * fix checkstyle	2020-12-21 12:57:40 -08:00
Clint Wylie	92e5700e1e	fix integration test override config which requires environment variables before calling compose (#10694 )	2020-12-18 17:57:07 -08:00
keefe roedersheimer	ca3b925133	allow server selection to be aware of query (#10428 ) * add query through to server selector * add nullable extensions, deprecate old methods with defaults * style changes * add nullable to ServerSelectorStrategy * fix test coverage * missing override in test * add null check	2020-12-18 13:56:19 -08:00
Maytas Monsereenusorn	6f2ce8f0a5	fix Kinesis It (#10692 )	2020-12-18 13:47:00 -08:00
Gian Merlino	57ee8ce4e7	CompressionUtils: Read the entire stream when unzipping from a stream. (#10664 ) * CompressionUtils: Read the entire stream when unzipping from a stream. Should fix #6905 by making sure we avoid closing partially-read streams. * CHECKSTYLE!	2020-12-17 22:52:04 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
Abhishek Agarwal	796c25532e	Fix post-aggregator computation when used with subtotals (#10653 ) * Fix post-aggregator computation * remove commented code * Fix numeric null handling * Add test when subquery returns null long	2020-12-17 20:10:26 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
zhangyue19921010	1884c35698	Do Integrate test for Druid base on K8s cluster (#10669 ) * add a travls job to do integrate test on K8s * revert build_run_cluster.sh * revert msic * run IT test * ready to test * modify before/after script * done * change mod for script * done * add env DRUID_OPERATOR_VERSION=0.0.3 * change version Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-16 16:00:42 -08:00
Vadim Ogievetsky	55b8cc428a	remove extra word (#10682 )	2020-12-15 23:02:33 -08:00
Abhishek Agarwal	7a8e9bb156	Fix hadoop docker copy script (#10671 )	2020-12-14 23:08:50 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Harini Rajendran	c2e26d2e1c	Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679 ) Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior. Also added example config files for indexer process that can be used to spin up the indexer process.	2020-12-14 19:04:14 -08:00
zhangyue19921010	0ad27c06da	Historical load Segments enhancement (#10650 ) * load segments with segment files check * add more java docs * done * add java docs * revert misc * resolve ci failures * resolve ci failures * done Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-14 13:56:03 -08:00
Clint Wylie	64f97e7003	fix DruidSchema incorrectly listing tables with no segments (#10660 ) * fix race condition with DruidSchema tables and dataSourcesNeedingRebuild * rework to see if it passes analysis * more better * maybe this * re-arrange and comments	2020-12-11 14:14:00 -08:00
Gian Merlino	753fa6b3bd	IdUtils: Forbid characters that cannot be used in znodes. (#10659 ) * IdUtils: Forbid characters that cannot be used in znodes. * Fix whitespace.	2020-12-10 10:49:40 -08:00
Himanshu	be019760bb	document DynamicConfigProvider for kafka consumer properties (#10658 ) * document DynamicConfigProvider for kafka consumer properties * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/development/extensions-core/kafka-ingestion.md * fix doc build Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-12-10 08:24:33 -08:00
Jihoon Son	abcf624a2e	Bump up jackson-databind to 2.10.5.1 (#10655 ) * Bump up jackson version to 2.10.5.1 * only jackson-databind * license	2020-12-09 13:54:47 -08:00
Vadim Ogievetsky	577cd66002	Web console: reflect the changes to interval requirement in the data loader flow (#10647 ) * no need for intervals * don't set redundant fields * fix tests * better filter control * work with not * wrap callout with form group * update snapshot * add split hint * highlight issues with spec * fixes * fix default value * move intervals back to partition step * work with all sorts of chars * fix enabled view	2020-12-09 10:18:42 -08:00
Harini Rajendran	9d2df506ea	Return appropriate config directory path for indexer process. (#10657 ) Currently `getConfPath` function returns an empty string for `indexer` service. So the config directory path for this service is not set properly when installed using docker environment.	2020-12-09 09:30:36 -08:00
Atul Mohan	44df05b8b2	Clarify split hint spec behavior (#10656 )	2020-12-09 08:24:32 -06:00
Abhishek Agarwal	4ea1ab8531	Fix links in the grouping function doc (#10654 )	2020-12-09 14:56:32 +08:00
Gian Merlino	96a387d972	Fixes and tests related to the Indexer process. (#10631 ) * Fixes and tests related to the Indexer process. Three bugs fixed: 1) Indexers would not announce themselves as segment servers if they did not have storage locations defined. This used to work, but was broken in #9971. Fixed this by adding an "isSegmentServer" method to ServerType and updating SegmentLoadDropHandler to always announce if this method returns true. 2) Certain batch task types were written in a way that assumed "isReady" would be called before "run", which is not guaranteed. In particular, they relied on it in order to initialize "taskLockHelper". Fixed this by updating AbstractBatchIndexTask to ensure "isReady" is called before "run" for these tasks. 3) UnifiedIndexerAppenderatorsManager did not properly handle complex datasources. Introduced DataSourceAnalysis in order to fix this. Test changes: 1) Add a new "docker-compose.cli-indexer.yml" config that spins up an Indexer instead of a MiddleManager. 2) Introduce a "USE_INDEXER" environment variable that determines if docker-compose will start up an Indexer or a MiddleManager. 3) Duplicate all the jdk8 tests and run them in both MiddleManager and Indexer mode. 4) Various adjustments to encourage fail-fast errors in the Docker build scripts. 5) Various adjustments to speed up integration tests and reduce memory usage. 6) Add another Mac-specific approach to determining a machine's own IP. This was useful on my development machine. 7) Update segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused). Javadoc updates: 1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX that make it clear when taskLockHelper will be initialized as a side effect. (Related to the second bug above.) 2) Task: Clarified that "isReady" is not guaranteed to be called before "run". It was already implied, but now it's explicit. 3) ZkCoordinator: Clarified deprecation message. 4) DataSegmentServerAnnouncer: Clarified deprecation message. * Fix stop_cluster script. * Fix sanity check in script. * Fix hashbang lines. * Test and doc adjustments. * Additional tests, and adjustments for tests. * Split ITs back out. * Revert change to druid_coordinator_period_indexingPeriod. * Set Indexer capacity to match MM. * Bump up Historical memory. * Bump down coordinator, overlord memory. * Bump up Broker memory.	2020-12-08 16:02:26 -08:00
Vyatcheslav Mogilevsky	5324785eac	integration tests fix: update base image for hadoop containers to centos 7 (#10638 ) LGTM	2020-12-08 11:00:51 -08:00
frank chen	c410648630	fix injection failure of StorageLocationSelectorStrategy objects (#10363 ) * fix to allow customer storage location selector strategy * add test cases to check instance of selector strategy * update doc * code format * resolve code review comments * inject StorageLocation * fix CI * fix mismatched license item reported by CI * change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy * using a helper method to bind to correct property path	2020-12-08 09:48:31 -08:00
Vadim Ogievetsky	e3f7217546	Web console: Improve the handling of extreme data (funky datasources, longs) (#10641 ) * better API escape * fix escaping issue, bigints * update licenses * fix align * do not show Query with SQL if no SQL * add prettify script * update dev readme * add ordering to the datasource list * add ordering to supervisor table	2020-12-08 09:25:14 -08:00
Gian Merlino	9acab0b646	DruidInputSource: Sort segments by ID before grouping into splits. (#10646 ) This is useful because it groups up segments for the same time chunk into the same splits, which in turn is useful because it minimizes the number of time chunks that each task will have to deal with.	2020-12-07 13:48:24 -08:00
Abhishek Agarwal	26d74b3580	Add grouping_id function (#10518 ) * First draft of grouping_id function * Add more tests and documentation * Add calcite tests * Fix travis failures * bit of a change * Add documentation * Fix typos * typo fix	2020-12-07 11:46:29 -08:00
Gian Merlino	b681861f05	Speed up integration tests in two ways. (#10648 ) 1) Accelerate coordinator runs to speed up segment load after publishing. 2) For streaming ingestion tests, Instead of waiting 3 minutes for data to load, wait until the expected number of rows is loaded. Also updates segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused).	2020-12-07 10:59:29 -08:00
egor-ryashin	f46cc4faaf	Revert "fixed input source sampler buildReader exp" This reverts commit `e688db8`	2020-12-07 18:34:59 +03:00
egor-ryashin	e688db8503	fixed input source sampler buildReader exp	2020-12-07 18:28:25 +03:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
Gian Merlino	17f39ab91e	Fix misspellings in druid-forbidden-apis. (#10634 ) These caused certain APIs to not actually be properly forbidden. Also removed two MoreExecutors entries for methods that don't exist in our version of Guava.	2020-12-05 15:26:57 -08:00
Maytas Monsereenusorn	7eb5f59a9a	Fix string byte calculation in StringDimensionIndexer (#10623 ) * fix string byte calculation * fix tests * fix test	2020-12-04 00:51:48 -08:00
Liran Funaro	52d46cebc3	Move common configurations to TuningConfig (#10478 ) * Move common methods that are used in HadoopTuningConfig and in AppenderatorConfig to TuningConfig * Rename rowFlushBoundary in HadoopTuningConfig to maxRowsInMemory to match TuningConfig API	2020-12-03 18:13:32 -08:00
zhangyue19921010	229b5f359f	Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better (#10551 ) * remove build in kafka consumer config : * modify druid docs of kafka indexing service * yuezhang * modify doc * modify docs * fix kafkaindexTaskTest.java * revert uncessary change * add more logs and modify docs * revert jdk version * modify docs * modify-kafka-version v2 * modify docs * modify docs * modify docs * modify docs * modify docs * done * remove useless import * change code and add UT Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-03 17:37:59 -08:00
Jihoon Son	ae6c43de71	Add an integration test for HTTP inputSource (#10620 )	2020-12-03 15:51:56 -08:00
Valdemar	2cd017b7aa	Fix the config initialization on container restart (#10458 )	2020-12-03 12:03:00 -08:00
Himanshu	813e18774e	make dimension column extensible with COMPLEX type (#10277 ) * make dimension column extensible with COMPLEX type * more changes Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d * processing module changes for build fix Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e * rename ColumnCapabilities.getTypeName() to getComplexTypeName() * rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)	2020-12-03 08:58:17 -08:00
Suneet Saldanha	c94be8a945	Revert "Update google client libraries (#10536 )" (#10599 ) This reverts commit `4537016cad`.	2020-12-03 20:14:52 +05:30
Himanshu	7e9522870f	introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309 ) * introduce DynamicConfigProvider interface and make kafka consumer props extensible * fix intellij inspection error * make DynamicConfigProvider generic Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3 * deprecate PasswordProvider	2020-12-02 16:38:27 -08:00
Atul Mohan	f965464f36	Fix empty directory handling (#10319 ) Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-12-02 10:37:08 -08:00
Lucas Capistrant	2e02eebd9d	Add context dimension to DefaultQueryMetrics (#10578 ) * Add context dimension to DefaultQueryMetrics * remove redundant addition of context dimension from DruidMetrics now that QueryMetrics adds it by default * update SearchQueryMetrics to reflect the same pattern as other default dimensions in QueryMetrics * add PublicApi annotation for context in QueryMetrics Interface	2020-12-01 18:34:03 -08:00
zhangyue19921010	e7e07eab11	[Improve Doc] : Modify the disadvantages of the lazyLoadOnStart feature. (#10608 ) * modify docs * modify docs Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-01 18:33:22 -08:00
frank chen	24f1e35b5d	fix desc of 'required' for granularity property (#10616 )	2020-12-01 18:29:51 -08:00
Vadim Ogievetsky	5b06c7a3a9	Web console: improve how code is imported, use API instance (#10597 ) * fix imports * clean up imports * update DQT to fix escaping	2020-12-01 13:16:14 -08:00
Jihoon Son	d47d6cf081	Add time-to-first-result benchmark for groupBy (#10612 )	2020-12-01 10:32:37 -08:00

10729 Commits