druid

Commit Graph

Author	SHA1	Message	Date
Maytas Monsereenusorn	f19c2e9ce4	If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948 ) * add IT * add IT * add the fix * fix checkstyle * fix compile * fix compile * fix test * fix test * address comments	2021-03-18 17:04:28 -07:00
Suneet Saldanha	6b0c2e8996	CompactionTask throws exception on conflicting segmentGranularity (#10996 ) * CompactionTask throws exception on conflicting segmentGranularity * add comment	2021-03-16 12:51:50 -07:00
Maytas Monsereenusorn	ed91a2bb38	Fix Kinesis should not increment throwAway count on EOS record (#10976 ) * fix Kinesis increament throwAway on EOS record * fix checkstyle * fix IT * fix test * fix IT * fix IT * fix IT * fix IT	2021-03-11 22:04:58 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Maytas Monsereenusorn	b7b0ee8362	Add query granularity to compaction task (#10900 ) * add query granularity to compaction task * fix checkstyle * fix checkstyle * fix test * fix test * add tests * fix test * fix test * cleanup * rename class * fix test * fix test * add test * fix test	2021-03-02 11:23:52 -08:00
Gian Merlino	07902f607b	Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904 ) * Granularity: Introduce primitive-typed bucketStart, increment methods. Saves creation of unnecessary DateTime objects in timestamp_floor and timestamp_ceil expressions. * Fix style. * Amp up the test coverage.	2021-02-25 07:59:20 -08:00
Agustin Gonzalez	eabad0fb35	Keep query granularity of compacted segments after compaction (#10856 ) * Keep query granularity of compacted segments after compaction * Protect against null isRollup * Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment * Make sure that NONE is also included when comparing for the finer granularity * Update integration test check for segment size due to query granularity propagation affecting size * Minor code cleanup * Added functional test to verify queryGranlarity after compaction * Minor style fix * Update unit tests	2021-02-18 01:35:10 -08:00
Maytas Monsereenusorn	6541178c21	Support segmentGranularity for auto-compaction (#10843 ) * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * resolve conflict * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * fix tests * fix more tests * fix checkstyle * add unit tests * fix checkstyle * fix checkstyle * fix checkstyle * add unit tests * add integration tests * fix checkstyle * fix checkstyle * fix failing tests * address comments * address comments * fix tests * fix tests * fix test * fix test * fix test * fix test * fix test * fix test * fix test * fix test	2021-02-12 03:03:20 -08:00
Egor Riashin	5d8a69ecf9	fixed input source sampler buildReader exp (#10651 ) * fixed input source sampler buildReader exp * sampler exception unit test * styling Co-authored-by: egor-ryashin <egor.ryashin@rilldata.com>	2021-02-05 13:33:55 -08:00
Abhishek Agarwal	96d26e5338	Fix kinesis ingestion bugs (#10761 ) * add offsetFetchPeriod to kinesis ingestion doc * Remove jackson dependencies from extensions * Use fixed delay for lag collection * Metrics reset after finishing processing * comments * Broaden the list of exceptions to retry for * Unit tests * Add more tests * Refactoring * re-order metrics * Doc suggestions Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Add tests Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-02-05 02:49:58 -08:00
Jihoon Son	3f8f00a231	Fix CVE-2021-25646 (#10818 )	2021-02-04 11:21:43 -08:00
Agustin Gonzalez	0e4750bac2	Granularity interval materialization (#10742 ) * Prevent interval materialization for UniformGranularitySpec inside the overlord * Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval> * Javadoc update, respect inputIntervals contract * Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec * Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity * Fix Travis style & other checks * Refactor TreeSet to facilitate re-use in UniformGranularitySpec * Make sure intervals are unique when there is no segment granularity * Style/bugspot fixes... * More travis checks * Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method * Style & PR feedback * Fixed failing test * Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change) * Refactor so that we can get the condensed buckets without materializing the intervals * Get rid of GranularitySpec::condensedInputIntervals ... not needed * Travis failures fixes * Travis checkstyle fix * Edited/added javadoc comments and a method name (code review feedback) * Fixed jacoco coverage by moving class and adding more coverage * Avoid materializing the condensed intervals when locking * Deal with overlapping intervals * Remove code and use library code instead * Refactor intervals by granularity using the FluentIterable, add sanity checks * Change !hasNext() to inputIntervals().isEmpty() * Remove redundant lambda * Use materialized intervals here since this is outside the overlord (for performance) * Name refactor to reflect the fact that bucket intervals are sorted. * Style fixes * Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense) * Remove forbidden api * Move helper class inside common base class to reduce public space pollution	2021-01-29 06:02:10 -08:00
Abhishek Agarwal	0080e333cc	Fix cardinality estimation (#10762 ) * Fix cardinality estimation * Add unit test * code coverage * fix typo	2021-01-28 15:06:10 -08:00
Maytas Monsereenusorn	a46d561bd7	Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740 ) * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix checkstyle * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix test * fix test * add log * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * address comments * fix checkstyle * fix checkstyle * add config to skip overhead memory calculation * add test for the skipBytesInMemoryOverheadCheck config * add docs * fix checkstyle * fix checkstyle * fix spelling * address comments * fix travis * address comments	2021-01-27 00:34:56 -08:00
zhangyue19921010	2590ad4f67	Historical unloads damaged segments automatically when lazy on start. (#10688 ) * ready to test * tested on dev cluster * tested * code review * add UTs * add UTs * ut passed * ut passed * opti imports * done * done * fix checkstyle * modify uts * modify logs * changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment * merge from master * modify import orders * merge from master * merge from master * modify logs * modify docs * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci * modify logs to rerun ci Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-01-16 19:53:30 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Jihoon Son	ca32652932	Fix potential deadlock in batch ingestion (#10736 ) * Fix potential deadlock in batch ingestion * fix checkstyle and comment * this is better	2021-01-12 12:50:45 -08:00
Xavier Léauté	118b50195e	Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730 ) Today Kafka message support in streaming indexing tasks is limited to message values, and does not provide a way to expose Kafka headers, timestamps, or keys, which may be of interest to more specialized Druid input formats. For instance, Kafka headers may be used to indicate payload format/encoding or additional metadata, and timestamps are often omitted from values in Kafka streams applications, since they are included in the record. This change proposes to introduce KafkaRecordEntity as InputEntity, which would give input formats full access to the underlying Kafka record, including headers, key, timestamps. It would also open access to low-level information such as topic, partition, offset if needed. KafkaEntity is a subclass of ByteEntity for backwards compatibility with existing input formats, and to avoid introducing unnecessary complexity for Kinesis indexing tasks.	2021-01-08 16:04:37 -08:00
Liran Funaro	08ab82f55c	IncrementalIndex Tests and Benchmarks Parametrization (#10593 ) * Remove redundant IncrementalIndex.Builder * Parametrize incremental index tests and benchmarks - Reveal and fix a bug in OffheapIncrementalIndex * Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale] * Fix Intellij errors: declared exception is never thrown * Add documentation and validate before closing objects on tearDown. * Add documentation to OffheapIncrementalIndexTestSpec * Doc corrections and minor changes. * Add logging for generated rows. * Refactor new tests/benchmarks. * Improve IncrementalIndexCreator documentation * Add required tests for DataGenerator * Revert "rollupOpportunity" to be a string	2021-01-07 22:18:47 -08:00
Jonathan Wei	68bb038b31	Multiphase segment merge for IndexMergerV9 (#10689 ) * Multiphase merge for IndexMergerV9 * JSON fix * Cleanup temp files * Docs * Address logging and add IT * Fix spelling and test unloader datasource name	2021-01-05 22:19:09 -08:00
Jihoon Son	ea2d51d61f	Better error message for compaction task when it sees no segments for compaction (#10728 )	2021-01-05 16:49:57 -08:00
Jonathan Wei	769c21cc87	Add sample method to IndexingServiceClient (#10729 ) * Add sample method to IndexingServiceClient * Add unit test * Fix LGTM	2021-01-05 15:02:44 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
Gian Merlino	96a387d972	Fixes and tests related to the Indexer process. (#10631 ) * Fixes and tests related to the Indexer process. Three bugs fixed: 1) Indexers would not announce themselves as segment servers if they did not have storage locations defined. This used to work, but was broken in #9971. Fixed this by adding an "isSegmentServer" method to ServerType and updating SegmentLoadDropHandler to always announce if this method returns true. 2) Certain batch task types were written in a way that assumed "isReady" would be called before "run", which is not guaranteed. In particular, they relied on it in order to initialize "taskLockHelper". Fixed this by updating AbstractBatchIndexTask to ensure "isReady" is called before "run" for these tasks. 3) UnifiedIndexerAppenderatorsManager did not properly handle complex datasources. Introduced DataSourceAnalysis in order to fix this. Test changes: 1) Add a new "docker-compose.cli-indexer.yml" config that spins up an Indexer instead of a MiddleManager. 2) Introduce a "USE_INDEXER" environment variable that determines if docker-compose will start up an Indexer or a MiddleManager. 3) Duplicate all the jdk8 tests and run them in both MiddleManager and Indexer mode. 4) Various adjustments to encourage fail-fast errors in the Docker build scripts. 5) Various adjustments to speed up integration tests and reduce memory usage. 6) Add another Mac-specific approach to determining a machine's own IP. This was useful on my development machine. 7) Update segment-count check in ITCompactionTaskTest to eliminate a race condition (it was looking for 6 segments, which only exist together briefly, until the older 4 are marked unused). Javadoc updates: 1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX that make it clear when taskLockHelper will be initialized as a side effect. (Related to the second bug above.) 2) Task: Clarified that "isReady" is not guaranteed to be called before "run". It was already implied, but now it's explicit. 3) ZkCoordinator: Clarified deprecation message. 4) DataSegmentServerAnnouncer: Clarified deprecation message. * Fix stop_cluster script. * Fix sanity check in script. * Fix hashbang lines. * Test and doc adjustments. * Additional tests, and adjustments for tests. * Split ITs back out. * Revert change to druid_coordinator_period_indexingPeriod. * Set Indexer capacity to match MM. * Bump up Historical memory. * Bump down coordinator, overlord memory. * Bump up Broker memory.	2020-12-08 16:02:26 -08:00
Gian Merlino	9acab0b646	DruidInputSource: Sort segments by ID before grouping into splits. (#10646 ) This is useful because it groups up segments for the same time chunk into the same splits, which in turn is useful because it minimizes the number of time chunks that each task will have to deal with.	2020-12-07 13:48:24 -08:00
egor-ryashin	f46cc4faaf	Revert "fixed input source sampler buildReader exp" This reverts commit `e688db8`	2020-12-07 18:34:59 +03:00
egor-ryashin	e688db8503	fixed input source sampler buildReader exp	2020-12-07 18:28:25 +03:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
Gian Merlino	17f39ab91e	Fix misspellings in druid-forbidden-apis. (#10634 ) These caused certain APIs to not actually be properly forbidden. Also removed two MoreExecutors entries for methods that don't exist in our version of Guava.	2020-12-05 15:26:57 -08:00
Jihoon Son	7462b0b953	Allow missing intervals for Parallel task with hash/range partitioning (#10592 ) * Allow missing intervals for Parallel task * fix row filter * fix tests * fix log	2020-11-25 14:50:22 -08:00
frank chen	fe693a4f01	Improve doc and exception message for invalid user configurations (#10598 ) * improve doc and exception message * add spelling check rules and remove unused import * add a test to improve test coverage	2020-11-23 15:03:13 -08:00
Lucas Capistrant	3447934a75	Ensure Krb auth before killing YARN apps in graceful shutdown (#9785 )	2020-11-16 09:59:14 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Himanshu	ee136303bb	optionally disable all of hardcoded zookeeper use (#9507 ) * optionally disable all of hardcoded zookeeper use * fix DruidCoordinatorTest compilation * fix test in DruidCoordinatorTest * fix strict compilation Co-authored-by: Himanshu Gupta <fill email>	2020-10-26 22:35:59 -07:00
Liran Funaro	f3a2903218	Configurable Index Type (#10335 ) * Introduce a Configurable Index Type * Change to @UnstableApi * Add AppendableIndexSpecTest * Update doc * Add spelling exception * Add tests coverage * Revert some of the changes to reduce diff * Minor fixes * Update getMaxBytesInMemoryOrDefault() comment * Fix typo, remove redundant interface * Remove off-heap spec (postponed to a later PR) * Add javadocs to AppendableIndexSpec * Describe testCreateTask() * Add tests for AppendableIndexSpec within TuningConfig * Modify hashCode() to conform with equals() * Add comment where building incremental-index * Add "EqualsVerifier" tests * Revert some of the API back to AppenderatorConfig * Don't use multi-line comments * Remove knob documentation (deferred)	2020-10-23 18:34:26 -07:00
Jihoon Son	ad437dd655	Add shuffle metrics for parallel indexing (#10359 ) * Add shuffle metrics for parallel indexing * javadoc and concurrency test * concurrency * fix javadoc * Feature flag * doc * fix doc and add a test * checkstyle * add tests * fix build and address comments	2020-10-10 19:35:17 -07:00
Jihoon Son	e9e7d82714	Fix compaction task slot computation in auto compaction (#10479 ) * Fix compaction task slot computation in auto compaction * add tests for task counting	2020-10-06 21:56:03 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Abhishek Agarwal	e282ab5695	Fix the task id creation in CompactionTask (#10445 ) * Fix the task id creation in CompactionTask * review comments * Ignore test for range partitioning and segment lock	2020-10-01 15:02:04 -07:00
Mainak Ghosh	8168e14e92	Adding task slot count metrics to Druid Overlord (#10379 ) * Adding more worker metrics to Druid Overlord * Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better * Few more instance of worker usage replaced with peon * Modifying the peon idle count logic to only use eligible workers available capacity * Changing the naming to task slot count instead of peon * Adding some unit test coverage for the new test runner apis * Addressing Review Comments * Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics * Fixing the spelling issue in the docs * Setting the annotation Nullable on the TaskSlotCountStatsProvider methods	2020-09-28 23:50:38 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Jonathan Wei	cb30b1fe23	Automatically determine numShards for parallel ingestion hash partitioning (#10419 ) * Automatically determine numShards for parallel ingestion hash partitioning * Fix inspection, tests, coverage * Docs and some PR comments * Adjust locking * Use HllSketch instead of HyperLogLogCollector * Fix tests * Address some PR comments * Fix granularity bug * Small doc fix	2020-09-24 13:47:53 -07:00
Maytas Monsereenusorn	72f1b55f56	Add last_compaction_state to sys.segments table (#10413 ) * Add is_compacted to sys.segments table * change is_compacted to last_compaction_state * fix tests * fix tests * address comments	2020-09-23 15:29:36 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Jihoon Son	f82fd22fa7	Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308 ) * Move tools for indexing to TaskToolbox instead of injecting them in constructor * oops, other changes * fix test * unnecessary new file * fix test * fix build	2020-08-26 17:08:12 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Jihoon Son	b9ff3483ac	Add support for all partitioing schemes for auto compaction (#10307 ) * Add support for all partitioing schemes for auto compaction * annotate last compaction state for multi phase parallel indexing * fix build and tests * test * better home	2020-08-26 13:19:18 -07:00
Abhishek Agarwal	d4ac62f284	Handle internal kinesis sequence numbers when reporting lag (#10315 ) * Handle internal kinesis sequence numbers when reporting lag * add unit test	2020-08-26 11:27:37 -07:00
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00

1 2 3 4 5 ...

1804 Commits