druid

mirror of https://github.com/apache/druid.git synced 2025-02-21 09:46:21 +00:00

Author	SHA1	Message	Date
Gian Merlino	11c0da8097	Add availability and consistency docs. (#10149 ) * Add availability and consistency docs. Describes transactional ingestion and atomic replacement. Also, this patch deletes some bad advice from the javadocs for SegmentTransactionalInsertAction. * Fix missing word.	2020-07-07 15:22:52 -07:00
Parag Jain	98ac7dfeff	mask secrets in MM task command log (#10128 ) * mask secrets in MM task command log * unit test for masked iterator * checkstyle fix	2020-07-07 10:25:15 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Suneet Saldanha	363d0d86be	QueryCountStatsMonitor can be injected in the Peon (#10092 ) * QueryCountStatsMonitor can be injected in the Peon This change fixes a dependency injection bug where there is a circular dependency on getting the MonitorScheduler when a user configures the QueryCountStatsMonitor to be used. * fix tests * Actually fix the tests this time	2020-06-29 21:03:07 -07:00
Chi Cao Minh	33a37d85d7	Fix native batch range partition segment sizing (#10089 ) * Fix native batch range partition segment sizing Fixes #10057. Native batch range partitioning was only considering the partition dimension value when grouping rows instead of using all of the row's partition values. Thus, for schemas with multiple dimensions, the rollup was overestimated, which would cause too many dimension values to be packed into the same range partition. The resulting segments would then be overly large (and not honor the target or max partition sizes). Main changes: - PartialDimensionDistributionTask: Consider all dimension values when grouping row - RangePartitionMultiPhaseParallelIndexingTest: Regression test by having input with rows that should roll up and rows that should not roll up * Use hadoop & native hash ingestion row group key	2020-06-29 17:49:52 -07:00
Jihoon Son	c591ff8ea8	Add NonnullPair (#10013 ) * Add NonnullPair * new line * test * make it consistent	2020-06-26 09:52:06 -07:00
morrifeldman	f6594fff60	Fix missing temp dir for native single_dim (#10046 ) * Fix missing temp dir for native single_dim Native single dim indexing throws a file not found exception from InputEntityIteratingReader.java:81. This MR creates the required temporary directory when setting up the PartialDimensionDistributionTask. The change was tested on a Druid cluster. After installing the change native single_dim indexing completes successfully. * Fix indentation * Use SinglePhaseSubTask as example for creating the temp dir * Move temporary indexing dir creation in to TaskToolbox * Remove unused dependency Co-authored-by: Morri Feldman <morri@appsflyer.com>	2020-06-25 14:41:22 -07:00
Jihoon Son	aaee72c781	Allow append to existing datasources when dynamic partitioning is used (#10033 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports * Allow append to existing datasources when dynamic partitioing is used * fix test * checkstyle * checkstyle * fix test * fix test * fix other tests.. * checkstyle * hansle unknown core partitions size in overlord segment allocation * fail to append when numCorePartitions is unknown * log * fix comment; rename to be more intuitive * double append test * cleanup complete(); add tests * fix build * add tests * address comments * checkstyle	2020-06-25 13:37:31 -07:00
Jihoon Son	d644a27f1a	Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports	2020-06-18 18:40:43 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Jihoon Son	9a10f8352b	Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle	2020-06-12 21:39:37 -07:00
Clint Wylie	c5d6163c76	add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946 ) * move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data * newlines * make test coverage not fail maybe * remove useless test * Update pom.xml * Update GeneratorInputSourceTest.java * less passive aggressive test names	2020-06-09 19:31:04 -07:00
Jonathan Wei	771870ae2d	Load broadcast datasources on broker and tasks (#9971 ) * Load broadcast datasources on broker and tasks * Add javadocs * Support HTTP segment management * Fix indexer maxSize * inspection fix * Make segment cache optional on non-historicals * Fix build * Fix inspections, some coverage, failed tests * More tests * Add CliIndexer to MainTest * Fix inspection * Rename UnprunedDataSegment to LoadableDataSegment * Address PR comments * Fix	2020-06-08 20:15:59 -07:00
Yuanli Han	ee7bda5d8a	Fix compact partially overlapping segments (#9905 ) * fix compact overlapping segments * fix comment * fix CI failure	2020-06-08 09:54:39 -07:00
Mainak Ghosh	bcc066a27f	Empty partitionDimension has less rollup compared to when explicitly specified (#9861 ) * Empty partitionDimension has less rollup compared to the case when it is explicitly specified * Adding a unit test for the empty partitionDimension scenario. Fixing another test which was failing * Fixing CI Build Inspection Issue * Addressing all review comments * Updating the javadocs for the hash method in HashBasedNumberedShardSpec	2020-06-05 12:42:42 -07:00
Jihoon Son	474f6fc99b	Fix shutdown reason for unknown tasks in taskQueue (#9954 ) * Fix shutdown reason for unknown tasks in taskQueue * unused imports	2020-06-03 15:40:28 -07:00
Xavier Léauté	a934b2664c	remove ListenableFutures and revert to using the Guava implementation (#9944 ) This change removes ListenableFutures.transformAsync in favor of the existing Guava Futures.transform implementation. Our own implementation had a bug which did not fail the future if the applied function threw an exception, resulting in the future never completing. An attempt was made to fix this bug, however when running againts Guava's own tests, our version failed another half dozen tests, so it was decided to not continue down that path and scrap our own implementation. Explanation for how was this bug manifested itself: An exception thrown in BaseAppenderatorDriver.publishInBackground when invoked via transformAsync in StreamAppenderatorDriver.publish will cause the resulting future to never complete. This explains why when encountering https://github.com/apache/druid/issues/9845 the task will never complete, forever waiting for the publishFuture to register the handoff. As a result, the corresponding "Error while publishing segments ..." message only gets logged once the index task times out and is forcefully shutdown when the future is force-cancelled by the executor.	2020-06-03 10:46:03 -07:00
Clint Wylie	c690d10a7d	support customized factory.json via IndexSpec for segment persist (#9957 ) * support customized factory.json via IndexSpec for segment persist * equals verifier	2020-06-01 16:36:32 -07:00
Maytas Monsereenusorn	5b4b5d77a8	Fails creation of TaskResource if availabilityGroup is null (#9892 ) * Fails creation of TaskResource if availabilityGroup is null * add check for requiredCapacity	2020-05-19 22:19:22 -07:00
Clint Wylie	2e9548d93d	refactor SeekableStreamSupervisor usage of RecordSupplier (#9819 ) * refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting * fix style and test * cleanup, refactor, javadocs, test * fixes * keep collecting current offsets and lag if unhealthy in background reporting thread * review stuffs * add comment	2020-05-16 14:09:39 -07:00
Jihoon Son	46beaa0640	Fix potential resource leak in ParquetReader (#9852 ) * Fix potential resource leak in ParquetReader * add test * never thrown exception * catch potential exceptions	2020-05-16 09:57:12 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814 ) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColmns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Clint Wylie	267a6cc175	low hanging fruit - presize hash map for DruidSegmentReader (#9836 )	2020-05-07 12:39:14 -07:00
Jihoon Son	6674d721bc	Avoid sorting values in InDimFilter if possible (#9800 ) * Avoid sorting values in InDimFilter if possible * tests * more tests * fix and and or filters * fix build * false and true vector matchers * fix vector matchers * checkstyle * in filter null handling * remove wrong test * address comments * remove unnecessary null check * redundant separator * address comments * typo * tests	2020-05-06 15:26:36 -07:00
Jihoon Son	964a1fc9df	Remove ParseSpec.toInputFormat() (#9815 ) * Remove toInputFormat() from ParseSpec * fix test	2020-05-05 11:17:57 -07:00
Maytas Monsereenusorn	8b78eebdbd	Test reading from empty kafka/kinesis partitions (#9729 ) * add test for stream sequence number returns null * fix checkstyle * add index test for when stream returns null * retrigger test	2020-04-27 10:23:56 -07:00
Jihoon Son	7fa72fbf15	Initialize SettableByteEntityReader only when inputFormat is not null (#9734 ) * Lazy initialization of SettableByteEntityReader to avoid NPE * toInputFormat for tsv * address comments * common code	2020-04-24 10:22:51 -07:00
Suneet Saldanha	642fe83897	Indexing Service validates externally received taskId (#9666 ) Addresses issues flagged by https://lgtm.com/rules/5970070/	2020-04-10 10:36:26 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339 ) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
Maytas Monsereenusorn	b95a1b9878	Fix NPE in RemoteTaskRunner event handler causes JVM shutdown (#9610 ) * Fix NPE in RemoteTaskRunner event handler causes JVM shutdown * address comments * fix compile * fix checkstyle * fix lgtm * fix merge * fix test * fix tests * change scope * address comments * address comments	2020-04-07 14:53:51 -07:00
Clint Wylie	d267b1c414	check paths used for shuffle intermediary data manager get and delete (#9630 ) * check paths used for shuffle intermediary data manager get and delete * add test * newline * meh	2020-04-07 09:47:18 -07:00
Jihoon Son	82ce60b5c1	Reuse transformer in stream indexing (#9625 ) * Reuse transformer in stream indexing * remove unused method * memoize complied pattern	2020-04-06 16:36:08 -07:00
Suneet Saldanha	af3337dac8	DruidInputSource can add new dimensions during re-ingestion (#9590 ) * WIP integration tests * Add integration test for ingestion with transformSpec * WIP almost working tests * Add ignored tests * checkstyle stuff * remove newPage from index task ingestion spec * more test cleanup * still not quite working * Actually disable the tests * working tests * fix codestyle * dont use junit in integration tests * actually fix the bug * fix checkstyle * bring index tests closer to reindex tests	2020-04-02 17:32:31 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Xavier Léauté	b4ad3d0d88	fix nullhandling exceptions related to test ordering (#9570 ) * fix nullhandling exceptions related to test ordering Tests might get executed in different order depending on the maven version and the test environment. This may lead to "NullHandling module not initialized" errors for some tests where we do not initialize null-handling explicitly. * use InitializedNullHandlingTest	2020-03-27 09:46:31 -07:00
Suneet Saldanha	55c08e0746	DruidSegmentReader should work if timestamp is specified as a dimension (#9530 ) * DruidSegmentReader should work if timestamp is specified as a dimension * Add integration tests Tests for compaction and re-indexing a datasource with the timestamp column * Instructions to run integration tests against quickstart * address pr	2020-03-25 13:47:34 -07:00
Clint Wylie	bf85ea19b2	roaring bitmaps by default (#9548 ) * it is finally time * fix it * more docs * fix doc	2020-03-23 18:15:57 -07:00
Clint Wylie	142742f291	add kinesis lag metric (#9509 ) * add kinesis lag metric * fixes * heh * do it right this time * more test * split out supervisor report lags into lagMillis, remove latest offsets from kinesis supervisor report since always null, review stuffs	2020-03-16 21:39:53 -07:00
Jihoon Son	7401bb3f93	Improve OvershadowableManager performance (#9441 ) * Use the iterator instead of higherKey(); use the iterator API instead of stream * Fix tests; fix a concurrency bug in timeline * fix test * add tests for findNonOvershadowedObjectsInInterval * fix test * add missing tests; fix a bug in QueueEntry * equals tests * fix test	2020-03-10 13:22:19 -07:00
Clint Wylie	8b9fe6f584	query laning and load shedding (#9407 ) * prototype * merge QueryScheduler and QueryManager * everything in its right place * adjustments * docs * fixes * doc fixes * use resilience4j instead of semaphore * more tests * simplify * checkstyle * spelling * oops heh * remove unused * simplify * concurrency tests * add SqlResource tests, refactor error response * add json config tests * use LongAdder instead of AtomicLong * remove test only stuffs from scheduler * javadocs, etc * style * partial review stuffs * adjust * review stuffs * more javadoc * error response documentation * spelling * preserve user specified lane for NoSchedulingStrategy * more test, why not * doc adjustment * style * missed review for make a thing a constant * fixes and tests * fix test * Update docs/configuration/index.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc update Co-authored-by: sthetland <steve.hetland@imply.io>	2020-03-10 02:57:16 -07:00
Jihoon Son	f456d2fcf8	Resource leak in DruidSegmentReader (#9476 ) * Close the Yielder in DruidSegmentReader * forbidden api	2020-03-09 10:05:25 -07:00
Chi Cao Minh	4ed83f6af6	Fix superbatch merge last partition boundaries (#9448 ) * Fix superbatch merge last partition boundaries A bug in the computation for the last parallel merge partition could cause an IndexOutOfBoundsException or precondition failure due to an empty partition. * Improve comments and tests	2020-03-04 10:35:21 -08:00
Lijia Liu	063811710e	#8690 use utc interval when create pedding segments (#9142 ) Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2020-02-26 13:20:59 -08:00
Jihoon Son	3bc7ae782c	Create splits of multiple files for parallel indexing (#9360 ) * Create splits of multiple files for parallel indexing * fix wrong import and npe in test * use the single file split in tests * rename * import order * Remove specific local input source * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc and error msg * fix build * fix a test and address comments Co-authored-by: sthetland <steve.hetland@imply.io>	2020-02-24 17:34:39 -08:00
Jihoon Son	3bb9e7e53a	Inject things instead of subclassing everything for parallel task testing (#9353 ) * Inject things instead of subclassing everything for parallel task testing * javadoc * fix compilation * fix wrong merge * Address comments	2020-02-16 13:00:12 -08:00
Chi Cao Minh	e8146d5914	More superbatch range partitioning tests (#9266 ) More functional tests to cover handling of input data that has a partition dimension that contains: 1) Null values: Should be in first partition 2) Multi values: Should cause superbatch task to abort	2020-02-10 15:17:53 -08:00
Suneet Saldanha	51d7864935	Codestyle - use java style array declaration (#9338 ) * Codestyle - use java style array declaration Replaced C-style array declarations with java style declarations and marked the intelliJ inspection as an error * cleanup test code	2020-02-10 14:25:26 -08:00
Clint Wylie	831ec172f1	Logging large segment list handling (#9312 ) * better handling of large segment lists in logs * more * adjust * exceptions * fixes * refactor * debug * heh * dang	2020-02-07 21:42:45 -08:00
Jihoon Son	e81230f9ab	Refactoring some codes around ingestion (#9274 ) * Refactoring codes around ingestion: - Parallel index task and simple task now use the same segment allocator implementation. This is reusable for the future implementation as well. - Added PartitionAnalysis to store the analysis of the partitioning - Move some util methods to SegmentLockHelper and rename it to TaskLockHelper * fix build * fix SingleDimensionShardSpecFactory * optimize SingledimensionShardSpecFactory * fix test * shard spec builder * import order * shardSpecBuilder -> partialShardSpec * build -> complete * fix comment; add unit tests for partitionBoundaries * add more tests and fix javadoc * fix toString(); add serde tests for HashBasedNumberedPartialShardSpec and SegmentAllocateAction * fix test * add equality test for hash and range partial shard specs	2020-02-07 16:23:07 -08:00
Lucas Capistrant	53bb45fc9a	Forbid easily misused HashSet and HashMap constructors (#9165 ) * Forbid easily misused HashSet and HashMap constructors * Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them * Fix visibility of constant in CollectionUtils.java * Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used * revert changes to sql module tests that should be in separate PR * Finish reverting changes to sql module tests that were flagged in checkstyle during CI * Add netty dependency resulting from SupressForbidden	2020-02-07 10:44:09 +03:00

1748 Commits