druid

Commit Graph

Author	SHA1	Message	Date
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Rohan Garg	1a562f444c	Cleanup hadoop dependencies in indexing modules (#11516 ) * Remove hadoop-yarn-common dependency (cherry picked from commit d767c8f3d204d9d27d8122d55680c3c9f1cfe473) * Remove hdfs dependency from druid core	2021-08-03 17:56:54 -07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * Intelli-j check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
Xavier Léauté	b517c3339b	remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073 ) With this change, Druid will only support ZooKeeper 3.5.x and later. In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x (see #10780 for the detailed reasons) * remove ZooKeeper 3.4.x compatibility * exclude additional ZK 3.5.x netty dependencies to ensure we use our version * keep ZooKeeper version used for integration tests in sync with client library version * remove the need to specify ZK version at runtime for docker * add support to run integration tests with JDK 15 * build and run unit tests with Java 15 in travis	2021-05-25 12:49:49 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Jihoon Son	ad437dd655	Add shuffle metrics for parallel indexing (#10359 ) * Add shuffle metrics for parallel indexing * javadoc and concurrency test * concurrency * fix javadoc * Feature flag * doc * fix doc and add a test * checkstyle * add tests * fix build and address comments	2020-10-10 19:35:17 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
morrifeldman	f6594fff60	Fix missing temp dir for native single_dim (#10046 ) * Fix missing temp dir for native single_dim Native single dim indexing throws a file not found exception from InputEntityIteratingReader.java:81. This MR creates the required temporary directory when setting up the PartialDimensionDistributionTask. The change was tested on a Druid cluster. After installing the change native single_dim indexing completes successfully. * Fix indentation * Use SinglePhaseSubTask as example for creating the temp dir * Move temporary indexing dir creation in to TaskToolbox * Remove unused dependency Co-authored-by: Morri Feldman <morri@appsflyer.com>	2020-06-25 14:41:22 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Chi Cao Minh	e8146d5914	More superbatch range partitioning tests (#9266 ) More functional tests to cover handling of input data that has a partition dimension that contains: 1) Null values: Should be in first partition 2) Multi values: Should cause superbatch task to abort	2020-02-10 15:17:53 -08:00
Chi Cao Minh	0b0056b77f	More tests for range partition parallel indexing (#9232 ) Add more unit tests for range partition native batch parallel indexing. Also, fix a bug where ParallelIndexPhaseRunner incorrectly thinks that identical collected DimensionDistributionReports are not equal due to not overriding equals() in DimensionDistributionReport.	2020-01-21 12:59:43 -08:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Chi Cao Minh	ff6217365b	Refactor parallel indexing perfect rollup partitioning (#8852 ) * Refactor parallel indexing perfect rollup partitioning Refactoring to make it easier to later add range partitioning for perfect rollup parallel indexing. This is accomplished by adding several new base classes (e.g., PerfectRollupWorkerTask) and new classes for encapsulating logic that needs to be changed for different partitioning strategies (e.g., IndexTaskInputRowIteratorBuilder). The code is functionally equivalent to before except for the following small behavior changes: 1) PartialSegmentMergeTask: Previously, this task had a priority of DEFAULT_TASK_PRIORITY. It now has a priority of DEFAULT_BATCH_INDEX_TASK_PRIORITY (via the new PerfectRollupWorkerTask base class), since it is a batch index task. 2) ParallelIndexPhaseRunner: A decorator was added to subTaskSpecIterator to ensure the subtasks are generated with unique ids. Previously, only tests (i.e., MultiPhaseParallelIndexingTest) would have this decorator, but this behavior is desired for non-test code as well. * Fix forbidden apis and pmd warnings * Fix analyze dependencies warnings * Fix IndexTask json and add IT diags * Fix parallel index supervisor<->worker serde * Fix TeamCity inspection errors/warnings * Fix TeamCity inspection errors/warnings again * Integrate changes with those from #8823 * Address review comments * Address more review comments * Fix forbidden apis * Address more review comments	2019-11-20 17:24:12 -08:00
Roman Leventov	5c0fc0a13a	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 ) * IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals() * Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever * Fix tests * More fixes * Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code * Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase * More test fixes * More test fixes * Add a comment to VersionedIntervalTimelineTestBase * Fix tests * Set DataSegment.size(0) in more tests * Specify DataSegment.size(0) in more places in tests * Fix more tests * Fix DruidSchemaTest * Set DataSegment's size in more tests and benchmarks * Fix HdfsDataSegmentPusherTest * Doc changes addressing comments * Extended doc for visibility * Typo * Typo 2 * Address comment	2019-11-06 11:07:04 -08:00
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
Dylan Wylie	b2821a8371	do not exclude client core jar (#8339 ) make indexing service depend on hadoop client	2019-08-26 13:48:24 -07:00
Chi Cao Minh	ab71a2e1e4	Revert "Fix dependency analyze warnings (#8128 )" (#8189 ) This reverts commit `5dd0d8e873`.	2019-07-29 11:42:16 -07:00
Chi Cao Minh	5dd0d8e873	Fix dependency analyze warnings (#8128 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports. * Fix licenses and dependencies * Fix licenses and dependencies again * Fix integration test dependency * Address review comments * Fix unit test dependencies * Fix integration test dependency * Fix integration test dependency again * Fix integration test dependency third time * Fix integration test dependency fourth time * Fix compile error * Fix assert package	2019-07-26 10:49:03 -07:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Jonathan Wei	fafbc4a80e	Set version to 0.15.0-incubating-SNAPSHOT (#7014 )	2019-02-07 14:02:52 -08:00
Jonathan Wei	8bc5eaa908	Set version to 0.14.0-incubating-SNAPSHOT (#7003 )	2019-02-04 19:36:20 -08:00
Clint Wylie	84598fba3b	combine druid-api, druid-common, java-util into druid-core (#6443 ) * combine druid-api, druid-common, java-util * spacing	2018-10-14 20:37:37 -07:00
David Lim	20ab213ba6	change project versions to 0.13.0-incubating-SNAPSHOT (#6453 )	2018-10-11 19:28:01 -07:00
Roman Leventov	3ae563263a	Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957 ) This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR. Some of the changes: - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names. - Generified `ComplexMetricExtractor`	2018-10-02 14:50:22 -03:00
Gian Merlino	431d3d8497	Rename io.druid to org.apache.druid. (#6266 ) * Rename io.druid to org.apache.druid. * Fix META-INF files and remove some benchmark results. * MonitorsConfig update for metrics package migration. * Reorder some dimensions in inner queries for some reason. * Fix protobuf tests.	2018-08-30 09:56:26 -07:00
Gian Merlino	04ea3c9f8c	Update license headers. (#5976 ) * Update license headers. For compliance with http://www.apache.org/legal/src-headers.html. * More license adjustments. * Fix mistakenly edited package line.	2018-07-11 09:55:18 -07:00
Jihoon Son	9dca5ec76b	Simple cleanup for ThreadPoolTaskRunner and SetAndVerifyContextQueryRunner / Add ThreadPoolTaskRunnerTest (#5557 ) * Simple fix for ThreadPoolTaskRunner * fix build * address comments * update javadoc * fix build * fix test * add dependency	2018-05-15 22:53:11 +05:30
Gian Merlino	7e02408510	Update versions to 0.13.0-SNAPSHOT. (#5323 )	2018-02-02 12:06:38 -06:00
Roman Leventov	5787d04fad	Bump Druid version to 0.12.0 (#5138 )	2017-12-15 07:37:01 -08:00
Gian Merlino	1f2074c247	Bump versions in master to 0.11.1-SNAPSHOT. (#4878 ) * Bump versions in master to 0.11.1-SNAPSHOT. * Missed a few.	2017-09-28 17:09:51 -05:00
Roman Leventov	ae900a4934	Update versions to 0.11.0-SNAPSHOT (#4483 )	2017-06-28 17:05:58 -07:00
Gian Merlino	1f2afccdf8	Expressions: Add ExprMacros. (#4365 ) * Expressions: Add ExprMacros, which have the same syntax as functions, but can convert themselves to any kind of Expr at parse-time. ExprMacroTable is an extension point for adding new ExprMacros. Anything that might need to parse expressions needs an ExprMacroTable, which can be injected through Guice. * Address code review comments.	2017-06-08 09:32:10 -04:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191 )	2017-04-20 18:12:28 -07:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133 )	2016-06-13 13:10:38 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00

1 2 3 4 5 ...

503 Commits