druid

Commit Graph

Author	SHA1	Message	Date
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Gian Merlino	98ecbb21cd	Remove CloseQuietly and migrate its usages to other methods. (#10247 ) * Remove CloseQuietly and migrate its usages to other methods. These other methods include: 1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions in RuntimeExceptions for callers that just want to avoid dealing with checked exceptions. Most usages were migrated to this method, because it looks like they were mainly attempts to avoid declaring a throws clause, and perhaps were unintentionally suppressing IOExceptions. 2) New method CloseableUtils.closeInCatch, designed to properly close something in a catch block without losing exceptions. Some usages from catch blocks were migrated here, when it seemed that they were intended to avoid checked exception handling, and did not really intend to also suppress IOExceptions. 3) New method CloseableUtils.closeAndSuppressExceptions, which sends all exceptions to a "chomper" that consumes them. Nothing is thrown or returned. The behavior is slightly different: with this method, _all_ exceptions are suppressed, not just IOExceptions. Calls that seemed like they had good reason to suppress exceptions were migrated here. 4) Some calls were migrated to try-with-resources, in cases where it appeared that CloseQuietly was being used to avoid throwing an exception in a finally block. 🎵 You don't have to go home, but you can't stay here... 🎵 * Remove unused import. * Fix up various issues. * Adjustments to tests. * Fix null handling. * Additional test. * Adjustments from review. * Fixup style stuff. * Fix NPE caused by holder starting out null. * Fix spelling. * Chomp Throwables too.	2021-10-23 17:03:21 -07:00
Gian Merlino	cb9bc15e95	Fix task report streaming in https setups. (#11739 ) * Fix task report streaming in https setups. * Trivial change to re-trigger ITs.	2021-10-22 19:07:29 -07:00
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
David Bar	7d4841471f	Optimize supervisor history retrieval for specific id (#11807 ) Optimization. Fetch from the metadata store only the relevant history items for the requested supervisor id.	2021-10-19 14:08:25 +05:30
Kashif Faraz	f2d6100124	Require Datasource WRITE authorization for Supervisor and Task access (#11718 ) Follow up PR for #11680 Description Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE authorization even if they are purely informative. Changes Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks" Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history Check Datasource for all Indexing Task APIs	2021-10-08 10:39:48 +05:30
Maytas Monsereenusorn	3c487ff5b4	fix broken build (#11727 )	2021-09-21 22:59:51 +07:00
Lucas Capistrant	5c3f3da146	Add handoff wait time to IngestionStatsAndErrorsTaskReportData (#11090 ) * Add handoff wait time to ingestion stats report. Refactor some code for batch handoff * fix checkstyle * Add assertion to AbstractITBatchIndexTask to make sure report reflects wait for segments happened * add docs to the task reports section of doc	2021-09-20 22:48:44 -07:00
Jonathan Wei	22b41ddbbf	Task reports for parallel task: single phase and sequential mode (#11688 ) * Task reports for parallel task: single phase and sequential mode * Address comments * Add null check for currentSubTaskHolder	2021-09-16 13:58:11 -05:00
Kashif Faraz	6779c4652d	Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680 ) * Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter * Remove unused imports * Add SupervisorResourceFilterTest * Verify mocks in test	2021-09-09 11:55:30 -07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Agustin Gonzalez	9efa6cc9c8	Make persists concurrent with adding rows in batch ingestion (#11536 ) * Make persists concurrent with ingestion * Remove semaphore but keep concurrent persists (with add) and add push in the backround as well * Go back to documented default persists (zero) * Move to debug * Remove unnecessary Atomics * Comments on synchronization (or not) for sinks & sinkMetadata * Some cleanup for unit tests but they still need further work * Shutdown & wait for persists and push on close * Provide support for three existing batch appenderators using batchProcessingMode flag * Fix reference to wrong appenderator * Fix doc typos * Add BatchAppenderators class test coverage * Add log message to batchProcessingMode final value, fix typo in enum name * Another typo and minor fix to log message * LEGACY->OPEN_SEGMENTS, Edit docs * Minor update legacy->open segments log message * More code comments, mostly small adjustments to naming etc * fix spelling * Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage * Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments" Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-09-08 13:31:52 -07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
Clint Wylie	f2ac6cd96e	fix parse exception handling for stream parsers (#11556 ) * fix parse exception handling * fix style and inspections	2021-08-09 12:40:44 -07:00
Maytas Monsereenusorn	06bae29979	Fix ingestion task failure when no input split to process (#11553 ) * fix ingestion task failure when no input split to process * add IT * fix IT	2021-08-09 23:11:08 +07:00
Rohan Garg	1a562f444c	Cleanup hadoop dependencies in indexing modules (#11516 ) * Remove hadoop-yarn-common dependency (cherry picked from commit d767c8f3d204d9d27d8122d55680c3c9f1cfe473) * Remove hdfs dependency from druid core	2021-08-03 17:56:54 -07:00
Agustin Gonzalez	a2da407b70	Add error msg to parallel task's TaskStatus (#11486 ) * Add error msg to parallel task's TaskStatus * Consolidate failure block * Add failure test * Make it fail * Add fail while stopped * Simplify hash task test using a runner that fails after so many runs (parameter) * Remove unthrown exception * Use runner names to identify phase * Added range partition kill test & fixed a timing bug with the custom runner * Forbidden api * Style * Unit test code cleanup * Added message to invalid state exception and improved readability of the phase error messages for the parallel task failure unit tests	2021-08-02 12:11:28 -07:00
Harini Rajendran	995d99d9e4	add ingest/notices/queueSize metric to give visibility into supervisor notices queue size (#11417 )	2021-07-30 07:59:26 -07:00
Jonathan Wei	9b250c54aa	Allow kill task to mark segments as unused (#11501 ) * Allow kill task to mark segments as unused * Add IndexerSQLMetadataStorageCoordinator test * Update docs/ingestion/data-management.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Add warning to kill task doc Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-07-29 10:48:43 -05:00
Maytas Monsereenusorn	c068906fca	Make intermediate store for shuffle tasks an extension point (#11492 ) * add interface * add docs * fix errors * fix injection * fix injection * update javadoc	2021-07-27 11:29:43 +07:00
Abhishek Agarwal	ce1faa5635	Make SegmentLoader extensible and customizable (#11398 ) This PR refactors the code related to segment loading specifically SegmentLoader and SegmentLoaderLocalCacheManager. SegmentLoader is marked UnstableAPI which means, it can be extended outside core druid in custom extensions. Here is a summary of changes SegmentLoader returns an instance of ReferenceCountingSegment instead of Segment. Earlier, SegmentManager was wrapping Segment objects inside ReferenceCountingSegment. That is now moved to SegmentLoader. With this, a custom implementation can track the references of segments. It also allows them to create custom ReferenceCountingSegment implementations. For this reason, the constructor visibility in ReferenceCountingSegment is changed from private to protected. SegmentCacheManager has two additional methods called - reserve(DataSegment) and release(DataSegment). These methods let the caller reserve or release space without calling SegmentLoader#getSegment. We already had similar methods in StorageLocation and now they are available in SegmentCacheManager too which wraps multiple locations. Refactoring to simplify the code in SegmentCacheManager wherever possible. There is no change in the functionality.	2021-07-22 18:00:49 +05:30
Jihoon Son	0453e461f6	Add errorMessage in taskStatus for task failures in middleManagers/indexers/peons (#11446 ) * Add error message; add unit tests for ForkingTaskRunner * add tests * fix comment * unused import * add exit code in error message * fix test	2021-07-20 21:34:53 -07:00
Abhishek Agarwal	94c1671eaf	Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466 ) This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager. SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects. SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.	2021-07-21 00:14:19 +05:30
Jihoon Son	8729b40893	Add the error message in taskStatus for task failures in overlord (#11419 ) * add error messages in taskStatus for task failures in overlord * unused imports * add helper message for logs to look up * fix tests * fix counting the same task failures more than once * same fix for HttpRemoteTaskRunner	2021-07-15 13:14:28 -07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Agustin Gonzalez	7e61042794	Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294 ) * Bound memory in native batch ingest create segments * Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that * Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged * Changed name from RealtimeAppenderator to StreamAppenderator * Style * Incorporating tests from StreamAppenderatorTest * Keep totalRows and cleanup code * Added missing dep * Fix unit test * Checkstyle * allowIncrementalPersists should always be true for batch * Added sinks metadata * clear sinks metadata when closing appenderator * Style + minor edits to log msgs * Update sinks metadata & totalRows when dropping a sink (segment) * Remove max * Intelli-j check * Keep a count of hydrants persisted by sink for sanity check before merge * Move out sanity * Add previous hydrant count to sink metadata * Remove redundant field from SinkMetadata * Remove unneeded functions * Cleanup unused code * Removed unused code * Remove unused field * Exclude it from jacoco because it is very hard to get branch coverage * Remove segment announcement and some other minor cleanup * Add fallback flag * Minor code cleanup * Checkstyle * Code review changes * Update batchMemoryMappedIndex name * Code review comments * Exclude class from coverage, will include again when packaging gets fixed * Moved test classes to server module * More BatchAppenderator cleanup * Fix bug in wrong counting of totalHydrants plus minor cleanup in add * Removed left over comments * Have BatchAppenderator follow the Appenderator contract for push & getSegments * Fix LGTM violations * Review comments * Add stats after push is done * Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag, remove real time flag stuff from stream appenderator, etc.) * Update javadocs * Add thread safety notice to BatchAppenderator * Further cleanup config * More config cleanup	2021-07-09 00:10:29 -07:00
Harini Rajendran	4c90c0c21d	Adding more debug logs to increase visibility into StreamSupervisor notices queue size and processing time. (#11415 )	2021-07-08 16:15:15 -07:00
Jason Koch	d3220693f3	perf: improve concurrency and improve perf for task query in HeapMemoryTaskStorage (#11272 ) * perf: improve concurrency and reduce algorithmic cost for task querying in HeapMemoryTaskStorage * fix: address intellij linter concern regarding use of ConcurrentMap interface * nit: document thread safety of HeapMemoryTaskStorage * empty to trigger ci	2021-07-05 10:07:26 +08:00
Abhishek Agarwal	03a6a6d6e1	Replace Processing ExecutorService with QueryProcessingPool (#11382 ) This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.	2021-07-01 16:03:08 +05:30
Yi Yuan	de8daf8139	Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351 ) * delete_buildV9Directly_in_kafka_and_kinesis_indexing_service * delete * delete them from server * delete buildV9Directly from hadoop indexing * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-23 16:36:46 -07:00
Xavier Léauté	c8b3f8cc00	avoid logging pause message multiple times (#11375 ) In some instances the ingestion thread could be woken up spuriously, resulting in the "Received pause command..." log message getting logged multiple times. This change ensures we only log it once the first time the pause is requested.	2021-06-22 09:30:38 -07:00
Kashif Faraz	f0b105ec63	Temporarily skip compaction for locked intervals (#11190 ) * Add overlord API /lockedIntervals. Skip compaction for locked intervals * Revert formatting changes * Add license info * Fix checkstyle * Remove invalid API invocation * Fix checkstyle * Add DatasourceIntervalsTest * Fix checkstyle * Remove LockedIntervalsResponse * Add integration tests for lockedIntervals * Add ITAutoCompactionLockContentionTest * Add config druid.coordinator.compaction.skipLockedIntervals * Add test for TaskQueue	2021-06-20 17:21:59 -07:00
Xavier Léauté	712f2a5d00	upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363 ) * upgrade error-prone to 2.7.1 and support checks with Java 11+ - upgrade error-prone to 2.7.1 - support running error-prone with Java 11 and above using -Xplugin instead of custom compiler - add compiler arguments to ignore warnings/errors in Java 15/16 - introduce strictCompile property to enable strict profiles since we now need multiple strict profiles for Java 8 - properly exclude all generated source files from error-prone - fix druid-processing overriding annotation processors from parent pom - fix druid-core disabling most non-default checks - align plugin and annotation errorprone versions - fix / suppress additional issues found by error-prone: * fix bug in SeekableStreamSupervisor initializing ArrayList size with the taskGroupdId * fix missing @Override annotations - remove outdated compiler plugin in benchmarks - remove deleted ParameterPackage error-prone rule - re-enable checks on benchmark module as well * fix IntelliJ inspections * disable LongFloatConversion due to bug in error-prone with JDK 8 * add comment about InsecureCrypto	2021-06-16 12:55:34 -07:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
zachjsh	27f1b6cbf3	Fix Index hadoop failing with index.zip is not a valid DFS filename (#11316 ) * * Fix bug * * simplify class loading * * fix example configs for integration tests * Small classloader cleanup Co-authored-by: jon-wei <jon.wei@imply.io>	2021-06-01 19:14:50 -04:00
Xavier Léauté	b517c3339b	remove ZooKeeper 3.4 support + pass tests with Java 15 (#11073 ) With this change, Druid will only support ZooKeeper 3.5.x and later. In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x (see #10780 for the detailed reasons) * remove ZooKeeper 3.4.x compatibility * exclude additional ZK 3.5.x netty dependencies to ensure we use our version * keep ZooKeeper version used for integration tests in sync with client library version * remove the need to specify ZK version at runtime for docker * add support to run integration tests with JDK 15 * build and run unit tests with Java 15 in travis	2021-05-25 12:49:49 -07:00
Jihoon Son	4100c5edc0	Fix taskQueue to honor (#11243 ) useLineageBasedSegmentAllocation in default taskContext	2021-05-14 09:30:50 +08:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
zachjsh	99f39c7202	Hadoop segment index file rename (#11194 ) * Do stuff * Do more stuff * * Do more stuff * * Do more stuff * * working * * cleanup * * more cleanup * * more cleanup * * add license header * * Add unit tests * * add java docs * * add more unit tests * * Cleanup test * * Move removing of workingPath to index task rather than in hadoop job. * * Address review comments * * remove unused import * * Address review comments * Do not overwrite segment descriptor for segment if it already exists. * * add comments to FileSystemHelper class * * fix local hadoop integration test * * Fix failing test failures when running with java11 * Revert "Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)" This reverts commit `49a9c3ffb7`. * * remove JobHelperPowerMockTest * * remove FileSystemHelper class	2021-05-04 20:22:18 -04:00
Jonathan Wei	49a9c3ffb7	Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075 )" (#11151 ) This reverts commit `a2892d9c40`.	2021-04-22 15:33:27 -07:00
zachjsh	a2892d9c40	Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075 ) * Do stuff * Do more stuff * * Do more stuff * * Do more stuff * * working * * cleanup * * more cleanup * * more cleanup * * add license header * * Add unit tests * * add java docs * * add more unit tests * * Cleanup test * * Move removing of workingPath to index task rather than in hadoop job. * * Address review comments * * remove unused import * * Address review comments * Do not overwrite segment descriptor for segment if it already exists. * * add comments to FileSystemHelper class * * fix local hadoop integration test	2021-04-21 12:24:31 -07:00
Maytas Monsereenusorn	4576152e4a	Make dropExisting flag for Compaction configurable and add warning documentations (#11070 ) * Make dropExisting flag for Compaction configurable * fix checkstyle * fix checkstyle * fix test * add tests * fix spelling * fix docs * add IT * fix test * fix doc * fix doc	2021-04-09 00:12:28 -07:00
Lucas Capistrant	8264203cee	Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676 ) * Add ability to wait for segment availability for batch jobs * IT updates * fix queries in legacy hadoop IT * Fix broken indexing integration tests * address an lgtm flag * spell checker still flagging for hadoop doc. adding under that file header too * fix compaction IT * Updates to wait for availability method * improve unit testing for patch * fix bad indentation * refactor waitForSegmentAvailability * Fixes based off of review comments * cleanup to get compile after merging with master * fix failing test after previous logic update * add back code that must have gotten deleted during conflict resolution * update some logging code * fixes to get compilation working after merge with master * reset interrupt flag in catch block after code review pointed it out * small changes following self-review * fixup some issues brought on by merge with master * small changes after review * cleanup a little bit after merge with master * Fix potential resource leak in AbstractBatchIndexTask * syntax fix * Add a Compcation TuningConfig type * add docs stipulating the lack of support by Compaction tasks for the new config * Fixup compilation errors after merge with master * Remove erreneous newline	2021-04-08 21:03:00 -07:00
Xavier Léauté	15bdd6bc2f	Fix unit tests and GC settings for Java 15 (#11074 ) * JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it * Fix flaky HTTP client tests with Java 15 * Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15	2021-04-08 10:33:37 -07:00
Maytas Monsereenusorn	d7f5293364	Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (#11025 ) * Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer. * Add option to drop segments after ingestion * fix checkstyle * add tests * add tests * add tests * fix test * add tests * fix checkstyle * fix checkstyle * add docs * fix docs * address comments * address comments * fix spelling	2021-04-01 12:29:36 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	a041933017	Allow overlapping intervals for the compaction task (#10912 ) * Allow overlapping intervals for the compaction task * unused import * line indentation Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2021-03-23 11:21:54 -07:00
Maytas Monsereenusorn	f19c2e9ce4	If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948 ) * add IT * add IT * add the fix * fix checkstyle * fix compile * fix compile * fix test * fix test * address comments	2021-03-18 17:04:28 -07:00

1 2 3 4 5 ...

1853 Commits