druid

Commit Graph

Author	SHA1	Message	Date
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Lucas Capistrant	8264203cee	Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676 ) * Add ability to wait for segment availability for batch jobs * IT updates * fix queries in legacy hadoop IT * Fix broken indexing integration tests * address an lgtm flag * spell checker still flagging for hadoop doc. adding under that file header too * fix compaction IT * Updates to wait for availability method * improve unit testing for patch * fix bad indentation * refactor waitForSegmentAvailability * Fixes based off of review comments * cleanup to get compile after merging with master * fix failing test after previous logic update * add back code that must have gotten deleted during conflict resolution * update some logging code * fixes to get compilation working after merge with master * reset interrupt flag in catch block after code review pointed it out * small changes following self-review * fixup some issues brought on by merge with master * small changes after review * cleanup a little bit after merge with master * Fix potential resource leak in AbstractBatchIndexTask * syntax fix * Add a Compcation TuningConfig type * add docs stipulating the lack of support by Compaction tasks for the new config * Fixup compilation errors after merge with master * Remove erreneous newline	2021-04-08 21:03:00 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
zhangyue19921010	8b4f966708	[BUG FIX]Kinesis lag keep increasing when there is no more new data for kinesis stream (#11006 ) * fix kinesis lag metrics bug and modify current UT * done * revert misc.xml * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-19 07:47:27 -07:00
Maytas Monsereenusorn	ed91a2bb38	Fix Kinesis should not increment throwAway count on EOS record (#10976 ) * fix Kinesis increament throwAway on EOS record * fix checkstyle * fix IT * fix test * fix IT * fix IT * fix IT * fix IT	2021-03-11 22:04:58 -08:00
Maytas Monsereenusorn	4dd22a850b	Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962 ) * Fix streaming ingestion fails and halt if it encounters empty rows * address comments	2021-03-09 12:11:58 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Abhishek Agarwal	96d26e5338	Fix kinesis ingestion bugs (#10761 ) * add offsetFetchPeriod to kinesis ingestion doc * Remove jackson dependencies from extensions * Use fixed delay for lag collection * Metrics reset after finishing processing * comments * Broaden the list of exceptions to retry for * Unit tests * Add more tests * Refactoring * re-order metrics * Doc suggestions Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Add tests Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-02-05 02:49:58 -08:00
Maytas Monsereenusorn	a46d561bd7	Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740 ) * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix checkstyle * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix test * fix test * add log * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * address comments * fix checkstyle * fix checkstyle * add config to skip overhead memory calculation * add test for the skipBytesInMemoryOverheadCheck config * add docs * fix checkstyle * fix checkstyle * fix spelling * address comments * fix travis * address comments	2021-01-27 00:34:56 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Xavier Léauté	118b50195e	Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730 ) Today Kafka message support in streaming indexing tasks is limited to message values, and does not provide a way to expose Kafka headers, timestamps, or keys, which may be of interest to more specialized Druid input formats. For instance, Kafka headers may be used to indicate payload format/encoding or additional metadata, and timestamps are often omitted from values in Kafka streams applications, since they are included in the record. This change proposes to introduce KafkaRecordEntity as InputEntity, which would give input formats full access to the underlying Kafka record, including headers, key, timestamps. It would also open access to low-level information such as topic, partition, offset if needed. KafkaEntity is a subclass of ByteEntity for backwards compatibility with existing input formats, and to avoid introducing unnecessary complexity for Kinesis indexing tasks.	2021-01-08 16:04:37 -08:00
Jonathan Wei	769c21cc87	Add sample method to IndexingServiceClient (#10729 ) * Add sample method to IndexingServiceClient * Add unit test * Fix LGTM	2021-01-05 15:02:44 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Liran Funaro	f3a2903218	Configurable Index Type (#10335 ) * Introduce a Configurable Index Type * Change to @UnstableApi * Add AppendableIndexSpecTest * Update doc * Add spelling exception * Add tests coverage * Revert some of the changes to reduce diff * Minor fixes * Update getMaxBytesInMemoryOrDefault() comment * Fix typo, remove redundant interface * Remove off-heap spec (postponed to a later PR) * Add javadocs to AppendableIndexSpec * Describe testCreateTask() * Add tests for AppendableIndexSpec within TuningConfig * Modify hashCode() to conform with equals() * Add comment where building incremental-index * Add "EqualsVerifier" tests * Revert some of the API back to AppenderatorConfig * Don't use multi-line comments * Remove knob documentation (deferred)	2020-10-23 18:34:26 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Jihoon Son	f82fd22fa7	Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308 ) * Move tools for indexing to TaskToolbox instead of injecting them in constructor * oops, other changes * fix test * unnecessary new file * fix test * fix build	2020-08-26 17:08:12 -07:00
Abhishek Agarwal	d4ac62f284	Handle internal kinesis sequence numbers when reporting lag (#10315 ) * Handle internal kinesis sequence numbers when reporting lag * add unit test	2020-08-26 11:27:37 -07:00
Suneet Saldanha	e6c9142129	Add validation for authenticator and authorizer name (#10106 ) * Add validation for authorizer name * fix deps * add javadocs * Do not use resource filters * Fix BasicAuthenticatorResource as well * Add integration tests * fix test * fix	2020-07-13 21:15:54 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Suneet Saldanha	363d0d86be	QueryCountStatsMonitor can be injected in the Peon (#10092 ) * QueryCountStatsMonitor can be injected in the Peon This change fixes a dependency injection bug where there is a circular dependency on getting the MonitorScheduler when a user configures the QueryCountStatsMonitor to be used. * fix tests * Actually fix the tests this time	2020-06-29 21:03:07 -07:00
Harshpreet Singh	d96aa1586a	retry 500 and 503 errors against kinesis (#10059 ) * retry 500 and 503 errors against kinesis * add test that exercises retry logic * more branch coverage * retry 500 and 503 on getRecords request when fetching sequence numberu Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>	2020-06-23 15:49:34 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Jonathan Wei	771870ae2d	Load broadcast datasources on broker and tasks (#9971 ) * Load broadcast datasources on broker and tasks * Add javadocs * Support HTTP segment management * Fix indexer maxSize * inspection fix * Make segment cache optional on non-historicals * Fix build * Fix inspections, some coverage, failed tests * More tests * Add CliIndexer to MainTest * Fix inspection * Rename UnprunedDataSegment to LoadableDataSegment * Address PR comments * Fix	2020-06-08 20:15:59 -07:00
Xavier Léauté	a934b2664c	remove ListenableFutures and revert to using the Guava implementation (#9944 ) This change removes ListenableFutures.transformAsync in favor of the existing Guava Futures.transform implementation. Our own implementation had a bug which did not fail the future if the applied function threw an exception, resulting in the future never completing. An attempt was made to fix this bug, however when running againts Guava's own tests, our version failed another half dozen tests, so it was decided to not continue down that path and scrap our own implementation. Explanation for how was this bug manifested itself: An exception thrown in BaseAppenderatorDriver.publishInBackground when invoked via transformAsync in StreamAppenderatorDriver.publish will cause the resulting future to never complete. This explains why when encountering https://github.com/apache/druid/issues/9845 the task will never complete, forever waiting for the publishFuture to register the handoff. As a result, the corresponding "Error while publishing segments ..." message only gets logged once the index task times out and is forcefully shutdown when the future is force-cancelled by the executor.	2020-06-03 10:46:03 -07:00
Clint Wylie	c2c38f6ac2	only close exec if it exists (#9952 )	2020-05-29 20:09:34 -07:00
Clint Wylie	2e9548d93d	refactor SeekableStreamSupervisor usage of RecordSupplier (#9819 ) * refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting * fix style and test * cleanup, refactor, javadocs, test * fixes * keep collecting current offsets and lag if unhealthy in background reporting thread * review stuffs * add comment	2020-05-16 14:09:39 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814 ) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColmns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Jihoon Son	964a1fc9df	Remove ParseSpec.toInputFormat() (#9815 ) * Remove toInputFormat() from ParseSpec * fix test	2020-05-05 11:17:57 -07:00
Maytas Monsereenusorn	8b78eebdbd	Test reading from empty kafka/kinesis partitions (#9729 ) * add test for stream sequence number returns null * fix checkstyle * add index test for when stream returns null * retrigger test	2020-04-27 10:23:56 -07:00
Clint Wylie	d267b1c414	check paths used for shuffle intermediary data manager get and delete (#9630 ) * check paths used for shuffle intermediary data manager get and delete * add test * newline * meh	2020-04-07 09:47:18 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Clint Wylie	142742f291	add kinesis lag metric (#9509 ) * add kinesis lag metric * fixes * heh * do it right this time * more test * split out supervisor report lags into lagMillis, remove latest offsets from kinesis supervisor report since always null, review stuffs	2020-03-16 21:39:53 -07:00
Roman Leventov	b9186f8f9f	Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306 ) * Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error * Fix brace * Import order * Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill * Fix tests * Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY * More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters * Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig * More variable and method renames * Rename MetadataSegments to SegmentsMetadata * Javadoc update * Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs * Update Javadoc of VersionedIntervalTimeline.iterateAllObjects() * Reorder imports * Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers * Complete merge * Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests * Remove MetadataSegmentManager * Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments * Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder * Fix inspections * Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest * Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods * Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator * Unused import * Optimize imports * Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata() * Unused import * Update terminology in datasource-view.tsx * Fix label in datasource-view.spec.tsx.snap * Fix lint errors in datasource-view.tsx * Doc improvements * Another attempt to please TSLint * Another attempt to please TSLint * Style fixes * Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge) * Try to fix docs build issue * Javadoc and spelling fixes * Rename SegmentsMetadata to SegmentsMetadataManager, address other comments * Address more comments	2020-01-27 11:24:29 -08:00
Gian Merlino	19b427e8f3	Add JoinableFactory interface and use it in the query stack. (#9247 ) * Add JoinableFactory interface and use it in the query stack. Also includes InlineJoinableFactory, which enables joining against inline datasources. This is the first patch where a basic join query actually works. It includes integration tests. * Fix test issues. * Adjustments from code review.	2020-01-24 13:10:01 -08:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Chi Cao Minh	513bb1f6da	Get proper Kinesis index task AWS credentials (#9082 ) Previously, the configured S3 credentials would be used instead of the ones configured for Kinesis for Kinesis index tasks.	2019-12-20 19:35:05 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Jonathan Wei	00ce18a0ea	Additional Kinesis resharding fixes (#8870 ) * Additional Kinesis resharding fixes * Address PR comments * Remove unused method * Adjust SegmentTransactionalInsertAction null handling * Check for unchanged metadata on empty publish * Add logs for empty publish * Fix javadoc * Clear offset when invalid endOffsets are seen * Fix LGTM alert * Fix build * Add resharding note to Kinesis docs * Checkstyle * Spelling * Address PR comments * Checkstyle	2019-11-28 12:59:01 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Jihoon Son	ac6d703814	Support inputFormat and inputSource for sampler (#8901 ) * Support inputFormat and inputSource for sampler * Cleanup javadocs and names * fix style * fix timed shutoff input source reader * fix timed shutoff input source reader again * tidy up timed shutoff reader * unused imports * fix tc	2019-11-20 14:51:25 -08:00
Surekha	d628bebbd7	Make supervisor API similar to submit task API (#8810 ) * accept spec or dataSchema, tuningConfig, ioConfig while submitting task json * fix test * update docs * lgtm warning * Add original constructor back to IndexTask to minimize changes * fix indentation in docs * Allow spec to be specified in supervisor schema * undo IndexTask spec changes * update docs * Add Nullable and deprecated annotations * remove deprecated configs from SeekableStreamSupervisorSpec * remove nullable annotation	2019-11-20 10:04:41 -08:00
Clint Wylie	3fcaa1a61b	fix sql compatible null handling config work with runtime.properties (#8876 ) * fix sql compatible null handling config work with runtime.properties * fix npe * fix tests * add friendly error * comment, and friendlier still * fix compile * fix from merges	2019-11-20 03:55:29 -08:00
Rye	d0913475b7	sampler returns nulls in CSV (#8871 ) * sampler returns nulls in CSV * fixed kafka sampler test * fix Kinesis test * sql compatibility fix * remove null to empty string conversion, use null * fix sql compatibility	2019-11-19 13:59:44 -08:00
Gian Merlino	c44452f0c1	Tidy up lifecycle, query, and ingestion logging. (#8889 ) * Tidy up lifecycle, query, and ingestion logging. The goal of this patch is to improve the clarity and usefulness of Druid's logging for cluster operators. For more information, see https://twitter.com/cowtowncoder/status/1195469299814555648. Concretely, this patch does the following: - Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the goal of reducing redundancy and improving clarity by avoiding showing rarely-useful log messages. This includes most "starting" and "stopping" messages, and most messages related to individual columns. - Adds new log4j2 templates that show operators how to enabled DEBUG logging for certain important packages. - Eliminate stack traces for query errors, unless log level is DEBUG or more. This is useful because query errors often indicate user error rather than system error, but dumping stack trace often gave operators the impression that there was a system failure. - Adds task id to Appenderator, AppenderatorDriver thread names. In the default log4j2 configuration, this will put them in log lines as well. It's very useful if a user is using the Indexer, where multiple tasks run in the same JVM. - More consistent terminology when it comes to "sequences" (sets of segments that are handed-off together by Kafka ingestion) and "offsets" (cursors in partitions). These terms had been confused in some log messages due to the fact that Kinesis calls offsets "sequence numbers". - Replaces some ugly toString calls with either the JSONification or something more operator-accessible (like a URL or segment identifier, instead of JSON object representing the same). * Adjustments. * Adjust integration test.	2019-11-19 13:57:58 -08:00


112 Commits