druid

Commit Graph

Author	SHA1	Message	Date
Curt Buechter	e3735602f2	Fix typo (#10385 )	2020-09-11 16:31:36 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Cheng Pan	8aea8cf1c6	Unit tests fail due to missing extend InitializedNullHandlingTest (#10382 ) * CsvInputFormatTest should extend InitializedNullHandlingTest * FirehoseFactoryToInputSourceAdaptorTest should extends InitializedNullHandlingTest	2020-09-11 16:23:46 -07:00
Lucas Capistrant	690e070c43	Fix doc for name of dynamic config to pause coordination (#10345 )	2020-09-11 08:40:06 -05:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00
Joy Kent	e5f0da30ae	Fix stringFirst/stringLast rollup during ingestion (#10332 ) * Add IndexMergerRollupTest This changelist adds a test to merge indexes with StringFirst/StringLast aggregator. * Fix StringFirstAggregateCombiner/StringLastAggregateCombiner The segment-level type for stringFirst/stringLast is SerializablePairLongString, not String. This changelist fixes it. * Fix EarliestLatestAnySqlAggregator to handle COMPLEX type This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX type as an operand. For its return type, we set it to VARCHAR, since COMPLEX column is only generated by stringFirst/stringLast during ingestion rollup. * Return value with smaller timestamp in StringFirstAggregatorFactory.combine function * Add integration tests for stringFirst/stringLast during ingestion * Use one EarliestLatestReturnTypeInference instance Co-authored-by: Joy Kent <joy@automonic.ai>	2020-09-08 17:36:04 -07:00
Jihoon Son	d32d1e7004	Fix result-level caching (#10341 ) * create baseSequence early * unit test * add comment and a new test	2020-09-08 11:04:00 -07:00
Chi Cao Minh	176b715624	Ignore CVEs from htrace and ambari transitive deps (#10353 ) * Ignore CVEs from htrace and ambari transitive deps htrace CVEs are suppressed for now as addressing them requires updating the hadoop version. ambari CVEs are suppressed for now since ambari is updated to the latest version and is no longer actively maintained. * Fix compilation issue from ambari upgrade * Add missing test coverage	2020-09-04 15:22:26 -07:00
Suneet Saldanha	91a153820e	fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator (#10325 ) * fix NPE in StringGroupByColumnSelectorStrategy#bufferComparator * Add tests * javadocs	2020-09-04 13:23:40 -07:00
Gian Merlino	d7fcff3aba	StringFirstAggregatorFactory: Fix incorrect "combine" method. (#10351 ) * StringFirstAggregatorFactory: Fix incorrect "combine" method. There was a test, but it was wrong. * Fix superclass.	2020-09-03 20:03:26 -07:00
LightGHLi	a3bb6ee4a6	Add missing comma between JSON members in data-formats.md (#10343 )	2020-09-03 20:03:06 -07:00
Suneet Saldanha	a5cd5f1e84	Fix VARIANCE aggregator comparator (#10340 ) * Fix VARIANCE aggregator comparator The comparator for the variance aggregator used to compare values using the count. This is now fixed to compare values using the variance. If the variance is equal, the count and sum are used as tie breakers. * fix tests + sql compatible mode * code review * more tests * fix last test	2020-09-03 17:38:37 -07:00
xiangqiao123	3fc8bc0701	optimize announceHistoricalSegments (#9935 ) * optimize announceHistoricalSegment * optimize announceHistoricalSegment * revert offline SegmentTransactionalInsertAction uses a separate lock * optimize segmentExistsBatch: Avoid too many elements in the in condition * add unit test && Modified according to cr Co-authored-by: xiangqiao <xiangqiao@kuaishou.com>	2020-09-02 13:07:10 -07:00
Clint Wylie	a7924a9dee	add link to Docker quickstart in github README (#10299 ) Per suggestion in comment https://github.com/apache/druid/pull/9262#issuecomment-675732237, I think this should eventually result in the copy mirrored on dockerhub to also be updated, if I understand how things work. Only the github `README.md` has been updated, not the `README.template` used for src and bin packages because presumably if you are reading from either of those you are just going to run locally and so the local quickstart is appropriate.	2020-09-02 01:17:34 -07:00
Vadim Ogievetsky	e81a9df507	Web console: add tile for Azure Event Hubs (via Kafka API) (#10317 ) * Add Azure Event Hubs * better note * update icon	2020-08-31 20:58:52 -07:00
Clint Wylie	475d86a4f7	split up Expr.java (#10333 )	2020-08-31 12:51:53 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Gian Merlino	5cd7610fb6	SQL support for union datasources. (#10324 ) * SQL support for union datasources. Exposed via the "UNION ALL" operator. This means that there are now two different implementations of UNION ALL: one at the top level of a query that works by concatenating subquery results, and one at the table level that works by creating a UnionDataSource. The SQL documentation is updated to discuss these two use cases and how they behave. Future work could unify these by building support for a native datasource that represents the union of multiple subqueries. (Today, UnionDataSource can only represent the union of tables, not subqueries.) * Fixes. * Error message for sanity check. * Additional test fixes. * Add some error messages.	2020-08-28 07:57:06 -07:00
Jihoon Son	f82fd22fa7	Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308 ) * Move tools for indexing to TaskToolbox instead of injecting them in constructor * oops, other changes * fix test * unnecessary new file * fix test * fix build	2020-08-26 17:08:12 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Jihoon Son	b9ff3483ac	Add support for all partitioing schemes for auto compaction (#10307 ) * Add support for all partitioing schemes for auto compaction * annotate last compaction state for multi phase parallel indexing * fix build and tests * test * better home	2020-08-26 13:19:18 -07:00
Fernando	69d8645425	Adding supported compression formats for native batch ingestion (#10306 ) * Adding supported compression formats for native batch ingestion * Update docs/ingestion/native-batch.md Co-authored-by: sthetland <steve.hetland@imply.io> * fix spellcheck Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: sthetland <steve.hetland@imply.io>	2020-08-26 12:39:48 -07:00
Abhishek Agarwal	d4ac62f284	Handle internal kinesis sequence numbers when reporting lag (#10315 ) * Handle internal kinesis sequence numbers when reporting lag * add unit test	2020-08-26 11:27:37 -07:00
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00
Suneet Saldanha	a9de00d43a	Remove NUMERIC_HASHING_THRESHOLD (#10313 ) * Make NUMERIC_HASHING_THRESHOLD configurable Change the default numeric hashing threshold to 1 and make it configurable. Benchmarks attached to this PR show that binary searches are not more faster than doing a set contains check. The attached flamegraph shows the amount of time a query spent in the binary search. Given the benchmarks, we can expect to see roughly a 2x speed up in this part of the query which works out to ~ a 10% faster query in this instance. * Remove NUMERIC_HASHING_THRESHOLD * Remove stale docs	2020-08-25 20:05:39 -07:00
Gian Merlino	91bb27cdf7	Clarify SQL behavior for multi-value dimensions. (#10276 ) There are some known inconsistencies between SQL and native that users should be aware of.	2020-08-25 10:11:16 -07:00
Gian Merlino	f53785c52c	ExpressionFilter: Use index for expressions of single multi-value columns. (#10320 ) Previously, this was disallowed, because expressions treated multi-values as nulls. But now, if there's a single multi-value column that can be mapped over, it's okay to use the index. Expression selectors already do this.	2020-08-24 23:29:31 -07:00
Suneet Saldanha	707b5aae2b	Optimize large InDimFilters (#10312 ) * Optimize large InDimFilters For large InDimFilters, in default mode, the filter does a linear check of the set to see if it contains either an empty or null. If it does, the empties are converted to nulls by passing through the entire list again. Instead of this, in default mode, we attempt to remove an empty string from the values that are passed to the InDimFilter. If an empty string was removed, we add null to the set * code review * Revert "code review" This reverts commit `61fe33ebf7`. * code review - less brittle	2020-08-24 16:39:27 -07:00
frank chen	028442e75e	Redis cache extension enhancement (#10240 ) * support redis cluster * add 'password', 'database' properties * test cases passed * update doc * some improvements * fix CI * add more test cases to improve branch coverage * fix dependency check for test * resolve review comments	2020-08-24 10:29:04 +08:00
Himanshu	a607e9e7ff	introduce interning of internal files names in SmooshedFileMapper (#10295 )	2020-08-21 17:37:49 -07:00
Gian Merlino	0910d22f48	Add SQL "OFFSET" clause. (#10279 ) * Add SQL "OFFSET" clause. Under the hood, this uses the new offset features from #10233 (Scan) and #10235 (GroupBy). Since Timeseries and TopN queries do not currently have an offset feature, SQL planning will switch from one of those to Scan or GroupBy if users add an OFFSET. Includes a refactoring to harmonize offset and limit planning using an OffsetLimit wrapper class. This is useful because it ensures that the various places that need to deal with offset and limit collapsing all behave the same way, using its "andThen" method. * Fix test and add another test.	2020-08-21 14:11:54 -07:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243 ) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Clint Wylie	7620b0c54e	Segment backed broadcast join IndexedTable (#10224 ) * Segment backed broadcast join IndexedTable * fix comments * fix tests * sharing is caring * fix test * i hope this doesnt fix it * filter by schema to maybe fix test * changes * close join stuffs so it does not leak, allow table to directly make selector factory * oops * update comment * review stuffs * better check	2020-08-20 14:12:39 -07:00
Atul Mohan	618c04a99e	Fix CombiningFirehose compatibility (#10264 ) * Fix CombiningFirehose * Add integration test * Fix path * Add full datasource name * Fix input location Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-08-20 10:37:38 -07:00
Clint Wylie	b36dab0fe6	fix connectionId issue with JDBC prepared statement queries and router (#10272 ) * fix router jdbc prepared statement connectionId issue * column metadata too * style * remove tls * try tls again * add keystore stuffs * use keyManager password * add unit test * simplify	2020-08-19 00:18:06 -07:00
Jihoon Son	9a81740281	Don't log the entire task spec (#10278 ) * Don't log the entire task spec * fix lgtm * fix serde * address comments and add tests * fix tests * remove unnecessary codes	2020-08-18 11:03:13 -07:00
Suneet Saldanha	0891b1f833	Add note about aggregations on floats (#10285 ) * Add note about aggreations on floats Floating point math is known to be unstable. Due to the way aggregators work across segments it's possible for the same query operating on the same data to produce slightly different results. The same problem exists with any aggregators that are not commutative since the merge order across segments is not guaranteed. * Also talk about doubles * Apply suggestions from code review	2020-08-17 13:29:57 -07:00
Vatsal Bajpai	ee40d00be1	typo fix from hear to here (#10292 ) Should be `There are no other changes that need to be made here`	2020-08-17 07:54:21 -07:00
Xavier Léauté	225490474d	Update Kafka dependencies to 2.6.0 (#10286 ) * update Kafka dependencies to Kafka 2.6.0 * switch to Scala 2.13 build of Kafka * update integration tests * update Kafka tutorial	2020-08-15 07:56:40 -07:00
Himanshu	12ae84165e	remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717 ) * remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaadereClient.go(..) with InputStreamFullResponseHandler * remove ByteArrayResponseHolder dependency from JsonParserIterator * add UT to cover lines in InputStreamFullResponseHandler * refactor SystemSchema to reduce branches * further reduce branches * Revert "add UT to cover lines in InputStreamFullResponseHandler" This reverts commit `330aba3dd9`. * UTs for InputStreamFullResponseHandler * remove unused imports	2020-08-14 10:51:18 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Clint Wylie	e053348f74	add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219 ) * add isNullable to ColumnCapabilities, ColumnAnalysis * better builder * fix segment metadata queries in integration tests * adjustments * cleanup * fix spotbugs * treat unknown as true in segmentmetadata * rename to hasNulls, add docs * fixup * test the dim indexer selector isNull fix for numeric columns * fixes * oof	2020-08-13 14:55:32 -07:00
Jihoon Son	a61263b4a9	Allow forceLimitPushDown in SQL (#10253 ) * Allow forceLimitPushDown in SQL * fix test * fix test * review comments * fix test	2020-08-13 13:30:41 -07:00
Vadim Ogievetsky	748a83cb78	Web console: fix json input (#10271 ) * fix json input * tidy up * add error extraction test	2020-08-13 12:20:58 -07:00
Gian Merlino	89860b7d6a	Fix javadoc mistake in DefaultLimitSpec. (#10269 ) Javadoc for getLimit should say it's a limit, not an offset.	2020-08-13 12:17:26 -07:00
Gian Merlino	d36a0f61da	Clarify documentation on dimensions, dimensionExclusions. (#10265 ) In particular: exclusions are ignored if dimensions are set.	2020-08-12 08:06:53 -07:00
Gian Merlino	e273264332	Fix two id-over-maxId errors in StringDimensionIndexer. (#10245 ) 1) lookupId could return IDs beyond maxId if called with a recently added value. 2) getRow could return an ID for null beyond maxId, if null was recently encountered in a dimension that initially didn't appear at all. (In this case, the dictionary ID for null can be > 0). Also add a comment explaining how this stuff is supposed to work.	2020-08-11 20:32:10 -07:00
Suneet Saldanha	6baea0b4d5	Fix broken sampler for re-indexing (#10196 ) * Fix broken sampler for re-indexer When re-indexing a Druid datasource, the web-console would generate an invalid inputFormat since the type is not specified. * code review	2020-08-11 19:22:26 -07:00
Clint Wylie	c72f96a4ba	fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248 ) * fix bug with realtime expressions on sparse string columns * fix test * add comment back * push capabilities for dimensions to dimension indexers since they know things * style * style * fixes * getting a bit carried away * missed one * fix it * benchmark build fix * review stuffs * javadoc and comments * add comment * more strict check * fix missed usaged of impl instead of interface	2020-08-11 11:07:17 -07:00
Jihoon Son	35284e5166	Make stale bot less aggressive (#10261 )	2020-08-10 20:59:02 -07:00
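
Commit a5cd5f1e84 above says the VARIANCE comparator used to order values by count and now orders by variance, with count and sum as tie breakers. The snippet below is a minimal sketch of that ordering, not Druid's Java implementation: the `VarianceSummary` class, its field names, and both comparator functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass(frozen=True)
class VarianceSummary:
    """Hypothetical stand-in for an intermediate variance aggregate."""
    count: int
    total: float      # running sum of the aggregated values
    variance: float


def compare_by_count(a: VarianceSummary, b: VarianceSummary) -> int:
    # Behavior described as the bug: only the count decides the order.
    return (a.count > b.count) - (a.count < b.count)


def compare_by_variance(a: VarianceSummary, b: VarianceSummary) -> int:
    # Behavior described as the fix: variance first, then count, then sum.
    pairs = (
        (a.variance, b.variance),
        (a.count, b.count),
        (a.total, b.total),
    )
    for left, right in pairs:
        if left != right:
            return -1 if left < right else 1
    return 0


rows = [
    VarianceSummary(count=10, total=50.0, variance=0.5),
    VarianceSummary(count=3, total=12.0, variance=9.0),
    VarianceSummary(count=3, total=9.0, variance=9.0),
]

old_order = sorted(rows, key=cmp_to_key(compare_by_count))
new_order = sorted(rows, key=cmp_to_key(compare_by_variance))
```

With the count-only comparator, the row with the smallest variance sorts last because it has the largest count. With the variance-first comparator it sorts first, and the two rows with equal variance and equal count are separated by their sums.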


10708 Commits

All Branches