druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Agarwal	d057c5149f	Fix the offset setting in GoogleStorage#get (#10449 ) * Fix the offset in get of GCP object * upgrade compute dependency * fix version * review comments * missed	2020-10-01 08:38:58 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Tarun	49a09302f3	Issue fix for CSV loading with header and skip header not parsing well. (#10398 )	2020-09-21 15:14:22 -07:00
Igor Dvorzhak	d0ee2e3a48	Upgrade ORC to 1.5.10 version (#10291 )	2020-09-18 13:38:45 -07:00
belugabehr	74368d95af	Remove JODA Time Dependency from Avro Extensions (#10010 )	2020-09-18 12:41:42 -07:00
Suneet Saldanha	0b4c897fbe	Vectorized variance aggregators (#10390 ) * wip vectorize * close but not quite * faster * unit tests * fix complex types for variance	2020-09-17 15:05:40 -07:00
Clint Wylie	184b202411	add computed Expr output types (#10370 ) * push down ValueType to ExprType conversion, tidy up * determine expr output type for given input types * revert unintended name change * add nullable * tidy up * fixup * more better * fix signatures * naming things is hard * fix inspection * javadoc * make default implementation of Expr.getOutputType that returns null * rename method * more test * add output for contains expr macro, split operation and function auto conversion	2020-09-14 18:18:56 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00
Suneet Saldanha	a5cd5f1e84	Fix VARIANCE aggregator comparator (#10340 ) * Fix VARIANCE aggregator comparator The comparator for the variance aggregator used to compare values using the count. This is now fixed to compare values using the variance. If the variance is equal, the count and sum are used as tie breakers. * fix tests + sql compatible mode * code review * more tests * fix last test	2020-09-03 17:38:37 -07:00
Gian Merlino	8ab1979304	Remove implied profanity from error messages. (#10270 ) i.e. WTF, WTH.	2020-08-28 11:38:50 -07:00
Jihoon Son	f82fd22fa7	Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308 ) * Move tools for indexing to TaskToolbox instead of injecting them in constructor * oops, other changes * fix test * unnecessary new file * fix test * fix build	2020-08-26 17:08:12 -07:00
Abhishek Agarwal	d4ac62f284	Handle internal kinesis sequence numbers when reporting lag (#10315 ) * Handle internal kinesis sequence numbers when reporting lag * add unit test	2020-08-26 11:27:37 -07:00
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243 ) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Xavier Léauté	225490474d	Update Kafka dependencies to 2.6.0 (#10286 ) * update Kafka dependencies to Kafka 2.6.0 * switch to Scala 2.13 build of Kafka * update integration tests * update Kafka tutorial	2020-08-15 07:56:40 -07:00
Clint Wylie	cfb7a893e7	fill out missing test coverage for druid-datasketches postaggs (#9730 ) * fill out missing test coverage for druid-datasketches postaggs * fixup * fixup merge * oops * oops again	2020-07-31 10:08:07 -07:00
Suneet Saldanha	e6c9142129	Add validation for authenticator and authorizer name (#10106 ) * Add validation for authorizer name * fix deps * add javadocs * Do not use resource filters * Fix BasicAuthenticatorResource as well * Add integration tests * fix test * fix	2020-07-13 21:15:54 -07:00
Suneet Saldanha	58f2e51161	Do not echo back username on auth failure (#10097 ) * Do not echo back username on auth failure * use bad username * Remove username from exception messages * fix tests * fix the tests * hopefully this time * this time the tests work * fixed this time * fix * upgrade to Jetty 9.4.30 * Unknown users echo back Unauthorized * fix	2020-07-10 12:19:10 -07:00
Maytas Monsereenusorn	4e8570b71b	Add integration tests for all InputFormat (#10088 ) * Add integration tests for Avro OCF InputFormat * Add integration tests for Avro OCF InputFormat * add tests * fix bug * fix bug * fix failing tests * add comments * address comments * address comments * address comments * fix test data * reduce resource needed for IT * remove bug fix * fix checkstyle * add bug fix	2020-07-08 12:50:29 -07:00
Franklyn Dsouza	1b9aacb1cd	Fix avg sql aggregator (#10135 ) * new average aggregator * method to create count aggregator factory * test everything * update other usages * fix style * fix more tests * fix datasketches tests	2020-07-08 08:38:56 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Jonathan Wei	ed981ef88e	Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056 ) * Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter * Remove aggressive equality check that was used for testing * Use Suppliers.memoize * Checkstyle	2020-07-01 22:26:17 -07:00
Clint Wylie	477335abb4	update links datasketches.github.io to datasketches.apache.org (#10107 ) * update links datasketches.github.io to datasketches.apache.org * now with more apache * oops * oops	2020-07-01 14:56:17 -07:00
Suneet Saldanha	363d0d86be	QueryCountStatsMonitor can be injected in the Peon (#10092 ) * QueryCountStatsMonitor can be injected in the Peon This change fixes a dependency injection bug where there is a circular dependency on getting the MonitorScheduler when a user configures the QueryCountStatsMonitor to be used. * fix tests * Actually fix the tests this time	2020-06-29 21:03:07 -07:00
Suneet Saldanha	15a0b4ffe2	Filter http requests by http method (#10085 ) * Filter http requests by http method Add a config that allows a user to specify which http methods to allow against their Druid server. Druid will only accept http requests with the method: GET, PUT, POST, DELETE and OPTIONS. If a Druid admin wants to allow other methods, they can do so by using the ServerConfig#allowedHttpMethods config. If a Druid user would like to disallow OPTIONS, this can be done by changing the AuthConfig#allowUnauthenticatedHttpOptions config * Exclude OPTIONS from always supported HTTP methods Add HEAD as an allowed method for web console e2e tests * fix docs * fix security IT * Actually fix the web console e2e tests * Ignore code coverage for initialization classes * code review	2020-06-29 16:59:31 -07:00
xhl0726	1596b3eacd	Optimize protobuf parsing for flatten data (#9999 ) * optimize for protobuf parsing * fix import error and maven dependency * add unit test in protobufInputrowParserTest for flatten data * solve code duplication (remove the log and main()) * rename 'flatten' to 'flat' to make it clearer Co-authored-by: xionghuilin <xionghuilin@bytedance.com>	2020-06-24 18:01:31 -07:00
Harshpreet Singh	d96aa1586a	retry 500 and 503 errors against kinesis (#10059 ) * retry 500 and 503 errors against kinesis * add test that exercises retry logic * more branch coverage * retry 500 and 503 on getRecords request when fetching sequence number Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>	2020-06-23 15:49:34 -07:00
Maytas Monsereenusorn	9bab6b6371	SketchAggregator.updateUnion should handle null inside List update object (#10055 )	2020-06-19 20:29:25 -07:00
Aleksey Plekhanov	2c384b61ff	IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" (#9690 ) IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty()" Reverted checkstyle rule * Added tests to pass CI * Codestyle	2020-06-18 09:47:07 -07:00
Jonathan Wei	771870ae2d	Load broadcast datasources on broker and tasks (#9971 ) * Load broadcast datasources on broker and tasks * Add javadocs * Support HTTP segment management * Fix indexer maxSize * inspection fix * Make segment cache optional on non-historicals * Fix build * Fix inspections, some coverage, failed tests * More tests * Add CliIndexer to MainTest * Fix inspection * Rename UnprunedDataSegment to LoadableDataSegment * Address PR comments * Fix	2020-06-08 20:15:59 -07:00
Maytas Monsereenusorn	790e9482ea	Fix Subquery could not be converted to groupBy query (#9959 ) * Fix join * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * Fix Subquery could not be converted to groupBy query * add tests * address comments * fix failing tests	2020-06-03 16:46:28 -07:00
Gian Merlino	3dfd7c30c0	Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893 ) * Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT. - Add REGEXP_LIKE function that returns a boolean, and is useful in WHERE clauses. - Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect filter elision). - Fix REGEXP_EXTRACT behavior for empty patterns: should always match (previously, they threw errors). - Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed non-literal patterns. - Improve documentation of REGEXP_EXTRACT. * Changes based on PR review. * Fix arg check. * Important fixes! * Add speller. * wip * Additional tests. * Fix up tests. * Add validation error tests. * Additional tests. * Remove useless call.	2020-06-03 14:31:37 -07:00
Xavier Léauté	a934b2664c	remove ListenableFutures and revert to using the Guava implementation (#9944 ) This change removes ListenableFutures.transformAsync in favor of the existing Guava Futures.transform implementation. Our own implementation had a bug which did not fail the future if the applied function threw an exception, resulting in the future never completing. An attempt was made to fix this bug, however when running against Guava's own tests, our version failed another half dozen tests, so it was decided to not continue down that path and scrap our own implementation. Explanation of how this bug manifested itself: An exception thrown in BaseAppenderatorDriver.publishInBackground when invoked via transformAsync in StreamAppenderatorDriver.publish will cause the resulting future to never complete. This explains why when encountering https://github.com/apache/druid/issues/9845 the task will never complete, forever waiting for the publishFuture to register the handoff. As a result, the corresponding "Error while publishing segments ..." message only gets logged once the index task times out and is forcefully shutdown when the future is force-cancelled by the executor.	2020-06-03 10:46:03 -07:00
Clint Wylie	c2c38f6ac2	only close exec if it exists (#9952 )	2020-05-29 20:09:34 -07:00
Xavier Léauté	65280a6953	update kafka client version to 2.5.0 (#9902 ) - remove dependency on deprecated internal Kafka classes - keep LZ4 version in line with the version shipped with Kafka	2020-05-27 13:20:32 -07:00
Clint Wylie	2e9548d93d	refactor SeekableStreamSupervisor usage of RecordSupplier (#9819 ) * refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting * fix style and test * cleanup, refactor, javadocs, test * fixes * keep collecting current offsets and lag if unhealthy in background reporting thread * review stuffs * add comment	2020-05-16 14:09:39 -07:00
Alexander Saydakov	522df300c2	Datasketches 1 3 0 (#9880 ) * use the latest datasketches release * new sketch debug print Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2020-05-16 14:09:23 -07:00
Joseph Glanville	793f386d6a	Add support for Avro OCF using InputFormat (#9671 ) * Add AvroOCFInputFormat * Support supplying a reader schema in AvroOCFInputFormat * Add docs for Avro OCF input format * Address review comments * Address second round of review	2020-05-16 14:09:12 -07:00
Jihoon Son	46beaa0640	Fix potential resource leak in ParquetReader (#9852 ) * Fix potential resource leak in ParquetReader * add test * never thrown exception * catch potential exceptions	2020-05-16 09:57:12 -07:00
zachjsh	80b212fe43	druid.storage.maxListingLength should default to 1000 for s3 (#9858 ) * druid.storage.maxListingLength should default to 1000 for s3 * * Address review comments * * Address review comments * * Address comments	2020-05-14 07:00:51 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814 ) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColumns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Clint Wylie	339876b69d	fill out missing test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch postaggs (#9740 ) * postagg test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch and fixes * style fixes * fix comparator for TDigestQuantilePostAggregator	2020-05-07 13:48:33 -07:00
Clint Wylie	2c0746cfab	increase druid-histogram postagg test coverage (#9732 )	2020-05-07 00:10:29 -07:00
Jihoon Son	964a1fc9df	Remove ParseSpec.toInputFormat() (#9815 ) * Remove toInputFormat() from ParseSpec * fix test	2020-05-05 11:17:57 -07:00
Alexander Saydakov	844d626738	added number of bins parameter (#9436 ) * added number of bins parameter * addressed review points * test equals Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2020-05-04 16:53:09 -07:00
Francesco Nidito	e7e41e3a36	Adding support for autoscaling in GCE (#8987 ) * Adding support for autoscaling in GCE * adding extra google deps also in gce pom * fix link in doc * remove unused deps * adding terms to spelling file * version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT * GCEXyz -> GceXyz in naming for consistency * add preconditions * add VisibleForTesting annotation * typos in comments * use StringUtils.format instead of String.format * use custom exception instead of exit * factorize interval time between retries * making literal value a constant * iter all network interfaces * use provided on google (non api) deps * adding missing dep * removing unneeded this and use Objects methods instead of 3-way if in hash and comparison * adding import * adding retries around getRunningInstances and adding limit for operation end waiting * refactor GceEnvironmentConfig.hashCode * 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT * removing unused config * adding tests to hash and equals * adding nullable to waitForOperationEnd * adding testTerminate * adding unit tests for createComputeService * increasing retries in unrelated integration-test to prevent sporadic failure (hopefully) * reverting queryResponseTemplate change * adding comment for Compute.Builder.build() returning null	2020-04-28 03:13:39 -07:00
Maytas Monsereenusorn	8b78eebdbd	Test reading from empty kafka/kinesis partitions (#9729 ) * add test for stream sequence number returns null * fix checkstyle * add index test for when stream returns null * retrigger test	2020-04-27 10:23:56 -07:00
Clint Wylie	fc5383cd00	revert datasketches-java version to 1.1.0-incubating until new version is released (#9751 ) * revert datasketches-java version to 1.1.0-incubating until fix is in place * fix tests * checkstyle	2020-04-24 12:52:12 -07:00
Himanshu	b082262a2a	druid-pac4j:add custom SSL handling to com.nimbusds.oauth2.sdk.http.HTTPRequest objects (#9695 )	2020-04-15 15:59:24 -07:00
Himanshu	ca369e5768	druid-pac4j: add ability to use custom ssl trust store while talking to auth server (#9637 ) * druid-pac4j: add ability for custom ssl trust store for talking to auth server * fix nimbusds DefaultResourceRetriever name in comment	2020-04-10 18:01:59 -07:00
Suneet Saldanha	332ca19621	Fix potential integer overflow issues (#9609 ) ApproximateHistogram - seems unlikely SegmentAnalyzer - unclear if this is an actual issue GenericIndexedWriter - unclear if this is an actual issue IncrementalIndexRow and OnheapIncrementalIndex are non-issues because it's very unlikely for the number of dims to be large enough to hit the overflow condition	2020-04-10 11:47:08 -07:00
Suneet Saldanha	22d3eed80c	Do not use external input in format strings (#9665 ) https://lgtm.com/rules/7900080/	2020-04-10 10:46:04 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339 ) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
Clint Wylie	d267b1c414	check paths used for shuffle intermediary data manager get and delete (#9630 ) * check paths used for shuffle intermediary data manager get and delete * add test * newline * meh	2020-04-07 09:47:18 -07:00
Himanshu	fc2897da1d	pac4j: be noop if a previous authenticator in chain has successfully authenticated (#9620 )	2020-04-06 11:55:55 -07:00
bolkedebruin	2d99966933	Add Apache Ranger Authorization (#9579 )	2020-04-04 18:02:24 +02:00
Jonathan Wei	dbaabdd247	Fix for [CVE-2020-1958]: Apache Druid LDAP injection vulnerability (#9600 )	2020-04-01 14:52:01 -07:00
zachjsh	e855c7fe1b	Allow Cloud Deep Storage configs without segment bucket or path specified (#9588 ) * Allow Cloud SegmentKillers to be instantiated without segment bucket or path This change fixes a bug that was introduced that causes ingestion to fail if data is ingested from one of the supported cloud storages (Azure, Google, S3), and the user is using another type of storage for deep storage. In this case all the segment killer implementations are instantiated. A change recently made forced a dependency between the supported cloud storage type SegmentKiller classes and the deep storage configuration for that storage type being set, which forced the deep storage bucket and prefix to be non-null. This caused a NullPointerException to be thrown when instantiating the SegmentKiller classes during ingestion. To fix this issue, the respective deep storage segment configs for the cloud storage types supported in druid are now allowed to have nullable bucket and prefix configurations * * Allow google deep storage bucket to be null	2020-04-01 11:57:32 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Chi Cao Minh	c0195a19e4	Fix HDFS input source split (#9574 ) Fixes an issue where splitting an HDFS input source for use in native parallel batch ingestion would cause the subtasks to get a split with an invalid HDFS path.	2020-03-28 15:45:57 -07:00
Xavier Léauté	b4ad3d0d88	fix nullhandling exceptions related to test ordering (#9570 ) * fix nullhandling exceptions related to test ordering Tests might get executed in different order depending on the maven version and the test environment. This may lead to "NullHandling module not initialized" errors for some tests where we do not initialize null-handling explicitly. * use InitializedNullHandlingTest	2020-03-27 09:46:31 -07:00
Maytas Monsereenusorn	3f521943fc	S3 ingestion spec should not use the default credentials provider chain when environment value password provider is misconfigured. (#9552 ) * fix s3 optional cred * S3 ingestion spec uses the default credentials provider chain when environment value password provider is misconfigured. * fix failing test	2020-03-24 15:09:02 -07:00
Himanshu	5604ac7963	druid extension for OpenID Connect auth using pac4j lib (#8992 ) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * license file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter	2020-03-23 18:15:45 -07:00
zachjsh	4870ad7b56	Azure deep storage does not work with datasource name containing non-ASCII chars (#9525 ) * Azure deep storage does not work with datasource name containing non-ASCII chars Fixed a bug where recording the segment file location fails when using Azure Deep Storage, if the datasource has any special characters * * update jacoco thresholds * * resolve merge conflicts * address review comments	2020-03-19 12:32:35 -07:00
zachjsh	838735411f	Ability to Delete task logs and segments from Google Storage (#9519 ) * Ability to Delete task logs and segments from Google Storage * implement ability to delete all tasks logs or all task logs written before a particular date when written to Google storage * implement ability to delete all segments from Google deep storage * * Address review comments	2020-03-18 18:00:43 -07:00
zachjsh	b18dd2b7a9	Ability to Delete task logs and segments from Azure Storage (#9523 ) * Ability to Delete task logs and segments from Azure Storage * implement ability to delete all tasks logs or all task logs written before a particular date when written to Azure storage * implement ability to delete all segments from Azure deep storage * * Address review comments	2020-03-18 17:59:17 -07:00
Gian Merlino	1ef25a438f	Broker: Add ability to inline subqueries. (#9533 ) * Broker: Add ability to inline subqueries. The main changes: - ClientQuerySegmentWalker: Add ability to inline queries. - Query: Add "getSubQueryId" and "withSubQueryId" methods. - QueryMetrics: Add "subQueryId" dimension. - ServerConfig: Add new "maxSubqueryRows" parameter, which is used by ClientQuerySegmentWalker to limit how many rows can be inlined per query. - IndexedTableJoinMatcher: Allow creating keys on top of unknown types, by assuming they are strings. This is useful because not all types are known for fields in query results. - InlineDataSource: Store RowSignature rather than component parts. Add more zealous "equals" and "hashCode" methods to ease testing. - Moved QuerySegmentWalker test code from CalciteTests and SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest. * Adjustments from CI. * Fix integration test.	2020-03-18 15:06:45 -07:00
Clint Wylie	142742f291	add kinesis lag metric (#9509 ) * add kinesis lag metric * fixes * heh * do it right this time * more test * split out supervisor report lags into lagMillis, remove latest offsets from kinesis supervisor report since always null, review stuffs	2020-03-16 21:39:53 -07:00
Chi Cao Minh	e7b3dd9cd1	Update to mysql connector 5.1.48 (#9514 )	2020-03-16 10:38:31 -07:00
Gian Merlino	ff59d2e78b	Move RowSignature from druid-sql to druid-processing and make use of it. (#9508 ) * Move RowSignature from druid-sql to druid-processing and make use of it. 1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific stuff in a RowSignatures utility class. It also picked up some new convenience methods along the way. 2) There were a lot of places in the code where Map<String, ValueType> was used to associate columns with type info. These are now all replaced with RowSignature. 3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature, and it now provides type info. * Fix up extensions. * Various fixes	2020-03-12 11:06:44 -07:00
zachjsh	7e0e767cc2	Ability to Delete task logs and segments from S3 (#9459 ) * Ability to Delete task logs and segments from S3 * implement ability to delete all tasks logs or all task logs written before a particular date when written to S3 * implement ability to delete all segments from S3 deep storage * upgrade version of aws SDK in use * * update licenses for updated AWS SDK version * * fix bug in iterating through results from S3 * revert back to original version of AWS SDK * * Address review comments * * Fix failing dependency check	2020-03-10 13:13:46 -07:00
Gian Merlino	c6c2282b59	Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484 ) * Harmonization and bug-fixing for selector and filter behavior on unknown types. - Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory system, and set defaultType COMPLEX so unknown types can be dynamically matched. - Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing. - Switch various methods to use convertObjectToX when casting to numbers, rather than ad-hoc and inconsistent logic. - Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return true even for 0- or 1- element arrays. - Adjust various javadocs. * Add throwParseExceptions option to Rows.objectToNumber, switch back to that. * Update tests. * Adjust moment sketch tests.	2020-03-10 07:15:57 -07:00
Clint Wylie	f8b1f2f7f3	fix issue when distinct grouping dimensions are optimized into the same virtual column expression (#9429 ) * fix issue when distinct grouping dimensions are optimized into the same virtual column expression * fix tests * more better * fixes	2020-03-09 17:48:29 -07:00
Jihoon Son	9466ac7c9b	Skip empty files for local, hdfs, and cloud input sources (#9450 ) * Skip empty files for local, hdfs, and cloud input sources * split hint spec doc * doc for skipping empty files * fix typo; adjust tests * unnecessary fluent iterable * address comments * fix test * use the right lists * fix test * fix test	2020-03-03 20:51:06 -08:00
Maytas Monsereenusorn	92fb83726b	Add support for optional aws credentials for s3 for ingestion (#9375 ) * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion * fix build failure * fix failing build * fix failing build * Code cleanup * fix failing test * Removed CloudConfigProperties and make specific class for each cloudInputSource * Removed CloudConfigProperties and make specific class for each cloudInputSource * pass s3ConfigProperties for split * lazy init s3client * update docs * fix docs check * address comments * add ServerSideEncryptingAmazonS3.Builder * fix failing checkstyle * fix typo * wrap the ServerSideEncryptingAmazonS3.Builder in a provider * added java docs for S3InputSource constructor * added java docs for S3InputSource constructor * remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider	2020-02-25 20:59:53 -08:00
zachjsh	d771b42ed1	Move Azure extension into Core (#9394 ) * Move Azure extension into Core Moving the azure extension into Core. * * Fix build failure * * Add The MIT License (MIT) to list of compatible licenses * * Address review comments * * change reference to contrib azure to core azure * * Fix spelling mistakes.	2020-02-25 17:49:16 -08:00
Chi Cao Minh	7fc99ee206	Add common optional dependencies for extensions (#9399 ) * Add common optional dependencies for extensions Include hadoop-aws and postgres JDBC connector jar to improve out-of-the-box experience for extensions. The mysql JDBC connector jar is not bundled as it is GPL. * Update docs * Fix typo	2020-02-25 00:04:00 -08:00
Jihoon Son	3bc7ae782c	Create splits of multiple files for parallel indexing (#9360 ) * Create splits of multiple files for parallel indexing * fix wrong import and npe in test * use the single file split in tests * rename * import order * Remove specific local input source * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc and error msg * fix build * fix a test and address comments Co-authored-by: sthetland <steve.hetland@imply.io>	2020-02-24 17:34:39 -08:00
Clint Wylie	6d8dd5ec10	string -> expression -> string -> expression (#9367 ) * add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays * oops, macros are expressions too * style * spotbugs * qualified type arrays * review stuffs * simplify grammar * more permissive array parsing * reuse expr joiner * fix it	2020-02-21 15:43:02 -08:00
zachjsh	f707064bed	Add Azure config options for segment prefix and max listing length (#9356 ) * Add Azure config options for segment prefix and max listing length Added configuration options to allow the user to specify the prefix within the segment container to store the segment files. Also added a configuration option to allow the user to specify the maximum number of input files to stream for each iteration. * * Fix test failures * * Address review comments * * add dependency explicitly to pom * * update docs * * Address review comments * * Address review comments	2020-02-21 14:12:03 -08:00
Clint Wylie	b408a6d774	sql support for dynamic parameters (#6974 ) * sql support for dynamic parameters * fixup * javadocs * fixup from merge * formatting * fixes * fix it * doc fix * remove druid fallback self-join parameterized test * unused imports * ignore test for now * fix imports * fixup * fix merge * merge fixup * fix test that cannot vectorize * fixup and more better * dependency thingo * fix docs * tweaks * fix docs * spelling * unused imports after merge * review stuffs * add comment * add ignore text * review stuffs	2020-02-19 13:09:20 -08:00
Chi Cao Minh	e7eb45e648	Run IntelliJ inspections on Travis (#9179 ) * Run IntelliJ inspections on Travis Running IntelliJ inspections currently takes about 90 minutes, but they can be run in about 30 minutes on Travis. * Restore assert statements	2020-02-19 11:34:19 +03:00
Adam Peck	e9aebd994a	Fix for building in Eclipse & VS Code. (#7481 ) Fixes #6866 Reverse dependencies from /main/ to /test/ Add generated-test-sources to source path for Eclipse	2020-02-13 14:58:32 -08:00
Jonathan Wei	48a0681f7e	Fix basic auth polling to skip retries when cachedSerializedGroupMappingMap returns 404 (#9354 )	2020-02-12 16:52:03 -08:00
Clint Wylie	c3ebb5eb65	variance aggregator support for double columns (#9076 ) * variance aggregator support for double column instead of casting to float * docs * everything in its right place * checkstyle * adjustments	2020-02-12 09:32:42 -08:00
Manish Gill	d268ff7297	Use ExecutorService instead of ScheduledExecutorService where necessary (#9325 ) * Use ExecutorService instead of ScheduledExecutorService where necessary - #9286 * Added inspection rule to prohibit ScheduledExecutorService assignment to ExecutorService	2020-02-11 19:05:48 -08:00
zachjsh	5c202343c9	implement Azure InputSource reader and deprecate Azure FireHose (#9306 ) * IMPLY-1946: Improve code quality and unit test coverage of the Azure extension * Update unit tests to increase test coverage for the extension * Clean up any messy code * Enfore code coverage as part of tests. * * Update azure extension pom to remove unnecessary things * update jacoco thresholds * * updgrade version of azure-storage library version uses to most upto-date version * implement Azure InputSource reader and deprecate Azure FireHose * implement azure InputSource reader * deprecate Azure FireHose implementation * * exclude common libraries that are included from druid core * Implement more of Azure input source. * * Add tests * * Add more tests * * deprecate azure firehose * * added more tests * * rollback fix for google cloud batch ingestion bug. Will be fixed in another PR. * * Added javadocs for all azure related classes * Addressed review comments * * Remove dependency on org.apache.commons:commons-collections4 * Fix LGTM warnings * Add com.google.inject.extensions:guice-assistedinject to licenses * * rename classes as suggested in review comments * * Address review comments * * Address review comments * * Address review comments	2020-02-11 17:41:58 -08:00
Atul Mohan	7968524b01	Add Pig-specific file handling to Avro parser (#9258 ) * Add processing for data files from AvroStorage * Add words to spellings file	2020-02-10 21:53:11 -08:00
Suneet Saldanha	51d7864935	Codestyle - use java style array declaration (#9338 ) * Codestyle - use java style array declaration Replaced C-style array declarations with java style declarations and marked the intelliJ inspection as an error * cleanup test code	2020-02-10 14:25:26 -08:00
Clint Wylie	831ec172f1	Logging large segment list handling (#9312 ) * better handling of large segment lists in logs * more * adjust * exceptions * fixes * refactor * debug * heh * dang	2020-02-07 21:42:45 -08:00
Clint Wylie	b55657cc26	fix protobuf extension packaging and docs (#9320 ) * fix protobuf extension packaging and docs * fix paths * Update protobuf.md * Update protobuf.md	2020-02-07 09:26:52 -08:00
Lucas Capistrant	53bb45fc9a	Forbid easily misused HashSet and HashMap constructors (#9165 ) * Forbid easily misused HashSet and HashMap constructors * Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them * Fix visibility of constant in CollectionUtils.java * Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used * revert changes to sql module tests that should be in separate PR * Finish reverting changes to sql module tests that were flagged in checkstyle during CI * Add netty dependency resulting from SupressForbidden	2020-02-07 10:44:09 +03:00
Gian Merlino	3ef5c2f2e8	Add MemoryOpenHashTable, a table similar to ByteBufferHashTable. (#9308 ) * Add MemoryOpenHashTable, a table similar to ByteBufferHashTable. With some key differences to improve speed and design simplicity: 1) Uses Memory rather than ByteBuffer for its backing storage. 2) Uses faster hashing and comparison routines (see HashTableUtils). 3) Capacity is always a power of two, allowing simpler design and more efficient implementation of findBucket. 4) Does not implement growability; instead, leaves that to its callers. The idea is this removes the need for subclasses, while still giving callers flexibility in how to handle table-full scenarios. * Fix LGTM warnings. * Adjust dependencies. * Remove easymock from druid-benchmarks. * Adjustments from review. * Fix datasketches unit tests. * Fix checkstyle.	2020-02-04 19:57:59 -08:00
zachjsh	768d60c7b4	Get larger batch of input files when using native batch with google cloud (#9307 ) By default native batch ingestion was only getting a batch of 10 files at a time when used with google cloud. The Default for other cloud providers is 1024, and should be similar for google cloud. The low batch size was caused by mistype. This change updates the batch size to 1024 when using google cloud.	2020-02-04 12:03:32 -08:00
Clint Wylie	5c541f556b	remove log.info from FixedBucketsHistogramAggregator aggregate method (#9309 )	2020-02-04 11:52:50 -08:00
Suneet Saldanha	33a97dfaae	Guicify druid sql module (#9279 ) * Guicify druid sql module Break up the SQLModule in to smaller modules and provide a binding that modules can use to register schemas with druid sql. * fix some tests * address code review * tests compile * Working tests * Add all the tests * fix up licenses and dependencies * add calcite dependency to druid-benchmarks * tests pass * rename the schemas	2020-02-04 11:33:48 -08:00
Gian Merlino	b411443d22	SQL join support for lookups. (#9294 ) * SQL join support for lookups. 1) Add LookupSchema to SQL, so lookups show up in the catalog. 2) Add join-related rels and rules to SQL, allowing joins to be planned into native Druid queries. * Add two missing LookupSchema calls in tests. * Fix tests. * Fix typo.	2020-01-31 23:51:16 -08:00
Gian Merlino	660f8838f4	Allow HdfsDataSegmentKiller to be instantiated without storageDirectory set. (#9296 ) This is important because if a user has the hdfs extension loaded, but is not using hdfs deep storage, then they will not have storageDirectory set and will get the following error: IllegalArgumentException: Can not create a Path from an empty string at io.druid.storage.hdfs.HdfsDataSegmentKiller.<init>(HdfsDataSegmentKiller.java:47) This scenario is realistic: it comes up when someone has the hdfs extension loaded because they want to use HdfsInputSource, but don't want to use hdfs for deep storage. Fixes #4694.	2020-01-31 23:50:48 -08:00
Gian Merlino	204ba9966f	Add LookupJoinableFactory. (#9281 ) * Add LookupJoinableFactory. Enables joins where the right-hand side is a lookup. Includes an integration test. Also, includes changes to LookupExtractorFactoryContainerProvider: 1) Add "getAllLookupNames", which will be needed to eventually connect lookups to Druid's SQL catalog. 2) Convert "get" from nullable to Optional return. 3) Swap out most usages of LookupReferencesManager in favor of the simpler LookupExtractorFactoryContainerProvider interface. * Fixes for tests. * Fix another test. * Java 11 message fix. * Fixups. * Fixup benchmark class.	2020-01-30 14:46:21 -08:00

835 Commits