druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	ab60661008	refactor internal type system (#9638 ) * better type tracking: add typed postaggs, finalized types for agg factories * more javadoc * adjustments * transition to getTypeName to be used exclusively for complex types * remove unused fn * adjust * more better * rename getTypeName to getComplexTypeName * setup expression post agg for type inference existing * more javadocs * fixup * oops * more test * more test * more comments/javadoc * nulls * explicitly handle only numeric and complex aggregators for incremental index * checkstyle * more tests * adjust * more tests to showcase difference in behavior * timeseries longsum array	2020-08-26 10:53:44 -07:00
Himanshu	a607e9e7ff	introduce interning of internal files names in SmooshedFileMapper (#10295 )	2020-08-21 17:37:49 -07:00
Jihoon Son	b5b3e6ecce	Add maxNumFiles to splitHintSpec (#10243 ) * Add maxNumFiles to splitHintSpec * missing link * fix build failure; use maxNumFiles for integration tests * spelling * lower default * Update docs/ingestion/native-batch.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * address comments; change default maxSplitSize * spelling * typos and doc * same change for segments splitHintSpec * fix build * fix build Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2020-08-21 09:43:58 -07:00
Jihoon Son	9a81740281	Don't log the entire task spec (#10278 ) * Don't log the entire task spec * fix lgtm * fix serde * address comments and add tests * fix tests * remove unnecessary codes	2020-08-18 11:03:13 -07:00
Himanshu	12ae84165e	remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717) * remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler * remove ByteArrayResponseHolder dependency from JsonParserIterator * add UT to cover lines in InputStreamFullResponseHandler * refactor SystemSchema to reduce branches * further reduce branches * Revert "add UT to cover lines in InputStreamFullResponseHandler" This reverts commit `330aba3dd9`. * UTs for InputStreamFullResponseHandler * remove unused imports	2020-08-14 10:51:18 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
frank chen	646fa84d04	Support unit on byte-related properties (#10203) * support unit suffix on byte-related properties * add doc * change default value of byte-related properties in example files * fix coding style * fix doc * fix CI * suppress spelling errors * improve code according to comments * rename Bytes to HumanReadableBytes * add getBytesInInt to get value safely * improve doc * fix problem reported by CI * fix problem reported by CI * resolve code review comments * improve error message * improve code & doc according to comments * fix CI problem * improve doc * suppress spelling check errors	2020-07-31 09:58:48 +08:00
Jian Wang	271f90f205	Add segment pruning for hash based shard spec (#9810 ) * Add segment pruning for hash based partitioning * Update doc * Add additional test * Address comments * Fix unit test failure Co-authored-by: Jian Wang <jwang@pinterest.com>	2020-07-30 18:44:26 -07:00
Jihoon Son	6fdce36e41	Add integration tests for query retry on missing segments (#10171 ) * Add integration tests for query retry on missing segments * add missing dependencies; fix travis conf * address comments * Integration tests extension * remove unused dependency * remove druid_main * fix java agent port	2020-07-22 22:30:35 -07:00
Jihoon Son	26d099f39b	Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments (#10183 ) * Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments * fix tests and add missing tests * revert null handling fix * unused import * move out util methods from DiscoveryDruidNode	2020-07-21 17:52:51 -07:00
Suneet Saldanha	e6c9142129	Add validation for authenticator and authorizer name (#10106 ) * Add validation for authorizer name * fix deps * add javadocs * Do not use resource filters * Fix BasicAuthenticatorResource as well * Add integration tests * fix test * fix	2020-07-13 21:15:54 -07:00
Gian Merlino	eeaf609fc0	Update Jetty to 9.4.30.v20200611. (#10098 ) * Update Jetty to 9.4.30.v20200611. This is the latest version currently available in the 9.4.x line. * Various adjustments. * Class name fixes. * Remove unused HttpClientModule code. * Add coverage suppressions. * Another coverage suppression. * Fix wildcards.	2020-07-07 14:24:02 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Gian Merlino	ddda2a4f18	VersionedIntervalTimeline: Fix thread-unsafe call to "lookup". (#10130 )	2020-07-05 09:32:18 -07:00
Clint Wylie	a337ef351c	Closing yielder from ParallelMergeCombiningSequence should trigger cancellation (#10117 ) * cancel parallel merge combine sequence on yielder close * finish incomplete comment * Update core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java Fixes checkstyle Co-authored-by: Jihoon Son <jihoonson@apache.org>	2020-07-01 14:07:44 -07:00
Mohammad Shoaib	84290a2332	Enabling Static Imports for Unit Testing DSLs (#331 ) (#9764 ) * Enabling Static Imports for Unit Testing DSLs (#331) Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com> * Feature 8885 - Enabling Static Imports for Unit Testing DSLs (#435) * Enabling Static Imports for Unit Testing DSLs * Using suppressions checkstyle to allow static imports only in the UTs Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com> * Removing the changes in the checkstyle because those are not needed Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>	2020-06-30 13:59:35 -07:00
Jihoon Son	8ef3598c05	Move shardSpec tests to core (#10079 ) * Move shardSpec tests to core * checkstyle * inject object mapper for testing * unused import	2020-06-29 17:31:37 -07:00
chenyuzhi459	a4c6d5f37e	fix query memory leak (#10027) * fix query memory leak * rollup ./idea * roll up .idea * clean code * optimize style * optimize cancel function * optimize style * add concurrentGroupTest test case * add test case * add unit test * fix code style * optimize cancel method use * format code * reback code * optimize cancelAll * clean code * add comment	2020-06-26 23:30:59 -07:00
Clint Wylie	4b99c6d3ef	ensure ParallelMergeCombiningSequence closes its closeables (#10076 ) * ensure close for all closeables of ParallelMergeCombiningSequence * revert unneeded change * consolidate methods * catch throwable instead of exception	2020-06-26 14:37:20 -07:00
Jihoon Son	c591ff8ea8	Add NonnullPair (#10013 ) * Add NonnullPair * new line * test * make it consistent	2020-06-26 09:52:06 -07:00
Jihoon Son	aaee72c781	Allow append to existing datasources when dynamic partitioning is used (#10033) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports * Allow append to existing datasources when dynamic partitioning is used * fix test * checkstyle * checkstyle * fix test * fix test * fix other tests.. * checkstyle * handle unknown core partitions size in overlord segment allocation * fail to append when numCorePartitions is unknown * log * fix comment; rename to be more intuitive * double append test * cleanup complete(); add tests * fix build * add tests * address comments * checkstyle	2020-06-25 13:37:31 -07:00
Jihoon Son	d644a27f1a	Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle * Set core partition set size for hash-partitioned segments properly in batch ingestion * test for both parallel and single-threaded task * unused variables * fix test * unused imports * add hash/range buckets * some test adjustment and missing json serde * centralized partition id allocation in parallel and simple tasks * remove string partition chunk * revive string partition chunk * fill numCorePartitions for hadoop * clean up hash stuffs * resolved todos * javadocs * Fix tests * add more tests * doc * unused imports	2020-06-18 18:40:43 -07:00
Suneet Saldanha	4e483a70b4	ROUND and having comparators correctly handle special double values (#10014 ) * ROUND and having comparators correctly handle doubles Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real numbers. Because of this, they can not be converted to BigDecimal and instead throw a NumberFormatException. This change adds support for calculations that produce these numbers either for use in the `ROUND` function or the HavingSpecMetricComparator by not attempting to convert the number to a BigDecimal. The bug in ROUND was first introduced in #7224 where we added the ability to round to any decimal place. This PR changes the behavior back to using `Math.round` if we recognize a number that can not be converted to a BigDecimal. * Add tests and fix spellcheck * update error message in ExpressionsTest * Address comments * fix up round for infinity * round non numeric doubles returns a double * fix spotbugs * Update docs/misc/math-expr.md * Update docs/querying/sql.md	2020-06-16 16:09:46 -07:00
Suneet Saldanha	0035f39e25	lpad and rpad functions match postgres behavior in SQL compatible mode (#10006) * lpad and rpad functions deal with empty pad Return null if the pad string used by the `lpad` and `rpad` functions is an empty string * Fix rpad * Match PostgreSQL behavior in SQL compliant null handling mode * Match PostgreSQL behavior for pad -ve len * address review comments	2020-06-15 10:47:57 -07:00
Jihoon Son	9a10f8352b	Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012 ) * Fill in the core partition set size properly for batch ingestion with dynamic partitioning * incomplete javadoc * Address comments * fix tests * fix json serde, add tests * checkstyle	2020-06-12 21:39:37 -07:00
BIGrey	d4d0004338	Fix failed tests in TimestampParserTest when running locally (#9997) * fix failed tests in TimestampParserTest due to timezone * remove unneeded -Duser.country=US Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>	2020-06-10 09:19:38 -07:00
Atul Mohan	17cf8ea8f2	Add Sql InputSource (#9449 ) * Add Sql InputSource * Add spelling * Use separate DruidModule * Change module name * Fix docs * Use sqltestutils for tests * Add additional tests * Fix inspection * Add module test * Fix md in docs * Remove annotation Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-06-09 12:55:20 -07:00
Mainak Ghosh	bcc066a27f	Empty partitionDimension has less rollup compared to when explicitly specified (#9861 ) * Empty partitionDimension has less rollup compared to the case when it is explicitly specified * Adding a unit test for the empty partitionDimension scenario. Fixing another test which was failing * Fixing CI Build Inspection Issue * Addressing all review comments * Updating the javadocs for the hash method in HashBasedNumberedShardSpec	2020-06-05 12:42:42 -07:00
Xavier Léauté	a934b2664c	remove ListenableFutures and revert to using the Guava implementation (#9944) This change removes ListenableFutures.transformAsync in favor of the existing Guava Futures.transform implementation. Our own implementation had a bug which did not fail the future if the applied function threw an exception, resulting in the future never completing. An attempt was made to fix this bug, however when running against Guava's own tests, our version failed another half dozen tests, so it was decided to not continue down that path and scrap our own implementation. Explanation of how this bug manifested itself: An exception thrown in BaseAppenderatorDriver.publishInBackground when invoked via transformAsync in StreamAppenderatorDriver.publish will cause the resulting future to never complete. This explains why when encountering https://github.com/apache/druid/issues/9845 the task will never complete, forever waiting for the publishFuture to register the handoff. As a result, the corresponding "Error while publishing segments ..." message only gets logged once the index task times out and is forcefully shut down when the future is force-cancelled by the executor.	2020-06-03 10:46:03 -07:00
Gian Merlino	3d81564a14	Fix various processing buffer leaks and simplify BlockingPool. (#9928 ) * - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when exceptions are thrown before result sequence is created. - PooledTopNAlgorithm: Fix leak of intermediate processing buffer when exceptions are thrown before the PooledTopNParams object is created. - BlockingPool: Remove unused "take" methods. * Add tests to verify that buffers have been returned.	2020-06-02 18:26:18 -07:00
Gian Merlino	309fc04d54	Fix various Yielder leaks. (#9934 ) * Fix various Yielder leaks. - CombiningSequence leaked the input yielder from "toYielder" if it ran into an exception while accumulating the last value from the input yielder. - MergeSequence leaked input yielders from "toYielder" if it ran into an exception while building the initial priority queue. - ScanQueryRunnerFactory leaked the input yielder in its "priorityQueueSortAndLimit" strategy if it ran into an exception while scanning and sorting. - YieldingSequenceBase.accumulate chomped IOExceptions thrown in "accumulate" during yielder closing. * Add tests. * Fix braces.	2020-06-02 18:26:06 -07:00
Samarth Jain	82e5b0573e	Number based columns representing time in custom format cannot be used as timestamp column in Druid. (#9877) * Number based columns representing time in custom format cannot be used as timestamp column in Druid. Prior to this fix, if an integer column in parquet is storing dateint in format yyyyMMdd, it cannot be used as timestamp column in Druid as the timestamp parser interprets it as a number storing UTC time instead of treating it as a number representing time in yyyyMMdd format. Data formats like TSV or CSV don't suffer from this problem as the timestamp is passed in as a string which the timestamp parser is able to parse correctly.	2020-05-18 11:17:28 -07:00
Clint Wylie	2e9548d93d	refactor SeekableStreamSupervisor usage of RecordSupplier (#9819 ) * refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting * fix style and test * cleanup, refactor, javadocs, test * fixes * keep collecting current offsets and lag if unhealthy in background reporting thread * review stuffs * add comment	2020-05-16 14:09:39 -07:00
Jihoon Son	46beaa0640	Fix potential resource leak in ParquetReader (#9852 ) * Fix potential resource leak in ParquetReader * add test * never thrown exception * catch potential exceptions	2020-05-16 09:57:12 -07:00
Suneet Saldanha	b0167295d7	Fail incorrectly constructed join queries (#9830 ) * Fail incorrectly constructed join queries * wip annotation for equals implementations * Add equals tests * fix tests * Actually fix the tests * Address review comments * prohibit Pattern.hashCode()	2020-05-13 14:23:04 -07:00
mcbrewster	28be107a1c	add flag to flattenSpec to keep null columns (#9814) * add flag to flattenSpec to keep null columns * remove changes to inputFormat interface * add comment * change comment message * update web console e2e test * move keepNullColumns to JSONParseSpec * fix merge conflicts * fix tests * set keepNullColumns to false by default * fix lgtm * change Boolean to boolean, add keepNullColumns to hash, add tests for keepNullColumns false + true with no null columns * Add equals verifier tests	2020-05-08 21:53:39 -07:00
Jihoon Son	6674d721bc	Avoid sorting values in InDimFilter if possible (#9800 ) * Avoid sorting values in InDimFilter if possible * tests * more tests * fix and and or filters * fix build * false and true vector matchers * fix vector matchers * checkstyle * in filter null handling * remove wrong test * address comments * remove unnecessary null check * redundant separator * address comments * typo * tests	2020-05-06 15:26:36 -07:00
Jihoon Son	964a1fc9df	Remove ParseSpec.toInputFormat() (#9815 ) * Remove toInputFormat() from ParseSpec * fix test	2020-05-05 11:17:57 -07:00
Jihoon Son	c6caae9a24	Fix filtering on boolean values in transformation (#9812 ) * Fix filter on boolean value in Transform * assert * more descriptive test * remove assert * add assert for cached string; disable tests * typo	2020-05-04 18:47:10 -07:00
Aleksey Plekhanov	9341ea828a	Fixed flaky BlockingPoolTest.testConcurrentTakeBatch() (#9692 )	2020-05-03 12:54:27 -07:00
Clint Wylie	7711f776a0	fix issue where CloseableIterator.flatMap does not close inner CloseableIterator (#9761 ) * fix issue where CloseableIterator.flatMap does not close inner CloseableIterator * more test * style * clarify test	2020-04-24 13:52:50 -07:00
Jihoon Son	7fa72fbf15	Initialize SettableByteEntityReader only when inputFormat is not null (#9734 ) * Lazy initialization of SettableByteEntityReader to avoid NPE * toInputFormat for tsv * address comments * common code	2020-04-24 10:22:51 -07:00
Suneet Saldanha	1ced3b33fb	IntelliJ inspections cleanup (#9339 ) * IntelliJ inspections cleanup * Standard Charset object can be used * Redundant Collection.addAll() call * String literal concatenation missing whitespace * Statement with empty body * Redundant Collection operation * StringBuilder can be replaced with String * Type parameter hides visible type * fix warnings in test code * more test fixes * remove string concatenation inspection error * fix extra curly brace * cleanup AzureTestUtils * fix charsets for RangerAdminClient * review comments	2020-04-10 10:04:40 -07:00
Gian Merlino	75c543b50f	SQL: More straightforward handling of join planning. (#9648 ) * SQL: More straightforward handling of join planning. Two changes that simplify how joins are planned: 1) Stop using JoinProjectTransposeRule as a way of guiding subquery removal. Instead, add logic to DruidJoinRule that identifies removable subqueries and removes them at the point of creating a DruidJoinQueryRel. This approach reduces the size of the planning space and allows the planner to complete quickly. 2) Remove rules that reorder joins. Not because of an impact on the planning time (it seems minimal), but because the decisions that the planner was making in the new tests were sometimes worse than the user-provided order. I think we'll need to go with the user-provided order for now, and revisit reordering when we can add more smarts to the cost estimator. A third change updates numeric ExprEval classes to store their value as a boxed type that corresponds to what it is supposed to be. This is useful because it affects the behavior of "asString", and is included in this patch because it is needed for the new test "testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an actual double would. Fixes #9646. * Fix comments. * Fix tests.	2020-04-09 16:21:43 -07:00
Clint Wylie	d267b1c414	check paths used for shuffle intermediary data manager get and delete (#9630 ) * check paths used for shuffle intermediary data manager get and delete * add test * newline * meh	2020-04-07 09:47:18 -07:00
Jihoon Son	82ce60b5c1	Reuse transformer in stream indexing (#9625) * Reuse transformer in stream indexing * remove unused method * memoize compiled pattern	2020-04-06 16:36:08 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Himanshu	5604ac7963	druid extension for OpenID Connect auth using pac4j lib (#8992) * druid pac4j security extension for OpenID Connect OAuth 2.0 authentication * update version in druid-pac4j pom * introducing unauthorized resource filter * authenticated but authorized /unified-webconsole.html * use httpReq.getRequestURI() for matching callback path * add documentation * minor doc addition * license file updates * make dependency analyze succeed * fix doc build * hopefully fixes doc build * hopefully fixes license check build * yet another try on fixing license build * revert unintentional changes to website folder * update version to 0.18.0-SNAPSHOT * check session and its expiry on each request * add crypto service * code for encrypting the cookie * update doc with cookiePassphrase * update license yaml * make sessionstore in Pac4jFilter private non static * make Pac4jFilter fields final * okta: use sha256 for hmac * remove incubating * add UTs for crypto util and session store impl * use standard charsets * add license header * remove unused file * add org.objenesis.objenesis to license.yaml * a bit of nit changes in CryptoService and embedding EncryptionResult for clarity * rename alg to cipherAlgName * take cipher alg name, mode and padding as input * add java doc for CryptoService and make it more understandable * another UT for CryptoService * cache pac4j Config * use generics clearly in Pac4jSessionStore * update cookiePassphrase doc to mention PasswordProvider * mark stuff Nullable where appropriate in Pac4jSessionStore * update doc to mention jdbc * add error log on reaching callback resource * javadoc for Pac4jCallbackResource * introduce NOOP_HTTP_ACTION_ADAPTER * add correct module name in license file * correct extensions folder name in licenses.yaml * replace druid-kubernetes-extensions to druid-pac4j * cache SecureRandom instance * rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter	2020-03-23 18:15:45 -07:00
Chi Cao Minh	6b02991464	Match GREATEST/LEAST function behavior to other DBs (#9488 ) * Match GREATEST/LEAST function behavior Change the behavior of the GREATEST / LEAST functions to be similar to how it is implemented in other databases (as functions instead of aggregators). The GREATEST/LEAST functions are not in the SQL standard, but users will expect behavior similar to what other databases provide. * Match postgres behavior & handle more SQL types * Fix imports	2020-03-12 15:10:11 -07:00
Jihoon Son	7401bb3f93	Improve OvershadowableManager performance (#9441 ) * Use the iterator instead of higherKey(); use the iterator API instead of stream * Fix tests; fix a concurrency bug in timeline * fix test * add tests for findNonOvershadowedObjectsInInterval * fix test * add missing tests; fix a bug in QueueEntry * equals tests * fix test	2020-03-10 13:22:19 -07:00
zachjsh	7e0e767cc2	Ability to Delete task logs and segments from S3 (#9459 ) * Ability to Delete task logs and segments from S3 * implement ability to delete all tasks logs or all task logs written before a particular date when written to S3 * implement ability to delete all segments from S3 deep storage * upgrade version of aws SDK in use * * update licenses for updated AWS SDK version * * fix bug in iterating through results from S3 * revert back to original version of AWS SDK * * Address review comments * * Fix failing dependency check	2020-03-10 13:13:46 -07:00
Gian Merlino	c6c2282b59	Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484 ) * Harmonization and bug-fixing for selector and filter behavior on unknown types. - Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory system, and set defaultType COMPLEX so unknown types can be dynamically matched. - Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing. - Switch various methods to use convertObjectToX when casting to numbers, rather than ad-hoc and inconsistent logic. - Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return true even for 0- or 1- element arrays. - Adjust various javadocs. * Add throwParseExceptions option to Rows.objectToNumber, switch back to that. * Update tests. * Adjust moment sketch tests.	2020-03-10 07:15:57 -07:00
Jihoon Son	75e2051195	Convert array_contains() and array_overlaps() into native filters if possible (#9487 ) * Convert array_contains() and array_overlaps() into native filters if possible * make spotbugs happy and fix null results when null compatible	2020-03-09 22:50:38 -07:00
Jihoon Son	9466ac7c9b	Skip empty files for local, hdfs, and cloud input sources (#9450 ) * Skip empty files for local, hdfs, and cloud input sources * split hint spec doc * doc for skipping empty files * fix typo; adjust tests * unnecessary fluent iterable * address comments * fix test * use the right lists * fix test * fix test	2020-03-03 20:51:06 -08:00
Gian Merlino	ae617bf5dd	Clarify InputSource.isSplittable usage. (#9424 ) Also removes TimedShutoffInputSource, which had a bug in isSplittable (it improperly returned true, even though it didn't implement SplittableInputSource). This bug had no user-visible impact, since the code wasn't used.	2020-02-26 22:05:46 -08:00
Jihoon Son	3bc7ae782c	Create splits of multiple files for parallel indexing (#9360 ) * Create splits of multiple files for parallel indexing * fix wrong import and npe in test * use the single file split in tests * rename * import order * Remove specific local input source * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * Update docs/ingestion/native-batch.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc and error msg * fix build * fix a test and address comments Co-authored-by: sthetland <steve.hetland@imply.io>	2020-02-24 17:34:39 -08:00
Clint Wylie	6d8dd5ec10	string -> expression -> string -> expression (#9367 ) * add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays * oops, macros are expressions too * style * spotbugs * qualified type arrays * review stuffs * simplify grammar * more permissive array parsing * reuse expr joiner * fix it	2020-02-21 15:43:02 -08:00
zachjsh	f707064bed	Add Azure config options for segment prefix and max listing length (#9356 ) * Add Azure config options for segment prefix and max listing length Added configuration options to allow the user to specify the prefix within the segment container to store the segment files. Also added a configuration option to allow the user to specify the maximum number of input files to stream for each iteration. * * Fix test failures * * Address review comments * * add dependency explicitly to pom * * update docs * * Address review comments * * Address review comments	2020-02-21 14:12:03 -08:00
Jihoon Son	141d8dd875	Enable druid.coordinator.kill.pendingSegments.on by default (#9385 ) * Enable druid.coordinator.kill.pendingSegments.on by default * checkstyle	2020-02-21 13:13:49 -08:00
Chi Cao Minh	e7eb45e648	Run IntelliJ inspections on Travis (#9179 ) * Run IntelliJ inspections on Travis Running IntelliJ inspections currently takes about 90 minutes, but they can be run in about 30 minutes on Travis. * Restore assert statements	2020-02-19 11:34:19 +03:00
Clint Wylie	b1be88d79c	fix Expressions.toQueryGranularity to be more correct, improve javadocs of Expr.getIdentifierIfIdentifier and Expr.getBindingIfIdentifier (#9363 )	2020-02-16 08:36:40 -08:00
zachjsh	5c202343c9	implement Azure InputSource reader and deprecate Azure FireHose (#9306) * IMPLY-1946: Improve code quality and unit test coverage of the Azure extension * Update unit tests to increase test coverage for the extension * Clean up any messy code * Enforce code coverage as part of tests. * * Update azure extension pom to remove unnecessary things * update jacoco thresholds * * upgrade azure-storage library to the most up-to-date version * implement Azure InputSource reader and deprecate Azure FireHose * implement azure InputSource reader * deprecate Azure FireHose implementation * * exclude common libraries that are included from druid core * Implement more of Azure input source. * * Add tests * * Add more tests * * deprecate azure firehose * * added more tests * * rollback fix for google cloud batch ingestion bug. Will be fixed in another PR. * * Added javadocs for all azure related classes * Addressed review comments * * Remove dependency on org.apache.commons:commons-collections4 * Fix LGTM warnings * Add com.google.inject.extensions:guice-assistedinject to licenses * * rename classes as suggested in review comments * * Address review comments * * Address review comments * * Address review comments	2020-02-11 17:41:58 -08:00
Chi Cao Minh	e8146d5914	More superbatch range partitioning tests (#9266 ) More functional tests to cover handling of input data that has a partition dimension that contains: 1) Null values: Should be in first partition 2) Multi values: Should cause superbatch task to abort	2020-02-10 15:17:53 -08:00
Suneet Saldanha	51d7864935	Codestyle - use java style array declaration (#9338 ) * Codestyle - use java style array declaration Replaced C-style array declarations with java style declarations and marked the intelliJ inspection as an error * cleanup test code	2020-02-10 14:25:26 -08:00
Clint Wylie	831ec172f1	Logging large segment list handling (#9312 ) * better handling of large segment lists in logs * more * adjust * exceptions * fixes * refactor * debug * heh * dang	2020-02-07 21:42:45 -08:00
Jihoon Son	e81230f9ab	Refactoring some codes around ingestion (#9274) * Refactoring codes around ingestion: - Parallel index task and simple task now use the same segment allocator implementation. This is reusable for the future implementation as well. - Added PartitionAnalysis to store the analysis of the partitioning - Move some util methods to SegmentLockHelper and rename it to TaskLockHelper * fix build * fix SingleDimensionShardSpecFactory * optimize SingleDimensionShardSpecFactory * fix test * shard spec builder * import order * shardSpecBuilder -> partialShardSpec * build -> complete * fix comment; add unit tests for partitionBoundaries * add more tests and fix javadoc * fix toString(); add serde tests for HashBasedNumberedPartialShardSpec and SegmentAllocateAction * fix test * add equality test for hash and range partial shard specs	2020-02-07 16:23:07 -08:00
Lucas Capistrant	53bb45fc9a	Forbid easily misused HashSet and HashMap constructors (#9165) * Forbid easily misused HashSet and HashMap constructors * Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them * Fix visibility of constant in CollectionUtils.java * Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used * revert changes to sql module tests that should be in separate PR * Finish reverting changes to sql module tests that were flagged in checkstyle during CI * Add netty dependency resulting from SuppressForbidden	2020-02-07 10:44:09 +03:00
Gian Merlino	0f0554f8fa	LimitedSequence: Improve suppression comment. (#9298 )	2020-01-31 16:21:08 -08:00
Gian Merlino	7d91b8f281	Suppress false-alarm inspection. (#9297 ) I think a mid-air collision between #9260 and #9293 has led to master being unable to pass insepctions in TeamCity. Hopefully this fixes it.	2020-01-31 09:24:21 -08:00
Gian Merlino	07a91f9022	Fix early return from YieldingSequenceBase#accumulate. (#9293 ) Fixes #9291.	2020-01-30 12:01:18 -08:00
Suneet Saldanha	303b02eba1	intelliJ inspections cleanup (#9260 ) * intelliJ inspections cleanup - remove redundant escapes - performance warnings - access static member via instance reference - static method declared final - inner class may be static Most of these changes are aesthetic, however, they will allow inspections to be enabled as part of CI checks going forward The valuable changes in this delta are: - using StringBuilder instead of string addition in a loop indexing-hadoop/.../Utils.java processing/.../ByteBufferMinMaxOffsetHeap.java - Use class variables instead of static variables for parameterized test processing/src/.../ScanQueryLimitRowIteratorTest.java * Add intelliJ inspection warnings as errors to druid profile * one more static inner class	2020-01-29 11:50:52 -08:00
Suneet Saldanha	0ccfe5ca89	Expose JoinableFactory through Guice Bindings (#9271 ) * Make JoinableFactory an extension point This change makes it so that extensions can register a JoinableFactory that should be used for a DataSource. Extensions can provide the factories via DruidBinders#joinableFactoryBinder Known DataSources - like InlineDataSource are provided in the JoinableFactoryModule. This module installs a FactoryWarehouse that is used to decide which factory should be used to generate the Joinable for the provided DataSource. The ExtensionPoint is marked as Beta since it is not yet clear if this needs to remain available to other extensions or if the best way to register a factory is by using the datasource class. * Add module test * remove useless bindings in test * remove ExtensionPoint annotation * Make LifecycleLock not final to help with testing	2020-01-28 13:59:06 -08:00
Roman Leventov	b9186f8f9f	Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306 ) * Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error * Fix brace * Import order * Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill * Fix tests * Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY * More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters * Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig * More variable and method renames * Rename MetadataSegments to SegmentsMetadata * Javadoc update * Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs * Update Javadoc of VersionedIntervalTimeline.iterateAllObjects() * Reorder imports * Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers * Complete merge * Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests * Remove MetadataSegmentManager * Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments * Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder * Fix inspections * Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest * Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods * Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator * Unused import * Optimize imports * Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata() * Unused import * Update terminology in datasource-view.tsx * Fix label in datasource-view.spec.tsx.snap * Fix lint errors in datasource-view.tsx * Doc improvements * Another attempt to please TSLint * Another attempt to please TSLint * Style fixes * Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge) * Try to fix docs build issue * Javadoc and spelling fixes * Rename SegmentsMetadata to SegmentsMetadataManager, address other comments * Address more comments	2020-01-27 11:24:29 -08:00
Gian Merlino	19b427e8f3	Add JoinableFactory interface and use it in the query stack. (#9247 ) * Add JoinableFactory interface and use it in the query stack. Also includes InlineJoinableFactory, which enables joining against inline datasources. This is the first patch where a basic join query actually works. It includes integration tests. * Fix test issues. * Adjustments from code review.	2020-01-24 13:10:01 -08:00
Clint Wylie	8011211a0c	first/last aggregators and nulls (#9161 ) * null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs * initially null or not based on config * review stuff, make string first/last consistent with null handling of numeric columns, more tests * docs * handle nil selectors, revert to primitive first/last types so groupby v1 works...	2020-01-20 11:51:54 -08:00
Jihoon Son	84ff0d2352	Fix TSV bugs (#9199 ) * working * - support multi-char delimiter for tsv - respect "delimiter" property for tsv * default value check for findColumnsFromHeader * remove CSVParser to have a true and only CSVParser * fix tests * fix another test	2020-01-17 15:35:14 -08:00
Gian Merlino	448da78765	Speed up String first/last aggregators when folding isn't needed. (#9181 ) * Speed up String first/last aggregators when folding isn't needed. Examines the value column, and disables fold checking via a needsFoldCheck flag if that column can't possibly contain SerializableLongStringPairs. This is helpful because it avoids calling getObject on the value selector when unnecessary; say, because the time selector didn't yield an earlier or later value. * PR comments. * Move fastLooseChop to StringUtils.	2020-01-16 21:02:02 -08:00
Maytas Monsereenusorn	42359c93dd	Implement ANY aggregator (#9187 ) * Implement ANY aggregator * Add copyright headers * Add unit tests * fix BufferAggregator * Fix bug in BufferAggregator * hook up the SQL command * add check for buffer aggregator * Address comment * address comments * add docs * Address comments * add more tests for numeric columns that have null values when run in sql compatible null mode * fix checkstyle errors * fix failing tests * fix failing tests	2020-01-16 14:40:32 -08:00
Gian Merlino	a87db7f353	Add HashJoinSegment, a virtual segment for joins. (#9111 ) * Add HashJoinSegment, a virtual segment for joins. An initial step towards #8728. This patch adds enough functionality to implement a joining cursor on top of a normal datasource. It does not include enough to actually do a query. For that, future patches will need to wire this low-level functionality into the query language. * Fixups. * Fix missing format argument. * Various tests and minor improvements. * Changes. * Remove or add tests for unused stuff. * Fix up package locations.	2020-01-16 13:14:20 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Jihoon Son	298425a33a	Fix handling interruptedException in resource pool (#9044 )	2019-12-16 09:41:13 -08:00
Himanshu	45101183bc	HRTR: make pending task execution handling to go through all tasks on not finding worker slots (#8697 ) * HRTR: make pending task execution handling to go through all tasks on not finding worker slots * make HRTR methods package private that are meant to be used only in HttpRemoteTaskRunnerResource * mark HttpRemoteTaskRunnerWorkItem.State global variables final * hrtr: move immutableWorker NULL check outside of try-catch or finally block could have NPE * add some explanatory comments * add comment on explaining mechanics around hand off of pending tasks from submission to it getting picked up by a task execution thread * fix spelling	2019-12-12 14:58:52 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Chi Cao Minh	3de7ab8523	DataSketches jars in core (#9003 ) Having DataSketches jars in core will allow potential improvements, for example: - Provide an alternative implementation of HLL: https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html - Range partitioning for native parallel batch indexing without having the user load extensions on the classpath Dev mailing list discussion: https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E	2019-12-10 14:02:34 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Clint Wylie	4327892b84	modify multi-value expression transformation behavior to not treat re-use of the same input as a candidate for cartesian mapping (#8957 )	2019-12-09 20:38:15 -08:00
Rye	ca77d576c6	add customize separator for TSV inputFormat (#8993 ) * add customize separator for TSV inputFormat * fix spotbug * code refactor * code refactor * add argument check for delimiter * refine null check * add check for delimiter and listdelimiter can not be same * add unit tests	2019-12-09 11:24:09 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	06cd30460e	add query metrics for broker parallel merges, off by default (#8981 ) * add a bunch of metrics for broker parallel merges, off by default, and tests * fix tests * review stuffs * propogateIfPossible	2019-12-06 13:42:53 -08:00
Clint Wylie	ca2a7a1f08	more flush timeout for emitter tests (#8991 ) * more flush timeout for emitter tests * share constant	2019-12-05 16:52:35 -08:00
Chi Cao Minh	af74acaa85	Address security vulnerabilities CVSS >= 7 (#8980 ) * Address security vulnerabilities CVSS >= 7 Update dependencies to address security vulnerabilities with CVSS scores of 7 or higher. A new Travis CI job is added to prevent new high/critical security vulnerabilities from being added. Updated dependencies: - api-util 1.0.0 -> 1.0.3 - jackson 2.9.10 -> 2.10.1 - kafka 2.1.0 -> 2.1.1 - libthrift 0.10.0 -> 0.13.0 - protobuf 3.2.0 -> 3.11.0 The following high/critical security vulnerabilities are currently suppressed (so that the new Travis CI job can be added now) and are left as future work to fix: - hibernate-validator:5.2.5 - jackson-mapper-asl:1.9.13 - libthrift:0.6.1 - netty:3.10.6 - nimbus-jose-jwt:4.41.1 * Rename EDL1 license file * Fix inspection errors	2019-12-05 14:34:35 -08:00
Clint Wylie	5ecdf94d83	add 'prefixes' support to google input source (#8930 ) * add prefixes support to google input source, making it symmetrical-ish with s3 * docs * more better, and tests * unused * formatting * javadoc * dependencies * oops * review comments * better javadoc	2019-12-04 21:01:10 -08:00
Jihoon Son	86e8903523	Support orc format for native batch ingestion (#8950 ) * Support orc format for native batch ingestion * fix pom and remove wrong comment * fix unnecessary condition check * use flatMap back to handle exception properly * move exceptionThrowingIterator to intermediateRowParsingReader * runtime	2019-11-28 12:45:24 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Clint Wylie	923c003213	add flush timeout to emitter test (#8963 )	2019-11-27 19:30:09 -08:00
Atul Mohan	a5b40a6099	Remove null handling check (#8960 )	2019-11-27 12:09:33 -08:00
Chi Cao Minh	fba876b607	Update jackson to 2.9.10 (#8940 ) Addresses security vulnerabilities: - sonatype-2016-0397: https://github.com/FasterXML/jackson-core/issues/315 - sonatype-2017-0355: https://github.com/FasterXML/jackson-core/pull/322	2019-11-26 21:41:14 -08:00
Clint Wylie	4458113375	S3 input source (#8903 ) * add s3 input source for native batch ingestion * add docs * fixes * checkstyle * lazy splits * fixes and hella tests * fix it * re-use better iterator * use key * javadoc and checkstyle * exception * oops * refactor to use S3Coords instead of URI * remove unused code, add retrying stream to handle s3 stream * remove unused parameter * update to latest master * use list of objects instead of object * serde test * refactor and such * now with the ability to compile * fix signature and javadocs * fix conflicts yet again, fix S3 uri stuffs * more tests, enforce uri for bucket * javadoc * oops * abstract class instead of interface * null or empty * better error	2019-11-25 22:31:19 -08:00
Jihoon Son	a2e6de4b16	Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924 ) * Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor * Fix docs and javadoc * Add unit tests for large or small estimated num splits * add override	2019-11-23 01:38:08 -08:00
Gian Merlino	e0eb85ace7	Add FileUtils.createTempDir() and enforce its usage. (#8932 ) * Add FileUtils.createTempDir() and enforce its usage. The purpose of this is to improve error messages. Previously, the error message on a nonexistent or unwritable temp directory would be "Failed to create directory within 10,000 attempts". * Further updates. * Another update. * Remove commons-io from benchmark. * Fix tests.	2019-11-22 19:48:49 -08:00
Rye	0514e5686e	add TsvInputFormat (#8915 ) * add TsvInputFormat * refactor code * fix grammar * use enum replace string literal * code refactor * code refactor * mark abstract for base class meant not to be instantiated * remove constructor for test	2019-11-22 18:01:40 -08:00
Clint Wylie	7250010388	add parquet support to native batch (#8883 ) * add parquet support to native batch * cleanup * implement toJson for sampler support * better binaryAsString test * docs * i hate spellcheck * refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest * add comment, fix some stuff * adjustments * fix accident * tweaks	2019-11-22 10:49:16 -08:00
Jihoon Son	934547a215	RetryingInputEntity to retry on transient errors (#8923 ) * RetryingInputEntity to retry on transient errors * fix some javadoc and httpEntity * Make it interface * Javadoc for offset	2019-11-21 21:32:18 -08:00
Jonathan Wei	dc6178d1f2	Upgrade Calcite to 1.21 (#8566 ) * Upgrade Calcite to 1.21 * Checkstyle, test fix' * Exclude calcite yaml deps, update license.yaml * Add method for exception chain handling * Checkstyle * PR comments, Add outer limit context flag * Revert project settings change * Update subquery test comment * Checkstyle fix * Fix test in sql compat mode * Fix test * Fix dependency analysis * Address PR comments * Checkstyle * Adjust testSelectStarFromSelectSingleColumnWithLimitDescending	2019-11-20 21:22:55 -08:00
Chi Cao Minh	ff6217365b	Refactor parallel indexing perfect rollup partitioning (#8852 ) * Refactor parallel indexing perfect rollup partitioning Refactoring to make it easier to later add range partitioning for perfect rollup parallel indexing. This is accomplished by adding several new base classes (e.g., PerfectRollupWorkerTask) and new classes for encapsulating logic that needs to be changed for different partitioning strategies (e.g., IndexTaskInputRowIteratorBuilder). The code is functionally equivalent to before except for the following small behavior changes: 1) PartialSegmentMergeTask: Previously, this task had a priority of DEFAULT_TASK_PRIORITY. It now has a priority of DEFAULT_BATCH_INDEX_TASK_PRIORITY (via the new PerfectRollupWorkerTask base class), since it is a batch index task. 2) ParallelIndexPhaseRunner: A decorator was added to subTaskSpecIterator to ensure the subtasks are generated with unique ids. Previously, only tests (i.e., MultiPhaseParallelIndexingTest) would have this decorator, but this behavior is desired for non-test code as well. * Fix forbidden apis and pmd warnings * Fix analyze dependencies warnings * Fix IndexTask json and add IT diags * Fix parallel index supervisor<->worker serde * Fix TeamCity inspection errors/warnings * Fix TeamCity inspection errors/warnings again * Integrate changes with those from #8823 * Address review comments * Address more review comments * Fix forbidden apis * Address more review comments	2019-11-20 17:24:12 -08:00
Jihoon Son	ac6d703814	Support inputFormat and inputSource for sampler (#8901 ) * Support inputFormat and inputSource for sampler * Cleanup javadocs and names * fix style * fix timed shutoff input source reader * fix timed shutoff input source reader again * tidy up timed shutoff reader * unused imports * fix tc	2019-11-20 14:51:25 -08:00
Clint Wylie	3fcaa1a61b	fix sql compatible null handling config work with runtime.properties (#8876 ) * fix sql compatible null handling config work with runtime.properties * fix npe * fix tests * add friendly error * comment, and friendlier still * fix compile * fix from merges	2019-11-20 03:55:29 -08:00
Atul Mohan	f5fbd0bea0	Handle missing values for delimited text files when Nullhandling is enabled (#8779 ) * Handle missing values * Fix multi value tests * Fix firehose tests * Fix conflicts	2019-11-19 22:35:22 -08:00
Chi Cao Minh	4ae6466ae2	HDFS input source (#8899 ) * HDFS input source Add support for using HDFS as an input source. In this version, commas or globs are not supported in HDFS paths. * Fix forbidden api * Address review comments	2019-11-19 22:19:39 -08:00
Clint Wylie	074a45219d	add google cloud storage InputSource for native batch (#8907 ) * add google cloud storage InputSource for native batch * rename * checkstyle * fix * fix spelling * review comments	2019-11-19 19:49:43 -08:00
Gian Merlino	c44452f0c1	Tidy up lifecycle, query, and ingestion logging. (#8889 ) * Tidy up lifecycle, query, and ingestion logging. The goal of this patch is to improve the clarity and usefulness of Druid's logging for cluster operators. For more information, see https://twitter.com/cowtowncoder/status/1195469299814555648. Concretely, this patch does the following: - Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the goal of reducing redundancy and improving clarity by avoiding showing rarely-useful log messages. This includes most "starting" and "stopping" messages, and most messages related to individual columns. - Adds new log4j2 templates that show operators how to enabled DEBUG logging for certain important packages. - Eliminate stack traces for query errors, unless log level is DEBUG or more. This is useful because query errors often indicate user error rather than system error, but dumping stack trace often gave operators the impression that there was a system failure. - Adds task id to Appenderator, AppenderatorDriver thread names. In the default log4j2 configuration, this will put them in log lines as well. It's very useful if a user is using the Indexer, where multiple tasks run in the same JVM. - More consistent terminology when it comes to "sequences" (sets of segments that are handed-off together by Kafka ingestion) and "offsets" (cursors in partitions). These terms had been confused in some log messages due to the fact that Kinesis calls offsets "sequence numbers". - Replaces some ugly toString calls with either the JSONification or something more operator-accessible (like a URL or segment identifier, instead of JSON object representing the same). * Adjustments. * Adjust integration test.	2019-11-19 13:57:58 -08:00
Clint Wylie	7fa3182fe5	refactor InputFormat and InputEntityReader implementations (#8875 ) * refactor InputFormat and InputReader to supply InputEntity and temp dir to constructors instead of read/sample * fix style	2019-11-15 17:08:26 -08:00
Jihoon Son	1611792855	Add InputSource and InputFormat interfaces (#8823 ) * Add InputSource and InputFormat interfaces * revert orc dependency * fix dimension exclusions and failing unit tests * fix tests * fix test * fix test * fix firehose and inputSource for parallel indexing task * fix tc * fix tc: remove unused method * Formattable * add needsFormat(); renamed to ObjectSource; pass metricsName for reader * address comments * fix closing resource * fix checkstyle * fix tests * remove verify from csv * Revert "remove verify from csv" This reverts commit `1ea7758489`. * address comments * fix import order and javadoc * flatMap * sampleLine * Add IntermediateRowParsingReader * Address comments * move csv reader test * remove test for verify * adjust comments * Fix InputEntityIteratingReader * rename source -> entity * address comments	2019-11-15 09:22:09 -08:00
Rye	00f6a56370	Use RFC4180Parser as CSVParser (#8803 ) * Use RFC4180Parser as CSVParser, add unit test * change test file location, use assertEquals	2019-11-13 12:44:37 -08:00
Clint Wylie	cc54b2a9df	support for array expressions in TransformSpec with ExpressionTransform (#8744 ) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops	2019-11-13 11:04:37 -08:00
Gian Merlino	c204d68376	Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 ) There is a class of bugs due to the fact that BaseObjectColumnValueSelector has both "getObject" and "isNull" methods, but in most selector implementations and most call sites, it is clear that the intent of "isNull" is only to apply to the primitive getters, not the object getter. This makes sense, because the purpose of isNull is to enable detection of nulls in otherwise-primitive columns. Imagine a string column with a numeric selector built on top of it. You would want it to return isNull = true, so numeric aggregators don't treat it as all zeroes. Sometimes this design leads people to accidentally guard non-primitive get methods with "selector.isNull" checks, which is improper. This patch has three goals: 1) Fix null-handling bugs that already exist in this class. 2) Make interface and doc changes that reduce the probability of future bugs. 3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast aggregators while fixing null-handling bugs. I thought about splitting this into its own patch, but it ended up being tough to split from the null-handling fixes. For (1) the fixes are, - Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject calls on isNull, by no longer extending NullableAggregatorFactory. Now uses -1 as a sigil value for null, to differentiate nulls and empty strings. - Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use eval.asBoolean() to avoid calling getLong on the selector after already calling getObject. - Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls on isNull. Also, refactored slightly to avoid the overhead of calling getObject followed by another getter (see BloomFilterAggregatorFactory for part of this). For (2) the main changes are, - Remove the "isNull" method from BaseObjectColumnValueSelector. - Clarify "isNull" doc on BaseNullableColumnValueSelector. - Rename NullableAggregatorFactory -> NullbleNumericAggregatorFactory to emphasize that it only works on aggregators that take numbers as input. - Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner. - Similar naming changes to helper methods for groupBy, ValueMatchers, etc. For (3) the other fixes for StringFirst and StringLastAggregatorFactory are, - Fixed buffer overrun in the buffer aggregators when some characters in the string code into more than one byte (the old code used "substring" to apply a byte limit, which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method. - Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp. - Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex behavior. - Refactored to share code between the four aggregators. - Improved test coverage. - Made the base stringFirst, stringLast aggregators adaptive, and streamlined the xFold versions into aliases. The adaptiveness is similar to how other aggregators like hyperUnique work.	2019-11-07 17:46:59 -08:00
Clint Wylie	7aafcf8bca	parallel broker merges on fork join pool (#8578 ) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt	2019-11-07 11:58:46 -08:00
Zhenxiao Luo	fca23d0c32	use copy-on-write list in InMemoryAppender (#8808 ) * use copy-on-write synchronized list in InMemoryAppender * use copy-on-write list in InMemoryAppender * Fix comment	2019-11-07 21:11:40 +03:00
Roman Leventov	5c0fc0a13a	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 ) * IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals() * Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever * Fix tests * More fixes * Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code * Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase * More test fixes * More test fixes * Add a comment to VersionedIntervalTimelineTestBase * Fix tests * Set DataSegment.size(0) in more tests * Specify DataSegment.size(0) in more places in tests * Fix more tests * Fix DruidSchemaTest * Set DataSegment's size in more tests and benchmarks * Fix HdfsDataSegmentPusherTest * Doc changes addressing comments * Extended doc for visibility * Typo * Typo 2 * Address comment	2019-11-06 11:07:04 -08:00
Jihoon Son	511fa74fa2	Move maxFetchRetry to FetchConfig; rename OpenObject (#8776 )	2019-11-04 08:26:33 -08:00
Clint Wylie	3ff5e02237	remove select query (#8739 ) * remove select query * thanks teamcity * oops * oops * add back a SelectQuery class that throws RuntimeExceptions linking to docs * adjust text * update docs per review * deprecated	2019-10-30 19:29:56 -07:00
Jihoon Son	094936ca03	Remove commit() method Firehose (#8688 ) * Remove commit() method Firehose * fix javadoc	2019-10-23 16:52:02 -07:00
Gian Merlino	bb4368baec	Fix PeriodGranularity.bucketStart for days that don't start with midnight. (#8711 ) * Fix PeriodGranularity.bucketStart for days that don't start with midnight. * Use more newspeak method.	2019-10-22 10:00:52 -07:00
Jihoon Son	30c15900be	Auto compaction based on parallel indexing (#8570 ) * Auto compaction based on parallel indexing * javadoc and doc * typo * update spell * addressing comments * address comments * fix log * fix build * fix test * increase default max input segment bytes per task * fix test	2019-10-18 13:24:14 -07:00
Chi Cao Minh	8b2afa5c49	Use targetRowsPerSegment for single-dim partitions (#8624 ) When using single-dimension partitioning, use targetRowsPerSegment (if specified) to size segments. Previously, single-dimension partitioning would always size segments as close to the max size as possible. Also, change single-dimension partitioning to allow partitions that have a size equal to the target or max size. Previously, it would create partitions up to 1 less than those limits. Also, fix some IntelliJ inspection warnings in HadoopDruidIndexerConfig.	2019-10-17 15:55:12 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Benedict Jin	bba262a4c5	Fix resource leaks and suppress an incorrect LGTM alert (#8589 ) * Fix resource leaks and suppress an incorrect alert * Replace Guava's Files	2019-10-10 22:40:45 +03:00
Jihoon Son	96d8523ecb	Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571 ) * IOConfig for compaction task * add javadoc, doc, unit test * fix webconsole test * add spelling * address comments * fix build and test * address comments	2019-10-09 11:12:00 -07:00
Parag Jain	f0d74b240d	password provider for basic authentication of HttpEmitterConfig (#8618 )	2019-10-02 15:59:17 -07:00
Fokko Driesprong	82bfe86d0c	Make more package EverythingIsNonnullByDefault by default (#8198 ) * Make more package EverythingIsNonnullByDefault by default * Fixed additional voilations after pulling in master * Change iterator to list.addAll * Fix annotations	2019-09-30 18:53:18 -06:00
Himanshu	9f1f5e115c	doubleMean aggregator to be used at query time (#8459 ) * doubleMean aggregator for computing mean * make docs * build fixes * address review comment: handle null args	2019-09-26 08:04:33 -07:00
SandishKumarHN	ade8d1922d	#8156 : StructuralSearchInspection, Prohibit check on Thread.ge… (#8394 ) * StructuralSearchInspection, Prohibit check on Thread.getState() * review changes - 1 * review changes 2 * review changes 3 * test fix * review changes-2 * review changes-3	2019-09-22 14:12:05 +03:00
Chi Cao Minh	aeac0d4fd3	Adjust defaults for hashed partitioning (#8565 ) * Adjust defaults for hashed partitioning If neither the partition size nor the number of shards are specified, default to partitions of 5,000,000 rows (similar to the behavior of dynamic partitions). Previously, both could be null and cause incorrect behavior. Specifying both a partition size and a number of shards now results in an error instead of ignoring the partition size in favor of using the number of shards. This is a behavior change that makes it more apparent to the user that only one of the two properties will be honored (previously, a message was just logged when the specified partition size was ignored). * Fix test * Handle -1 as null * Add -1 as null tests for single dim partitioning * Simplify logic to handle -1 as null * Address review comments	2019-09-21 20:57:40 -07:00
Chi Cao Minh	99b6eedab5	Rename partition spec fields (#8507 ) * Rename partition spec fields Rename partition spec fields to be consistent across the various types (hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment and maxRowsPerSegment in favor of targetPartitionSize and maxSegmentSize. Consistent and clearer names are easier for users to understand and use. Also fix various IntelliJ inspection warnings and doc spelling mistakes. * Fix test * Improve docs * Add targetRowsPerSegment to HashedPartitionsSpec	2019-09-20 14:59:18 -06:00
Benedict Jin	c6f4f09557	Fix missing space in string literal and spurious Javadoc @param tags from LGTM (#8491 ) * Fix missing space in string literal * Fix spurious Javadoc @param tags	2019-09-16 14:37:47 +05:30
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Chi Cao Minh	14a8613d69	Exit JVM on curator unhandled errors (#8458 ) * Exit JVM on curator unhandled errors If an unhandled error occurs when curator is talking to ZooKeeper, exit the JVM in addition to stopping the lifecycle to prevent the process from being left in a zombie state. With this change, BoundedExponentialBackoffRetryWithQuit is no longer needed as when curator exceeds the configured retries, it triggers its unhandled error listeners. A new "connectionTimeoutMs" CuratorConfig setting is added mostly to facilitate testing curator unhandled errors, but it may be useful for users as well. * Address review comments	2019-09-06 16:43:59 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
Himanshu	5c3db41c2b	string column handling for long/float min/max/sum aggregators (#8319 ) * string column handling for long min/max/sum aggregators * add apache license to new files * use 'L' as suffix for long literal instead of 'l' * return null in ParallelCombiner.SettableColumnSelectorFactory.getColumnCapabilities(String) as is required by contract of ColumnSelectorFactory interface * fix more tests	2019-08-27 16:10:59 -07:00
Himanshu	4d87a19547	Logging emitter to publish query and other metric events as valid json objects (#8359 ) * LoggingEmitter: print event as json * use DefaultRequestLogEventBuilderFactory in emitting request logger by default * print context in query metric as json * removed unused jsonMapper from DefaultQueryMetrics * add comment * remove change to DefaultRequestLogEventBuilderFactory.java	2019-08-27 15:00:23 -07:00
Jihoon Son	e5ef5ddafa	Fix the shuffle with TLS enabled for parallel indexing; add an integration test; improve unit tests (#8350 ) * Fix shuffle with tls enabled; add an integration test; improve unit tests * remove debug log * fix tests * unused import * add javadoc * rename to getContent	2019-08-26 19:27:41 -07:00
Chi Cao Minh	a4b842ac2e	Speed up ExpressionSelectors.makeExprEvalSelector (#8373 ) * Speed up ExpressionSelectors.makeExprEvalSelector Addresses performance bottlenecks observed when running a search query with an expression filter and granularity set to none by caching and changing streams to for-loops. After changes, the query was observed to run 2-3x faster. Also fixes various IntelliJ inspection warnings. * Fix static analysis errors	2019-08-26 14:34:16 -07:00
Xavier Léauté	496dfa3b15	fix JVMMonitor initialization with JDK11 (#8397 ) JVMMonitor requires access to jdk.internal.perf.Perf to enable GC counters, which requires additional JVM arguments with JDK11. This change adds a fallback in case GC counters cannot be initialized, and logs a warning message explaining how GC counters can be enabled.	2019-08-25 08:26:58 -04:00
Xavier Léauté	8e0c307e54	Do not assume system classloader is URLClassLoader in Java 9+ (#8392 ) * Fallback to parsing classpath for hadoop task in Java 9+ In Java 9 and above we cannot assume that the system classloader is an instance of URLClassLoader. This change adds a fallback method to parse the system classpath in that case, and adds a unit test to validate it matches what JDK8 would do. Note: This has not been tested in an actual hadoop setup, so this is mostly to help us pass unit tests. * Remove granularity test of dubious value One of our granularity tests relies on system classloader being a URLClassLoaders to catch a bug related to class initialization and static initializers using a subclass (see #2979) This test was added to catch a potential regression, but it assumes we would add back the same type of static initializers to this specific class, so it seems to be of dubious value as a unit test and mostly serves to illustrate the bug. relates to #5589	2019-08-24 20:47:54 -04:00
SandishKumarHN	33f0753a70	Add Checkstyle for constant name static final (#8060 ) * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * merging with upstream * review-1 * unknow changes * unknow changes * review-2 * merging with master * review-2 1 changes * review changes-2 2 * bug fix	2019-08-23 13:13:54 +03:00

375 Commits