druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Will Xu	4868ef9529	Enable Arm builds (#12451 ) This PR enables ARM builds on Travis. I've ported over the changes from @martin-g on reducing heap requirements for some of the tests to ensure they run well on Travis arm instances.	2022-04-26 20:14:40 +05:30
Didip Kerabat	2473de2552	Metrics for shenandoah based on this source code: `554caf33a0/src/hotspot/share/gc/shenandoah/shenandoahMonitoringSupport.cpp (L65)` (#12369 ) Co-authored-by: Didip Kerabat <didip@apple.com>	2022-04-22 11:44:05 -07:00
Tejaswini Bandlamudi	177e1856cd	Fix GCS based ingestion if bucket name contains underscores (#12445 ) GCP allows bucket names to contain underscores. When a location in such a bucket is mapped to `java.net.URI`, `URI.getHost()` returns null. `URI.getHost()` is used as the bucket name in `CloudObjectLocation`, leading to an NPE. This commit uses `URI.getAuthority()` as the bucket name if `URI.getHost()` is null.	2022-04-21 09:22:35 +05:30
Agustin Gonzalez	0460d45e92	Make tombstones ingestible by having them return an empty result set. (#12392 ) * Make tombstones ingestible by having them return an empty result set. * Spotbug * Coverage * Coverage * Remove unnecessary exception (checkstyle) * Fix integration test and add one more to test dropExisting set to false over tombstones * Force dropExisting to true in auto-compaction when the interval contains only tombstones * Checkstyle, fix unit test * Changed flag by mistake, fixing it * Remove method from interface since this method is specific to only DruidSegmentInputentity * Fix typo * Adapt to latest code * Update comments when only tombstones to compact * Move empty iterator to a new DruidTombstoneSegmentReader * Code review feedback * Checkstyle * Review feedback * Coverage	2022-04-15 09:08:06 -07:00
hqx871	a22d413725	Use binary search to improve DimensionRangeShardSpec lookup (#12417 ) If there are many shards, mapper of IndexGeneratorJob seems to spend a lot of time in calling DimensionRangeShardSpec.isInChunk to lookup target shard. This can be significantly improved by using binary search instead of comparing an input row to every shardSpec. Changes: * Add `BaseDimensionRangeShardSpec` which provides a binary-search-based implementation for `createLookup` * `DimensionRangeShardSpec`, `SingleDimensionShardSpec`, and `DimensionRangeBucketShardSpec` now extend `BaseDimensionRangeShardSpec`	2022-04-15 21:37:06 +05:30
Clint Wylie	5824ab9608	fix issue with boolean expression input (#12429 )	2022-04-13 16:34:01 -07:00
Jihoon Son	5e5625f3ae	Fix indexMerger to respect the includeAllDimensions flag (#12428 ) * Fix indexMerger to respect the includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set * address comments	2022-04-13 12:43:11 -07:00
Maytas Monsereenusorn	8edea5a82d	Add a new flag for ingestion to preserve existing metrics (#12185 ) * add impl * add impl * fix checkstyle * add impl * add unit test * fix stuff * fix stuff * fix stuff * add unit test * add more unit tests * add more unit tests * add IT * add IT * add IT * add IT * add ITs * address comments * fix test * fix test * fix test * address comments * address comments * address comments * fix conflict * fix checkstyle * address comments * fix test * fix checkstyle * fix test * fix test * fix IT	2022-04-08 11:02:02 -07:00
somu-imply	a1ea658115	Introducing a new config to ignore nulls while computing String Cardinality (#12345 ) * Counting nulls in String cardinality with a config * Adding tests for the new config * Wrapping the vectorize part to allow backward compatibility * Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes * Updating testcase and code * Adding null handling test to improve coverage * Checkstyle fix * Adding 1 more change in docs * Making docs clearer	2022-03-29 14:31:36 -07:00
Maytas Monsereenusorn	dbb9518f50	Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334 ) * add impl * add ITs * address comments * address comments * address comments * fix failure * fix checkstyle * fix checkstyle	2022-03-18 12:46:16 -07:00
Xavier Léauté	c33fa11669	improve test compatibility with Java 17 and remove deprecated methods (#12341 ) * remove use of reflection in EnvironmentVariableDynamicConfigProvider for Java 17 compatibility * fix mock objects not getting closed properly, causing issues with Java 17 * remove use of deprecated methods and rules in tests	2022-03-18 08:19:28 -07:00
Jihoon Son	5e23674fe5	Fix a race condition in the '/tasks' Overlord API (#12330 ) * finds complete and active tasks from the same snapshot * overlord resource * unit test * integration test * javadoc and cleanup * more cleanup * fix test and add more	2022-03-17 10:47:45 +09:00
Dr. Sizzles	69f928f50e	Adding k8s support for human readable parsing (#12316 ) * Adding k8s support for human readable parsing * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java Co-authored-by: Frank Chen <frankchen@apache.org> * Changes per review Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-03-16 11:18:47 +08:00
Xavier Léauté	5d02a91faa	upgrade Error Prone to 2.11 (requires Java 11) (#12306 ) The latest version of Error Prone now requires Java 11. Upgrading means we can remove a lot of the maven profile complexity required to run checks with Java 8. This also requires switching our strict build to use Java 11. * update error-prone to 2.11 * remove need for specific maven profiles for Java 8 and Java 15 * fix additional Error Prone warnings with Java 11 * update strict build to use Java 11	2022-03-14 19:40:48 -07:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were consider equal due to their id's being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. 
* Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	28f8bcce9b	Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307 ) * Always reopen stream in FileUtils.copyLarge, RetryingInputStream. When an InputStream throws an exception from one of its read methods, we should assume it's bad and reopen it. The main changes here are: - In FileUtils.copyLarge, replace InputStream with InputStreamSupplier. - In RetryingInputStream, collapse retryCondition and resetCondition into a single condition. Also, make it required, since every usage is passing in a specific condition anyway. * Test fixes. * Fix read impl.	2022-03-05 14:39:14 -08:00
Laksh Singla	3f709db173	Make ParseExceptions more informative (#12259 ) This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader) Following changes are addressed in this PR: A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next(). IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number"). TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error. This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).	2022-02-28 22:31:15 +05:30
Xavier Léauté	d105519558	Replace use of PowerMock with Mockito (#12282 ) Mockito now supports all our needs and plays much better with recent Java versions. Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. * replace all uses of powermock with mockito-inline * upgrade mockito to 4.3.1 and fix use of deprecated methods * import mockito bom to align all our mockito dependencies * add powermock to forbidden-apis to avoid accidentally reintroducing it in the future	2022-02-27 22:47:09 -08:00
Xavier Léauté	1434197ee1	update airline dependency to 2.x (#12270 ) * upgrade Airline to Airline 2 https://github.com/airlift/airline is no longer maintained, updating to https://github.com/rvesse/airline (Airline 2) to use an actively maintained version, while minimizing breaking changes. Note, this is a backwards incompatible change, and extensions relying on the CliCommandCreator extension point will also need to be updated. * fix dependency checks where jakarta.inject is now resolved first instead of javax.inject, due to Airline 2 using jakarta	2022-02-27 15:19:28 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Karan Kumar	b86f2d4c2e	Performance fixes in proto readers (#12267 )	2022-02-24 23:21:48 +05:30
somu-imply	033989eb1d	Adding vectorized time_shift (#12254 ) * Adding vectorized time_shift * Vectorize time shift, addressing review comments * Remove an unused import	2022-02-11 14:44:52 -08:00
Clint Wylie	3ee66bb492	allow optimizing sql expressions and virtual columns (#12241 ) * rework sql planner expression and virtual column handling * simplify a bit * add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests * spotbugs * fix bugs with multi-value string array expression handling * javadocs and adjust test * better * fix tests	2022-02-09 14:55:50 -08:00
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Gian Merlino	de82c611de	Harmonize implementations of "visit" for Exprs from ExprMacros. (#12230 ) * Harmonize implementations of "visit" for Exprs from ExprMacros. Many of them had bugs where they would not visit all of the original arguments. I don't think this has user-visible consequences right now, but it's possible it would in a future world where "visit" is used for more stuff than it is today. So, this patch updates all implementations to a more consistent style that emphasizes reapplying the macro to the shuttled args. * Test fixes, test coverage, PR review comments.	2022-02-04 08:08:54 -08:00
tejaswini-imply	290130b1fa	Fix bug while adding `Range` header in HttpEntity (#12215 ) Changes: - Add `Range` header to the request before opening the connection - Use header `Content-Range` instead of `Accept-Ranges` as `Content-Range` is guaranteed to be populated if the server is returning a partial response	2022-02-04 18:17:51 +05:30
Clint Wylie	f9b406c8f2	add backwards compatibility mode for multi-value string array null value coercion (#12210 )	2022-01-31 22:38:15 -08:00
Karan Kumar	96b3498a40	Grouping on arrays as arrays (#12078 ) * init multiValue column group by * Changing sorting to Lexicographic as default * Adding initial tests * 1.Fixing test cases adding 2.Optimized inmem structs * Linking SQL layer to native layer * Adding multiDimension support to group by column strategy * 1. Removing array coercion in Calcite layer 2. Removing ResultRowDeserializer * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Checkstyle things 2. Removing flag * Minor naming things * CheckStyle Things * Fixing test case * Fixing hashing * 1. Adding the MV function 2. Added few test cases * 1. Adding MV function test cases * Adding Selector strategy function test cases * Fixing ClientQuerySegmentWalkerTest * Adding GroupByQueryRunnerTest test cases * Fixing test cases * Adding few more test cases * Fixing Exception assert statement and intellij inspection * Adding null compatibility tests * Review comments * Fixing few failing tests * Fixing few failing tests * Do not convert to topN query in case of group by on array * Fixing checkstyle * Fixing differences between jdk's class cast exception message * 1. Fixing ordering if the grouping key is an array * Fixing DefaultLimitSpec * Fixing CalciteArraysQueryTest * Dummy commit for LGTM * changes: * only coerce multi-value string null values when `ExpressionPlan.Trait.NEEDS_APPLIED` is set * correct return type inference for ARRAY_APPEND,ARRAY_PREPEND,ARRAY_SLICE,ARRAY_CONCAT * fix bug with ExprEval.ofType when actual type of object from binding doesn't match its claimed type * Review comments * Fixing test cases * Fixing spot bugs * Fixing strict compile Co-authored-by: Clint Wylie <cwylie@apache.org>	2022-01-25 20:30:56 -08:00
Maytas Monsereenusorn	bd7fe45da0	Support adding metrics in Auto Compaction (#12125 ) * add impl * add impl * add unit tests * add unit tests * add unit tests * add unit tests * add unit tests * add integration tests * add integration tests * fix LGTM * fix test * remove doc	2022-01-17 20:19:31 -08:00
Clint Wylie	1dba089a62	fix array type strategy write size tracking (#12150 ) * fix array type strategy write size tracking * fix checkstyle	2022-01-13 10:22:40 -08:00
Xavier Léauté	e56ea31697	follow-up to fix formatting broken in #12147 (#12148 ) follow-up to #12147 to fix the build	2022-01-12 20:59:32 -08:00
Xavier Léauté	168187e6df	avoid unnecessary String.format calls in IdUtils.validateId (#12147 ) Based on profiling data, about 25% of the time de-serializing DataSchema is spent on formatting strings in validateId. This can add up quickly, especially when de-serializing task information in the overlord, where it can consume almost 2% of CPU if there are many tasks. Since the formatting is unnecessary unless the checks fail, we can leverage the built-in formatting of Preconditions.checkArgument instead to avoid the cost.	2022-01-12 16:34:40 -08:00
Clint Wylie	7cf9192765	fix delegated smoosh writer and some new facilities for segment writeout medium (#12132 ) * fix delegated smoosh writer and some new facilities for segment writeout medium changes: * fixed issue with delegated `SmooshedWriter` when writing files that look like paths, causing `NoSuchFileException` exceptions when attempting to open a channel to the file * `FileSmoosher.addWithSmooshedWriter` when _not_ delegating now checks that it is still open when closing, making it a no-op if already closed (allowing column serializers to add additional files and avoid delegated mode if they are finished writing out their own content and need to add additional files) * add `makeChildWriteOutMedium` to `SegmentWriteOutMedium` interface, which allows users of a shared medium to clean up `WriteOutBytes` if they fully control the lifecycle. there are no callers of this yet, adding for future functionality * `OnHeapByteBufferWriteOutBytes` now can be marked as not open so that `OnHeapMemorySegmentWriteOutMedium` can now behave identically to other medium implementations * fix to address nit - use AtomicLong	2022-01-10 22:25:19 -08:00
Clint Wylie	e583033231	add 'TypeStrategy' to types (#11888 ) * add TypeStrategy - value comparators and binary serialization for any TypeSignature	2022-01-10 17:12:14 -08:00
AmatyaAvadhanula	c0b1514177	Segment pruning for multi-dim partitioning given query domain (#12046 ) Segment pruning for multi-dim partitioning for a given query DimensionRangeShardSpec#possibleInDomain has been modified to enhance pruning when multi-dim partitioning is used. Idea While iterating through each dimension, If query domain doesn't overlap with the set of permissible values in the segment, the segment is pruned. If the overlap happens on a boundary, consider the next dimensions. If there is an overlap within the segment boundaries, the segment cannot be pruned.	2021-12-17 12:44:43 +05:30
Suneet Saldanha	25ac04e067	MySqlFirehoseDatabaseConnector uses configured driver class name (#12049 )	2021-12-09 20:58:55 -08:00
Frank Chen	58245b4617	Support JsonPath functions in JsonPath expressions (#11722 ) * Add jsonPath functions support * Add jsonPath function test for Avro * Add jsonPath function length() to Orc * Add jsonPath function length() to Parquet * Add more tests to ORC format * update doc * Fix exception during ingestion * Add IT test case * Revert "Fix exception during ingestion" This reverts commit `5a5484b9ea`. * update IT test case * Add 'keys()' * Commit IT test case * Fix UT	2021-12-10 10:53:23 +08:00
Jonathan Wei	229f82a6f0	Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961 ) * Add parse error list API for stream supervisors, simplify parse exception message * Add input string to parse exception * Use structured ParseExceptionReport * Fix tests * Add test * PR comments, add ParseExceptionReport equals verifier * Fix test	2021-12-09 15:42:55 -06:00
Xavier Léauté	0565f0e6a1	fix build warnings for forbidden-apis (#12034 ) * replace deprecated forbidden-apis config failOnUnresolvableSignatures with ignoreSignaturesOfMissingClasses which avoids warnings for classes not present in a particular sub-module * fix incorrect signature for Files.createTempDirectory	2021-12-07 22:21:01 -08:00
Abhishek Agarwal	834aae096a	Human-readable and actionable SQL error messages (#11911 ) This PR does two things: 1. It adds the capability to surface missing features in SQL to users - The calcite planner will explore through multiple rules to convert a logical SQL query to a druid native query. Some rules change the shape of the query itself or optimize it, and some rules are responsible for translating the query into a druid native query. These are DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule etc. These rules will look at SQL and will do the necessary transformation. But if the rule can't transform the query, it returns control to the calcite planner without recording why it was not able to transform. E.g. there is a join query with a non-equal join condition. DruidJoinRule will look at the condition, see that it is not supported, and return control. The reason can be that a query can be planned in many different ways so if one rule can't parse it, the query may still be parseable by other rules. In this PR, we are intercepting these gaps and passing them back to the user if the query could not be planned at all. 2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.	2021-12-07 09:44:08 +05:30
Gian Merlino	76d281d64f	Enable allocating segments at ALL granularity. (#12003 ) * Enable allocating segments at ALL granularity. The main change is that Granularity.granularitiesFinerThan will return ALL if ALL is passed in. Allocating segments at ALL granularity is somewhat unconventional, but there is nothing wrong with it, and it actually makes a lot of sense for tables that are meant to be used for lookups or dimensions rather than main fact tables. This change enables ALL segmentGranularity to work properly in appendToExisting mode. Also clarifies behavior in javadocs and tests. * Move tests to improve coverage.	2021-12-03 14:15:05 -08:00
Gian Merlino	e0e05aad99	Enhancements to IndexTaskClient. (#12011 ) * Enhancements to IndexTaskClient. 1) Ability to use handlers other than StringFullResponseHandler. This functionality is not used in production code yet, but is useful because it will allow tasks to communicate with each other in non-string-based formats and in streaming fashion. In the future, we'll be able to use this to make task-to-task communication more efficient. 2) Truncate server errors at 1KB, so long errors do not pollute logs. 3) Change error log level for retryable errors from WARN to INFO. (The final error is still WARN.) 4) Harmonize log and exception messages to have a more consistent format. * Additional tests and improvements.	2021-12-03 09:14:32 -08:00
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
Gian Merlino	f47afd7b98	HttpResponseHandler: Fill out truncated javadoc. (#12004 )	2021-12-02 14:05:51 -08:00
Karan Kumar	ffa553593f	Use one factory in json reader (#11999 )	2021-12-01 16:17:48 +05:30
Paul Rogers	a66f10eea1	Code cleanup from query profile project (#11822 ) * Code cleanup from query profile project * Fix spelling errors * Fix Javadoc formatting * Abstract out repeated test code * Reuse constants in place of some string literals * Fix up some parameterized types * Reduce warnings reported by Eclipse * Reverted change due to lack of tests	2021-11-30 11:35:38 -08:00
Agustin Gonzalez	8eff6334f7	AWS "Data read has a different length than the expected" error should reset stream and try again (#11941 ) * Add support for custom reset condition & support for other args to have defaults to make the method api consistent * Add support for custom reset condition to InputEntity * Fix test names * Clarifying comments to why we need to read the message's content to identify S3's resettable exception * Add unit test to verify custom resettable condition for S3Entity * Provide a way to customize retries since they are expensive to test	2021-11-26 12:45:34 -07:00
Gian Merlino	3d72e66f56	Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582 ) * Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. This patch gathers together a variety of SQL from SqlSegmentsMetadataManager and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery. It focuses on SQL related to retrieving segment payloads and marking segments used and unused. In addition to cleaning up the code a bit, this patch also fixes a bug with years before 0 or after 9999. The prior SQL did not work properly because dates outside this range cannot be compared as strings. The new code does work for these far-past and far-future years. So, if you're ever interested in using Druid to analyze things from ancient Babylon, you better apply this patch first! * Fix test compiling. * Fixes and improvements. * Fix forbidden API. * Additional fixes.	2021-11-24 14:51:53 -08:00
Gian Merlino	0354407655	SQL INSERT planner support. (#11959 ) * SQL INSERT planner support. The main changes are: 1) DruidPlanner is able to validate and authorize INSERT queries. They require WRITE permission on the target datasource. 2) QueryMaker is now an interface, and there is a QueryMakerFactory that creates instances of it. There is only one production implementation of each (NativeQueryMaker and NativeQueryMakerFactory), which together behave the same way as the former QueryMaker class. But this opens the door to executing queries in ways other than the Druid query stack, and is used by unit tests (CalciteInsertDmlTest) to test the INSERT planning functionality. 3) Adds an EXTERN table macro that allows referencing external data using InputSource and InputFormat from Druid's batch ingestion API. This is not exposed in production yet, but is used by unit tests. 4) Adds a QueryFeature concept that enables the planner to change its behavior slightly depending on the capabilities of the execution system. 5) Adds an "AuthorizableOperator" concept that enables SqlOperators to require additional permissions. This is used by the EXTERN table macro. Related odds and ends: - Add equals, hashCode, toString methods to InlineInputSource. Aids in the "from external" tests in CalciteInsertDmlTest. - Add JSON-serializability to RowSignature. - Move the SQL string inside PlannerContext so it is "baked into" the planner when the planner is created. Cleans up the code a bit, since in practice, the same query is passed in every time to the same planner anyway. * Fix up calls to CalciteTests.createMockQueryLifecycleFactory. * Fix checkstyle issues. * Adjustments for CI. * Adjust DruidAvaticaHandlerTest for stricter test authorizations.	2021-11-24 12:14:04 -08:00
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
cheddar	e6570cadc4	Update LifecycleModule.java (#11972 ) Update the javadoc on LifecycleModule to be more clear about why the register methods exist and why they should always be used instead of Guice's eager instantiation.	2021-11-23 17:03:37 -08:00
Gian Merlino	b13f07a057	Harmonize local input sources; fix batch index integration test. (#11965 ) * Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list. Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the 4-segment case will always happen. Providing the file list in a specific order ensures that this will happen as expected by the tests. I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so changing it to a List was the simplest way to achieve the consistent ordering. I think it will also make the behavior make more sense if someone does specify the same input file multiple times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This is consistent with the behavior of other input sources like S3, GCS, HTTP. * Sort files in LocalFirehoseFactory.	2021-11-21 22:26:31 -08:00
Clint Wylie	f260bbed23	restore and deprecate AggregatorFactory methods (#11917 ) * add back and deprecate aggregator factory methods so i can say i told you so when i delete these later * rename to make less ambiguous, fix fill method * adjust	2021-11-19 15:59:35 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
Kashif Faraz	223c5692a8	Add dimension partitioningType to metrics to track usage of different partitioning schemes (#11902 ) Add method ShardSpec.getType() to get name of shard spec type List all names of shard spec types in the interface ShardSpec itself for easy reference and maintenance Add dimension partitioningType to metric segment/added/bytes	2021-11-11 18:34:27 +05:30
Gian Merlino	14b0b4aee2	RowBasedSegment: Use Sequence instead of Iterable. (#11886 ) * RowBasedSegment: Use Sequence instead of Iterable. The main reason this is good is that Sequences can include baggage that must be closed after iteration is finished. This enables creating RowBasedSegments on top of closeable sequences of rows. To preserve the optimization that allows reversing a List without copying it, this patch also makes SimpleSequence its own class and allows extracting the Iterable that was used to create it. * Fix tests.	2021-11-10 06:06:52 -08:00
Kashif Faraz	d3914c1a78	Ensure backward compatibility of multi dimension partitioning (#11889 ) This PR has changes to ensure backward compatibility of multi dimension partitioning such that if some middle managers are upgraded to a newer version, the cluster still functions normally for single_dim use cases.	2021-11-10 10:23:34 +05:30
Maytas Monsereenusorn	a36a41da73	Support routing data through an HTTP proxy (#11891 ) * Support routing data through an HTTP proxy * Support routing data through an HTTP proxy This adds the ability for the HttpClient to connect through an HTTP proxy. We augment the channel factory to check if it is supposed to be proxied and, if so, we connect to the proxy host first, issue a CONNECT command through to the final recipient host and then give the channel to the normal http client for usage. * add docs * address comments Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2021-11-09 17:24:06 -08:00
Gian Merlino	babf00f8e3	Migrate File.mkdirs to FileUtils.mkdirp. (#11879 ) * Migrate File.mkdirs to FileUtils.mkdirp. * Remove unused imports. * Fix LookupReferencesManager. * Simplify. * Also migrate usages of forceMkdir. * Fix var name. * Fix incorrect call. * Update test.	2021-11-09 11:10:49 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Clint Wylie	7237dc837c	complex typed expressions (#11853 ) * complex typed expressions * add built-in hll collector expressions to get coverage on druid-processing, more types, more better * rampage!!! * more javadoc * adjustments * oops * lol * remove unused dependency * contradiction? * more test	2021-11-08 00:33:06 -08:00
Kashif Faraz	2d77e1a3c6	Add support for multi dimension range partitioning (#11848 ) This PR adds support for range partitioning on multiple dimensions. It extends on the concept and implementation of single dimension range partitioning. The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension. The start and end values of a DimensionRangeShardSpec are represented by StringTuples, where each String in the tuple is the value of a partition dimension.	2021-11-06 12:50:17 +05:30
Gian Merlino	1c12dd97dc	Add javadocs to StringUtils.fromUtf8. (#11881 ) They clarify that the methods advance the position of the buffer.	2021-11-05 15:27:24 -07:00
Gian Merlino	98ecbb21cd	Remove CloseQuietly and migrate its usages to other methods. (#10247 ) * Remove CloseQuietly and migrate its usages to other methods. These other methods include: 1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions in RuntimeExceptions for callers that just want to avoid dealing with checked exceptions. Most usages were migrated to this method, because it looks like they were mainly attempts to avoid declaring a throws clause, and perhaps were unintentionally suppressing IOExceptions. 2) New method CloseableUtils.closeInCatch, designed to properly close something in a catch block without losing exceptions. Some usages from catch blocks were migrated here, when it seemed that they were intended to avoid checked exception handling, and did not really intend to also suppress IOExceptions. 3) New method CloseableUtils.closeAndSuppressExceptions, which sends all exceptions to a "chomper" that consumes them. Nothing is thrown or returned. The behavior is slightly different: with this method, _all_ exceptions are suppressed, not just IOExceptions. Calls that seemed like they had good reason to suppress exceptions were migrated here. 4) Some calls were migrated to try-with-resources, in cases where it appeared that CloseQuietly was being used to avoid throwing an exception in a finally block. 🎵 You don't have to go home, but you can't stay here... 🎵 * Remove unused import. * Fix up various issues. * Adjustments to tests. * Fix null handling. * Additional test. * Adjustments from review. * Fixup style stuff. * Fix NPE caused by holder starting out null. * Fix spelling. * Chomp Throwables too.	2021-10-23 17:03:21 -07:00
Gian Merlino	cb9bc15e95	Fix task report streaming in https setups. (#11739 ) * Fix task report streaming in https setups. * Trivial change to re-trigger ITs.	2021-10-22 19:07:29 -07:00
Clint Wylie	741b4ed516	add output type information to ExpressionPostAggregator (#11818 ) * add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types * add test for test for coverage * simplify * Remove unused imports. Co-authored-by: Gian Merlino <gian@imply.io>	2021-10-22 13:52:51 -07:00
Arun Ramani	df4894afff	Fallback to /sys/fs root when looking for cgroups (#11810 ) ProcCgroupDiscoverer builds the cgroup directory by concatenating the proc mounts and proc cgroup paths together. This doesn't seem to work in Kubernetes if the execution context is within the container. Also this isn't consistent across all Linux OSes. The fix is to fall back to / as the root and it seems to work empirically.	2021-10-21 09:51:16 +05:30
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
Kashif Faraz	7352c83e11	Do not log sensitive property value if JsonConfigurator fails to parse (#11787 ) * Do not log property value if JsonConfigurator fails to parse * Add comment to explain log change * Fix log language	2021-10-09 09:59:03 +05:30
Arun Ramani	b6b42d3936	Minor processor quota computation fix + docs (#11783 ) * cpu/cpuset cgroup and procfs data gathering * Renames and default values * Formatting * Trigger Build * Add cgroup monitors * Return 0 if no period * Update * Minor processor quota computation fix + docs * Address comments * Address comments * Fix spellcheck Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>	2021-10-08 22:52:03 -05:00
Arun Ramani	15789137a3	Add cpu/cpuset cgroup and procfs data gathering (#11763 ) * cpu/cpuset cgroup and procfs data gathering * Renames and default values * Formatting * Trigger Build * Add cgroup monitors * Return 0 if no period * Update Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>	2021-10-06 20:27:36 -07:00
Maytas Monsereenusorn	a04b08e45c	Add new config to filter internal Druid-related messages from Query API response (#11711 ) * add impl * add impl * add tests * add unit test * fix checkstyle * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * address comments * address comments * address comments * fix test * fix test * fix test * fix test * fix test * change config name * change config name * change config name * address comments * address comments * address comments * address comments * address comments * address comments * fix compile * fix compile * change config * add more tests * fix IT	2021-09-29 12:55:49 +07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Atul Mohan	dcee99df78	Improve error message when buckets are null for cloud objects (#11644 ) * Add error message * Add test * Checkstyle	2021-09-07 17:31:17 -07:00
Sandeep	ac2b65e837	fixes possible data truncation (#11462 ) * fixes possible data truncation * fixes possible data truncation * add unit test case to catch the possible data truncation	2021-08-26 20:16:26 +08:00
Jihoon Son	2a658acad4	Put sleep in an extension (#11632 ) * Put sleep in an extension * dependency	2021-08-25 01:27:45 -07:00
Jihoon Son	78b4be467e	Add sleep function for testing (#11626 ) * Add sleep function for testing * sql function * javadoc	2021-08-24 14:30:31 +07:00
Yi Yuan	bf863343f8	delete some code (#11552 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-16 10:40:40 -07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
frank chen	e40be0ae28	Add SQL functions to format numbers into human readable format (#10635 ) * add binary_byte_format/decimal_byte_format/decimal_format * clean code * fix doc * fix review comments * add spelling check rules * remove extra param * improve type handling and null handling * remove extra zeros * fix tests and add space between unit suffix and number as most size-format functions do * fix tests * add examples * change function names according to review comments * fix merge Signed-off-by: frank chen <frank.chen021@outlook.com> * no need to configure NullHandling explicitly for tests Signed-off-by: frank chen <frank.chen021@outlook.com> * fix tests in SQL-Compatible mode Signed-off-by: frank chen <frank.chen021@outlook.com> * Resolve review comments * Update SQL test case to check null handling * Fix intellij inspections * Add more examples * Fix example	2021-08-13 10:27:49 -07:00
Harini Rajendran	ccd362d228	Fix FileIteratingFirehoseTest to extend NullHandlingTest (#11581 )	2021-08-12 08:26:04 -07:00
Yi Yuan	23d7d71ea5	Add Environment Variable DynamicConfigProvider (#11377 ) * add_environment_variable_DynamicConfigProvider * fix code * code fixed * code fixed * add document * fix doc * fix doc * add more unit test * fix style * fix document * bug fixed * fix unit test * fix comment * fix test Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-04 20:26:58 -07:00
wx930910	578625b771	Replace TestInputRowHandler with mocking object (#11529 ) * Replace TestInputRowHandler with mocking object * Change EasyMock object to Mockito object. Make test logic concise * correct code format	2021-08-04 16:45:22 -07:00
Yi Yuan	aa7cb50f24	Add DynamicConfigProvider for Schema Registry (#11362 ) * add_DynamicConfigProvider_for_schema_registry * bug fixed * add document * fix document * fix spot bug * fix document * inject ObjectMapper * add DynamicConfigProviderUtils * add UT * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-03 13:24:52 -07:00
Agustin Gonzalez	a2da407b70	Add error msg to parallel task's TaskStatus (#11486 ) * Add error msg to parallel task's TaskStatus * Consolidate failure block * Add failure test * Make it fail * Add fail while stopped * Simplify hash task test using a runner that fails after so many runs (parameter) * Remove unthrown exception * Use runner names to identify phase * Added range partition kill test & fixed a timing bug with the custom runner * Forbidden api * Style * Unit test code cleanup * Added message to invalid state exception and improved readability of the phase error messages for the parallel task failure unit tests	2021-08-02 12:11:28 -07:00
Xavier Léauté	4bca7f014e	update error-prone to 2.8.0 with fix for crashing check (#11494 ) * error-prone 2.8.0 fixes https://github.com/google/error-prone/issues/2396 * fix for a few ignored return values * fix unknown args in sub-modules	2021-07-29 09:13:46 -07:00
Jihoon Son	8729b40893	Add the error message in taskStatus for task failures in overlord (#11419 ) * add error messages in taskStatus for task failures in overlord * unused imports * add helper message for logs to look up * fix tests * fix counting the same task failures more than once * same fix for HttpRemoteTaskRunner	2021-07-15 13:14:28 -07:00
Suneet Saldanha	49e8732e4f	Display errors for invalid timezones in TIME_FORMAT (#11423 ) Users sometimes make typos when picking timezones, like `America/Los Angeles` instead of `America/Los_Angeles`. Instead of defaulting to UTC, this change throws an error notifying the user of their mistake.	2021-07-09 06:07:13 -07:00
Clint Wylie	63fcd77c38	support using mariadb connector with mysql extensions (#11402 ) * support using mariadb connector with mysql extensions * cleanup and more tests * fix test * javadocs, more tests, etc * style and more test * more test more better * missing pom * more pom	2021-07-08 12:25:37 -07:00
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
Xavier Léauté	712f2a5d00	upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363 ) * upgrade error-prone to 2.7.1 and support checks with Java 11+ - upgrade error-prone to 2.7.1 - support running error-prone with Java 11 and above using -Xplugin instead of custom compiler - add compiler arguments to ignore warnings/errors in Java 15/16 - introduce strictCompile property to enable strict profiles since we now need multiple strict profiles for Java 8 - properly exclude all generated source files from error-prone - fix druid-processing overriding annotation processors from parent pom - fix druid-core disabling most non-default checks - align plugin and annotation errorprone versions - fix / suppress additional issues found by error-prone: * fix bug in SeekableStreamSupervisor initializing ArrayList size with the taskGroupdId * fix missing @Override annotations - remove outdated compiler plugin in benchmarks - remove deleted ParameterPackage error-prone rule - re-enable checks on benchmark module as well * fix IntelliJ inspections * disable LongFloatConversion due to bug in error-prone with JDK 8 * add comment about InsecureCrypto	2021-06-16 12:55:34 -07:00
Clint Wylie	bfbd7ec432	fix bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Clint Wylie	920aa414ca	enrich expression cache key information to support expressions which depend on external state (#11358 ) * enrich expression cache key information to support expressions which depend on external state such as lookups * cache rules everything around me * low carb * rename	2021-06-14 17:26:43 -07:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
Maytas Monsereenusorn	e5633d7842	Fix bug: 502 bad gateway thrown when we edit/delete any auto compaction config created in 0.21.0 or before (#11311 ) * fix bug * add test * fix IT * fix checkstyle * address comments	2021-05-27 16:34:32 -07:00
Clint Wylie	2bfcee5824	Fix issue with empty array converting to string expression instead of string array (#11270 )	2021-05-22 09:31:28 +08:00
Clint Wylie	6d08a7051e	fix bug with aggregator expressions on realtime index with string columns always producing 0 values (#11185 ) * fix bug with aggregator expressions on realtime index with string columns always producing 0 values * more test * rework some stuff * javadocs	2021-05-17 11:59:13 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Clint Wylie	790262e5d0	add estimated byte size limit enforcement for heap based expression aggregator (#11236 )	2021-05-12 01:21:50 -07:00
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Maytas Monsereenusorn	3a660bc6ee	Make sure updating coordinator config is protected against race condition (#11144 ) * Make sure changing coordinator config is protected against concurrent updates * Make sure updating coordinator config is protected against race condition * add retry * fix checkstyle * add tests * add tests * add more tests * add tests * fix * fix checkstyle	2021-05-10 13:58:08 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
Gian Merlino	ad028de538	InDimFilter: Fix NPE involving certain Set types. (#11169 ) * InDimFilter: Fix NPE involving certain Set types. Normally, InDimFilters that come from JSON have HashSets for "values". However, programmatically-generated filters (like the ones from #11068) may use other set types. Some set types, like TreeSets with natural ordering, will throw NPE on "contains(null)", which causes the InDimFilter's ValueMatcher to throw NPE if it encounters a null value. This patch adds code to detect if the values set can support contains(null), and if not, wrap that in a null-checking lambda. Also included: - Remove unneeded NullHandling.needsEmptyToNull method. - Update IndexedTableJoinable to generate a TreeSet that does not require lambda-wrapping. (This particular TreeSet is how I noticed the bug in the first place.) * Test fixes. * Improve test coverage	2021-04-28 14:13:42 -07:00
Clint Wylie	57ff1f9cdb	expression aggregator (#11104 ) * add experimental expression aggregator * add test * fix lgtm * fix test * adjust test * use not null constant * array_set_concat docs * add equals and hashcode and tostring * fix it * spelling * do multi-value magic for expression agg, more javadocs, tests * formatting * fix inspection * more better * nullable	2021-04-22 18:30:16 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skips storing audit payload if payload size exceeds limit and skips storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
chenyuzhi459	b8423a38df	add round test (#11088 ) * add round test * code style * handle null val for round function * handle null val for round function * support null for round * fix compatibility * fix test * fix test * code style * optimize format	2021-04-13 11:36:32 -07:00
Lucas Capistrant	8264203cee	Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676 ) * Add ability to wait for segment availability for batch jobs * IT updates * fix queries in legacy hadoop IT * Fix broken indexing integration tests * address an lgtm flag * spell checker still flagging for hadoop doc. adding under that file header too * fix compaction IT * Updates to wait for availability method * improve unit testing for patch * fix bad indentation * refactor waitForSegmentAvailability * Fixes based off of review comments * cleanup to get compile after merging with master * fix failing test after previous logic update * add back code that must have gotten deleted during conflict resolution * update some logging code * fixes to get compilation working after merge with master * reset interrupt flag in catch block after code review pointed it out * small changes following self-review * fixup some issues brought on by merge with master * small changes after review * cleanup a little bit after merge with master * Fix potential resource leak in AbstractBatchIndexTask * syntax fix * Add a Compaction TuningConfig type * add docs stipulating the lack of support by Compaction tasks for the new config * Fixup compilation errors after merge with master * Remove erroneous newline	2021-04-08 21:03:00 -07:00
Clint Wylie	338886fd5f	vector group by support for string expressions (#11010 ) * vector group by support for string expressions * fix test * comments, javadoc	2021-04-08 19:23:39 -07:00
Xavier Léauté	15bdd6bc2f	Fix unit tests and GC settings for Java 15 (#11074 ) * JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it * Fix flaky HTTP client tests with Java 15 * Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15	2021-04-08 10:33:37 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Jihoon Son	43ea184b74	Add explicit EOF and use assert instead of exception (#11041 )	2021-03-31 09:41:57 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	a041933017	Allow overlapping intervals for the compaction task (#10912 ) * Allow overlapping intervals for the compaction task * unused import * line indentation Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2021-03-23 11:21:54 -07:00
Xavier Léauté	1061faa6ba	prefer string concatenation over String.format in performance sensitive code (#10997 ) String.format relies on regex parsing, which makes these calls expensive at higher request volumes.	2021-03-16 22:06:26 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Maytas Monsereenusorn	4dd22a850b	Fix streaming ingestion failing if it encounters empty rows (Regression) (#10962 ) * Fix streaming ingestion failing and halting if it encounters empty rows * address comments	2021-03-09 12:11:58 -08:00
Abhishek Agarwal	489f5b1a03	Avoid expensive findEntry call in segment metadata query (#10892 ) * Avoid expensive findEntry call in segment metadata query * other places * Remove findEntry * Fix add cost * Refactor a bit * Add performance test * Add comment * Review comments * intellij	2021-03-08 22:08:33 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
Gian Merlino	05e8f8fe06	CsvInputFormat: Create a parser per InputEntityReader. (#10923 ) RFC4180Parser is not thread safe and cannot be shared across readers.	2021-02-27 18:37:05 -08:00
Gian Merlino	07902f607b	Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904 ) * Granularity: Introduce primitive-typed bucketStart, increment methods. Saves creation of unnecessary DateTime objects in timestamp_floor and timestamp_ceil expressions. * Fix style. * Amp up the test coverage.	2021-02-25 07:59:20 -08:00
Clint Wylie	cbbef80c7f	add SQL operators for bitwise expressions (#10823 ) * add SQL operators for bitwise expressions * more test * fix spelling * more tests	2021-02-18 20:56:33 -08:00
Agustin Gonzalez	eabad0fb35	Keep query granularity of compacted segments after compaction (#10856 ) * Keep query granularity of compacted segments after compaction * Protect against null isRollup * Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment * Make sure that NONE is also included when comparing for the finer granularity * Update integration test check for segment size due to query granularity propagation affecting size * Minor code cleanup * Added functional test to verify queryGranularity after compaction * Minor style fix * Update unit tests	2021-02-18 01:35:10 -08:00
Maytas Monsereenusorn	6541178c21	Support segmentGranularity for auto-compaction (#10843 ) * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * resolve conflict * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * fix tests * fix more tests * fix checkstyle * add unit tests * fix checkstyle * fix checkstyle * fix checkstyle * add unit tests * add integration tests * fix checkstyle * fix checkstyle * fix failing tests * address comments * address comments * fix tests * fix tests * fix test * fix test * fix test * fix test * fix test * fix test * fix test * fix test	2021-02-12 03:03:20 -08:00
Abhishek Agarwal	8718155f8f	Allow for empty keys in hash map (#10869 ) * allow for empty keys in hash map * fix serde test	2021-02-10 11:19:57 -08:00
Jihoon Son	1ec3f0bd73	Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535 )" (#10871 ) This reverts commit `6b14bdb3a5`.	2021-02-09 17:51:26 -08:00
Agustin Gonzalez	3785ad5812	Add log message when local input's filter does not match any files (#10837 ) * Add log message when local input's filter does not match any files * Re-use previously defined fileIterator	2021-02-05 11:35:19 -06:00
Jihoon Son	ac41e41232	Update doc for query errors and add unit tests for JsonParserIterator (#10833 ) * Update doc for query errors and add unit tests for JsonParserIterator * static constructor for convenience * rename method	2021-02-05 02:55:32 -08:00
Jihoon Son	3f8f00a231	Fix CVE-2021-25646 (#10818 )	2021-02-04 11:21:43 -08:00
Agustin Gonzalez	0e4750bac2	Granularity interval materialization (#10742 ) * Prevent interval materialization for UniformGranularitySpec inside the overlord * Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval> * Javadoc update, respect inputIntervals contract * Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec * Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity * Fix Travis style & other checks * Refactor TreeSet to facilitate re-use in UniformGranularitySpec * Make sure intervals are unique when there is no segment granularity * Style/bugspot fixes... * More travis checks * Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method * Style & PR feedback * Fixed failing test * Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change) * Refactor so that we can get the condensed buckets without materializing the intervals * Get rid of GranularitySpec::condensedInputIntervals ... not needed * Travis failures fixes * Travis checkstyle fix * Edited/added javadoc comments and a method name (code review feedback) * Fixed jacoco coverage by moving class and adding more coverage * Avoid materializing the condensed intervals when locking * Deal with overlapping intervals * Remove code and use library code instead * Refactor intervals by granularity using the FluentIterable, add sanity checks * Change !hasNext() to inputIntervals().isEmpty() * Remove redundant lambda * Use materialized intervals here since this is outside the overlord (for performance) * Name refactor to reflect the fact that bucket intervals are sorted. * Style fixes * Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense) * Remove forbidden api * Move helper class inside common base class to reduce public space pollution	2021-01-29 06:02:10 -08:00
Clint Wylie	2ce7b3dcf4	bitwise math function expressions (#10605 ) * expressions: adding bitwise expressions * double handling and vectorization * move conversion to Evals * revert unintended changes * less magic, split convert functions, fix parser for funny exponent doubles * fix spelling exceptions list * more spelling * fix grammar, add more test, fix docs * fix docs Co-authored-by: Max Kaplan <max@maxkaplan.me>	2021-01-28 11:16:53 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Jihoon Son	b3325c1601	Add a config for monitorScheduler type (#10732 ) * Add a config for monitorScheduler type * check interrupted * null check * do not schedule monitor if the previous one is still running * checkstyle * clean up names * change default back to basic * fix test	2021-01-13 17:20:43 -08:00
Jihoon Son	149306c9db	Tidy up HTTP status codes for query errors (#10746 ) * Tidy up query error codes * fix tests * Restore query exception type in JsonParserIterator * address review comments; add a comment explaining the ugly switch * fix test	2021-01-13 17:20:00 -08:00
Clint Wylie	9362dc7968	re-use expression vector evaluation results for the same offset in expression vector selectors (#10614 ) * cache expression selector results by associating vector expression bindings to underlying vector offset * better coverage, fix floats * style * stupid bot * stupid me * more test * intellij threw me under the bus when it generated those junit methods * narrow interface instead of passing around offset	2021-01-13 12:44:56 -08:00
Xavier Léauté	118b50195e	Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730 ) Today Kafka message support in streaming indexing tasks is limited to message values, and does not provide a way to expose Kafka headers, timestamps, or keys, which may be of interest to more specialized Druid input formats. For instance, Kafka headers may be used to indicate payload format/encoding or additional metadata, and timestamps are often omitted from values in Kafka streams applications, since they are included in the record. This change proposes to introduce KafkaRecordEntity as InputEntity, which would give input formats full access to the underlying Kafka record, including headers, key, timestamps. It would also open access to low-level information such as topic, partition, offset if needed. KafkaEntity is a subclass of ByteEntity for backwards compatibility with existing input formats, and to avoid introducing unnecessary complexity for Kinesis indexing tasks.	2021-01-08 16:04:37 -08:00
Clint Wylie	edfbdbfc97	fix NPE when calling TaskLocation.hashCode with null host (#10708 )	2020-12-24 15:30:54 -08:00
Gian Merlino	57ee8ce4e7	CompressionUtils: Read the entire stream when unzipping from a stream. (#10664 ) * CompressionUtils: Read the entire stream when unzipping from a stream. Should fix #6905 by making sure we avoid closing partially-read streams. * CHECKSTYLE!	2020-12-17 22:52:04 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Gian Merlino	753fa6b3bd	IdUtils: Forbid characters that cannot be used in znodes. (#10659 ) * IdUtils: Forbid characters that cannot be used in znodes. * Fix whitespace.	2020-12-10 10:49:40 -08:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
Himanshu	7e9522870f	introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309 ) * introduce DynamicConfigProvider interface and make kafka consumer props extensible * fix intellij inspection error * make DynamicConfigProvider generic Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3 * deprecate PasswordProvider	2020-12-02 16:38:27 -08:00

549 Commits