Commit Graph

565 Commits

Author SHA1 Message Date
Xavier Léauté 168187e6df
avoid unnecessary String.format calls in IdUtils.validateId (#12147)
Based on profiling data, about 25% of the time de-serializing DataSchema
is spent on formatting strings in validateId.

This can add up quickly, especially when de-serializing task information
in the overlord, where in can consume almost 2% of CPU if there are many
tasks.

Since the formatting is unnecessary unless the checks fail, we can
leverage the built-in formatting of Preconditions.checkArgument instead
to avoid the cost.
2022-01-12 16:34:40 -08:00
Clint Wylie 7cf9192765
fix delegated smoosh writer and some new facilities for segment writeout medium (#12132)
* fix delegated smoosh writer and some new facilities for segment writeout medium
changes:
* fixed issue with delegated `SmooshedWriter` when writing files that look like paths, causing `NoSuchFileException` exceptions when attempting to open a channel to the file
* `FileSmoosher.addWithSmooshedWriter` when _not_ delegating now checks that it is still open when closing, making it a no-op if already closed (allowing column serializers to add additional files and avoid delegated mode if they are finished writing out their own content and ned to add additional files)
* add `makeChildWriteOutMedium` to `SegmentWriteOutMedium` interface, which allows users of a shared medium to clean up `WriteOutBytes` if they fully control the lifecycle. there are no callers of this yet, adding for future functionality
* `OnHeapByteBufferWriteOutBytes` now can be marked as not open so it `OnHeapMemorySegmentWriteOutMedium` can now behave identically to other medium implementations

* fix to address nit - use AtomicLong
2022-01-10 22:25:19 -08:00
Clint Wylie e583033231
add 'TypeStrategy' to types (#11888)
* add TypeStrategy - value comparators and binary serialization for any TypeSignature
2022-01-10 17:12:14 -08:00
AmatyaAvadhanula c0b1514177
Segment pruning for multi-dim partitioning given query domain (#12046)
Segment pruning for multi-dim partitioning for a given query

DimensionRangeShardSpec#possibleInDomain has been modified to enhance pruning when multi-dim partitioning is used.

Idea
While iterating through each dimension,

If query domain doesn't overlap with the set of permissible values in the segment, the segment is pruned.
If the overlap happens on a boundary, consider the next dimensions.
If there is an overlap within the segment boundaries, the segment cannot be pruned.
2021-12-17 12:44:43 +05:30
Suneet Saldanha 25ac04e067
MySqlFirehoseDatabaseConnector uses configured driver class name (#12049) 2021-12-09 20:58:55 -08:00
Frank Chen 58245b4617
Support JsonPath functions in JsonPath expressions (#11722)
* Add jsonPath functions support

* Add jsonPath function test for Avro

* Add jsonPath function length() to Orc

* Add jsonPath function length() to Parquet

* Add more tests to ORC format

* update doc

* Fix exception during ingestion

* Add IT test case

* Revert "Fix exception during ingestion"

This reverts commit 5a5484b9ea.

* update IT test case

* Add 'keys()'

* Commit IT test case

* Fix UT
2021-12-10 10:53:23 +08:00
Jonathan Wei 229f82a6f0
Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961)
* Add parse error list API for stream supervisors, simplify parse exception message

* Add input string to parse exception

* Use structured ParseExceptionReport

* Fix tests

* Add test

* PR comments, add ParseExceptionReport equals verifier

* Fix test
2021-12-09 15:42:55 -06:00
Xavier Léauté 0565f0e6a1
fix build warnings for forbidden-apis (#12034)
* replace deprecated forbidden-apis config failOnUnresolvableSignatures
with ignoreSignaturesOfMissingClasses which avoids warnings for
classes not present in a particular sub-module

* fix incorrect signature for Files.createTempDirectory
2021-12-07 22:21:01 -08:00
Abhishek Agarwal 834aae096a
Human-readable and actionable SQL error messages (#11911)
This PR does two things

1. It adds the capability to surface missing features in SQL to users - The calcite planner will explore through multiple rules to convert a logical SQL query to a druid native query. Some rules change the shape of the query itself, optimize it and some rules are responsible for translating the query into a druid native query. These are DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule etc. These rules will look at SQL and will do the necessary transformation. But if the rule can't transform the query, it returns back the control to the calcite planner without recording why was it not able to transform. E.g. there is a join query with a non-equal join condition. DruidJoinRule will look at the condition, see that it is not supported, and return back the control. The reason can be that a query can be planned in many different ways so if one rule can't parse it, the query may still be parseable by other rules. In this PR, we are intercepting these gaps and passing them back to the user if the query could not be planned at all.

2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.
2021-12-07 09:44:08 +05:30
Gian Merlino 76d281d64f
Enable allocating segments at ALL granularity. (#12003)
* Enable allocating segments at ALL granularity.

The main change is that Granularity.granularitiesFinerThan will return ALL if ALL
is passed in.

Allocating segments at ALL granularity is somewhat unconventional, but there
is nothing wrong with it, and it actually makes a lot of sense for tables that
are meant to be used for lookups or dimensions rather than main fact tables.
This change enables ALL segmentGranularity to work properly in appendToExisting
mode.

Also clarifies behavior in javadocs and tests.

* Move tests to improve coverage.
2021-12-03 14:15:05 -08:00
Gian Merlino e0e05aad99
Enhancements to IndexTaskClient. (#12011)
* Enhancements to IndexTaskClient.

1) Ability to use handlers other than StringFullResponseHandler. This
   functionality is not used in production code yet, but is useful
   because it will allow tasks to communicate with each other in
   non-string-based formats and in streaming fashion. In the future,
   we'll be able to use this to make task-to-task communication
   more efficient.

2) Truncate server errors at 1KB, so long errors do not pollute logs.

3) Change error log level for retryable errors from WARN to INFO. (The
   final error is still WARN.)

4) Harmonize log and exception messages to have a more consistent format.

* Additional tests and improvements.
2021-12-03 09:14:32 -08:00
Clint Wylie 84b4bf56d8
vectorize logical operators and boolean functions (#11184)
changes:
* adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions
* vectorize logical operators and boolean functions, some only if useStrictBooleans is true
2021-12-02 16:40:23 -08:00
Gian Merlino f47afd7b98
HttpResponseHandler: Fill out truncated javadoc. (#12004) 2021-12-02 14:05:51 -08:00
Karan Kumar ffa553593f
Use one factory in json reader (#11999) 2021-12-01 16:17:48 +05:30
Paul Rogers a66f10eea1
Code cleanup from query profile project (#11822)
* Code cleanup from query profile project

* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse

* Reverted change due to lack of tests
2021-11-30 11:35:38 -08:00
Agustin Gonzalez 8eff6334f7
AWS "Data read has a different length than the expected" error should reset stream and try again (#11941)
* Add support for custom reset condition & support for other args to have defaults to make the method api consistent

* Add support for custom reset condition to InputEntity

* Fix test names

* Clarifying comments to why we need to read the message's content to identify S3's resettable exception

* Add unit test to verify custom resettable condition for S3Entity

* Provide a way to customize retries since they are expensive to test
2021-11-26 12:45:34 -07:00
Gian Merlino 3d72e66f56
Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582)
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.

This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.

In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.

So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!

* Fix test compiling.

* Fixes and improvements.

* Fix forbidden API.

* Additional fixes.
2021-11-24 14:51:53 -08:00
Gian Merlino 0354407655
SQL INSERT planner support. (#11959)
* SQL INSERT planner support.

The main changes are:

1) DruidPlanner is able to validate and authorize INSERT queries. They
   require WRITE permission on the target datasource.

2) QueryMaker is now an interface, and there is a QueryMakerFactory that
   creates instances of it. There is only one production implementation
   of each (NativeQueryMaker and NativeQueryMakerFactory), which
   together behave the same way as the former QueryMaker class. But this
   opens the door to executing queries in ways other than the Druid
   query stack, and is used by unit tests (CalciteInsertDmlTest) to
   test the INSERT planning functionality.

3) Adds an EXTERN table macro that allows references external data using
   InputSource and InputFormat from Druid's batch ingestion API. This is
   not exposed in production yet, but is used by unit tests.

4) Adds a QueryFeature concept that enables the planner to change its
   behavior slightly depending on the capabilities of the execution
   system.

5) Adds an "AuthorizableOperator" concept that enables SqlOperators
   to require additional permissions. This is used by the EXTERN table
   macro.

Related odds and ends:

- Add equals, hashCode, toString methods to InlineInputSource. Aids in
  the "from external" tests in CalciteInsertDmlTest.
- Add JSON-serializability to RowSignature.
- Move the SQL string inside PlannerContext so it is "baked into" the
  planner when the planner is created. Cleans up the code a bit, since
  in practice, the same query is passed in every time to the
  same planner anyway.

* Fix up calls to CalciteTests.createMockQueryLifecycleFactory.

* Fix checkstyle issues.

* Adjustments for CI.

* Adjust DruidAvaticaHandlerTest for stricter test authorizations.
2021-11-24 12:14:04 -08:00
Maytas Monsereenusorn bb3d2a433a
Support filtering data in Auto Compaction (#11922)
* add impl

* fix checkstyle

* add test

* add test

* add unit tests

* fix unit tests

* fix unit tests

* fix unit tests

* add IT

* add IT

* add comments

* fix spelling
2021-11-24 10:56:38 -08:00
cheddar e6570cadc4
Update LifecycleModule.java (#11972)
Update the javadoc on LifecycleModule to be more clear about why the register methods exist and why they should always be used instead of Guice's eager instantiation.
2021-11-23 17:03:37 -08:00
Gian Merlino b13f07a057
Harmonize local input sources; fix batch index integration test. (#11965)
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.

Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.

I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.

* Sort files in LocalFirehoseFactory.
2021-11-21 22:26:31 -08:00
Clint Wylie f260bbed23
restore and deprecate AggregatorFactory methods (#11917)
* add back and deprecate aggregator factory methods so i can say i told you so when i delete these later

* rename to make less ambiguous, fix fill method

* adjust
2021-11-19 15:59:35 -08:00
somu-imply 29710789a4
Adding safe divide function (#11904)
* IMPLY-4344: Adding safe divide function along with testcases and documentation updates

* Changing based on review comments

* Addressing review comments, fixing coding style, docs and spelling

* Checkstyle passes for all code

* Fixing expected results for infinity

* Revert "Fixing expected results for infinity"

This reverts commit 5fd5cd480d.

* Updating test result and a space in docs
2021-11-17 08:22:41 -08:00
TSFenwick 1487f558b1
Use a simple class to sanitize JDBC exceptions and also log them (#11843)
* Use a simple class to sanitize sanitizable errors and log them

The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface

add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy

* return less information as part of too many connections, and instead only log specific details

This is so an end user gets relevant information but not too much info since they might now how
many brokers they have

* return only runtime exceptions

added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.

* dont reqrewite exceptions unless necessary for checked exceptions

add docs
avoid blanket turning all exceptions into runtime exceptions

* address comments, to fix up docs.

add more javadocs
add support UOE sanitization

* use try catch instead and sanitize at public methods

* checkstyle fixes

* throw noSuchStatement and NoSuchConnection as Avatica is affected by those

* address comments. move log error back to druid meta

clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions

* alter test to reflect new error message
2021-11-16 13:13:03 -08:00
Kashif Faraz 223c5692a8
Add dimension partitioningType to metrics to track usage of different partitioning schemes (#11902)
Add method ShardSpec.getType() to get name of shard spec type
List all names of shard spec types in the interface ShardSpec itself
for easy reference and maintenance
Add dimension partitioningType to metric segment/added/bytes
2021-11-11 18:34:27 +05:30
Gian Merlino 14b0b4aee2
RowBasedSegment: Use Sequence instead of Iterable. (#11886)
* RowBasedSegment: Use Sequence instead of Iterable.

The main reason this is good is that Sequences can include baggage that
must be closed after iteration is finished. This enables creating
RowBasedSegments on top of closeable sequences of rows.

To preserve the optimization that allows reversing a List without
copying it, this patch also makes SimpleSequence its own class and allows
extracting the Iterable that was used to create it.

* Fix tests.
2021-11-10 06:06:52 -08:00
Kashif Faraz d3914c1a78
Ensure backward compatibility of multi dimension partitioning (#11889)
This PR has changes to ensure backward compatibility of multi dimension partitioning
such that if some middle managers are upgraded to a newer version, the cluster still
functions normally for single_dim use cases.
2021-11-10 10:23:34 +05:30
Maytas Monsereenusorn a36a41da73
Support routing data through an HTTP proxy (#11891)
* Support routing data through an HTTP proxy

* Support routing data through an HTTP proxy

This adds the ability for the HttpClient to connect through an HTTP proxy.  We
augment the channel factory to check if it is supposed to be proxied and, if so,
we connect to the proxy host first, issue a CONNECT command through to the final
recipient host and *then* give the channel to the normal http client for usage.

* add docs

* address comments

Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2021-11-09 17:24:06 -08:00
Gian Merlino babf00f8e3
Migrate File.mkdirs to FileUtils.mkdirp. (#11879)
* Migrate File.mkdirs to FileUtils.mkdirp.

* Remove unused imports.

* Fix LookupReferencesManager.

* Simplify.

* Also migrate usages of forceMkdir.

* Fix var name.

* Fix incorrect call.

* Update test.
2021-11-09 11:10:49 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Clint Wylie 7237dc837c
complex typed expressions (#11853)
* complex typed expressions

* add built-in hll collector expressions to get coverage on druid-processing, more types, more better

* rampage!!!

* more javadoc

* adjustments

* oops

* lol

* remove unused dependency

* contradiction?

* more test
2021-11-08 00:33:06 -08:00
Kashif Faraz 2d77e1a3c6
Add support for multi dimension range partitioning (#11848)
This PR adds support for range partitioning on multiple dimensions. It extends on the
concept and implementation of single dimension range partitioning.

The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension.

The start and end values of a DimensionRangeShardSpec are represented
by StringTuples, where each String in the tuple is the value of a partition dimension.
2021-11-06 12:50:17 +05:30
Gian Merlino 1c12dd97dc
Add javadocs to StringUtils.fromUtf8. (#11881)
They clarify that the methods advance the position of the buffer.
2021-11-05 15:27:24 -07:00
Gian Merlino 98ecbb21cd
Remove CloseQuietly and migrate its usages to other methods. (#10247)
* Remove CloseQuietly and migrate its usages to other methods.

These other methods include:

1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions
   in RuntimeExceptions for callers that just want to avoid dealing with
   checked exceptions. Most usages were migrated to this method, because it
   looks like they were mainly attempts to avoid declaring a throws clause,
   and perhaps were unintentionally suppressing IOExceptions.
2) New method CloseableUtils.closeInCatch, designed to properly close something
   in a catch block without losing exceptions. Some usages from catch blocks
   were migrated here, when it seemed that they were intended to avoid checked
   exception handling, and did not really intend to also suppress IOExceptions.
3) New method CloseableUtils.closeAndSuppressExceptions, which sends all
   exceptions to a "chomper" that consumes them. Nothing is thrown or returned.
   The behavior is slightly different: with this method, _all_ exceptions are
   suppressed, not just IOExceptions. Calls that seemed like they had good
   reason to suppress exceptions were migrated here.
4) Some calls were migrated to try-with-resources, in cases where it appeared
   that CloseQuietly was being used to avoid throwing an exception in a finally
   block.

🎵 You don't have to go home, but you can't stay here... 🎵

* Remove unused import.

* Fix up various issues.

* Adjustments to tests.

* Fix null handling.

* Additional test.

* Adjustments from review.

* Fixup style stuff.

* Fix NPE caused by holder starting out null.

* Fix spelling.

* Chomp Throwables too.
2021-10-23 17:03:21 -07:00
Gian Merlino cb9bc15e95
Fix task report streaming in https setups. (#11739)
* Fix task report streaming in https setups.

* Trivial change to re-trigger ITs.
2021-10-22 19:07:29 -07:00
Clint Wylie 741b4ed516
add output type information to ExpressionPostAggregator (#11818)
* add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types

* add test for test for coverage

* simplify

* Remove unused imports.

Co-authored-by: Gian Merlino <gian@imply.io>
2021-10-22 13:52:51 -07:00
Arun Ramani df4894afff
Fallback to /sys/fs root when looking for cgroups (#11810)
ProcCgroupDiscoverer builds the cgroup directory by concatenating the proc mounts and proc cgroup paths together. This doesn't seem to work in Kubernetes if the execution context is within the container. Also this isn't consistent across all Linux OSes. The fix is to fallback to / as the root and it seems to work empirically.
2021-10-21 09:51:16 +05:30
Clint Wylie 187df58e30
better types (#11713)
* better type system

* needle in a haystack

* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support

* fixup merge

* more test

* fixup

* intern

* fix

* oops

* oops again

* ...

* more test coverage

* fix error message

* adjust interning, more javadocs

* oops

* more docs more better
2021-10-19 01:47:25 -07:00
Kashif Faraz 7352c83e11
Do not log sensitive property value if JsonConfigurator fails to parse (#11787)
* Do not log property value if JsonConfigurator fails to parse

* Add comment to explain log change

* Fix log language
2021-10-09 09:59:03 +05:30
Arun Ramani b6b42d3936
Minor processor quota computation fix + docs (#11783)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

* Minor processor quota computation fix + docs

* Address comments

* Address comments

* Fix spellcheck

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-08 22:52:03 -05:00
Arun Ramani 15789137a3
Add cpu/cpuset cgroup and procfs data gathering (#11763)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-06 20:27:36 -07:00
Maytas Monsereenusorn a04b08e45c
Add new config to filter internal Druid-related messages from Query API response (#11711)
* add impl

* add impl

* add tests

* add unit test

* fix checkstyle

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* address comments

* address comments

* address comments

* fix test

* fix test

* fix test

* fix test

* fix test

* change config name

* change config name

* change config name

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* fix compile

* fix compile

* change config

* add more tests

* fix IT
2021-09-29 12:55:49 +07:00
Clint Wylie fe1d8c206a
bump version to 0.23.0-SNAPSHOT (#11670) 2021-09-08 15:56:04 -07:00
Atul Mohan dcee99df78
Improve error message when buckets are null for cloud objects (#11644)
* Add error message

* Add test

* Checkstyle
2021-09-07 17:31:17 -07:00
Sandeep ac2b65e837
fixes possible data truncation (#11462)
* fixes possible data truncation

* fixes possible data truncation

* add unit test case to catch the possible data truncation
2021-08-26 20:16:26 +08:00
Jihoon Son 2a658acad4
Put sleep in an extension (#11632)
* Put sleep in an extension

* dependency
2021-08-25 01:27:45 -07:00
Jihoon Son 78b4be467e
Add sleep function for testing (#11626)
* Add sleep function for testing

* sql function

* javadoc
2021-08-24 14:30:31 +07:00
Yi Yuan bf863343f8
delete some code (#11552)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-16 10:40:40 -07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
frank chen e40be0ae28
Add SQL functions to format numbers into human readable format (#10635)
* add binary_byte_format/decimal_byte_format/decimal_format

* clean code

* fix doc

* fix review comments

* add spelling check rules

* remove extra param

* improve type handling and null handling

* remove extra zeros

* fix tests and add space between unit suffix and number as most size-format functions do

* fix tests

* add examples

* change function names according to review comments

* fix merge

Signed-off-by: frank chen <frank.chen021@outlook.com>

* no need to configure NullHandling explicitly for tests

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix tests in SQL-Compatible mode

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Resolve review comments

* Update SQL test case to check null handling

* Fix intellij inspections

* Add more examples

* Fix example
2021-08-13 10:27:49 -07:00
Harini Rajendran ccd362d228
Fix FileIteratingFirehoseTest to extend NullHandlingTest (#11581) 2021-08-12 08:26:04 -07:00
Yi Yuan 23d7d71ea5
Add Environment Variable DynamicConfigProvider (#11377)
* add_environment_variable_DynamicConfigProvider

* fix code

* code fixed

* code fixed

* add document

* fix doc

* fix doc

* add more unit test

* fix style

* fix document

* bug fixed

* fix unit test

* fix comment

* fix test

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-04 20:26:58 -07:00
wx930910 578625b771
Replace TestInputRowHandler with mocking object (#11529)
* Replace TestInputRowHandler with mocking object

* Change EasyMock object to Mockito object. Make test logic concise

* correct code format
2021-08-04 16:45:22 -07:00
Yi Yuan aa7cb50f24
Add DynamicConfigProvider for Schema Registry (#11362)
* add_DynamicConfigProvider_for_schema_registry

* bug fixed

* add document

* fix document

* fix spot bug

* fix document

* inject ObjectMapper

* add DynamicConfigProviderUtils

* add UT

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-03 13:24:52 -07:00
Agustin Gonzalez a2da407b70
Add error msg to parallel task's TaskStatus (#11486)
* Add error msg to parallel task's TaskStatus

* Consolidate failure block

* Add failure test

* Make it fail

* Add fail while stopped

* Simplify hash task test using a runner that fails after so many runs (parameter)

* Remove unthrown exception

* Use runner names to identify phase

* Added range partition kill test & fixed a timing bug with the custom runner

* Forbidden api

* Style

* Unit test code cleanup

* Added message to invalid state exception and improved readability  of the phase error messages for the parallel task failure unit tests
2021-08-02 12:11:28 -07:00
Xavier Léauté 4bca7f014e
update error-prone to 2.8.0 with fix for crashing check (#11494)
* error-prone 2.8.0 fixes https://github.com/google/error-prone/issues/2396
* fix for a few ignored return values
* fix unknown args in sub-modules
2021-07-29 09:13:46 -07:00
Jihoon Son 8729b40893
Add the error message in taskStatus for task failures in overlord (#11419)
* add error messages in taskStatus for task failures in overlord

* unused imports

* add helper message for logs to look up

* fix tests

* fix counting the same task failures more than once

* same fix for HttpRemoteTaskRunner
2021-07-15 13:14:28 -07:00
Suneet Saldanha 49e8732e4f
Display errors for invalid timezones in TIME_FORMAT (#11423)
Users sometimes make typos when picking timezones - like `America/Los Angeles`
instead of `America/Los_Angeles` instead of defaulting to UTC, this change
makes it so that an error is thrown instead notifying the user of their mistake.
2021-07-09 06:07:13 -07:00
Clint Wylie 63fcd77c38
support using mariadb connector with mysql extensions (#11402)
* support using mariadb connector with mysql extensions

* cleanup and more tests

* fix test

* javadocs, more tests, etc

* style and more test

* more test more better

* missing pom

* more pom
2021-07-08 12:25:37 -07:00
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
Clint Wylie df9b57aa1a
bitwise aggregators, better null handling options for expression agg (#11280)
* bitwise aggregators, better nulls for expression agg

* correct behavior

* rework deserialize, better names

* fix json, share mask
2021-06-25 16:51:16 -07:00
Xavier Léauté 712f2a5d00
upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363)
* upgrade error-prone to 2.7.1 and support checks with Java 11+

- upgrade error-prone to 2.7.1
- support running error-prone with Java 11 and above using -Xplugin
  instead of custom compiler
- add compiler arguments to ignore warnings/errors in Java 15/16
- introduce strictCompile property to enable strict profiles since we
  now need multiple strict profiles for Java 8
- properly exclude all generated source files from error-prone
- fix druid-processing overriding annotation processors from parent pom
- fix druid-core disabling most non-default checks
- align plugin and annotation errorprone versions
- fix / suppress additional issues found by error-prone:
  * fix bug in SeekableStreamSupervisor initializing ArrayList size with
    the taskGroupdId
  * fix missing @Override annotations
- remove outdated compiler plugin in benchmarks
- remove deleted ParameterPackage error-prone rule
- re-enable checks on benchmark module as well

* fix IntelliJ inspections

* disable LongFloatConversion due to bug in error-prone with JDK 8

* add comment about InsecureCrypto
2021-06-16 12:55:34 -07:00
Clint Wylie bfbd7ec432
fix a bugs related to SQL type inference return type nullability (#11327)
* fix a bunch of type inference nullability bugs

* fixes

* style

* fix test

* fix concat
2021-06-15 12:26:59 -07:00
Clint Wylie 920aa414ca
enrich expression cache key information to support expressions which depend on external state (#11358)
* enrich expression cache key information to support expressions which depend on external state such as lookups

* cache rules everything around me

* low carb

* rename
2021-06-14 17:26:43 -07:00
dependabot[bot] 167044f715
Bump fastutil from 8.2.3 to 8.5.4 (#11347)
* Bump fastutil from 8.2.3 to 8.5.4

Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4.
- [Release notes](https://github.com/vigna/fastutil/releases)
- [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES)
- [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4)

---
updated-dependencies:
- dependency-name: it.unimi.dsi:fastutil
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml
* update maven dependency list for -core and -extra libraries to pass maven dependency checks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2021-06-10 07:43:18 -07:00
Maytas Monsereenusorn e5633d7842
Fix bug: 502 bad gateway thrown when we edit/delete any auto compaction config created 0.21.0 or before (#11311)
* fix bug

* add test

* fix IT

* fix checkstyle

* address comments
2021-05-27 16:34:32 -07:00
Clint Wylie 2bfcee5824
Fix issue with empty array converting to string expression instead of string array (#11270) 2021-05-22 09:31:28 +08:00
Clint Wylie 6d08a7051e
fix bug with aggregator expressions on realtime index with string columns always producing 0 values (#11185)
* fix bug with aggregator expressions on realtime index with string columns always producing 0 values

* more test

* rework some stuff

* javadocs
2021-05-17 11:59:13 -07:00
Clint Wylie 3649c608d2
array handling improvements (#11233)
* fix jdbc array handling, split handling for some array and multi value operator, split and add more tests

* formatting
2021-05-13 18:50:32 -07:00
Clint Wylie 790262e5d0
add estimated byte size limit enforcement for heap based expression aggregator (#11236) 2021-05-12 01:21:50 -07:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Maytas Monsereenusorn 3a660bc6ee
Make sure updating coordinator config is protected against race condition (#11144)
* Make sure changing coordinator config is protected against concurrent updates

* Make sure updating coordinator config is protected against race condition

* add retry

* fix checkstyle

* add tests

* add tests

* add more tests

* add tests

* fix

* fix checkstyle
2021-05-10 13:58:08 -07:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Clint Wylie 554f1ffeee
ARRAY_AGG sql aggregator function (#11157)
* ARRAY_AGG sql aggregator function

* add javadoc

* spelling

* review stuff, return null instead of empty when nil input

* review stuff

* Update sql.md

* use type inference for finalize, refactor some things
2021-05-03 22:17:10 -07:00
Gian Merlino ad028de538
InDimFilter: Fix NPE involving certain Set types. (#11169)
* InDimFilter: Fix NPE involving certain Set types.

Normally, InDimFilters that come from JSON have HashSets for "values".
However, programmatically-generated filters (like the ones from #11068)
may use other set types. Some set types, like TreeSets with natural
ordering, will throw NPE on "contains(null)", which causes the
InDimFilter's ValueMatcher to throw NPE if it encounters a null value.

This patch adds code to detect if the values set can support
contains(null), and if not, wrap that in a null-checking lambda.

Also included:

- Remove unneeded NullHandling.needsEmptyToNull method.
- Update IndexedTableJoinable to generate a TreeSet that does not
  require lambda-wrapping. (This particular TreeSet is how I noticed
  the bug in the first place.)

* Test fixes.

* Improve test coverage
2021-04-28 14:13:42 -07:00
Clint Wylie 57ff1f9cdb
expression aggregator (#11104)
* add experimental expression aggregator

* add test

* fix lgtm

* fix test

* adjust test

* use not null constant

* array_set_concat docs

* add equals and hashcode and tostring

* fix it

* spelling

* do multi-value magic for expression agg, more javadocs, tests

* formatting

* fix inspection

* more better

* nullable
2021-04-22 18:30:16 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn f968400170
Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078)
* Add config to skip storing audit payload if exceed limit

* fix checkstyle

* change config name

* skip null fields for audit payload

* fix checkstyle

* address comments

* fix guice

* fix test

* add tests

* address comments

* address comments

* address comments

* fix checkstyle

* address comments

* fix test

* fix test

* address comments

* Address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-04-13 20:18:28 -07:00
chenyuzhi459 b8423a38df
add round test (#11088)
* add round test

* code style

* handle null val for round function

* handle null val for round function

* support null for round

* fix compatiblity

* fix test

* fix test

* code style

* optimize format
2021-04-13 11:36:32 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compcation TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erreneous newline
2021-04-08 21:03:00 -07:00
Clint Wylie 338886fd5f
vector group by support for string expressions (#11010)
* vector group by support for string expressions

* fix test

* comments, javadoc
2021-04-08 19:23:39 -07:00
Xavier Léauté 15bdd6bc2f
Fix unit tests and GC settings for Java 15 (#11074)
* JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it
* Fix flaky HTTP client tests with Java 15
* Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15
2021-04-08 10:33:37 -07:00
Jihoon Son cfcebc40f6
Allow list for JDBC connection properties to address CVE-2021-26919 (#11047)
* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11
2021-04-01 17:30:47 -07:00
Jihoon Son 43ea184b74
Add explicit EOF and use assert instead of exception (#11041) 2021-03-31 09:41:57 -07:00
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Jihoon Son a041933017
Allow overlapping intervals for the compaction task (#10912)
* Allow overlapping intervals for the compaction task

* unused import

* line indentation

Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
2021-03-23 11:21:54 -07:00
Xavier Léauté 1061faa6ba
prefer string concatenation over String.format in performance sensitive code (#10997)
String.format relies on regex parsing, which makes these calls expensive
at higher request volumes.
2021-03-16 22:06:26 -07:00
Clint Wylie 4cd4a22f87
expression filter support for vectorized query engines (#10613)
* expression filter support for vectorized query engines

* remove unused codes

* more tests

* refactor, more tests

* suppress

* more

* more

* more

* oops, i was wrong

* comment

* remove decorate, object dimension selector, more javadocs

* style
2021-03-16 11:46:50 -07:00
Abhishek Agarwal c66951a59e
Add flag in SQL to disable left base filter optimization for joins (#10947)
* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
2021-03-09 13:07:34 -08:00
Maytas Monsereenusorn 4dd22a850b
Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962)
* Fix streaming ingestion fails and halt if it  encounters empty rows

* address comments
2021-03-09 12:11:58 -08:00
Abhishek Agarwal 489f5b1a03
Avoid expensive findEntry call in segment metadata query (#10892)
* Avoid expensive findEntry call in segment metadata query

* other places

* Remove findEntry

* Fix add cost

* Refactor a bit

* Add performance test

* Add comment

* Review comments

* intellij
2021-03-08 22:08:33 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
Gian Merlino 05e8f8fe06
CsvInputFormat: Create a parser per InputEntityReader. (#10923)
RFC4180Parser is not thread safe and cannot be shared across readers.
2021-02-27 18:37:05 -08:00
Gian Merlino 07902f607b
Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904)
* Granularity: Introduce primitive-typed bucketStart, increment methods.

Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.

* Fix style.

* Amp up the test coverage.
2021-02-25 07:59:20 -08:00
Clint Wylie cbbef80c7f
add SQL operators for bitwise expressions (#10823)
* add SQL operators for bitwise expressions

* more test

* fix spelling

* more tests
2021-02-18 20:56:33 -08:00
Agustin Gonzalez eabad0fb35
Keep query granularity of compacted segments after compaction (#10856)
* Keep query granularity of compacted segments after compaction

* Protect against null isRollup

* Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment

* Make sure that NONE is also included when comparing for the finer granularity

* Update integration test check for segment size due to query granularity propagation affecting size

* Minor code cleanup

* Added functional test to verify queryGranlarity after compaction

* Minor style fix

* Update unit tests
2021-02-18 01:35:10 -08:00
Maytas Monsereenusorn 6541178c21
Support segmentGranularity for auto-compaction (#10843)
* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* resolve conflict

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* fix tests

* fix more tests

* fix checkstyle

* add unit tests

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add unit tests

* add integration tests

* fix checkstyle

* fix checkstyle

* fix failing tests

* address comments

* address comments

* fix tests

* fix tests

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test
2021-02-12 03:03:20 -08:00
Abhishek Agarwal 8718155f8f
Allow for empty keys in hash map (#10869)
* allow for empty keys in hash map

* fix serde test
2021-02-10 11:19:57 -08:00
Jihoon Son 1ec3f0bd73
Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535)" (#10871)
This reverts commit 6b14bdb3a5.
2021-02-09 17:51:26 -08:00
Agustin Gonzalez 3785ad5812
Add log message when local input's filter does not match any files (#10837)
* Add log message when local input's filter does not match any files

* Re-use previously defined fileIterator
2021-02-05 11:35:19 -06:00
Jihoon Son ac41e41232
Update doc for query errors and add unit tests for JsonParserIterator (#10833)
* Update doc for query errors and add unit tests for JsonParserIterator

* static constructor for convenience

* rename method
2021-02-05 02:55:32 -08:00
Jihoon Son 3f8f00a231
Fix CVE-2021-25646 (#10818) 2021-02-04 11:21:43 -08:00
Agustin Gonzalez 0e4750bac2
Granularity interval materialization (#10742)
* Prevent interval materialization for UniformGranularitySpec inside the overlord

* Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval>

* Javadoc update, respect inputIntervals contract

* Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec

* Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity

* Fix Travis style & other checks

* Refactor TreeSet to facilitate re-use in UniformGranularitySpec

* Make sure intervals are unique when there is no segment granularity

* Style/bugspot fixes...

* More travis checks

* Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method

* Style & PR feedback

* Fixed failing test

* Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change)

* Refactor so that we can get the condensed buckets without materializing the intervals

* Get rid of GranularitySpec::condensedInputIntervals ... not needed

* Travis failures fixes

* Travis checkstyle fix

* Edited/added javadoc comments and a method name (code review feedback)

* Fixed jacoco coverage by moving class and adding more coverage

* Avoid materializing the condensed intervals when locking

* Deal with overlapping intervals

* Remove code and use library code instead

* Refactor intervals by granularity using the FluentIterable, add sanity checks

* Change !hasNext() to inputIntervals().isEmpty()

* Remove redundant lambda

* Use materialized intervals here since this is outside the overlord (for performance)

* Name refactor to reflect the fact that bucket intervals are sorted.

* Style fixes

* Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense)

* Remove forbidden api

* Move helper class inside common base class to reduce public space pollution
2021-01-29 06:02:10 -08:00
Clint Wylie 2ce7b3dcf4
bitwise math function expressions (#10605)
* expressions: adding bitwise expressions

* double handling and vectorization

* move conversion to Evals

* revert unintended changes

* less magic, split convert functions, fix parser for funny exponent doubles

* fix spelling exceptions list

* more spelling

* fix grammar, add more test, fix docs

* fix docs

Co-authored-by: Max Kaplan <max@maxkaplan.me>
2021-01-28 11:16:53 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Jihoon Son b3325c1601
Add a config for monitorScheduler type (#10732)
* Add a config for monitorScheduler type

* check interrupted

* null check

* do not schedule monitor if the previous one is still running

* checkstyle

* clean up names

* change default back to basic

* fix test
2021-01-13 17:20:43 -08:00
Jihoon Son 149306c9db
Tidy up HTTP status codes for query errors (#10746)
* Tidy up query error codes

* fix tests

* Restore query exception type in JsonParserIterator

* address review comments; add a comment explaining the ugly switch

* fix test
2021-01-13 17:20:00 -08:00
Clint Wylie 9362dc7968
re-use expression vector evaluation results for the same offset in expression vector selectors (#10614)
* cache expression selector results by associating vector expression bindings to underlying vector offset

* better coverage, fix floats

* style

* stupid bot

* stupid me

* more test

* intellij threw me under the bus when it generated those junit methods

* narrow interface instead of passing around offset
2021-01-13 12:44:56 -08:00
Xavier Léauté 118b50195e
Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730)
Today Kafka message support in streaming indexing tasks is limited to
message values, and does not provide a way to expose Kafka headers,
timestamps, or keys, which may be of interest to more specialized
Druid input formats. For instance, Kafka headers may be used to indicate
payload format/encoding or additional metadata, and timestamps are often
omitted from values in Kafka streams applications, since they are
included in the record.

This change proposes to introduce KafkaRecordEntity as InputEntity,
which would give input formats full access to the underlying Kafka record,
including headers, key, timestamps. It would also open access to low-level
information such as topic, partition, offset if needed.

KafkaEntity is a subclass of ByteEntity for backwards compatibility with
existing input formats, and to avoid introducing unnecessary complexity
for Kinesis indexing tasks.
2021-01-08 16:04:37 -08:00
Clint Wylie edfbdbfc97
fix NPE when calling TaskLocation.hashCode with null host (#10708) 2020-12-24 15:30:54 -08:00
Gian Merlino 57ee8ce4e7
CompressionUtils: Read the entire stream when unzipping from a stream. (#10664)
* CompressionUtils: Read the entire stream when unzipping from a stream.

Should fix #6905 by making sure we avoid closing partially-read streams.

* CHECKSTYLE!
2020-12-17 22:52:04 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Gian Merlino 753fa6b3bd
IdUtils: Forbid characters that cannot be used in znodes. (#10659)
* IdUtils: Forbid characters that cannot be used in znodes.

* Fix whitespace.
2020-12-10 10:49:40 -08:00
Gian Merlino b7641f644c
Two fixes related to encoding of % symbols. (#10645)
* Two fixes related to encoding of % symbols.

1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
   returns already-decoded strings. Applying StringUtils.urlDecode on
   top of that causes erroneous behavior with '%' characters.

2) Update various ThreadFactoryBuilder name formats to escape '%'
   characters. This fixes situations where substrings starting with '%'
   are erroneously treated as format specifiers.

ITs are updated to include a '%' in extra.datasource.name.suffix.

* Avoid String.replace.

* Work around surefire bug.

* Fix xml encoding.

* Another try at the proper encoding.

* Give up on the emojis.

* Less ambitious testing.

* Fix an additional problem.

* Adjust encodeForFormat to return null if the input is null.
2020-12-06 22:35:11 -08:00
Himanshu 7e9522870f
introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309)
* introduce DynamicConfigProvider interface and make kafka consumer props extensible

* fix intellij inspection error

* make DynamicConfigProvider generic

Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3

* deprecate PasswordProvider
2020-12-02 16:38:27 -08:00
Ayush Kulshrestha d0c2ede50c
Added CronScheduler support as a proof to clock drift while emitting metrics (#10448)
Co-authored-by: Ayush Kulshrestha <ayush.kulshrestha@miqdigital.com>
2020-11-25 12:31:38 +01:00
frank chen fe693a4f01
Improve doc and exception message for invalid user configurations (#10598)
* improve doc and exception message

* add spelling check rules and remove unused import

* add a test to improve test coverage
2020-11-23 15:03:13 -08:00
zhangyue19921010 31740b3b29
Fix : Druid throws java.util.concurrent.RejectedExecutionException when ingest task is stopping. (#10555)
* check exec status before return Signal

* add more log

* change log level to debug and add UT

* change log leverl to warn and merge master

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-11-23 14:52:03 -08:00
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
2020-11-13 13:59:23 -08:00
Atul Mohan 6ccddedb7a
Improved exception handling in case of query timeouts (#10464)
* Separate timeout exceptions

* Add more tests

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-11-03 09:00:33 -06:00
Nishant Bangarwa 6b14bdb3a5
Add support for Blacklisting some domains for HTTPInputSource (#10535)
fix inspections

refactor class name

change name

 add allowList as well

distinguish between empty and null list

Fix CI
2020-11-02 21:47:25 +05:30
Clint Wylie d0821de854
support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
2020-10-26 19:55:24 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Atul Mohan 0ab8b6e0a9
Improve test (#10480) 2020-10-07 08:40:02 -05:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 9ec5c08e2a
fix array types from escaping into wider query engine (#10460)
* fix array types from escaping into wider query engine

* oops

* adjust

* fix lgtm
2020-10-03 15:30:34 -07:00
Gian Merlino 599aacce0f
Remove Expr.visit. (#10437)
* Remove Expr.visit.

It isn't used and doesn't have tests.

* Remove Visitor too.
2020-09-28 22:13:10 -07:00
Clint Wylie 3d700a5e31
vectorize remaining math expressions (#10429)
* vectorize remaining math expressions

* fixes

* remove cannotVectorize() where no longer true

* disable vectorized groupby for numeric columns with nulls

* fixes
2020-09-26 23:30:14 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Jonathan Wei cb30b1fe23
Automatically determine numShards for parallel ingestion hash partitioning (#10419)
* Automatically determine numShards for parallel ingestion hash partitioning

* Fix inspection, tests, coverage

* Docs and some PR comments

* Adjust locking

* Use HllSketch instead of HyperLogLogCollector

* Fix tests

* Address some PR comments

* Fix granularity bug

* Small doc fix
2020-09-24 13:47:53 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
Clint Wylie 184b202411
add computed Expr output types (#10370)
* push down ValueType to ExprType conversion, tidy up

* determine expr output type for given input types

* revert unintended name change

* add nullable

* tidy up

* fixup

* more better

* fix signatures

* naming things is hard

* fix inspection

* javadoc

* make default implementation of Expr.getOutputType that returns null

* rename method

* more test

* add output for contains expr macro, split operation and function auto conversion
2020-09-14 18:18:56 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Cheng Pan 8aea8cf1c6
Unit tests fail due to missing extend InitializedNullHandlingTest (#10382)
* CsvInputFormatTest should extend InitializedNullHandlingTest

* FirehoseFactoryToInputSourceAdaptorTest should extends InitializedNullHandlingTest
2020-09-11 16:23:46 -07:00
Clint Wylie 475d86a4f7
split up Expr.java (#10333) 2020-08-31 12:51:53 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Jihoon Son b9ff3483ac
Add support for all partitioing schemes for auto compaction (#10307)
* Add support for all partitioing schemes for auto compaction

* annotate last compaction state for multi phase parallel indexing

* fix build and tests

* test

* better home
2020-08-26 13:19:18 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Himanshu a607e9e7ff
introduce interning of internal files names in SmooshedFileMapper (#10295) 2020-08-21 17:37:49 -07:00
Jihoon Son b5b3e6ecce
Add maxNumFiles to splitHintSpec (#10243)
* Add maxNumFiles to splitHintSpec

* missing link

* fix build failure; use maxNumFiles for integration tests

* spelling

* lower default

* Update docs/ingestion/native-batch.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* address comments; change default maxSplitSize

* spelling

* typos and doc

* same change for segments splitHintSpec

* fix build

* fix build

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2020-08-21 09:43:58 -07:00
Jihoon Son 9a81740281
Don't log the entire task spec (#10278)
* Don't log the entire task spec

* fix lgtm

* fix serde

* address comments and add tests

* fix tests

* remove unnecessary codes
2020-08-18 11:03:13 -07:00
Himanshu 12ae84165e
remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717)
* remove DruidLeaderClient.goAsync(..) that does not follow redirect.
Replace its usage by DruidLeaadereClient.go(..) with
InputStreamFullResponseHandler

* remove ByteArrayResponseHolder dependency from JsonParserIterator

* add UT to cover lines in InputStreamFullResponseHandler

* refactor SystemSchema to reduce branches

* further reduce branches

* Revert "add UT to cover lines in InputStreamFullResponseHandler"

This reverts commit 330aba3dd9.

* UTs for InputStreamFullResponseHandler

* remove unused imports
2020-08-14 10:51:18 -07:00
Gian Merlino 6cca7242de
Add "offset" parameter to the Scan query. (#10233)
* Add "offset" parameter to the Scan query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Fix constructor call.

* Fix up JSONs.

* Fix call to ScanQuery.

* Doc update.

* Fix javadocs.

* Spotbugs, LGTM suppressions.

* Javadocs.

* Fix suppression.

* Stabilize Scan query result order, add tests.

* Update LGTM comment.

* Fixup.

* Test different batch sizes too.

* Nicer tests.

* Fix comment.
2020-08-13 14:56:24 -07:00
Gian Merlino b6aaf59e8c
Add "offset" parameter to GroupBy query. (#10235)
* Add "offset" parameter to GroupBy query.

It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.

* Stabilize GroupBy sorts.

* Fix inspections.

* Fix suppression.

* Fixups.

* Move TopNSequence to druid-core.

* Addl comments.

* NumberedElement equals verification.

* Changes from review.
2020-08-05 15:39:58 -07:00
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properites in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
Jian Wang 271f90f205
Add segment pruning for hash based shard spec (#9810)
* Add segment pruning for hash based partitioning

* Update doc

* Add additional test

* Address comments

* Fix unit test failure

Co-authored-by: Jian Wang <jwang@pinterest.com>
2020-07-30 18:44:26 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Jihoon Son 26d099f39b
Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments (#10183)
* Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments

* fix tests and add missing tests

* revert null handling fix

* unused import

* move out util methods from DiscoveryDruidNode
2020-07-21 17:52:51 -07:00
Suneet Saldanha e6c9142129
Add validation for authenticator and authorizer name (#10106)
* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix
2020-07-13 21:15:54 -07:00
Gian Merlino eeaf609fc0
Update Jetty to 9.4.30.v20200611. (#10098)
* Update Jetty to 9.4.30.v20200611.

This is the latest version currently available in the 9.4.x line.

* Various adjustments.

* Class name fixes.

* Remove unused HttpClientModule code.

* Add coverage suppressions.

* Another coverage suppression.

* Fix wildcards.
2020-07-07 14:24:02 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Gian Merlino ddda2a4f18
VersionedIntervalTimeline: Fix thread-unsafe call to "lookup". (#10130) 2020-07-05 09:32:18 -07:00
Clint Wylie a337ef351c
Closing yielder from ParallelMergeCombiningSequence should trigger cancellation (#10117)
* cancel parallel merge combine sequence on yielder close

* finish incomplete comment

* Update core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java

Fixes checkstyle

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-07-01 14:07:44 -07:00
Mohammad Shoaib 84290a2332
Enabling Static Imports for Unit Testing DSLs (#331) (#9764)
* Enabling Static Imports for Unit Testing DSLs (#331)

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Feature 8885 - Enabling Static Imports for Unit Testing DSLs (#435)

* Enabling Static Imports for Unit Testing DSLs

* Using suppressions checkstyle to allow static imports only in the UTs

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Removing the changes in the checkstyle because those are not needed

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>
2020-06-30 13:59:35 -07:00
Jihoon Son 8ef3598c05
Move shardSpec tests to core (#10079)
* Move shardSpec tests to core

* checkstyle

* inject object mapper for testing

* unused import
2020-06-29 17:31:37 -07:00
chenyuzhi459 a4c6d5f37e
fix query memory leak (#10027)
* fix query memory leak

* rollup ./idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancell method use

* format code

* reback code

* optimize cancelAll

* clean code

* add comment
2020-06-26 23:30:59 -07:00
Clint Wylie 4b99c6d3ef
ensure ParallelMergeCombiningSequence closes its closeables (#10076)
* ensure close for all closeables of ParallelMergeCombiningSequence

* revert unneeded change

* consolidate methods

* catch throwable instead of exception
2020-06-26 14:37:20 -07:00
Jihoon Son c591ff8ea8
Add NonnullPair (#10013)
* Add NonnullPair

* new line

* test

* make it consistent
2020-06-26 09:52:06 -07:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioing is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* hansle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
Suneet Saldanha 0035f39e25
lpad and rpad functions match postrges behavior in SQL compatible mode (#10006)
* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for pad -ve len

* address review comments
2020-06-15 10:47:57 -07:00
Jihoon Son 9a10f8352b
Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle
2020-06-12 21:39:37 -07:00
BIGrey d4d0004338
Fix failed tests in TimestampParserTest when running locally (#9997)
* fix failed tests in TimestampPaserTest due to timezone

* remove unneeded -Duser.country=US

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>
2020-06-10 09:19:38 -07:00
Atul Mohan 17cf8ea8f2
Add Sql InputSource (#9449)
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-09 12:55:20 -07:00
Mainak Ghosh bcc066a27f
Empty partitionDimension has less rollup compared to when explicitly specified (#9861)
* Empty partitionDimension has less rollup compared to the case when it is explicitly specified

* Adding a unit test for the empty partitionDimension scenario. Fixing another test which was failing

* Fixing CI Build Inspection Issue

* Addressing all review comments

* Updating the javadocs for the hash method in HashBasedNumberedShardSpec
2020-06-05 12:42:42 -07:00
Xavier Léauté a934b2664c
remove ListenableFutures and revert to using the Guava implementation (#9944)
This change removes ListenableFutures.transformAsync in favor of the
existing Guava Futures.transform implementation. Our own implementation
had a bug which did not fail the future if the applied function threw an
exception, resulting in the future never completing.

An attempt was made to fix this bug, however when running againts Guava's own
tests, our version failed another half dozen tests, so it was decided to not
continue down that path and scrap our own implementation.

Explanation for how was this bug manifested itself:

An exception thrown in BaseAppenderatorDriver.publishInBackground when
invoked via transformAsync in StreamAppenderatorDriver.publish will
cause the resulting future to never complete.

This explains why when encountering https://github.com/apache/druid/issues/9845
the task will never complete, forever waiting for the publishFuture to
register the handoff. As a result, the corresponding "Error while
publishing segments ..." message only gets logged once the index task
times out and is forcefully shutdown when the future is force-cancelled
by the executor.
2020-06-03 10:46:03 -07:00
Gian Merlino 3d81564a14
Fix various processing buffer leaks and simplify BlockingPool. (#9928)
* - GroupByQueryEngineV2: Fix leak of intermediate processing buffer when
  exceptions are thrown before result sequence is created.
- PooledTopNAlgorithm: Fix leak of intermediate processing buffer when
  exceptions are thrown before the PooledTopNParams object is created.
- BlockingPool: Remove unused "take" methods.

* Add tests to verify that buffers have been returned.
2020-06-02 18:26:18 -07:00
Gian Merlino 309fc04d54
Fix various Yielder leaks. (#9934)
* Fix various Yielder leaks.

- CombiningSequence leaked the input yielder from "toYielder" if it ran
  into an exception while accumulating the last value from the input
  yielder.
- MergeSequence leaked input yielders from "toYielder" if it ran into
  an exception while building the initial priority queue.
- ScanQueryRunnerFactory leaked the input yielder in its
  "priorityQueueSortAndLimit" strategy if it ran into an exception
  while scanning and sorting.
- YieldingSequenceBase.accumulate chomped IOExceptions thrown in
  "accumulate" during yielder closing.

* Add tests.

* Fix braces.
2020-06-02 18:26:06 -07:00
Samarth Jain 82e5b0573e
Number based columns representing time in custom format cannot be used as timestamp column in Druid. (#9877)
* Number based columns representing time in custom format cannot be used as timestamp column in Druid.

Prior to this fix, if an integer column in parquet is storing dateint in format yyyyMMdd, it cannot be used as timestamp column in Druid as the timestamp parser interprets it as a number storing UTC time instead of treating it as a number representing time in yyyyMMdd format. Data formats like TSV or CSV don't suffer from this problem as the timestamp is passed in an as string which the timestamp parser is able to parse correctly.
2020-05-18 11:17:28 -07:00
Clint Wylie 2e9548d93d
refactor SeekableStreamSupervisor usage of RecordSupplier (#9819)
* refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting

* fix style and test

* cleanup, refactor, javadocs, test

* fixes

* keep collecting current offsets and lag if unhealthy in background reporting thread

* review stuffs

* add comment
2020-05-16 14:09:39 -07:00
Jihoon Son 46beaa0640
Fix potential resource leak in ParquetReader (#9852)
* Fix potential resource leak in ParquetReader

* add test

* never thrown exception

* catch potential exceptions
2020-05-16 09:57:12 -07:00
Suneet Saldanha b0167295d7
Fail incorrectly constructed join queries (#9830)
* Fail incorrectly constructed join queries

* wip annotation for equals implementations

* Add equals tests

* fix tests

* Actually fix the tests

* Address review comments

* prohibit Pattern.hashCode()
2020-05-13 14:23:04 -07:00
mcbrewster 28be107a1c
add flag to flattenSpec to keep null columns (#9814)
* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColmns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns

* Add equals verifier tests
2020-05-08 21:53:39 -07:00
Jihoon Son 6674d721bc
Avoid sorting values in InDimFilter if possible (#9800)
* Avoid sorting values in InDimFilter if possible

* tests

* more tests

* fix and and or filters

* fix build

* false and true vector matchers

* fix vector matchers

* checkstyle

* in filter null handling

* remove wrong test

* address comments

* remove unnecessary null check

* redundant separator

* address comments

* typo

* tests
2020-05-06 15:26:36 -07:00
Jihoon Son 964a1fc9df
Remove ParseSpec.toInputFormat() (#9815)
* Remove toInputFormat() from ParseSpec

* fix test
2020-05-05 11:17:57 -07:00
Jihoon Son c6caae9a24
Fix filtering on boolean values in transformation (#9812)
* Fix filter on boolean value in Transform

* assert

* more descriptive test

* remove assert

* add assert for cached string; disable tests

* typo
2020-05-04 18:47:10 -07:00
Aleksey Plekhanov 9341ea828a
Fixed flaky BlockingPoolTest.testConcurrentTakeBatch() (#9692) 2020-05-03 12:54:27 -07:00
Clint Wylie 7711f776a0
fix issue where CloseableIterator.flatMap does not close inner CloseableIterator (#9761)
* fix issue where CloseableIterator.flatMap does not close inner CloseableIterator

* more test

* style

* clarify test
2020-04-24 13:52:50 -07:00
Jihoon Son 7fa72fbf15
Initialize SettableByteEntityReader only when inputFormat is not null (#9734)
* Lazy initialization of SettableByteEntityReader to avoid NPE

* toInputFormat for tsv

* address comments

* common code
2020-04-24 10:22:51 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Gian Merlino 75c543b50f
SQL: More straightforward handling of join planning. (#9648)
* SQL: More straightforward handling of join planning.

Two changes that simplify how joins are planned:

1) Stop using JoinProjectTransposeRule as a way of guiding subquery
removal. Instead, add logic to DruidJoinRule that identifies removable
subqueries and removes them at the point of creating a DruidJoinQueryRel.
This approach reduces the size of the planning space and allows the
planner to complete quickly.

2) Remove rules that reorder joins. Not because of an impact on the
planning time (it seems minimal), but because the decisions that the
planner was making in the new tests were sometimes worse than the
user-provided order. I think we'll need to go with the user-provided
order for now, and revisit reordering when we can add more smarts to
the cost estimator.

A third change updates numeric ExprEval classes to store their
value as a boxed type that corresponds to what it is supposed to be.
This is useful because it affects the behavior of "asString", and
is included in this patch because it is needed for the new test
"testInnerJoinTwoLookupsToTableUsingNumericColumnInReverse". This
test relies on CAST('6', 'DOUBLE') stringifying to "6.0" like an
actual double would.

Fixes #9646.

* Fix comments.

* Fix tests.
2020-04-09 16:21:43 -07:00
Clint Wylie d267b1c414
check paths used for shuffle intermediary data manager get and delete (#9630)
* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
2020-04-07 09:47:18 -07:00
Jihoon Son 82ce60b5c1
Reuse transformer in stream indexing (#9625)
* Reuse transformer in stream indexing

* remove unused method

* memoize complied pattern
2020-04-06 16:36:08 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Himanshu 5604ac7963
druid extension for OpenID Connect auth using pac4j lib (#8992)
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication

* update version in druid-pac4j pom

* introducing unauthorized resource filter

* authenticated but authorized /unified-webconsole.html

* use httpReq.getRequestURI() for matching callback path

* add documentation

* minor doc addition

* licesne file updates

* make dependency analyze succeed

* fix doc build

* hopefully fixes doc build

* hopefully fixes license check build

* yet another try on fixing license build

* revert unintentional changes to website folder

* update version to 0.18.0-SNAPSHOT

* check session and its expiry on each request

* add crypto service

* code for encrypting the cookie

* update doc with cookiePassphrase

* update license yaml

* make sessionstore in Pac4jFilter private non static

* make Pac4jFilter fields final

* okta: use sha256 for hmac

* remove incubating

* add UTs for crypto util and session store impl

* use standard charsets

* add license header

* remove unused file

* add org.objenesis.objenesis to license.yaml

* a bit of nit changes  in CryptoService  and embedding EncryptionResult for clarity

* rename alg  to cipherAlgName

* take cipher alg name, mode and padding as input

* add java doc  for CryptoService  and make it more understandable

* another  UT for CryptoService

* cache pac4j Config

* use generics clearly in Pac4jSessionStore

* update cookiePassphrase doc to mention PasswordProvider

* mark stuff Nullable where appropriate in Pac4jSessionStore

* update doc to mention jdbc

* add error log on reaching callback resource

* javadoc  for Pac4jCallbackResource

* introduce NOOP_HTTP_ACTION_ADAPTER

* add correct module name in license file

* correct extensions folder name in licenses.yaml

* replace druid-kubernetes-extensions to druid-pac4j

* cache SecureRandom instance

* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 18:15:45 -07:00
Chi Cao Minh 6b02991464
Match GREATEST/LEAST function behavior to other DBs (#9488)
* Match GREATEST/LEAST function behavior

Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.

* Match postgres behavior & handle more SQL types

* Fix imports
2020-03-12 15:10:11 -07:00
Jihoon Son 7401bb3f93
Improve OvershadowableManager performance (#9441)
* Use the iterator instead of higherKey(); use the iterator API instead of stream

* Fix tests; fix a concurrency bug in timeline

* fix test

* add tests for findNonOvershadowedObjectsInInterval

* fix test

* add missing tests; fix a bug in QueueEntry

* equals tests

* fix test
2020-03-10 13:22:19 -07:00
zachjsh 7e0e767cc2
Ability to Delete task logs and segments from S3 (#9459)
* Ability to Delete task logs and segments from S3

* implement ability to delete all tasks logs or all task logs
  written before a particular date when written to S3
* implement ability to delete all segments from S3 deep storage
* upgrade version of aws SDK in use

* * update licenses for updated AWS SDK version

* * fix bug in iterating through results from S3
* revert back to original version of AWS SDK

* * Address review comments

* * Fix failing dependency check
2020-03-10 13:13:46 -07:00
Gian Merlino c6c2282b59
Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484)
* Harmonization and bug-fixing for selector and filter behavior on unknown types.

- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
  system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
  than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
  true even for 0- or 1- element arrays.
- Adjust various javadocs.

* Add throwParseExceptions option to Rows.objectToNumber, switch back to that.

* Update tests.

* Adjust moment sketch tests.
2020-03-10 07:15:57 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Jihoon Son 9466ac7c9b
Skip empty files for local, hdfs, and cloud input sources (#9450)
* Skip empty files for local, hdfs, and cloud input sources

* split hint spec doc

* doc for skipping empty files

* fix typo; adjust tests

* unnecessary fluent iterable

* address comments

* fix test

* use the right lists

* fix test

* fix test
2020-03-03 20:51:06 -08:00
Gian Merlino ae617bf5dd
Clarify InputSource.isSplittable usage. (#9424)
Also removes TimedShutoffInputSource, which had a bug in isSplittable (it
improperly returned true, even though it didn't implement SplittableInputSource).
This bug had no user-visible impact, since the code wasn't used.
2020-02-26 22:05:46 -08:00
Jihoon Son 3bc7ae782c
Create splits of multiple files for parallel indexing (#9360)
* Create splits of multiple files for parallel indexing

* fix wrong import and npe in test

* use the single file split in tests

* rename

* import order

* Remove specific local input source

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc and error msg

* fix build

* fix a test and address comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-02-24 17:34:39 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
zachjsh f707064bed
Add Azure config options for segment prefix and max listing length (#9356)
* Add Azure config options for segment prefix and max listing length

Added configuration options to allow the user to specify the prefix
within the segment container to store the segment files. Also
added a configuration option to allow the user to specify the
maximum number of input files to stream for each iteration.

* * Fix test failures

* * Address review comments

* * add dependency explicitly to pom

* * update docs

* * Address review comments

* * Address review comments
2020-02-21 14:12:03 -08:00
Jihoon Son 141d8dd875
Enable druid.coordinator.kill.pendingSegments.on by default (#9385)
* Enable druid.coordinator.kill.pendingSegments.on by default

* checkstyle
2020-02-21 13:13:49 -08:00