10737 Commits

Author SHA1 Message Date
Chi Cao Minh 6b02991464
Match GREATEST/LEAST function behavior to other DBs (#9488)
* Match GREATEST/LEAST function behavior

Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.

* Match postgres behavior & handle more SQL types

* Fix imports
2020-03-12 15:10:11 -07:00
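
The scalar (per-row) behavior described in this commit can be sketched as follows. This is a minimal illustration, not Druid's implementation: the function names `greatest` and `least` are hypothetical Python helpers, and the NULL handling follows the PostgreSQL convention (NULL arguments are ignored, and the result is NULL only when every argument is NULL). Whether Druid matches this in every null-handling mode is an assumption.

```python
def greatest(*values):
    """Return the largest non-None argument, or None if all are None.

    Hypothetical helper: mirrors the PostgreSQL convention of ignoring NULLs.
    """
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def least(*values):
    """Return the smallest non-None argument, or None if all are None."""
    non_null = [v for v in values if v is not None]
    return min(non_null) if non_null else None


# A scalar function is evaluated once per row over that row's columns,
# unlike an aggregator, which folds one column across many rows.
rows = [(1, None, 3), (None, None, None), (7, 2, 5)]
per_row_greatest = [greatest(*row) for row in rows]
per_row_least = [least(*row) for row in rows]
```

The key difference from an aggregator is the axis of evaluation: the function compares values within one row rather than accumulating over rows.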
Vadim Ogievetsky ddc6f87920
Web console: standardize the spec format (#9477)
* standardize the spec format

* fix spec upgrade
2020-03-12 14:21:23 -07:00
Himanshu 1ba1a3c523
fix worker category on Indexer node (#9510) 2020-03-12 14:11:02 -07:00
Gian Merlino ff59d2e78b
Move RowSignature from druid-sql to druid-processing and make use of it. (#9508)
* Move RowSignature from druid-sql to druid-processing and make use of it.

1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
   stuff in a RowSignatures utility class. It also picked up some new convenience
   methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
   associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
   and it now provides type info.

* Fix up extensions.

* Various fixes
2020-03-12 11:06:44 -07:00
Jonathan Wei 3082b9289a
Fix NPE when using IndexedTable and all left rows are filtered out (#9490)
* Fix NPE when using IndexedTable and all left rows are filtered out

* Fix compile

* Add constant for uninitialized current row

* Fix checkstyle
2020-03-11 19:23:05 -07:00
Gian Merlino 2ef5c17441
Link up row-based datasources to serving layer. (#9503)
* Link up row-based datasources to serving layer.

- Add SegmentWrangler interface that allows linking of DataSources to Segments.
- Add LocalQuerySegmentWalker that uses SegmentWranglers to compute queries on
  data that is available locally.
- Modify ClientQuerySegmentWalker to use LocalQuerySegmentWalker when the base
  datasource is concrete and not a table.
- Add SegmentWranglerModule to the Broker so it has them available and can
  properly instantiate LocalQuerySegmentWalkers.
- Set InlineDataSource and LookupDataSource to concrete, since they can be
  directly queried now.

* Fix tests.
2020-03-11 11:32:27 -07:00
Maytas Monsereenusorn e9888f41cb
Modify check java version script to indicate experimental support for Java 11 (#9455)
* Modify check java version script to indicate experimental support for Java 11

* update docs
2020-03-11 09:22:39 -07:00
Maytas Monsereenusorn 9231f2acb3
Integration test compile with Java 8 and run with Java 8 and 11 (#9491)
* test integration compile with 8 and run with 11

* Integration test compile with Java 8 and run with Java 8 and 11
2020-03-11 09:22:27 -07:00
Gian Merlino 4f085896c6
Ability to directly query row-based datasources. (#9502)
* Ability to directly query row-based datasources.

Includes:

- Foundational classes RowBasedSegment, RowBasedStorageAdapter,
  RowBasedCursor provide a queryable interface on top of a
  RowBasedColumnSelectorFactory.
- Add LookupSegment: A RowBasedSegment that is built on lookup data.
- Improve capability reporting in RowBasedColumnSelectorFactory.

* Fix import.

* Remove unthrown IOException.
2020-03-10 20:39:01 -07:00
Samarth Jain c74749f0f4
Don't exclude null dimension values from the map based query response (#9438) 2020-03-10 15:06:03 -07:00
Jihoon Son 7401bb3f93
Improve OvershadowableManager performance (#9441)
* Use the iterator instead of higherKey(); use the iterator API instead of stream

* Fix tests; fix a concurrency bug in timeline

* fix test

* add tests for findNonOvershadowedObjectsInInterval

* fix test

* add missing tests; fix a bug in QueueEntry

* equals tests

* fix test
2020-03-10 13:22:19 -07:00
zachjsh 7e0e767cc2
Ability to Delete task logs and segments from S3 (#9459)
* Ability to Delete task logs and segments from S3

* implement ability to delete all task logs or all task logs
  written before a particular date when written to S3
* implement ability to delete all segments from S3 deep storage
* upgrade version of aws SDK in use

* update licenses for updated AWS SDK version

* fix bug in iterating through results from S3
* revert back to original version of AWS SDK

* Address review comments

* Fix failing dependency check
2020-03-10 13:13:46 -07:00
Himanshu 75a5591448
remove old unused zookeeper dependent lookups code (#9480)
* remove old unused zookeeper dependent lookups code

* make IntelliJ inspector happy
2020-03-10 12:12:48 -07:00
Chi Cao Minh 559c7b64cc
Suppress CVEs for htrace-core4 and openstack-swift (#9489)
CVE-2013-7109 can be ignored for openstack-swift as it is for the python
SDK and druid uses the java SDK.

The jackson-databind:2.4.0 CVEs via htrace-core4 are all suppressed for
now as fixing them requires updating the hadoop version.
2020-03-10 10:55:41 -07:00
Gian Merlino c6c2282b59
Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484)
* Harmonization and bug-fixing for selector and filter behavior on unknown types.

- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
  system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
  than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
  true even for 0- or 1- element arrays.
- Adjust various javadocs.

* Add throwParseExceptions option to Rows.objectToNumber, switch back to that.

* Update tests.

* Adjust moment sketch tests.
2020-03-10 07:15:57 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
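
As a rough sketch of the idea in this commit: an array-containment test over a multi-value dimension can be expressed with ordinary filters. The helper names below are hypothetical, and the specific mapping (contains as an `and` of `selector` filters, overlap as an `in` filter) is an assumption made for illustration; the commit message does not state which native filters are produced.

```python
def array_contains(row_values, required):
    """True when the row's values include every required value."""
    return all(v in row_values for v in required)


def array_overlap(row_values, candidates):
    """True when the row's values share at least one value with candidates."""
    return any(v in row_values for v in candidates)


def contains_as_filter(dimension, required):
    """Assumed filter shape: one selector per required value, ANDed together."""
    return {
        "type": "and",
        "fields": [
            {"type": "selector", "dimension": dimension, "value": v}
            for v in required
        ],
    }


def overlap_as_filter(dimension, candidates):
    """Assumed filter shape: a single IN filter over the candidate values."""
    return {"type": "in", "dimension": dimension, "values": list(candidates)}
```

The "if possible" in the commit title matters: a rewrite of this kind only applies when the right-hand side is a literal array, which this sketch takes for granted.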
Maytas Monsereenusorn 2db20afbb7
Integration test cluster supports override config (#9473)
* integration test refactor

* integration test refactor

* refactor integration test

* refactor integration test

* refactor integration test

* refactor integration test

* refactor integration test

* refactor integration test

* refactor integration test

* refactor integration test

* address comments
2020-03-09 21:17:49 -07:00
mcbrewster 95406ca20a
[IMPLY-2285] fix maxRowsPerSegment tool tip (#9468) 2020-03-09 20:12:05 -07:00
mcbrewster 96ed7210d3
Fix history dialog overflow (#9471)
* [IMPLY-1661] fix history dialog overflow

* jest -u
2020-03-09 19:09:59 -07:00
Maytas Monsereenusorn 814f5a9717
add password provider reference to s3 optional cred docs (#9439) 2020-03-09 17:56:42 -07:00
Clint Wylie f8b1f2f7f3
fix issue when distinct grouping dimensions are optimized into the same virtual column expression (#9429)
* fix issue when distinct grouping dimensions are optimized into the same virtual column expression

* fix tests

* more better

* fixes
2020-03-09 17:48:29 -07:00
Jonathan Wei 0136dba95d
Add option to control join filter rewrites (#9472)
* Add option to control join filter rewrites

* Fix inspections
2020-03-09 17:36:07 -07:00
mcbrewster a676d16226
[IMPLY-1767] fix popover direction (#9470) 2020-03-09 17:35:02 -07:00
mcbrewster da0ea627d0
Add disabled run button during loading state (#9474)
* [IMPLY-1782] add disabled run button during loading state

* jest -u
2020-03-09 17:10:35 -07:00
Himanshu 072bbe210f
remove ServerDiscoverySelector from DruidLeaderClient (#9481) 2020-03-09 12:13:59 -07:00
Jihoon Son f456d2fcf8
Resource leak in DruidSegmentReader (#9476)
* Close the Yielder in DruidSegmentReader

* forbidden api
2020-03-09 10:05:25 -07:00
Clint Wylie a677664811
allow optimization of single multi-value column input expr with repeated identifier (#9425)
* allow optimization of single multi-value column input expr with repeated identifier

* add test
2020-03-06 12:53:32 -08:00
Julian Jaffe eda03630d0
Add OnHeapMemorySegmentWriteOutMediumFactory (#9454)
* Add OnHeapMemorySegmentWriteOutMediumFactory

Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.

* Register OnHeapMemorySegmentWriteOutMediumFactory.

Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.

* Remove unnecessary throws

The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.

* Update SegmentWriteOutMedium docs to include onHeapMemory

Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new OnHeapMemorySegmentWriteOutMedium option.
2020-03-05 22:34:08 -08:00
Jihoon Son 64afc05080
Open the licenses.yaml with an explicit encoding (#9462) 2020-03-05 17:13:44 -08:00
Clint Wylie 32cd47bc8e
Fix home view styling (#9444) 2020-03-04 19:39:36 -08:00
Jihoon Son 3016057178
Make Transform an ExtensionPoint (#9319)
* Make Transform an ExtensionPoint

* Add transform to the list of documented extensions

* Add example transform implementation
2020-03-04 12:13:14 -08:00
Chi Cao Minh 4ed83f6af6
Fix superbatch merge last partition boundaries (#9448)
* Fix superbatch merge last partition boundaries

A bug in the computation for the last parallel merge partition could
cause an IndexOutOfBoundsException or precondition failure due to an
empty partition.

* Improve comments and tests
2020-03-04 10:35:21 -08:00
Jihoon Son 9466ac7c9b
Skip empty files for local, hdfs, and cloud input sources (#9450)
* Skip empty files for local, hdfs, and cloud input sources

* split hint spec doc

* doc for skipping empty files

* fix typo; adjust tests

* unnecessary fluent iterable

* address comments

* fix test

* use the right lists

* fix test

* fix test
2020-03-03 20:51:06 -08:00
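
The behavior named in this commit, skipping zero-length files before they are handed to a reader, can be sketched like this. The helper `non_empty_files` is hypothetical and works on local paths only; how the hdfs and cloud input sources obtain object sizes is not shown in the commit message and is not modeled here.

```python
import os


def non_empty_files(paths):
    """Keep only paths whose size on disk is greater than zero bytes."""
    return [p for p in paths if os.path.getsize(p) > 0]
```

Filtering by size up front avoids creating a split (and a reader) for input that cannot produce any rows.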
mcbrewster 99095c4ac5
Add Azure ingestion flow to web console (#9437)
* add support for azure

* change bucket to container

* add azure to input menu

* remove static-azure
2020-03-03 11:06:00 -08:00
Gian Merlino 1fd865b7c1
BufferArrayGrouper: Fix potential overflow in requiredBufferCapacity. (#9435)
* BufferArrayGrouper: Fix potential overflow in requiredBufferCapacity.

If cardinality was high, the computation could overflow an int. There
were tests for this, but the tests were wrong.

* Nicer.
2020-02-28 14:27:52 -08:00
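
The class of bug described here can be illustrated with a small simulation of 32-bit integer arithmetic. The capacity formula below is hypothetical (the commit message does not give the real one); the point is only that a product of a large cardinality and a per-record size can exceed the range of a signed 32-bit int, so the computation needs a wider type.

```python
INT_MAX = 2**31 - 1


def to_int32(n):
    """Wrap an integer to signed 32-bit, the way Java int arithmetic does."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def required_capacity_int(cardinality, record_size):
    """Hypothetical formula evaluated in 32-bit arithmetic (can wrap)."""
    return to_int32((cardinality + 1) * record_size)


def required_capacity_long(cardinality, record_size):
    """Same hypothetical formula evaluated without a 32-bit limit."""
    return (cardinality + 1) * record_size


def fits_in_buffer(cardinality, record_size, buffer_size):
    """Compare using the wide result so a wrapped value cannot pass the check."""
    return required_capacity_long(cardinality, record_size) <= buffer_size
```

A wrapped result is dangerous because it can look like a small, valid capacity and pass a bounds check that the true value would fail.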
Gian Merlino 81d8be6e39
CacheStrategy: Improve Javadocs. (#9280)
* CacheStrategy: Improve Javadocs.

* Update processing/src/main/java/org/apache/druid/query/CacheStrategy.java

Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>

Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
2020-02-28 11:30:58 -08:00
Vadim Ogievetsky c294e0b7c6
Web console: Column counter (#9334)
* Column counter

* more general test
2020-02-27 12:04:27 -08:00
Gian Merlino ef3d24e886
Add javadocs for enableFilterPushDown. (#9423) 2020-02-26 22:07:33 -08:00
Gian Merlino ae617bf5dd
Clarify InputSource.isSplittable usage. (#9424)
Also removes TimedShutoffInputSource, which had a bug in isSplittable (it
improperly returned true, even though it didn't implement SplittableInputSource).
This bug had no user-visible impact, since the code wasn't used.
2020-02-26 22:05:46 -08:00
Chi Cao Minh 5d05b40e6d
Remove druid incubating references (#9405) 2020-02-26 21:47:58 -08:00
Lijia Liu 063811710e
#8690 use UTC interval when creating pending segments (#9142)
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-02-26 13:20:59 -08:00
Jihoon Son b924161086
Add main method to VersionedIntervalTimelineBenchmark (#9404) 2020-02-26 12:01:02 -08:00
Aditya e506fc9fdf
fix cursor position after function autocomplete (#9396)
Closes #9395
2020-02-26 09:41:24 -08:00
Gian Merlino c9faf3e148
Add SQL GROUPING SETS support. (#9122)
* Add SQL GROUPING SETS support.

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
  each grouping set. This is more in line with SQL standard behavior. I think it is okay
  to make this change, since the old behavior was not documented, so users should
  hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
  should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
  latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.

* Fix bugs.

* Fix tests.

* PR updates.

* Grouping class hygiene.
2020-02-26 08:52:39 -08:00
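
The limit ordering described in this commit can be sketched as follows. This is a toy model, not Druid's groupBy engine: the function names, the `cnt` metric column, and the row layout are all assumptions made for illustration. It shows a limit being applied once to the combined output of all grouping sets, rather than separately to each set.

```python
from collections import defaultdict


def group_by(rows, dims, all_dims):
    """Sum 'cnt' per group; dimensions outside this grouping set become None."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] if d in dims else None for d in all_dims)
        totals[key] += row["cnt"]
    return [dict(zip(all_dims, key), cnt=total) for key, total in totals.items()]


def grouping_sets(rows, all_dims, sets, limit=None):
    """Concatenate the results of every grouping set, then apply the limit once."""
    out = []
    for dims in sets:
        out.extend(group_by(rows, dims, all_dims))
    return out if limit is None else out[:limit]


rows = [
    {"country": "US", "device": "phone", "cnt": 1},
    {"country": "US", "device": "web", "cnt": 2},
    {"country": "FR", "device": "web", "cnt": 3},
]
sets = [("country", "device"), ("country",), ()]
everything = grouping_sets(rows, ("country", "device"), sets)
limited = grouping_sets(rows, ("country", "device"), sets, limit=2)
```

Under the older per-set behavior the same limit would have returned up to two rows from each of the three sets; here it returns two rows in total.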
Maytas Monsereenusorn 92fb83726b
Add support for optional aws credentials for s3 for ingestion (#9375)
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* fix build failure

* fix failing build

* fix failing build

* Code cleanup

* fix failing test

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* pass s3ConfigProperties for split

* lazy init s3client

* update docs

* fix docs check

* address comments

* add ServerSideEncryptingAmazonS3.Builder

* fix failing checkstyle

* fix typo

* wrap the ServerSideEncryptingAmazonS3.Builder in a provider

* added java docs for S3InputSource constructor

* added java docs for S3InputSource constructor

* remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
2020-02-25 20:59:53 -08:00
Jonathan Wei 5ce9c81b68
Add join prefix duplicate/shadowing check (#9384)
* Add join prefix duplicate/shadowing check

* Fix format string

* PR comments

* PR comment

* Optimize loop PR comment
2020-02-25 18:17:23 -08:00
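
One plausible form of the check named in this commit is sketched below. The function `check_prefixes` is hypothetical, and the rule it applies (two join prefixes conflict when they are equal or when one starts with the other) is an assumption about what "duplicate/shadowing" means here, not a statement of Druid's actual logic.

```python
from itertools import combinations


def check_prefixes(prefixes):
    """Raise ValueError if any two prefixes are equal or one shadows the other."""
    for a, b in combinations(prefixes, 2):
        # Equal prefixes also satisfy startswith, so duplicates are caught here.
        if a.startswith(b) or b.startswith(a):
            raise ValueError(f"Conflicting join prefixes: {a!r}, {b!r}")
```

The motivation is that columns from each join clause are named by prepending the clause's prefix, so overlapping prefixes would make it ambiguous which clause a column belongs to.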
zachjsh d771b42ed1
Move Azure extension into Core (#9394)
* Move Azure extension into Core

Moving the azure extension into Core.

* Fix build failure

* Add The MIT License (MIT) to list of compatible licenses

* Address review comments

* change reference to contrib azure to core azure

* Fix spelling mistakes.
2020-02-25 17:49:16 -08:00
Francesco Nidito 14accb50ad
Improves on the fix for 8918 (#9387)
* Improves on the fix for 8918

* factorize constants for ITRetryUtil.retryUntil call

* increasing retries and sleep in HttpUtil to cope with 401s in testing

* adding retries in EventReceiverFirehoseTestClient

* adding missing space
2020-02-25 15:50:27 -08:00
als-sdin f619903403
Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400) 2020-02-25 13:15:55 -08:00