Commit Graph

10733 Commits

Author SHA1 Message Date
BIGrey bbe23c652c
Fix negative queuedSize problem in CuratorLoadQueuePeon (#10362)
* fix negative queuedSize problem in CuratorLoadQueuePeon

* add comment and optimize test case

* fix typo

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>
2020-10-16 13:38:49 -07:00
Maytas Monsereenusorn 3538abd5d0
Make sure all fields in sys.segments are JSON-serialized (#10481)
* fix JSON format

* Change all columns in sys segments to be JSON

* Change all columns in sys segments to be JSON

* add tests

* fix failing tests

* fix failing tests
2020-10-14 13:49:46 -07:00
Maytas Monsereenusorn 9056d113d0
Add docs and integration tests for Auto-compaction snapshot status API (#10510)
* add docs and IT for Auto-compaction snapshot status API

* fix spellings

* fix test

* address comments
2020-10-14 06:42:22 -07:00
Vadim Ogievetsky e8c5893c34
Web console: show segment sizes in rows not bytes (#10496)
* added query error suggestions

* simplify the SQLs

* change segment size display to rows

* suggestion tests

* update snapshot

* make error detection more robust

* remove errant console log

* fix imports

* put suggestion on top

* better error rendering

* format as millions

* add .druid.pid to gitignore

* rename segment_size to segment_rows, fix visability, fix divide by zero

* update snapshots
2020-10-13 13:19:39 -07:00
Abhishek Agarwal 567e381705
Any virtual column on "__time" should be a pre-join virtual column (#10451)
* Virtual column on __time should be in pre-join

* Add unit test
2020-10-12 13:04:55 -07:00
Suneet Saldanha b45a56f989
Web console: targetRowsPerSegment for hashed partionin (#10500)
* Web console: targetRowsPerSegment for hashed partionin

Added `targetRowsPerSegment` to the web console for hashed partition for both
the auto compaction view and as part of the ingestion workflow.

The help text was also updated to indicate when a user should care about
updating these fields

* code review

* update test snapshots

* oops
2020-10-11 16:55:28 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Ashish Kapoor 4c78b514c9
Update README.md (#10357)
Compile scss files before npm start.
2020-10-09 20:21:59 +05:30
Jonathan Wei 0aa2a8e2c6
Suppress CVE-2018-11765 for hadoop dependencies (#10485) 2020-10-07 21:55:34 -07:00
Joseph Glanville 7ce9ac4548
Fix Avro support in Web Console (#10232)
* Fix Avro OCF detection prefix and run formation detection on raw input

* Support Avro Fixed and Enum types correctly

* Check Avro version byte in format detection

* Add test for AvroOCFReader.sample

Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.

* Document Avro type handling

* Add TS unit tests for guessInputFormat
2020-10-07 21:08:22 -07:00
Vadim Ogievetsky 2e50ada407
Web console: fix compaction status when no compaction config, and small cleanup (#10483)
* move timed button to icons

* cleanup redundant logic

* fix compaction status text

* remove extra style
2020-10-07 14:54:08 -07:00
Atul Mohan 0ab8b6e0a9
Improve test (#10480) 2020-10-07 08:40:02 -05:00
Jihoon Son e9e7d82714
Fix compaction task slot computation in auto compaction (#10479)
* Fix compaction task slot computation in auto compaction

* add tests for task counting
2020-10-06 21:56:03 -07:00
Gian Merlino d78fedd13c
Web console: Don't include realtime segments in size calculations. (#10482)
It's always zero, and so it messes up averages, mins, and counts.
2020-10-06 18:56:54 -07:00
Jihoon Son 1deed9fbcd
Close aggregators in HashVectorGrouper.close() (#10452)
* Close aggregators in HashVectorGrouper.close()

* reuse grouper

* Add missing dependency
2020-10-06 10:17:33 -07:00
Clint Wylie 207ef310f2
vectorized group by support for nullable numeric columns (#10441)
* vectorized group by support for numeric null columns

* revert unintended change

* adjust

* review stuffs
2020-10-05 21:53:53 -07:00
Clint Wylie 307c1b0720
adjustments to Kafka integration tests to allow running against Azure Event Hubs streams (#10463)
* adjustments to kafka integration tests to allow running against azure event hubs in kafka mode

* oops

* make better

* more better
2020-10-05 08:54:29 -07:00
Chi Cao Minh 1c77491da6
Test UI to trigger auto compaction (#10469)
In the web console E2E tests, Use the new UI to trigger auto compaction
instead of calling the REST API directly so that the UI is covered by
tests.
2020-10-04 00:06:07 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 9ec5c08e2a
fix array types from escaping into wider query engine (#10460)
* fix array types from escaping into wider query engine

* oops

* adjust

* fix lgtm
2020-10-03 15:30:34 -07:00
Vadim Ogievetsky f77c16bc6c
Web console: fix lookup edit dialog version setting (#10461)
* fix lookup edit dialog

* update snapshots

* clean up test
2020-10-03 08:35:20 -07:00
Chi Cao Minh d11537b5f7
Improve UI E2E test usability (#10466)
- Update playwright to latest version
- Provide environment variable to disable/enable headless mode
- Allow running E2E tests against any druid cluster running on standard
  ports (tutorial-batch.spec.ts now uses an absolute instead of relative
  path for the input data)
- Provide environment variable to change target web console port
- Druid setup does not need to download zookeeper
2020-10-03 08:21:44 -07:00
Lasse Krogh Mammen 20ca9aaaf7
Allow using jsonpath predicates with AvroFlattener (#10330) 2020-10-02 10:14:48 +01:00
Chi Cao Minh ede25f1b45
Fix UI datasources view edit action compaction (#10459)
Restore the web console's ability to view a datasource's compaction
configuration via the "action" menu. Refactoring done in
https://github.com/apache/druid/pull/10438 introduced a regression that
always caused the default compaction configuration to be shown via the
"action" menu instead.

Regression test is added in e2e-tests/auto-compaction.spec.ts.
2020-10-01 23:59:21 -07:00
Chi Cao Minh 7385af0272
Web console reindexing E2E test (#10453)
Add an E2E test for the web console workflow of reindexing a Druid
datasource to change the secondary partitioning type.  The new test
changes dynamic to single dim partitions since the autocompaction test
already does dynamic to hashed partitions.

Also, run the web console E2E tests in parallel to reduce CI time and
change naming convention for test datasources to make it easier to map
them to the corresponding test run.

Main changes:

1) web-consolee2e-tests/reindexing.spec.ts
   - new E2E test

2) web-console/e2e-tests/component/load-data/data-connector/reindex.ts
   - new data loader connector for druid input source

3) web-console/e2e-tests/component/load-data/config/partition.ts
   - move partition spec definitions from compaction.ts
   - add new single dim partition spec definition
2020-10-01 15:14:41 -07:00
Abhishek Agarwal e282ab5695
Fix the task id creation in CompactionTask (#10445)
* Fix the task id creation in CompactionTask

* review comments

* Ignore test for range partitioning and segment lock
2020-10-01 15:02:04 -07:00
Abhishek Agarwal d057c5149f
Fix the offset setting in GoogleStorage#get (#10449)
* Fix the offset in get of GCP object

* upgrade compute dependency

* fix version

* review comments

* missed
2020-10-01 08:38:58 -07:00
Vadim Ogievetsky d09fd8b035
Web console: switch to switches instead of checkboxes (#10454)
* switch to switches

* add img alt

* add relative

* change icons

* update snapshot
2020-09-30 15:13:22 -07:00
Clint Wylie 753bce324b
vectorize constant expressions with optimized selectors (#10440) 2020-09-29 13:19:06 -07:00
Gian Merlino 2be1ae128f
RowBasedIndexedTable: Add specialized index types for long keys. (#10430)
* RowBasedIndexedTable: Add specialized index types for long keys.

Two new index types are added:

1) Use an int-array-based index in cases where the difference between
   the min and max values isn't too large, and keys are unique.

2) Use a Long2ObjectOpenHashMap (instead of the prior Java HashMap) in
   all other cases.

In addition:

1) RowBasedIndexBuilder, a new class, is responsible for picking which
   index implementation to use.

2) The IndexedTable.Index interface is extended to support using
   unboxed primitives in the unique-long-keys case, and callers are
   updated to use the new functionality.

Other key types continue to use indexes backed by Java HashMaps.

* Fixup logic.

* Add tests.
2020-09-29 10:46:47 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Vadim Ogievetsky 729bcba7ac
Web console: Display compaction status (#10438)
* init compaction status

* % compacted

* final UI tweaks

* extracted utils, added tests

* add tests to general foramt functions
2020-09-28 22:19:28 -07:00
Gian Merlino 599aacce0f
Remove Expr.visit. (#10437)
* Remove Expr.visit.

It isn't used and doesn't have tests.

* Remove Visitor too.
2020-09-28 22:13:10 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Chi Cao Minh cbe2b44e29
Compaction config UI optional numShards (#10446)
* Compaction config UI optional numShards

Specifying `numShards` for hashed partitions is no longer required after
https://github.com/apache/druid/pull/10419. Update the UI to make
`numShards` an optional field for hash partitions.

* Update snapshot
2020-09-28 17:15:48 -07:00
Chi Cao Minh d16c78ba98
Add intent for web console IntervalInput (#10447)
When using the web console to load data by reindexing from Druid, the
`Datasource` and `Interval` inputs are required during the `Connect`
step. Unlike the `Datasource` input, the `Interval` input did not have a
blue outline to indicate that it was required as the `IntervalInput`
component did not support an `intent` property.
2020-09-28 17:13:07 -07:00
Torsten Hain 5356d8821b
fix typo in docker/druid.sh (#10433)
DRUID_NEWSIZE should not set MaxNewSize.
2020-09-28 14:38:21 -07:00
Clint Wylie b95bf444b2
add docs for kinesis lag metrics (#10435) 2020-09-28 13:13:53 -07:00
Clint Wylie 64df71f25f
more timeout handling in JsonParserIterator (#10426) 2020-09-28 13:12:42 -07:00
Clint Wylie 3d700a5e31
vectorize remaining math expressions (#10429)
* vectorize remaining math expressions

* fixes

* remove cannotVectorize() where no longer true

* disable vectorized groupby for numeric columns with nulls

* fixes
2020-09-26 23:30:14 -07:00
Chi Cao Minh cbd9ac8592
Web console autocompaction E2E test (#10425)
Add an E2E test for the common case web console workflow of setting up
autocompaction that changes the partitions from dynamic to hashed.

Also fix an issue with the async test setup to properly wait for the web
console to be ready.
2020-09-25 18:28:25 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Jonathan Wei cb30b1fe23
Automatically determine numShards for parallel ingestion hash partitioning (#10419)
* Automatically determine numShards for parallel ingestion hash partitioning

* Fix inspection, tests, coverage

* Docs and some PR comments

* Adjust locking

* Use HllSketch instead of HyperLogLogCollector

* Fix tests

* Address some PR comments

* Fix granularity bug

* Small doc fix
2020-09-24 13:47:53 -07:00
Vadim Ogievetsky 89160c2f9b
better query view initial state (#10431) 2020-09-24 09:49:58 -07:00
Clint Wylie dad69481f0
add light weight version of /druid/coordinator/v1/lookups/nodeStatus (#10422)
* add light weight version /druid/coordinator/v1/lookups/nodeStatus

* review stuffs
2020-09-24 14:36:53 +08:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Vadim Ogievetsky a60d034d01
Web console: compaction dialog update (#10417)
* compaction dialog update

* fix test snapshot

* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx

Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>

* Update web-console/src/dialogs/compaction-dialog/compaction-dialog.tsx

Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>

* feedback changes

Co-authored-by: Chi Cao Minh <chi.caominh@imply.io>
2020-09-23 09:34:19 -07:00
Gian Merlino 1af2eace41
Include Sequence-building time in CPU time metric. (#10377)
* Include Sequence-building time in CPU time metric.

Meaningful work can be done while building Sequences, and we should
count this work. On the Broker, this includes subquery processing
work done by the mergeResults call of the GroupByQueryQueryToolChest.

* Add test.
2020-09-23 14:33:55 +08:00