10175 Commits

Author SHA1 Message Date
Gian Merlino c204d68376 Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834)
There is a class of bugs due to the fact that BaseObjectColumnValueSelector
has both "getObject" and "isNull" methods, but in most selector implementations
and most call sites, it is clear that the intent of "isNull" is only to apply
to the primitive getters, not the object getter. This makes sense, because the
purpose of isNull is to enable detection of nulls in otherwise-primitive columns.
Imagine a string column with a numeric selector built on top of it. You would
want it to return isNull = true, so numeric aggregators don't treat it as
all zeroes.

Sometimes this design leads people to accidentally guard non-primitive get
methods with "selector.isNull" checks, which is improper.
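To illustrate the distinction described above, here is a minimal sketch. The SketchSelector interface and the NullGuardSketch helper are hypothetical, simplified stand-ins written for this example; they are not Druid's actual selector interfaces. The sketch assumes a string column with a numeric view on top of it, where isNull reports whether the numeric view has a value.

```java
/** Hypothetical, simplified selector; not Druid's actual interface. */
interface SketchSelector {
    /** Only meaningful for the primitive getters such as getLong(). */
    boolean isNull();

    long getLong();

    /** May itself return null; callers should check the returned object. */
    Object getObject();
}

final class NullGuardSketch {
    /** Builds a selector over a single string value with a numeric view on top. */
    static SketchSelector ofString(final String value) {
        return new SketchSelector() {
            @Override
            public boolean isNull() {
                return parse(value) == null;
            }

            @Override
            public long getLong() {
                Long parsed = parse(value);
                return parsed == null ? 0L : parsed;
            }

            @Override
            public Object getObject() {
                return value;
            }
        };
    }

    private static Long parse(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Improper: drops non-numeric strings, because isNull() describes the numeric view. */
    static String readStringGuarded(SketchSelector selector) {
        return selector.isNull() ? null : (String) selector.getObject();
    }

    /** Proper: call the object getter directly and let the caller handle a null result. */
    static String readString(SketchSelector selector) {
        return (String) selector.getObject();
    }

    /** Proper: isNull() guards a primitive getter, so a non-number is not summed as zero. */
    static long addTo(long accumulator, SketchSelector selector) {
        return selector.isNull() ? accumulator : accumulator + selector.getLong();
    }
}
```

In this sketch the guarded read loses the value "abc" even though the column holds it, which is the class of bug the patch targets.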

This patch has three goals:

1) Fix null-handling bugs that already exist in this class.
2) Make interface and doc changes that reduce the probability of future bugs.
3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast
   aggregators while fixing null-handling bugs. I thought about splitting this
   into its own patch, but it ended up being tough to split from the
   null-handling fixes.

For (1) the fixes are,

- Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject
  calls on isNull, by no longer extending NullableAggregatorFactory. Now uses
  -1 as a sentinel value for null, to differentiate nulls and empty strings.
- Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use
  eval.asBoolean() to avoid calling getLong on the selector after already
  calling getObject.
- Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls
  on isNull. Also, refactored slightly to avoid the overhead of calling
  getObject followed by another getter (see BloomFilterAggregatorFactory for
  part of this).

For (2) the main changes are,

- Remove the "isNull" method from BaseObjectColumnValueSelector.
- Clarify "isNull" doc on BaseNullableColumnValueSelector.
- Rename NullableAggregatorFactory -> NullableNumericAggregatorFactory to emphasize
  that it only works on aggregators that take numbers as input.
- Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner.
- Similar naming changes to helper methods for groupBy, ValueMatchers, etc.

For (3) the other fixes for StringFirst and StringLastAggregatorFactory are,

- Fixed buffer overrun in the buffer aggregators when some characters in the string
  encode into more than one byte (the old code used "substring" to apply a byte limit,
  which is wrong because substring counts chars, not bytes). I did this by introducing
  a new StringUtils.toUtf8WithLimit method.
- Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp.
- Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex
  behavior.
- Refactored to share code between the four aggregators.
- Improved test coverage.
- Made the base stringFirst, stringLast aggregators adaptive, and streamlined the
  xFold versions into aliases. The adaptiveness is similar to how other aggregators
  like hyperUnique work.
2019-11-07 17:46:59 -08:00
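The buffer overrun fixed in c204d68376 comes from limiting a string by chars when the limit is in bytes. The following is a minimal sketch of byte-limited UTF-8 encoding using the JDK's CharsetEncoder. The class and method names (Utf8LimitSketch, encodeWithByteLimit) are made up for this example; it is not the code of StringUtils.toUtf8WithLimit, whose signature is not shown in the commit message.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8LimitSketch {
    /**
     * Encodes as much of the string as fits into maxBytes of UTF-8.
     * The encoder stops before a character that would not fit entirely,
     * so the result never ends in a partial multi-byte sequence.
     */
    static byte[] encodeWithByteLimit(String s, int maxBytes) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // Returns OVERFLOW when the output is full; the bytes written so far are kept.
        encoder.encode(CharBuffer.wrap(s), out, true);
        out.flip();
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // the second character takes two bytes in UTF-8
        // substring(0, 2) keeps two chars, which may be more than two bytes.
        int naive = s.substring(0, 2).getBytes(StandardCharsets.UTF_8).length;
        int limited = encodeWithByteLimit(s, 2).length;
        System.out.println("substring bytes: " + naive + ", limited bytes: " + limited);
    }
}
```

The design choice here is to let the encoder enforce the limit against the output buffer, so the byte count can never exceed the space reserved for the string.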
Evan Ren b03aa060bd Web console: Interval input component (#8777)
* Created temporary interval input component

* Make reusable interval component

* Fixed errors with typing invalid dates

* Fix interval input styling and place into autoform

* Fix styling of popover calendar that opens off the page

* Add snapshot test and change interval to required props

* Add functionality to enter hours minutes second

* Fix min date limit

* Remove console log

* Fix difference in timezone

* Update snapshot test

* Fixed snapshot test without changing min max date

* Made changes based on discussion before converting to hooks

* Rewrote using hooks and deleted duplicate states

* Remove unused states

* Change sql query view numbers to monospace

* Made changes based on discussion

* Removed duplicate state
2019-11-07 13:07:17 -08:00
Clint Wylie 7aafcf8bca parallel broker merges on fork join pool (#8578)
* sketch of broker parallel merges done in small batches on fork join pool

* fix non-terminating sequences, auto compute parallelism

* adjust benches

* adjust benchmarks

* now hella more faster, fixed dumb

* fix

* remove comments

* log.info for debug

* javadoc

* safer block for sequence to yielder conversion

* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool

* smooth yield rate adjustment, more logs to help tune

* cleanup, less logs

* error handling, bug fixes, on by default, more parallel, more tests

* remove unused var

* comments

* timeboundary mergeFn

* simplify, more javadoc

* formatting

* pushdown config

* use nanos consistently, move logs back to debug level, bit more javadoc

* static terminal result batch

* javadoc for nullability of createMergeFn

* cleanup

* oops

* fix race, add docs

* spelling, remove todo, add unhandled exception log

* cleanup, revert unintended change

* another unintended change

* review stuff

* add ParallelMergeCombiningSequenceBenchmark, fixes

* hyper-threading is the enemy

* fix initial start delay, lol

* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2

* fix those important style issues with the benchmarks code

* lazy sequence creation for benchmarks

* more benchmark comments

* stable sequence generation time

* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs

* add jmh thread based benchmarks, cleanup some stuff

* oops

* style

* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose

* retool benchmark to allow modeling more typical heterogeneous heavy workloads

* spelling

* fix

* refactor benchmarks

* formatting

* docs

* add maxThreadStartDelay parameter to threaded benchmark

* why does catch need to be on its own line but else doesnt
2019-11-07 11:58:46 -08:00
Zhenxiao Luo a9aa416c3d In DirectDruidClient, don't run Future cancellation listener in… (#8700)
* In DirectDruidClient, don't run Future cancellation listener in HTTP library executor

* extract cancelQuery as a method of DirectDruidClient

* Fix testCancel

* Add exception as the first argument to log.error
2019-11-07 21:12:18 +03:00
Zhenxiao Luo fca23d0c32 use copy-on-write list in InMemoryAppender (#8808)
* use copy-on-write synchronized list in InMemoryAppender

* use copy-on-write list in InMemoryAppender

* Fix comment
2019-11-07 21:11:40 +03:00
Atul Mohan 517c14632e Upgrade joda-time to 2.10.5 (#8821)
* Upgrade joda

* Update license
2019-11-06 14:30:22 -08:00
Jad Naous ce3c0dae4d Add note on JDBC libs for lookups (#8825)
* Add note on JDBC libs for lookups

* Fix directory and additional "the"
2019-11-06 13:31:26 -08:00
Himanshu 5adc8212b4 add documentation for druid docker and k8s operator (#8802)
* add documentation for druid docker and k8s operator

* address review comment and add Kubernetes to spelling file
2019-11-06 12:56:21 -08:00
Roman Leventov 5c0fc0a13a Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564)
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() doesn't fetch abutting intervals; simplify getUsedSegmentsForIntervals()

* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segments or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmentListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever

* Fix tests

* More fixes

* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code

* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase

* More test fixes

* More test fixes

* Add a comment to VersionedIntervalTimelineTestBase

* Fix tests

* Set DataSegment.size(0) in more tests

* Specify DataSegment.size(0) in more places in tests

* Fix more tests

* Fix DruidSchemaTest

* Set DataSegment's size in more tests and benchmarks

* Fix HdfsDataSegmentPusherTest

* Doc changes addressing comments

* Extended doc for visibility

* Typo

* Typo 2

* Address comment
2019-11-06 11:07:04 -08:00
Fangjin Yang 7b77cf142f Update README.md (#8829)
small edits to the druid readme
2019-11-06 08:59:00 -08:00
Vadim Ogievetsky 7addfc27da Web console: fine grained capabilities / graceful degradation (#8805)
* fine grained capabilities

* fix tests

* configure all cards

* better detection

* update tests

* rename server to service

* node -> service

* remove console log

* better loader in data loader
2019-11-05 23:39:14 -08:00
Vadim Ogievetsky 6f7fbeb63a Fix logo overflow (#8817) 2019-11-05 21:52:38 -08:00
Vadim Ogievetsky c2889ca4f4 show hollow circle when unavailable (#8819) 2019-11-05 21:50:10 -08:00
Fokko Driesprong 3b602da8f7 Bump Apache Thrift to 0.10.0 (#8419)
* Bump Apache Thrift to 0.10.0

* Remove unused dependency

* Bump maven-scrooge-plugin to the latest
2019-11-05 15:38:50 -08:00
Jihoon Son 511fa74fa2 Move maxFetchRetry to FetchConfig; rename OpenObject (#8776) 2019-11-04 08:26:33 -08:00
Clint Wylie 49bd16766f serve web-console even if router management proxy is not enabled (#8797) 2019-10-31 21:15:40 -07:00
Vadim Ogievetsky 16aaf7227e Web console: work in IE11 (#8804)
* fix IE11

* also support flexbox
2019-10-31 21:03:05 -07:00
Vadim Ogievetsky f6028de7a8 Web console: use SQL for the supervisor view (#8796)
* use SQL for supervisor view

* home view sql also

* no proxy mode

* fix alert

* improve message
2019-10-31 20:59:36 -07:00
Tijo Thomas 27acdbd2b8 'hadoop fs' command is deprecated. The new approach is to use the hdfs command. Replacing 'hadoop fs' command with 'hdfs dfs' (#8762) 2019-11-01 04:42:10 +05:30
Giuseppe Martino 9c171e2b1f Message rejection absolute date (#8656)
* Add option lateMessageRejectionStartDate

* Use option lateMessageRejectionStartDate

* Fix tests

* Add lateMessageRejectionStartDate to kafka indexing service

* Update tests kafka indexing service

* Fix tests for KafkaSupervisorTest

* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig

* Fix var name

* Update documentation

* Add a check on lateMessageRejectionStartDateTime and lateMessageRejectionPeriod that fails if both are specified.
2019-10-31 15:13:02 -07:00
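The last item of 9c171e2b1f describes a validation in which the absolute start date and the relative period cannot both be set. A minimal sketch of such a check follows. The class LateMessageRejectionSketch, its constructor, and the minimumMessageTime method are hypothetical and use java.time for self-containment; the actual supervisor IO config classes are not reproduced here, and the "reference time minus period" reading of the period option is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical config holder; only the two option names come from the commit. */
final class LateMessageRejectionSketch {
    private final Instant lateMessageRejectionStartDateTime; // nullable
    private final Duration lateMessageRejectionPeriod;       // nullable

    LateMessageRejectionSketch(Instant lateMessageRejectionStartDateTime,
                               Duration lateMessageRejectionPeriod) {
        // The two options express the same cutoff in different ways, so reject configs with both.
        if (lateMessageRejectionStartDateTime != null && lateMessageRejectionPeriod != null) {
            throw new IllegalArgumentException(
                "lateMessageRejectionStartDateTime and lateMessageRejectionPeriod "
                + "cannot both be specified");
        }
        this.lateMessageRejectionStartDateTime = lateMessageRejectionStartDateTime;
        this.lateMessageRejectionPeriod = lateMessageRejectionPeriod;
    }

    /**
     * Earliest accepted message timestamp relative to a reference time,
     * or empty when neither option is set (assumed semantics).
     */
    Optional<Instant> minimumMessageTime(Instant reference) {
        if (lateMessageRejectionStartDateTime != null) {
            return Optional.of(lateMessageRejectionStartDateTime);
        }
        if (lateMessageRejectionPeriod != null) {
            return Optional.of(reference.minus(lateMessageRejectionPeriod));
        }
        return Optional.empty();
    }
}
```

Failing at construction time means a conflicting spec is rejected when it is submitted rather than when the first message is read.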
Gian Merlino e70b71c90f Fix verify script. (#8798)
Accidentally missed some quote escaping in #8794.
2019-10-30 23:30:01 -07:00
Gian Merlino edb3b00d26 Startup scripts: verify Java 8 (exactly), improve port/java verification messages. (#8794)
* Startup scripts: verify Java 8 (exactly), improve port/java verification messages.

Java 11 compatibility isn't fully baked yet (users have reported various
issues on Java 11), so block startup with an error message unless Java 8
is found. Allow overriding this decision with an environment variable.

* Message adjustments.
2019-10-30 22:37:05 -07:00
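The decision described in edb3b00d26 (require exactly Java 8 unless an environment variable overrides it) can be sketched as below. This is an illustration in Java, not the startup scripts themselves. The class name and the environment variable name SKIP_JAVA_CHECK are placeholders chosen for this example. The one fact relied on is that the JDK reports "1.8" in the java.specification.version system property on Java 8.

```java
final class JavaVersionCheckSketch {
    /** Placeholder name; the real override variable is not named in the commit message. */
    static final String SKIP_ENV_VAR = "SKIP_JAVA_CHECK";

    /**
     * Returns true when startup may proceed: either the override is set
     * to a non-empty value, or the specification version is exactly 1.8.
     */
    static boolean mayStart(String specVersion, String skipValue) {
        if (skipValue != null && !skipValue.isEmpty()) {
            return true;
        }
        return "1.8".equals(specVersion);
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        String skip = System.getenv(SKIP_ENV_VAR);
        if (mayStart(version, skip)) {
            System.out.println("Java check passed for version " + version);
        } else {
            System.out.println("Java 8 is required, found " + version
                + "; set " + SKIP_ENV_VAR + " to override");
        }
    }
}
```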
yuanli bca649e492 Case sensitive comparison of nonbinary string in MySQL metadata storage (#8758) 2019-10-30 20:48:08 -07:00
Vadim Ogievetsky ed6be81d12 Web console: fix error when querying with grand totals (#8795)
* fix error when querying with grand totals

* also support object

* improve tests
2019-10-30 19:37:53 -07:00
Clint Wylie 3ff5e02237 remove select query (#8739)
* remove select query

* thanks teamcity

* oops

* oops

* add back a SelectQuery class that throws RuntimeExceptions linking to docs

* adjust text

* update docs per review

* deprecated
2019-10-30 19:29:56 -07:00
Jihoon Son 2363b61983 Asynchronous file copy in the shuffle of parallel indexing task (#8783) 2019-10-30 18:00:05 -07:00
Gian Merlino 7605c23354 Remove Tranquility configs and certain doc references. (#8793)
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
2019-10-30 16:30:16 -07:00
Gian Merlino c922d2c3c9 Use bundled ZooKeeper in tutorials. (#8792) 2019-10-30 16:17:28 -07:00
Vadim Ogievetsky 929a8b6337 Web console: Support all possible metric types in the data loader (#8785)
* Support all possible metric types in the data loader

* added more sketches
2019-10-30 09:34:13 -07:00
Vadim Ogievetsky a95e3d438e Web console: Data loader user feedback changes (#8770)
* init fixes

* cleaning styling issues

* more conversion types
2019-10-29 08:42:51 -07:00
Roman Leventov 3e9723e3ce Add an item to concurrency checklist about assertions in parall… (#8701)
Add an item to concurrency checklist about assertions in parallel threads and async code in tests
2019-10-29 11:38:04 +03:00
karthikbhat13 b8ceee4eee Removed 'if' condition. (#8768) 2019-10-28 13:40:03 -07:00
Gian Merlino aa81253cf4 Fix typos. (#8767) 2019-10-28 12:47:01 -07:00
Vadim Ogievetsky 11230dff52 Support HDFS firehose (#8752) 2019-10-28 08:22:20 -07:00
Vadim Ogievetsky ec8ce74f1c Web console: Better data loader flow (#8763)
* filter table

* go over the entire data loader flow
2019-10-28 08:08:46 -07:00
Gian Merlino b65d2ac648 Add HDFS firehose (#8754)
* Add HDFS firehose.

* Tests, support for lists of paths.

* Fixups.

* Update list of firehoses.

* Wildcards is a word.
2019-10-28 08:07:38 -07:00
Vadim Ogievetsky f9b94a5db1 Docs: remove self link (#8760)
This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.
2019-10-27 22:33:22 -07:00
Vadim Ogievetsky 1b9d4ce811 Web console: Memoize all the functional components and improve transform step highlighting (#8757)
* Fix transform table

* memoize all components

* use named functions
2019-10-26 17:58:26 -07:00
Chi Cao Minh ad615438f6 Fix Hadoop tutorial Dockerfile (#8753)
When following the instructions from Hadoop batch load tutorial in the
docs, building the Docker images fails with:

  gzip: stdin: unexpected end of file
  tar: Child returned status 1
  tar: Error is not recoverable: exiting now

Updating nss allows the curl command for downloading Hadoop to succeed
while building the Docker image.
2019-10-25 17:58:40 -07:00
Vadim Ogievetsky 80be9462a6 Web console: Update versions and dependencies (#8751)
* update console version and deps

* bump script version
2019-10-25 17:53:06 -07:00
Vadim Ogievetsky 9eb68b1d53 Add version to canonical URL (#8747) 2019-10-25 12:46:39 -07:00
Vadim Ogievetsky 27127345b7 Add delimiter option for TSV parser (#8741) 2019-10-25 09:17:59 -07:00
Xiaobao b9d10473a5 fix typo (#8745) 2019-10-25 19:21:56 +08:00
Clint Wylie 09f92818d4 update druid expression docs to indicate that array functions do not work at indexing time (#8734)
* update druid expression docs to indicate that array functions are not supported in transformSpec

* fix unrelated spelling check
2019-10-24 22:04:08 -07:00
Eyal Yurman 14e33428f0 Moving Average extension: Add Sum averagers (#8511)
* Add sum averagers.

* avoid casting double to long.
2019-10-24 16:37:24 -07:00
Vadim Ogievetsky 774ce3ce6d add canonical to the docs (#8731) 2019-10-24 15:25:57 -07:00
Vadim Ogievetsky efd669757e fix save button (#8732) 2019-10-24 15:25:28 -07:00
Vadim Ogievetsky cc3650ee3b fix doc headers (#8729) 2019-10-24 11:17:39 -07:00
Evan Ren fdbc4ae147 Web console: Button to pretty print Druid JSON query (#8724)
* Add button and functionality to pretty format Rune JSON

* Removed console log

* Fix lgtm error about updating state

* Update test snapshot
2019-10-23 20:24:47 -07:00
Vadim Ogievetsky 137c2a6025 Web console: disable data loader Submit button when submitting so as not to submit multiple times (#8725)
* disable Submit button when submitting so as not to submit twice

* also check in fn
2019-10-23 18:42:44 -07:00