8503 Commits

Author SHA1 Message Date
Caroline1000
c73e3ea4f5 Provide examples to havingSpec filters (#5774)
* expand examples

* expand examples for filtered havingSpecs

* expand other having examples

* remove blank code block

* add better AND/OR/NOT examples

* fix indentation
2018-05-14 13:43:42 -07:00
Alexander Saydakov
15864434be ArrayOfDoublesSketch module (#5148)
* ArrayOfDoublesSketch module

* UTF-8 fix

* javadoc, style fixes

* more style fixes

* null key selector fix

* more style fixes

* removed @Override, strict compiler doesn't like it

* removed @Override, strict compiler doesn't like it

* IndexedInts is not autoclosable? removed one more @0verride

* synchronized with upstream master

* removed unused imports

* addressed review points

* null fix

* addressed review points

* IAE from druid package

* synchronized aggregate() and get()

* use locks per buffer position

* corrected javadoc

* style fixes

* added lock and narrowed the scope

* addressed review comments

* conflict resolution went wrong

* addressed review comments

* javadoc

* javadoc links

* fully qualified name since there is no import for this class

* addressed review points

* style fix

* StandardCharsets.UTF_8

* addressed review points

* added @Override

* added equals and hashCode tests for post aggs

* formatting

* suppress warnings

* optimal IndexedInts iteration

* suppress SelfEquals

* added comments about getClass() in equals()
2018-05-13 15:48:00 +03:00
Jonathan Wei
7a1faa332f Fix KerberosAuthenticator serverPrincipal host replacement (#5766) 2018-05-10 11:04:49 +05:30
Dylan Wylie
e8caf02147 Revert "Use a bimap for reverse lookups on injective maps" (#5764)
* Revert "Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor (#5704)"

This reverts commit c7a59394e0f4d9b870eda3f2b11ddd5d1acb4d0d.

* Revert "Fix metrics for inserting segments (#5749)"

This reverts commit c9d645103b3e712b678cdca9695800232a025ab0.

* Revert "Typo fix in historical doc (#5753)"

This reverts commit aa23fe638632d15191e2bdacb193cd7a45fca507.

* Revert "Use a bimap for reverse lookups on injective maps (#5681)"

This reverts commit e1277d306c7d3a4afd40b0d6caf97858143ee197.
2018-05-09 19:12:36 -07:00
Samarth Jain
13ba840653 Remove setParitionerClass call from SortableBytes (#5677)
* Remove setParitionerClass call from SortableBytes since callers override the paritioner class themselves

* Get rid of SortableBytesPartitioner class

* Make partitioner class a parameter
2018-05-09 18:59:42 -07:00
Surekha
2f8904e25f Check against the real default of maxBytes(1/6 max mem) in AppenderatorImpl's add (#5758)
* The check for maxBytesInMemory should be >= 0 instead of > 0

* if the default value is 0, the actual check could be skipped
* fix the message for persistReasons

* Address PR comments

* if maxBytes set -1, make is Long.MAX_VAL, so we do not need to check if it's 0 or -1
* set the maxBytesTuningconfig in AppenderatorImpl constructor to avoid duplicate code

* fix the failing test cases

* Address PR comments
2018-05-09 13:41:51 -07:00
Jihoon Son
c7a59394e0 Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor (#5704)
* Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor

* fix build

* fix logging
2018-05-08 19:03:54 -07:00
Jihoon Son
c9d645103b Fix metrics for inserting segments (#5749)
* Fix metrics for inserting segments

* Add a comment
2018-05-08 13:07:39 -07:00
Abhishek Kaushik
aa23fe6386 Typo fix in historical doc (#5753) 2018-05-08 11:08:27 -07:00
Dylan Wylie
e1277d306c Use a bimap for reverse lookups on injective maps (#5681)
* Use a bimap for reverse lookups on injective maps

- A BiMap provides constant-time lookups for mapping values to keys

* Address comments

* Fix Tests
2018-05-07 18:46:21 -07:00
Kirill Kozlov
67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
Fokko Driesprong
a95ec92296 Move to the org.lz4 dependency (#5746)
The net.jpountz.lz4 moved to org.lz4
2018-05-07 08:16:45 -07:00
Slim Bouguerra
8aa8d9fa5b
Kerberos Spnego Authentication Router Issue (#5706)
* Adding decoration method to proxy servlet

Change-Id: I872f9282fb60bfa20524271535980a36a87b9621

* moving the proxy request decoration to authenticators

Change-Id: I7f94b9ff5ecf08e8abf7169b58bc410f33148448

* added docs

Change-Id: I901543e52f0faf4666bfea6256a7c05593b1ae70

* use the authentication result to decorate request

Change-Id: I052650de9cd02b4faefdbcdaf2332dd3b2966af5

* adding authenticated by name

Change-Id: I074d2933460165feeddb19352eac9bd0f96f42ca

* ensure that authenticator is not null

Change-Id: Idb58e308f90db88224a06f3759114872165b24f5

* fix types and minor bug

Change-Id: I6801d49a05d5d8324406fc0280286954eb66db10

* fix typo

Change-Id: I390b12af74f44d760d0812a519125fbf0df4e97b

* use actual type names

Change-Id: I62c3ee763363781e52809ec912aafd50b8486b8e

* set authenitcatedBy to null for AutheticationResults created by
Escalator.

Change-Id: I4a675c372f59ebd8a8d19c61b85a1e4bf227a8ba
2018-05-05 20:33:51 -07:00
kaijianding
c12c16385e support throw duplcate row during realtime ingestion in RealtimePlumber (#5693) 2018-05-04 10:12:25 -07:00
Dylan Wylie
2c5f0038fd Make lookup offheap buffer configurable (#5696)
* Make lookup offheap buffer configurable

Fixes #3663

* Address comments

* Update docs

* Update docs
2018-05-04 10:00:55 -07:00
Stuart McLean
c2b5e5ec95 Default caffeine cache size (#5738)
* add default caffeine cache size based on runtime Xmx or max 1GB

* update docs for caffeine cache

* fix formatting

* test caffeine size should never be less than 0

* set caffeine max default size to 1G not 1M

* fix caffeine cache tests
2018-05-04 09:29:11 -07:00
Surekha
13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
Gian Merlino
739e347320 Allow Hadoop dataSource inputSpec to be specified multiple times. (#5717)
* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
2018-05-03 13:51:57 -07:00
Gian Merlino
df01998213 SegmentLoadDropHandler: Fix deadlock when segments have errors loading on startup. (#5735)
The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
2018-05-03 09:59:01 -07:00
Stuart McLean
d2b8d880ea include hybrid and caffeine in cache docs and show caffeine as default (#5737) 2018-05-03 09:52:05 -07:00
Jihoon Son
2c8296f94d Fix Appenderator.push() to commit the metadata of all segments (#5730)
* Remove persist from Appenderator

* fix javadoc
2018-05-02 13:17:54 -07:00
Jihoon Son
d4311b4a5a Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702)
* Support enablePathStyleAccess and disableChunkedEncoding for aws client

* add an option for forceGlobalBucketAccessEnabled

* add missing doc
2018-05-02 10:45:38 -07:00
Jakub Kukul
e2431ae161 Update defaultHadoopCoordinates in documentation. (#5720)
* Update defaultHadoopCoordinates in documentation.

To match changes applied in #5382.

* Remove a parameter with defaults from example configuration file.

If it has reasonable defaults, then why would it be in an example config file?

Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future.

Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes?

* Fix typo in documentation.
2018-04-30 20:49:14 -07:00
Nishant Bangarwa
a0c2ae7a38 Fix NullPointerException when in DeterminePartitionsJob for Hadoop 3.0 and later versions (#5724)
In DeterminePartitonsJob -
config.get("mapred.job.tracker").equals("local") throws NPE as the
property name is changed in hadoop 3.0 to mapreduce.jobtracker.address

This patch extracts the logic to fetch jobTrackerAddress in JobHelper
and reuses it when needed.
2018-04-30 20:41:58 -07:00
Dylan Wylie
754c80e74a Fix quickstart docs to specify that Java 8 is required. (#5722)
See #4907 #5719
2018-04-30 13:25:59 -07:00
Gian Merlino
0f8493846e Replace dev list references in docs. (#5723) 2018-04-30 11:25:45 -07:00
David Lim
8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Gian Merlino
762f8829e4
Add task action metrics, add taskId metric dimension. (#5714)
* Add task action metrics, add taskId metric dimension.

Adds two new metrics: task/action/log/time and task/action/run/time. Also
adds taskId as a dimension, to give us the ability to drill down into metrics
for an individual task. Also standardizes metrics-attachment using two helper
methods in IndexTaskUtils.

* Fix typo
2018-04-29 21:24:06 -07:00
Joseph Glanville
90cd05696e Document processing properties required for Middlemanager (#5660) 2018-04-29 17:20:17 -07:00
Jihoon Son
23dc0d5b24 Better logging for taskLockBox (#5703) 2018-04-28 21:08:10 -07:00
Jonathan Wei
513fab77d9 Lazy init of fullyQualifiedStorageDirectory in HDFS pusher (#5684)
* Lazy init of fullyQualifiedStorageDirectory in HDFS pusher

* Comment

* Fix test

* PR comments
2018-04-28 21:07:39 -07:00
kaijianding
4db9e39a71 fix NPE when buffersList contains null in SmooshedFileMapper (#5689)
* fix NPE when buffersList contains null

* address the comment
2018-04-27 18:15:04 -07:00
Jihoon Son
86746f82d8 Use mergeBuffer instead of processingBuffer in parallelCombiner (#5634)
* Use mergeBuffer instead of processingBuffer in parallelCombiner

* Fix test

* address comments

* fix test

* Fix test

* Update comment

* address comments

* fix build

* Fix test failure
2018-04-27 18:14:37 -07:00
Roman Leventov
9be000758d Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335)
* Refactor index merging, replace Rowboats with RowIterators and RowPointers

* Add javadocs

* Fix a bug in QueryableIndexIndexableAdapter

* Fixes

* Remove unused declarations

* Remove unused GenericColumn.isNull() method

* Fix test

* Address comments

* Rearrange some code in MergingRowIterator for more clarity

* Self-review

* Fix style

* Improve docs

* Fix docs

* Rename IndexMergerV9.writeDimValueAndSetupDimConversion to setUpDimConversion()

* Update Javadocs

* Minor fixes

* Doc fixes, more code comments, cleanup of RowCombiningTimeAndDimsIterator

* Fix doc link
2018-04-27 17:34:32 -07:00
Erik Dubbelboer
71ecb71dcf Add DataSegmentFinder for Google Storage adapter (#5686)
Partially fixes #5628
2018-04-26 14:00:49 -07:00
Gian Merlino
f81855d607
Add unauthorized errorCode to query docs. (#5691) 2018-04-26 13:06:25 -07:00
Gian Merlino
dc786ebc4c SQL: Remove some unused code. (#5690) 2018-04-24 11:42:16 -07:00
Jihoon Son
034a0aa42b Fix wrong null check in TaskStatusPlus (#5678) 2018-04-24 10:59:29 -07:00
Charles Allen
2e76012aca Allow GCS data segment killer to delete if present (#5675) 2018-04-24 07:16:54 -07:00
Caroline1000
fd76af9737 remove old prod cluster config link (#5676) 2018-04-23 18:00:24 -07:00
Slim Bouguerra
73da7426da
Timeseries results are incoherent for case interval is out of range and case false filter. (#5649)
* adding some tests

Change-Id: I92180498e2e6695212b286d980e349c136c78c86

* added empty sequence runner

Change-Id: I20c83095072bbf3b4a3a57dfc1934d528e2c7a1a

* treat only granularity ALL

Change-Id: I1d88fab500c615bc46db4f4497ce93089976441f

* moving toList within If and add expected queries

Change-Id: I56cdd980e44f0685806efb45e29031fa2e328ec4

* typo

Change-Id: I42fdd28da5471f6ae57d3962f671741b106300cd

* adding tests and fix logic of intervals

Change-Id: I0bd414d2278e3eddc2810e4f5080e6cf6a117f12

* fix style

Change-Id: I99a2380934c9ab350ca934c56041dc343c08b99f

* comments review

Change-Id: I726a3b905a9520d8b1db70e4ba17853c65c414a4
2018-04-23 15:55:18 -07:00
David Lim
55b003e5e8 Fix loadstatus?full double counting expected segments (#5667)
* fix loadstatus?full double counting expected segments

* remove possible flakiness from Thread.sleep() in test
2018-04-24 01:11:16 +05:30
Roman Leventov
a3a9ada843 Add GenericWhitespace checkstyle check (#5668) 2018-04-24 01:09:14 +05:30
Jihoon Son
ca3f833426 Fix coordinator's dataSource api with full parameter (#5662)
* Fix coordinator's dataSource api with full parameter

* address comment

* Add a constructor for json serde and fix result order

* Change to immutableSortedMap

* Revert immutableSortedMap to treeMap
2018-04-19 17:41:53 -07:00
scrawfor
15f4ab2b31 Expose noop filter to users (#5597) 2018-04-18 07:57:07 -07:00
Gian Merlino
5d09f76df6 topN: Fix caching of Float dimension values. (#5653)
Jackson would deserialize them as Doubles, leading to ClassCastExceptions
in the topN processing pipeline when it attempted to treat them as Floats.
2018-04-17 15:35:18 -07:00
Kirill Kozlov
a7ba2bf275 Detailed error message when unable to create temp dir (#5648) 2018-04-17 15:12:46 -07:00
Charles Allen
8e441cd142 Fix cache bug in stats module (#5650) 2018-04-17 15:11:03 -07:00
Gian Merlino
fbf3fc178e Timeseries: Add "grandTotal" option. (#5640)
* Timeseries: Add "grandTotal" option.

* Modify whitespace.

* Checkstyle workaround.
2018-04-16 18:22:19 -07:00
Jonathan Wei
d0b66a6af5 Fix HTTP OPTIONS request auth handling (#5638)
* Fix HTTP OPTIONS request auth handling

* PR comment

* More PR comments

* Fix

* PR comment
2018-04-16 18:09:56 -07:00