Commit Graph

4250 Commits

Author SHA1 Message Date
Maytas Monsereenusorn 4576152e4a
Make dropExisting flag for Compaction configurable and add warning documentations (#11070)
* Make dropExisting flag for Compaction configurable

* fix checkstyle

* fix checkstyle

* fix test

* add tests

* fix spelling

* fix docs

* add IT

* fix test

* fix doc

* fix doc
2021-04-09 00:12:28 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compcation TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erreneous newline
2021-04-08 21:03:00 -07:00
Xavier Léauté 15bdd6bc2f
Fix unit tests and GC settings for Java 15 (#11074)
* JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it
* Fix flaky HTTP client tests with Java 15
* Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15
2021-04-08 10:33:37 -07:00
Abhishek Agarwal 0df0bff44b
Enable multiple distinct aggregators in same query (#11014)
* Enable multiple distinct count

* Add more tests

* fix sql test

* docs fix

* Address nits
2021-04-07 00:52:19 -07:00
Jihoon Son cc12a57034
Enforce allow list for JDBC properties by default (#11063)
* Enforce allow list for JDBC properties by default

* fix tests
2021-04-06 19:46:19 -07:00
zachjsh 8cf1e83543
Add paramter to loadstatus API to compute underdeplication against cluster view (#11056)
* Add paramter to loadstatus API to compute underdeplication against cluster view

This change adds a query parameter `computeUsingClusterView` to loadstatus apis
that if specified have the coordinator compute undereplication for segments based
on the number of services available within cluster that the segment can be replicated
on, instead of the configured replication count configured in load rule. A default
load rule is created in all clusters that specified that all segments should be
replicated 2 times. As replicas are forced to be on separate nodes in the cluster,
this causes the loadstatus api to report that there are under-replicated segments
when there is only 1 data server in the cluster. In this case, calling loadstatus
api without this new query parameter will always result in a response indicating
under-replication of segments

* * fix exception mapper

* * Address review comments

* * update external API docs

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* * update more external docs

* * update javadoc

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-04-05 00:02:43 -04:00
Jihoon Son cfcebc40f6
Allow list for JDBC connection properties to address CVE-2021-26919 (#11047)
* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11
2021-04-01 17:30:47 -07:00
Maytas Monsereenusorn d7f5293364
Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (#11025)
* Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer.

* Add option to drop segments after ingestion

* fix checkstyle

* add tests

* add tests

* add tests

* fix test

* add tests

* fix checkstyle

* fix checkstyle

* add docs

* fix docs

* address comments

* address comments

* fix spelling
2021-04-01 12:29:36 -07:00
Parag Jain b35486fa81
request logs through kafka emitter (#11036)
* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null
2021-04-01 11:31:32 +05:30
Lasse Krogh Mammen 782a1d4e6c
Add Calcite Avatica protobuf handler (#10543) 2021-03-31 12:46:25 -07:00
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Maytas Monsereenusorn c87ac0823f
Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap (#11019)
* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap

* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap

* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap

* Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap

* address comments
2021-03-24 11:37:29 -07:00
Jihoon Son a041933017
Allow overlapping intervals for the compaction task (#10912)
* Allow overlapping intervals for the compaction task

* unused import

* line indentation

Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
2021-03-23 11:21:54 -07:00
Maytas Monsereenusorn 51d2c61f1c
Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity (#11009)
* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity

* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity

* Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity

* address comments

* address comments

* address comments

* address comments

* address comments
2021-03-19 17:38:28 -07:00
Samarth Jain 5fae7dfcf2
Fix regression introduced by #11008 (#11013)
* Fix regression introduced by #11008

* Add back and tweak the check to not inspect resources for authorization when AllowAllAuthorizer is configured.
Add a unit test to validate that the change doesn't introduce new behavior.
2021-03-19 17:15:03 -07:00
Maytas Monsereenusorn f19c2e9ce4
If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948)
* add IT

* add IT

* add the fix

* fix checkstyle

* fix compile

* fix compile

* fix test

* fix test

* address comments
2021-03-18 17:04:28 -07:00
Samarth Jain 83fcab1d0f
Improve performance of queries against SYSTEM.SEGMENT table. (#11008)
Size HashMap and HashSet appropriately. Perf analysis of the queries
revealed that over 25% of the query time was spent in resizing HashMap and HashSet
collections. Also, prevent the need to examine and authorize all resources when
AllowAllAuthorizer is the configured authorizer.
2021-03-17 22:24:02 -07:00
Atul Mohan 3d7e7c2c83
Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213)
* Skip queue removal on timeout

* Clarify error

* Add new config to control replication

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2021-03-17 11:34:05 -07:00
Xavier Léauté 1061faa6ba
prefer string concatenation over String.format in performance sensitive code (#10997)
String.format relies on regex parsing, which makes these calls expensive
at higher request volumes.
2021-03-16 22:06:26 -07:00
Maytas Monsereenusorn f37713dc6d
Fix auto compaction with mixed versions in the same time chunk based on new segment granularity (#11000) 2021-03-16 12:48:19 -07:00
zhangyue19921010 3277479ff7
[Minor]Add metadata-related logs and missing UT for kill tasks. (#10956)
* logs more info when delete segments && add deleteSegments-related UT

* revert msic.xml

* code review

* use log.debugSegments

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-11 18:00:52 -08:00
Xavier Léauté d26e1bc70d
update code check plugins for Java 15 support (#10978)
* update maven-forbidden-api plugin to 3.1
* update maven-pmd-plugin to 3.14
* update spotbugs to 4.2.2
* fixes validation failures newly caught by those updates
  - fix SpotBugs NP_NONNULL_PARAM_VIOLATION
  - fix PMD UnnecessaryFullyQualifiedName
2021-03-11 07:31:41 -08:00
Abhishek Agarwal c66951a59e
Add flag in SQL to disable left base filter optimization for joins (#10947)
* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
2021-03-09 13:07:34 -08:00
Abhishek Agarwal 489f5b1a03
Avoid expensive findEntry call in segment metadata query (#10892)
* Avoid expensive findEntry call in segment metadata query

* other places

* Remove findEntry

* Fix add cost

* Refactor a bit

* Add performance test

* Add comment

* Review comments

* intellij
2021-03-08 22:08:33 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
zhangyue19921010 bddacbb1c3
Dynamic auto scale Kafka-Stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test fiel function

* change codes and add docs based on capistrant reviewed

* midify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there &&  Make autoscaling algorithm configurable and scalable.

* fix ci failed

* revert msic.xml

* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NLP

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when create allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-06 14:36:52 +05:30
Abhishek Agarwal 1a15987432
Supporting filters in the left base table for join datasources (#10697)
* where filter left first draft

* Revert changes in calcite test

* Refactor a bit

* Fixing the Tests

* Changes

* Adding tests

* Add tests for correlated queries

* Add comment

* Fix typos
2021-03-04 10:39:21 -08:00
Maytas Monsereenusorn 23333914c7
add javadoc and test (#10938) 2021-03-03 11:34:00 +08:00
Maytas Monsereenusorn b7b0ee8362
Add query granularity to compaction task (#10900)
* add query granularity to compaction task

* fix checkstyle

* fix checkstyle

* fix test

* fix test

* add tests

* fix test

* fix test

* cleanup

* rename class

* fix test

* fix test

* add test

* fix test
2021-03-02 11:23:52 -08:00
Gian Merlino 07902f607b
Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904)
* Granularity: Introduce primitive-typed bucketStart, increment methods.

Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.

* Fix style.

* Amp up the test coverage.
2021-02-25 07:59:20 -08:00
Abhishek Agarwal 3a0a0c033f
Reload segment usage when starting the process (#10884)
* Reload segment usage when starting the process

* doc

* Add more tests

* remove forbidden method

* Add alert
2021-02-22 00:08:44 -08:00
Maytas Monsereenusorn f5bfccc720
Fix maxBytesInMemory for heap overhead of all sinks and hydrants check (#10891)
* fix maxBytesInMemory

* fix maxBytesInMemory check

* fix maxBytesInMemory check

* fix test
2021-02-18 21:48:57 -08:00
Jonathan Wei 8ad68135c8
Filter unauthorized views in InformationSchema (#10874)
* Filter unauthorized views in InformationSchema

* Use fixed name for view schema

* Remove unused string
2021-02-16 17:36:45 -08:00
Maytas Monsereenusorn 6541178c21
Support segmentGranularity for auto-compaction (#10843)
* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* resolve conflict

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* fix tests

* fix more tests

* fix checkstyle

* add unit tests

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add unit tests

* add integration tests

* fix checkstyle

* fix checkstyle

* fix failing tests

* address comments

* address comments

* fix tests

* fix tests

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test
2021-02-12 03:03:20 -08:00
Jihoon Son 1ec3f0bd73
Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535)" (#10871)
This reverts commit 6b14bdb3a5.
2021-02-09 17:51:26 -08:00
Clint Wylie fe30f4b414
refactor sql lifecycle, druid planner, views, and view permissions (#10812)
* before i leaped i should've seen, the view from halfway down

* fixes

* fixes, more test

* rename

* fix style

* further refactoring

* review stuffs

* rename

* more javadoc and comments
2021-02-05 12:56:55 -08:00
Jihoon Son ac41e41232
Update doc for query errors and add unit tests for JsonParserIterator (#10833)
* Update doc for query errors and add unit tests for JsonParserIterator

* static constructor for convenience

* rename method
2021-02-05 02:55:32 -08:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Agustin Gonzalez 0e4750bac2
Granularity interval materialization (#10742)
* Prevent interval materialization for UniformGranularitySpec inside the overlord

* Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval>

* Javadoc update, respect inputIntervals contract

* Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec

* Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity

* Fix Travis style & other checks

* Refactor TreeSet to facilitate re-use in UniformGranularitySpec

* Make sure intervals are unique when there is no segment granularity

* Style/bugspot fixes...

* More travis checks

* Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method

* Style & PR feedback

* Fixed failing test

* Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change)

* Refactor so that we can get the condensed buckets without materializing the intervals

* Get rid of GranularitySpec::condensedInputIntervals ... not needed

* Travis failures fixes

* Travis checkstyle fix

* Edited/added javadoc comments and a method name (code review feedback)

* Fixed jacoco coverage by moving class and adding more coverage

* Avoid materializing the condensed intervals when locking

* Deal with overlapping intervals

* Remove code and use library code instead

* Refactor intervals by granularity using the FluentIterable, add sanity checks

* Change !hasNext() to inputIntervals().isEmpty()

* Remove redundant lambda

* Use materialized intervals here since this is outside the overlord (for performance)

* Name refactor to reflect the fact that bucket intervals are sorted.

* Style fixes

* Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense)

* Remove forbidden api

* Move helper class inside common base class to reduce public space pollution
2021-01-29 06:02:10 -08:00
Abhishek Agarwal 0080e333cc
Fix cardinality estimation (#10762)
* Fix cardinality estimation

* Add unit test

* code coverage

* fix typo
2021-01-28 15:06:10 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
zhangyue19921010 2590ad4f67
Historical unloads damaged segments automatically when lazy on start. (#10688)
* ready to test

* tested on dev cluster

* tested

* code review

* add UTs

* add UTs

* ut passed

* ut passed

* opti imports

* done

* done

* fix checkstyle

* modify uts

* modify logs

* changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment

* merge from master

* modify import orders

* merge from master

* merge from master

* modify logs

* modify docs

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-16 19:53:30 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
kaijianding 4437c6af60
use actual dataInterval in cache key (#10714)
* use actual dataInterval in cache key

* fix ut fail

* fix segmentMaxTime exclusive
2021-01-13 18:31:36 -08:00
Jihoon Son b3325c1601
Add a config for monitorScheduler type (#10732)
* Add a config for monitorScheduler type

* check interrupted

* null check

* do not schedule monitor if the previous one is still running

* checkstyle

* clean up names

* change default back to basic

* fix test
2021-01-13 17:20:43 -08:00
Jihoon Son 149306c9db
Tidy up HTTP status codes for query errors (#10746)
* Tidy up query error codes

* fix tests

* Restore query exception type in JsonParserIterator

* address review comments; add a comment explaining the ugly switch

* fix test
2021-01-13 17:20:00 -08:00
Lucas Capistrant aecc9e5e7e
Remove legacy code from LogUsedSegments duty (#10287)
* allow the LogUsedSegments duty to be skippped

* Fixes for TravisCI coverage checks and documentation spell checking

* prameterize DruidCoordinatorTest in order to achieve coverage

* update config name to remove duty ref and improve documentation

* refine documentation for new config with reviewer advice

* add default column to docs for new config

* remove legacy code in LogUsedSegments and remove config to disbale duty

* fix makeHistoricalMangementDuties now that the returned list is always the same
2021-01-12 14:09:19 -08:00
Jihoon Son 3984457e5b
Add missing unit tests for segment loading in historicals (#10737)
* Add missing unit tests for segment loading in historicals

* unused import
2021-01-11 18:20:13 -06:00
Lucas Capistrant fe0511b16a
Coordinator Dynamic Config changes to ease upgrading with new config value (#10724)
* Coordinator Dynamic Config changes to ease upgrading with new config value

* change a log to debug level following review

* changes based on review feedback

* fix checkstyle
2021-01-10 20:05:39 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
kaijianding 01e25f1e69
reuse DataSegment object when a segment found on another server (#10715) 2021-01-07 21:55:25 -08:00
Himanshu c7b1212a43
AWS RDS token based password provider (#9518)
* refresh db pwd

* aws iam token password provider

* fix analyze-dependencies build

* fix doc build

* add  ut for BasicDataSourceExt

* more doc updates

* more  doc update

* moving aws  token password  provider to new extension

* remove duplicate changes

* make  all config inline

* extension docs

* refresh db  password  in SQL Firehose code path as well

* add ut

* fix build

* add new extension to distribution

* rds lib is not provided

* fix license build

* add version to license

* change parent version to 0.19.0-snapshot

* address review comments

* fix core/ code coverage

* Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* address review comments

* fix spellchecker

* remove inadvertant website file change

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-01-06 21:15:29 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Jonathan Wei 769c21cc87
Add sample method to IndexingServiceClient (#10729)
* Add sample method to IndexingServiceClient

* Add unit test

* Fix LGTM
2021-01-05 15:02:44 -08:00
Clint Wylie 74fbdd322d
refactor NodeRole so extensions can participate in disco and announcement (#10700)
* refactor NodeRole so extensions can participate in disco and announcement

* fixes, maybe

* retries

* javadoc

* fix

* spelling
2020-12-24 15:29:32 -08:00
Lucas Capistrant 58ce2e55d8
Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284)
* dynamic coord config adding more balancing control

add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.

* fix checkstyle failure

* Make doc more detailed for admin to understand when/why to use new config

* refactor PR to use a % of segments instead of raw number

* update the docs

* remove bad doc line

* fix typo in name of new dynamic config

* update RservoirSegmentSampler to gracefully deal with values > 100%

* add handler for <= 0 in ReservoirSegmentSampler

* fixup CoordinatorDynamicConfigTest naming and argument ordering

* fix items in docs after spellcheck flags

* Fix lgtm flag on missing space in string literal

* improve documentation for new config

* Add default value to config docs and add advice in cluster tuning doc

* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog

* update jest snapshot after console change

* fix spell checker errors

* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider

* add new config back to web console module after merge with master

* fix ReservoirSegmentSamplerTest

* fix line breaks in coordinator console dialog

* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove

* Make improvements based off of feedback in review

* additional cleanup coming from review

* Add a warning log if limit on segments to consider for move can't be calcluated

* remove unused import

* fix tests for CoordinatorDynamicConfig

* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
2020-12-22 08:27:55 -08:00
keefe roedersheimer ca3b925133
allow server selection to be aware of query (#10428)
* add query through to server selector

* add nullable extensions, deprecate old methods with defaults

* style changes

* add nullable to ServerSelectorStrategy

* fix test coverage

* missing override in test

* add null check
2020-12-18 13:56:19 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
zhangyue19921010 0ad27c06da
Historical load Segments enhancement (#10650)
* load segments with segment files check

* add more java docs

* done

* add java docs

* revert misc

* resolve ci failures

* resolve ci failures

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-14 13:56:03 -08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
frank chen c410648630
fix injection failure of StorageLocationSelectorStrategy objects (#10363)
* fix to allow customer storage location selector strategy

* add test cases to check instance of selector strategy

* update doc

* code format

* resolve code review comments

* inject StorageLocation

* fix CI

* fix mismatched license item reported by CI

* change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy

* using a helper method to bind to correct property path
2020-12-08 09:48:31 -08:00
Gian Merlino b7641f644c
Two fixes related to encoding of % symbols. (#10645)
* Two fixes related to encoding of % symbols.

1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
   returns already-decoded strings. Applying StringUtils.urlDecode on
   top of that causes erroneous behavior with '%' characters.

2) Update various ThreadFactoryBuilder name formats to escape '%'
   characters. This fixes situations where substrings starting with '%'
   are erroneously treated as format specifiers.

ITs are updated to include a '%' in extra.datasource.name.suffix.

* Avoid String.replace.

* Work around surefire bug.

* Fix xml encoding.

* Another try at the proper encoding.

* Give up on the emojis.

* Less ambitious testing.

* Fix an additional problem.

* Adjust encodeForFormat to return null if the input is null.
2020-12-06 22:35:11 -08:00
Gian Merlino 17f39ab91e
Fix misspellings in druid-forbidden-apis. (#10634)
These caused certain APIs to not actually be properly forbidden.

Also removed two MoreExecutors entries for methods that don't exist in
our version of Guava.
2020-12-05 15:26:57 -08:00
Maytas Monsereenusorn 7eb5f59a9a
Fix string byte calculation in StringDimensionIndexer (#10623)
* fix string byte calculation

* fix tests

* fix test
2020-12-04 00:51:48 -08:00
Liran Funaro 52d46cebc3
Move common configurations to TuningConfig (#10478)
* Move common methods that are used in HadoopTuningConfig and in AppenderatorConfig to TuningConfig
* Rename rowFlushBoundary in HadoopTuningConfig to maxRowsInMemory to match TuningConfig API
2020-12-03 18:13:32 -08:00
Lucas Capistrant 2560bf0a19
Add new coordinator metrics for coordinator duty runtimes (#10603)
* Add new coordinator metrics for duty runtimes

* fix spelling for a constant variable value

* add comment clarifying why the global runtime metric is emitted where it is

* Remove duty alias in lieu of using the class name for metrics

* fix docs

* CoordinatorStats tests + add duty stats to accumulate() logic
2020-11-29 14:47:35 -08:00
Ayush Kulshrestha d0c2ede50c
Added CronScheduler support as a proof to clock drift while emitting metrics (#10448)
Co-authored-by: Ayush Kulshrestha <ayush.kulshrestha@miqdigital.com>
2020-11-25 12:31:38 +01:00
Atul Mohan 111b431c07
Introduce query/timeout/count metric (#10567)
* Add timeout metric

* Add tests
2020-11-20 15:17:26 -08:00
frank chen d7d2c804ad
Add zero period support to TIMESTAMPADD (#10550)
* Allow zero period for TIMESTAMPADD

* update test cases

* add empty zone test case

* add unit test cases for TimestampShiftMacro
2020-11-18 18:26:53 -08:00
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
2020-11-13 13:59:23 -08:00
Lucas Capistrant c3cad461bc
Correct getRandomBalancerSegmentHolderTest (#10569) 2020-11-12 23:21:05 +05:30
Clint Wylie 6563599de4
modify access to protected SQLMetadataConnector methods to allow extensions to create SQL metadata tables using implementation specific constructs (payload type, serial type, etc) (#10573) 2020-11-12 23:20:01 +05:30
Atul Mohan 6ccddedb7a
Improved exception handling in case of query timeouts (#10464)
* Separate timeout exceptions

* Add more tests

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-11-03 09:00:33 -06:00
Nishant Bangarwa 6b14bdb3a5
Add support for Blacklisting some domains for HTTPInputSource (#10535)
fix inspections

refactor class name

change name

 add allowList as well

distinguish between empty and null list

Fix CI
2020-11-02 21:47:25 +05:30
Himanshu 4de4d4d111
remove ServerDiscoverySelector from DruidLeaderClient (#10537) 2020-10-28 10:55:11 -07:00
Himanshu ee136303bb
optionally disable all of hardcoded zookeeper use (#9507)
* optionally disable all of hardcoded zookeeper use

* fix DruidCoordinatorTest compilation

* fix test in DruidCoordinatorTest

* fix strict compilation

Co-authored-by: Himanshu Gupta <fill email>
2020-10-26 22:35:59 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
BIGrey bbe23c652c
Fix negative queuedSize problem in CuratorLoadQueuePeon (#10362)
* fix negative queuedSize problem in CuratorLoadQueuePeon

* add comment and optimize test case

* fix typo

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>
2020-10-16 13:38:49 -07:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Jihoon Son e9e7d82714
Fix compaction task slot computation in auto compaction (#10479)
* Fix compaction task slot computation in auto compaction

* add tests for task counting
2020-10-06 21:56:03 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Clint Wylie 64df71f25f
more timeout handling in JsonParserIterator (#10426) 2020-09-28 13:12:42 -07:00
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Clint Wylie dad69481f0
add light weight version of /druid/coordinator/v1/lookups/nodeStatus (#10422)
* add light weight version /druid/coordinator/v1/lookups/nodeStatus

* review stuffs
2020-09-24 14:36:53 +08:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Maytas Monsereenusorn e78d7862a8
Auto-compaction snapshot status API (#10371)
* Auto-compaction snapshot API

* Auto-compaction snapshot API

* Auto-compaction snapshot API

* Auto-compaction snapshot API

* Auto-compaction snapshot API

* Auto-compaction snapshot API

* Auto-compaction snapshot API

* fix when not all compacted segments are iterated

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add some tests to make code cov happy

* address comments

* address comments

* address comments

* address comments

* make code coverage happy

* address comments

* address comments

* address comments

* address comments
2020-09-18 16:37:58 -07:00
Mainak Ghosh d9beda7f24
Adding the missing sqlQueryContext api (#10368)
* Adding the missing sqlQueryContext api

* Adding a serialization test for DefaultRequestLogEvent

* Fixing the unit test failure
2020-09-18 00:46:31 -07:00
Mainak Ghosh 14072d3ab0
Adding more dimensions to the audit log entry (#10373)
* Adding more dimensions to the audit log entry

* Making adding payload in audit metric optional

* Changing the name of the parameter to includePayloadAsDimensionInMetric. Adding a unit test

* Fixing the intellij code introspection issues
2020-09-17 18:36:28 -07:00
Arvin.Z 1b05d6e542
recreate the balancer executor only when needed (#10280)
* recreate the balancer executor only when needed

* fix UT error

* shutdown the balancer executor in stopBeingLeader and stop

* remove commented code

* remove comments
2020-09-16 14:25:57 -05:00
Atul Mohan 94226f1b3d
Disable sending server version in response headers (#9832)
* Toggle sending of server version

* Remove config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 22:48:00 -07:00
Abhishek Agarwal f5e2645bbb
Support SearchQueryDimFilter in sql via new methods (#10350)
* Support SearchQueryDimFilter in sql via new methods

* Contains is a reserved word

* revert unnecessary change

* Fix toDruidExpression method

* rename methods

* java docs

* Add native functions

* revert change in dockerfile

* remove changes from dockerfile

* More tests

* travis fix

* Handle null values better
2020-09-14 09:57:54 -07:00
Cheng Pan 3d4b48e0aa
TransformSpecTest should extends InitializedNullHandlingTest (#10392) 2020-09-14 08:22:24 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Jihoon Son d32d1e7004
Fix result-level caching (#10341)
* create baseSequence early

* unit test

* add comment and a new test
2020-09-08 11:04:00 -07:00
xiangqiao123 3fc8bc0701
optimize announceHistoricalSegments (#9935)
* optimize announceHistoricalSegment

* optimize announceHistoricalSegment

* revert offline SegmentTransactionalInsertAction uses a separate lock

* optimize segmentExistsBatch: Avoid too many elements in the in condition

* add unit test && Modified according to cr

Co-authored-by: xiangqiao <xiangqiao@kuaishou.com>
2020-09-02 13:07:10 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Gian Merlino 21703d81ac
Fix handling of 'join' on top of 'union' datasources. (#10318)
* Fix handling of 'join' on top of 'union' datasources.

The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.

The main changes are in UnionQueryRunner:

1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".

Together, these enable UnionQueryRunner to "see through" a join.

* Tests.

* Adjust heap sizes for integration tests.

* Different approach, more tests.

* Tweak.

* Styling.
2020-08-26 14:23:54 -07:00
Jihoon Son b9ff3483ac
Add support for all partitioing schemes for auto compaction (#10307)
* Add support for all partitioing schemes for auto compaction

* annotate last compaction state for multi phase parallel indexing

* fix build and tests

* test

* better home
2020-08-26 13:19:18 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Jihoon Son b5b3e6ecce
Add maxNumFiles to splitHintSpec (#10243)
* Add maxNumFiles to splitHintSpec

* missing link

* fix build failure; use maxNumFiles for integration tests

* spelling

* lower default

* Update docs/ingestion/native-batch.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* address comments; change default maxSplitSize

* spelling

* typos and doc

* same change for segments splitHintSpec

* fix build

* fix build

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2020-08-21 09:43:58 -07:00
Clint Wylie 7620b0c54e
Segment backed broadcast join IndexedTable (#10224)
* Segment backed broadcast join IndexedTable

* fix comments

* fix tests

* sharing is caring

* fix test

* i hope this doesnt fix it

* filter by schema to maybe fix test

* changes

* close join stuffs so it does not leak, allow table to directly make selector factory

* oops

* update comment

* review stuffs

* better check
2020-08-20 14:12:39 -07:00
Atul Mohan 618c04a99e
Fix CombiningFirehose compatibility (#10264)
* Fix CombiningFirehose

* Add integration test

* Fix path

* Add full datasource name

* Fix input location

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-20 10:37:38 -07:00
Clint Wylie b36dab0fe6
fix connectionId issue with JDBC prepared statement queries and router (#10272)
* fix router jdbc prepared statement connectionId issue

* column metadata too

* style

* remove tls

* try tls again

* add keystore stuffs

* use keyManager password

* add unit test

* simplify
2020-08-19 00:18:06 -07:00
Jihoon Son 9a81740281
Don't log the entire task spec (#10278)
* Don't log the entire task spec

* fix lgtm

* fix serde

* address comments and add tests

* fix tests

* remove unnecessary codes
2020-08-18 11:03:13 -07:00
Himanshu 12ae84165e
remove DruidLeaderClient.goAsync(..) that does not follow redirect. Replace its usage by DruidLeaderClient.go(..) with InputStreamFullResponseHandler (#9717)
* remove DruidLeaderClient.goAsync(..) that does not follow redirect.
Replace its usage by DruidLeaadereClient.go(..) with
InputStreamFullResponseHandler

* remove ByteArrayResponseHolder dependency from JsonParserIterator

* add UT to cover lines in InputStreamFullResponseHandler

* refactor SystemSchema to reduce branches

* further reduce branches

* Revert "add UT to cover lines in InputStreamFullResponseHandler"

This reverts commit 330aba3dd9.

* UTs for InputStreamFullResponseHandler

* remove unused imports
2020-08-14 10:51:18 -07:00
Clint Wylie c72f96a4ba
fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248)
* fix bug with realtime expressions on sparse string columns

* fix test

* add comment back

* push capabilities for dimensions to dimension indexers since they know things

* style

* style

* fixes

* getting a bit carried away

* missed one

* fix it

* benchmark build fix

* review stuffs

* javadoc and comments

* add comment

* more strict check

* fix missed usaged of impl instead of interface
2020-08-11 11:07:17 -07:00
Atul Mohan 06539bc828
Set default server.maxsize to the sum of segment cache (#10255)
* Default server.maxsize

* Remove maxsize refs from config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-10 09:21:22 -07:00
frank chen 646fa84d04
Support unit on byte-related properties (#10203)
* support unit suffix on byte-related properties

* add doc

* change default value of byte-related properites in example files

* fix coding style

* fix doc

* fix CI

* suppress spelling errors

* improve code according to comments

* rename Bytes to HumanReadableBytes

* add getBytesInInt to get value safely

* improve doc

* fix problem reported by CI

* fix problem reported by CI

* resolve code review comments

* improve error message

* improve code & doc according to comments

* fix CI problem

* improve doc

* suppress spelling check errors
2020-07-31 09:58:48 +08:00
Jian Wang 271f90f205
Add segment pruning for hash based shard spec (#9810)
* Add segment pruning for hash based partitioning

* Update doc

* Add additional test

* Address comments

* Fix unit test failure

Co-authored-by: Jian Wang <jwang@pinterest.com>
2020-07-30 18:44:26 -07:00
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Jihoon Son 26d099f39b
Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments (#10183)
* Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments

* fix tests and add missing tests

* revert null handling fix

* unused import

* move out util methods from DiscoveryDruidNode
2020-07-21 17:52:51 -07:00
Jihoon Son 41982116f4
Report missing segments when there is no segment for the query datasource in historicals (#10199)
* Report missing segments when there is no segment for the query
datasource in historicals

* test

* missing part for test

* another test
2020-07-20 21:02:52 -07:00
Maytas Monsereenusorn 454eed06a8
JettyTest.testNumConnectionsMetricHttp is rarely flaky (#10169) 2020-07-14 10:36:46 -07:00
Suneet Saldanha e6c9142129
Add validation for authenticator and authorizer name (#10106)
* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix
2020-07-13 21:15:54 -07:00
Maytas Monsereenusorn 54a8fb827d
Fix flaky tests in DruidCoordinatorTest (#10157)
* Fix flaky tests in DruidCoordinatorTest

* Imporve fail msg

* Fix flaky tests in DruidCoordinatorTest
2020-07-08 20:03:52 -07:00
Antoine Huret 88d20a61a6
renamed authenticationChain to authenticatorChain (#10143) 2020-07-08 19:58:21 -07:00
Jihoon Son 53a2550571
Follow-up for RetryQueryRunner fix (#10144)
* address comments; use guice instead of query context

* typo

* QueryResource tests

* address comments

* catch queryException

* fix spell check
2020-07-08 13:28:11 -07:00
Gian Merlino eeaf609fc0
Update Jetty to 9.4.30.v20200611. (#10098)
* Update Jetty to 9.4.30.v20200611.

This is the latest version currently available in the 9.4.x line.

* Various adjustments.

* Class name fixes.

* Remove unused HttpClientModule code.

* Add coverage suppressions.

* Another coverage suppression.

* Fix wildcards.
2020-07-07 14:24:02 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jihoon Son 2b93dc6019
Fix CachingClusteredClient when querying specific segments (#10125)
* Fix CachingClusteredClient when querying specific segments

* delete useless test

* roll back timeout
2020-07-02 17:50:50 -07:00
Jihoon Son 657f8ee80f
Fix RetryQueryRunner to actually do the job (#10082)
* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate
2020-07-01 14:02:21 -07:00
Gian Merlino 5faa897a34
Join filter pre-analysis simplifications and sanity checks. (#10104)
* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.
2020-06-30 19:14:22 -07:00
Suneet Saldanha b91a16943b
Make 0.19 brokers compatible with 0.18 router (#10091)
* Make brokers backwards compatible

In 0.19, Brokers gained the ability to serve segments. To support this change,
a `BROKER` ServerType was added to `druid.server.coordination`.

Druid nodes prior to this change do not know of this new server type and so
they would fail to deserialize this node's announcement.

This change makes it so that the broker only announces itself if the segment
cache is configured on the broker. It is expected that a Druid admin will only
configure the segment cache on the broker once the cluster has been upgraded
to a version that supports a broker using the segment cache.

* make code nicer

* Add tests

* Ignore icode coverage for nitialization classes

* Revert "Ignore icode coverage for nitialization classes"

This reverts commit aeec0c2ac2.

* code review
2020-06-29 20:57:33 -07:00
Atul Mohan 0841c89df6
Fix nullhandling exception (#10095)
Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-29 20:55:38 -07:00
Jihoon Son 8ef3598c05
Move shardSpec tests to core (#10079)
* Move shardSpec tests to core

* checkstyle

* inject object mapper for testing

* unused import
2020-06-29 17:31:37 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that allows a user which http methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore icode coverage for nitialization classes

* code review
2020-06-29 16:59:31 -07:00
chenyuzhi459 a4c6d5f37e
fix query memory leak (#10027)
* fix query memory leak

* rollup ./idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancell method use

* format code

* reback code

* optimize cancelAll

* clean code

* add comment
2020-06-26 23:30:59 -07:00
Jian Wang 20fd72bd13
Fix NPE when brokers use custom priority list (#9878) 2020-06-26 17:28:54 -07:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioing is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* hansle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Parag Jain 422a8af14e
Fix balancer strategy (#10070)
* fix server overassignment

* fix random balancer strategy, add more tests

* comment

* added more tests

* fix forbidden apis

* fix typo
2020-06-25 16:45:00 +05:30
Dylan Wylie 0470fcc9da
change default number of segment loading threads (#9856)
* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-06-23 13:56:44 -07:00
Maytas Monsereenusorn 191572ad5e
Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API (#10054)
* Add safeguard to make sure new Rules added are aware of Rule usuage in loadstatus API

* address comments

* address comments

* add tests
2020-06-19 20:18:56 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Maytas Monsereenusorn 857e5204bf
Coordinator loadstatus API full format does not consider Broadcast rules (#10048)
* Coordinator loadstatus API full format does not consider Broadcast rules

* address comments

* fix checkstyle

* minor optimization

* address comments
2020-06-18 17:52:33 -07:00
Clint Wylie b5e6569d2c
global table only if joinable (#10041)
* global table if only joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-06-18 17:32:10 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
Maytas Monsereenusorn 1a2620606d
API to verify a datasource has the latest ingested data (#9965)
* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checksyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs
2020-06-16 20:48:30 -10:00
Clint Wylie 68aa384190
global table datasource for broadcast segments (#10020)
* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode
2020-06-16 17:58:05 -07:00
Gian Merlino 9330ca9717
Remove LegacyDataSource. (#10037)
* Remove LegacyDataSource.

Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.

* Slight test improvement.
2020-06-16 14:40:35 -07:00
Jihoon Son 9a10f8352b
Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle
2020-06-12 21:39:37 -07:00
Clint Wylie 8a7e7e773a
fix balancer + broadcast segments npe (#10021) 2020-06-12 13:09:22 -07:00
Jonathan Wei fe2f656427
Fix broadcast rule drop and docs (#10019)
* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services
2020-06-12 02:33:28 -07:00
Stefan Birkner 369ed2503e
Remove duplicate parameters from test (#10022)
Commit 771870ae2d removed constructor
arguments from the rules. Therefore multiple parameters of the test are
now the same and can be removed.
2020-06-11 14:15:02 -07:00
Clint Wylie 96eb69e475
ignore brokers in broker views (#10017) 2020-06-10 12:29:30 -07:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
Atul Mohan 17cf8ea8f2
Add Sql InputSource (#9449)
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-09 12:55:20 -07:00
Jonathan Wei 771870ae2d
Load broadcast datasources on broker and tasks (#9971)
* Load broadcast datasources on broker and tasks

* Add javadocs

* Support HTTP segment management

* Fix indexer maxSize

* inspection fix

* Make segment cache optional on non-historicals

* Fix build

* Fix inspections, some coverage, failed tests

* More tests

* Add CliIndexer to MainTest

* Fix inspection

* Rename UnprunedDataSegment to LoadableDataSegment

* Address PR comments

* Fix
2020-06-08 20:15:59 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn 0d22462e07
Document unsupported Join on multi-value column (#9948)
* Document Unsupported Join on multi-value column

* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests
2020-06-03 09:55:52 -10:00
Xavier Léauté a934b2664c
remove ListenableFutures and revert to using the Guava implementation (#9944)
This change removes ListenableFutures.transformAsync in favor of the
existing Guava Futures.transform implementation. Our own implementation
had a bug which did not fail the future if the applied function threw an
exception, resulting in the future never completing.

An attempt was made to fix this bug, however when running againts Guava's own
tests, our version failed another half dozen tests, so it was decided to not
continue down that path and scrap our own implementation.

Explanation for how was this bug manifested itself:

An exception thrown in BaseAppenderatorDriver.publishInBackground when
invoked via transformAsync in StreamAppenderatorDriver.publish will
cause the resulting future to never complete.

This explains why when encountering https://github.com/apache/druid/issues/9845
the task will never complete, forever waiting for the publishFuture to
register the handoff. As a result, the corresponding "Error while
publishing segments ..." message only gets logged once the index task
times out and is forcefully shutdown when the future is force-cancelled
by the executor.
2020-06-03 10:46:03 -07:00
Xavier Léauté acfcfd35b1
fix unsafe concurrent access in StreamAppenderatorDriver (#9943)
during segment publishing we do streaming operations on a collection not
safe for concurrent modification. To guarantee correct results we must
also guard any operations on the stream itself.

This may explain the issue seen in https://github.com/apache/druid/issues/9845
2020-05-31 09:12:25 -07:00
Clint Wylie 2e9548d93d
refactor SeekableStreamSupervisor usage of RecordSupplier (#9819)
* refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting

* fix style and test

* cleanup, refactor, javadocs, test

* fixes

* keep collecting current offsets and lag if unhealthy in background reporting thread

* review stuffs

* add comment
2020-05-16 14:09:39 -07:00
mcbrewster 28be107a1c
add flag to flattenSpec to keep null columns (#9814)
* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColmns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns

* Add equals verifier tests
2020-05-08 21:53:39 -07:00
Francesco Nidito e7e41e3a36
Adding support for autoscaling in GCE (#8987)
* Adding support for autoscaling in GCE

* adding extra google deps also in gce pom

* fix link in doc

* remove unused deps

* adding terms to spelling file

* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT

* GCEXyz -> GceXyz in naming for consistency

* add preconditions

* add VisibleForTesting annotation

* typos in comments

* use StringUtils.format instead of String.format

* use custom exception instead of exit

* factorize interval time between retries

* making literal value a constant

* iter all network interfaces

* use provided on google (non api) deps

* adding missing dep

* removing unneded this and use Objects methods instead o 3-way if in hash and comparison

* adding import

* adding retries around getRunningInstances and adding limit for operation end waiting

* refactor GceEnvironmentConfig.hashCode

* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT

* removing unused config

* adding tests to hash and equals

* adding nullable to waitForOperationEnd

* adding testTerminate

* adding unit tests for createComputeService

* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)

* reverting queryResponseTemplate change

* adding comment for Compute.Builder.build() returning null
2020-04-28 03:13:39 -07:00
Clint Wylie 68cc0b2e1c
fixes for inline subqueries when multi-value dimension is present (#9698)
* fixes for inline subqueries when multi-value dimension is present

* fix test

* allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesnt exist

* add comment
2020-04-21 18:44:26 -07:00
Maytas Monsereenusorn 8328d91b30
Add missing integration tests for the compaction by the coordinator (#9644)
* Add API to trigger a compaction by the coordinator for integration tests

* Add missing integration tests for the compaction by the coordinator

* address comments
2020-04-15 14:27:33 -07:00
Jihoon Son b8f7128b2d
Revert "remove ServerDiscoverySelector from DruidLeaderClient (#9481)" (#9702)
* Revert "remove ServerDiscoverySelector from DruidLeaderClient (#9481)"

This reverts commit 072bbe210f.

* fix build
2020-04-14 20:42:56 -07:00
Gian Merlino 5249155284
Fix off-by-one in IndexedTableJoinMatcher.getCardinality. (#9674)
* Fix off-by-one in IndexedTableJoinMatcher.getCardinality.

It would report a cardinality that is one lower than the actual cardinality.
The missing value is the phantom null that can be generated by outer joins.

* Fix tests.
2020-04-10 18:11:05 -07:00
Suneet Saldanha 22d3eed80c
Do not use external input in format strings (#9665)
https://lgtm.com/rules/7900080/
2020-04-10 10:46:04 -07:00
Suneet Saldanha bd1cff24a2
Remove no-op assert statement in ClientQuerySegmentWalker (#9607)
* Remove no-op assert statement

The assert statement in ClientQuerySegmentWalker will always be true because
of the preceeding while loop which has the same condition.
This change removes dead code to fix an error reported by LGTM

* Suppress lgtm

* cleanup whitespace
2020-04-10 10:41:29 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Abhishek Radhakrishnan 08851c0198
Preserve the null values for numeric type dimensions post-compaction. (#9622)
* Add selector null check to preserve null values as-is.

* Fix typo.

* add wrapping dimension selector test.

* Address review comments.

* nit: replace exception type.

* uh, float is indeed NOT a special case.
2020-04-08 18:56:06 -07:00
Clint Wylie 7bf2dfa3b1
fix flaky jetty test (#9633) 2020-04-08 10:07:06 -07:00
Maytas Monsereenusorn b95a1b9878
Fix NPE in RemoteTaskRunner event handler causes JVM shutdown (#9610)
* Fix NPE in RemoteTaskRunner event handler causes JVM shutdown

* address comments

* fix compile

* fix checkstyle

* fix lgtm

* fix merge

* fix test

* fix tests

* change scope

* address comments

* address comments
2020-04-07 14:53:51 -07:00
Clint Wylie d267b1c414
check paths used for shuffle intermediary data manager get and delete (#9630)
* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
2020-04-07 09:47:18 -07:00
Suneet Saldanha 7bf1ebb0b8
Add tests for valid and invalid datasource names (#9614)
* Add tests for valid and invalid datasource names

* code review

* clean up dependencies
2020-04-06 16:02:50 -07:00
Clint Wylie 4d277dbf99
Fix double count ssl connection metrics (#9594)
* fix double counted jetty/numOpenConnections metric for ssl connections

* tests

* more better

* style
2020-04-03 23:29:23 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Clint Wylie fa5da6693c
add lane enforcement for joinish queries (#9563)
* add lane enforcement for joinish queries

* oops

* style

* review stuffs
2020-03-30 11:58:16 -07:00
Clint Wylie 2bc29543e5
modify QueryCapacityExceededException to provide better messaging (#9547)
* modify QueryCapacityExceededException to provide better messaging

* style
2020-03-23 20:05:11 -07:00
Clint Wylie bf85ea19b2
roaring bitmaps by default (#9548)
* it is finally time

* fix it

* more docs

* fix doc
2020-03-23 18:15:57 -07:00
Himanshu 5604ac7963
druid extension for OpenID Connect auth using pac4j lib (#8992)
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication

* update version in druid-pac4j pom

* introducing unauthorized resource filter

* authenticated but authorized /unified-webconsole.html

* use httpReq.getRequestURI() for matching callback path

* add documentation

* minor doc addition

* licesne file updates

* make dependency analyze succeed

* fix doc build

* hopefully fixes doc build

* hopefully fixes license check build

* yet another try on fixing license build

* revert unintentional changes to website folder

* update version to 0.18.0-SNAPSHOT

* check session and its expiry on each request

* add crypto service

* code for encrypting the cookie

* update doc with cookiePassphrase

* update license yaml

* make sessionstore in Pac4jFilter private non static

* make Pac4jFilter fields final

* okta: use sha256 for hmac

* remove incubating

* add UTs for crypto util and session store impl

* use standard charsets

* add license header

* remove unused file

* add org.objenesis.objenesis to license.yaml

* a bit of nit changes  in CryptoService  and embedding EncryptionResult for clarity

* rename alg  to cipherAlgName

* take cipher alg name, mode and padding as input

* add java doc  for CryptoService  and make it more understandable

* another  UT for CryptoService

* cache pac4j Config

* use generics clearly in Pac4jSessionStore

* update cookiePassphrase doc to mention PasswordProvider

* mark stuff Nullable where appropriate in Pac4jSessionStore

* update doc to mention jdbc

* add error log on reaching callback resource

* javadoc  for Pac4jCallbackResource

* introduce NOOP_HTTP_ACTION_ADAPTER

* add correct module name in license file

* correct extensions folder name in licenses.yaml

* replace druid-kubernetes-extensions to druid-pac4j

* cache SecureRandom instance

* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 18:15:45 -07:00
Gian Merlino 54c9325256
SQL support for joins on subqueries. (#9545)
* SQL support for joins on subqueries.

Changes to SQL module:

- DruidJoinRule: Allow joins on subqueries (left/right are no longer
  required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
  handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
  it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
  version of Calcite. Some of these are useful for optimizing joins on
  subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
  relevant constants in CostEstimates.

Other changes:

- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
  of isComplete is enough to let callers know that columns might have
  multiple values, and explicitly setting it to true causes
  ExpressionSelectors to think it definitely has multiple values, and
  treat the inputs as arrays. This behavior interfered with some of the
  new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
  tests.

* Fixes for tests.

* Adjustments.
2020-03-22 16:43:55 -07:00
Jihoon Son 1e667362eb
Do not use UnmodifiableList in auto compaction (#9535) 2020-03-19 11:43:33 -07:00
Clint Wylie 68013fbc64
fix issue where total limit was being applied even when not configured (#9534)
* fix issue where total limit was being applied even when not configured

* fix inspection

* add reserved lane name check to manual laning strategy
2020-03-18 18:05:59 -07:00
Gian Merlino 1ef25a438f
Broker: Add ability to inline subqueries. (#9533)
* Broker: Add ability to inline subqueries.

The main changes:

- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
  ClientQuerySegmentWalker to limit how many rows can be inlined per
  query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
  by assuming they are strings. This is useful because not all types are
  known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
  more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
  SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
  druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.

* Adjustments from CI.

* Fix integration test.
2020-03-18 15:06:45 -07:00
Jonathan Wei b1847364b0
More efficient join filter rewrites (#9516)
* More efficient join filter rewrites

* Rebase

* Remove unused functions

* PR comments, fix compile

* Adjust comment

* Allow filter rewrite when join condition has LHS expression

* Fix inspections

* Fix tests
2020-03-16 22:16:14 -07:00
Clint Wylie 69af760a19
add manual laning strategy, integration test (#9492)
* add manual laning strategy, integration test, json config test

* share percent conversion method

* wrong assert

* review stuffs

* doc adjustments

* more tests

* test adjustment

* adjust docs

* Update index.md
2020-03-13 20:06:55 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Gian Merlino ff59d2e78b
Move RowSignature from druid-sql to druid-processing and make use of it. (#9508)
* Move RowSignature from druid-sql to druid-processing and make use of it.

1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
   stuff in a RowSignatures utility class. It also picked up some new convenience
   methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
   associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
   and it now provides type info.

* Fix up extensions.

* Various fixes
2020-03-12 11:06:44 -07:00
Gian Merlino 2ef5c17441
Link up row-based datasources to serving layer. (#9503)
* Link up row-based datasources to serving layer.

- Add SegmentWrangler interface that allows linking of DataSources to Segments.
- Add LocalQuerySegmentWalker that uses SegmentWranglers to compute queries on
  data that is available locally.
- Modify ClientQuerySegmentWalker to use LocalQuerySegmentWalker when the base
  datasource is concrete and not a table.
- Add SegmentWranglerModule to the Broker so it has them available and can
  properly instantiate . LocalQuerySegmentWalkers.
- Set InlineDataSource and LookupDataSource to concrete, since they can be
  directly queried now.

* Fix tests.
2020-03-11 11:32:27 -07:00
Jihoon Son 7401bb3f93
Improve OvershadowableManager performance (#9441)
* Use the iterator instead of higherKey(); use the iterator API instead of stream

* Fix tests; fix a concurrency bug in timeline

* fix test

* add tests for findNonOvershadowedObjectsInInterval

* fix test

* add missing tests; fix a bug in QueueEntry

* equals tests

* fix test
2020-03-10 13:22:19 -07:00
Himanshu 75a5591448
remove old unused zookeeper dependent lookups code (#9480)
* remove old unused zookeeper dependent lookups code

* make  intellij inspector happy
2020-03-10 12:12:48 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jonathan Wei 0136dba95d
Add option to control join filter rewrites (#9472)
* Add option to control join filter rewrites

* Fix inspections
2020-03-09 17:36:07 -07:00
Himanshu 072bbe210f
remove ServerDiscoverySelector from DruidLeaderClient (#9481) 2020-03-09 12:13:59 -07:00
Lijia Liu 063811710e
#8690 use utc interval when create pedding segments (#9142)
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-02-26 13:20:59 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
Chi Cao Minh e7eb45e648
Run IntelliJ inspections on Travis (#9179)
* Run IntelliJ inspections on Travis

Running IntelliJ inspections currently takes about 90 minutes, but they
can be run in about 30 minutes on Travis.

* Restore assert statements
2020-02-19 11:34:19 +03:00
Jihoon Son 3bb9e7e53a
Inject things instead of subclassing everything for parallel task testing (#9353)
* Inject things instead of subclassing everything for parallel task
testing

* javadoc

* fix compilation

* fix wrong merge

* Address comments
2020-02-16 13:00:12 -08:00
Atul Mohan 043abd5529
Fix compatibility issues with SqlFirehose (#9365)
* Make SqlFirehose compatible with FiniteFirehose

* Fix build
2020-02-14 17:45:12 -08:00
Jihoon Son bcf8f91e46
Add unit tests for CoordinatorRuleManager (#9318) 2020-02-13 19:29:57 -08:00
Adam Peck e9aebd994a
Fix for building in Eclipse & VS Code. (#7481)
Fixes #6866
Reverse dependencies from /main/ to /test/
Add generated-test-sources to source path for Eclipse
2020-02-13 14:58:32 -08:00
Suneet Saldanha b1f38131af
Fix timestamp extract fn to match postgreSQL (#9337)
* Fix timestamp extract fn to match postgres

Update the timestamp extract function so that it matches the PostgreSQL docs.
Examples from the PostgreSQL docs were added as tests for DECADE, CENTURY
and MILLENIUM extraction.

There were bugs in CENTURY and MILLENIUM that were spotted because of intelliJ
inspections - 'Integer division in floating point context'

* Update CalciteQueryTest

* remove useless round

* mark integer division as an error
2020-02-12 15:39:19 -08:00
Maytas Monsereenusorn c30579e47b
ANY Aggregator should not skip null values implementation (#9317)
* ANY Aggregator should not skip null values implementation

* add tests

* add more tests

* Update documentation

* add more tests

* address review comments

* optimize StringAnyBufferAggregator

* fix failing tests

* address pr comments
2020-02-12 14:01:41 -08:00
Jonathan Wei b2c00b3a79
Add query context option to disable join filter push down (#9335) 2020-02-11 15:31:34 -08:00
Clint Wylie 831ec172f1
Logging large segment list handling (#9312)
* better handling of large segment lists in logs

* more

* adjust

* exceptions

* fixes

* refactor

* debug

* heh

* dang
2020-02-07 21:42:45 -08:00
Jihoon Son e81230f9ab
Refactoring some codes around ingestion (#9274)
* Refactoring codes around ingestion:

- Parallel index task and simple task now use the same segment allocator implementation. This is reusable for the future implementation as well.
- Added PartitionAnalysis to store the analysis of the partitioning
- Move some util methods to SegmentLockHelper and rename it to TaskLockHelper

* fix build

* fix SingleDimensionShardSpecFactory

* optimize SingledimensionShardSpecFactory

* fix test

* shard spec builder

* import order

* shardSpecBuilder -> partialShardSpec

* build -> complete

* fix comment; add unit tests for partitionBoundaries

* add more tests and fix javadoc

* fix toString(); add serde tests for HashBasedNumberedPartialShardSpec and SegmentAllocateAction

* fix test

* add equality test for hash and range partial shard specs
2020-02-07 16:23:07 -08:00
Lucas Capistrant 53bb45fc9a
Forbid easily misused HashSet and HashMap constructors (#9165)
* Forbid easily misused HashSet and HashMap constructors

* Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them

* Fix visibility of constant in CollectionUtils.java

* Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used

* revert changes to sql module tests that should be in separate PR

* Finish reverting changes to sql module tests that were flagged in checkstyle during CI

* Add netty dependency resulting from SupressForbidden
2020-02-07 10:44:09 +03:00
Lucas Capistrant 2e1dbe598c
Create new dynamic config to pause coordinator helpers when needed (#9224)
* Create new dynamic config to pause coordinator helpers when needed

* Fix spelling mistakes flagged in Travis build

* Add an integration test for coordinator pause dynamic config

* Improve documentation for new dynamic coordinator config and remove un-needed info logs in favor of debug

* address naming convention of 'deep store' vs 'deep storage' in new configs doc line

* Fix newline at end of configuration index.md

* Last try to resolve newline issue in configuration readme

* fix spell checks from travis build

* Fix another flagges spelling error from Travis
2020-02-05 15:33:42 -08:00
Zhenxiao Luo 98cefc61fa
Not use ConcurrentHashMap in CoordinatorRuleManager.rules (#9302) 2020-02-05 15:33:31 -08:00
Gian Merlino 475b90c3a6
Remove EasyMock dependency from CalciteTests. (#9310)
* Remove EasyMock dependency from CalciteTests.

Useful because CalciteTests is used by other modules (e.g. druid-benchmarks)
and we don't want them to have to pull in EasyMock.

* CalciteTests no longer needs curator-x-discovery either.
2020-02-04 22:10:17 -08:00
lamber-ken 48b95f02f2
Remove unnecessary casts (#9208) 2020-02-04 21:52:33 -08:00
Gian Merlino b411443d22
SQL join support for lookups. (#9294)
* SQL join support for lookups.

1) Add LookupSchema to SQL, so lookups show up in the catalog.
2) Add join-related rels and rules to SQL, allowing joins to be planned into
   native Druid queries.

* Add two missing LookupSchema calls in tests.

* Fix tests.

* Fix typo.
2020-01-31 23:51:16 -08:00
Gian Merlino 4963a113dc
Make JoinableFactoryModule tests look at all the actual mappings. (#9295)
By depending on JoinableFactoryModule.FACTORY_MAPPINGS, we verify that the bound
JoinableFactory can actually handle and create all default classes.
2020-01-31 17:15:38 -08:00
Gian Merlino 204ba9966f
Add LookupJoinableFactory. (#9281)
* Add LookupJoinableFactory.

Enables joins where the right-hand side is a lookup. Includes an
integration test.

Also, includes changes to LookupExtractorFactoryContainerProvider:

1) Add "getAllLookupNames", which will be needed to eventually connect
   lookups to Druid's SQL catalog.
2) Convert "get" from nullable to Optional return.
3) Swap out most usages of LookupReferencesManager in favor of the
   simpler LookupExtractorFactoryContainerProvider interface.

* Fixes for tests.

* Fix another test.

* Java 11 message fix.

* Fixups.

* Fixup benchmark class.
2020-01-30 14:46:21 -08:00
Suneet Saldanha 6b44d4aa80
Add getRightEquiConditionKeys to JoinConditionAnalysis (#9287)
* Add getRightColumns to JoinConditionAnalysis

This change other implementations of JoinableFactory to ask the analysis
for the right key columns instead of having to calculate it themselves.

* Address some review comments

* more code review stuff
2020-01-29 22:31:29 -08:00
Suneet Saldanha 303b02eba1
intelliJ inspections cleanup (#9260)
* intelliJ inspections cleanup

- remove redundant escapes
- performance warnings
- access static member via instance reference
- static method declared final
- inner class may be static

Most of these changes are aesthetic, however, they will allow inspections to
be enabled as part of CI checks going forward

The valuable changes in this delta are:
- using StringBuilder instead of string addition in a loop
    indexing-hadoop/.../Utils.java
    processing/.../ByteBufferMinMaxOffsetHeap.java
- Use class variables instead of static variables for parameterized test
    processing/src/.../ScanQueryLimitRowIteratorTest.java

* Add intelliJ inspection warnings as errors to druid profile

* one more static inner class
2020-01-29 11:50:52 -08:00
Suneet Saldanha 6ee0afa8e5
Rename MapDataSourceJoinableFactoryWarehouse (#9275) 2020-01-28 19:00:07 -08:00
Suneet Saldanha 0ccfe5ca89 Expose JoinableFactory through Guice Bindings (#9271)
* Make JoinableFactory an extension point

This change makes it so that extensions can register a JoinableFactory that
should be used for a DataSource.

Extensions can provide the factories via DruidBinders#joinableFactoryBinder
Known DataSources - like InlineDataSource are provided in the
JoinableFactoryModule. This module installs a FactoryWarehouse that is
used to decide which factory should be used to generate the Joinable for
the provided DataSource.

The ExtensionPoint is marked as Beta since it is not yet clear if this
needs to remain available to other extensions or if the best way to
register a factory is by using the datasource class.

* Add module test

* remove useless bindings in test

* remove ExtensionPoint annotation

* Make LifecycleLock not final to help with testing
2020-01-28 13:59:06 -08:00
Roman Leventov b9186f8f9f Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306)
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error

* Fix brace

* Import order

* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill

* Fix tests

* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY

* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters

* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig

* More variable and method renames

* Rename MetadataSegments to SegmentsMetadata

* Javadoc update

* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs

* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()

* Reorder imports

* Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers

* Complete merge

* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests

* Remove MetadataSegmentManager

* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments

* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder

* Fix inspections

* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest

* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods

* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator

* Unused import

* Optimize imports

* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()

* Unused import

* Update terminology in datasource-view.tsx

* Fix label in datasource-view.spec.tsx.snap

* Fix lint errors in datasource-view.tsx

* Doc improvements

* Another attempt to please TSLint

* Another attempt to please TSLint

* Style fixes

* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)

* Try to fix docs build issue

* Javadoc and spelling fixes

* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments

* Address more comments
2020-01-27 11:24:29 -08:00
Gian Merlino 19b427e8f3
Add JoinableFactory interface and use it in the query stack. (#9247)
* Add JoinableFactory interface and use it in the query stack.

Also includes InlineJoinableFactory, which enables joining against
inline datasources. This is the first patch where a basic join query
actually works. It includes integration tests.

* Fix test issues.

* Adjustments from code review.
2020-01-24 13:10:01 -08:00
Gian Merlino f0f68570ec
Use DataSourceAnalysis throughout the query stack. (#9239)
Builds on #9235, using the datasource analysis functionality to replace various ad-hoc
approaches. The most interesting changes are in ClientQuerySegmentWalker (brokers),
ServerManager (historicals), and SinkQuerySegmentWalker (indexing tasks).

Other changes related to improving how we analyze queries:

1) Changes TimelineServerView to return an Optional timeline, which I thought made
   the analysis changes cleaner to implement.
2) Added QueryToolChest#canPerformSubquery, which is now used by query entry points to
   determine whether it is safe to pass a subquery dataSource to the query toolchest.
   Fixes an issue introduced in #5471 where subqueries under non-groupBy-typed queries
   were silently ignored, since neither the query entry point nor the toolchest did
   anything special with them.
3) Removes the QueryPlus.withQuerySegmentSpec method, which was mostly being used in
   error-prone ways (ignoring any potential subqueries, and not verifying that the
   underlying data source is actually a table). Replaces with a new function,
   Queries.withSpecificSegments, that includes sanity checks.
2020-01-23 14:07:14 -08:00
Zhenxiao Luo 479c09751c Add MostAvailableSizeStorageLocationSelectorStrategy (#8879)
* Add MostAvailableSize LocationSelectorStrategy

* Add doc for mostAvailableSize strategy

* Fix docs for mostAvailableSize
2020-01-23 13:42:03 -08:00
Gian Merlino d886463253
Add join-related DataSource types, and analysis functionality. (#9235)
* Add join-related DataSource types, and analysis functionality.

Builds on #9111 and implements the datasource analysis mentioned in #8728. Still can't
handle join datasources, but we're a step closer.

Join-related DataSource types:

1) Add "join", "lookup", and "inline" datasources.
2) Add "getChildren" and "withChildren" methods to DataSource, which will be used
   in the future for query rewriting (e.g. inlining of subqueries).

DataSource analysis functionality:

1) Add DataSourceAnalysis class, which breaks down datasources into three components:
   outer queries, a base datasource (left-most of the highest level left-leaning join
   tree), and other joined-in leaf datasources (the right-hand branches of the
   left-leaning join tree).
2) Add "isConcrete", "isGlobal", and "isCacheable" methods to DataSource in order to
   support analysis.

Other notes:

1) Renamed DataSource#getNames to DataSource#getTableNames, which I think is clearer.
   Also, made it a Set, so implementations don't need to worry about duplicates.
2) The addition of "isCacheable" should work around #8713, since UnionDataSource now
   returns false for cacheability.

* Remove javadoc comment.

* Updates reflecting code review.

* Add comments.

* Add more comments.
2020-01-22 14:54:47 -08:00
Gian Merlino d21054f7c5
Remove the deprecated interval-chunking stuff. (#9216)
* Remove the deprecated interval-chunking stuff.

See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details.

* Remove unused import.

* Remove chunkInterval too.
2020-01-19 17:14:23 -08:00
Gian Merlino bd49ec03bc
Move result-to-array logic from SQL layer into QueryToolChests. (#9130)
* Move result-to-array logic from SQL layer into QueryToolChests.

* Checkstyle adjustment.

* Fix typo.
2020-01-16 15:42:10 -08:00
Atul Mohan b642b1aa5b Fix deserialization of maxBytesInMemory (#9092)
* Fix deserialization of maxBytesInMemory

* Add maxBytes check
2020-01-12 20:08:07 -08:00
Jonathan Wei aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Jonathan Wei 4e8368a5d9 Set version to 0.18.0-SNAPSHOT (#9109) 2020-01-02 17:55:10 -05:00
Suneet Saldanha 301c0649a7 Fix equalsAndHashCode in ClientCompactQueryTuningConfig (#9035)
* Fix equalsAndHashCode in ClientCompactQueryTuningConfig

This change introduces a dependency to EqualsVerifier for the test scope.
The dependency is licensed under Apache 2. The library makes it trivial
to add equals and hashCode checks to prevent bugs like this from happening
in the future

* fix checkstyle

* fix test name
2019-12-16 14:33:00 -08:00
Himanshu 9236dd9467
optionally enable Jetty ForwardedRequestCustomizer (#9010)
* optionally enable Jetty ForwardedRequestCustomizer

* fix doc build
2019-12-12 17:00:08 -08:00
Jihoon Son e5e1e9c4ee
Fix broken master (#9005)
* Multibinding for NodeRole

* Fix endpoints

* fix doc

* fix test
2019-12-11 15:56:36 -08:00
Jonathan Wei 8af41d7cd0 Update version to 0.18.0-incubating-SNAPSHOT (#9009) 2019-12-11 14:04:03 -08:00
Parag Jain 24fe824055 add readiness endpoints to processes having initialization delays (#8841) 2019-12-10 17:26:13 -08:00
Parag Jain 9640f9649a fix npe while logging sql/query request (#9001)
* fix npe while logging sql/query request

* forbid forbidden DateTime API
2019-12-09 12:02:11 -08:00
Roman Leventov 1c62987783
Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702)
* Add SelfDiscoveryResource

* Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself).

* Extended docs

* Fix brace

* Remove redundant throws in Lifecycle.Handler.stop()

* Import order

* Remove unresolvable link

* Address comments

* tmp

* tmp

* Rollback docker changes

* Remove extra .sh files

* Move filter

* Fix SecurityResourceFilterTest
2019-12-08 18:47:58 +03:00
Clint Wylie 06cd30460e
add query metrics for broker parallel merges, off by default (#8981)
* add a bunch of metrics for broker parallel merges, off by default, and tests

* fix tests

* review stuffs

* propogateIfPossible
2019-12-06 13:42:53 -08:00
Chi Cao Minh af74acaa85 Address security vulnerabilities CVSS >= 7 (#8980)
* Address security vulnerabilities CVSS >= 7

Update dependencies to address security vulnerabilities with CVSS scores
of 7 or higher. A new Travis CI job is added to prevent new
high/critical security vulnerabilities from being added.

Updated dependencies:
- api-util 1.0.0 -> 1.0.3
- jackson 2.9.10 -> 2.10.1
- kafka 2.1.0 -> 2.1.1
- libthrift 0.10.0 -> 0.13.0
- protobuf 3.2.0 -> 3.11.0

The following high/critical security vulnerabilities are currently
suppressed (so that the new Travis CI job can be added now) and are left
as future work to fix:
- hibernate-validator:5.2.5
- jackson-mapper-asl:1.9.13
- libthrift:0.6.1
- netty:3.10.6
- nimbus-jose-jwt:4.41.1

* Rename EDL1 license file

* Fix inspection errors
2019-12-05 14:34:35 -08:00
Fangyuan Deng 187cf0dd3f [Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988)
* historical fast restart by lazy load columns metadata

* delete repeated code

* add documentation for druid.segmentCache.lazyLoadOnStart

* fix unit test fail

* fix spellcheck

* update docs

* update docs mentioning a catch
2019-12-03 09:47:01 -08:00
Jonathan Wei 00ce18a0ea
Additional Kinesis resharding fixes (#8870)
* Additional Kinesis resharding fixes

* Address PR comments

* Remove unused method

* Adjust SegmentTransactionalInsertAction null handling

* Check for unchanged metadata on empty publish

* Add logs for empty publish

* Fix javadoc

* Clear offset when invalid endOffsets are seen

* Fix LGTM alert

* Fix build

* Add resharding note to Kinesis docs

* Checkstyle

* Spelling

* Address PR comments

* Checkstyle
2019-11-28 12:59:01 -08:00
jon-wei dfbc066163 Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1"
This reverts commit a0f21d9b07.
2019-11-27 23:22:43 -08:00
jon-wei 0402ff85b8 Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit 8ffa71e7e6.
2019-11-27 23:22:32 -08:00
jon-wei 8ffa71e7e6 [maven-release-plugin] prepare for next development iteration 2019-11-27 23:18:48 -08:00
jon-wei a0f21d9b07 [maven-release-plugin] prepare release druid-0.16.1-incubating-rc1 2019-11-27 23:18:37 -08:00
Chi Cao Minh fba876b607 Update jackson to 2.9.10 (#8940)
Addresses security vulnerabilities:

- sonatype-2016-0397:
  https://github.com/FasterXML/jackson-core/issues/315

- sonatype-2017-0355:
  https://github.com/FasterXML/jackson-core/pull/322
2019-11-26 21:41:14 -08:00
Gian Merlino e0eb85ace7 Add FileUtils.createTempDir() and enforce its usage. (#8932)
* Add FileUtils.createTempDir() and enforce its usage.

The purpose of this is to improve error messages. Previously, the error
message on a nonexistent or unwritable temp directory would be
"Failed to create directory within 10,000 attempts".

* Further updates.

* Another update.

* Remove commons-io from benchmark.

* Fix tests.
2019-11-22 19:48:49 -08:00
SeKing 9955107e8e RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461) 2019-11-22 03:46:54 -08:00
Jihoon Son 934547a215
RetryingInputEntity to retry on transient errors (#8923)
* RetryingInputEntity to retry on transient errors

* fix some javadoc and httpEntity

* Make it interface

* Javadoc for offset
2019-11-21 21:32:18 -08:00
Chi Cao Minh ff6217365b Refactor parallel indexing perfect rollup partitioning (#8852)
* Refactor parallel indexing perfect rollup partitioning

Refactoring to make it easier to later add range partitioning for
perfect rollup parallel indexing. This is accomplished by adding several
new base classes (e.g., PerfectRollupWorkerTask) and new classes for
encapsulating logic that needs to be changed for different partitioning
strategies (e.g., IndexTaskInputRowIteratorBuilder).

The code is functionally equivalent to before except for the following
small behavior changes:

1) PartialSegmentMergeTask: Previously, this task had a priority of
   DEFAULT_TASK_PRIORITY. It now has a priority of
   DEFAULT_BATCH_INDEX_TASK_PRIORITY (via the new PerfectRollupWorkerTask
   base class), since it is a batch index task.

2) ParallelIndexPhaseRunner: A decorator was added to
   subTaskSpecIterator to ensure the subtasks are generated with unique
   ids. Previously, only tests (i.e., MultiPhaseParallelIndexingTest)
   would have this decorator, but this behavior is desired for non-test
   code as well.

* Fix forbidden apis and pmd warnings

* Fix analyze dependencies warnings

* Fix IndexTask json and add IT diags

* Fix parallel index supervisor<->worker serde

* Fix TeamCity inspection errors/warnings

* Fix TeamCity inspection errors/warnings again

* Integrate changes with those from #8823

* Address review comments

* Address more review comments

* Fix forbidden apis

* Address more review comments
2019-11-20 17:24:12 -08:00
Jihoon Son ac6d703814 Support inputFormat and inputSource for sampler (#8901)
* Support inputFormat and inputSource for sampler

* Cleanup javadocs and names

* fix style

* fix timed shutoff input source reader

* fix timed shutoff input source reader again

* tidy up timed shutoff reader

* unused imports

* fix tc
2019-11-20 14:51:25 -08:00
Gian Merlino c44452f0c1 Tidy up lifecycle, query, and ingestion logging. (#8889)
* Tidy up lifecycle, query, and ingestion logging.

The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.

Concretely, this patch does the following:

- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
  goal of reducing redundancy and improving clarity by avoiding
  showing rarely-useful log messages. This includes most "starting"
  and "stopping" messages, and most messages related to individual
  columns.
- Adds new log4j2 templates that show operators how to enabled DEBUG
  logging for certain important packages.
- Eliminate stack traces for query errors, unless log level is DEBUG
  or more. This is useful because query errors often indicate user
  error rather than system error, but dumping stack trace often gave
  operators the impression that there was a system failure.
- Adds task id to Appenderator, AppenderatorDriver thread names. In
  the default log4j2 configuration, this will put them in log lines
  as well. It's very useful if a user is using the Indexer, where
  multiple tasks run in the same JVM.
- More consistent terminology when it comes to "sequences" (sets of
  segments that are handed-off together by Kafka ingestion) and
  "offsets" (cursors in partitions). These terms had been confused in
  some log messages due to the fact that Kinesis calls offsets
  "sequence numbers".
- Replaces some ugly toString calls with either the JSONification or
  something more operator-accessible (like a URL or segment identifier,
  instead of JSON object representing the same).

* Adjustments.

* Adjust integration test.
2019-11-19 13:57:58 -08:00
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3:
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12:
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645:
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Vadim Ogievetsky 17d773dca2 Web console: replace (and remove) old consoles (#8838)
* first steps

* clean licenses

* fix capabilities

* fix specs

* more tests

* new web console on coordinator and overlord, remove setup for old consoles, old configs

* better message

* update licenses

* sync license files

* more button

* fix tslint issue

* jetty-rewrite dependency to add redirects for old console paths

* put dependency in the right place

* fix overlord detection

* fix notices, dedupe licenses

* make segment timeline work in no SQL mode

* update license

* revert hard coded coordinator mode from testing

* update restricted mode copy
2019-11-15 19:45:14 -08:00
Jihoon Son 1611792855
Add InputSource and InputFormat interfaces (#8823)
* Add InputSource and InputFormat interfaces

* revert orc dependency

* fix dimension exclusions and failing unit tests

* fix tests

* fix test

* fix test

* fix firehose and inputSource for parallel indexing task

* fix tc

* fix tc: remove unused method

* Formattable

* add needsFormat(); renamed to ObjectSource; pass metricsName for reader

* address comments

* fix closing resource

* fix checkstyle

* fix tests

* remove verify from csv

* Revert "remove verify from csv"

This reverts commit 1ea7758489.

* address comments

* fix import order and javadoc

* flatMap

* sampleLine

* Add IntermediateRowParsingReader

* Address comments

* move csv reader test

* remove test for verify

* adjust comments

* Fix InputEntityIteratingReader

* rename source -> entity

* address comments
2019-11-15 09:22:09 -08:00
Clint Wylie 7aafcf8bca parallel broker merges on fork join pool (#8578)
* sketch of broker parallel merges done in small batches on fork join pool

* fix non-terminating sequences, auto compute parallelism

* adjust benches

* adjust benchmarks

* now hella more faster, fixed dumb

* fix

* remove comments

* log.info for debug

* javadoc

* safer block for sequence to yielder conversion

* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool

* smooth yield rate adjustment, more logs to help tune

* cleanup, less logs

* error handling, bug fixes, on by default, more parallel, more tests

* remove unused var

* comments

* timeboundary mergeFn

* simplify, more javadoc

* formatting

* pushdown config

* use nanos consistently, move logs back to debug level, bit more javadoc

* static terminal result batch

* javadoc for nullability of createMergeFn

* cleanup

* oops

* fix race, add docs

* spelling, remove todo, add unhandled exception log

* cleanup, revert unintended change

* another unintended change

* review stuff

* add ParallelMergeCombiningSequenceBenchmark, fixes

* hyper-threading is the enemy

* fix initial start delay, lol

* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2

* fix those important style issues with the benchmarks code

* lazy sequence creation for benchmarks

* more benchmark comments

* stable sequence generation time

* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs

* add jmh thread based benchmarks, cleanup some stuff

* oops

* style

* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose

* retool benchmark to allow modeling more typical heterogenous heavy workloads

* spelling

* fix

* refactor benchmarks

* formatting

* docs

* add maxThreadStartDelay parameter to threaded benchmark

* why does catch need to be on its own line but else doesnt
2019-11-07 11:58:46 -08:00
Zhenxiao Luo a9aa416c3d In DirectDruidClient, don't run Future cancellation listener in… (#8700)
* In DirectDruidClient, don't run Future cancellation listener in HTTP library executor

* extract cancelQuery as a method of DirectDruidClient

* Fix testCancel

* Add exception as the first argument to log.error
2019-11-07 21:12:18 +03:00
Roman Leventov 5c0fc0a13a Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564)
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals()

* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever

* Fix tests

* More fixes

* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code

* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase

* More test fixes

* More test fixes

* Add a comment to VersionedIntervalTimelineTestBase

* Fix tests

* Set DataSegment.size(0) in more tests

* Specify DataSegment.size(0) in more places in tests

* Fix more tests

* Fix DruidSchemaTest

* Set DataSegment's size in more tests and benchmarks

* Fix HdfsDataSegmentPusherTest

* Doc changes addressing comments

* Extended doc for visibility

* Typo

* Typo 2

* Address comment
2019-11-06 11:07:04 -08:00
Jihoon Son 511fa74fa2 Move maxFetchRetry to FetchConfig; rename OpenObject (#8776) 2019-11-04 08:26:33 -08:00
yuanli bca649e492 Case sensitive comparison of nonbinary string in MySQL metadata storage (#8758) 2019-10-30 20:48:08 -07:00
Clint Wylie 3ff5e02237 remove select query (#8739)
* remove select query

* thanks teamcity

* oops

* oops

* add back a SelectQuery class that throws RuntimeExceptions linking to docs

* adjust text

* update docs per review

* deprecated
2019-10-30 19:29:56 -07:00
Jihoon Son 094936ca03 Remove commit() method Firehose (#8688)
* Remove commit() method Firehose

* fix javadoc
2019-10-23 16:52:02 -07:00
Jihoon Son 2518478b20 Remove deprecated parameter for Checkpoint request (#8707)
* Remove deprecated parameter for Checkpoint request

* fix wrong doc
2019-10-23 16:51:16 -07:00
Jihoon Son f5b9bf5525 Cluster-wide configuration for query vectorization (#8657)
* Cluster-wide configuration for query vectorization

* add doc

* fix build

* fix doc

* rename to QueryConfig and add javadoc

* fix checkstyle

* fix variable names
2019-10-23 21:44:28 +08:00
Surekha 98f59ddd7e Add `sys.supervisors` table to system tables (#8547)
* Add supervisors table to SystemSchema

* Add docs

* fix checkstyle

* fix test

* fix CI

* Add comments

* Fix javadoc teamcity error

* comments

* fix links in docs

* fix links

* rename fullStatus query param to system and remove it from docs
2019-10-18 15:16:42 -07:00
Jihoon Son 30c15900be
Auto compaction based on parallel indexing (#8570)
* Auto compaction based on parallel indexing

* javadoc and doc

* typo

* update spell

* addressing comments

* address comments

* fix log

* fix build

* fix test

* increase default max input segment bytes per task

* fix test
2019-10-18 13:24:14 -07:00
Mingming Qiu 2c758ef5ff Support assign tasks to run on different categories of MiddleManagers (#7066)
* Support assign tasks to run on different tiers of MiddleManagers

* address comments

* address comments

* rename tier to category and docs

* doc

* fix doc

* fix spelling errors

* docs
2019-10-17 12:57:19 -07:00
Aditya 75527f09cd implement FiniteFirehoseFactory in InlineFirehose (#8682)
* implement FiniteFirehoseFactory in InlineFirehose

* override isSplittable in InlineFirehoseFactory & improve tests
2019-10-16 19:28:55 -07:00
Jihoon Son 4046c86d62
Stateful auto compaction (#8573)
* Stateful auto compaction

* javaodc

* add removed test back

* fix test

* adding indexSpec to compactionState

* fix build

* add lastCompactionState

* address comments

* extract CompactionState

* fix doc

* fix build and test

* Add a task context to store compaction state; add javadoc

* fix it test
2019-10-15 22:57:42 -07:00
Sashidhar Thallam 124efa85f6 Fix for RoundRobinStorageLocationSelectorStrategy not to pick the same storage location over and again (#8634)
* Fix for 8614: iterators returned by strategy don't share iteration state.

* Adding comments
2019-10-12 09:53:52 +08:00
Jihoon Son 96d8523ecb Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571)
* IOConfig for compaction task

* add javadoc, doc, unit test

* fix webconsole test

* add spelling

* address comments

* fix build and test

* address comments
2019-10-09 11:12:00 -07:00
Chi Cao Minh b6b5517c20 Speed up ParallelIndexSupervisorTask tests (#8633)
Previously, some tests for ParallelIndexSupervisorTask were being run
twice unnecessarily.
2019-10-08 19:56:12 -07:00
Nishant Bangarwa 0853273091 Add tier based usage metrics for historical nodes to help with autoscaling (#8636)
* Add tier based usage metrics for historical nodes to help with druid historical autoscaling

Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful -

tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier
tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier
tier/historical/count - no. of historical nodes available in each tier. Dimension - tier
tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier

* fix unit test failures
2019-10-08 19:55:32 -07:00
Himanshu 0f7e0ff030 CuratorDruidNodeDiscoveryProvider: do not ignore exception in listener execution and log it (#8616) 2019-10-02 08:50:28 -07:00
Sashidhar Thallam 51a7235ebc Making optimal usage of multiple segment cache locations (#8038)
* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations

* Fixing indentation

* WIP

* Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy

* fixing code style

* Fixing test

* Adding a method visible only for testing, fixing tests

* 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes

* fixing the conditional statement

* Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy

* to trigger CI build

* Add documentation for the selection strategy configuration

* to re trigger CI build

* updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronzied method, other minor fixes

* In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail

* Implementing review comments. Added tests for StorageLocationSelectorStrategy

* Checkstyle fixes

* Adding java doc comments for StorageLocationSelectorStrategy interface

* checkstyle

* empty commit to retrigger build

* Empty commit

* Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file

* Impl review comments including updating docs as suggested

* Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose

* Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy

* Fixing the round robin iterator

* Removed numLocationsToTry, updated java docs

* changing property attribute value from tier to type

* Fixing assert messages
2019-09-28 00:17:44 -06:00
Faxian Zhao e1b4a3ab71 bug fix for lookup leak when we remove the last lookup from lookup tier (#8598)
* bug fix for lookup leak when we remove the last lookup from lookup tier

* warnings about lookups that will never be loaded

* fix unit test
2019-09-27 03:55:02 -07:00
Clint Wylie 7781820dea JsonParserIterator.init future timeout (#8550)
* add timeout support for JsonParserIterator init future

* add queryId

* should be less than 1

* fix

* fix npe

* fix lgtm

* adjust exception, nullable

* fix test

* refactor

* revert queryId change

* add log.warn to tie exception to json parser iterator
2019-09-27 09:13:37 +09:00
Fangyuan Deng a280c5dc03 fix queuedSize not decrease in HttpLoadQueuePeon when load failed (#8596) 2019-09-26 08:48:00 -07:00
Nishant Bangarwa a75ddaad9e Add TrustedDomain Authenticator (#8248)
* Add TrustedDomain Authenticator

update javadoc

Add nullable annotations

Add cautionary note

fix travis failure

* add IP to spell checker
2019-09-25 11:25:03 -07:00
Clint Wylie eabddffd6e fix http firehose factory leaky connection in constructor (#8576)
* fix http firehose factory leaky connection in constructor

* stylin
2019-09-24 17:08:43 -06:00
SandishKumarHN ade8d1922d #8156 : StructuralSearchInspection, Prohibit check on Thread.ge… (#8394)
* StructuralSearchInspection, Prohibit check on Thread.getState()

* review changes - 1

* review changes 2

* review changes 3

* test fix

* review changes-2

* review changes-3
2019-09-22 14:12:05 +03:00
Chi Cao Minh 187b507b3d Fix CuratorModule flaky test (#8562)
For CuratorModuleTest. exitsJvmWhenMaxRetriesExceeded(), the expected
log message is intermittently not the first one in the list of captured
log messages. For example, it is the second one in
https://travis-ci.org/apache/incubator-druid/jobs/586792178#L754.
2019-09-20 13:36:22 -07:00
Chi Cao Minh 5f39ee21ff Add diags for flaky CuratorModuleTest (#8528)
Add diagnostic messages to debug
CuratorModuleTest.exitsJvmWhenMaxRetriesExceeded() intermittent test
failures.
2019-09-14 11:55:26 -07:00
Jihoon Son 762f4d0e58 Check targetCompactionSizeBytes to search for candidate segments in auto compaction (#8495)
* Check targetCompactionSizeBytes to search for candidate segments in auto compaction

* fix logs

* add javadoc

* rename
2019-09-09 23:11:08 -07:00
Chi Cao Minh 5f61374cb3 Fix dependency analyze warnings (#8230)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.

* Address review comments

* Adjust scope for org.glassfish.jaxb:jaxb-runtime

* Fix dependencies for hdfs-storage

* Consolidate netty4 versions
2019-09-09 14:37:21 -07:00
Chi Cao Minh 14a8613d69 Exit JVM on curator unhandled errors (#8458)
* Exit JVM on curator unhandled errors

If an unhandled error occurs when curator is talking to ZooKeeper, exit
the JVM in addition to stopping the lifecycle to prevent the process
from being left in a zombie state. With this change,
BoundedExponentialBackoffRetryWithQuit is no longer needed as when
curator exceeds the configured retries, it triggers its unhandled error
listeners. A new "connectionTimeoutMs" CuratorConfig setting is added
mostly to facilitate testing curator unhandled errors, but it may be
useful for users as well.

* Address review comments
2019-09-06 16:43:59 -07:00
Rye 645799f977 disallow whitespace characters except space in data source names (#8465)
* disallow whitespace characters in data source names

* wrapped preconditions in a function, and simplify unit tests code

* Fixed regex to allow space, simplified repeat logic

* Fixed import style against mvn checkstyle

* Add msg in case test fails, use emptyMap(), improved naming

* Changes on assertion functions

* change wording of "whitespace" to "whitespace except space" to avoid misleading
2019-09-06 08:55:21 -07:00
Clint Wylie c73a489335
bump master version to 0.17.0-incubating-SNAPSHOT (#8421) 2019-08-28 01:58:36 -07:00
Himanshu 4d87a19547
Logging emitter to publish query and other metric events as valid json objects (#8359)
* LoggingEmitter: print event as json

* use DefaultRequestLogEventBuilderFactory in emitting request logger by default

* print context in query metric as json

* removed unused jsonMapper from DefaultQueryMetrics

* add comment

* remove change to DefaultRequestLogEventBuilderFactory.java
2019-08-27 15:00:23 -07:00
Jihoon Son e5ef5ddafa Fix the shuffle with TLS enabled for parallel indexing; add an integration test; improve unit tests (#8350)
* Fix shuffle with tls enabled; add an integration test; improve unit tests

* remove debug log

* fix tests

* unused import

* add javadoc

* rename to getContent
2019-08-26 19:27:41 -07:00
Xavier Léauté d5e3c53e74 workaround for Guava 16 bug using Java 9 and above 2019-08-25 00:58:59 -04:00
SandishKumarHN 33f0753a70 Add Checkstyle for constant name static final (#8060)
* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* merging with upstream

* review-1

* unknow changes

* unknow changes

* review-2

* merging with master

* review-2 1 changes

* review changes-2 2

* bug fix
2019-08-23 13:13:54 +03:00
Jonathan Wei 96e2142ea3 Cleanup appenderators and segment walkers in UnifiedIndexerAppenderatorsManager (#8287)
* Cleanup Appenderators in UnifiedIndexerAppenderatorsManager

* PR comments

* More PR comments

* Fix test
2019-08-22 12:18:46 -07:00
Jihoon Son 22d6384d36
Fix unrealistic test variables in KafkaSupervisorTest and tidy up unused variable in checkpointing process (#7319)
* Fix unrealistic test arguments in KafkaSupervisorTest

* remove currentCheckpoint from checkpoint action

* rename variable
2019-08-21 10:58:22 -07:00
Benedict Jin 14a4238381 Bump JUnitParams from 1.0.4 to 1.1.1 (#8017) 2019-08-20 16:15:12 -07:00
Fokko Driesprong cb1339e19a Bump derby from 10.11.1.1 to 10.14.2.0 (#8292)
* Bump derby from 10.11.1.1 to 10.15.1.3

* Update server/pom.xml as well

* Move to derby 10.14.2.0

10.15.* is Java9+
https://db.apache.org/derby/derby_downloads.html
2019-08-20 14:03:32 -07:00
pphust ffcbd1ecb8 Ensure ReferenceCountingSegment.decrement() is invoked correctly (#8323)
* fix issue8291. Make sure ReferenceCountingSegment.decrement() is invoked correctly

* add some comments in SinkQuerySegmentWalker.java

* extracting perSegmentRunners and perHydrantRunners to reduce the level of nesting
2019-08-20 22:01:16 +03:00
Benedict Jin 781873ba53 Fix resource leak (#8337)
* Fix resource leak

* Patch comments
2019-08-20 12:55:41 +03:00
Benedict Jin 566dc8c719
Fix missing format argument (#8331) 2019-08-19 16:19:44 +08:00
Jihoon Son 5dac6375f3
Add support for parallel native indexing with shuffle for perfect rollup (#8257)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* Add support for parallel native indexing with shuffle for perfect rollup.

* Add comment about volatiles

* fix test

* fix test

* handling missing exceptions

* more clear javadoc for stopGracefully

* unused import

* update javadoc

* Add missing statement in javadoc

* address comments; fix doc

* add javadoc for isGuaranteedRollup

* Rename confusing variable name and fix typos

* fix typos; move fetch() to a better home; fix the expiration time

* add support https
2019-08-15 17:43:35 -07:00
Fokko Driesprong 1a3aa1cfc0 Bump commons-io from 2.5 to 2.6 (#8006)
* Bump commons-io from 2.5 to 2.6

* Update licenses.yaml

* Address comments
2019-08-13 17:10:37 -07:00
Jihoon Son a5c9c2950f Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274)
* Add missing tuningConfigs for auto compaciton

* Add doc

* add test
2019-08-13 14:10:26 -05:00
Jihoon Son 312cdc2452 Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks (#8236)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* handling missing exceptions

* more clear javadoc for stopGracefully

* update javadoc

* Add missing statement in javadoc

* typo
2019-08-12 19:42:06 -05:00
Clint Wylie 1054d85171
add mechanism to control filter optimization in historical query processing (#8209)
* add support for mechanism to control filter optimization in historical query processing

* oops

* adjust

* woo

* javadoc

* review comments

* fix

* default

* oops

* oof

* this will fix it

* more nullable, refactor DimFilter.getRequiredColumns to use Set, formatting

* extract class DimFilterToStringBuilder with common code from custom DimFilter toString implementations

* adjust variable naming

* missing nullable

* more nullable

* fix javadocs

* nullable

* address review comments

* javadocs, precondition

* nullable

* rename method to be consistent

* review comments

* remove tuning from ColumnComparisonFilter/ColumnComparisonDimFilter
2019-08-09 16:36:18 -07:00
Jonathan Wei e88bbe71c0 Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255) 2019-08-07 23:04:07 -07:00
Jihoon Son 8fa114c349 Fix bugs in overshadowableManager and add unit tests (#8222)
* Fix bugs in overshadowableManager and add unit tests

* Fix SegmentManager

* add segment manager test

* Address comments

* Address comments
2019-08-07 15:51:21 -05:00
Jonathan Wei 5e57492298 Add docs for CliIndexer as an experimental feature (#8245)
* Experimental CliIndexer docs

* PR comments
2019-08-06 15:57:17 -07:00
Lucas Capistrant e252abedc5 Enable toggling request logging on/off for different query types (#7562)
* Enable ability to toggle SegmentMetadata request logging on/off

* Move SegmentMetadata query log filter to FilteredRequestLogger

* Update documentation to reflect the segment metadata flag moving to the filtered request logger

* Modify patch to allow blacklist of query types to not log to request logger

* Address styling and naming requests following latest code review

* Fix indentation on multiple locations per Druid style rules
2019-08-06 15:47:30 +03:00
Jihoon Son ab5b3be6c6 Add shuffleSegmentPusher for data shuffle (#8115)
* Fix race between canHandle() and addSegment() in StorageLocation

* add comment

* Add shuffleSegmentPusher which is a dataSegmentPusher used for writing shuffle data in local storage.

* add comments

* unused import

* add comments

* fix test

* address comments

* remove <p> tag from javadoc

* address comments

* comparingLong

* Address comments

* fix test
2019-08-05 13:38:35 -07:00
Eugene Sevastianov 3f3162b85e Enum of ResponseContext keys (#8157)
* Refactored ResponseContext and aggregated its keys into Enum

* Added unit tests for ResponseContext and refactored the serialization

* Removed unused methods

* Fixed code style

* Fixed code style

* Fixed code style

* Made SerializationResult static

* Updated according to the PR discussion:

Renamed an argument

Updated comparator

Replaced Pair usage with Map.Entry

Added a comment about quadratic complexity

Removed boolean field with an expression

Renamed SerializationResult field

Renamed the method merge to add and renamed several context keys

Renamed field and method related to scanRowsLimit

Updated a comment

Simplified a block of code

Renamed a variable

* Added JsonProperty annotation to renamed ScanQuery field

* Extension-friendly context key implementation

* Refactored ResponseContext: updated delegate type, comments and exceptions

Reducing serialized context length by removing some of its'
collection elements

* Fixed tests

* Simplified response context truncation during serialization

* Extracted a method of removing elements from a response context and
added some comments

* Fixed typos and updated comments
2019-08-03 12:05:21 +03:00
Fokko Driesprong 91743eeebe Spotbugs: NP_NONNULL_PARAM_VIOLATION (#8129) 2019-08-02 19:20:22 +03:00
Chi Cao Minh 7783b31846 Add IPv4 druid expressions (#8197)
* Add IPv4 druid expressions

New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string

These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:

1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4adress_stringify to convert to (readable) string at query
time

* Fix licenses and null handling

* Simplify IPv4 expressions

* Fix tests

* Fix check for valid ipv4 address string
2019-08-01 11:45:04 -07:00
Jonathan Wei 41893d4647 Simple memory allocation for CliIndexer tasks (#8201)
* Simple memory allocation for CliIndexer

* PR comments

* Checkstyle
2019-08-01 10:22:41 +08:00
Gian Merlino 77297f4e6f GroupBy array-based result rows. (#8196)
* GroupBy array-based result rows.

Fixes #8118; see that proposal for details.

Other than the GroupBy changes, the main other "interesting" classes are:

- ResultRow: The array-based result type.
- BaseQuery: T is no longer required to be Comparable.
- QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization
  and deserialization of result rows (necessary due to their positional nature).
- QueryResource: Uses the new decoration functionality.
- DirectDruidClient: Also uses the new decoration functionality.
- QueryMaker (in Druid SQL): Modifications to read ResultRows.

These classes weren't changed, but got some new javadocs:

- BySegmentQueryRunner
- FinalizeResultsQueryRunner
- Query

* Adjustments for TC stuff.
2019-07-31 16:15:12 -07:00
Fokko Driesprong faf51107d5 Add SuppressWarnings SS_SHOULD_BE_STATIC (#8138)
* Spotbugs: SS_SHOULD_BE_STATIC (#8073)

* Add SuppressWarnings SS_SHOULD_BE_STATIC

Fixes #8073

* Fix the voilation

* Make them non-final

* Remove @Nonnull
2019-07-31 19:44:42 +03:00
Jihoon Son 385f492a55
Use PartitionsSpec for all task types (#8141)
* Use partitionsSpec for all task types

* fix doc

* fix typos and revert to use isPushRequired

* address comments

* move partitionsSpec to core

* remove hadoopPartitionsSpec
2019-07-30 17:24:39 -07:00
Clint Wylie 653b558134 sql firehose and firehose doc adjustments (#8067)
* firehose doc adjustments

* fix typo

* additional information on parser types in ingestion docs

* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages

* fixit

* make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers

* transforms

* fix tests
2019-07-30 15:28:10 -07:00
Fokko Driesprong e016995d1f Enable Spotbugs: WMI_WRONG_MAP_ITERATOR (#8005)
* WMI_WRONG_MAP_ITERATOR

* Fixed missing loop
2019-07-30 19:51:53 +03:00
Jonathan Wei 640b7afc1c Add CliIndexer process type and initial task runner implementation (#8107)
* Add CliIndexer process type and initial task runner implementation

* Fix HttpRemoteTaskRunnerTest

* Remove batch sanity check on PeonAppenderatorsManager

* Fix paralle index tests

* PR comments

* Adjust Jersey resource logging

* Additional cleanup

* Fix SystemSchemaTest

* Add comment to LocalDataSegmentPusherTest absolute path test

* More PR comments

* Use Server annotated with RemoteChatHandler

* More PR comments

* Checkstyle

* PR comments

* Add task shutdown to stopGracefully

* Small cleanup

* Compile fix

* Address PR comments

* Adjust TaskReportFileWriter and fix nits

* Remove unnecessary closer

* More PR comments

* Minor adjustments

* PR comments

* ThreadingTaskRunner: cancel  task run future not shutdownFuture and remove thread from workitem
2019-07-29 17:06:33 -07:00
Chi Cao Minh ab71a2e1e4 Revert "Fix dependency analyze warnings (#8128)" (#8189)
This reverts commit 5dd0d8e873.
2019-07-29 11:42:16 -07:00
Jihoon Son adf7bafb9f Fix race between canHandle() and addSegment() in StorageLocation (#8114)
* Fix race between canHandle() and addSegment() in StorageLocation

* add comment

* add comments

* fix test

* address comments

* remove <p> tag from javadoc

* address comments

* comparingLong
2019-07-27 11:11:06 +03:00
Chi Cao Minh 5dd0d8e873 Fix dependency analyze warnings (#8128)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.

* Fix licenses and dependencies

* Fix licenses and dependencies again

* Fix integration test dependency

* Address review comments

* Fix unit test dependencies

* Fix integration test dependency

* Fix integration test dependency again

* Fix integration test dependency third time

* Fix integration test dependency fourth time

* Fix compile error

* Fix assert package
2019-07-26 10:49:03 -07:00
Parag Jain 31a29d8883 add noop type name to prevent jackson exception when setting type to noop (#8133) 2019-07-25 16:07:08 -07:00
Jihoon Son db14946207
Add support minor compaction with segment locking (#7547)
* Segment locking

* Allow both timeChunk and segment lock in the same gruop

* fix it test

* Fix adding same chunk to atomicUpdateGroup

* resolving todos

* Fix segments to lock

* fix segments to lock

* fix kill task

* resolving todos

* resolving todos

* fix teamcity

* remove unused class

* fix single map

* resolving todos

* fix build

* fix SQLMetadataSegmentManager

* fix findInputSegments

* adding more tests

* fixing task lock checks

* add SegmentTransactionalOverwriteAction

* changing publisher

* fixing something

* fix for perfect rollup

* fix test

* adjust package-lock.json

* fix test

* fix style

* adding javadocs

* remove unused classes

* add more javadocs

* unused import

* fix test

* fix test

* Support forceTimeChunk context and force timeChunk lock for parallel index task if intervals are missing

* fix travis

* fix travis

* unused import

* spotbug

* revert getMaxVersion

* address comments

* fix tc

* add missing error handling

* fix backward compatibility

* unused import

* Fix perf of versionedIntervalTimeline

* fix timeline

* fix tc

* remove remaining todos

* add comment for parallel index

* fix javadoc and typos

* typo

* address comments
2019-07-24 17:35:46 -07:00
Clint Wylie 0695e487e7 fix issue with CuratorLoadQueuePeon shutting down executors it does not own (#8140)
* fix issue with CuratorLoadQueuePeon shutting down executors it does not own

* use lifecycled executors

* maybe this
2019-07-24 10:59:43 -07:00
Eugene Sevastianov 799d20249f Response context refactoring (#8110)
* Response context refactoring

* Serialization/Deserialization of ResponseContext

* Added java doc comments

* Renamed vars related to ResponseContext

* Renamed empty() methods to createEmpty()

* Fixed ResponseContext usage

* Renamed multiple ResponseContext static fields

* Added PublicApi annotations

* Renamed QueryResponseContext class to ResourceIOReaderWriter

* Moved the protected method below public static constants

* Added createEmpty method to ResponseContext with DefaultResponseContext creation

* Fixed inspection error

* Added comments to the ResponseContext length limit and ResponseContext
http header name

* Added a comment of possible future refactoring

* Removed .gitignore file of indexing-service

* Removed a never-used method

* VisibleForTesting method reducing boilerplate

Co-Authored-By: Clint Wylie <cjwylie@gmail.com>

* Reduced boilerplate

* Renamed the method serialize to serializeWith

* Removed unused import

* Fixed incorrectly refactored test method

* Added comments for ResponseContext keys

* Fixed incorrectly refactored test method

* Fixed IntervalChunkingQueryRunnerTest mocks
2019-07-24 18:29:03 +03:00
Clint Wylie 0388581493
Revert "Spotbugs: SS_SHOULD_BE_STATIC (#8073)" (#8145)
This reverts commit 04a180a5fb.
2019-07-23 22:57:19 -07:00
Clint Wylie 83514958db remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116)
* remove unecessary lock in ForegroundCachePopulator leading to a lot of contention

* mutableboolean, javadocs,document some cache configs that were missing

* more doc stuff

* adjustments

* remove background documentation
2019-07-23 10:57:59 -07:00
Fokko Driesprong 04a180a5fb Spotbugs: SS_SHOULD_BE_STATIC (#8073) 2019-07-23 18:18:49 +08:00
Fokko Driesprong e1a745717e Spotbugs: NP_STORE_INTO_NONNULL_FIELD (#8021) 2019-07-21 21:23:47 +08:00
Clint Wylie f24e2f16af fix npe with sql metadata manager polling and empty database (#8106)
* fix npe with sql metadata manager polling and empty database

* treat null segments separately

* use preconditions check

* add test
2019-07-20 19:09:02 -07:00
Himanshu 54a7b54d2d avoid 'must return non-void type' warning (#8105) 2019-07-18 15:02:27 -07:00
Jihoon Son c7eb7cd018
Add intermediary data server for shuffle (#8088)
* Add intermediary data server for shuffle

* javadoc

* adjust timeout

* resolved todo

* fix test

* style

* address comments

* rename to shuffleDataLocations

* Address comments

* bit adjustment StorageLocation

* fix test

* address comment & fix test

* handle interrupted exception
2019-07-18 14:46:47 -07:00
Clint Wylie 03e55d30eb
add CachingClusteredClient benchmark, refactor some stuff (#8089)
* add CachingClusteredClient benchmark, refactor some stuff

* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select

* adjust javadoc

* adjustments

* oops

* use it

* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments

* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
2019-07-18 13:16:28 -07:00
Sashidhar Thallam 72496d3712 #7858 Throwing UnsupportedOperationException from ImmutableDrui… (#7933)
* #7858 Throwing UnsupportedOperationException from ImmutableDruidDataSource's equals() and hashCode() methods.

* 1. Turning ImmutableDruidDataSource into a data container. 2. Adding a Util method to be used in tests for checking equality of ImmutableDruidDataSource objects.

* Removing unused method

* Fixing assert equals

* Fixing assert equals in TestUtils.java

* Adding java doc comments, Using ExpectedException in tests

* Fixing test cases

* Fixed expected exception message in tests

* fixed line width

* line width fix

* code style fixes

* code indentation fixes

* fixing method name
2019-07-18 22:35:19 +03:00
Surekha da16144495 Refactoring to use `CollectionUtils.mapValues` (#8059)
* doc updates and changes to use the CollectionUtils.mapValues utility method

* Add Structural Search patterns to intelliJ

* refactoring from PR comments

* put -> putIfAbsent

* do single key lookup
2019-07-17 23:02:22 -07:00
Roman Leventov ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Clint Wylie 15fbf5983d add Class.getCanonicalName to forbidden-apis (#8086)
* add checkstyle to forbid unecessary use of Class.getCanonicalName

* use forbiddin-api instead of checkstyle

* add space
2019-07-16 15:21:50 -07:00
Chi Cao Minh da3d141dd2 Add inline firehose (#8056)
* Add inline firehose

To allow users to quickly parsing and schema, add a firehose that reads
data that is inlined in its spec.

* Address review comments

* Remove suppression of sonar warnings
2019-07-11 21:43:46 -07:00
Atul Mohan 631cda649b Include replicated segment size property for datasources endpoint (#8039)
* Add replication size

* Summon comma
2019-07-11 01:10:38 -07:00
Himanshu 14aec7fcec
add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)
* disable all compression in intermediate segment persists while ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Gian Merlino 338b8b3fef
SupervisorManager: Add authorization checks to bulk endpoints. (#8044)
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
2019-07-09 13:16:54 -07:00
Parag Jain 027291a90d set DRUID_AUTHORIZATION_CHECKED attribute for router endpoints (#8026)
* add state resource filter to router endpoints

* add RouterResource to ResourceFilter test framework
2019-07-09 00:51:36 -07:00
Sashidhar Thallam 6701dc08fe Making StatusResponseHandler singleton and fixing all its instantiation invocations (#7969)
* Making StatusResponseHandler singleton and fixing all its instantiation invocations

* Using StatusResponseHandler.getInstance() where applicable
2019-07-08 13:33:00 +05:30
Chi Cao Minh 1166bbcb75 Remove static imports from tests (#8036)
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.

Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
  Assert.assertEquals() were flipped)
2019-07-06 09:33:12 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Clint Wylie f7283378ac
remove deprecated standalone realtime node (#7915)
* remove CliRealtime, RealtimeManager, etc

* add redirects for deleted page to page that explains the deleted thing

* adjust docs
2019-07-02 18:12:17 -07:00
Fokko Driesprong c6baa59f77 Enable DLS_DEAD_LOCAL_STORE (#7967) 2019-06-28 04:39:42 +08:00
Fokko Driesprong 82b248cc17 Spotbugs: Enable MS_SHOULD_BE_FINAL (#7946) 2019-06-23 15:42:18 -07:00
SandishKumarHN e80297efef Set Test timeout higher for robust performance (#7890) 2019-06-17 22:01:54 -07:00
SandishKumarHN 01881e3a98 Use only com.google.errorprone.annotations.concurrent.GuardedBy, not javax.annotations.concurrent.GuardedBy (#7889) 2019-06-17 15:58:51 +02:00
Himanshu b3328b2785
endpoint to delete lookup tier and remove tier on last lookup deletion (#7852) 2019-06-15 17:55:50 -07:00
Sashidhar Thallam 3bee6adcf7 Use map.putIfAbsent() or map.computeIfAbsent() as appropriate instead of containsKey() + put() (#7764)
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()

* fixing indentation

* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate

* fixing checkstyle

* Changing the recommendation text

* Reverting auto changes made by IDE

* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap

* Removing unused import
2019-06-14 17:59:36 +02:00
Clint Wylie 3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Surekha ea752ef562 Optimize overshadowed segments computation (#7595)
* Move the overshadowed segment computation to SQLMetadataSegmentManager's poll

* rename method in MetadataSegmentManager

* Fix tests

* PR comments

* PR comments

* PR comments

* fix indentation

* fix tests

*  fix test

*  add test for SegmentWithOvershadowedStatus serde format

* PR comments

* PR comments

* fix test

* remove snapshot updates outside poll

* PR comments

* PR comments

* PR comments

*  removed unused import
2019-06-07 19:15:54 +02:00
Xue Yu d482da6e9b fix timestamp ceil lower bound bug (#7823) 2019-06-04 01:16:31 -07:00
Fokko Driesprong f2b00023f8 Bump Checkstyle to 8.21 (#7826) 2019-06-04 01:02:46 -07:00
Jihoon Son 61ec521135
Remove keepSegmentGranularity option for compaction (#7747)
* Remove keepSegmentGranularity option from compaction

* fix it test

* clean up

* remove from web console

* fix test
2019-06-03 12:59:15 -07:00
Justin Borromeo 8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Roman Leventov 782863ed0f Fix some problems reported by PVS-Studio (#7738)
* Fix some problems reported by PVS-Studio

* Address comments
2019-05-29 11:20:45 -07:00
Gian Merlino 7ec7257e1d Fix lookup serde on node types that don't load lookups. (#7752)
This includes the router, overlord, middleManager, and coordinator.
Does the following things:

- Loads LookupSerdeModule on MM, overlord, and coordinator.
- Adds LookupExprMacro to LookupSerdeModule, which allows these node
  types to understand that the 'lookup' function exists.
- Adds a test to make sure that LookupSerdeModule works for virtual
  columns, filters, transforms, and dimension specs.

This is implementing the technique discussed on these two issues:

- https://github.com/apache/incubator-druid/issues/7724#issuecomment-494723333
- https://github.com/apache/incubator-druid/pull/7082#discussion_r264888771
2019-05-24 12:30:49 -07:00
Merlin Lee 26fad7e06a Add checkstyle for "Local variable names shouldn't start with capital" (#7681)
* Add checkstyle for "Local variable names shouldn't start with capital"

* Adjust some local variables to constants

* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
2019-05-23 18:40:28 +02:00
Jonathan Wei 6901123a53 Fix compareAndSwap() in SQLMetadataConnector (#7661)
* Fix compareAndSwap() in SQLMetadataConnector

* Catch serialization_failure and retry for Postgres
2019-05-15 14:53:04 -07:00
Clint Wylie b87c8f0314 fix lookup editor to use lookup tiers instead of historical tiers (#7647)
* fix lookup editor to use lookup tiers instead of historical tiers

* use default tier if empty response, fix if configured lookups is null

* fixes

* fix typo
2019-05-14 13:30:51 -07:00
Fokko Driesprong 2aa9613bed Bump Checkstyle to 8.20 (#7651)
* Bump Checkstyle to 8.20

Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle

Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.

Affected versions: < 8.18

* Oops, missed one

* Oops, missed a few
2019-05-14 11:53:37 -07:00
Xavier Léauté 1d49364d08 Set direct memory if unable to detect JVM config (#7606)
* Set direct memory if unable to detect JVM config

Java 9 and above prevents us from detecting the maximum available direct
memory.

This change adds a fallback method to use at most 25% of maximum heap
size, which should be a reasonable default.

Unless -XX:MaxDirectMemorySize is set, recent JVMs will default maximum
direct memory to match the maximum heap size, so this should work out of
the box in most cases. For completeness we print instructions in the log
to explain how to adjust settings if necessary.

* skip test rather than succeeding

* reword log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>
2019-05-09 22:30:42 -07:00
Jihoon Son 18e0d6acb4 Fix resultLevelCache for timeseries with grandTotal (#7624)
* Fix resultLevelCache for timeseries with grandTotal

* Address comment

* fix test
2019-05-09 18:11:04 -07:00
Jonathan Wei dadf6a2f11
Add tool for migrating from local deep storage/Derby metadata (#7598)
* Add tool for migrating from local deep storage/Derby metadata

* Split deep storage and metadata migration docs

* Support import into Derby

* Fix create tables cmd

* Fix create tables cmd

* Fix commands

* PR comment

* Add -p
2019-05-06 23:39:40 -07:00
Xavier Léauté c58aa2f2ab
Remove unnecessary cast to URLClassLoader (#7603)
Java 9 and above will fail trying to cast the system classloader
2019-05-06 20:17:22 -07:00
Xavier Léauté f7bfe8f269
Update mocking libraries for Java 11 support (#7596)
* update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support
* update tests to use new easymock interfaces
* fix tests failing due to easymock fixes
* remove dependency on jmockit
* fix race condition in ResourcePoolTest
2019-05-06 12:28:56 -07:00
Samarth Jain afbcb9c07f Improve parallelism of zookeeper based segment change processing (#7088)
* V1 - improve parallelism of zookeeper based segment change processing

* Create zk nodes in batches. Address code review comments.
Introduce various configs.

* Add documentation for the newly added configs

* Fix test failures

* Fix more test failures

* Remove prinstacktrace statements

* Address code review comments

* Use a single queue

* Address code review comments

Since we have a separate load peon for every historical, just having a single SegmentChangeProcessor
task per historical is enough. This commit also gets rid of the associated config druid.coordinator.loadqueuepeon.curator.numCreateThreads

* Resolve merge conflict

* Fix compilation failure

* Remove batching since we already have a dynamic config maxSegmentsInNodeLoadingQueue that provides that control

* Fix NPE in test

* Remove documentation for configs that are no longer needed

* Address code review comments

* Address more code review comments

* Fix checkstyle issue

* Address code review comments

* Code review comments

* Add back monitor node remove executor

* Cleanup code to isolate null checks  and minor refactoring

* Change param name since it conflicts with member variable name
2019-05-03 15:58:42 +02:00
Jonathan Wei a013350018 Adjust required permissions for system schema (#7579)
* Adjust required permissions for system schema

* PR comments, fix current_size handling

* Checkstyle

* Set curr_size instead of current_size

* Adjust information schema docs

* Fix merge conflict

* Update tests
2019-05-02 07:18:02 -07:00
David Lim ec8562c885 Data loader (sampler component) (#7531)
* sampler initial check-in
fix checkstyle issues
add sampler fix to process CSV files from cache properly
change to composition and rename some classes
add tests and report num rows read and indexed
remove excludedByFilter flag and don't send filtered out data
fix tests to handle both settings for druid.generic.useDefaultValueForNull

* wrap sampler firehose in TimedShutoffFirehoseFactory to support timeouts

* code review changes - add additional comments, limit maxRows
2019-05-01 22:37:14 -07:00
Surekha 15d19f3059 Add is_overshadowed column to sys.segments table (#7425)
* Add is_overshadowed column to sys.segments table

* update docs

* Rename class and variables

* PR comments

* PR comments

* remove unused variables in MetadataResource

* move constants together

* add getFullyOvershadowedSegments method to ImmutableDruidDataSource

* Fix compareTo of SegmentWithOvershadowedStatus

* PR comment

* PR comments

* PR comments

* PR comments

* PR comments

* fix issue with already consumed stream

* minor refactoring

* PR comments
2019-05-01 18:00:57 +02:00
Xavier Léauté 6d4181191f replace jdk internal exceptions with closest publicly available one 2019-04-30 14:21:45 -07:00
Gian Merlino 7b8bc9a5ef EmitterModule: Throw an error on invalid emitter types. (#7328)
* EmitterModule: Throw an error on invalid emitter types.

The current behavior of silently using the "noop" emitter is unhelpful
and makes it difficult to debug config typos.

* Add comments.
2019-04-29 19:23:53 +02:00
Gian Merlino ce7298b51e BaseAppenderatorDriver: Fix potentially overeager segment cleanup. (#7558)
* BaseAppenderatorDriver: Fix potentially overeager segment cleanup.

Here is a thing that I think can go wrong:

1. We push some segments, then try to publish them transactionally.
2. The segments are actually published, but the 200 OK response gets
   lost (connection dropped, whatever).
3. We try again, and on the second try, the publish fails (because
   the transaction baseline start metadata no longer matches).
4. Because the publish failed, we delete the pushed segments.
5. But this is bad, because the publish didn't really fail, it actually
   succeeded in step 2.

I haven't seen this in the wild, but thought about it while
reviewing #7537.

This patch also cleans up logging a bit, making it more accurate and
somewhat less chatty.

* Avoid wrapping exceptions when not necessary.
2019-04-29 09:55:04 -07:00
Justin Borromeo 07dd742e35 Fix time-ordered scan queries on realtime segments (#7546)
* Initial commit

* Added test for int to long conversion

* Add appenderator test for realtime scan query

* get rid of todo

* Fix forbidden apis

* Jon's recommendations

* Formatting
2019-04-26 16:12:10 -07:00
Adam Peck ebdf07b69f Add reload by interval API (#7490)
* Add reload by interval API
Implements the reload proposal of #7439
Added tests and updated docs

* PR updates

* Only build timeline with required segments
Use 404 with message when a segmentId is not found
Fix typo in doc
Return number of segments modified.

* Fix checkstyle errors

* Replace String.format with StringUtils.format

* Remove return value

* Expand timeline to segments that overlap for intervals
Restrict update call to only segments that need updating.

* Only add overlapping enabled segments to the timeline

* Some renames for clarity
Added comments

* Don't rely on cached poll data
Only fetch required information from DB

* Match error style

* Merge and cleanup doc

* Fix String.format call

* Add unit tests

* Fix unit tests that check for overshadowing
2019-04-26 16:01:50 -07:00
Surekha 8308ffef1f API to drop data by interval (#7494)
* Add api to drop data by interval

* update to address comments

* unused imports

* PR comments + add tests in SQLMetadataSegmentManagerTest

*  update tests and docs
2019-04-25 14:24:40 -07:00
Jihoon Son c60e7feab8 Fix encoded taskId check in chatHandlerResource (#7520)
* Fix encoded taskId check in chatHandlerResource

* fix tests
2019-04-20 18:08:34 -07:00
Surekha c2a42e05bb Fix result-level cache for queries (#7325)
* Add SegmentDescriptor interval in the hash while calculating Etag

* Add computeResultLevelCacheKey to CacheStrategy

Make HavingSpec cacheable and implement getCacheKey for subclasses
Add unit tests for computeResultLevelCacheKey

* Add more tests

* Use CacheKeyBuilder for HavingSpec's getCacheKey

* Initialize aggregators map to avoid NPE

* adjust cachekey builder for HavingSpec to ignore aggregators

* unused import

* PR comments
2019-04-18 13:31:29 -07:00
Xavier Léauté 4322ce3303 Java 9 compatible cleaner operations (#7487)
Java 9 removed support for sun.misc.Cleaner in favor of
java.lang.ref.Cleaner. This change adds a thin abstraction to switch
between Cleaner implementations based on JDK version at runtime
2019-04-17 08:04:52 -07:00
Lucas Capistrant 8acad27d99 Enhance the Http Firehose to work with URIs requiring basic authentication (#7145)
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication

* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials

* Fix checkstyle failure in HttpFirehoseFactory.java

* Update docs and fix TeamCity build with required noinspection

* Indentation cleanup and logic modification for HttpFirehose object stream

* Remove default Empty string password provider in http firehose

* Add JavaDoc for MixIn describing its intended use

* Reverting documentation notation for json code to be inline with rest of doc

* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs

* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
2019-04-15 14:29:01 -07:00
Surekha 4654e1e851 Remove unnecessary collection (#7350)
From the discussion [here](https://github.com/apache/incubator-druid/pull/6901#discussion_r265741002)

Remove the collection and filter datasources from the stream. 
Also remove StreamingOutput and JsonFactory constructs.
2019-04-15 19:49:21 +02:00
Gian Merlino 3854cfd15e SQLMetadataSegmentManager: Comments, formatting adjustments (#7452)
Follow up to #7447.
2019-04-11 21:57:50 -07:00
Gian Merlino a517f8ce49 Coordinator: Allow dropping all segments. (#7447)
Removes the coordinator sanity check that prevents it from dropping all
segments. It's useful to get rid of this, since the behavior is
unintuitive for dev/testing clusters where users might regularly want
to drop all their data to get back to a clean slate.

But the sanity check was there for a reason: to prevent a race condition
where the coordinator might drop all segments if it ran before the
first metadata store poll finished. This patch addresses that concern
differently, by allowing methods in MetadataSegmentManager to return
null if a poll has not happened yet, and canceling coordinator runs
in that case.

This patch also makes the "dataSources" reference in
SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile
before, but it seems necessary to me: it's not final, and it's dereferenced
from multiple threads without synchronization.
2019-04-11 08:45:38 -07:00
Justin Borromeo 2771ed50b0 Support Kafka supervisor adopting running tasks between versions (#7212)
* Recompute hash in isTaskCurrent() and added tests

* Fixed checkstyle stuff

* Fixed failing tests

* Make TestableKafkaSupervisorWithCustomIsTaskCurrent static

* Add doc

* baseSequenceName change

* Added comment

* WIP

* Fixed imports

* Undid lambda change for diff sake

* Cleanup

* Added comment

* Reinsert Kafka tests

* Readded kinesis test

* Readd bad partition assignment in kinesis supervisor test

* Nit

* Misnamed var
2019-04-10 18:16:38 -07:00
Clint Wylie 76b4a5c62e refactor lookups to be more chill to router (#7222)
* refactor lookups to be more chill to router

* remove accidental change

* fix and combine LookupIntrospectionResourceTest

* fix inspection

* rename RouterLookupModule to LookupSerdeModule and RouterLookupExtractorFactoryContainerProvider to NoopLookupExtractorFactoryContainerProvider

* make comment generic

* use ConfigResourceFilter instead of StateResourceFilter

* fix indentation

* unused import

* another unused import

* refactor some stuff into processing module, split up LookupModule.java classes into their own files
2019-04-05 14:49:41 -07:00
Gian Merlino 78745fea84 Fix two issues with Coordinator -> Overlord communication. (#7412)
* Fix two issues with Coordinator -> Overlord communication.

1) ClientCompactQuery needs to recognize the potential for 'intervals'
to be set instead of 'segments'. The lack of this led to a
NullPointerException on DruidCoordinatorSegmentCompactor.java:102.

2) In two locations (DruidCoordinatorSegmentCompactor,
DruidCoordinatorCleanupPendingSegments) tasks were being retrieved
using waiting/pending/running tasks in the wrong order: by checking
'running' first and then 'pending', tasks could be missed if they
moved from 'pending' to 'running' in between the two calls. Replaced
these methods with calls to 'getActiveTasks', a new method that does
the calls in the right order.

* Remove unused import.
2019-04-04 10:25:18 -07:00
David Glasser 4e23c11345 Make IngestSegmentFirehoseFactory splittable for parallel ingestion (#7048)
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion

* Code review feedback

- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)

* Add missing @JsonProperty

* Integration test passes

* Add unit test

* Remove two FIXME comments from CompactionTask

I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.

* Updates from code review
2019-04-02 14:59:17 -07:00
Michael Trelinski 347779b17a Zookeeper loss (#6740)
* Update init

Fix bin/init to source from proper directory.

* Fix for Proposal #6518: Shutdown druid processes upon complete loss of ZK connectivity

* Zookeeper Loss:

- Add feature documentation
- Cosmetic refactors
- Variable extractions
- Remove getter

* - Change config key name and reword documentation
- Switch from Function<Void,Void> to Runnable/Lambda
- try { … } finally { … }

* Fix line length too long

* - change to formatted string for logging
- use System.err.println after lifecycle stops

* commenting on makeEnsembleProvider()-created Zookeeper termination

* Add javadoc

* added java doc reference back to apache discussion thread.

* move comment to other class

* favor two-slash comments instead of multiline comments
2019-03-29 15:10:42 -07:00
Jihoon Son 62c3e89266 maxTotalRows should be checked in DataSourceCompactionConfig before setting targetCompactionSizeBytes (#7368)
* maxTotalRows should be checked in DataSourceCompactionConfig before setting targetCompactionSizeBytes

* remove unnecessary default values

* remove flacky test

* fix build

* Add comments
2019-03-28 20:25:10 -07:00
Justin Borromeo ad7862c58a Time Ordering On Scans (#7133)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Stuff for time-ordered scan query

* Move ScanResultValue timestamp comparator to a separate class for testing

* Licensing stuff

* Change benchmark

* Remove todos

* Added TimestampComparator tests

* Change number of benchmark iterations

* Added time ordering to the scan benchmark

* Changed benchmark params

* More param changes

* Benchmark param change

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* nit

* Decrease segment size for less memory usage

* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order

* Wrote more tests for scan result value sort

* Committing a param change to kick teamcity

* Fixed codestyle and forbidden API errors

* .

* Improved conciseness

* nit

* Created an error message for when someone tries to time order a result
set > threshold limit

* Set to spaces over tabs

* Fixing tests WIP

* Fixed failing calcite tests

* Kicking travis with change to benchmark param

* added all query types to scan benchmark

* Fixed benchmark queries

* Renamed sort function

* Added javadoc on ScanResultValueTimestampComparator

* Unused import

* Added more javadoc

* improved doc

* Removed unused import to satisfy PMD check

* Small changes

* Changes based on Gian's comments

* Fixed failing test due to null resultFormat

* Added config and get # of segments

* Set up time ordering strategy decision tree

* Refactor and pQueue works

* Cleanup

* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues

* WIP

* Sequence stuff is so dirty :(

* Fixed bug introduced by replacing deque with list

* Wrote docs

* Multi-historical setup works

* WIP

* Change so batching only occurs on broker for time-ordered scans

Restricted batching to broker for time-ordered queries and adjusted
tests

Formatting

Cleanup

* Fixed mistakes in merge

* Fixed failing tests

* Reset config

* Wrote tests and added Javadoc

* Nit-change on javadoc

* Checkstyle fix

* Improved test and appeased TeamCity

* Sorry, checkstyle

* Applied Jon's recommended changes

* Checkstyle fix

* Optimization

* Fixed tests

* Updated error message

* Added error message for UOE

* Renaming

* Finish rename

* Smarter limiting for pQueue method

* Optimized n-way merge strategy

* Rename segment limit -> segment partitions limit

* Added a bit of docs

* More comments

* Fix checkstyle and test

* Nit comment

* Fixed failing tests -> allow usage of all types of segment spec

* Fixed failing tests -> allow usage of all types of segment spec

* Revert "Fixed failing tests -> allow usage of all types of segment spec"

This reverts commit ec470288c7.

* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"

This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.

* Check type of segment spec before using for time ordering

* Fix bug in numRowsScanned

* Fix bug messing up count of rows

* Fix docs and flipped boolean in ScanQueryLimitRowIterator

* Refactor n-way merge

* Added test for n-way merge

* Refixed regression

* Checkstyle and doc update

* Modified sequence limit to accept longs and added test for long limits

* doc fix

* Implemented Clint's recommendations
2019-03-28 14:37:09 -07:00
Charles Allen eeb3dbe79d Move GCP to a core extension (#6953)
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
2019-03-27 09:00:43 -07:00
Jihoon Son 543324f8a9 Fix logging in IndexerSQLMetadataStorageCoordinator (#7349) 2019-03-26 20:36:19 -07:00
Jihoon Son 4d37edac1e Suppress stack trace in warning (#7348) 2019-03-26 17:27:29 -07:00
Jihoon Son 5294277cb4
Fix exclusive start partitions for sequenceMetadata (#7339)
* Fix exclusvie start partitions for sequenceMetadata

* add empty check
2019-03-26 14:39:07 -07:00
Roman Leventov bca40dcdaf
Fix some IntelliJ inspections (#7273)
Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.
2019-03-25 21:11:01 -03:00
Jihoon Son f410c28af6
Always convert start metadata to start (#7332) 2019-03-22 21:12:15 -07:00
Jihoon Son 0c5dcf5586 Fix exclusivity for start offset in kinesis indexing service & check exclusivity properly in IndexerSQLMetadataStorageCoordinator (#7291)
* Fix exclusivity for start offset in kinesis indexing service

* some adjustment

* Fix SeekableStreamDataSourceMetadata

* Add missing javadocs

* Add missing comments and unit test

* fix SeekableStreamStartSequenceNumbers.plus and add comments

* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue

* Add javadocs

* fix compilation

* fix test

* remove unused variable
2019-03-21 13:12:22 -07:00
Roman Leventov dfd27e00c0
Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality (#7185)
* Avoid many  unnecessary materializations of collections of 'all segments in cluster' cardinality

* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()

* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()

* Style

* typo

* Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives

* Fixes
2019-03-19 18:22:56 -03:00
Jihoon Son e18d5d96d9 Ignore bad JSON entries in SQLMetadataSupervisorManager.getAll() (#7278) 2019-03-18 14:28:11 +08:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Atul Mohan 2daeb50008 Add support for optional client authentication on TLS (#7250)
* Add optional client auth

* Add docs
2019-03-15 15:14:34 -07:00
Furkan KAMACI 7ada1c49f9 Prohibit Throwables.propagate() (#7121)
* Throw caught exception.

* Throw caught exceptions.

* Related checkstyle rule is added to prevent further bugs.

* RuntimeException() is used instead of Throwables.propagate().

* Missing import is added.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.

* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.

* Throwable is kept before firing a Runtime Exception.

* Fix unused assignments.
2019-03-14 18:28:33 -03:00
Hongze Zhang f9d99b245b Add missing doc link for operations/http-compression.html; Fix magic numbers in test cases using JettyServerInitUtils.wrapWithDefaultGzipHandler (#7110) 2019-03-13 14:09:19 -07:00
Clint Wylie 3895914aa2 consolidate CompressionUtils.java since now in the same jar (#6908) 2019-03-13 11:02:44 -04:00
Clint Wylie 4d3987c1dd lifecycle stage refactor to ensure proper start and stop ordering of servers and announcements (#7234)
* lifecycle stage refactor to ensure proper ordering of servers and announcements

* move DerivativeDataSourceManager to Lifecycle.Stage.NORMAL
2019-03-12 07:09:03 -07:00
Jihoon Son e240fba247 Fix logs in SegmentLoaderLocalCacheManager (#7229) 2019-03-11 21:16:03 -07:00
Gian Merlino dcfca03718
More accurate RealtimeMetricsMonitor messages. (#7230)
The old messages did not reflect the full range of reasons why messages
could be thrown away.
2019-03-11 19:50:32 -04:00
Samarth Jain 8804bd0dc1 Remove unnecessary check for contains() in LoadRule (#7073)
See https://github.com/apache/incubator-druid/issues/7072
2019-03-11 13:52:46 -03:00
Clint Wylie 5cc171419c move jetty module to Lifecycle.Stage.LAST to allow graceful shutdown to work with lookups and stuff, put http-clint on lifecycle modules lifecycle (#7215) 2019-03-09 15:14:09 -08:00
Jihoon Son 9bebf113ba
Fix race in historical when loading segments in parallel (#7203)
* Fix race in historical when loading segments in parallel

* revert unnecessary change

* remove synchronized

* add reference counting locking

* fix build

* fix comment
2019-03-08 17:54:05 -08:00
Clint Wylie a44df6522c rename maintenance mode to decommission (#7154)
* rename maintenance mode to decommission

* review changes

* missed one

* fix straggler, add doc about decommissioning stalling if no active servers

* fix missed typo, docs

* refine docs

* doc changes, replace generals

* add explicit comment to mention suppressed stats for balanceTier

* rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs

* fix precondition check

* decommissioningMaxPercentOfMaxSegmentsToMove

* fix test

* fix test

* fixes
2019-03-08 16:33:51 -08:00
Roman Leventov 10c9f6d708
Fix and document concurrency of EventReceiverFirehose and TimedShutoffFirehose; Refine concurrency specification of Firehose (#7038)
#### `EventReceiverFirehoseFactory`
Fixed several concurrency bugs in `EventReceiverFirehoseFactory`:
 - Race condition over putting an entry into `producerSequences` in `checkProducerSequence()`.
 - `Stopwatch` used to measure time across threads, but it's a non-thread-safe class.
 - Use `System.nanoTime()` instead of `System.currentTimeMillis()` because the latter are [not suitable](https://stackoverflow.com/a/351571/648955)  for measuring time intervals.
 - `close()` was not synchronized by could be called from multiple threads concurrently.

Removed unnecessary `readLock` (protecting `hasMore()` and `nextRow()` which are always called from a single thread). Removed unnecessary `volatile` modifiers.

Documented threading model and concurrent control flow of `EventReceiverFirehose` instances.

**Important:** please read the updated Javadoc for `EventReceiverFirehose.addAll()`. It allows events from different requests (batches) to be interleaved in the buffer. Is this OK?

#### `TimedShutoffFirehoseFactory`
- Fixed a race condition that was possible because `close()` that was not properly synchronized.

Documented threading model and concurrent control flow of `TimedShutoffFirehose` instances.

#### `Firehose`

Refined concurrency contract of `Firehose` based on `EventReceiverFirehose` implementation. Importantly, now it states that `close()` doesn't affect `hasMore()` and `nextRow()` and could be called concurrently with them. In other words, specified that `close()` is for "row supply" side rather than "row consume" side. However, I didn't check that other `Firehose` implementatations adhere to this contract.

<hr>

This issue is the result of reviewing `EventReceiverFirehose` and `TimedShutoffFirehose` using [this checklist](https://medium.com/@leventov/code-review-checklist-java-concurrency-49398c326154).
2019-03-04 18:50:03 -03:00
Himanshu Pandey 8b803cbc22 Added checkstyle for "Methods starting with Capital Letters" (#7118)
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.

* Un-abbreviate the method names in the calcite tests

* Fixed checkstyle errors

* Changed asserts position in the code
2019-02-23 20:10:31 -08:00
David Glasser 1c2753ab90 ParallelIndexSubTask: support ingestSegment in delegating factories (#7089)
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.

This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.

Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.

This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
2019-02-23 17:02:56 -08:00
Jihoon Son 4e2b085201
Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911)
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file

* delete descriptor.file when killing segments

* fix test

* Add doc for ha

* improve warning
2019-02-20 15:10:29 -08:00
Mingming Qiu dd34691004 Coordinator await initialization before finishing startup (#6847)
* Curator server inventory await initialization

* address comments

* print exception object in log

* remove throws ISE

* cachingCost awaitInitialization default to false
2019-02-20 11:56:23 -08:00
Justin Borromeo c7eeeabf45 2528 Replace Incremental Index Global Flags with Getters (#7043)
* Eliminated reportParseExceptions and deserializeComplexMetrics

* Removed more global flags

* Cleanup

* Addressed Surekha's recommendations
2019-02-15 13:36:46 -08:00
Jihoon Son 1701fbcad3
Improve error message for revoked locks (#7035)
* Improve error message for revoked locks

* fix test

* fix test

* fix test

* fix toString
2019-02-13 11:22:48 -08:00
Jihoon Son d42de574d6 Add an api to get all lookup specs (#7025)
* Add an api to get all lookup specs

* add doc
2019-02-08 11:05:59 -08:00
Jonathan Wei fafbc4a80e
Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Jonathan Wei 8bc5eaa908
Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Egor Riashin 97b6407983 maintenance mode for Historical (#6349)
* maintenance mode for Historical

forbidden api fix, config deserialization fix

logging fix, unit tests

* addressed comments

* addressed comments

* a style fix

* addressed comments

* a unit-test fix due to recent code-refactoring

* docs & refactoring

* addressed comments

* addressed a LoadRule drop flaw

* post merge cleaning up
2019-02-04 18:11:00 -08:00
Roman Leventov 0e926e8652 Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898)
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool

* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java

* Remove unnecessary comment

* Checkstyle

* Fix getFromExtensions()

* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization

* Fix UriCacheGeneratorTest

* Workaround issue with MaterializedViewQueryQueryToolChest

* Strengthen Appenderator's contract regarding concurrency
2019-02-04 09:18:12 -08:00
Surekha 7baa33049c Introduce published segment cache in broker (#6901)
* Add published segment cache in broker

* Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set

* Added a trueEquals to DataSegment class

* Use separate interner for realtime and historical segments

* Remove trueEquals as it's not used anymore, change log message

* PR comments

* PR comments

* Fix tests

* PR comments

* Few more modification to

* change the coordinator api
* removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments
* Change the poll behaviour to avoid multiple poll execution at same time

* minor changes

* PR comments

* PR comments

* Make the segment cache in broker off by default

* Added a config to PlannerConfig
* Moved MetadataSegmentView to sql module

* Add doc for new planner config

* Update documentation

* PR comments

* some more changes

* PR comments

* fix test

* remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR

* minor changes

* some changes to initialization

* use pollPeriodInMS

* Add boolean cachePopulated to check if first poll succeeds

* Remove poll from start()

* take the log message out of condition in stop()
2019-02-02 22:27:13 -08:00
Vadim Ogievetsky 7f1b19bfb1 Adding a Unified web console. (#6923)
* Adding new web console.

* fixed css

* fix form height

* fix typo

* do import custom react-table css

* added repo field so npm does not complain

* ask travis for node 10

* move indexing-service/src/main/resources/indexer_static into web-console

* fix resource names and paths

* add licenses

* fix exclude file

* add licenses to misc files and tidy up

* remove rebase marker

* fix link

* updated env variable name

* tidy up licenses and surface errors

* cleanup

* remove unused code, fix missing await

* TeamCity does not like the name aux

* add more links to tasks view

* rm pages

* update gitignore

* update readme to be accurate

* make clean script

* removed old console dependancy

* update Jetty routes

* add a comment for welcome files for coordinator

* do not show inital notifaction for now

* renamed overlord console back to console.html

* fix coordinator console

* rename coordinator-console.html to index.html
2019-01-31 17:26:41 -08:00
Jihoon Son e56c598cc1 Fall back to the old coordinator API for checking segment handoff if new one is not supported (#6966) 2019-01-31 08:50:46 -08:00
Benedict Jin 72a571fbf7 For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913)
* * Add few methods about base64 into StringUtils

* Use `java.util.Base64` instead of others

* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis

* Rename encodeBase64String & decodeBase64String

* Update druid-forbidden-apis
2019-01-25 17:32:29 -08:00
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Clint Wylie 8ba33b2505 add 'init' lifecycle stage for finer control over startup and shutdown (#6864)
* add Lifecycle.Stage.INIT, put log shutter downer in init stage, tests, rad startup banner

* log cleanup

* log changes

* add task-master lifecycle to module lifecycle to gracefully stop task-master stuff

* fix it the right way

* remove announce spam

* unused import

* one more log

* updated comments

* wrap leadership lifecycle stop to prevent exceptions from wrecking rest of task master stop

* add precondition check
2019-01-21 09:01:36 -08:00
Mingming Qiu b704ebfa37 Let cachingCost balancer strategy only consider segment replicatable nodes (#6879) 2019-01-17 09:26:33 -08:00
Jihoon Son a07e66c540 Fix auto compaction to compact only same or abutting intervals (#6808)
* Fix auto compaction to compact only same or abutting intervals

* fix test
2019-01-16 14:54:11 -08:00
Dayue Gao 5b8a221713 Add SQL id, request logs, and metrics (#6302)
* use SqlLifecyle to manage sql execution, add sqlId

* add sql request logger

* fix UT

* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

* add docs and more sql request logger impls

* add UT for http and jdbc

* fix forbidden use of com.google.common.base.Charsets

* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId

* do not use default method in QueryMetrics interface

* capitalize 'sql' everywhere in the non-property parts of the docs

* use RequestLogger interface to log sql query

* minor bugfixes and add switching request logger

* add filePattern configs for FileRequestLogger

* address review comments, adjust sql request log format

* fix inspection error

* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
2019-01-15 23:12:59 -08:00
Jonathan Wei 8537a771b0 Some fixes and tests for spaces/non-ASCII chars in datasource names (#6761)
* Fixes and tests for spaces/non-ASCII datasource names

* Some unit test fixes

* Fix ITRealtimeIndexTaskTest

* Checkstyle

* TeamCity

* PR comments
2019-01-15 08:33:31 -08:00
Surekha f72f33f84a Fix num_replicas count in sys.segments table (#6804)
* Fix num_replicas count from sys.segments

* Adjust unit test for num_replica > 1

* Pass named arguments instead of passing boolean constants

* Address PR comments

* PR comments
2019-01-15 08:31:29 -08:00
Charles Allen 5d2947cd52 Use Guava Compatible immediate executor service (#6815)
* Use multi-guava version friendly direct executor implementation

* Don't use a singleton

* Fix strict compliation complaints

* Copy Guava's DirectExecutor

* Fix javadoc

* Imports are the devil
2019-01-11 10:42:19 -08:00
Jihoon Son c35a39d70b
Add support maxRowsPerSegment for auto compaction (#6780)
* Add support maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00
Mingming Qiu 8ebb7b558b Handoff should ignore segments that are dropped by drop rules (#6676)
* Handoff should ignore segments that are dropped by drop rules

* fix travis-ci

* fix tests

* address comments

* remove line added by accident

* address comments

* add javadoc and logging the full stack trace of exception

* add error message
2019-01-07 14:43:11 -08:00
Mingming Qiu 636964fcb5 Fix issue that tasks failed because of no sink for identifier (#6724)
* Fix issue that tasks failed because of no sink for identifier

* make find sinks to persist run in one callable together with the actual persist work

* Revert "make find sinks to persist run in one callable together with the actual persist work"

This reverts commit a24a2d80ae.
2019-01-04 17:09:11 -08:00
elloooooo 832a3b16ed Improve slfj logger input for MDC field:datasource (#6787)
* improve slfj logger  MDC datasource input

* add some UT and isNested field
2019-01-03 18:00:04 -08:00
Jihoon Son 9ad6a733a5 Add support segmentGranularity for CompactionTask (#6758)
* Add support segmentGranularity

* add doc and fix combination of options

* improve doc
2019-01-03 17:50:45 -08:00
Mingming Qiu 114a9fc38f change propertyBase in ServerViewModule (#6774) 2019-01-02 16:44:02 +08:00
Jihoon Son fa7cb906e4 Fix auto compaction to consider intervals of running tasks (#6767)
* Fix auto compaction to consider intervals of running tasks

* adjust initial collection size
2018-12-27 18:03:44 -08:00
Gian Merlino 7a09cde4de
Broker: Await initialization before finishing startup. (#6742)
* Broker: Await initialization before finishing startup.

In particular, hold off on announcing the service and starting the
HTTP server until the server view and SQL metadata cache are finished
initializing. This closes a window of time where a Broker could return
partial results shortly after startup.

As part of this, some simplification of server-lifecycle service
announcements. This helps ensure that the two different kinds of
announcements we do (legacy and new-style) stay in sync.

* Remove unused imports.

* Fix NPE in ServerRunnable.
2018-12-18 20:32:31 -08:00
Clint Wylie 9505074530 fix log typo (#6755)
* fix log typo, add DataSegmentUtils.getIdentifiersString util method

* fix indecisive oops
2018-12-18 15:10:25 -08:00
Jihoon Son f0ee6bf898 Fix auto compaction when the firstSegment is in skipOffset (#6738)
* Fix auto compaction when the firstSegment is in skipOffset

* remove duplicate
2018-12-18 19:10:46 +08:00
Clint Wylie 486c6f3cf9 emit logs that are only useful for debugging at debug level (#6741)
* make logs that are only useful for debugging be at debug level so log volume is much more chill

* info level messages for total merge buffer allocated/free

* more chill compaction logs
2018-12-17 14:20:28 +08:00
Jonathan Wei c713116a75 Use @Coordinator leader client in CoordinatorRuleManager (#6729) 2018-12-16 15:18:09 -08:00
Gian Merlino 04e7c7fbdc FilteredRequestLogger: Fix start/stop, invalid delegate behavior. (#6637)
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.

Fixes two bugs:

1) FilteredRequestLogger did not start/stop the delegate.

2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.

* Remove unneeded annotations.
2018-12-14 16:55:44 +08:00
dongyifeng 91e3cf7196 add charset UTF-8 to log api (#6709)
When I retrieve the task log in browser, the Chinese characters all end up as garbage.
![image](https://user-images.githubusercontent.com/1322134/49502749-bd614080-f8b0-11e8-839e-07f7117eebfd.png)
After adding charset UTF-8, it was correct.
![image](https://user-images.githubusercontent.com/1322134/49502804-dc5fd280-f8b0-11e8-916b-bda8f1e7f318.png)
2018-12-12 16:31:04 +01:00
Atul Mohan 86e3ae5b48 Add fail message (#6720) 2018-12-11 08:05:50 -08:00
Mingming Qiu e8dd3716b8
add close method in Cache interface (#6540)
* add close method in Cache interface

* address comments

* address comments and fix travis-ci

* use try-finally
2018-12-06 17:28:41 +08:00
Mingming Qiu 607339003b Add TaskCountStatsMonitor to monitor task count stats (#6657)
* Add TaskCountStatsMonitor to monitor task count stats

* address comments

* add file header

* tweak test
2018-12-04 13:37:17 -08:00
Clint Wylie a1c9d0add2 autosize processing buffers based on direct memory sizing by default (#6588)
* autosize processing buffers based on direct memory sizing

* remove oops, more test

* max 1gb autosize buffers, test, start of docs

* fix oops

* revert accidental change

* print buffer size in exception

* change the things
2018-12-03 18:40:02 -07:00
Clint Wylie 43adb391c2 remove AbstractResourceFilter.isApplicable because it is not (#6691)
* remove AbstractResourceFilter.isApplicable because it is not, add tests for OverlordResource.doShutdown and OverlordResource.shutdownTasksForDatasource

* cleanup
2018-12-01 21:52:31 +08:00
Roman Leventov ec38df7575
Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606)
* Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns

* Fix style

* Fix HttpEmitterTest.timeoutEmptyQueue()

* Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests

* Clarify code
2018-12-01 01:12:56 +01:00
Jihoon Son d6539abd0a Fix overlord api and console (#6686)
* Fix overlord APIs and console

* remove getRunningTasksByDataSource

* add missing path to isApplicable
2018-11-29 23:45:28 -08:00
hate13 f4b49f01ff add rule count on log (#6467)
* add rule count on log

* add final
2018-11-28 16:08:38 +08:00
Mingming Qiu 9a89200607 Emit query metrics even if the ETags are equal (#6663) 2018-11-27 15:18:01 -08:00
Jihoon Son 219f0965dc Remove duplicate DataSegmentTest (#6669) 2018-11-27 15:13:39 -08:00
seoeun 22a5bf97a2 Fix issue that tasks tables in metadata storage are not cleared (#6592)
* tasks tables in metadata storage are not cleared

* address comments. remove tasklogs and revert obsolete changes

* address comments. change comment and update doc.

* address comments. update doc more detailed

* address comments. remove redundant log and update doc more detailed.

* address comments. update document
2018-11-22 11:50:31 +08:00
Gian Merlino 92cce04165 Fix missing default config in some calls to coordinator dynamic configs. (#6652)
* Fix missing default config in some calls to coordinator dynamic configs.

The lack of a default config meant that if someone called an API
_without_ a default config before one _with_ a default config, then
the default value would get stuck at null instead of the intended
default value. I noticed this in a cluster where calling /druid/coordinator/v1/config
before a coordinator had fully started up would lead to NPEs during
DruidCoordinatorRuleRunner.

This patch makes the default configs consistent across all calls.

* Remove unnecessary null check.
2018-11-22 10:25:39 +08:00
Roman Leventov 87b96fb1fd
Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Roman Leventov 8f3fe9cd02 Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607)
* Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies

* Fix bug

* Replace checkstyle regexp with IntelliJ inspection
2018-11-15 13:21:34 -08:00
Jihoon Son 0395d554e1 Properly reset total size of segmentsToCompact in NewestSegmentFirstIterator (#6622)
* Properly reset total size of segmentsToCompact in NewestSegmentFirstIterator

* add test
2018-11-15 01:00:51 -08:00
Jihoon Son 7b262b7123 Remove unnecessary path param from auto compaction api (#6594)
* Remove unnecessary path param from auto compaction api

* fix ci
2018-11-13 09:46:13 -08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Roman Leventov 54351a5c75 Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490)
* Fix various bugs; Enable more IntelliJ inspections and update error-prone

* Fix NPE

* Fix inspections

* Remove unused imports
2018-11-06 14:38:08 -08:00
Surekha bcb754d066 Use current coordinator leader instead of cached one (#6551) (#6552)
* Use current coordinator leader instead of cached one (#6551)

Check the response status and throw exception if not OK

* Modify tests

* PR comment

* Add the correct check for status of BytesAccumulatingResponseHandler

* Move the status check into JsonParserIterator so sql query outputs meaningful message on failure

* Fix tests
2018-11-06 13:09:51 -08:00
QiuMM 7b34662462 Period load/drop/broadcast rules should include the future by default (#6414)
* Period load/drop/broadcast rules should include the future by default

* address comments

* adjust coordinator console and tweak docs

* address comments

* fix travis-ci
2018-11-01 09:43:34 -07:00
Roman Leventov 2cdce2e2a6
Add RequestLogEventBuilderFactory (#6477)
This PR allows to control the fields in `RequestLogEvent`, emitted in `EmittingRequestLogger`. In our case, we want to get rid of the `intervals` fields of the query objects that are a part of `DefaultRequestLogEvent`. They are enormous (thousands of segments) and not useful.

Related to #5522, FYI @a2l007.
2018-10-31 22:24:37 +01:00
QiuMM 676f5e6d7f Prohibit some guava collection APIs and use JDK collection APIs directly (#6511)
* Prohibit some guava collection APIs and use JDK APIs directly

* reset files that changed by accident

* sort codestyle/druid-forbidden-apis.txt alphabetically
2018-10-29 13:02:43 +01:00
Jonathan Wei b2d9b6f23d Allow custom TLS cert checks (#6432)
* Allow custom TLS cert checks

* PR comment

* Checkstyle, PR comment
2018-10-24 16:31:52 -07:00
QiuMM 601183b4c7 Add period drop before rule (#6415)
* Add period drop before rule

* add license header

* support period drop before rule in coordinator console

* address comments
2018-10-24 12:44:30 -07:00
Roman Leventov 84ac18dc1b
Catch some incorrect method parameter or call argument formatting patterns with checkstyle (#6461)
* Catch some incorrect method parameter or call argument formatting patterns with checkstyle

* Fix DiscoveryModule

* Inline parameters_and_arguments.txt

* Fix a bug in PolyBind

* Fix formatting
2018-10-23 07:17:38 -03:00
Faxian Zhao c5bf4e7503 update insert pending segments logic to synchronous (#6336)
* 1. Mysql default transaction isolation is REPEATABLE_READ, treat it as READ_COMMITTED will reduce insert id conflict.
2. Add an index to 'dataSource used end' is work well for the most of scenarios(get recently segments), and it will speed up sync add pending segments in DB.
3. 'select and insert' is not need within transaction.

* Use TaskLockbox.doInCriticalSection instead of synchronized syntax to speed up insert pending segments.

* fix typo for NullPointerException
2018-10-22 19:48:20 -07:00
Samarth Jain 359576a80b Implement force push down for nested group by query (#5471)
* Force nested query push down

* Code review changes
2018-10-22 13:43:47 -07:00
QiuMM f5f4171a45 QueryCountStatsMonitor: emit query/count (#6473)
Let `QueryCountStatsMonitor` emit `query/count`, then I can monitor QPS of my services, or I have to count it by myself.
2018-10-19 10:15:02 -03:00
patelh c780aacc03 Add ability to specify dbcp properties file (#6419)
* Add ability to specify dbcp properties file

* Address PR comments, use mock config, remove setter

* Add documentation

* APRC, updated docs with example file contents

* APRC, add @Nullable, @VisibileForTesting, update doc

* APRC, remove error log, use props directly as jackson binding

* Remove unused files
2018-10-16 12:27:19 -07:00
QiuMM 85a89e2703 make druid node bind address configurable (#6464)
* make druid node bind address configurable

* fix tests

* fix travis-ci
2018-10-15 14:19:40 -07:00
Roman Leventov aa121da25f Use NodeType enum instead of Strings (#6377)
* Use NodeType enum instead of Strings

* Make NodeType constants uppercase

* Fix CommonCacheNotifier and NodeType/ServerType comments

* Reconsidering comment

* Fix import

* Add a comment to CommonCacheNotifier.NODE_TYPES
2018-10-14 20:49:38 -07:00
Clint Wylie 84598fba3b combine druid-api, druid-common, java-util into druid-core (#6443)
* combine druid-api, druid-common, java-util

* spacing
2018-10-14 20:37:37 -07:00
vishnu rao 6567fff9e7 Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' header (#4033)
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header

* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
o- if Accept header is absent, it defaults to Content-Type header

* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type'  PR #4033
Minor change to a comment - restoring to previous wording

* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type'  PR #4033
o- minor change to check for empty string
2018-10-12 14:29:14 -07:00
Surekha 3be4a97150 Fix inconsistent segment size(#6448) (#6451)
* Fix inconsistent segment size(#6448)

* Fix the segment size for published segments
* Changes to get numReplicas
* Make coordinator segments API truly streaming

* Changes to store partial segment data

* Simplify SegmentMetadataHolder
* Store partial the columns from available segments

* Address comments
2018-10-12 12:55:20 -07:00
Clint Wylie 39d61b9ae5 update druid-console to 0.0.4 (#6450) 2018-10-11 22:37:08 -06:00
David Lim 20ab213ba6 change project versions to 0.13.0-incubating-SNAPSHOT (#6453) 2018-10-11 19:28:01 -07:00
Clint Wylie f7775d1db3 fixes for LookupReferencesManagerTest (#6444)
* some fixes for LookupReferencesManagerTest

* docs

* formatting

* more formatting fixes
2018-10-10 18:02:11 -07:00
Surekha 3a0a667fe0 Introduce SystemSchema tables (#5989) (#6094)
* Added SystemSchema with following tables (#5989)

* SEGMENTS table provides details on served and published segments
* SERVERS table provides details on data servers
* SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS
* TASKS table provides details on tasks

* Add documentation for system schema

* Fix static-analysis warnings

* Address PR comments

*Add unit tests

* Fix a test

* Try to fix a test

* Fix a bug around replica count

* rename io.druid to org.apache.druid

* Major change is to make tasks and segment queries streaming

* Made tasks/segments stream to calcite instead of storing it in memory
* Add num_rows to segments table
* Refactor JsonParserIterator
* Replace with closeable iterator

* Fix docs, make num_rows column nullable, some unit test changes

* make num_rows column type long, allow it to be null

fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler

* Filter null rows for segments table from Linq4j enumerable

* change num_replicas datatype to long in segments table

* Fix some tests and address comments

* Doc updates, other PR comments

* Update tests

* Address comments

* Add auth check
* Update docs
* Refactoring

* Fix teamcity warning, change the getQueryableServer in TimelineServerView

* Fix compilation after rebase

* Use the stream API from AuthorizationUtils

* Added LeaderClient interface and NoopDruidLeaderClient class

* Revert "Added LeaderClient interface and NoopDruidLeaderClient class"

This reverts commit 100fa46e39.

* Make the naming consistent to server_segments for the join table

* Add ForbiddenException on auth check failure
* Remove static block from SystemSchema

* Try to fix a test in CalciteQueryTest due to rename of server_segments

* Fix the json output format in the coordinator API

* Add auth check in the segments API
* Add null check to avoid NPE

* Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark

* Fix test failures, type long/BIGINT can be nullable

* Revert long nullability to fix tests

* Fix style for tests

* PR comments

* Address PR comments

* Add the missing BytesAccumulatingResponseHandler class

* Use Sequences.withBaggage in DruidPlanner

* Fix docs, add comments

* Close the iterator if hasNext returns false
2018-10-10 17:17:29 -07:00
Jihoon Son 88d23b77b7 Add support keepSegmentGranularity for automatic compaction (#6407)
* Add support keepSegmentGranularity for automatic compaction

* skip unknown dataSource

* ignore single semgnet to compact

* add doc

* address comments

* address comment
2018-10-07 16:48:58 -07:00
Jihoon Son 45aa51a00c Add support hash partitioning by a subset of dimensions to indexTask (#6326)
* Add support hash partitioning by a subset of dimensions to indexTask

* add doc

* fix style

* fix test

* fix doc

* fix build
2018-10-06 16:45:07 -07:00
QiuMM 0b8085aff7 Prohibit jackson ObjectMapper#reader methods which are deprecated (#6386)
* Prohibit jackson ObjectMapper#reader methods which are deprecated

* address comments
2018-10-03 17:55:20 -03:00
Roman Leventov 3ae563263a
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957)
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:
 - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
 - Generified `ComplexMetricExtractor`
2018-10-02 14:50:22 -03:00
Jihoon Son cb14a43038 Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask (#6393)
* Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask

* update doc and remove auto conversion

* remove remaining doc

* fix teamcity
2018-10-01 12:03:35 -07:00
Gian Merlino 9fa4afdb8e URL encode datasources, task ids, authenticator names. (#5938)
* URL encode datasources, task ids, authenticator names.

* Fix URL encoding for router forwarding servlets.

* Fix log-with-offset API.

* Fix test.

* Test adjustments.

* Task client fixes.

* Remove unused import.
2018-09-30 12:29:51 -07:00
Shiv Toolsidass 5a894f830b Added backpressure metric (#6335)
* Added backpressure metric

* Updated channelReadable to AtomicBoolean and fixed broken test

* Moved backpressure metric logic to NettyHttpClient

* Fix placement of calculating backPressureDuration
2018-09-29 14:24:04 -07:00
Jihoon Son 122caec7b1
Add support targetCompactionSizeBytes for compactionTask (#6203)
* Add support targetCompactionSizeBytes for compactionTask

* fix test

* fix a bug in keepSegmentGranularity

* fix wrong noinspection comment

* address comments
2018-09-28 11:16:35 -07:00
Jihoon Son aef022de98 Fix race in taskMaster (#6388) 2018-09-26 21:48:02 -07:00
Clint Wylie fc1d5795c1 remove wikipedia irc firehose and dependencies from core server module to examples (#6391) 2018-09-26 21:46:37 -07:00
Roman Leventov 8978d3751b Don't convert DruidServer to ImmutableDruidServers multiple times in CoordinatorHistoricalManagerRunnable (#6385) 2018-09-26 09:14:14 -07:00
Gian Merlino a92a20e037 Fix indexes introduced in #6348. (#6356)
The indexes introduced in #6348 were on the wrong table. The tests
did not catch them due to retries on the create table steps (the
first try created the table but not the bogus indexes; the second
try noticed that the table already existed and did nothing). This
patch doesn't fix the issue with the tests, since the best way to
do that would be to do the table and index creation in a
transaction; but, this is not supported by all of our supported
database engines.
2018-09-25 20:49:13 -07:00
Jihoon Son d08c2c5eba
Make JvmThreadsMonitor injectable (#6369) 2018-09-24 20:41:17 -07:00
Jihoon Son 99428e20d2 Deprecate dimensions / metrics APIs on brokers (#6361)
* Deprecate dimensions / metrics APIs on brokers

* add segmentMetadataQuery link

* add more doc
2018-09-24 17:56:38 -07:00
Roman Leventov 9a3195e98c Improve interning in SQLMetadataSegmentManager (#6357)
* Improve interning in SQLMetadataSegmentManager

* typo
2018-09-22 13:23:30 -07:00
Jonathan Wei 364bf9d1f9 Fix non org.apache.druid files and add package name checkstyle rule (#6367)
* Fix non org.apache.druid files and add package name checkstyle rule

* PR comment
2018-09-21 17:58:19 -07:00
Gian Merlino e1c649e906 Add metadata indexes to help with segment allocation. (#6348)
Segment allocation queries can take a long time (10s of seconds) when
you have a lot of segments. Adding these indexes helps greatly.
2018-09-19 15:54:13 -07:00
Jonathan Wei 8972244c68 Mutual TLS support (#6076)
* Mutual TLS support

* Kafka test fixes

* TeamCity fix

* Split integration tests

* Use localhost DOCKER_IP

* Increase server thread count

* Increase SSL handshake timeouts

* Add broken pipe retries, use injected client config params

* PR comments, Rat license check exclusion
2018-09-19 09:56:15 -07:00
Slim Bouguerra 028354eea8 Adding licenses and enable apache-rat-plugin. (#6215)
* Adding licenses and enable apache-rat-plugi.

Change-Id: I4685a2d9f1e147855dba69329b286f2d5bee3c18

* restore the copywrite of demo_table and add it to the list of allowed ones

Change-Id: I2a9efde6f4b984bc1ac90483e90d98e71f818a14

* revirew comments

Change-Id: I0256c930b7f9a5bb09b44b5e7a149e6ec48cb0ca

* more fixup

Change-Id: I1355e8a2549e76cd44487abec142be79bec59de2

* align

Change-Id: I70bc47ecb577bdf6b91639dd91b6f5642aa6b02f
2018-09-18 08:39:26 -07:00
Hongze Zhang 2fac6743d4 Add maxIdleTime option to EventReceiverFirehose (#5997) 2018-09-17 13:50:56 -07:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00
QiuMM 87ccee05f7 Add ability to specify list of task ports and port range (#6263)
* support specify list of task ports

* fix typos

* address comments

* remove druid.indexer.runner.separateIngestionEndpoint config

* tweak doc

* fix doc

* code cleanup

* keep some useful comments
2018-09-13 19:36:04 -07:00
Roman Leventov d50b69e6d4 Prohibit LinkedList (#6112)
* Prohibit LinkedList

* Fix tests

* Fix

* Remove unused import
2018-09-13 18:07:06 -07:00
Clint Wylie 91a37c692d 'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234)
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'

* move overlord console status to own column in supervisor table so does not look like garbage

* spacing

* padding

* other kind of spacing

* fix rebase fail

* fix more better

* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests

* fix log
2018-09-13 14:42:18 -07:00
Gian Merlino d6cbdf86c2
Broker backpressure. (#6313)
* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.

* Fix query context doc.
2018-09-10 09:33:29 -07:00
Clint Wylie e6e068ce60 Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129)
* resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask

* address review comments

* changes due to review

* merge fail
2018-09-07 13:17:49 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Gian Merlino cb40b6d369 Fix all inspection errors currently reported. (#6236)
* Fix all inspection errors currently reported.

TeamCity builds on master are reporting inspection errors, possibly
because there was a while where it was not running due to the Apache
migration, and there was some drift.

* Fix one more location.

* Fix tests.

* Another fix.
2018-08-26 18:36:01 -06:00
Benedict Jin 3647d4c94a Make time-related variables more readable (#6158)
* Make time-related variables more readable

* Patch some improvements from the code reviewer

* Remove unnecessary boxing of Long type variables
2018-08-21 15:29:40 -07:00
Jihoon Son 2bfe1b6a5a Fix NPE for taskGroupId when rolling update (#6168)
* Fix NPE for taskGroupId

* missing changes

* fix wrong annotation

* fix potential race

* keep baseSequenceName

* make deprecated old param
2018-08-17 10:15:45 -07:00
Samarth Jain 1c8032f9f3 Composite request logger doesn't invoke @LifeCycleStart and @LifeCycleStop methods on its dependencies (#6173) 2018-08-17 12:34:25 -04:00
Gian Merlino 5ce3185b9c Fix three bugs with segment publishing. (#6155)
* Fix three bugs with segment publishing.

1. In AppenderatorImpl: always use a unique path if requested, even if the segment
   was already pushed. This is important because if we don't do this, it causes
   the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
   a "not published" result instead of throwing an exception, when there was one
   metadata update failure, followed by some random exception. This is done by
   resetting the AtomicBoolean that tracks what case we're in, each time the
   callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
   publish result. Skip killing if we just got some exception. The reason for this
   is that we want to avoid killing segments if they are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.

* Remove javadoc-only import.

* Updates.

* Fix test.

* Fix tests.
2018-08-15 13:55:53 -07:00
Jihoon Son ecee3e0a24 Further optimize memory for Travis jobs (#6150)
* Further optimize memory for Travis jobs

* fix build

* sudo false
2018-08-10 22:03:36 -07:00
Christoph Hösler 1a37dfdcd1 Fetch unhandled curator exceptions (#6131)
* fix: stop druid on unhandled curator exceptions

* catch exceptions when stopping lifecycle
2018-08-09 21:47:42 -07:00
Jihoon Son d6a02de5b5 Add support 'keepSegmentGranularity' for compactionTask (#6095)
* Add keepSegmentGranularity for compactionTask

* fix build

* createIoConfig method

* fix build

* fix build

* address comments

* fix build
2018-08-09 13:51:20 -07:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Jonathan Wei b9c445c780
Optimize filtered aggs with interval filters in per-segment queries (#5857)
* Optimize per-segment queries

* Always optimize, add unit test

* PR comments

* Only run IntervalDimFilter optimization on __time column

* PR comments

* Checkstyle fix

* Add test for non __time column
2018-08-01 14:39:38 -07:00
Clint Wylie 297810e7a4 log correct moved count on balance instead of snapshot of currently moving (#6032) 2018-08-01 03:36:10 -07:00
Roman Leventov 0754d78a2e Prohibit Lists.newArrayList() with a single argument (#6068)
* Prohibit Lists.newArrayList() with a single argument

* Test fixes

* Add Javadoc to Node constructor
2018-07-31 20:09:10 -07:00
Gian Merlino 3aa7017975 Remove some unnecessary task storage internal APIs. (#6058)
* Remove some unnecessary task storage internal APIs.

- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.

This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.

IMO, this is an improvement, since the most likely reason for bad tasks is either:

- A new version introduced an additional validation, and a pre-existing task doesn't
  pass it.
- You are rolling back from a newer version to an older version.

In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.

* Remove unused import.

* Fix formatting.

* Fix formatting.
2018-07-30 18:35:06 -07:00
Benedict Jin 331a0afb98 Remove redundant type parameters and enforce some other style and inspection rules (#5980)
* Various changes about druid-services module

* Patch improvements from reviewer

* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile

* Fix ArraysAsListWithZeroOrOneArgument

* Fix conflict

* Fix ToArrayCallWithZeroLengthArrayArgument

* Fix AliEqualsAvoidNull

* Remove blank line

* Remove unused import clauses

* Fix code style in TopNQueryRunnerTest

* Fix conflict

* Don't use Collections.singletonList when converting the type of array type

* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase

* Roll back the latest commit

* Add java.io.File#toURL() into druid-forbidden-apis

* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord

* Add a new regexp element into stylecode xml file

* Fix style error for new regexp

* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING

* Fix style error for new regexp

* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile

* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR

* Add toArray(new Object[0]) regexp into checkstyle config file & fix them

* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it

* Add a comment for string equals regexp in checkstyle config

* Fix code format

* Add RedundantTypeArguments as ERROR level inspection

* Fix cannot resolve symbol datasource
2018-07-27 16:56:49 -05:00
Jihoon Son 1524af703d
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050) 2018-07-27 10:43:32 -07:00
kaijianding 7919e4d5df move rangeSet compare into shardspec (#5688) 2018-07-26 14:17:57 -07:00
Jihoon Son 5ee7b0cada Synchronize scheduled poll() calls in SQLMetadataSegmentManager (#6041)
Similar issue to https://github.com/apache/incubator-druid/issues/6028.
2018-07-24 22:57:30 -05:00
Roman Leventov 7d5eb0c21a Synchronize scheduled poll() calls in SQLMetadataRuleManager to prevent flakiness in SqlMetadataRuleManagerTest (#6033) 2018-07-24 12:00:48 -07:00
Surekha 414487a78e Add support to filter on datasource for active tasks (#5998)
* Add support to filter on datasource for active tasks

* Added datasource filter to sql query for active tasks
* Fixed unit tests

* Address PR comments
2018-07-19 16:33:46 -07:00
Jihoon Son 4a2df2b23a Log the full stack trace when an HTTP request fails (#6022) 2018-07-19 12:05:46 -07:00
Jihoon Son c48aa74a30 Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996)
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
2018-07-13 17:14:57 -07:00
Clint Wylie 31c2179fe1 Coordinator fix balancer stuck (#5987)
* this will fix it

* filter destinations to not consider servers already serving segment

* fix it

* cleanup

* fix opposite day in ImmutableDruidServer.equals

* simplify
2018-07-11 20:19:11 -07:00
Clint Wylie ac194cc082 Coordinator fix exception caused by additional logging (#5988)
* fix explosion in curator load queue peon caused by additional logging, as well as annoying chatty log

* remove log message
2018-07-11 16:13:32 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Gian Merlino 24c20b4734 Forbid slashes in datasource names. (#5937)
They are bad because datasources are used as paths on filesystems,
and slashes invariably make things get stored improperly.
2018-07-05 09:49:16 -07:00
Clint Wylie aa4987b871 change default compaction task target size from 800MB to 400MB to fall within range of what docs recommend for segment sizing (#5930) 2018-07-05 00:12:31 -07:00
Jihoon Son 4cd14e8158 Proper handling of the exceptions from auto persisting in AppenderatorImpl.add() (#5932) 2018-07-04 23:42:41 -07:00
Clint Wylie 39371b0ff8 More coordinator logging to help give context to load queue peon log messages (#5929)
* more coordinator logging to help give context to load queue peon log messages

* fix style

* more chill load queue peon log messages
2018-07-04 23:40:25 -07:00
Clint Wylie 0a472d3fa0 coordinator slight optimze load rule to skip drop if numToDrop is 0 (#5928) 2018-07-03 17:56:11 -07:00
Clint Wylie d5a3871864 Coordinator fix balance to try to move max segments instead of up to max segments (#5927)
* fix move to try to move max segments instead of "up to" max segments

* fix

* fix oops
2018-07-03 17:06:38 -07:00
Jihoon Son 1ccabab98e Fix the broken Appenderator contract in KafkaIndexTask (#5905)
* Fix broken Appenderator contract in KafkaIndexTask

* fix build

* add publishFuture

* reuse sequenceToUse if possible
2018-07-03 13:31:29 -07:00
mhshimul 867f6a9e2b Fix SQL Server select query in createInactiveStatusesSinceQuery() method. (#5901)
* Fix SQL Server select query in createInactiveStatusesSinceQuery() method.

SQL server does not support LIMIT N in select queries. Instead it has TOP N to limiting number of query results.
And TOP N is already added in the select statement as per maxNumStatuses value.

* Add parentheses for TOP in SELECT statement as SQL Servers no longer support TOP without parentheses.
2018-07-03 23:16:47 +05:30
Jihoon Son b6c957b0d2 Allow reordered segment allocation in kafka indexing service (#5805)
* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
2018-07-02 15:09:12 -07:00
Surekha 933b25416c Handle task deserialization failure in the tasks api (#5911)
If task payload fails to deserialize json to Java, make the task null and handle null task in OverlordResource
2018-06-29 11:57:48 -07:00
Gian Merlino a28314349c
Fix spelling of "propagate" in various places. (#5896)
One of these is a configuration parameter (introduced in #5429),
but it's never been in a release, so I think it's ok to rename it.
2018-06-25 09:18:08 -07:00
George Paraskevas 4b111929ec Fix typo lage->large , improve warning message (#5890) 2018-06-22 17:33:02 -07:00
Clint Wylie 1a7adabf57 Coordinator segment balancer max load queue fix (#5888)
* Coordinator segment balancer will now respect "maxSegmentsInNodeLoadingQueue" config

* allow moves from full load queues

* better variable names
2018-06-20 23:04:41 -07:00
Niketh Sabbineni 0982472c90 Use historical node instead of realtime for querying (#4764)
* Use historical node instead of realtime for querying

* Incorporated code review comments

* Incorporate code review comments

* Remove artifact comment

* Consider non-historical nodes as realtime
2018-06-20 22:53:56 -07:00
Surekha 8619adb5b9 Improve task retrieval APIs on Overlord (#5801)
* Add the new tasks api in overlordResource

It takes 4 optional query params
* state(pending/running/waiting/compelte)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)

If all params are null, the api returns all the tasks

* Add the state to each task returned by tasks endpoint

* divide active tasks into waiting, pending or running
* Add more unit tests

* Add UNKNOWN state to TaskState

* Fix the authorization calls

* WIP: PR comments

Added new class to capture task info for caching
Other refactoring

* Refactoring : move TaskStatus class to druid-api

so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api

* Remove unused class and apis accessing it

* Add a separate cache for recently completed tasks

This is to mainly capture the task type from payload

* Ignore a test

* Add a RuntimeTaskState to encompass all states a task can be in

* Revert "Add a RuntimeTaskState to encompass all states a task can be in"

This reverts commit 2a527a0731.

* Fix wrong api call

* Fix and unignore tests

* Remove waiting,pending state from TaskState

* Add RunnerTaskState

* Missed the annotation runnerStatusCode

* Fix the creationTime

* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests

* Add javadocs

* Potentially fix the teamcity build

* Address PR comments

*Get rid of TaskInfoBuilder
*Make TaskInfoMapper static nested class
*Other changes

* fix import in MaterializedViewSupervisor after merge

* Address PR comments on

* Replace global cache with local map
* combine multiple queries into one
* Removed unused code

* Fix unit tests

Fix a bug in securedTaskStatusPlus

* Remove getRecentlyFinishedTaskStatuses method

Change TaskInfoMapper signature to add generic type

* Address PR comments

* Passed datasource as argument to be used in sql query
* Other minor fixes

* Address PR comments

*Some minor changes, rename method, spacing changes

* Add early auth check if datasource is not null

* Fix test case

* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object

* Address PR comments

* Fix a bug in test case causing ClassCastException
2018-06-19 11:34:59 -07:00
varaga b4b1b2a020 Provisioning support for ZooKeeper Authorization (#5701)
Review comments implemented
2018-06-15 14:02:01 -07:00
Jonathan Wei dc67b77ec2 Immediately send 401 on basic HTTP authentication failure (#5856)
* Immediately send 401 on basic HTTP authentication failure

* Add unit tests
2018-06-14 10:23:10 -07:00
Jonathan Wei 24efbb054c
Fix inefficient available segment cache population in SQLMetadataSegmentManager (#5878) 2018-06-12 18:53:30 -07:00
zhangxinyu e43e5ebbcd Materialized view implementation (#5556)
* implement materialized view

* modify code according to jihoonson's comments

* modify code according to jihoonson's comments - 2

* add documentation about materialized view

* use new HadoopTuningConfig in pr 5583

* add minDataLag and fix optimizer bug

* correct value of DEFAULT_MIN_DATA_LAG_MS

* modify code according to jihoonson's comments - 3

* use the boolean expression instead of if-else
2018-06-09 12:24:54 -07:00
awelsh93 6f0aedd6ab Fix defaultQueryTimeout (#5807)
* Fix defaultQueryTimeout

- set default timeout in query context before query fail time is evaluated

Remove unused import

* Address failing checks

* Addressing code review comments

* Removed line that was no longer used
2018-06-08 15:34:10 -07:00
Hongze Zhang cfa94b747b Update to jetty 9.4; Enable request decompression (#5624)
* Update to jetty 9.4; Enable request decompression; Add http compression config options

* Fix BadMessageException from jetty server at HttpGenerator.generateHeaders(...)
2018-06-08 14:53:08 -07:00
awelsh93 adbe22c05b Security - add anonymous authenticator (#5842)
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.

* Adding documentation

* Removed some fields from class AnonymousAuthenticator

* Updating docs
2018-06-07 10:17:54 -07:00
Jonathan Wei 684b5d18c1
Moving averages for ingestion row stats (#5748)
* Moving averages for ingestion row stats

* PR comments

* Make RowIngestionMeters extensible

* test and checkstyle fixes

* More PR comments

* Fix metrics

* Add some comments

* PR comments

* Comments
2018-06-05 09:08:57 -07:00
Michael Schnupp 33b4eb624d fix freeSpacePercent in segmentCache.locations (#5765)
* fix freeSpacePercent in segmentCache.locations

* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash

* add test to verify correct behavior

* switch to Path and test with jimfs

Path allows to use different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.

The behavior should be unchanged by this commit.

* Revert "switch to Path and test with jimfs"

This reverts commit 8b9a418d65.
2018-05-24 11:15:30 +09:00
Atul Mohan 1b9611a60e Local indexing from RDBMS (#5441)
* Local indexing from RDBMS

*  Fix content

* Remove pom changes

* Remove extraneous space

* Add tests and update documentation

* Fix comments

* Fix docs

*  Fix build related issue

*  Handle invalid strings

* Make target database independent of metadata storage

* Add firehose connector

* Fix accessibility

* Add docs

* Remove unused def

* Remove lazy instantiation of jsoniterator

* Move unused changes

* Move unused changes

* Fix build

* Make Sqlfirehose method private
2018-05-22 12:33:01 +09:00
Dylan Wylie c537ea56f6 Validate dataschema datasource (#5785)
* Validate dataschema has a datasource

* Fix tests

* Use Guava Strings.isNullOrEmpty

* Inverse nullempty check, whoops
2018-05-18 16:29:06 -07:00
Gian Merlino f2cc6ce4d5
VersionedIntervalTimeline: Optimize construction with heavily populated holders. (#5777)
* VersionedIntervalTimeline: Optimize construction with heavily populated holders.

Each time a segment is "add"ed to a timeline, "isComplete" is called on the holder
that it is added to. "isComplete" is an O(segments per chunk) operation, meaning
that adding N segments to a chunk is an O(N^2) operation. This blows up badly if
we have thousands of segments per chunk.

The patch defers the "isComplete" check until after all segments have been
inserted.

* Fix imports.
2018-05-16 09:16:59 -07:00
Jihoon Son 9dca5ec76b Simple cleanup for ThreadPoolTaskRunner and SetAndVerifyContextQueryRunner / Add ThreadPoolTaskRunnerTest (#5557)
* Simple fix for ThreadPoolTaskRunner

* fix build

* address comments

* update javadoc

* fix build

* fix test

* add dependency
2018-05-15 22:53:11 +05:30
Surekha 2f8904e25f Check against the real default of maxBytes(1/6 max mem) in AppenderatorImpl's add (#5758)
* The check for maxBytesInMemory should be >= 0 instead of > 0

* if the default value is 0, the actual check could be skipped
* fix the message for persistReasons

* Address PR comments

* if maxBytes set -1, make is Long.MAX_VAL, so we do not need to check if it's 0 or -1
* set the maxBytesTuningconfig in AppenderatorImpl constructor to avoid duplicate code

* fix the failing test cases

* Address PR comments
2018-05-09 13:41:51 -07:00
Jihoon Son c7a59394e0 Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor (#5704)
* Consider waiting and pending compaction tasks as well as running tasks in DruidCoordinatorSegmentCompactor

* fix build

* fix logging
2018-05-08 19:03:54 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
Fokko Driesprong a95ec92296 Move to the org.lz4 dependency (#5746)
The net.jpountz.lz4 moved to org.lz4
2018-05-07 08:16:45 -07:00
Slim Bouguerra 8aa8d9fa5b
Kerberos Spnego Authentication Router Issue (#5706)
* Adding decoration method to proxy servlet

Change-Id: I872f9282fb60bfa20524271535980a36a87b9621

* moving the proxy request decoration to authenticators

Change-Id: I7f94b9ff5ecf08e8abf7169b58bc410f33148448

* added docs

Change-Id: I901543e52f0faf4666bfea6256a7c05593b1ae70

* use the authentication result to decorate request

Change-Id: I052650de9cd02b4faefdbcdaf2332dd3b2966af5

* adding authenticated by name

Change-Id: I074d2933460165feeddb19352eac9bd0f96f42ca

* ensure that authenticator is not null

Change-Id: Idb58e308f90db88224a06f3759114872165b24f5

* fix types and minor bug

Change-Id: I6801d49a05d5d8324406fc0280286954eb66db10

* fix typo

Change-Id: I390b12af74f44d760d0812a519125fbf0df4e97b

* use actual type names

Change-Id: I62c3ee763363781e52809ec912aafd50b8486b8e

* set authenitcatedBy to null for AutheticationResults created by
Escalator.

Change-Id: I4a675c372f59ebd8a8d19c61b85a1e4bf227a8ba
2018-05-05 20:33:51 -07:00
kaijianding c12c16385e support throw duplcate row during realtime ingestion in RealtimePlumber (#5693) 2018-05-04 10:12:25 -07:00
Stuart McLean c2b5e5ec95 Default caffeine cache size (#5738)
* add default caffeine cache size based on runtime Xmx or max 1GB

* update docs for caffeine cache

* fix formatting

* test caffeine size should never be less than 0

* set caffeine max default size to 1G not 1M

* fix caffeine cache tests
2018-05-04 09:29:11 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
Gian Merlino df01998213 SegmentLoadDropHandler: Fix deadlock when segments have errors loading on startup. (#5735)
The "lock" object was used to synchronize start/stop as well as synchronize removals
from segmentsToDelete (when a segment is done dropping). This could cause a deadlock
if a segment-load throws an exception during loadLocalCache. loadLocalCache is run
by start() while it holds the lock, but then it spawns loading threads, and those
threads will try to acquire the "segmentsToDelete" lock if they want to drop a corrupt
segments.

I don't see any reason for these two locks to be the same lock, so I split them.
2018-05-03 09:59:01 -07:00
Jihoon Son 2c8296f94d Fix Appenderator.push() to commit the metadata of all segments (#5730)
* Remove persist from Appenderator

* fix javadoc
2018-05-02 13:17:54 -07:00
Jihoon Son d4311b4a5a Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702)
* Support enablePathStyleAccess and disableChunkedEncoding for aws client

* add an option for forceGlobalBucketAccessEnabled

* add missing doc
2018-05-02 10:45:38 -07:00
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Roman Leventov 9be000758d Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335)
* Refactor index merging, replace Rowboats with RowIterators and RowPointers

* Add javadocs

* Fix a bug in QueryableIndexIndexableAdapter

* Fixes

* Remove unused declarations

* Remove unused GenericColumn.isNull() method

* Fix test

* Address comments

* Rearrange some code in MergingRowIterator for more clarity

* Self-review

* Fix style

* Improve docs

* Fix docs

* Rename IndexMergerV9.writeDimValueAndSetupDimConversion to setUpDimConversion()

* Update Javadocs

* Minor fixes

* Doc fixes, more code comments, cleanup of RowCombiningTimeAndDimsIterator

* Fix doc link
2018-04-27 17:34:32 -07:00
David Lim 55b003e5e8 Fix loadstatus?full double counting expected segments (#5667)
* fix loadstatus?full double counting expected segments

* remove possible flakiness from Thread.sleep() in test
2018-04-24 01:11:16 +05:30
Roman Leventov a3a9ada843 Add GenericWhitespace checkstyle check (#5668) 2018-04-24 01:09:14 +05:30
Jihoon Son ca3f833426 Fix coordinator's dataSource api with full parameter (#5662)
* Fix coordinator's dataSource api with full parameter

* address comment

* Add a constructor for json serde and fix result order

* Change to immutableSortedMap

* Revert immutableSortedMap to treeMap
2018-04-19 17:41:53 -07:00
Kirill Kozlov a7ba2bf275 Detailed error message when unable to create temp dir (#5648) 2018-04-17 15:12:46 -07:00
Jonathan Wei d0b66a6af5 Fix HTTP OPTIONS request auth handling (#5638)
* Fix HTTP OPTIONS request auth handling

* PR comment

* More PR comments

* Fix

* PR comment
2018-04-16 18:09:56 -07:00
Jonathan Wei 882b172318
Revert "Fix HTTP OPTIONS request auth handling (#5615)" (#5637)
This reverts commit df51a7bcb7.
2018-04-12 16:43:54 -07:00
Jonathan Wei e91add6843
Fix coordinator loadStatus performance (#5632)
* Optimize coordinator loadStatus

* Add comment

* Fix teamcity

* Checkstyle

* More checkstyle

* Checkstyle
2018-04-12 15:07:52 -07:00
Jonathan Wei df51a7bcb7
Fix HTTP OPTIONS request auth handling (#5615)
* Fix HTTP OPTIONS request auth handling

* Flip configuration boolean
2018-04-12 14:02:20 -07:00
Gian Merlino d0400a0688 SegmentWithState: Add toString method. (#5635)
The class appears in log messages, and the default toString method
isn't very informative.
2018-04-12 14:01:09 -05:00
palanieppan-m dbea5cb9b7 Load rules should honor partial overlap (#5595)
Load rules should load segments that partially overlap with rule window,
instead of loading only segments that fully overlap.
2018-04-12 09:46:00 -07:00
Atul Mohan 19f359957f Add getters for AlertEvent (#5522)
* Add getters for AlertEvent

* Move PublicApi and ExtensionPoint to java-util

* Fix publicapi annotation usage

* Add publicapi annotations to ServiceMetricEvent and RequestLogEvent
2018-04-12 23:38:20 +07:00
Nishant Bangarwa e6efd75a3d Add config to allow setting up custom unsecured paths for druid nodes. (#5614)
* Add config to allow setting up custom unsecured paths for druid nodes.

* return all resources for Unsecured paths

* review comment - Add test

* fix tests

* fix test
2018-04-11 17:10:07 -07:00
Clint Wylie ea4f8544fb revert lambda conversion to fix occasional jvm error (#5591) 2018-04-06 14:18:55 -07:00
Gian Merlino 5ab17668c0 CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)
Also switch various firehoses to the new method.

Fixes #5585.
2018-04-06 08:06:45 -07:00
Niketh Sabbineni 270fd1ea15 Allow getDomain to return disjointed intervals (#5570)
* Allow getDomain to return disjointed intervals

* Indentation issues
2018-04-05 22:12:30 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Niketh Sabbineni f0a94f5035 Remove unused config (#5564)
* Remove unused config

* Fix failing tests
2018-04-03 13:23:46 -07:00
Clint Wylie f31dba6c5b Coordinator drop segment selection through cost balancer (#5529)
* drop selection through cost balancer

* use collections.emptyIterator

* add test to ensure does not drop from server with larger loading queue with cost balancer

* javadocs and comments to clear things up

* random drop for completeness
2018-04-03 11:22:51 -07:00
Clint Wylie a81ae99021 add 'stopped' check and handling to HttpLoadQueuePeon load and drop segment methods (#5555)
* add stopped check and handling to HttpLoadQueuePeon load and drop segment methods

* fix unrelated timeout :(

* revert unintended change

* PR feedback: change logging

* fix dumb
2018-04-03 11:21:52 -07:00