Commit Graph

612 Commits

Author SHA1 Message Date
Xue Yu b9c6a26c0e Use ComplexMetrics.registerSerde() across the codebase (#7925)
* refactor complexmetric registerserde

* fix error

* feedback address
2019-06-25 11:39:04 +03:00
Sashidhar Thallam 6cc8802b8e #7875: Setting ACL on S3 task logs on similar lines as that of data segments pushed to S3 (#7907)
* #7875: Setting ACL on S3 task logs on similar lines as that of data segment pushed to S3

* #7875 1. Extracting a method (which uploads a file to S3 setting appropriate access control list to the file being uploaded) and moving it to utils class. 2. Adding S3TaskLogsTest.java file to test acl (permissions) on the task log files pushed to S3.

* fixing checkstyle errors

* #7875 Incorporating review comments
2019-06-24 17:25:59 -06:00
Fokko Driesprong 82b248cc17 Spotbugs: Enable MS_SHOULD_BE_FINAL (#7946) 2019-06-23 15:42:18 -07:00
Jonathan Wei 35601bb7a0 Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784)
* Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory

* Add finalizeAsBase64Binary option to ApproximateHistogramFactory

* Update approx histogram doc
2019-06-21 18:00:19 -07:00
Clint Wylie 494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
Fokko Driesprong 0a6fbbbb80 Bump Apache Avro to 1.9.0 (#7772)
* Bump Apache Avro to 1.9.0

Apache Avro 1.9.0 brings a lot of new features:
* Deprecate Joda-Time in favor of Java8 JSR310 and setting it as default
* Remove support for Hadoop 1.x
* Move from Jackson 1.x to 2.9
* Add ZStandard Codec
* Lots of updates on the dependencies to fix CVE's
* Remove Jackson classes from public API
* Apache Avro is built by default with Java 8
* Apache Avro is compiled and tested with Java 11 to guarantee compatibility
* Apache Avro MapReduce is compiled and tested with Hadoop 3
* Apache Avro is now leaner, multiple dependencies were removed: guava, paranamer, commons-codec, and commons-logging
* Introduce JMH Performance Testing Framework
* Add Snappy support for C++ DataFile
* and many, many more!

* Add exclusions for Jackson
2019-06-19 03:31:18 -07:00
Clint Wylie 71997c16a2 switch links from druid.io to druid.apache.org (#7914)
* switch links from druid.io to druid.apache.org

* fix it
2019-06-18 09:06:27 -07:00
Fokko Driesprong f581118f05 Remove Apache Pig from the tests (#7810)
* Remove Apache Pig from the tests

* Remove the Pig specific part

* Fix the Checkstyle issues

* Cleanup a bit

* Add an additional test

* Revert the abstract class
2019-06-14 14:18:58 -07:00
Sashidhar Thallam 3bee6adcf7 Use map.putIfAbsent() or map.computeIfAbsent() as appropriate instead of containsKey() + put() (#7764)
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()

* fixing indentation

* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate

* fixing checkstyle

* Changing the recommendation text

* Reverting auto changes made by IDE

* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap

* Removing unused import
2019-06-14 17:59:36 +02:00
Xue Yu ce591d1457 Support var_pop, var_samp, stddev_pop and stddev_samp etc in sql (#7801)
* support var_pop, stddev_pop etc in sql

* fix sql compatible

* rebase on master

* update doc
2019-06-10 09:40:09 -07:00
Clint Wylie 3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Clint Wylie ee0d4ea589 add bloom filter fallback aggregator when types are unknown (#7719) 2019-06-06 14:39:32 -07:00
Alexander Saydakov 4dd446bfdd sketches-core-0.13.4 (#7666) 2019-06-06 14:36:52 -07:00
Eugene Sevastyanov 080270283a Druid basic authentication class composition config (#7789)
* Druid basic authentication class composition config.

* Added comments

* Reduced nulls

* Used noop implementations to get rid of null

* Added docs for no-op metadata storage updaters

* Fixed BasicAuthClassCompositionConfig javadoc

* Removed incorrect comments
2019-06-06 15:51:37 +02:00
Gian Merlino 1de1a02e49 Kinesis: Fix getPartitionIds, should be checking isHasMoreShards. (#7830) 2019-06-04 16:26:22 -07:00
Nishant Bangarwa fdc03bd336 [druid-kerberos] Fix checking of host URI when reading cookies from cookie store (#7825)
Reading of auth cookie was not checking URI of the server where request was being sent.  This was causing cookie set for one server to be sent to another one and extra authentication round trips between internal druid services.
2019-06-03 19:32:50 -07:00
Justin Borromeo 8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Gian Merlino 8649b8ab4c
SQL: Allow select-sort-project query shapes. (#7769)
* SQL: Allow select-sort-project query shapes.

Fixes #7768.

Design changes:

- In PartialDruidQuery, allow projection after select + sort by removing
  the SELECT_SORT query stage and instead allowing the SORT and
  SORT_PROJECT stages to apply either after aggregation or after a plain
  non-aggregating select. This is different from prior behavior, where
  SORT and SORT_PROJECT were only considered valid after aggregation
  stages. This logic change is in the "canAccept" method.
- In DruidQuery, represent either kind of sorting with a single "Sorting"
  class (instead of DefaultLimitSpec). The Sorting class is still
  convertible into a DefaultLimitSpec, but is also convertible into the
  sorting parameters accepted by a Scan query.
- In DruidQuery, represent post-select and post-sorting projections with
  a single "Projection" class. This obsoletes the SortProject and
  SelectProjection classes, and simplifies the DruidQuery by allowing us
  to move virtual-column and post-aggregator-creation logic into the
  new Projection class.
- Split "DruidQuerySignature" into RowSignature and VirtualColumnRegistry.
  This effectively means that instead of having mutable and immutable
  versions of DruidQuerySignature, we instead of RowSignature (always
  immutable) and VirtualColumnRegistry (always mutable, but sometimes
  null). This change wasn't required, but IMO it this makes the logic
  involving them easier to follow, and makes it more clear when the
  virtual column registry is active and when it's not.

Other changes:

- ConvertBoundsToSelectors now just accepts a RowSignature, but we
  use the VirtualColumnRegistry.getFullRowSignature() method to get
  a signature that includes all columns, and therefore allows us to
  simplify the logic (no need to special-case virtual columns).
- Add `__time` to the Scan column list if the query is ordering by time.

* Remove unused import.
2019-05-30 12:56:29 -07:00
Roman Leventov 782863ed0f Fix some problems reported by PVS-Studio (#7738)
* Fix some problems reported by PVS-Studio

* Address comments
2019-05-29 11:20:45 -07:00
Fokko Driesprong e46bdf082e Remove Codehaus references from the tests (#7773) 2019-05-27 10:51:14 -07:00
Clint Wylie eef69619d3 add support for multi-value string dimensions for HllSketch build aggregator (#7730) 2019-05-23 17:07:32 -07:00
Jonathan Wei ec4d09a02f Remove obsolete isExcluded config from Kerberos authenticator (#7745) 2019-05-23 16:00:05 -07:00
Jonathan Wei 54b3f363c4
Remove unnecessary principal handling in KerberosAuthenticator (#7685) 2019-05-23 13:15:44 -07:00
Clint Wylie 23e96d15d4 allow quantiles merge aggregator to also accept doubles (#7718)
* allow quantiles merge aggregator to also accept doubles

* consolidate dupe

* import
2019-05-23 11:13:41 -07:00
Merlin Lee 26fad7e06a Add checkstyle for "Local variable names shouldn't start with capital" (#7681)
* Add checkstyle for "Local variable names shouldn't start with capital"

* Adjust some local variables to constants

* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
2019-05-23 18:40:28 +02:00
Jihoon Son eff2be4f8f Remove LegacyKafkaIndexTaskRunner (#7735) 2019-05-23 09:25:35 -07:00
Gian Merlino 53b6467fc8 SeekableStreamIndexTaskRunner: Lazy init of runner. (#7729)
The main motivation is that this fixes #7724, by making it so the overlord
doesn't try to create a task runner and parser when all it really wants to
do is create a task object and serialize it.
2019-05-22 21:13:57 -07:00
Clint Wylie ffc2397bcd fix AggregatorFactory.finalizeComputation implementations to be ok with null inputs (#7731)
* AggregatorFactory finalizeComputation is nullable with nullable input, make implementations honor this

* fixes
2019-05-22 21:13:09 -07:00
Gian Merlino b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
Merlin Lee 5f08b0b474 Add checkstyle for "Prohibit @author tags in Javadoc" (#7682)
* Add checkstyle for "Prohibit @author tags in Javadoc"

* Add "Do not use author tags/information in the code" back to CONTRIBUTING.md
2019-05-20 00:09:51 -07:00
David Lim d38457933f Data loader (sampler component) - Kafka/Kinesis samplers (#7566)
* implement Kafka/Kinesis sampler

* add KafkaSamplerSpecTest and KinesisSamplerSpecTest

* code review changes
2019-05-16 20:26:23 -07:00
Jonathan Wei 7d63c295cc Fix compilation error in CoordinatorBasicAuthorizerResourceTest (#7667)
* Fix compilation error in CoordinatorBasicAuthorizerResourceTest

* Don't use simplifyPermissions
2019-05-15 17:47:38 -07:00
Jonathan Wei 6901123a53 Fix compareAndSwap() in SQLMetadataConnector (#7661)
* Fix compareAndSwap() in SQLMetadataConnector

* Catch serialization_failure and retry for Postgres
2019-05-15 14:53:04 -07:00
Jonathan Wei e874da7cea
Add simpler permissions option to BasicAuthorizer GET APIs (#7635)
* Add simpler permissions option to BasicAuthorizer GET APIs

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>
2019-05-15 12:59:32 -07:00
Fokko Driesprong 2aa9613bed Bump Checkstyle to 8.20 (#7651)
* Bump Checkstyle to 8.20

Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle

Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.

Affected versions: < 8.18

* Oops, missed one

* Oops, missed a few
2019-05-14 11:53:37 -07:00
Fokko Driesprong 4c709ddbc1 Bump Apache Parquet to 1.10.1 (#7645)
https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1101
2019-05-12 14:38:33 -07:00
Alexander Saydakov ca1a6649f6 Datasketches quantiles more post-aggs (#7550)
* rank and CDF post-aggs

* added post-aggs to the module

* added new post-aggs

* moved post-agg IDs

* moved post-agg IDs
2019-05-10 11:46:54 -07:00
Alexander Saydakov 59f9ff38c7 fix issue #7607 (#7619)
* fix issue #7607

* exclude com.google.code.findbugs:annotations
2019-05-09 17:33:29 -07:00
Jinseon Lee 0ef435a16c add postgresql meta db table schema configuration property (#7137) (#7183)
* add postgresql meta db table schema configuration property (#7137)

If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public

* create postgresql metadb table schema configuration property (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java

* modify postgresql readme file. - metadb table schema (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java
2019-05-08 12:56:30 -07:00
Xavier Léauté f7bfe8f269
Update mocking libraries for Java 11 support (#7596)
* update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support
* update tests to use new easymock interfaces
* fix tests failing due to easymock fixes
* remove dependency on jmockit
* fix race condition in ResourcePoolTest
2019-05-06 12:28:56 -07:00
Alexander Saydakov 9d8f934e68 handle empty sketches (#7526)
* handle empty sketches

* return array of NaN in case of empty sketch

* noinspection ForLoopReplaceableByForEach in tests

* style fixes
2019-04-25 14:28:41 -07:00
es1220 3e25b75c3f Fix aggregatorFactory meta merge exception (#7504) 2019-04-24 14:08:46 -07:00
Jihoon Son c60e7feab8 Fix encoded taskId check in chatHandlerResource (#7520)
* Fix encoded taskId check in chatHandlerResource

* fix tests
2019-04-20 18:08:34 -07:00
Clint Wylie be65cca248 refactor druid-bloom-filter aggregators (#7496)
* now with 100% more buffer

* there can be only 1

* simplify

* javadoc

* clean up unused test method

* fix exception message

* style

* why does style hate javadocs

* review stuff

* style :(
2019-04-18 11:54:06 -07:00
Xavier Léauté 4322ce3303 Java 9 compatible cleaner operations (#7487)
Java 9 removed support for sun.misc.Cleaner in favor of
java.lang.ref.Cleaner. This change adds a thin abstraction to switch
between Cleaner implementations based on JDK version at runtime
2019-04-17 08:04:52 -07:00
Jonathan Wei 7d9cb6944b Adjust BufferAggregator.get() impls to return copies (#7464)
* Adjust BufferAggregator.get() impls to return copies

* Update BufferAggregator docs, more agg fixes

* Update BufferAggregator get() doc
2019-04-12 19:04:07 -07:00
Justin Borromeo 2771ed50b0 Support Kafka supervisor adopting running tasks between versions (#7212)
* Recompute hash in isTaskCurrent() and added tests

* Fixed checkstyle stuff

* Fixed failing tests

* Make TestableKafkaSupervisorWithCustomIsTaskCurrent static

* Add doc

* baseSequenceName change

* Added comment

* WIP

* Fixed imports

* Undid lambda change for diff sake

* Cleanup

* Added comment

* Reinsert Kafka tests

* Readded kinesis test

* Readd bad partition assignment in kinesis supervisor test

* Nit

* Misnamed var
2019-04-10 18:16:38 -07:00
Clint Wylie 89bb43f382 'core' ORC extension (#7138)
* orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs

* change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling

* better docs and tests

* fix it

* formatting

* doc fix

* fix it

* exclude redundant dependencies

* use latest orc-mapreduce, add hadoop jobProperties recommendations to docs

* doc fix

* review stuff and fix binaryAsString

* cache for root level fields

* more better
2019-04-09 09:03:26 -07:00
Jonathan Wei 09abc4a856 Fix comments in KinesisIndexTaskIOConfig (#7430) 2019-04-09 08:40:08 -07:00
Jihoon Son e87d6e32b3 Support kinesis compatibility (#7351) 2019-04-08 19:19:34 -07:00
Gian Merlino 7cd5477658 DoublesSketchComplexMetricSerde: Handle empty strings. (#7429) 2019-04-08 18:01:31 -07:00
Clint Wylie 76b4a5c62e refactor lookups to be more chill to router (#7222)
* refactor lookups to be more chill to router

* remove accidental change

* fix and combine LookupIntrospectionResourceTest

* fix inspection

* rename RouterLookupModule to LookupSerdeModule and RouterLookupExtractorFactoryContainerProvider to NoopLookupExtractorFactoryContainerProvider

* make comment generic

* use ConfigResourceFilter instead of StateResourceFilter

* fix indentation

* unused import

* another unused import

* refactor some stuff into processing module, split up LookupModule.java classes into their own files
2019-04-05 14:49:41 -07:00
Alexander Saydakov 28b4e8586d use latest sketches-core-0.13.1 (#7320)
* use latest sketches-core-0.13.0

* latest release
2019-04-03 17:06:02 -04:00
Clint Wylie a99f0ff450 prefix no-op aggs with "Noop" (#6960) 2019-04-02 15:05:07 -07:00
Justin Borromeo ad7862c58a Time Ordering On Scans (#7133)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Stuff for time-ordered scan query

* Move ScanResultValue timestamp comparator to a separate class for testing

* Licensing stuff

* Change benchmark

* Remove todos

* Added TimestampComparator tests

* Change number of benchmark iterations

* Added time ordering to the scan benchmark

* Changed benchmark params

* More param changes

* Benchmark param change

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* nit

* Decrease segment size for less memory usage

* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order

* Wrote more tests for scan result value sort

* Committing a param change to kick teamcity

* Fixed codestyle and forbidden API errors

* .

* Improved conciseness

* nit

* Created an error message for when someone tries to time order a result
set > threshold limit

* Set to spaces over tabs

* Fixing tests WIP

* Fixed failing calcite tests

* Kicking travis with change to benchmark param

* added all query types to scan benchmark

* Fixed benchmark queries

* Renamed sort function

* Added javadoc on ScanResultValueTimestampComparator

* Unused import

* Added more javadoc

* improved doc

* Removed unused import to satisfy PMD check

* Small changes

* Changes based on Gian's comments

* Fixed failing test due to null resultFormat

* Added config and get # of segments

* Set up time ordering strategy decision tree

* Refactor and pQueue works

* Cleanup

* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues

* WIP

* Sequence stuff is so dirty :(

* Fixed bug introduced by replacing deque with list

* Wrote docs

* Multi-historical setup works

* WIP

* Change so batching only occurs on broker for time-ordered scans

Restricted batching to broker for time-ordered queries and adjusted
tests

Formatting

Cleanup

* Fixed mistakes in merge

* Fixed failing tests

* Reset config

* Wrote tests and added Javadoc

* Nit-change on javadoc

* Checkstyle fix

* Improved test and appeased TeamCity

* Sorry, checkstyle

* Applied Jon's recommended changes

* Checkstyle fix

* Optimization

* Fixed tests

* Updated error message

* Added error message for UOE

* Renaming

* Finish rename

* Smarter limiting for pQueue method

* Optimized n-way merge strategy

* Rename segment limit -> segment partitions limit

* Added a bit of docs

* More comments

* Fix checkstyle and test

* Nit comment

* Fixed failing tests -> allow usage of all types of segment spec

* Fixed failing tests -> allow usage of all types of segment spec

* Revert "Fixed failing tests -> allow usage of all types of segment spec"

This reverts commit ec470288c7.

* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"

This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.

* Check type of segment spec before using for time ordering

* Fix bug in numRowsScanned

* Fix bug messing up count of rows

* Fix docs and flipped boolean in ScanQueryLimitRowIterator

* Refactor n-way merge

* Added test for n-way merge

* Refixed regression

* Checkstyle and doc update

* Modified sequence limit to accept longs and added test for long limits

* doc fix

* Implemented Clint's recommendations
2019-03-28 14:37:09 -07:00
Clint Wylie db0125b709 this will fix it (#7358) 2019-03-27 19:42:58 -07:00
Charles Allen eeb3dbe79d Move GCP to a core extension (#6953)
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
2019-03-27 09:00:43 -07:00
Jihoon Son 5294277cb4
Fix exclusive start partitions for sequenceMetadata (#7339)
* Fix exclusvie start partitions for sequenceMetadata

* add empty check
2019-03-26 14:39:07 -07:00
Roman Leventov bca40dcdaf
Fix some IntelliJ inspections (#7273)
Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.
2019-03-25 21:11:01 -03:00
Jihoon Son 0c5dcf5586 Fix exclusivity for start offset in kinesis indexing service & check exclusivity properly in IndexerSQLMetadataStorageCoordinator (#7291)
* Fix exclusivity for start offset in kinesis indexing service

* some adjustment

* Fix SeekableStreamDataSourceMetadata

* Add missing javadocs

* Add missing comments and unit test

* fix SeekableStreamStartSequenceNumbers.plus and add comments

* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue

* Add javadocs

* fix compilation

* fix test

* remove unused variable
2019-03-21 13:12:22 -07:00
Surekha e170203876 Consolidate kafka consumer configs (#7249)
* Consolidate kafka consumer configs

* change the order of adding properties

* Add consumer properties to fix test

it seems kafka consumer does not reveive any message without these configs

* Use KafkaConsumerConfigs in integration test

* Update zookeeper and kafka versions in the setup.sh for the base druid image

*  use version 0.2 of base druid image

* Try to fix tests in KafkaRecordSupplierTest

* unused import

* Fix tests in KafkaSupervisorTest
2019-03-21 11:19:49 -07:00
Christian Senet b2a940addd Fix flattening Avro Maps with Utf8 keys (#7258)
* #5884 Avro Maps can't be flattened in Kafka Indexing on master

* correct typo and reduce code size
2019-03-18 14:23:53 -07:00
Gian Merlino a8c7132482 Logic adjustments to SeekableStreamIndexTaskRunner. (#7267)
* Logic adjustments to SeekableStreamIndexTaskRunner.

A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:

- Remove "initialOffsetsSnapshot", which was used to determine when to
  skip start offsets. Replace it with "lastReadOffsets", which I hope
  is more intuitive. (There is a connection: start offsets must be
  skipped if and only if they have already been read, either by a
  previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
  be the opposite of isEndOffsetExclusive. The reason is that starts are
  exclusive exactly when the prior ends are inclusive: they must match
  up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
  no reason to, since we expect to read continuous message streams
  throughout the task. And calling it makes offset-tracking logic
  trickier, so better to avoid the need for trickiness. I believe the
  call being here was causing a bug in Kinesis ingestion where a
  message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
  because they prevent keeping currOffsets and lastReadOffsets up to
  date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
  "verifyRecordInRange". It no longer has side effects. It does a sanity
  check on the message offset and also makes sure that it is not past
  the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
  "isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
  believe this fixes an off-by-one error with Kinesis where the last
  record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
  sequence, rather than potentially adjusting any unpublished sequence.
  Adjusting sequences other than the last one is a mistake since it
  will extend their endOffsets beyond what they actually read. (I'm not
  sure if this was an issue in practice, since I'm not sure if real
  world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
  shorter and more clear, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.

Kafka test changes:

- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
  the very end of the task lifecycle still work properly.

Kinesis test changes:

- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
  given Kinesis semantics, the right behavior when start offset equals
  end offset (and there aren't exclusive partitions set) is to read that
  single offset. This is because they are both meant to be treated as
  inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
  message read. I believe the old test was wrong; it expected the task
  not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
  checkpoint starting from 1 rather than 2. I believe the old test was
  wrong here too; it was expecting the task to start reading from the
  checkpointed offset, but it actually should have started reading from
  one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
  11 messages read instead of 12. It's starting at message 0 and reading
  up to 10, which should be 11 messages.

* Changes from code review.
2019-03-15 00:22:42 -07:00
Jihoon Son 69a6f1154a Fix testIncrementalHandOffReadsThroughEndOffsets in Kafka/KinesisIndexTaskTest (#7264)
* Fix testIncrementalHandOffReadsThroughEndOffsets in Kafka/KinesisIndexTaskTest

* revert unnecessary change

* fix test

* remove debug log
2019-03-14 23:31:08 -04:00
Furkan KAMACI 7ada1c49f9 Prohibit Throwables.propagate() (#7121)
* Throw caught exception.

* Throw caught exceptions.

* Related checkstyle rule is added to prevent further bugs.

* RuntimeException() is used instead of Throwables.propagate().

* Missing import is added.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.

* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.

* Throwable is kept before firing a Runtime Exception.

* Fix unused assignments.
2019-03-14 18:28:33 -03:00
Jonathan Wei 74bbe5ab9f Improve error message for unavailable auth cache in BasicHTTPAuthenticator (#7263) 2019-03-14 14:39:43 +08:00
Jonathan Wei c020272add Fix KafkaRecordSupplier assign (#7260)
* Fix KafkaRecordSupplier assign

* TeamCity fix
2019-03-14 14:36:14 +08:00
Clint Wylie fb1489d313 fix SequenceMetadata deserialization (#7256)
* wip

* fix tests, stop reading if we are at end offset

* fix build

* remove restore at end offsets fix in favor of a separate PR

* use typereference from method for serialization too
2019-03-13 17:29:39 -04:00
Clint Wylie 3895914aa2 consolidate CompressionUtils.java since now in the same jar (#6908) 2019-03-13 11:02:44 -04:00
Jihoon Son 32e86ea75e Fix record validation in SeekableStreamIndexTaskRunner (#7246)
* Fix record validation in SeekableStreamIndexTaskRunner

* add kinesis test
2019-03-12 21:12:21 -07:00
Clint Wylie d7ba19d477 sql, filters, and virtual columns (#6902)
* refactor sql planning to re-use expression virtual columns when possible when constructing a DruidQuery, allowing virtual columns to be defined in filter expressions, and making resulting native druid queries more concise. also minor refactor of built-in sql aggregators to maximize code re-use

* fix it

* fix it in the right place

* fixup for base64 stuff

* fixup tests

* fix merge conflict on import order

* fixup

* fix imports

* fix tests

* review comments

* refactor

* re-arrange

* better javadoc

* fixup merge

* fixup tests

* fix accidental changes
2019-03-11 11:37:58 -07:00
Gian Merlino fa218f5160 Fix two SeekableStream serde issues. (#7176)
* Fix two SeekableStream serde issues.

1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed
   for split 0.13 / 0.14 clusters to work properly during a rolling update.
2) Abstract classes don't need JsonCreator constructors; remove them.

* Comment fixes.
2019-03-01 22:27:08 -08:00
Jihoon Son 9a066558a4 Fix exception when the scheme is missing in endpointUrl for S3 (#7129)
* Fix exception when the scheme is missing in endpointUrl for S3

* add null check
2019-02-25 11:10:35 -08:00
Himanshu Pandey 8b803cbc22 Added checkstyle for "Methods starting with Capital Letters" (#7118)
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.

* Un-abbreviate the method names in the calcite tests

* Fixed checkstyle errors

* Changed asserts position in the code
2019-02-23 20:10:31 -08:00
David Glasser 1c2753ab90 ParallelIndexSubTask: support ingestSegment in delegating factories (#7089)
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.

This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.

Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.

This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
2019-02-23 17:02:56 -08:00
Jihoon Son 4e2b085201
Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911)
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file

* delete descriptor.file when killing segments

* fix test

* Add doc for ha

* improve warning
2019-02-20 15:10:29 -08:00
Surekha 80a2ef7be4 Support kafka transactional topics (#5404) (#6496)
* Support kafka transactional topics

* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests

* Remove unused import from test

* Fix compilation

* Invoke transaction api to fix a unit test

* temporary modification of travis.yml for debugging

* another attempt to get travis tasklogs

* update kafka to 2.0.1 at all places

* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes

* Add deprecated in docs for kafka-eight and kafka-simple extensions

* Remove skipOffsetGaps and code changes for transaction support

* Fix indentation

* remove skipOffsetGaps from kinesis

* Add transaction api to KafkaRecordSupplierTest

* Fix indent

* Fix test

* update kafka version to 2.1.0
2019-02-18 11:50:08 -08:00
scrawfor 0fa9000849 Add Postgresql SqlFirehose (#6813)
* Add Postgresql SqlFirehose

* Fix Code Style.

* Fix style.

* Fix Import Order.

* Add Line Break before package.
2019-02-14 22:52:03 -08:00
Mingming Qiu d0abf5c20a fix kafka index task doesn't resume when recieve duplicate request (#6990)
* fix kafka index task doesn't resume when recieve duplicate request

* add unit test
2019-02-12 13:24:28 -08:00
Ankit Kothari 16a4a50e91 [Issue #6967] NoClassDefFoundError when using druid-hdfs-storage (#7015)
* Fix:
  1. hadoop-common dependency for druid-hdfs and druid-kerberos extensions
 Refactoring:
  2. Hadoop config call in the inner static class to avoid class path conflicts for stopGracefully kill

* Fix:
  1. hadoop-common test dependency

* Fix:
  1. Avoid issue of kill command once the job is actually completed
2019-02-08 18:26:37 -08:00
Jonathan Wei fafbc4a80e
Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Furkan KAMACI 3097562adf Improper getter value is fixed. (#6930)
* Improper getter value is fixed.

* Test class is added.
2019-02-07 11:51:07 -08:00
anantmf 315ccb76b8 Fix for getSingleObjectSummary, replacing keyCount with objectSummaries().size (#7000)
* Instead of using keyCount, changing it to check the size of objectSummaries.

For issue:
https://github.com/apache/incubator-druid/issues/6980
https://github.com/apache/incubator-druid/issues/6980#issuecomment-460006580

* Changing another usage of keyCount with size of objectSummaries.

* Adding some comments to explain why using keyCount is not working as expected.
2019-02-05 15:45:44 -08:00
Jonathan Wei 8bc5eaa908
Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Roman Leventov 0e926e8652 Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898)
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool

* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java

* Remove unnecessary comment

* Checkstyle

* Fix getFromExtensions()

* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization

* Fix UriCacheGeneratorTest

* Workaround issue with MaterializedViewQueryQueryToolChest

* Strengthen Appenderator's contract regarding concurrency
2019-02-04 09:18:12 -08:00
Clint Wylie 6207b66e20 fix build (#6994) 2019-02-03 09:38:51 -08:00
Jonathan Wei 953b96d0a4 Add more sketch aggregator support in Druid SQL (#6951)
* Add more sketch aggregator support in Druid SQL

* Add docs

* Tweak module serde register

* Fix tests

* Checkstyle

* Test fix

* PR comment

* PR comment

* PR comments
2019-02-02 22:34:53 -08:00
Surekha 7baa33049c Introduce published segment cache in broker (#6901)
* Add published segment cache in broker

* Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set

* Added a trueEquals to DataSegment class

* Use separate interner for realtime and historical segments

* Remove trueEquals as it's not used anymore, change log message

* PR comments

* PR comments

* Fix tests

* PR comments

* Few more modification to

* change the coordinator api
* removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments
* Change the poll behaviour to avoid multiple poll execution at same time

* minor changes

* PR comments

* PR comments

* Make the segment cache in broker off by default

* Added a config to PlannerConfig
* Moved MetadataSegmentView to sql module

* Add doc for new planner config

* Update documentation

* PR comments

* some more changes

* PR comments

* fix test

* remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR

* minor changes

* some changes to initialization

* use pollPeriodInMS

* Add boolean cachePopulated to check if first poll succeeds

* Remove poll from start()

* take the log message out of condition in stop()
2019-02-02 22:27:13 -08:00
Clint Wylie 7a5827e12e bloom filter sql aggregator (#6950)
* adds sql aggregator for bloom filter, adds complex value serde for sql results

* fix tests

* checkstyle

* fix copy-paste
2019-02-01 13:54:46 -08:00
Roman Leventov f7df5fedcc Add several missing inspectRuntimeShape() calls (#6893)
* Add several missing inspectRuntimeShape() calls

* Add lgK to runtime shapes
2019-01-31 20:04:26 -08:00
Jihoon Son d4fbbb8deb Support protocol configuration for S3 (#6954)
* Support protocol configuration for S3

* Add doc
2019-01-30 19:32:00 -08:00
Clint Wylie a6d81c0d16 Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397)
* blooming aggs

* partially address review

* fix docs

* minor test refactor after rebase

* use copied bloomkfilter

* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests

* add methods to BloomKFilter to get number of set bits, use in comparator, fixes

* more docs

* fix

* fix style

* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets

* oof, more fixes

* more sane docs example

* fix it

* do the right thing in the right place

* formatting

* fix

* avoid conflict

* typo fixes, faster comparator, docs for comparator behavior

* unused imports

* use buffer comparator instead of deserializing

* striped readwrite lock for buffer agg, null handling comparator, other review changes

* style fixes

* style

* remove sync for now

* oops

* consistency

* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception

* CardinalityBufferAggregator inspect selectors instead of selectorPluses

* fix style

* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments

* adjustment

* fix teamcity error?

* rename nil aggs to empty, change empty agg constructor signature, add comments

* use stringutils base64 stuff to be chill with master

* add aggregate combiner, comment
2019-01-29 20:05:17 +07:00
Gian Merlino ba33bdc497 Add exclusions to limit doubling up on jars. (#6927) 2019-01-28 11:06:30 -08:00
Clint Wylie af3cbc3687 add bloom filter druid expression (#6904)
* add "bloom_filter_test" druid expression to support bloom filters in ExpressionVirtualColumn and ExpressionDimFilter and sql expressions

* more docs

* use java.util.Base64, doc fixes
2019-01-28 08:41:45 -05:00
Benedict Jin 72a571fbf7 For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913)
* * Add few methods about base64 into StringUtils

* Use `java.util.Base64` instead of others

* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis

* Rename encodeBase64String & decodeBase64String

* Update druid-forbidden-apis
2019-01-25 17:32:29 -08:00
Ankit Kothari 8492d94f59 Kill Hadoop MR task on kill of Hadoop ingestion task (#6828)
* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute

Run in jobby is now split into 2
 1. submitAndGetHadoopJobId followed by 2. run
  submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion

JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner

HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called

Hadoop*Job files have been changed to incorporate the changes to jobby

* Addressing PR comments

* Addressing PR comments - Fix taskDir

* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task

* Addressing PR comments
 1. Formatting
 2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself

* Addressing PR comments
 1. POM change. Moving hadoop dependency to indexing-hadoop

* Addressing PR comments
 1. stopGracefully now accepts TaskConfig as a param
     Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
     Changing tests to make TaskConfig param isRestoreOnRestart to true
2019-01-25 15:43:06 -08:00
Clint Wylie ffded61f5e
fix build (#6897) 2019-01-21 17:18:14 -08:00
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Jonathan Wei 68f744ec0a
Fixed buckets histogram aggregator (#6638)
* Fixed buckets histogram aggregator

* PR comments

* More PR comments

* Checkstyle

* TeamCity

* More TeamCity

* PR comment

* PR comment

* Fix doc formatting
2019-01-17 14:51:16 -08:00