* Add state and error tracking for seekable stream supervisors
* Fixed nits in docs
* Made inner class static and updated spec test with jackson inject
* Review changes
* Remove redundant config param in supervisor
* Style
* Applied some of Jon's recommendations
* Add transience field
* write test
* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()
* remove transience reporting and fix SeekableStreamSupervisorStateManager impl
* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests
* remove stateHistory because it wasn't adding much value, some fixes, and add more tests
* fix tests
* code review changes and add HTTP health check status
* fix test failure
* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager
* fixup after merge
* code review changes - add additional docs
* cleanup KafkaIndexTaskTest
* add additional documentation for Kinesis indexing
* remove unused throws class
ambari depends on hadoop-annotation. Under linux, it sometimes activates a
maven profile that includes an explicit dependency on ${java.home}/../lib/tools.jar
This change excludes this dependency, since this jar is no longer
included with Java 9. It also seems seems unlikely that we would require
this dependency at runtime, since it is only enabled on some platforms.
* Add checkstyle for "Local variable names shouldn't start with capital"
* Adjust some local variables to constants
* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
* Upgrade various build and doc links to https.
Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.
* Switch to https://apache.org.
* Bump Checkstyle to 8.20
Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle
Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.
Affected versions: < 8.18
* Oops, missed one
* Oops, missed a few
* First set of changes for tDigest histogram
* Add license
* Address code review comments
* Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments.
* Remove synchronization from BufferAggregators. Address code review comments
* Fix typo
* update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support
* update tests to use new easymock interfaces
* fix tests failing due to easymock fixes
* remove dependency on jmockit
* fix race condition in ResourcePoolTest
* Contributing Moving-Average Query to open source.
* Fix failing code inspections.
* See if explicit types will invoke the correct comparison function.
* Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter.
* Update styling and headers for complience.
* Refresh code with latest master changes:
* Remove NullDimensionSelector.
* Apply changes of RequestLogger.
* Apply changes of TimelineServerView.
* Small checkstyle fix.
* Checkstyle fixes.
* Fixing rat errors; Teamcity errors.
* Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved.
* Implements some of the review fixes.
* Contributing Moving-Average Query to open source.
* Fix failing code inspections.
* See if explicit types will invoke the correct comparison function.
* Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter.
* Update styling and headers for complience.
* Refresh code with latest master changes:
* Remove NullDimensionSelector.
* Apply changes of RequestLogger.
* Apply changes of TimelineServerView.
* Small checkstyle fix.
* Checkstyle fixes.
* Fixing rat errors; Teamcity errors.
* Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved.
* Implements some of the review fixes.
* More fixes for review.
* More fixes from review.
* MapBasedRow is Unmodifiable. Create new rows instead of modifying existing ones.
* Remove more changes related to datasketches support.
* Refactor BaseAverager startFrom field and add a comment.
* fakeEvents field: Refactor initialization and add comment.
* Rename parameters (tiny change).
* Fix variable name typo in test (JAN_4).
* Fix styling of non camelCase fields.
* Fix Preconditions.checkArgument for cycleSize.
* Add more documentation to RowBucketIterable and other classes.
* key/value comment on in MovingAverageIterable.
* Fix anonymous makeColumnValueSelector returning null.
* Replace IdentityYieldingAccumolator with Yielders.each().
* * internalNext() should return null instead of throwing exception.
* Remove unused variables/prarameters.
* Harden MovingAverageIterableTest (Switch anyOf to exact match).
* Change internalNext() from recursion to iteration; Simplify next() and hasNext().
* Remove unused imports.
* Address review comments.
* Rename fakeEvents to emptyEvents.
* Remove redundant parameter key from computeMovingAverage.
* Check yielder as well in RowBucketIterable#hasNext()
* Fix javadoc.
* orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs
* change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling
* better docs and tests
* fix it
* formatting
* doc fix
* fix it
* exclude redundant dependencies
* use latest orc-mapreduce, add hadoop jobProperties recommendations to docs
* doc fix
* review stuff and fix binaryAsString
* cache for root level fields
* more better
* Move GCP to a core extension
* Don't provide druid-core >.<
* Keep AWS and GCP modules separate
* Move AWSModule to its own module
* Add aws ec2 extension and more modules in more places
* Fix bad imports
* Fix test jackson module
* Include AWS and GCP core in server
* Add simple empty method comment
* Update version to 15
* One more 0.13.0-->0.15.0 change
* Fix multi-binding problem
* Grep for s3-extensions and update docs
* Update extensions.md
* Fix exclusivity for start offset in kinesis indexing service
* some adjustment
* Fix SeekableStreamDataSourceMetadata
* Add missing javadocs
* Add missing comments and unit test
* fix SeekableStreamStartSequenceNumbers.plus and add comments
* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue
* Add javadocs
* fix compilation
* fix test
* remove unused variable
* Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality
* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()
* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()
* Style
* typo
* Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives
* Fixes
* Throw caught exception.
* Throw caught exceptions.
* Related checkstyle rule is added to prevent further bugs.
* RuntimeException() is used instead of Throwables.propagate().
* Missing import is added.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* Throwables are propogated if possible.
* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.
* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.
* Throwable is kept before firing a Runtime Exception.
* Fix unused assignments.
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file
* delete descriptor.file when killing segments
* fix test
* Add doc for ha
* improve warning
* bugfix: when building materialized-view, if taskCount >1, may cause ConcurrentModificationException
* remove entry after iteration instead of using ConcurrentMap, and add unit test
* small change
* modify unit test for coverage
* remove unused method
* Support kafka transactional topics
* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests
* Remove unused import from test
* Fix compilation
* Invoke transaction api to fix a unit test
* temporary modification of travis.yml for debugging
* another attempt to get travis tasklogs
* update kafka to 2.0.1 at all places
* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes
* Add deprecated in docs for kafka-eight and kafka-simple extensions
* Remove skipOffsetGaps and code changes for transaction support
* Fix indentation
* remove skipOffsetGaps from kinesis
* Add transaction api to KafkaRecordSupplierTest
* Fix indent
* Fix test
* update kafka version to 2.1.0
* Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool
* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java
* Remove unnecessary comment
* Checkstyle
* Fix getFromExtensions()
* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization
* Fix UriCacheGeneratorTest
* Workaround issue with MaterializedViewQueryQueryToolChest
* Strengthen Appenderator's contract regarding concurrency
This is an extension of PR #5750 by @drcrallen which added retry to a variety of
GCS operations, but not to GoogleTaskLogs, which we have found to
occasionally fail in our cluster.
Also fixes a typo in a variable name and removes an unused private method
parameter.
Fixes#6912.
* blooming aggs
* partially address review
* fix docs
* minor test refactor after rebase
* use copied bloomkfilter
* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests
* add methods to BloomKFilter to get number of set bits, use in comparator, fixes
* more docs
* fix
* fix style
* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets
* oof, more fixes
* more sane docs example
* fix it
* do the right thing in the right place
* formatting
* fix
* avoid conflict
* typo fixes, faster comparator, docs for comparator behavior
* unused imports
* use buffer comparator instead of deserializing
* striped readwrite lock for buffer agg, null handling comparator, other review changes
* style fixes
* style
* remove sync for now
* oops
* consistency
* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception
* CardinalityBufferAggregator inspect selectors instead of selectorPluses
* fix style
* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments
* adjustment
* fix teamcity error?
* rename nil aggs to empty, change empty agg constructor signature, add comments
* use stringutils base64 stuff to be chill with master
* add aggregate combiner, comment
* * Add few methods about base64 into StringUtils
* Use `java.util.Base64` instead of others
* Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis
* Rename encodeBase64String & decodeBase64String
* Update druid-forbidden-apis
PR #6605 added support to the statsd emitter for DogStatsD tags. This commit
lets you specify "constant tags" in the config file which are included with
every event. This is helpful if you are running in an environment where you
cannot configure your datadog-agent with tags like "cluster name" --- eg, a
Kubernetes cluster with a datadog-agent on each node and different Druid
deployments in different namespaces but sharing the same datadog-agent
daemonset.
Also fix the name of an existing boolean getter to start with 'is'.
* Add checkstyle rules about imports and empty lines between members
* Add suppressions
* Update Eclipse import order
* Add empty line
* Fix StatsDEmitter
* Replace StatsD client library
The [Datadog package][1] is a StatsD compatible drop-in replacement for the
client library, but it seems to be [better maintained][2] and has support for
Datadog DogStatsD specific features, which will be made use of in a subsequent
commit.
The `count`, `time`, and `gauge` methods are actually exactly compatible with
the previous library and the modifications shouldn't be required, but EasyMock
seems to have a hard time dealing with the variable arguments added by the
DogStatsD library and causes tests to fail if no arguments are provided for the
last String vararg. Passing an empty array fixes the test failures.
[1]: https://github.com/DataDog/java-dogstatsd-client
[2]: https://github.com/tim-group/java-statsd-client/issues/37#issuecomment-248698856
* Retain dimension key information for StatsD metrics
This doesn't change behavior, but allows separating dimensions from the metric
name in subsequent commits.
There is a possible order change for values from
`dimsBuilder.build().values()`, but from the tests it looks like it doesn't
affect actual behavior and the order of user dimensions is also retained.
* Support DogStatsD style tags in statsd-emitter
Datadog [doesn't support name-encoded dimensions and uses a concept of _tags_
instead.][1] This change allows Datadog users to send the metrics without
having to encode the various dimensions in the metric names. This enables
building graphs and monitors with and without aggregation across various
dimensions from the same data.
As tests in this commit verify, the behavior remains the same for users who
don't enable the `druid.emitter.statsd.dogstatsd` configuration flag.
[1]: https://www.datadoghq.com/blog/the-power-of-tagged-metrics/#tags-decouple-collection-and-reporting
* Disable convertRange behavior for DogStatsD users
DogStatsD, unlike regular StatsD, supports floating-point values, so this
behavior is unnecessary. It would be possible to still support `convertRange`,
even with `dogstatsd` enabled, but that would mean that people using the
default mapping would have some of the gauges unnecessarily converted.
`time` is in milliseconds and doesn't support floating-point values.