* save work
* Working
* Fix runner constructor
* Working runner
* extra log lines
* try using lifecycle for everything
* clean up configs
* cleanup /workers call
* Use a single config
* Allow selecting runner
* debug changes
* Work on composite task runner
* Unit tests running
* Add documentation
* Add some javadocs
* Fix spelling
* Use standard libraries
* code review
* fix
* fix
* use taskRunner as string
* checkstyl
---------
Co-authored-by: Suneet Saldanha <suneet@apache.org>
* prometheus-emitter: add extraLabels parameter
* prometheus-emitter: update readme to include the extraLabels parameter
* prometheus-emitter: remove nullable and surface label name issues
* remove import to make linter happy
This adds a new contrib extension: druid-iceberg-extensions which can be used to ingest data stored in Apache Iceberg format. It adds a new input source of type iceberg that connects to a catalog and retrieves the data files associated with an iceberg table and provides these data file paths to either an S3 or HDFS input source depending on the warehouse location.
Two important dependencies associated with Apache Iceberg tables are:
Catalog : This extension supports reading from either a Hive Metastore catalog or a Local file-based catalog. Support for AWS Glue is not available yet.
Warehouse : This extension supports reading data files from either HDFS or S3. Adapters for other cloud object locations should be easy to add by extending the AbstractInputSourceAdapter.
In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
* Hook up PodTemplateTaskAdapter
* Make task adapter TYPE parameters final
* Rename adapters types
* Include specified adapter name in exception message
* Documentation for sidecarSupport deprecation
* Fix order
* Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969)
* Update docs/development/extensions-contrib/k8s-jobs.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Hook up PodTemplateTaskAdapter
* Make task adapter TYPE parameters final
* Rename adapters types
* Include specified adapter name in exception message
* Documentation for sidecarSupport deprecation
* Fix order
* fix spelling errors
---------
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Better sidecar support
* remove un-thrown exception from test
* Druid you are such a stickler about spelling :)
* Only require the primaryContainerName, no need to exclude containers
* Support for middle manager less druid, tasks launch as k8s jobs
* Fixing forking task runner test
* Test cleanup, dependency cleanup, intellij inspections cleanup
* Changes per PR review
Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support
* Removing un-needed log lines
* Small changes per PR review
* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly
* Merge conflict fix
* Fixing tests and docs
* update tiny-cluster.yaml
changed `enableTaskLevelLogPush` to `encapsulatedTask`
* Apply suggestions from code review
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Minor changes per PR request
* Cleanup, adding test to AbstractTask
* Add comment in peon.sh
* Bumping code coverage
* More tests to make code coverage happy
* Doh a duplicate dependnecy
* Integration test setup is weird for k8s, will do this in a different PR
* Reverting back all integration test changes, will do in anotbher PR
* use StringUtils.base64 instead of Base64
* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different
Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Various documentation updates.
1) Split out "data management" from "ingestion". Break it into thematic pages.
2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
all conceptual content is in concepts.md and all syntax content is in reference.md.
Shorten the known issues page to the most interesting ones.
3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.
4) Rename various mentions of "Druid console" to "web console".
5) Add additional information to ingestion/partitioning.md.
6) Remove a mention of Tranquility.
7) Remove a note about upgrading to Druid 0.10.1.
8) Remove no-longer-relevant task types from ingestion/tasks.md.
9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.
10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
in the sidebar.
11) Make all br tags self-closing.
12) Certain other cosmetic changes.
13) Update to node-sass 7.
* make travis use node12 for docs
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
* prometheus-emitter supports sending metrics to pushgateway regularly and continuously
* spell check fix
* Optimization variable name and related documents
* Update docs/development/extensions-contrib/prometheus.md
OK, it looks more conspicuous
Co-authored-by: Frank Chen <frankchen@apache.org>
* Update doc
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Frank Chen <frankchen@apache.org>
* When PrometheusEmitter is closed, close the scheduler
* Ensure that registeredMetrics is thread safe.
* Local variable name optimization
* Remove unnecessary white space characters
Co-authored-by: Frank Chen <frankchen@apache.org>
Compressed Big Decimal is an extension which provides support for
Mutable big decimal value that can be used to accumulate values
without losing precision or reallocating memory. This type helps in
absolute precision arithmetic on large numbers in applications,
where greater level of accuracy is required, such as financial
applications, currency based transactions. This helps avoid rounding
issues where in potentially large amount of money can be lost.
Accumulation requires that the two numbers have the same scale,
but does not require that they are of the same size. If the value
being accumulated has a larger underlying array than this value
(the result), then the higher order bits are dropped, similar to what
happens when adding a long to an int and storing the result in an
int. A compressed big decimal that holds its data with an embedded
array.
Compressed big decimal is an absolute number based complex type
based on big decimal in Java. This supports all the functionalities
supported by Java Big Decimal. Java Big Decimal is not mutable in
order to avoid big garbage collection issues. Compressed big decimal
is needed to mutate the value in the accumulator.
* Make nodeRole available during binding; add support for dynamic registration of DruidService
* fix checkstyle and test
* fix customRole test
* address comments
* add more javadoc
* request logs through kafka emitter
* travis fixes
* review comments
* kafka emitter unit test
* new line
* travis checks
* checkstyle fix
* count request lost when request topic is null
* prometheus-emitter
* use existing jetty server to expose prometheus collection endpoint
* unused variables
* better variable names
* removed unused dependencies
* more metric definitions
* reorganize
* use prometheus HTTPServer instead of hooking into Jetty server
* temporary empty help string
* temporary non-empty help. fix incorrect dimension value in JSON (also updated statsd json)
* added full help text. added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation
* added documentation for prometheus emitter
* safety for invalid labelNames
* fix travis checks
* Unit test and better sanitization of metrics names and label values
* add precondition to check namespace against regex
* use precompiled regex
* remove static imports. fix metric types
* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()
* Update regex for label-value replacements to allow internal numeric values. Additional tests
* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName
* fixes version in extensions-contrib/prometheus-emitter
* fix style guide errors
* update import ordering
* add another word to website/.spelling
* remove unthrown declared exception
* remove unused import
* Pushgateway strategy for metrics
* typo
* Format fix and nullable strategy
* Update pom file for prometheus-emitter
* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway
* Syntax fix
* Dimension label regex include numeric character back, fix previous commit
* bump prometheus-emitter pom dev version
* Remove scheduled task inside poen that push metrics
* Fix checkstyle
* Unit test coverage
* Unit test coverage
* Spelling
* Doc fix
* spelling
Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
* Add https to druid-influxdb-emitter extension
* address CI failures
* increase test coverage
* tests for being unable to load trustStore
* fix EqualsVerifier test
* fix intellij inspection error
* use try-with-resources when loading trustStore
* support redis cluster
* add 'password', 'database' properties
* test cases passed
* update doc
* some improvements
* fix CI
* add more test cases to improve branch coverage
* fix dependency check for test
* resolve review comments
* init commit, all tests passed
* fix format
Signed-off-by: frank chen <frank.chen021@outlook.com>
* data stored successfully
* modify config path
* add doc
* add aliyun-oss extension to project
* remove descriptor deletion code to avoid warning message output by aliyun client
* fix warnings reported by lgtm-com
* fix ci warnings
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix errors reported by intellj inspection check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc spelling check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix dependency warnings reported by ci
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix warnings reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add package configuration to support showing extension info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add IT test cases and fix bugs
Signed-off-by: frank chen <frank.chen021@outlook.com>
* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add license info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* exclude execution of IT testcases of OSS extension from CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* put the extensions under contrib group and add to distribution
* fix names in test cases
* add unit test to cover OssInputSource
* fix names in test cases
* fix dependency problem reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Adding support for autoscaling in GCE
* adding extra google deps also in gce pom
* fix link in doc
* remove unused deps
* adding terms to spelling file
* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
* GCEXyz -> GceXyz in naming for consistency
* add preconditions
* add VisibleForTesting annotation
* typos in comments
* use StringUtils.format instead of String.format
* use custom exception instead of exit
* factorize interval time between retries
* making literal value a constant
* iter all network interfaces
* use provided on google (non api) deps
* adding missing dep
* removing unneded this and use Objects methods instead o 3-way if in hash and comparison
* adding import
* adding retries around getRunningInstances and adding limit for operation end waiting
* refactor GceEnvironmentConfig.hashCode
* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
* removing unused config
* adding tests to hash and equals
* adding nullable to waitForOperationEnd
* adding testTerminate
* adding unit tests for createComputeService
* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
* reverting queryResponseTemplate change
* adding comment for Compute.Builder.build() returning null
* Move Azure extension into Core
Moving the azure extension into Core.
* * Fix build failure
* * Add The MIT License (MIT) to list of compatible licenses
* * Address review comments
* * change reference to contrib azure to core azure
* * Fix spelling mistakes.
* Add Azure config options for segment prefix and max listing length
Added configuration options to allow the user to specify the prefix
within the segment container to store the segment files. Also
added a configuration option to allow the user to specify the
maximum number of input files to stream for each iteration.
* * Fix test failures
* * Address review comments
* * add dependency explicitly to pom
* * update docs
* * Address review comments
* * Address review comments
* Add config option for namespacePrefix
opentsdb emitter sends metric names to opentsdb verbatim as what druid
names them, for example "query.count", this doesn't fit well with a
central opentsdb server which might have namespaced metrics, for example
"druid.query.count". This adds support for adding an optional prefix.
The prefix also gets a trailing dot (.), after it, so the metric name
becomes <namespacePrefix>.<metricname>
configureable as "druid.emitter.opentsdb.namespacePrefix", as
documented.
Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com>
* Spelling for PR #9372
Added "namespacePrefix" to .spelling exceptions, it's a variable name
used in documentation for opentsdb-emitter.
* fixing tests for PR #9372
changed naming of variables to be more descriptive
added test of prefix being an empty string: "".
added a conditional to buildNamespacePrefix to check for empty string
being fed if EventConverter called without OpentsdbEmitterConfig
instance.
* fixing checkstyle errors for PR #9372
used == to compare literal string, should be equals()
* cleaned up and updated PR #9372
Created a buildMetric function as suggested by clintropolis, and
removed redundant tests for empty strings as they're only used when
calling EventConverter directly without going through
OpentsdbEmitterConfig.
* consistent naming of tests PR #9372
Changed names of tests in files to match better with what it was
actually testing
changed check for Strings.isNullOrEmpty to just check for `null`, as
empty string valued `namespacePrefix` is handled in
OpentsdbEmitterConfig.
Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
* move google ext docs from contrib to core
* fix links
* revert unintended change
* more links, add note to example ext doc that it was removed, unlink from sidebar