Commit Graph

136 Commits

Author SHA1 Message Date
Jihoon Son fc9513b6cd
Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012)
* Make nodeRole available during binding; add support for dynamic registration of DruidService

* fix checkstyle and test

* fix customRole test

* address comments

* add more javadoc
2021-12-03 11:59:00 -08:00
jacobtolar f7f5505631
Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865)
* Update docs - Kinesis InputFormat ingestion

* Add avro_ocf to list of supported Kafka InputFormats

* Remove extra whitespace.

* Update kafka-supervisor-reference.md

* Delete extra whitespace.
2021-12-03 07:57:26 -08:00
Charles Smith 7ed46800c3
Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983)
Adds documentation for multi-dimension partitioning. cc: @kfaraz
Refactors the native batch partitioning topic as follows:

* Native batch ingestion covers parallel-index
* Native batch simple task indexing covers index
* Native batch input sources covers ioSource
* Native batch ingestion with firehose covers deprecated firehose
2021-12-03 16:37:14 +05:30
benkrug 11746b8536
Update datasketches-hll.md (#12010)
under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively."  21 is not a power of 2, nor is 12, the given default.  I think there may have been confusion because lgK represents log2 of K.  We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.
2021-11-30 18:52:00 -08:00
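As #12010 points out, lgK is the base-2 logarithm of K, so the power-of-2 requirement applies to K rather than to lgK. Here is a small Java sketch of the relationship. The `HllLgK` class is hypothetical, not a Druid API.

```java
// Illustrative helper (hypothetical, not part of Druid): lgK is log2(K),
// so the documented lgK range 4..21 corresponds to K = 16 .. 2,097,152.
final class HllLgK {
    static final int MIN_LG_K = 4;
    static final int MAX_LG_K = 21;

    /** Returns K = 2^lgK, rejecting lgK values outside the documented 4..21 range. */
    static long kFromLgK(int lgK) {
        if (lgK < MIN_LG_K || lgK > MAX_LG_K) {
            throw new IllegalArgumentException("lgK must be between 4 and 21 inclusive, got " + lgK);
        }
        return 1L << lgK;
    }

    public static void main(String[] args) {
        // The default lgK of 12 means the sketch is configured with K = 4096.
        System.out.println("lgK=12 -> K=" + kFromLgK(12));
    }
}
```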
Charles Smith f536f31229
clarify avro support & general style improvements (#11975)
* clarify avro support & general style improvements

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/avro.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update avro.md

remove redundancy

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-11-28 16:10:18 +08:00
Charles Smith 33a5cda061
Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912)
* Splits Kafka topic according to function. Adds detailed example for kafka inputFormat

* Apply suggestions from code review

accept suggestions from review

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

accept suggestions

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* accept suggestions

* accept suggestions

* final typos and clarifications

* bringing forward some syntax fixes

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-11-12 13:02:23 -08:00
Maytas Monsereenusorn a36a41da73
Support routing data through an HTTP proxy (#11891)
* Support routing data through an HTTP proxy

* Support routing data through an HTTP proxy

This adds the ability for the HttpClient to connect through an HTTP proxy.  We
augment the channel factory to check if it is supposed to be proxied and, if so,
we connect to the proxy host first, issue a CONNECT command through to the final
recipient host and *then* give the channel to the normal http client for usage.

* add docs

* address comments

Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>
2021-11-09 17:24:06 -08:00
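The CONNECT step described in #11891 follows the standard HTTP tunnelling handshake. The client asks the proxy to open a TCP tunnel to the final host, and any 2xx response means bytes can then flow through unchanged. Below is a minimal sketch of the request and the success check. The names are hypothetical: this is not Druid's HttpClient code, and it omits proxy authentication.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the HTTP CONNECT handshake a proxied client performs
// before handing the tunnelled connection to the normal HTTP client.
// Class and method names are hypothetical.
final class ProxyConnect {

    /** Builds the CONNECT request asking the proxy to open a tunnel to host:port. */
    static byte[] connectRequest(String host, int port) {
        String authority = host + ":" + port;
        String request = "CONNECT " + authority + " HTTP/1.1\r\n"
                + "Host: " + authority + "\r\n"
                + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns true if the proxy's status line reports success (any 2xx), i.e. the tunnel is open. */
    static boolean tunnelEstablished(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3);
        return parts.length >= 2
                && parts[0].startsWith("HTTP/")
                && parts[1].length() == 3
                && parts[1].charAt(0) == '2';
    }

    public static void main(String[] args) {
        System.out.print(new String(connectRequest("example.com", 443), StandardCharsets.US_ASCII));
        System.out.println(tunnelEstablished("HTTP/1.1 200 Connection established"));
    }
}
```

Once `tunnelEstablished` is true, the same socket carries the client's traffic (typically a TLS handshake) directly to the final host.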
zachjsh 1d6df48145
Warn if cache size of lookup is beyond max size (#11863)
Enhanced the ExtractionNamespace interface in the lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality is to make it easier to detect when a lookup table grows to a size that the underlying service cannot handle because it does not have enough memory. The default value of maxHeapPercentage for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which causes the service that loads the lookup to warn when its cache grows beyond maxHeapPercentage of the service's configured max heap size. If a positive, non-null value is set for the namespace's maxHeapPercentage config, this value is honored by all services that load the lookup, and they log warning messages when the lookup's cache grows beyond this percentage of the service's configured max heap size. Warnings are logged every time a URI-based or JDBC-based lookup is regenerated, if the maxHeapPercentage constraint is violated. No other implementations log warnings at this time. No error is thrown when the size exceeds maxHeapPercentage, as doing so could break functionality for existing users. Previously, the JdbcCacheGenerator generated its cache by materializing all rows of the underlying table in memory at once; this made it difficult to log warning messages when the results of the JDBC query were very large and caused the service to run out of memory. To help with this, this PR streams the JDBC query results through an iterator instead.
2021-11-03 21:32:22 -04:00
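The warning rule in #11863 boils down to comparing a cache size against a percentage of the JVM's max heap. Below is a minimal sketch. It assumes the percentage has already been resolved for the namespace and treats a non-positive value (such as the -1 default) as "no limit". The class and method names are hypothetical, and estimating the cache size itself is out of scope.

```java
// Hypothetical sketch of the warning rule described in #11863; not Druid's actual code.
final class LookupCacheSizeCheck {

    /**
     * Returns true if the cache is larger than maxHeapPercentage percent of the max heap.
     * A non-positive percentage (such as the -1 "not set" value) disables the check.
     */
    static boolean exceedsLimit(long cacheSizeBytes, long maxHeapBytes, int maxHeapPercentage) {
        if (maxHeapPercentage <= 0) {
            return false;
        }
        double limitBytes = maxHeapBytes * (maxHeapPercentage / 100.0);
        return cacheSizeBytes > limitBytes;
    }

    /** Logs a warning (to stderr here) after a lookup is regenerated; never throws. */
    static void warnIfTooLarge(String namespace, long cacheSizeBytes, int maxHeapPercentage) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory(); // the JVM's configured max heap
        if (exceedsLimit(cacheSizeBytes, maxHeapBytes, maxHeapPercentage)) {
            System.err.printf(
                "WARN: cache for lookup [%s] is %,d bytes, over %d%% of the max heap (%,d bytes)%n",
                namespace, cacheSizeBytes, maxHeapPercentage, maxHeapBytes);
        }
    }

    public static void main(String[] args) {
        // Only warns if 600 bytes is more than 50% of this JVM's max heap.
        warnIfTooLarge("example-lookup", 600L, 50);
    }
}
```

Warning instead of throwing mirrors the commit's stated goal of not breaking existing users.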
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Adds support for Hadoop 3 profiles. Most of the details are captured in #11791.
We use a combination of Maven profiles and resource filtering to achieve this. Hadoop 2 is supported by default, and a new Maven profile named hadoop3 is created. This allows users to choose the profile best suited to their use case.
2021-10-30 22:46:24 +05:30
Charles Smith 6089a168ea
Docs - update dynamic config provider topic (#11795)
* update dynamic config provider

* update topic

* add examples for dynamic config provider:

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-10-14 17:51:32 -07:00
Victoria Lim 42e44269be
Docs update for druid-basic-security (#11782)
* update druid-basic-security

* typo

* revisions from review
2021-10-08 14:45:09 -07:00
lokesh-lingarajan ad6609a606
Kafka Input Format for headers, key and payload parsing (#11630)
### Description

Today we ingest a number of high-cardinality metrics into Druid across dimensions. These metrics are rolled up on a per-minute basis and are very useful when looking at metrics on a per-partition or per-client basis. Events are another class of data that provide useful information about a particular incident or scenario inside a Kafka cluster. The events themselves are carried in the Kafka payload, but some very useful metadata is carried in the Kafka headers; it can serve as useful dimensions for aggregation and, in turn, bring better insights.

PR https://github.com/apache/druid/pull/10730 introduced support for Kafka headers in InputFormats.

We still need an input format to parse out the headers and translate them into relevant columns in Druid. Until that is implemented, none of the information available in the Kafka message headers is exposed. So first, we need an input format that can parse headers in any given format (provided we support the format), just as we parse payloads today. Apart from headers, there is also useful information in the key portion of the Kafka record, so we also need a way to expose the data in the key as Druid columns. We need a generic way to express, at configuration time, which attributes from the headers, key, and payload should be ingested into Druid, and the design must be generic enough that users can specify different parsers for headers, key, and payload.

This PR is designed to solve the above by providing a wrapper around any existing input format and merging the data into a single unified Druid row.

Let's look at a sample input format based on the above discussion:

"inputFormat":
{
    "type": "kafka",     // New input format type
    "headerLabelPrefix": "kafka.header.",   // Label prefix for header columns; this avoids collisions while merging columns
    "recordTimestampLabelPrefix": "kafka.",  // Kafka record's timestamp is made available in case payload does not carry timestamp
    "headerFormat":  // Header parser specifying that values are of type string
    {
        "type": "string"
    },
    "valueFormat": // Value parser from json parsing
    {
        "type": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [...]
        }
    },
    "keyFormat":  // Key parser also from json parsing
    {
        "type": "json"
    }
}

Having independent sections for header, key, and payload enables parsing each section with its own parser, e.g., headers as strings and the payload as JSON.

KafkaInputFormat will be the top-level class implementing the InputFormat interface. It will be responsible for creating individual parsers for the header, key, and payload, blending the data while resolving column conflicts, and generating a single unified InputRow for Druid ingestion.

"headerFormat" will allow users to plug in a parser type for the header values. Header attributes get the default prefix "kafka.header." (which can be overridden) to avoid collisions when merging them with payload attributes.

The Kafka payload parser will be responsible for parsing the value portion of the Kafka record. This is where most of the data comes from, and existing parsers can be plugged in. Note that if the payload contains a batch of records, the header and key values are added to every record in the batch.

The Kafka key parser will handle parsing the key portion of the Kafka record and will ingest the key under the dimension name "kafka.key".

## KafkaInputFormat Class:
This is the class that orchestrates sending the consumerRecord to each parser, retrieving rows, and merging the columns into one final row for Druid consumption. KafkaInputFormat should make sure to release the resources allocated as part of the reader in CloseableIterator<InputRow>, in both normal and exception cases.

When dimension/metric names conflict, the code will prefer the column from the payload and ignore the one from the headers or key. This is done so that existing input formats can be easily migrated to this new format without worrying about losing information.
2021-10-07 08:56:27 -07:00
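The merge precedence described in #11630 has three parts: header columns are prefixed, the key is stored as `kafka.key`, and payload columns win on conflict. This can be sketched with plain maps. It is a simplified, hypothetical illustration that does not use Druid's InputRow or parser classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of blending header, key and payload columns into one row,
// following the precedence described in #11630 (payload wins on conflict).
// Uses plain maps instead of Druid's InputRow; names are illustrative.
final class KafkaRowMerge {

    static Map<String, Object> merge(
            Map<String, String> headers,
            String headerLabelPrefix,
            Object key,
            Map<String, Object> payload) {
        Map<String, Object> row = new LinkedHashMap<>();
        // Headers first, prefixed to reduce collisions with payload columns.
        for (Map.Entry<String, String> h : headers.entrySet()) {
            row.put(headerLabelPrefix + h.getKey(), h.getValue());
        }
        // The record key goes under "kafka.key".
        if (key != null) {
            row.put("kafka.key", key);
        }
        // Payload last, so its values overwrite any header/key column with the same name.
        row.putAll(payload);
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("env", "prod");
        Map<String, Object> payload = new LinkedHashMap<>();
        payload.put("value", 42);
        // Prints {kafka.header.env=prod, kafka.key=user-1, value=42}
        System.out.println(merge(headers, "kafka.header.", "user-1", payload));
    }
}
```

When a payload yields several rows (a batch), the same header and key columns would be merged into each of them.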
Charles Smith 8fd17fe0af
fix a few typos in Kinesis doc (#11776) 2021-10-06 19:43:20 -07:00
Frank Chen 104c9a07f0
Fix broken anchor and heading levels in Kafka/Kinesis ingestion (#11748)
* Fix broken anchor and heading levels

* Fix CI
2021-10-05 19:30:50 -07:00
Vaibhav 3c4bba1478
Update kinesis-ingestion.md (#11767)
* Update kinesis-ingestion.md

It seems that we declare the defaults for recordsPerFetch and fetchDelayMillis as final ints, 4000 and 0 respectively, in https://github.com/implydata/druid/blob/imply-2021.09/extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskIOConfig.java#L36

```java
public static final int DEFAULT_RECORDS_PER_FETCH = 4000;
public static final int DEFAULT_FETCH_DELAY_MILLIS = 0;
```

Updating `recordsPerFetch` and `fetchDelayMillis` in the docs to the actual default values hardcoded above.

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-04 11:26:53 -07:00
Jihoon Son 7e90d00cc0
Configurable maxStreamLength for doubles sketches (#11574)
* Configurable maxStreamLength for doubles sketches

* fix equals/hashcode and it test failure

* fix test

* fix it test

* benchmark

* doc

* grouping key

* fix comment

* dependency check

* Update docs/development/extensions-core/datasketches-quantiles.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-31 14:56:37 -07:00
zhangyue19921010 6d14ea2d14
Dynamic auto scale Kinesis-Stream ingest tasks (#10985)
* ready to test

* revert misc.xml

* document kinesis md

* Update docs/development/extensions-core/kafka-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update docs/development/extensions-core/kinesis-ingestion.md

* Update kafka-ingestion.md

remove leading `

* Update kinesis-ingestion.md

add missing `

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-30 15:44:29 -07:00
Charles Smith 9032a0b079
updates Kafka and Kinesis to use . Fixes some typos and other style i… (#11624)
* updates Kafka and Kinesis to use . Fixes some typos and other style issues for Kafka.

* fix spelling

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-26 13:22:30 -07:00
Jeet Patel adb2f5c884
Add prometheus-emitter docs (#11618)
* Add prometheus-emitter docs

* Update docs/development/extensions.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-24 08:48:03 -07:00
Clint Wylie ec334a641b
MySQL extension with MariaDB connector docs (#11608)
* add docs for mariadb support via mysql extensions

* add logging so you know what druid knows

* homogenize

* spelling

* missed a couple
2021-08-19 01:52:26 -07:00
Maytas Monsereenusorn ce4dd48bb8
Support custom coordinator duties (#11601)
* impl

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* add test

* add test

* add integration tests

* add integration tests

* add more docs

* address comments

* address comments

* address comments

* add test

* fix checkstyle

* fix test
2021-08-19 11:54:11 +07:00
Karan Kumar d1bad92880
Made the instructions of adding extra resources as part of extensions simpler (#11577) 2021-08-17 17:33:55 +05:30
sthetland 95c5bc3a6d
Clarify when changes to credentialIterations take effect (#11590)
This change updates the docs to clarify when and how a change to druid.auth.authenticator.basic.credentialIterations takes effect. Changes apply only to new users, or to existing users when they change their password via the credentials API, which may not be what users expect.
2021-08-13 17:02:07 -07:00
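Why a changed credentialIterations only affects new or re-set passwords is easiest to see if each stored credential records the iteration count it was hashed with. The sketch below assumes a PBKDF2-style scheme using the JDK's `PBKDF2WithHmacSHA512`. The class is hypothetical and not taken from druid-basic-security.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical sketch: each stored credential remembers the iteration count it was hashed with.
final class StoredCredentials {
    final byte[] salt;
    final byte[] hash;
    final int iterations; // the count in effect when this password was set

    private StoredCredentials(byte[] salt, byte[] hash, int iterations) {
        this.salt = salt;
        this.hash = hash;
        this.iterations = iterations;
    }

    private static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 512);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creating or changing a password uses the currently configured iteration count. */
    static StoredCredentials create(char[] password, int configuredIterations) {
        byte[] salt = new byte[32];
        new SecureRandom().nextBytes(salt);
        return new StoredCredentials(salt, pbkdf2(password, salt, configuredIterations), configuredIterations);
    }

    /** Verification reuses the stored count, so a later config change does not affect this credential. */
    boolean matches(char[] attempt) {
        return MessageDigest.isEqual(hash, pbkdf2(attempt, salt, iterations));
    }

    public static void main(String[] args) {
        StoredCredentials alice = create("pw".toCharArray(), 10_000);
        // If the configured count is later raised to 20,000, alice's stored hash keeps 10,000
        // until she changes her password, which re-hashes with the new count.
        StoredCredentials aliceAfterChange = create("new-pw".toCharArray(), 20_000);
        System.out.println(alice.iterations + " -> " + aliceAfterChange.iterations);
    }
}
```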
Charles Smith 6524d838d7
Docs refactor of ingestion. Carries #11541 (#11576)
* Docs refactor of ingestion. Carries #11541

* Update docs/misc/math-expr.md

* add Apache license

* fix header, add topics to sidebar

* Update docs/ingestion/partitioning.md

* pick up changes to  and  md from c7fdf1d, #11479

Co-authored-by: Suneet Saldanha <suneet@apache.org>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-08-13 08:42:03 -07:00
frank chen bf5d829b71
Add more guidelines on the use of aliyun-oss-extensions (#11420)
* Add more description

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Update prefixes usage and Add troubleshooting section

* Add endpoint configuration recommendation

* Fix link

* resolve review comments
2021-08-09 17:27:35 -07:00
Yi Yuan 59c8430d29
change document (#11545)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-08-06 07:57:12 -07:00
Peter Marshall 973e5bf7d0
Docs - HLL lgK tip and slight layout change (#11482)
* HLL lgK and a tip

Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200.  Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated.  Also added a note about using K over 16 being pretty much pointless.

* Corrected spelling

* Create datasketches-hll.md

Put roll-up back to rollup

* Update docs/development/extensions-core/datasketches-hll.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2021-07-26 12:28:53 -07:00
Paul Rogers aa8c615ac2
Updates to source and doc build pages (#11464)
* Updates to source and doc build pages.

Clarifies a few points for newbies.

* Fixed spelling error

And added spellcheck info to website README file.
2021-07-20 18:07:34 -07:00
Abhishek Agarwal 94c1671eaf
Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466)
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.

SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. The default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once they are downloaded. This class will be used in SegmentManager to construct Segment objects.

SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and, in the future, will support reserving and releasing cache space [see #11398, Make SegmentLoader extensible and customizable]. This class will be used in ingestion tasks such as compaction and re-indexing, where segment files need to be downloaded locally.
2021-07-21 00:14:19 +05:30
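The division of responsibility in #11466 can be pictured with two small interfaces. The names and methods below are hypothetical and simplified (a String stands in for a segment object). They are not Druid's actual SegmentLoader or SegmentCacheManager signatures.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interfaces illustrating the responsibility split described in #11466.
interface CacheManagerSketch {
    /** Ensures the segment's files are on local disk and returns their directory. */
    File getSegmentFiles(String segmentId);

    /** Removes the segment's files from the local cache. */
    void cleanup(String segmentId);
}

interface LoaderSketch {
    /** Builds a segment object; a String stands in for a real Segment here. */
    String getSegment(String segmentId);
}

/** Only tracks paths; a real cache manager would pull files from deep storage. */
final class InMemoryCacheManager implements CacheManagerSketch {
    private final File root;
    private final Map<String, File> cached = new HashMap<>();

    InMemoryCacheManager(File root) {
        this.root = root;
    }

    @Override
    public File getSegmentFiles(String segmentId) {
        return cached.computeIfAbsent(segmentId, id -> new File(root, id));
    }

    @Override
    public void cleanup(String segmentId) {
        cached.remove(segmentId);
    }

    boolean isCached(String segmentId) {
        return cached.containsKey(segmentId);
    }
}

/** The loader delegates all disk work to the cache manager and only builds the segment. */
final class DelegatingLoader implements LoaderSketch {
    private final CacheManagerSketch cacheManager;

    DelegatingLoader(CacheManagerSketch cacheManager) {
        this.cacheManager = cacheManager;
    }

    @Override
    public String getSegment(String segmentId) {
        File dir = cacheManager.getSegmentFiles(segmentId);
        return "Segment[" + segmentId + " @ " + dir.getPath() + "]";
    }
}
```

Because only the cache manager touches the disk, an ingestion task that just needs files locally could use it without a loader, which is the reuse the commit describes for compaction and re-indexing.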
Joseph Glanville d5e8d4d680
Avro union support (#10505)
* Avro union support

* Document new union support

* Add support for AvroStreamInputFormat and fix checkstyle

* Extend multi-member union test schema and format

* Some additional docs and add Enums to spelling

* Rename explodeUnions -> extractUnions

* explode -> extract

* ByType

* Correct spelling error
2021-07-06 22:05:41 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
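The ambiguity fixed in #11333 is that SI prefixes (KB, MB, GB) are powers of 10, while IEC binary prefixes (KiB, MiB, GiB) are powers of 2. A quick Java illustration of the sizes involved (the class is only for illustration):

```java
// Decimal (SI) vs binary (IEC) byte units.
final class ByteUnits {
    // SI prefixes: powers of 10
    static final long KB = 1_000L;
    static final long MB = 1_000_000L;
    static final long GB = 1_000_000_000L;
    // IEC binary prefixes: powers of 2
    static final long KIB = 1L << 10; // 1,024
    static final long MIB = 1L << 20; // 1,048,576
    static final long GIB = 1L << 30; // 1,073,741,824

    public static void main(String[] args) {
        // "1 GB" and "1 GiB" differ by 73,741,824 bytes, roughly 7%.
        System.out.println(GIB - GB);
    }
}
```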
Hoseung Lee ed0a57e106
Update kafka-ingestion.md to clarify PasswordProvider support limitation (#11374)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-06-24 21:54:48 -07:00
Yi Yuan 8de0d36c52
Allow query through router when load moving average extension (#11276)
* init commit

* change NoopQuerySegmentWalker name

* change doc

* move NoopQuerySegmentWalker and add document

* fix doc

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-10 18:46:53 +08:00
Yi Yuan 145cf9e5c3
fix document about input format (#11342)
Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-08 23:44:54 +08:00
frank chen 2ee7e31e5b
Fix syntax error (#11332) 2021-06-07 22:35:02 -07:00
frank chen e664bfd433
Improve doc of movingAverage (#11262)
* Make doc more directive

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Add limitation

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Suppress spelling check error
2021-05-28 13:10:55 +08:00
Agustin Gonzalez 4ba5738ffb
Add an issues section to deal with common issues when building druid (#11271) 2021-05-21 09:04:51 -07:00
Yuanli Han 14f1f2aa76
Fix a broken link in the development doc (#11226) 2021-05-10 16:14:06 +08:00
Yuanli Han 8647040f4d
Allow user to set group.id for Kafka ingestion task (#11147)
* allow user to set group.id for Kafka ingestion task

* fix test coverage by removing deprecated code and add doc

* fix typo

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: frank chen <frankchen@apache.org>

Co-authored-by: frank chen <frankchen@apache.org>
2021-05-09 11:56:19 +08:00
Yuanli Han 34169c8550
fix doc (#11202)
(cherry picked from commit ffb3c049726b5e461c6f7f8b6f4b75d2cb907dcc)
2021-05-05 06:17:07 -07:00
Jeet Patel 7139c60868
Change the `id` for `kubernetes` doc link to work (#11176)
* Change the `id` for doc link to work

* Added `druid-kubernetes-extensions` to the list
2021-04-28 10:12:28 -07:00
sthetland fb6751fa45
Fix old broken link (#11048)
* link check fixes

* updated link target

* Update aggregations.md

* spelling error
2021-04-07 20:40:50 -07:00
Himanshu a0d52c3def
k8s discovery module: fix issue for druid.host being more than 63 chars not permitted as k8s resource label value (#10961)
* k8s discovery module: fix issue for druid.host being more than 63 chars not permitted as k8s resource label value

* update doc

* fix test
2021-04-07 17:45:28 -07:00
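For context on #10961, Kubernetes label values must be at most 63 characters. If non-empty, they must also start and end with an alphanumeric character, with only `-`, `_`, `.` and alphanumerics in between. Below is a sketch of that validation in Java. The class is hypothetical and only illustrates the constraint, not the fix made in this commit.

```java
import java.util.regex.Pattern;

// Hypothetical check for the Kubernetes label-value rules (length <= 63, restricted charset).
final class K8sLabelValue {
    static final int MAX_LENGTH = 63;
    // Empty, or alphanumeric at both ends with [-_.a-zA-Z0-9] in between.
    private static final Pattern VALID =
            Pattern.compile("(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?");

    static boolean isValid(String value) {
        return value.length() <= MAX_LENGTH && VALID.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("druid-broker-0")); // true
        // A fully qualified in-cluster host name easily exceeds the limit (66 characters here).
        System.out.println(isValid("druid-historical-0.druid-historical.my-namespace.svc.cluster.local")); // false
    }
}
```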
Jihoon Son cfcebc40f6
Allow list for JDBC connection properties to address CVE-2021-26919 (#11047)
* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11
2021-04-01 17:30:47 -07:00
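The commit title only names the approach. Below is a generic sketch of allow-list checking for properties embedded in a JDBC URL query string. The parsing is deliberately simplified: no URL decoding, case-sensitive keys, and properties passed outside the URL are ignored. The class and the example allow list are hypothetical, not Druid's implementation or configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical allow-list check for JDBC URL properties (simplified parsing).
final class JdbcPropertyAllowList {

    /** Parses "key=value" pairs from the query part of a JDBC URL (no URL decoding). */
    static Map<String, String> queryProperties(String jdbcUrl) {
        Map<String, String> props = new LinkedHashMap<>();
        int q = jdbcUrl.indexOf('?');
        if (q < 0) {
            return props;
        }
        for (String pair : jdbcUrl.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            props.put(key, value);
        }
        return props;
    }

    /** Throws if any property in the URL is not in the allow list. */
    static void check(String jdbcUrl, Set<String> allowed) {
        for (String key : queryProperties(jdbcUrl).keySet()) {
            if (!allowed.contains(key)) {
                throw new IllegalArgumentException("JDBC property [" + key + "] is not allowed");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("useSSL", "useUnicode"); // example allow list
        check("jdbc:mysql://db:3306/druid?useSSL=true", allowed); // passes
        try {
            check("jdbc:mysql://db:3306/druid?useSSL=true&allowLoadLocalInfile=true", allowed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An allow list fails closed: a property nobody anticipated is rejected by default, which a deny list cannot guarantee.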
Parag Jain b35486fa81
request logs through kafka emitter (#11036)
* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null
2021-04-01 11:31:32 +05:30
Charles Smith 8544d29bc7
remove experimental from Kinesis with caveats (#10998)
* remove experimental from Kinesis with caveats

* add suggested known issue

* spelling fixes
2021-03-29 13:57:58 -07:00
Parag Jain 2fdc313e4d
GCS lookup support (#11026)
* GCS lookup support

* checkstyle fix

* review comments

* review comments

* remove unused import
2021-03-30 01:40:41 +05:30
Yi Yuan 36e86a2880
Add protobuf schema registry (#10839)
* dd_protobuf_schema_registry

* change license

* delete some annotation

* modify tests

* delete extra exception

* add licenses

* add descriptor and protoMessageType in ProtobufInputRowParser to adapt to the old version

* separate kafka-protobuf-provider

* modify protobuf.md

* refine protobuf.md

* add config and header

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-03-09 15:15:51 -08:00
Tianxin Zhao a57c28e9ce
prometheus metric exporter (#10412)
* prometheus-emitter

* use existing jetty server to expose prometheus collection endpoint

* unused variables

* better variable names

* removed unused dependencies

* more metric definitions

* reorganize

* use prometheus HTTPServer instead of hooking into Jetty server

* temporary empty help string

* temporary non-empty help.  fix incorrect dimension value in JSON (also updated statsd json)

* added full help text.  added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation

* added documentation for prometheus emitter

* safety for invalid labelNames

* fix travis checks

* Unit test and better sanitization of metrics names and label values

* add precondition to check namespace against regex

* use precompiled regex

* remove static imports. fix metric types

* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()

* Update regex for label-value replacements to allow internal numeric values.  Additional tests

* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName

* fixes version in extensions-contrib/prometheus-emitter

* fix style guide errors

* update import ordering

* add another word to website/.spelling

* remove unthrown declared exception

* remove unused import

* Pushgateway strategy for metrics

* typo

* Format fix and nullable strategy

* Update pom file for prometheus-emitter

* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway

* Syntax fix

* Dimension label regex include numeric character back, fix previous commit

* bump prometheus-emitter pom dev version

* Remove scheduled task inside poen that push metrics

* Fix checkstyle

* Unit test coverage

* Unit test coverage

* Spelling

* Doc fix

* spelling

Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
2021-03-09 14:37:31 -08:00
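Several of the squashed commits in #10412 deal with sanitizing metric names. Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, while Druid metric names contain `/`. Below is a hypothetical sketch of the conversion, with the namespace validated up front and the regexes precompiled. It is not the emitter's actual code.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of metric-name sanitization for Prometheus.
final class PrometheusNames {
    // Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*
    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    // Any character outside the allowed set is replaced (pattern compiled once and reused).
    private static final Pattern INVALID_CHAR = Pattern.compile("[^a-zA-Z0-9_:]");

    /** Converts a Druid metric name such as "query/time" into a namespaced Prometheus name. */
    static String metricName(String namespace, String druidMetric) {
        if (!VALID_NAME.matcher(namespace).matches()) {
            throw new IllegalArgumentException("Invalid namespace: " + namespace);
        }
        return namespace + "_" + INVALID_CHAR.matcher(druidMetric).replaceAll("_");
    }

    static boolean isValidName(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(metricName("druid", "query/time")); // druid_query_time
    }
}
```

Validating the namespace once is enough to guarantee a valid result, because the sanitized suffix follows a valid first character.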
zhangyue19921010 bddacbb1c3
Dynamic auto scale Kafka-Stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test file function

* change codes and add docs based on capistrant reviewed

* modify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there &&  Make autoscaling algorithm configurable and scalable.

* fix ci failed

* revert misc.xml

* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NLP

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when create allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-06 14:36:52 +05:30
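The lag-based autoscaling added in #10524 works roughly as follows. It collects lag samples, scales out when enough of them exceed a threshold, scales in when enough fall below another threshold, and clamps the result. The upper bound is limited by the partition count ("limit taskCountMax to partitionNumbers"). Below is a simplified sketch. The parameter names and the exact trigger rule are assumptions, not the supervisor's actual configuration or algorithm.

```java
import java.util.List;

// Simplified, hypothetical lag-based scaling decision.
final class LagBasedScaler {

    static int desiredTaskCount(
            List<Long> lagSamples, int currentTasks,
            long scaleOutThreshold, double triggerScaleOutFraction,
            long scaleInThreshold, double triggerScaleInFraction,
            int scaleOutStep, int scaleInStep,
            int taskCountMin, int taskCountMax, int partitionCount) {
        if (lagSamples.isEmpty()) {
            return currentTasks; // no data, no decision
        }
        long above = lagSamples.stream().filter(l -> l > scaleOutThreshold).count();
        long below = lagSamples.stream().filter(l -> l < scaleInThreshold).count();
        int target = currentTasks;
        if ((double) above / lagSamples.size() >= triggerScaleOutFraction) {
            target = currentTasks + scaleOutStep;
        } else if ((double) below / lagSamples.size() >= triggerScaleInFraction) {
            target = currentTasks - scaleInStep;
        }
        // More tasks than partitions would leave some tasks idle, so cap at the partition count.
        int upper = Math.min(taskCountMax, partitionCount);
        return Math.max(taskCountMin, Math.min(target, upper));
    }

    public static void main(String[] args) {
        // Every sample is above the scale-out threshold, so the scaler wants 2 + 2 = 4 tasks,
        // but the topic has only 3 partitions, so the result is capped at 3.
        int tasks = desiredTaskCount(List.of(100L, 200L, 300L), 2,
                50L, 0.5, 10L, 0.9, 2, 1, 1, 10, 3);
        System.out.println(tasks); // 3
    }
}
```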