Commit Graph

11392 Commits

Author SHA1 Message Date
Gian Merlino b7a4c79314
Null handling fixes for DS HLL and Theta sketches. (#11830)
* Null handling fixes for DS HLL and Theta sketches.

For HLL, this fixes an NPE when processing a null in a multi-value dimension.

For both, empty strings are now properly treated as nulls (and ignored) in
replace-with-default mode. Behavior in SQL-compatible mode is unchanged.

* Fix expectation.
2021-10-22 19:09:00 -07:00
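The null-handling rules in #11830 can be illustrated with a small filter that runs before values reach a sketch. This is a minimal sketch, not Druid's code: the class `SketchInputFilter` and its method are hypothetical, and it assumes nulls are always skipped, while empty strings are dropped only in replace-with-default mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Druid's API): filters the values of a possibly
// multi-value dimension before they are added to an HLL or Theta sketch.
final class SketchInputFilter {
  static List<String> valuesToAdd(List<String> values, boolean replaceWithDefault) {
    List<String> out = new ArrayList<>();
    if (values == null) {
      return out; // a null row contributes nothing
    }
    for (String v : values) {
      if (v == null) {
        continue; // skip nulls instead of dereferencing them (the NPE case)
      }
      if (replaceWithDefault && v.isEmpty()) {
        continue; // replace-with-default mode: "" is treated as null and ignored
      }
      out.add(v);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> row = Arrays.asList("a", null, "", "b");
    System.out.println(valuesToAdd(row, true));  // [a, b]
    System.out.println(valuesToAdd(row, false)); // [a, , b]
  }
}
```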
Gian Merlino cb9bc15e95
Fix task report streaming in https setups. (#11739)
* Fix task report streaming in https setups.

* Trivial change to re-trigger ITs.
2021-10-22 19:07:29 -07:00
Clint Wylie 02b2057371
extract generic dictionary encoded column indexing and merging stuffs (#11829)
* extract generic dictionary encoded column indexing and merging stuffs to pave the path towards supporting other types of dictionary encoded columns

* spotbugs and inspections fixes

* friendlier

* javadoc

* better name

* adjust
2021-10-22 17:31:22 -07:00
Victoria Lim 43103632fb
Docs - add description on time origin (#11826)
* add description on time origin

* reorder parameter descriptions

* add example of origin value
2021-10-22 14:57:13 -07:00
Clint Wylie 741b4ed516
add output type information to ExpressionPostAggregator (#11818)
* add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types

* add test for coverage

* simplify

* Remove unused imports.

Co-authored-by: Gian Merlino <gian@imply.io>
2021-10-22 13:52:51 -07:00
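The idea in #11818, letting a post-aggregator compute its output type from its input types via a column inspector, might look roughly like this. All names here (`SimpleType`, `SimpleColumnInspector`, `AdditionPostAggSketch`) are hypothetical simplifications, not Druid's actual `PostAggregator` or `ColumnInspector` signatures, and the widening rule is an assumption for illustration.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins for Druid's type classes.
enum SimpleType { LONG, DOUBLE, STRING }

interface SimpleColumnInspector {
  SimpleType getType(String name); // null if the column is unknown
}

final class AdditionPostAggSketch {
  private final String left;
  private final String right;

  AdditionPostAggSketch(String left, String right) {
    this.left = left;
    this.right = right;
  }

  // Output type derived from the input types: LONG + LONG stays LONG,
  // a DOUBLE input widens the result to DOUBLE, anything else is unknown (null).
  SimpleType getType(SimpleColumnInspector inspector) {
    SimpleType l = inspector.getType(left);
    SimpleType r = inspector.getType(right);
    if (l == SimpleType.LONG && r == SimpleType.LONG) {
      return SimpleType.LONG;
    }
    boolean lNumeric = l == SimpleType.LONG || l == SimpleType.DOUBLE;
    boolean rNumeric = r == SimpleType.LONG || r == SimpleType.DOUBLE;
    return lNumeric && rNumeric ? SimpleType.DOUBLE : null;
  }

  public static void main(String[] args) {
    Map<String, SimpleType> types = Map.of("count", SimpleType.LONG, "sum", SimpleType.DOUBLE);
    SimpleColumnInspector inspector = types::get;
    System.out.println(new AdditionPostAggSketch("count", "count").getType(inspector)); // LONG
    System.out.println(new AdditionPostAggSketch("count", "sum").getType(inspector));   // DOUBLE
  }
}
```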
Arun Ramani df4894afff
Fallback to /sys/fs root when looking for cgroups (#11810)
ProcCgroupDiscoverer builds the cgroup directory by concatenating the proc mounts and proc cgroup paths. This doesn't seem to work in Kubernetes when the execution context is inside the container, and it isn't consistent across all Linux OSes. The fix is to fall back to / as the root, which seems to work empirically.
2021-10-21 09:51:16 +05:30
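The fallback described in #11810 can be sketched as follows, under the assumption that "fall back to / as the root" means using the controller's mount point itself when the concatenated path does not exist. `CgroupDirSketch` is a hypothetical name, not the real `ProcCgroupDiscoverer`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical simplification of the cgroup directory fallback.
final class CgroupDirSketch {
  // mountPoint: where the controller is mounted (from the proc mounts)
  // procCgroupPath: the controller's path as listed in the proc cgroup file
  static Path resolve(Path mountPoint, String procCgroupPath) {
    // Strip the leading "/" so resolve() appends instead of replacing the path.
    String relative = procCgroupPath.startsWith("/") ? procCgroupPath.substring(1) : procCgroupPath;
    Path candidate = mountPoint.resolve(relative);
    if (Files.isDirectory(candidate)) {
      return candidate;
    }
    // Inside a container the concatenated path often does not exist,
    // so treat "/" (the mount point itself) as the cgroup root.
    return mountPoint;
  }

  public static void main(String[] args) throws IOException {
    Path mount = Files.createTempDirectory("cgroup-cpu");
    System.out.println(resolve(mount, "/kubepods/pod123/abc").equals(mount)); // true (fallback)
    Files.createDirectories(mount.resolve("kubepods/pod123/abc"));
    System.out.println(resolve(mount, "/kubepods/pod123/abc").equals(mount.resolve("kubepods/pod123/abc"))); // true
  }
}
```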
Alexander Saydakov 8cf1cbc4a9
latest datasketches-java and datasketches-memory (#11773)
* latest datasketches-java and datasketches-memory

* updated versions of datasketches-java and datasketches-memory

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2021-10-19 23:42:30 -07:00
David Ferlay a7ee646927
Missing Loader parameter in generate-binary-license and generate-binary-notice py scripts (#11815) 2021-10-20 00:25:17 +05:30
Clint Wylie 187df58e30
better types (#11713)
* better type system

* needle in a haystack

* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support

* fixup merge

* more test

* fixup

* intern

* fix

* oops

* oops again

* ...

* more test coverage

* fix error message

* adjust interning, more javadocs

* oops

* more docs more better
2021-10-19 01:47:25 -07:00
Sandeep 17459a84d3
Update link to helm chart quickstart guide (#11801) 2021-10-19 14:10:40 +05:30
David Bar 7d4841471f
Optimize supervisor history retrieval for specific id (#11807)
Optimization. Fetch from the metadata store only the relevant history items for the requested supervisor id.
2021-10-19 14:08:25 +05:30
TSFenwick 9c15f938fd
fix test issue where JettyTest would fail if JettyWithResponseFilterEnabledTest ran before it (#11803)
This change ensures that JettyTest sets the properties it needs, in case some other test overwrites them.
It also reorders the setProperties override to call super's implementation first, in case super sets the same property.
2021-10-18 12:42:41 -07:00
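The ordering fix in #11803 follows a general pattern: an override calls `super` first, so the subclass's value wins when both set the same key. The class names and the property `example.filter.enabled` below are hypothetical and only illustrate that pattern.

```java
import java.util.Properties;

// Hypothetical base test class that sets a default.
class BaseTestSketch {
  void setProperties(Properties props) {
    props.setProperty("example.filter.enabled", "false");
  }
}

// Hypothetical subclass that needs a different value for the same key.
class FilterEnabledTestSketch extends BaseTestSketch {
  @Override
  void setProperties(Properties props) {
    // Call super first so the value set below is not overwritten by the base class.
    super.setProperties(props);
    props.setProperty("example.filter.enabled", "true");
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    new FilterEnabledTestSketch().setProperties(props);
    System.out.println(props.getProperty("example.filter.enabled")); // true
  }
}
```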
Charles Smith 938c1493e5
edits to kafka inputFormat (#11796)
* edits to kafka inputFormat

* revise conflict resolution description

* tweak for clarity

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* style fixes

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-10-15 14:01:10 -07:00
Charles Smith 6089a168ea
Docs - update dynamic config provider topic (#11795)
* update dynamic config provider

* update topic

* add examples for dynamic config provider:

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-10-14 17:51:32 -07:00
Abhishek Agarwal 4f62905be0
Fix the travis build (#11799) 2021-10-14 16:31:51 +05:30
Agustin Gonzalez 887cecf29e
Simplify ITHttpInputSourceTest to mitigate flakiness (#11751)
* Increment retry count to add more time for tests to pass

* Re-enable ITHttpInputSourceTest

* Restore original count

* This test is about the input source; hash partitioning takes longer and is not required, so switch to dynamic partitioning

* Further simplify by removing sketches
2021-10-12 11:51:27 -05:00
andreacyc adb2237628
Fix CVE-2021-3749 reported in security vulnerabilities job (#11786)
* Fix CVE-2021-3749 reported in security vulnerabilities job

* test why test fail

* update axios

* remove console log for testing
2021-10-08 23:02:58 -07:00
Kashif Faraz 7352c83e11
Do not log sensitive property value if JsonConfigurator fails to parse (#11787)
* Do not log property value if JsonConfigurator fails to parse

* Add comment to explain log change

* Fix log language
2021-10-09 09:59:03 +05:30
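The idea behind #11787 is to report *which* property failed to parse without echoing its value, since the value may be a secret. This sketch is hypothetical (`ConfigErrorMessageSketch` and its message text are not Druid's), and it also omits the cause's message, assuming parser errors can echo the offending input.

```java
// Hypothetical helper; JsonConfigurator's real message text differs.
final class ConfigErrorMessageSketch {
  // The raw value is deliberately left out because it may be a password or other secret.
  static String parseFailureMessage(String propertyName, Exception cause) {
    // Report only the exception type: parser messages often quote the offending input.
    return String.format("Failed to parse property [%s] (%s)", propertyName, cause.getClass().getSimpleName());
  }

  public static void main(String[] args) {
    Exception cause = new NumberFormatException("For input string: \"hunter2\"");
    System.out.println(parseFailureMessage("druid.metadata.storage.connector.password", cause));
    // Failed to parse property [druid.metadata.storage.connector.password] (NumberFormatException)
  }
}
```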
Arun Ramani b6b42d3936
Minor processor quota computation fix + docs (#11783)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

* Minor processor quota computation fix + docs

* Address comments

* Address comments

* Fix spellcheck

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-08 22:52:03 -05:00
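For context on #11783 and #11763, a CFS CPU quota is usually turned into a processor count by dividing quota by period. The sketch below assumes cgroup v1 semantics (`cpu.cfs_quota_us` of -1 means "no quota") and uses 0 to mean "no limit could be computed", echoing the "Return 0 if no period" bullet. It is not the code from these commits.

```java
// Hypothetical helper, assuming cgroup v1 CFS semantics.
final class CpuQuotaSketch {
  // quotaUs: cpu.cfs_quota_us (-1 when unlimited); periodUs: cpu.cfs_period_us
  static double processorsFromQuota(long quotaUs, long periodUs) {
    if (periodUs <= 0) {
      return 0; // no period: nothing to divide by
    }
    if (quotaUs < 0) {
      return 0; // no quota configured: no CPU limit to report
    }
    return (double) quotaUs / periodUs; // e.g. 150000 / 100000 = 1.5 processors
  }

  public static void main(String[] args) {
    System.out.println(processorsFromQuota(150_000, 100_000)); // 1.5
  }
}
```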
Victoria Lim 42e44269be
Docs update for druid-basic-security (#11782)
* update druid-basic-security

* typo

* revisions from review
2021-10-08 14:45:09 -07:00
Kashif Faraz c2c724c065
Fix docs to explain that WRITE permissions do not include READ (#11785)
* Fix docs to explain that WRITE and READ are exclusive

* Fix indentation

* Use suggested doc style
2021-10-08 14:10:20 -07:00
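The documented rule in #11785, that WRITE does not include READ, amounts to checking each action independently. `ActionSketch` and `PermissionCheckSketch` are hypothetical names for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

enum ActionSketch { READ, WRITE }

final class PermissionCheckSketch {
  // Each action must be granted explicitly; holding WRITE does not imply READ.
  static boolean isAuthorized(Set<ActionSketch> granted, ActionSketch required) {
    return granted.contains(required);
  }

  public static void main(String[] args) {
    Set<ActionSketch> writeOnly = EnumSet.of(ActionSketch.WRITE);
    System.out.println(isAuthorized(writeOnly, ActionSketch.WRITE)); // true
    System.out.println(isAuthorized(writeOnly, ActionSketch.READ));  // false
  }
}
```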
Joseph Glanville 989297edc3
Docker copy before env and respect JAVA_OPTS (#11364)
* Change ordering of config file vs env vars in Docker

Currently, if you provide a config file, it negates any settings set via environment variables.
This change allows using a config file as a base and lets environment variables override it.
Additionally, this allows dynamic features such as DRUID_SET_HOST to function correctly when a config file has been provided.

* Custom JAVA_OPTS should override service jvm.config
2021-10-08 14:05:37 -07:00
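The precedence change in #11364 (config file as the base, environment variables on top) can be sketched as a property overlay. The Docker entrypoint is a shell script; this Java version is only an illustration, and it assumes variables named `druid_*` map to `druid.*` keys by replacing `_` with `.`.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of "config file first, environment overrides second".
final class ConfigOverlaySketch {
  static Properties merge(Properties fromFile, Map<String, String> env) {
    Properties merged = new Properties();
    merged.putAll(fromFile); // base layer: the provided config file
    for (Map.Entry<String, String> e : env.entrySet()) {
      // Assumed naming convention: druid_host -> druid.host
      if (e.getKey().startsWith("druid_")) {
        merged.setProperty(e.getKey().replace('_', '.'), e.getValue());
      }
    }
    return merged;
  }
}
```

One trade-off of this naive mapping is that it cannot express property names that themselves contain underscores.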
Charles Smith 3ecbd3aec4
docs for changes to authorization in #11718 and #11720 (#11779)
* security recommendation

* Update docs/operations/security-overview.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update security-user-auth.md

add newline

* Update docs/operations/security-overview.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update security-overview.md

add suggestion for environment variable dynamic config provider

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
2021-10-08 14:04:04 -07:00
Kashif Faraz f2d6100124
Require Datasource WRITE authorization for Supervisor and Task access (#11718)
Follow-up PR for #11680

Description
Supervisor and Task APIs are related to ingestion and must always require Datasource WRITE
authorization even if they are purely informative.

Changes
Check Datasource WRITE in SystemSchema for tables "supervisors" and "tasks"
Check Datasource WRITE for APIs /supervisor/history and /supervisor/{id}/history
Check Datasource WRITE for all Indexing Task APIs
2021-10-08 10:39:48 +05:30
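The rule in #11718 means rows of the `supervisors` and `tasks` system tables are visible only for datasources the caller can WRITE to, even though the rows are informational. A minimal sketch with a hypothetical row type:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class IngestionRowFilterSketch {
  // Hypothetical row of a "supervisors" or "tasks" system table.
  static final class Row {
    final String id;
    final String dataSource;

    Row(String id, String dataSource) {
      this.id = id;
      this.dataSource = dataSource;
    }
  }

  // Keep only rows whose datasource the caller has WRITE access to; READ alone is not enough.
  static List<Row> visibleRows(List<Row> rows, Set<String> writableDataSources) {
    return rows.stream()
        .filter(r -> writableDataSources.contains(r.dataSource))
        .collect(Collectors.toList());
  }
}
```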
Katya Macedo 45d0ecbefb
clarify hadoop input paths (#11781)
Co-authored-by: Katya Macedo <katya.macedo@imply.io>
2021-10-07 20:22:51 -07:00
lokesh-lingarajan ad6609a606
Kafka Input Format for headers, key and payload parsing (#11630)
### Description

Today we ingest a number of high-cardinality metrics into Druid across dimensions. These metrics are rolled up on a per-minute basis and are very useful when looking at metrics on a partition or client basis. Events are another class of data that provide useful information about a particular incident or scenario inside a Kafka cluster. Events themselves are carried inside the Kafka payload, but there is also some very useful metadata carried in Kafka headers that can serve as dimensions for aggregation and, in turn, provide better insights.

PR https://github.com/apache/druid/pull/10730 introduced support for Kafka headers in InputFormats.

We still need an input format to parse out the headers and translate them into relevant columns in Druid. Until that is implemented, none of the information available in the Kafka message headers is exposed. So, first, there is a need for an input format that can parse headers in any given format (provided we support the format), just as we parse payloads today. Apart from headers, there is also useful information in the key portion of the Kafka record, and we need a way to expose that data as Druid columns. We need a generic way to express, at configuration time, which attributes from headers, key, and payload should be ingested into Druid, and the design must stay generic enough that users can specify different parsers for headers, key, and payload.

This PR is designed to solve the above by providing a wrapper around any existing input format and merging the data into a single unified Druid row.

Let's look at a sample input format based on the discussion above:

```
"inputFormat":
{
    "type": "kafka",     // New input format type
    "headerLabelPrefix": "kafka.header.",   // Label prefix for header columns; this avoids collisions while merging columns
    "recordTimestampLabelPrefix": "kafka.",  // Kafka record's timestamp is made available in case the payload does not carry a timestamp
    "headerFormat":  // Header parser specifying that values are of type string
    {
        "type": "string"
    },
    "valueFormat": // Value parser using JSON parsing
    {
        "type": "json",
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [...]
        }
    },
    "keyFormat":  // Key parser, also using JSON parsing
    {
        "type": "json"
    }
}
```

Since header, key, and payload have independent sections, each section can be parsed with its own parser, e.g., headers as strings and the payload as JSON.

KafkaInputFormat will be the uber class implementing the InputFormat interface. It will be responsible for creating the individual parsers for header, key, and payload, blending their data while resolving column conflicts, and generating a single unified InputRow for Druid ingestion.

"headerFormat" will let users plug in a parser type for the header values, and a default prefix of "kafka.header." (which can be overridden) will be added to header attributes to avoid collisions when merging them with the payload.

The Kafka payload parser will be responsible for parsing the value portion of the Kafka record. This is where most of the data will come from, and any existing parser should be pluggable here. Note that if batching is performed, the code adds the header and key values to every record in the batch.

The Kafka key parser will handle parsing the key portion of the Kafka record and will ingest the key with the dimension name "kafka.key".

## KafkaInputFormat Class:
This class orchestrates sending the consumerRecord to each parser, retrieving the rows, and merging the columns into one final row for Druid consumption. KafkaInputFormat should make sure to release the resources that get allocated by the reader in CloseableIterator<InputRow>, in both normal and exceptional cases.

When dimension or metric names conflict, the code will prefer the names from the payload and ignore the conflicting dimension from the headers or key. This lets existing input formats be migrated to this new format without losing information.
2021-10-07 08:56:27 -07:00
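The merge rules described for KafkaInputFormat in #11630 (prefixed header columns, a `kafka.key` column, payload winning on conflicts, and header/key values copied onto every row of a batch) can be sketched with plain maps. This is an illustration under those stated rules; `KafkaRowMergeSketch` is hypothetical and does not use Druid's `InputRow` types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KafkaRowMergeSketch {
  static final String HEADER_PREFIX = "kafka.header."; // default prefix from the description
  static final String KEY_COLUMN = "kafka.key";

  static Map<String, Object> merge(Map<String, String> headers, String key, Map<String, Object> payload) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, String> h : headers.entrySet()) {
      row.put(HEADER_PREFIX + h.getKey(), h.getValue()); // prefixed to reduce collisions
    }
    if (key != null) {
      row.put(KEY_COLUMN, key);
    }
    row.putAll(payload); // applied last, so the payload wins any name conflict
    return row;
  }

  // With batching, one record's headers and key are attached to every row parsed from its value.
  static List<Map<String, Object>> mergeBatch(
      Map<String, String> headers, String key, List<Map<String, Object>> payloadRows) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> p : payloadRows) {
      out.add(merge(headers, key, p));
    }
    return out;
  }
}
```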
Arun Ramani 15789137a3
Add cpu/cpuset cgroup and procfs data gathering (#11763)
* cpu/cpuset cgroup and procfs data gathering

* Renames and default values

* Formatting

* Trigger Build

* Add cgroup monitors

* Return 0 if no period

* Update

Co-authored-by: arunramani-imply <84351090+arunramani-imply@users.noreply.github.com>
2021-10-06 20:27:36 -07:00
Charles Smith 8fd17fe0af
fix a few typos in Kinesis doc (#11776) 2021-10-06 19:43:20 -07:00
Lucas Capistrant 1930ad1f47
Implement configurable internally generated query context (#11429)
* Add the ability to add a context to internally generated druid broker queries

* fix docs

* changes after first CI failure

* cleanup after merge with master

* change default to empty map and improve unit tests

* add doc info and fix checkstyle

* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
2021-10-06 09:02:41 -07:00
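A rough picture of #11429: an operator-configured context (empty by default, per the commit) is added to queries the Broker generates internally, for example to give them a lower `priority`. The class below is hypothetical, and layering the configured entries over the query's own context is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class InternalQueryContextSketch {
  private final Map<String, Object> configuredContext;

  InternalQueryContextSketch(Map<String, Object> configuredContext) {
    // Default to an empty map, so nothing changes unless an operator configures entries.
    this.configuredContext = configuredContext == null ? Map.<String, Object>of() : configuredContext;
  }

  // Assumption: configured entries override the internal query's own context keys.
  Map<String, Object> contextFor(Map<String, Object> queryContext) {
    Map<String, Object> merged = new HashMap<>(queryContext);
    merged.putAll(configuredContext);
    return merged;
  }
}
```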
Kashif Faraz b688db790b
Add Broker config `druid.broker.segment.ignoredTiers` (#11766)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, the Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.

This config is useful when you want a completely isolated tier amongst many other tiers.

Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm

If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]
2021-10-06 10:06:32 +05:30
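The Broker-side decision in #11766 can be expressed as a small predicate over a segment's tier. This sketch is hypothetical (it is not Druid's timeline code) and simply applies both settings, treating an unset `watchedTiers` as "watch everything" and an unset `ignoredTiers` as "ignore nothing".

```java
import java.util.Set;

final class TierFilterSketch {
  // watchedTiers == null: all tiers are watched; ignoredTiers == null: no tier is ignored.
  static boolean shouldAddToTimeline(String tier, Set<String> watchedTiers, Set<String> ignoredTiers) {
    if (watchedTiers != null && !watchedTiers.contains(tier)) {
      return false; // B1-style Broker: only the listed tiers
    }
    if (ignoredTiers != null && ignoredTiers.contains(tier)) {
      return false; // B2..Bm-style Broker: everything except the listed tiers
    }
    return true;
  }
}
```

With this predicate, Broker B1 (`watchedTiers` = {T1}) keeps only T1, while Brokers B2 ... Bm (`ignoredTiers` = {T1}) keep every tier except T1, matching the scenario above.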
Frank Chen 104c9a07f0
Fix broken anchor and heading levels in Kafka/Kinesis ingestion (#11748)
* Fix broken anchor and heading levels

* Fix CI
2021-10-05 19:30:50 -07:00
Vadim Ogievetsky 635490d568
don't throw local storage errors (#11752) 2021-10-05 18:49:16 -07:00
Vadim Ogievetsky c1e0e6825f
auto refresh in foreground only (#11750) 2021-10-05 18:48:23 -07:00
Clint Wylie 2593df5e5b
add utility to aid in formatting release notes to be linkable (#11728)
* add utility to aid in formatting release notes to be linkable

* add docs
2021-10-05 18:26:41 -07:00
Charles Smith 621e5ac63f
docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor (#11565)
* docs: clarify RealtimeMetricsMonitor, HistoricalMetricsMonitor

* Update docs/configuration/index.md
2021-10-05 17:38:23 -07:00
Maytas Monsereenusorn f60b3b3bab
fix doc (#11772) 2021-10-05 15:42:11 -07:00
andreacyc f82baf174e
Support real query cancelling for web console (#11738)
* Support real query cancelling for web console

* use uuid for queryId, create isSql reuse variable, and add catch for rejectionhandled promise

* remove delete api promise.then() response

* resolve conflicts

* update README with debug

* add debug code to test why CI failed

* included a Druid extension called druid-testing-tools, which is neither built nor loaded by default

* remove unused variable

* remove debug log
2021-10-05 10:28:49 -07:00
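The web console change in #11738 is TypeScript, but the flow it describes (generate a UUID as the query id, send it with the query, and later cancel by that id) can be sketched in Java. The endpoints `DELETE /druid/v2/{queryId}` (native) and `DELETE /druid/v2/sql/{sqlQueryId}` (SQL) and the context keys `queryId`/`sqlQueryId` are assumptions to verify against the Druid API docs; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

final class QueryCancelSketch {
  // Client-generated id, to be placed in the query context
  // ("queryId" for native queries, "sqlQueryId" for SQL) when the query is submitted.
  static String newQueryId() {
    return UUID.randomUUID().toString();
  }

  // Builds (but does not send) the cancellation request for a running query.
  static HttpRequest cancelRequest(String baseUrl, String queryId, boolean isSql) {
    String path = isSql ? "/druid/v2/sql/" : "/druid/v2/";
    return HttpRequest.newBuilder(URI.create(baseUrl + path + queryId)).DELETE().build();
  }
}
```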
Xavier Léauté bc3b038712
Update Apache Kafka client libraries to 3.0.0 (#11735)
Release notes:
https://downloads.apache.org/kafka/3.0.0/RELEASE_NOTES.html
https://blogs.apache.org/kafka/entry/what-s-new-in-apache6
2021-10-05 10:23:19 -07:00
Victoria Lim a31d99fb37
update docs with X-Druid-SQL-Query-Id (#11761)
* update docs with X-Druid-SQL-Query-Id

* review comments

* update header description

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-06 00:15:05 +07:00
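A client reading the `X-Druid-SQL-Query-Id` response header documented in #11761 (and, per #11756, also returned for failed SQL queries) should match the name case-insensitively, as HTTP header names are case-insensitive. `SqlQueryIdHeaderSketch` is a hypothetical helper working on a plain header map.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SqlQueryIdHeaderSketch {
  static final String HEADER = "X-Druid-SQL-Query-Id";

  // Returns the first value of the header, matching its name case-insensitively.
  static Optional<String> sqlQueryId(Map<String, List<String>> headers) {
    for (Map.Entry<String, List<String>> e : headers.entrySet()) {
      if (HEADER.equalsIgnoreCase(e.getKey()) && !e.getValue().isEmpty()) {
        return Optional.of(e.getValue().get(0));
      }
    }
    return Optional.empty();
  }
}
```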
Caroline1000 ffbe303828
Update balancer strategy recommendations (#11759)
* Update balancer strategy recommendations

* Update docs/configuration/index.md

* Update docs/configuration/index.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-10-05 09:47:37 -07:00
Vaibhav 3c4bba1478
Update kinesis-ingestion.md (#11767)
* Update kinesis-ingestion.md

It seems that we are declaring recordsPerFetch (a final int) as 4000 and fetchDelayMillis as 0 in https://github.com/implydata/druid/blob/imply-2021.09/extensions-core/kinesis-indexing-service/src/main/java/org/apache/druid/indexing/kinesis/KinesisIndexTaskIOConfig.java#L36

```java
public static final int DEFAULT_RECORDS_PER_FETCH = 4000;
public static final int DEFAULT_FETCH_DELAY_MILLIS = 0;
```

Update `recordsPerFetch` and `fetchDelayMillis` in the docs to the actual default values hardcoded above.

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-04 11:26:53 -07:00
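The defaults quoted in #11767 apply when the corresponding fields are not set in the ingestion spec. A sketch of that fallback, reusing the quoted constants; the helper methods are hypothetical, not the real `KinesisIndexTaskIOConfig` constructor logic.

```java
final class KinesisFetchDefaultsSketch {
  // Values quoted in the commit message above.
  static final int DEFAULT_RECORDS_PER_FETCH = 4000;
  static final int DEFAULT_FETCH_DELAY_MILLIS = 0;

  // An unset (null) field in the spec falls back to the default.
  static int recordsPerFetch(Integer configured) {
    return configured != null ? configured : DEFAULT_RECORDS_PER_FETCH;
  }

  static int fetchDelayMillis(Integer configured) {
    return configured != null ? configured : DEFAULT_FETCH_DELAY_MILLIS;
  }
}
```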
sthetland d02d2d9d56
Design/architecture doc touchups (#11762)
* rearrange design content

* casing consistency

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-04 11:09:35 -07:00
Maytas Monsereenusorn 129911a20e
Add documentation for config to filter internal Druid-related messages from error response (#11755)
* add doc

* add doc

* address comments

* fix typo

* address comments
2021-10-01 17:49:02 +07:00
Jihoon Son 1c0b76ba93
Add killAndRestart for container for integration tests (#11754) 2021-09-30 13:47:57 -07:00
Maytas Monsereenusorn 8cc58a4368
Add sql query id to response header for failed sql query (#11756)
* add impl

* add impl
2021-09-30 13:43:39 +07:00
Clint Wylie 11017ef00a
support jdbc even if trailing / is missing (#11737)
* support jdbc even if trailing / is missing

* fix tests
2021-09-29 13:59:26 -07:00
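The gist of #11737 is that the Avatica JDBC endpoint should match whether or not the request path ends in `/`. This sketch assumes the usual `/druid/v2/sql/avatica/` path; `AvaticaPathSketch` is hypothetical, and the actual fix may match paths differently.

```java
final class AvaticaPathSketch {
  static final String AVATICA_PATH = "/druid/v2/sql/avatica/";

  // Treat the path with or without the trailing "/" as the same endpoint.
  static boolean isAvaticaPath(String requestPath) {
    String normalized = requestPath.endsWith("/") ? requestPath : requestPath + "/";
    return normalized.equals(AVATICA_PATH);
  }
}
```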
Clint Wylie 335b582377
suppress hive-storage-api thrift security vulnerability (#11753) 2021-09-28 23:54:13 -07:00
Maytas Monsereenusorn a04b08e45c
Add new config to filter internal Druid-related messages from Query API response (#11711)
* add impl

* add impl

* add tests

* add unit test

* fix checkstyle

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* address comments

* address comments

* address comments

* fix test

* fix test

* fix test

* fix test

* fix test

* change config name

* change config name

* change config name

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* fix compile

* fix compile

* change config

* add more tests

* fix IT
2021-09-29 12:55:49 +07:00
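One way to filter internal details out of error responses, in the spirit of #11711, is an allow-list of regular expressions: a message is returned only if it matches one of them. The class below is a hypothetical sketch; the actual config names and transform strategies should be taken from the documentation added in #11755.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ErrorMessageFilterSketch {
  private final List<Pattern> allowed = new ArrayList<>();

  ErrorMessageFilterSketch(List<String> allowedRegexes) {
    for (String regex : allowedRegexes) {
      allowed.add(Pattern.compile(regex));
    }
  }

  // Returns the message if some allowed pattern matches it in full; otherwise null,
  // so internal details (class names, stack traces, hosts) are not exposed to the caller.
  String transform(String message) {
    if (message == null) {
      return null;
    }
    for (Pattern p : allowed) {
      if (p.matcher(message).matches()) {
        return message;
      }
    }
    return null;
  }
}
```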
Agustin Gonzalez 988623b7ae
ITHttpInputSourceTest instability blocking the development pipeline (#11749) 2021-09-28 13:42:01 -07:00
Kashif Faraz c641657bae
Fix router documentation for `druid.router.sql.enable` (#11716)
* Rename field, fix router documentation

* Add more lines to doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-28 22:54:13 +05:30